Dapr's cloud-native practice at Alibaba Cloud

Date: 2021-12-09

Introduction: In the FaaS scenario, what attracts users is cost and R&D efficiency. Cost savings come mainly from on-demand resource allocation and extreme elasticity, while application developers expect FaaS to provide a multi-language programming environment that improves R&D efficiency, including startup time, release time and development efficiency.

Author: Cao Shengli

What is service mesh?

SOA architecture became popular among large and medium-sized Internet companies around 2010, and Alibaba open-sourced Dubbo in 2012. Microservice architecture took off afterwards, and a large number of Internet and traditional enterprises devoted themselves to building microservices. Dubbo and Spring Cloud gradually formed the two major microservice camps in China. In 2016, a more cutting-edge microservice approach, one better aligned with containers and Kubernetes, was taking shape; this technology is called service mesh. Today the concept of service mesh has been widely adopted, and many companies have put it into production.

Service mesh definition

Service mesh is an infrastructure layer that focuses on communication between services. The service topology of cloud-native applications is very complex, and service mesh makes request delivery reliable within this complex topology. It runs in sidecar mode: an independent service mesh process runs alongside the application and takes over remote service communication. A military motorcycle with a sidecar is a good analogy: one soldier drives while the soldier in the sidecar handles the shooting.

Pain points solved by service mesh


Most traditional microservice architectures are built on an RPC framework. The RPC SDK provides service registration/discovery, service routing, load balancing, full-link tracing and other capabilities. Because the application's business logic and the RPC SDK live in the same process, this brings several challenges: middleware-related code intrudes into business code and the two are highly coupled; upgrading the RPC SDK is expensive, which in turn leads to serious fragmentation of SDK versions. The approach also places high demands on application developers, who need rich service-governance and operations skills plus middleware background knowledge, so the bar for using middleware is high.

By sinking some RPC capabilities into a service mesh, separation of concerns and clear responsibility boundaries can be achieved. With the development of container and Kubernetes technology, service mesh has become a piece of cloud-native infrastructure.

Introduction to Istio


In the service mesh field, Istio is undoubtedly the king. Istio consists of a control plane and a data plane. In a service mesh, different services communicate through proxy sidecars. Istio's core function is traffic management, which the data plane and control plane coordinate to deliver. Istio was initiated by Google, IBM and Lyft; it has the purest pedigree within the CNCF ecosystem and is expected to become the de facto standard for service mesh.

Istio's data plane uses Envoy by default, which is widely regarded as the best data plane in the community. The protocol between Istio's data plane and control plane is xDS.

Service mesh summary

Finally, a brief summary of service mesh:

  • Service mesh is positioned as infrastructure for inter-service communication; the community mainly supports RPC and HTTP.
  • It is deployed as a sidecar and supports deployment on Kubernetes and on virtual machines.
  • Service mesh forwards the original protocol, which is why it is also called a network proxy; it is precisely this approach that makes zero intrusion into the application possible.

What is Dapr?

Challenges encountered by service mesh


Users deploy services on the cloud mainly as ordinary applications or as FaaS functions. In the FaaS scenario, what attracts users is cost and R&D efficiency. Cost savings come mainly from on-demand resource allocation and extreme elasticity, while application developers expect FaaS to provide a multi-language programming environment that improves R&D efficiency, including startup time, release time and development efficiency.

The essence of a service mesh implementation is raw protocol forwarding, which gives it the advantage of zero intrusion into the application. However, raw protocol forwarding also brings problems: the middleware SDK on the application side still has to handle serialization, encoding and decoding, so supporting multiple languages still carries a cost. Open-source technologies keep evolving and the stacks in use keep iterating; to migrate from Spring Cloud to Dubbo, either the application developers switch the SDK they depend on, or the service mesh performs protocol conversion, which is expensive.

Service mesh focuses on communication between services, and support for other mesh forms is minimal. For example, apart from the RPC field, Envoy's attempts in Redis, messaging and other areas have not been successful; Ant Group's MOSN does integrate RPC and messaging. The demand for multiple mesh forms clearly exists, but each mesh product evolves independently, lacking abstraction and standards. Should the different mesh forms share a single process? If they share a process, should they share a port? Many such questions have no answers. On the control plane side, most work focuses on traffic from a functional perspective: reading through the xDS protocol, its core revolves around service discovery and traffic routing. Other kinds of distributed capabilities are not covered by the service mesh control plane, let alone abstracted into xDS-like protocols that could support them.

Because of cost and R&D efficiency, FaaS is being chosen by more and more customers. FaaS places higher demands on multi-language support and on the friendliness of programming APIs, and in these two respects service mesh still cannot bring customers additional value.

Requirements of distributed applications


Bilgin Ibryam is the author of Kubernetes Patterns and a chief middleware architect at Red Hat, and he is very active in the Apache community. He published an article that abstracts the difficulties and problems of today's distributed applications and divides their requirements into four categories: lifecycle, networking, state and binding. Under each category there are sub-capabilities, such as point-to-point communication, pub/sub, caching and other classic middleware capabilities. Applications need this many distributed capabilities, and service mesh obviously cannot meet all of them. Bilgin Ibryam also proposed the concept of Multiple Runtime in the article to resolve the dilemma of service mesh.

Multiple runtime concept derivation


In the traditional middleware model, the application and the distributed capabilities are bundled into one process, with the capabilities provided as SDKs. As various pieces of infrastructure sink downward, distributed capabilities move from inside the application to outside it: Kubernetes takes over lifecycle-related requirements, while Istio and Knative take over some distributed capabilities. If every one of these capabilities were moved into its own independent runtime, the result would be unacceptable from both the operations and the resource perspective, so these runtimes need to be consolidated, ideally into a single one. This approach is named Mecha, the word for the piloted robots in Japanese anime: just as a hero climbs into a mecha, each part of the mecha is like a distributed capability, and the person inside corresponds to the main application, also called the micrologic runtime. The two runtimes can be deployed one-to-one in sidecar mode, which suits traditional applications, or many-to-one in node mode, which suits edge scenarios or gateway-style deployments.

So the goal of having a Mecha runtime integrate the various distributed capabilities is not controversial; the question is how to integrate them. What are the requirements for Mecha?

  1. Mecha's components are abstract, so any open-source product can be extended and integrated quickly.
  2. Mecha needs to be configurable: it can be configured and activated through YAML/JSON, and these file formats should preferably align with mainstream cloud-native conventions.
  3. Mecha provides a standard API, and network communication with the main application is done against this API. It is no longer raw protocol forwarding, which makes component extension and SDK maintenance far easier.
  4. For lifecycle, some capabilities can be delegated to the underlying infrastructure, such as Kubernetes; of course, some complex scenarios may require Kubernetes, the application and the Mecha runtime to work together.

Since only one extra runtime is left, why is it called Multiple Runtime? Because the application itself is also a runtime; together with the Mecha runtime there are at least two.

Introduction to Dapr

The introduction to Multiple Runtime above is rather abstract, so it helps to look at it again through Dapr. Dapr is a good practitioner of Multiple Runtime, so Dapr must coexist with the application, either in sidecar mode or in node mode. The name Dapr is not coined out of nothing; it is spelled from the initials of Distributed Application Runtime. The Dapr icon can be read as a hat, and in fact it is a waiter's hat, meaning that Dapr is there to serve the application well.

Dapr was open-sourced by Microsoft, with Alibaba deeply involved in the cooperation. Dapr has released version 1.1 and is now close to production readiness.


Since Dapr is the best practitioner of Multiple Runtime, its operating mechanism is also built on that concept. Dapr abstracts distributed capabilities and defines a set of APIs for them, built on HTTP and gRPC; in Dapr this abstraction is called a building block. To support different kinds of implementations, whether open-source products or commercial cloud services, Dapr has an SPI-style extension mechanism called components. After adopting Dapr, application developers only program against the APIs of the various distributed capabilities without worrying about the concrete implementation, and Dapr activates the corresponding components according to YAML files.
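To make that activation model concrete, here is a minimal sketch of a Dapr component file in the standard Component format; the component name statestore and the Redis address are placeholder values. Swapping the backing store later only means replacing this file with one whose spec.type points at a different state component, while the application keeps calling the same state API.

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore          # the name the application refers to via the state API
spec:
  type: state.redis         # which component implementation to activate
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379   # placeholder address
  - name: redisPassword
    value: ""
```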

Dapr properties

By using Dapr's multi-language SDKs, application developers directly gain access to the various distributed capabilities; they can also make the calls directly over HTTP or gRPC. Dapr can run in most environments: your own development machine, any Kubernetes cluster, edge computing scenarios, or cloud vendors such as Alibaba Cloud, AWS and GCP.
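To show the "directly over HTTP" path, below is a minimal Java sketch that talks to the local Dapr sidecar's state API using only the JDK's HTTP client; port 3500 is Dapr's default HTTP port, and the store name statestore matches the example component file shown earlier.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DaprStateDemo {
    // Dapr's default sidecar HTTP port; adjust if DAPR_HTTP_PORT differs.
    private static final String DAPR_BASE = "http://localhost:3500/v1.0";

    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Save state: POST /v1.0/state/{storeName} with a JSON array of key/value pairs.
        HttpRequest save = HttpRequest.newBuilder()
                .uri(URI.create(DAPR_BASE + "/state/statestore"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "[{\"key\":\"order-1\",\"value\":{\"status\":\"created\"}}]"))
                .build();
        System.out.println("save status: "
                + http.send(save, HttpResponse.BodyHandlers.discarding()).statusCode());

        // Read it back: GET /v1.0/state/{storeName}/{key}.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create(DAPR_BASE + "/state/statestore/order-1"))
                .GET()
                .build();
        System.out.println("value: "
                + http.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

Whether the state is actually held in Redis or in some other store is decided entirely by the component YAML; the code above does not change.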

The Dapr community has integrated 70+ component implementations, so application developers can quickly pick and use them. Replacing a component with another of the same capability type can be done inside Dapr, with no awareness needed on the application side.

Dapr core module


Let's look at Dapr's product modules to see why Dapr is a good practice of Multiple Runtime.

The component mechanism ensures that new capabilities can be integrated quickly. More than 70 components have been implemented in the community, covering not only open-source products but also commercial cloud products.

Building blocks currently represent only 7 distributed capabilities, and more will be needed in the future. Building blocks are exposed over HTTP and gRPC, two open and widely adopted protocols. Which concrete component a building block activates in Dapr depends on YAML files. Because Dapr exposes its capabilities over HTTP and gRPC, supporting a standard multi-language API on the application side becomes much easier.

Dapr core: component & building block

A Dapr component is the core of Dapr's plug-in extension mechanism, its SPI. Currently supported component types include bindings, pub/sub, middleware, service discovery, secret stores, and state. Some extension points are functional, such as bindings, pub/sub and state, while others, such as middleware, are horizontal. Suppose you want to integrate Redis into Dapr: you only need to implement Dapr's state component. A Dapr building block is a capability exposed by Dapr over gRPC and HTTP; the capabilities currently supported include service invocation, state, pub/sub, and so on.

A building block consists of one or more components; for example, the bindings building block is made up of the bindings and middleware components.
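As another building-block example, publishing an event through the pub/sub API is a single HTTP call to the sidecar. A minimal Java sketch follows, where the pubsub component name pubsub and the topic orders are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DaprPublishDemo {
    public static void main(String[] args) throws Exception {
        // POST /v1.0/publish/{pubsubName}/{topic} publishes the body as the event payload.
        HttpRequest publish = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:3500/v1.0/publish/pubsub/orders"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"orderId\":\"order-1\"}"))
                .build();
        int status = HttpClient.newHttpClient()
                .send(publish, HttpResponse.BodyHandlers.discarding())
                .statusCode();
        System.out.println("publish status: " + status);
    }
}
```

Which message broker actually receives the event is, again, determined by the pub/sub component YAML rather than by this code.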

Dapr overall architecture

Like Istio, Dapr has a data plane and a control plane. The control plane consists of actor placement, the sidecar injector, Sentry and the operator: actor placement serves actors, Sentry handles security and certificate work, and the sidecar injector is responsible for injecting the Dapr sidecar. A component is activated in Dapr through a YAML file, which can be supplied in two ways: specified locally as a runtime parameter, or distributed through the control plane's operator, in which case the component file is stored as a Kubernetes CRD and pushed to the Dapr sidecars. Two of the control plane's core components depend on Kubernetes to run. The Dapr dashboard is still very weak, and there is no plan to strengthen it in the short term; after the various components are integrated, their day-to-day operation still has to be done in their original consoles, and the Dapr control plane does not take part in operating the components themselves.

Dapr's standard deployment form is in the same pod as the application, but in two separate containers. The rest of Dapr has already been covered above and will not be repeated here.

Dapr landing scenarios at Microsoft

Dapr has been under development for about two years. What does adoption look like inside Microsoft?

There are two relevant projects on Dapr's GitHub: workflows and the Azure Functions Dapr extension. Azure Logic Apps is Microsoft's cloud-based automated workflow platform, and the workflows project integrates Azure Logic Apps with Dapr. Azure Logic Apps has several key concepts, and trigger and connector map very well onto Dapr: a trigger can be implemented with Dapr's input bindings, and the large number of input binding components extends the kinds of traffic that can act as an entry point; a connector matches Dapr's output bindings or service invocation, giving quick access to external resources. The Azure Functions Dapr extension adds Dapr support through the Azure Functions extension mechanism, so that Azure Functions can quickly use Dapr's building blocks, bringing function developers a relatively simple and consistent programming experience across languages.

Azure API Management takes a different angle from the two scenarios above. It assumes that applications are already fronted by a Dapr sidecar and that their services are exposed through Dapr. If applications outside Kubernetes, or applications in another cluster, want to access services in the current cluster, they need a gateway; this gateway exposes Dapr's capabilities directly, with security and access control added on top. Currently three building blocks are supported: service invocation, pub/sub and resource bindings.
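For reference, calling a Dapr-exposed service from inside the mesh goes through the sidecar's service invocation API; the gateway scenario above layers security and access control on top of the same API. Below is a minimal Java sketch in which the app id order-app and the method name getOrder are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DaprInvokeDemo {
    public static void main(String[] args) throws Exception {
        // GET /v1.0/invoke/{appId}/method/{methodName} asks the local sidecar to
        // locate the target application's sidecar and forward the request to it.
        HttpRequest invoke = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:3500/v1.0/invoke/order-app/method/getOrder"))
                .GET()
                .build();
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(invoke, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```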

Dapr summary

The capability-oriented APIs that Dapr provides give developers a consistent, multi-language programming experience, and the SDKs for these APIs are relatively lightweight. These traits suit FaaS scenarios well. As Dapr's integration ecosystem keeps improving, the advantages of this developer-oriented programming model will grow further. A Dapr component implementation can be replaced through Dapr without any code change by developers, provided, of course, that the old and new implementations are of the same type of distributed capability.

Differences between Dapr and service mesh:

Capabilities provided: service mesh focuses on service invocation, while Dapr offers a much wider range of distributed capabilities, covering a variety of distributed primitives.

Working principle: service mesh forwards the original protocol to achieve zero intrusion, while Dapr uses multi-language SDKs plus a standard API on top of various distributed capabilities.

Target domain: service mesh is friendly to non-intrusive upgrades of traditional microservices, while Dapr gives application developers a friendlier programming experience.

Alibaba's exploration of Dapr

Alibaba's development path with Dapr

Microsoft open-sourced Dapr and released version 0.1.0 in October 2019. At that point Alibaba had only just started evaluating the project, having learned about it through the existing cooperation around OAM. In early 2020, Alibaba and Microsoft discussed Dapr offline and learned Microsoft's views on, investment in, and development plans for Dapr; by then Alibaba had concluded that the project had great value. Work around Dapr began in mid-2020, and by October, Dapr had started its online gray-scale rollout in the Function Compute scenario. So far the gray-scale rollout of all Dapr-related functionality in Function Compute is basically complete and the public beta has begun. In February 2021, version 1.0 was finally released.

Alibaba Cloud Function Compute integrates Dapr

Beyond benefits on the operations side such as extreme elasticity, what distinguishes Function Compute from ordinary applications is that it pays more attention to the developer experience and to overall R&D efficiency. The value Dapr brings to Function Compute is a unified, multi-language, capability-oriented programming interface, so developers no longer need to care about specific products. For example, to use Alibaba Cloud OSS from Java you would normally add Maven dependencies and write OSS-specific code; with Dapr you only need to call the binding method of the Dapr SDK. Besides being convenient to program against, the deployable package does not have to pull in redundant dependencies, so its size stays under control.
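As an illustration of that binding call, the sketch below invokes a hypothetical output binding named oss-binding through the local sidecar's bindings API using plain HTTP (the Dapr SDKs wrap this same call). The binding name, operation and payload are placeholder assumptions; the operations actually available depend on the configured binding component.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DaprBindingDemo {
    public static void main(String[] args) throws Exception {
        // POST /v1.0/bindings/{bindingName} with an operation and data; the component
        // behind "oss-binding" (a placeholder name) decides how the data is stored.
        String body = "{\"operation\":\"create\",\"data\":\"hello from a function\","
                + "\"metadata\":{\"key\":\"demo.txt\"}}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:3500/v1.0/bindings/oss-binding"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        int status = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding())
                .statusCode();
        System.out.println("binding invoke status: " + status);
    }
}
```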


Function Compute is abbreviated FC. The FC architecture contains many subsystems, chiefly the FC gateway and the function runtime environment. The FC gateway takes in traffic and, based on the traffic it carries plus current CPU and memory usage, scales the number of function instances up or down. The function runtime environment is deployed in a pod, with the function instance in the main container and Dapr in the sidecar container. When external traffic accesses a service served by Function Compute, it first reaches the gateway, which forwards it to a function instance serving that content. If the function instance needs to access external resources while handling the request, it makes the call through Dapr's multi-language SDK; the SDK issues a gRPC request to the Dapr instance, which selects the corresponding capability and component implementation according to the request type and body and then calls the external resource.


In the service mesh scenario, the mesh exists as a sidecar deployed alongside the application in two containers of the same pod, which fits service mesh requirements well. In the Function Compute scenario, however, running Dapr as an independent container consumes too many resources, and multiple function instances may be deployed in one pod to save resources and achieve second-level elasticity. Therefore, in Function Compute the function instance and the Dapr runtime are deployed inside the same container, but as two separate processes.

In Function Compute you can set the number of reserved instances, which is the minimum number of instances of the current function. If reserved instances receive no traffic for a long time, they need to enter a paused/sleep state, the same approach AWS takes: for a function that goes to sleep, the processes or threads in the instance must stop running. An extension mechanism was added to the function runtime to drive Dapr's lifecycle: when the function instance goes to sleep, the extension notifies Dapr to sleep; when the instance resumes, the extension notifies Dapr to restore its previous running state. Dapr's component implementations need to support this kind of lifecycle management. Take Dubbo as an example: Dubbo's registry, Nacos, requires clients to send heartbeats to the Nacos server periodically to keep the session alive, and the Dubbo consumer integrated into Dapr also needs to send heartbeats to the Dubbo provider. After entering the paused state these heartbeats must stop; when the instance resumes, the whole running state must be restored.

The combination of Function Compute and Dapr described above is driven by external traffic arriving through the gateway. What about inbound event traffic: can message traffic flow into Dapr directly without passing through the gateway? To achieve this, the Dapr sidecar would also have to report performance data to the gateway in time, so that the gateway can still scale resources elastically.

SaaS businesses moving onto the cloud

More and more SaaS businesses are being incubated inside Alibaba, and their demand for serving external customers is strong. These SaaS businesses have a strong need for multi-cloud deployment: customers expect the SaaS offering to run on the Alibaba Cloud public cloud or on Huawei's proprietary cloud, and they expect the underlying technology to be open source or a standard commercial product from a cloud vendor.

Take an Alibaba SaaS service moving onto the cloud as an example. On the left is the original system inside Alibaba, and on the right is the system after transformation. The goal of the transformation is to switch the Alibaba-internal dependencies to open-source software: Alibaba's internal RPC is switched to Dubbo, while the internal cache, messaging and configuration systems are switched to Redis, RocketMQ and Nacos respectively. The hope is to achieve this switch at minimum cost through Dapr.

Since Dapr is supposed to accomplish this mission, the simplest and crudest way would be to make applications depend on the Dapr SDK directly, but the transformation cost is too high. Instead, the original APIs are kept unchanged while their underlying implementations are adapted onto the Dapr SDK, so applications can keep using the original API to reach Dapr and only need to upgrade the version of the corresponding dependency jar. After the transformation, developers still program against the original SDK, but underneath it has been replaced by Dapr's capability-oriented programming. During migration, applications can therefore keep a single codebase instead of maintaining a separate branch per cloud environment or technology stack. When the Dapr sidecar runs inside the group, rpc.yaml, cache.yaml, msg.yaml and config.yaml are used to activate the internal component implementations; on the public cloud, dubbo.yaml, redis.yaml, rocketmq.yaml and nacos.yaml activate the component implementations suited to the Alibaba Cloud environment. Shielding the component implementation by activating different components through different YAML files brings great convenience to the multi-cloud deployment of SaaS businesses.
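A rough sketch of what such a pair of activation files might look like is shown below; the component names and types are illustrative assumptions (the in-group messaging component type in particular is hypothetical). The key point is that both files declare the same metadata.name, so the application code that publishes messages is identical in both environments.

```yaml
# msg.yaml - used when the sidecar runs inside the group (component type is hypothetical)
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: messaging
spec:
  type: pubsub.internal-mq     # placeholder for the internal messaging implementation
  version: v1

---
# rocketmq.yaml - used on the public cloud
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: messaging
spec:
  type: pubsub.rocketmq
  version: v1
  # connection metadata (name server address, credentials) omitted
```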

DingTalk is an important partner and promoter of Dapr, working with the cloud-native team to drive Dapr's adoption inside DingTalk. By sinking some middleware capabilities into the Dapr sidecar, the differing underlying middleware implementations with similar capabilities are shielded from the application. DingTalk also has its own business pain points: its common business components are strongly bound to specific businesses and need per-business customization, which keeps reuse low. DingTalk therefore hopes to sink the capabilities of some business components into Dapr as well, so that different businesses get the same programming experience while component maintainers only have to maintain a single component implementation.

Dapr outlook

Infrastructure sinking has become the trend of software development

Software architecture has gone through a long and brilliant evolution, and reviewing the evolution of Alibaba's system architecture is a way to understand the development history of software architecture in China and even globally. When Taobao was first established it was a monolithic application; as the business grew, the system was first scaled up by upgrading hardware; this approach soon ran into all kinds of problems, so a microservice solution was introduced in 2008. SOA solutions are distributed, and for stability and observability they need high-availability mechanisms such as circuit breaking, isolation and full-link monitoring. The next problem was how to reach more than 99.99% availability at the machine-room and IDC level, which led to solutions such as same-city dual machine rooms and multi-site active-active deployments. As cloud technology kept developing, Alibaba embraced and helped guide cloud-native technology, and actively carried out a Kubernetes-based cloud-native upgrade.

From this history we can see that software architecture keeps generating new demands that the original underlying infrastructure could not fulfill, so they were met by fat SDKs on the application side. Now that Kubernetes and containers have become the standard, microservices and other distributed capabilities are being returned to the infrastructure. The future trend is the sinking of distributed capabilities represented by service mesh and Dapr, releasing the dividends of cloud and cloud-native technology.

Demands of application developers in cloud native scenarios

In the future, application developers should be able to program against capabilities in a language-agnostic way, without being tied to specific cloud vendors or technologies, and enjoy the cost advantage of extreme elasticity brought by the dividends of cloud technology. I believe this ideal is achievable. From where we stand today, how can this goal be reached?

  1. The concept of Multiple Runtime must be truly implemented and keep developing;
  2. Taking Dapr as an example, its APIs for distributed capabilities should be pushed toward becoming an industry standard, and that standard must evolve sustainably;
  3. With the continued development of Kubernetes and serverless technology, elasticity can be maximized in the future.

Dapr community direction

Finally, let's take a look at the direction of the Dapr community:

1. Promote API standardization and integrate more distributed capabilities;
2. Integrate more components and improve the Dapr ecosystem;
3. Land in more companies, expand the product boundary, and polish Dapr to production readiness;
4. Enter the CNCF and become the de facto standard for Multiple Runtime in the cloud-native field.

Click https://developer.aliyun.com/community/cloudnative to learn more about cloud-native content.
