From monolith to chaotic microservices: how was Alibaba Cloud's managed service mesh born?


Author: Wang Xining, Alibaba senior technical expert

Before service mesh technology, teams that wanted to innovate faster and more flexibly usually modernized their existing applications by splitting a monolith into a distributed microservice architecture. In the early stages of that shift, microservice architectures were generally library-oriented.

In this pattern, service governance logic is built into the application itself in the form of libraries. These libraries cover functions such as traffic management, circuit breaking, retries, client-side load balancing, security, and observability. As the libraries gain features, their versions change, and conflicts between versions become common. Moreover, whenever a library version changes, the whole application has to change with it, even if the application logic itself is untouched. As applications grow and the number of teams increases, using these governance functions consistently across services becomes very complex.

Moving service governance capabilities into the sidecar

By moving these service governance capabilities into a sidecar, we can decouple them from the application itself, which makes it easier to support multiple programming languages. These sidecar capabilities also do not depend on any specific technical framework. This is what is commonly called the sidecar proxy-oriented architecture pattern.

As sidecar proxies become more capable, the service governance functions that used to live in libraries are abstracted into general-purpose components and gradually sink into the proxy. Standardizing and unifying these capabilities addresses the wide divergence and lack of commonality among microservice implementations in complex systems, and supports different programming languages and frameworks well.

Abstracting application service communication into the infrastructure lets developers focus on business applications while the infrastructure provides these common capabilities.

At the same time, maturing container orchestration technology has made sidecar proxies easier to adopt and more widespread. Kubernetes is an excellent platform for deploying and managing containers, and Istio is a platform for application service governance. Together they have become the vehicle for sinking application service communication capabilities into the infrastructure.

In the cloud native application model, an application may contain hundreds of services, and each service may consist of hundreds of instances. Managing the sidecar proxies of all these applications uniformly is the problem that the service mesh control plane is meant to solve. As a proxy, Envoy is well suited to service mesh scenarios, but to get the most value out of Envoy it has to work closely with the underlying infrastructure or components.

These sidecar proxies form a mesh data plane, through which all traffic between services is processed and observed. The data plane is responsible for establishing, securing, and controlling traffic through the mesh.

The management component that determines how the data plane behaves is called the control plane. The control plane is the brain of the service mesh and provides a public API that lets mesh users easily manipulate network behavior.

Once the service mesh is enabled, developers, operations staff, and SRE teams can address application service management in a unified and declarative manner.

ASM service mesh product architecture

As the industry's first fully managed, Istio-compatible service mesh product, ASM has stayed consistent with the community and with industry trends from the beginning. The control plane components are hosted on the Alibaba Cloud side and are independent of the user clusters on the data plane side. ASM is customized and implemented on top of the community's open source Istio, and the managed control plane provides the components needed for fine-grained traffic management and security management. Hosting decouples the life cycle management of the Istio components from that of the managed Kubernetes clusters, which makes the architecture more flexible and improves the scalability of the system.

  • For in-depth analysis of the service mesh, ASM provides a mesh diagnosis capability. It turns the problems customers ran into over the past year, and the ways those problems were solved, into product features that help users quickly locate issues;
  • For extension and integration, ASM integrates Alibaba Cloud services, including observability services such as tracing, logging, and Prometheus monitoring, as well as the cross-VPC network interconnection capability of CEN. It also optimizes and integrates community open source software, including OPA support, custom authorization services, and rate limiting services.

In addition, as Istio's new architecture has been optimized, WebAssembly technology has been introduced into the service mesh to solve the problem of proxy extension. As a result, the ASM architecture follows the model of "managed, highly available, elastic control plane + extensible plug-in data plane".

On the data plane, ASM supports a variety of computing infrastructures, including the public cloud ACK clusters provided by Alibaba Cloud (both managed and dedicated Kubernetes clusters) and Serverless Kubernetes (ASK) clusters. It also supports bringing non-containerized applications into the mesh, such as application services running on ECS virtual machines.

ASM has also launched support for multi-cloud and hybrid cloud, which covers external, non-Alibaba Cloud Kubernetes clusters. Whether a cluster runs in the user's own IDC or on another public cloud, it can be governed uniformly through ASM.

Infrastructure for unified management of multiple types of computing services

The following sections describe how the managed service mesh, as an infrastructure for unified management of multiple types of computing services, provides unified traffic management, unified service security, and unified service observability, and how it achieves unified data plane extensibility based on WebAssembly.

1. Unified traffic management capability

Unified traffic management focuses on two aspects.

The first is location-based routing of traffic requests. In a large-scale deployment, thousands of services run on various types of computing facilities in different regions, and they need to call one another to deliver complete functionality. For the best performance, traffic should be routed to the nearest service so that it stays within the same region as much as possible, rather than relying only on the round-robin load balancing that Kubernetes provides by default. The service mesh should provide this location-based routing capability. On the one hand, it can route traffic to the nearest container, implementing locality-prioritized load balancing and failing over to a standby service when the primary service fails. On the other hand, it provides locality-weighted load balancing, which splits traffic across regions in proportions chosen to fit actual needs.
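The two behaviors described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how Envoy or ASM implements it; the endpoint dictionaries, the region names, and the function names `failover_candidates` and `weighted_pick` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def failover_candidates(endpoints, client_region, failover_order):
    """Locality-prioritized selection: use healthy endpoints in the client's
    own region first, then walk the failover regions in order."""
    for region in [client_region, *failover_order]:
        healthy = [e for e in endpoints if e["region"] == region and e["healthy"]]
        if healthy:
            return healthy
    return []


def weighted_pick(endpoints, region_weights, rng=random):
    """Locality-weighted selection: choose a region in proportion to its
    weight, then choose one healthy endpoint inside that region."""
    by_region = defaultdict(list)
    for e in endpoints:
        if e["healthy"]:
            by_region[e["region"]].append(e)
    # Only regions with a positive weight and at least one healthy endpoint count.
    regions = [r for r, w in region_weights.items() if w > 0 and by_region.get(r)]
    if not regions:
        return None
    region = rng.choices(regions, weights=[region_weights[r] for r in regions], k=1)[0]
    return rng.choice(by_region[region])
```

With weights such as `{"region-1": 80, "region-2": 20}`, roughly four out of five picks land in `region-1` as long as it has healthy endpoints. If the local region has none, `failover_candidates` falls through to the next region in the list.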

The second is unified mesh management of non-Kubernetes workloads. In a managed service mesh instance, we can add several Kubernetes clusters and define routing rules or gateway services on the control plane. To manage non-Kubernetes workloads uniformly, we declare each workload's labels and IP address through the WorkloadEntry CRD, then register the workload in the service mesh through the ServiceEntry CRD, which gives these workloads a processing mechanism similar to that of Kubernetes pods. For example, the selector mechanism can route traffic either to a matching pod or to the non-containerized application.
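As a rough sketch of this registration flow, the Python below builds the two resources as plain dictionaries and mimics the label selector. The field names follow the upstream Istio WorkloadEntry and ServiceEntry schemas. The host name, addresses, labels, and helper names are hypothetical, and the selector ignores namespaces for brevity.

```python
def workload_entry(name, namespace, address, labels):
    """Describe one non-Kubernetes workload (for example a VM) by IP and labels."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "WorkloadEntry",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"address": address, "labels": labels},
    }


def service_entry(name, namespace, host, port, selector_labels):
    """Register a service whose endpoints are chosen by a label selector."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "ServiceEntry",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": [host],
            "location": "MESH_INTERNAL",
            "ports": [{"number": port, "name": "http", "protocol": "HTTP"}],
            "resolution": "STATIC",
            "workloadSelector": {"labels": selector_labels},
        },
    }


def selected_addresses(entry, pods, workload_entries):
    """Simplified selector: pods and WorkloadEntries are matched the same way."""
    selector = entry["spec"]["workloadSelector"]["labels"]

    def matches(labels):
        return all(labels.get(k) == v for k, v in selector.items())

    pod_ips = [p["ip"] for p in pods if matches(p["labels"])]
    vm_ips = [w["spec"]["address"] for w in workload_entries if matches(w["spec"]["labels"])]
    return pod_ips + vm_ips
```

Because the selector only looks at labels, a pod and a virtual machine that both carry `app: orders` end up behind the same service host, which is the "similar to Kubernetes pods" behavior the paragraph describes.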

2. Unified service security capability

For unified service security, the managed service mesh provides unified primary account and sub-account support and RAM authorization for application services running on different computing infrastructures. On this basis it provides unified TLS authentication and JWT authentication. TLS authentication can be switched on and off easily, mutual TLS can be rolled out gradually, and authentication can be scoped at fine granularity, including the namespace and workload levels. JWT authentication support lets this token-based mechanism be applied uniformly without relying on a specific implementation framework.
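The gradual rollout of mutual TLS and the namespace and workload scopes can be pictured with a small model. The sketch below assumes the upstream Istio PeerAuthentication modes (`DISABLE`, `PERMISSIVE`, `STRICT`) and the rule that the most specific policy wins. The policy dictionaries and function names are hypothetical, and a mesh-wide policy is represented simply as one with no namespace.

```python
DEFAULT_MODE = "PERMISSIVE"  # assumed default when no policy applies


def effective_mode(policies, namespace, workload_labels):
    """Return the mTLS mode for a workload: a workload-level policy beats a
    namespace-level policy, which beats a mesh-wide policy."""
    best_rank, best_mode = 0, DEFAULT_MODE
    for policy in policies:
        ns = policy.get("namespace")
        selector = policy.get("selector")
        if ns is None:
            rank = 1  # mesh-wide
        elif ns != namespace:
            continue  # policy belongs to another namespace
        elif not selector:
            rank = 2  # namespace-wide
        elif all(workload_labels.get(k) == v for k, v in selector.items()):
            rank = 3  # workload-specific
        else:
            continue  # selector does not match this workload
        if rank > best_rank:
            best_rank, best_mode = rank, policy["mode"]
    return best_mode


def accepts(mode, is_mtls):
    """PERMISSIVE accepts both plaintext and mTLS, which allows gradual migration."""
    if mode == "STRICT":
        return is_mtls
    if mode == "DISABLE":
        return not is_mtls
    return True
```

A typical migration under this model keeps the mesh in `PERMISSIVE` while clients are upgraded, then tightens individual namespaces or workloads to `STRICT` one at a time.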

  • For RBAC authorization, unified authorization policies are provided for different protocols and can be applied at different granularities, including the namespace, service, and port levels;
  • For auditing, you can flexibly turn on the mesh audit function, view audit reports, view log records, and set alarm rules;
  • For policy management, ASM integrates Open Policy Agent (OPA), so users can define security policies in a declarative policy language. It also provides an external_auth gRPC interface for custom authorization services. Any authorization service that follows this interface definition can be integrated into the service mesh.

3. Unified service observability

Unified service observability is divided into three aspects.

  • First, log analysis: by collecting and analyzing the access logs of the data plane, especially the ingress gateway logs, you can analyze the traffic and the distribution of status codes for service requests, and use the results to further optimize the calls between services;
  • Second, distributed tracing: it gives developers of distributed applications tools such as complete call chain reconstruction, request statistics, call topology, and application dependency analysis, which help them quickly analyze and diagnose performance bottlenecks in a distributed architecture and improve diagnosis efficiency in the microservices era;
  • Third, monitoring: a set of service metrics is generated according to the four golden signals of monitoring (latency, traffic, errors, and saturation) to understand and monitor the behavior of services in the mesh.
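To make the log analysis and monitoring items more concrete, here is a minimal sketch that derives the status code distribution and three of the four signals from a batch of access log records. The record fields `status` and `duration_ms` and the function name `summarize` are assumptions for the example. Saturation is left out because it comes from resource metrics rather than from access logs.

```python
import math
from collections import Counter


def summarize(records, window_seconds):
    """Summarize access log records collected over window_seconds."""
    total = len(records)
    if total == 0:
        return {"rps": 0.0, "error_rate": 0.0, "p95_ms": None, "status_share": {}}

    # Share of each HTTP status code among all requests.
    counts = Counter(r["status"] for r in records)
    status_share = {code: n / total for code, n in counts.items()}

    # Errors: here, server-side failures (5xx) only.
    errors = sum(1 for r in records if r["status"] >= 500)

    # Latency: 95th percentile using the nearest-rank method.
    durations = sorted(r["duration_ms"] for r in records)
    p95 = durations[math.ceil(0.95 * total) - 1]

    return {
        "rps": total / window_seconds,  # traffic
        "error_rate": errors / total,   # errors
        "p95_ms": p95,                  # latency
        "status_share": status_share,
    }
```

Feeding this function the ingress gateway logs for a fixed window gives a quick view of which status codes dominate and whether latency or the error rate is drifting.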

4. Unified data plane scalability

Although the sidecar proxy already encapsulates the functions commonly needed for service governance, it also has to be extensible, for example to connect with existing back-end systems or to meet user-specific requirements. The extensibility of the sidecar proxy is therefore particularly important, and to some extent it affects how widely sidecar proxies are adopted.

In Istio's earlier architecture, extending sidecar capabilities centered on the Mixer component. For every service-to-service connection, the sidecar proxy had to contact Mixer for metrics reporting and authorization checks, which lengthened call latency between services and scaled poorly. At the same time, extending Envoy itself required writing the extension in C++, the proxy's own programming language, and compiling it into the proxy binary. For most Istio users, this way of extending the proxy was challenging.

In the new architecture, Istio moved the proxy's extension capability from Mixer into Envoy itself in the data plane, and aligned its extension model with Envoy's by using WebAssembly technology. WebAssembly supports development in several languages, and the extension is then compiled into a portable bytecode format. This approach not only simplifies adding custom functions to Istio, but also reduces latency by moving the decision-making into the proxy rather than delegating it to Mixer. The Wasm filter extension brings the following benefits:

  • Agility: a filter can be loaded dynamically into a running Envoy process without stopping or recompiling it;
  • Maintainability: Envoy's functions can be extended without changing its own code base;
  • Diversity: popular programming languages (such as C/C++ and Rust) can be compiled to Wasm, so developers can implement filters in the language of their choice;
  • Reliability and isolation: a filter is deployed in a VM sandbox and is therefore isolated from the Envoy process itself. Even if a Wasm filter crashes, the Envoy process is not affected;
  • Security: filters communicate with the Envoy proxy through predefined APIs, so they can access and modify only a limited set of connection or request properties.
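The properties in this list can be illustrated with a plain-Python analogy. It is not the proxy-wasm ABI and it involves no real Wasm runtime or Envoy API. The classes `RequestContext` and `FilterHost` are hypothetical and only model three ideas: filters are loaded and replaced at run time, a failing filter does not take the host down, and a filter receives only a narrow request API. Python itself does not enforce a sandbox the way a Wasm VM does.

```python
class RequestContext:
    """The narrow API handed to a filter: request headers only."""

    def __init__(self, headers):
        self._headers = {k.lower(): v for k, v in headers.items()}

    def get_header(self, name):
        return self._headers.get(name.lower())

    def set_header(self, name, value):
        self._headers[name.lower()] = value

    def headers(self):
        return dict(self._headers)


class FilterHost:
    """Stands in for the proxy: runs whatever filters are currently loaded."""

    def __init__(self):
        self._filters = {}

    def load(self, name, fn):
        # Loading under an existing name replaces the filter (hot update).
        self._filters[name] = fn

    def unload(self, name):
        self._filters.pop(name, None)

    def handle(self, headers):
        ctx = RequestContext(headers)
        failed = []
        for name, fn in list(self._filters.items()):
            try:
                fn(ctx)
            except Exception:
                # A faulty filter is recorded and skipped; the host keeps serving.
                failed.append(name)
        return ctx.headers(), failed
```

For example, loading a filter that calls `ctx.set_header("x-mesh", "asm")` next to one that raises an exception still produces a response with the `x-mesh` header, and the failing filter is only reported by name.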

Alibaba Cloud Service Mesh (ASM) supports WebAssembly (Wasm) technology. Service mesh users can deploy extended Wasm filters to the corresponding Envoy proxies in the data plane clusters through ASM. The self-developed ASMFilterDeployment component supports dynamic plug-in loading and hot updates, and is easy to use.

This filter extension mechanism makes it easy to extend Envoy's functions and takes its use in the service mesh to a new level.

Maturity model of service mesh practice

As a unified infrastructure for application service communication, the service mesh can (and should) be adopted gradually. Its practice maturity model has five levels: one-click enablement, observability improvement, security hardening, support for multiple infrastructures, and multi-cluster hybrid management. These five levels cover the unified traffic management, unified observability, unified service security, and hybrid management of different computing infrastructures, multiple clusters, and non-containerized applications described above.

About the author

Wang Xining is a senior technical expert at Alibaba Cloud and the technical director of the Alibaba Cloud service mesh product ASM and of Istio on ACK, focusing on Kubernetes, cloud native, service mesh, and related fields. He previously worked at the IBM China Development Center, where he served as chairman of the Patent Technology Review Committee. As an architect and major developer, he took part in a series of projects in SOA middleware, cloud computing, and other fields, and holds more than 50 international technology patents in related areas. He is the author of Istio Service Mesh Technology Analysis and Practice.
