Microservices have become the dominant way of building modern cloud applications: an application is decomposed into independent services organized around specific business functions. But decomposition brings problems of its own. As more and more systems are broken into many small, cell-like microservices, managing those microservices has become a headache for many engineers.
Many mature enterprises have a complex R&D environment: hundreds of product lines, thousands of developers, and thousands of services.
Services are deployed in multiple data centers across multiple regions, and run in many different environments. Several development languages are in use, mainly Go, C++, Java, and Node.js, with a small amount of Python and PHP; different business lines use different technical frameworks, and call protocols include REST, non-REST HTTP, and custom TCP protocols.
How can all of this be managed in a unified way? This is where service governance comes in: it solves the operations problems that arise in developing and running distributed services and microservices, manages the relationships between services, and provides the underlying data and tools.
On April 21, in the fifth lecture of the “Six Weeks of Cloud Native” series, Zhang Junfeng, architect of the Cloud Product R&D Department of the JD Cloud & AI business unit, gave a detailed explanation of service governance, the features of the Spring Cloud microservice architecture, service mesh, and JD Zhilian Cloud’s exploration of microservices.
The following is a summary of the talk.
Six Weeks of Cloud Native
Evolution of the Service Governance System under Microservice Architecture
— Zhang Junfeng, architect, JD Cloud & AI Product R&D Department
1. Evolution of service governance
Service governance is a concept that emerged from the continuous expansion of business scale and the evolution of architecture design. Let’s trace how service governance came about by following the evolution of architecture.
1. Monolithic architecture
In the “prehistoric” era of service governance, interfaces, business logic, and data processing were all simply and crudely put into one package. As the business grew, this put great pressure on development and maintenance.
2. Layered architecture
As the business developed rapidly, developers began splitting the system to cope with concurrency. The system was divided into front end and back end, and the layered architecture appeared. Compared with the single-layer architecture, it reduces coupling and enables collaboration; its disadvantage is duplicated development, and if design capability is insufficient, one badly designed interface can still affect the whole system.
3. Distributed architecture
On top of the vertical product split, a horizontal split is carried out and common foundations are extracted, forming a distributed architecture. Its advantages are higher code reuse and development efficiency; its disadvantages are complex network calls, statically configured addresses, and difficult scaling.
It was at this point that the concept of “service governance” appeared, although at the time it amounted to little more than DNS-based service discovery and load balancing.
4. SOA architecture
At this stage, AWS developed a new architecture, SOA (Service-Oriented Architecture). SOA is a coarse-grained, loosely coupled service architecture in which services communicate through simple, precisely defined interfaces.
It adopts centralized service governance: service registration, load balancing, and other governance functions are implemented centrally through an ESB (Enterprise Service Bus). The advantages of SOA are that applications become easier to maintain, coupling is lower, and scalability is higher; the disadvantage is heavy dependence on the ESB and very high maintenance cost.
Based on this, the large domestic Internet companies adopted a “decentralized” optimization strategy, in which the ESB no longer mediates service governance calls. The advantages of decentralization are automatic service registration and discovery, automatic push of service lists, dynamic monitoring of service status, and manual control of service status.
5. Microservice architecture
Because SOA is oriented toward structured programming, it lacks functions such as circuit breaking and gray release. The microservice architecture that emerged next provides rich service governance functions such as configuration management, service rate limiting, and distributed tracing. Its disadvantage is that any one framework supports only a limited set of programming languages, and because governance is integrated as an SDK, it is difficult to upgrade.
From the architecture evolution above, we can summarize the development stages of service governance:
1. At first, pure load balancing, such as Nginx or VIP-based load balancing: the functionality is limited, and static configuration is hard to scale.
2. Then governance logic moved into code, coupled with the business code as a whole; as the number of services grew, maintenance became difficult.
3. Next, governance logic was extracted into a library and provided as an SDK. However, an SDK does not support multiple languages, and when the SDK has a problem it is difficult to upgrade.
So what will the next-generation service governance architecture look like? Let’s take Spring Cloud, currently the most widely used microservice framework, as an example to understand how service governance changes from the traditional architecture to the cloud native architecture.
2. The “cloud” in service governance: from traditional frameworks to cloud native
Spring Cloud builds on the development convenience of Spring Boot to simplify building distributed system infrastructure, providing service discovery and registration, a configuration center, a message bus, load balancing, circuit breaking, data monitoring, and other components.
Note: the overall core framework of Spring Cloud
The advantages of Spring Cloud include: a microservice gateway for external user access; service discovery and a configuration center for service governance, together with monitoring and fault-tolerance systems; and a message bus, big data support, and other functions at the middleware and data layers.
Spring Cloud’s service governance deployment on physical or virtual machines works as follows. When a request comes in, it first passes through the gateway layer, and the microservice gateway obtains the address of the target microservice instance from the registry. When a microservice instance starts, it fetches its configuration from the configuration center and then writes its own service address and other parameters to the registry. Each microservice likewise obtains the addresses of the other microservices it depends on from the registry, and shared configuration lives in the configuration center.
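The registration and discovery flow above can be sketched in a few lines of Python. The service names, ports, and addresses here are invented for illustration; a real Spring Cloud deployment would use components such as Eureka and Spring Cloud Config rather than these toy classes.

```python
class ConfigCenter:
    """Stand-in for the configuration center: instances pull config at startup."""
    def __init__(self, configs):
        self.configs = configs

    def fetch(self, service):
        return self.configs.get(service, {})


class Registry:
    """Stand-in for the registry: instances write their address, callers look it up."""
    def __init__(self):
        self.instances = {}          # service name -> list of addresses

    def register(self, service, address):
        self.instances.setdefault(service, []).append(address)

    def lookup(self, service):
        return self.instances.get(service, [])


class Gateway:
    """Stand-in for the microservice gateway: routes a request to an instance."""
    def __init__(self, registry):
        self.registry = registry

    def route(self, service):
        addrs = self.registry.lookup(service)
        if not addrs:
            raise LookupError(f"no instance of {service}")
        return addrs[0]              # a real gateway would load-balance here


config = ConfigCenter({"order-service": {"port": 8080}})
registry = Registry()
# On startup, an instance pulls its config, then registers its own address.
port = config.fetch("order-service")["port"]
registry.register("order-service", f"10.0.0.5:{port}")
gateway = Gateway(registry)
print(gateway.route("order-service"))  # 10.0.0.5:8080
```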
This traditional architecture has some shortcomings:
1. Spring Cloud does not support gray release;
2. Service gateway: it does not support dynamic routing, business traffic can easily break through to the back-end system, and it requires secondary development;
3. Service tracing: it relies on third parties for distributed tracing and lacks APM support;
4. The UIs are scattered and crude;
5. It only supports Java, not heterogeneous systems;
6. Code intrusion is serious, and upgrading from Spring Cloud v1 to v2 is difficult.
Despite the above problems, Spring Cloud provides a dedicated Spring Cloud Kubernetes project for integration with Kubernetes, offering a flexible way to integrate seamlessly at the code level, which adds to its competitiveness. When Spring Cloud is deployed into a Kubernetes environment, the original dependencies need to be replaced by the Spring Cloud Kubernetes equivalents:
1. The microservice gateway is replaced by the Ingress provided by Kubernetes;
2. Service data is stored in etcd in the Kubernetes cluster, and registration and discovery are provided through the Kubernetes API server;
3. The configuration center is replaced by Kubernetes ConfigMaps.
However, service governance with Spring Cloud on Kubernetes still has some deficiencies: first, it does not support heterogeneous, multi-language systems; second, the framework is still hard to upgrade; third, the UI is still the original Spring Cloud one.
To solve these problems of microservice governance, a new service governance architecture has been proposed: the governance code is moved out of the application process and runs alongside it as an independent proxy. This new architecture is called service mesh. It separates business code from governance logic as a whole, which makes upgrades simple.
3. Service mesh: layering business and governance code
A service mesh is a dedicated infrastructure layer that aims to “make service-to-service calls in a microservice architecture reliable, fast, and secure”. It is not a mesh of “services” but a mesh of “proxies” that services plug into, abstracting the network away. The essence of a service mesh is to provide traffic management, security, and observability between applications.
A service mesh has four characteristics: it is an intermediate layer for communication between applications; it is a lightweight network proxy; it is transparent to the application; and it decouples the application from service governance.
In this way, the service mesh separates business modules from service governance.
From the figure above, we can see that the control plane and the data plane are separated. When an application is deployed, a sidecar is attached to each instance; it intercepts the application’s external requests, and the service governance policies of the control plane are enforced in the sidecar. In this way, business modules and service governance can each be upgraded without affecting the other, and governance rules and policies can be adjusted dynamically.
From the structure and characteristics of the service mesh, we can summarize its approach to service governance:
1. Decoupling microservice governance from business logic: most of the SDK’s capabilities are moved out of the application into an independent process, deployed in sidecar mode.
2. Unified governance of heterogeneous systems: multi-language support becomes easy, and the difficulty of upgrades is removed.
(1) Observability: the service mesh captures wire-level data such as source, destination, protocol, URL, status code, latency, and duration;
(2) Traffic control: it provides intelligent routing, timeouts and retries, circuit breaking, fault injection, traffic mirroring, and other control capabilities for services;
(3) Security: it authenticates services, encrypts inter-service communication, and enforces security-related policies;
(4) Robustness: it supports fault injection, which greatly helps robustness testing such as disaster recovery and failure drills.
In addition, a service mesh is divided into two parts: the data plane, made up of the proxies that carry service traffic, and the control plane, which manages the proxies and distributes governance policies to them.
Istio can be used in combination with Kubernetes: Kubernetes manages the service lifecycle, and Istio implements the overall service governance functions on top of Kubernetes.
Istio’s service discovery and load balancing
Note: Istio’s service discovery and load balancing
Pilot obtains service discovery data from the Kubernetes platform and exposes a unified service discovery interface; Envoy pulls the service data from Pilot to implement service discovery and dynamically update its load-balancing pool, then selects an instance according to the load-balancing algorithm and forwards the request.
Istio provides three load-balancing algorithms: round robin, random, and least connections.
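A minimal sketch of the three algorithms, assuming a fixed instance list and a made-up count of active connections per instance; Envoy’s real implementations are far more elaborate.

```python
import itertools
import random


class LoadBalancer:
    """Toy illustration of round robin, random, and least connections."""

    def __init__(self, instances):
        self.instances = list(instances)
        self._rr = itertools.cycle(self.instances)       # round-robin cursor
        self.active = {i: 0 for i in self.instances}     # open connections per instance

    def round_robin(self):
        """Cycle through instances in order."""
        return next(self._rr)

    def random_pick(self):
        """Pick any instance uniformly at random."""
        return random.choice(self.instances)

    def least_connections(self):
        """Pick the instance with the fewest active connections."""
        return min(self.instances, key=lambda i: self.active[i])


lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(lb.round_robin())        # 10.0.0.1
lb.active["10.0.0.1"] = 5
lb.active["10.0.0.2"] = 1
print(lb.least_connections())  # 10.0.0.3 (zero active connections)
```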
Istio’s circuit breaking
Circuit breaking is provided in two forms:
The first is connection pool management: requests are allowed as long as the configured maximum number of connections is not exceeded; once the threshold is exceeded, new requests are rejected, which protects the overall service.
The second is outlier detection: when the number of failed calls to an instance exceeds a threshold, that back-end instance is ejected, so load balancing no longer routes calls to it. For example, an instance that continuously returns HTTP 5xx errors, or whose TCP connections repeatedly time out, is kicked out of the service pool. There is also a recovery check after ejection: if the instance can be connected to and called successfully again, it is added back to the available list; if it keeps failing, it is ejected again, and the whole process repeats.
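The outlier-ejection behavior described above can be sketched as follows. The threshold, method names, and instance names are invented for illustration and do not mirror Istio’s actual configuration fields.

```python
class OutlierDetector:
    """Eject an instance after `max_errors` consecutive 5xx responses;
    a passing recovery probe returns it to the pool."""

    def __init__(self, instances, max_errors=3):
        self.healthy = set(instances)
        self.ejected = set()
        self.errors = {i: 0 for i in instances}
        self.max_errors = max_errors

    def record(self, instance, status_code):
        """Record the result of one call to an instance."""
        if 500 <= status_code < 600:
            self.errors[instance] += 1
            if self.errors[instance] >= self.max_errors and instance in self.healthy:
                self.healthy.discard(instance)
                self.ejected.add(instance)   # removed from the load-balancing pool
        else:
            self.errors[instance] = 0        # a success resets the error streak

    def probe(self, instance, ok):
        """Recovery check: a passing probe re-admits an ejected instance."""
        if ok and instance in self.ejected:
            self.ejected.discard(instance)
            self.healthy.add(instance)
            self.errors[instance] = 0


d = OutlierDetector(["a", "b"])
for _ in range(3):
    d.record("a", 503)          # three consecutive 5xx errors
print(sorted(d.healthy))        # ['b']
d.probe("a", ok=True)           # recovery check passes
print(sorted(d.healthy))        # ['a', 'b']
```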
Note: Istio circuit breaking vs. Hystrix
Istio’s gray release
Istio provides two forms of gray release: one based on traffic ratio, the other based on request content.
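Both forms can be sketched in a few lines. The version weights and the `x-user-group` header are made-up examples; in Istio these rules live in a VirtualService resource, not in application code.

```python
import random


def route_by_weight(versions, r=None):
    """Traffic-ratio gray release: pick a version with probability
    proportional to its weight."""
    r = random.random() if r is None else r
    total = sum(versions.values())
    acc = 0.0
    for version, weight in versions.items():
        acc += weight / total
        if r < acc:
            return version
    return version          # fallback for floating-point edge cases


def route_by_content(headers):
    """Content-based gray release: requests tagged as canary users go to v2.
    The header name `x-user-group` is a made-up example."""
    return "v2" if headers.get("x-user-group") == "canary" else "v1"


print(route_by_weight({"v1": 90, "v2": 10}, r=0.95))  # v2
print(route_by_content({"x-user-group": "canary"}))   # v2
```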
Istio’s fault injection
Fault injection is configured by applying a YAML resource and supports faults such as HTTP error codes and timeouts.
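A rough model of what the injected faults do, written as a request-handler wrapper. In Istio the faults are declared in YAML and enforced by Envoy, so this code is purely illustrative; all names and defaults are invented.

```python
import random
import time


def inject_fault(handler, abort_percent=0, abort_status=500,
                 delay_percent=0, delay_seconds=0):
    """Wrap a handler with simulated faults: abort a share of requests
    with an HTTP status code, and delay another share."""
    def wrapped(request):
        roll = random.random() * 100          # uniform in [0, 100)
        if roll < abort_percent:
            return abort_status               # simulated HTTP error-code fault
        if roll < abort_percent + delay_percent:
            time.sleep(delay_seconds)         # simulated timeout/delay fault
        return handler(request)
    return wrapped


ok = lambda request: 200
always_abort = inject_fault(ok, abort_percent=100, abort_status=503)
print(always_abort("GET /"))  # 503
```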
Security features of Istio
The security functionality involves four components: the first is Citadel, which manages keys and certificates; the second is the proxy, which implements secure communication between client and server; the third is Pilot, which distributes authorization policies and secure naming information to the proxies; the fourth is Mixer, which handles authorization and auditing.
Although Istio’s overall design is advanced, large-scale adoption still faces challenges:
The first is the production challenge: management scale and performance. In terms of scale, Istio has not yet been proven in very large deployments; in terms of performance, inserting an Envoy layer adds both network-path latency and resource consumption.
The second is stability and reliability, which still need to improve; many bugs are encountered in practice.
The third is the challenge of migrating an existing microservice system: how to move it onto Istio with as few code changes as possible.
The fourth is interworking between services inside and outside the mesh. It is impossible to move all services into the service mesh (or Istio) at once, which would pose great risk to the overall application, so normal access between applications inside and outside the mesh must be guaranteed. In addition, Istio is strongly tied to Kubernetes, and its support for other platforms, such as virtual machines, bare metal, or other container platforms, still needs improvement.
4. How JD Zhilian Cloud tackles service mesh
As mentioned above, the internal development environment of JD Zhilian Cloud is complex, so it has its own expectations for service governance:
- The service governance framework can meet the needs of all the teams that govern services
- Minimize changes at the product-line code level
- Minimize changes to how product lines make calls
- Minimize changes to the product lines’ DevOps processes
- The framework must accommodate JD Zhilian Cloud’s business growing more than 10x a year
- Keep the investment in, and risk of, the service framework under control
JD Zhilian Cloud’s deployment environment includes containers, virtual machines, Cloud Wing, and other services; it spans multiple regions and availability zones; it supports multiple networks, including the classic network plus multiple VPC networks; and the service scale is very large.
JD Zhilian Cloud first deploys services through Cloud Wing. After deployment, the service’s registration data is recorded in the service tree, and the Istio control plane is also deployed through Cloud Wing, so the original DevOps system continues to be used as a whole. In addition, a Cloud Wing based agent keeps Envoy alive, with one Envoy per virtual machine.
Service discovery process
1. First, after a service is deployed, its information is recorded in the service tree, and the instance information is updated to DNS.
2. When the service is called, the caller obtains the service’s address through DNS.
3. When the call is initiated, the request is hijacked to Envoy.
4. Envoy obtains the service list and governance policies.
5. The actual call address is selected according to the policy.
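The five steps can be modeled roughly as follows; every name and address here is invented for illustration.

```python
class Envoy:
    """Toy model of the sidecar: it holds the service list and a policy,
    intercepts a call, and rewrites it to a real instance address."""

    def __init__(self, service_list, policy):
        self.service_list = service_list   # service -> real instance addresses
        self.policy = policy               # how to pick among instances

    def intercept(self, service):
        instances = self.service_list[service]
        return self.policy(instances)


dns = {"user-service": "vip.user-service.internal"}     # step 2: DNS lookup
envoy = Envoy({"user-service": ["192.168.1.10", "192.168.1.11"]},
              policy=lambda xs: xs[0])                  # step 4: list + policy
virtual_addr = dns["user-service"]                      # the caller only sees DNS
real_addr = envoy.intercept("user-service")             # steps 3 and 5
print(real_addr)  # 192.168.1.10
```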
Service degradation here means degrading adaptively when Envoy is abnormal: the traffic interception rules are removed, and during a service call the caller falls back to the original instance information, resolving the service address through DNS and initiating the call directly, so the result is obtained without going through the mesh.
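A minimal sketch of that fallback decision, with invented function and parameter names.

```python
def call_service(service, envoy_alive, envoy_route, dns_route):
    """When the local Envoy is healthy, the call goes through the mesh;
    when it is down, fall back to the plain DNS-resolved address."""
    if envoy_alive:
        return envoy_route(service)     # normal path: mesh policies apply
    return dns_route(service)           # degraded path: direct DNS-based call


# The two routing functions stand in for the real mesh and DNS lookups.
result = call_service("user-service", envoy_alive=False,
                      envoy_route=lambda s: f"mesh:{s}",
                      dns_route=lambda s: f"dns:{s}")
print(result)  # dns:user-service
```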
Extended security features
JD Zhilian Cloud developed a token service and integrated it into Envoy; it also developed a black/white-list plugin so that service providers can define their own security policies in finer detail.
Extended call chain function
Call chain tracing is an essential capability for visualizing service relationships during service governance. The system integrates call chain collection and output into Envoy; Jaeger collects the data and stores it in Elasticsearch, and developers can see call relationships and time consumption through a graph analysis service.
Interworking between inside and outside the mesh
JD Zhilian Cloud developed an Istio gateway to support this interworking. When a request arrives, service discovery and calling go through Envoy: if the target is inside the mesh, the call is made purely within the mesh via Envoy; if the target service cannot be found in the mesh, the call goes to the Istio gateway, which forwards it to the external service. This is how services inside and outside the mesh interoperate.
Istio does not support cross-region deployment out of the box, but JD Cloud solves this problem: it builds core clusters, with the North China cluster deployed across three AZs; each data center independently deploys a set of istiod services, with multi-level caching to improve performance. Service discovery then assigns priorities: when one service calls another, it first calls within the same AZ; if no instance is available in that AZ, it calls within the current region, and only then across regions.
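The priority order (same AZ, then same region, then cross-region) can be sketched as a simple selection function; the instance records are invented for illustration.

```python
def pick_instance(instances, caller_az, caller_region):
    """Prefer the caller's AZ, then its region, then any region."""
    same_az = [i for i in instances if i["az"] == caller_az]
    if same_az:
        return same_az[0]
    same_region = [i for i in instances if i["region"] == caller_region]
    if same_region:
        return same_region[0]
    return instances[0]   # cross-region call as the last resort


instances = [
    {"addr": "10.1.0.1", "region": "north", "az": "az1"},
    {"addr": "10.2.0.1", "region": "south", "az": "az4"},
]
print(pick_instance(instances, "az2", "north")["addr"])  # 10.1.0.1 (same region)
print(pick_instance(instances, "az9", "east")["addr"])   # 10.1.0.1 (cross-region fallback)
```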
Finally, some good news: after internal verification of these functions, JD Zhilian Cloud is now bringing them to the public cloud. The basic mesh functionality is already available, and large-scale rollout is being explored.
The cloud native era has arrived, and you can become one of the builders of this new era!