Preface
To reduce coupling between business modules and improve the efficiency of development, delivery, and operations, we rebuilt the company's internal services as Spring Cloud microservices in 2017, and in 2019 migrated that Spring Cloud platform to UK8S.
This article describes our practice of moving Spring Cloud onto UK8S from the following angles: overall business architecture, JVM monitoring with Prometheus, peak-period elastic scaling based on HPA, APM link tracing based on Elastic, and Istio service governance.
Why K8S & Why UK8S
As the mainstream microservice framework, Spring Cloud defines a set of standards for service governance, such as intelligent routing, circuit breaking, and service registration and discovery, and provides the libraries and components that implement these standards, giving the microservice ecosystem strong support.
Before the transformation, our Spring Cloud business architecture was as follows: Eureka for service discovery, Hystrix as the circuit-breaker component, Zuul and Spring Cloud Gateway as service gateways (for historical reasons), Spring Cloud Config for distributed configuration (some teams used Apollo), and Feign for client-side load balancing.
However, Spring Cloud also has some unavoidable drawbacks: its components are built on different frameworks, which raises the barrier to entry and the learning cost, and many of them must be controlled at the code level, which runs counter to the polyglot goal of microservices.
Internally, for historical reasons, different teams used different API gateway architectures and ran multiple separate Spring Cloud deployments, which made unified management difficult. Spring Cloud also cannot do gray (canary) releases, which was inconvenient for our release process. More importantly, as a short-trip travel website we frequently run promotional campaigns; facing the need to elastically scale resources during business peaks, Spring Cloud alone cannot schedule resources to scale the business up and down automatically.
When we decided to move to UK8S, we also considered building a Kubernetes cluster ourselves with Kubespray and connecting it to cloud resources, such as load balancers, storage classes, and the Cluster Autoscaler (CA), through a cloud-provider plug-in. In that case, however, the provider would need to be deployed and installed separately on every new node, adding complexity to operations.
UK8S integrates seamlessly with UHost virtual machines, ULB load balancing, UDisk cloud disks, and other UCloud products, which we can easily create and use from within the cluster. For peak elasticity, the CA plug-in of UK8S scales node-level resources up and down automatically, greatly improving operational efficiency. Through its CNI plug-in, UK8S connects directly to UCloud's own VPC network, so no additional open-source overlay network is needed, reducing network complexity. Moreover, UK8S does not wrap or modify upstream Kubernetes, which leaves more room for customization and makes it possible to investigate and locate problems quickly when failures occur.
Overall business architecture
The migration from Spring Cloud to UK8S was also a process of reorganizing and unifying our internal service modules. In this process we made the following changes to the overall business architecture:
1. Removed Eureka and switched to the discovery module of the Spring Cloud Kubernetes project. Spring Cloud Kubernetes is an official Spring Cloud project that provides a common interface for calling Kubernetes services, so that Spring Cloud and Spring Boot programs run better in a Kubernetes environment. In Kubernetes, etcd already holds the information needed for service discovery, so Eureka is no longer necessary: through the discovery module, the list of services registered in etcd can be obtained directly.
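As a minimal sketch, enabling Kubernetes-backed discovery is mostly a matter of configuration (the service name below is a placeholder, and property names may vary across spring-cloud-kubernetes versions):

```yaml
# application.yml — discover peers from the Kubernetes API instead of Eureka
spring:
  application:
    name: order-service        # hypothetical service name
  cloud:
    kubernetes:
      discovery:
        enabled: true          # resolve services via the Kubernetes API server
        all-namespaces: false  # only look in the pod's own namespace
```

With this in place, the standard Spring `DiscoveryClient` returns Kubernetes Services, so code written against the Eureka-era abstraction keeps working.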
2. Removed Feign load balancing and replaced it with Spring Cloud Kubernetes Ribbon. This Ribbon integration supports two modes, SERVICE and POD. In SERVICE mode, load balancing is delegated to the Kubernetes Service itself, and Istio is used for service governance.
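The mode switch is again a configuration concern; a sketch (property names follow the spring-cloud-kubernetes-ribbon documentation and may differ by version):

```yaml
# application.yml — Ribbon integration of spring-cloud-kubernetes.
# In SERVICE mode requests target the Kubernetes Service (ClusterIP),
# so kube-proxy / the Istio sidecar do the actual balancing; in POD
# mode Ribbon balances across individual pod IPs on the client side.
spring:
  cloud:
    kubernetes:
      ribbon:
        enabled: true
        mode: SERVICE
```

SERVICE mode is what makes the Istio hand-off possible: because the client no longer picks pods itself, the sidecar can apply routing and load-balancing policy.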
3. Marginalized the gateway. The gateway was the original traffic entry point, and removing it entirely would have required large-scale changes to existing code. Instead, we deployed the original gateway as an ordinary microservice in Kubernetes and let Istio manage the traffic entrance. At the same time, we removed circuit breaking and intelligent routing and implemented service governance with Istio across the board.
4. Unified distributed configuration on Apollo. Apollo centrally manages application configuration across environments and clusters, pushes changes to the application side in real time, and provides standardized permission and process governance.
5. Added Prometheus monitoring. In particular, JVM parameters and some custom metrics are monitored, and HPA elastic scaling is driven by these monitoring metrics.
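A minimal sketch of pulling configuration from Apollo at startup, assuming the apollo-client Spring Boot integration (the app id and meta-server address below are placeholders):

```yaml
# application.yml — bootstrap configuration from Apollo
app:
  id: order-service                        # hypothetical Apollo app id
apollo:
  meta: http://apollo-meta.internal:8080   # placeholder config-service address
  bootstrap:
    enabled: true                          # inject Apollo config before the Spring context starts
    namespaces: application
```

After a change is published in the Apollo portal, the client receives the new values over a long-polling notification, so most properties take effect without a restart.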
After moving to Kubernetes, the business architecture separates the control plane from the data plane. The Kubernetes masters naturally serve as the control plane that manages the whole set of services, with no actual business services deployed on them. The data plane contains projects built on different languages and frameworks, such as Java, PHP, Swoole, and .NET Core. Because different languages place different demands on machine performance, we use node labels in Kubernetes to schedule each kind of project onto nodes with the appropriate configuration, so that applications do not interfere with one another.
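The label-based placement described above can be sketched as follows (node, deployment, and image names are illustrative):

```yaml
# First label the nodes by workload class, e.g.:
#   kubectl label node node-java-01 workload=java
# Then pin a Deployment's pods to that class via nodeSelector:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
    spec:
      nodeSelector:
        workload: java        # schedule only onto Java-labelled nodes
      containers:
      - name: order-service
        image: registry.example.com/order-service:latest
```

The same pattern with `workload: php` or `workload: dotnet` keeps each language's pods on machines sized for that runtime.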
JVM monitoring based on Prometheus
After migrating Spring Cloud to Kubernetes, we still need a series of low-level JVM metrics to monitor the running state of each service in real time. Prometheus is a mature monitoring system, and the Spring ecosystem provides a Prometheus integration (the Actuator/Micrometer Prometheus endpoint) through which those low-level JVM metrics can be exported for real-time monitoring.
We collect a detailed set of metrics, such as response time, request count, JVM memory, JVM misc, and garbage collection, which provide a reliable basis for troubleshooting and business optimization.
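A sketch of the wiring, assuming the `micrometer-registry-prometheus` dependency is on the classpath and Prometheus is configured for annotation-based pod discovery (port and path are placeholders):

```yaml
# application.yml — expose the Micrometer Prometheus endpoint
management:
  endpoints:
    web:
      exposure:
        include: prometheus,health
---
# pod template annotations so Prometheus scrapes the endpoint
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /actuator/prometheus
    prometheus.io/port: "8080"
```

The `/actuator/prometheus` endpoint then serves JVM memory, GC, and HTTP request metrics in the Prometheus text format.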
Peak elastic scaling based on HPA
As a booking platform for short-trip tourism, our business often involves scenic-spot and hotel ticketing scenarios that need peak elasticity. The HPA (Horizontal Pod Autoscaler) in Kubernetes provides a good way to implement such elastic scaling.
In Kubernetes, HPA is usually driven by pod CPU and memory utilization. In Java, however, memory is managed by the JVM: when consumption gets high, the JVM reclaims memory internally but does not return it to the host or container, so scaling on the pod CPU/memory metrics alone is unreasonable. Instead, we obtain the `http_server_requests_seconds_count` request-count metric from Prometheus, convert it into a metric the Kubernetes API server recognizes through the Prometheus adapter, and dynamically adjust the number of pods based on that indicator.
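A sketch of such an autoscaler, assuming the Prometheus adapter has been configured to expose the request rate as a per-pod custom metric (the metric, deployment, and threshold names/values below are illustrative):

```yaml
# HPA driven by a request-rate metric served by the Prometheus adapter;
# the adapter is assumed to map http_server_requests_seconds_count to a
# per-pod rate named http_server_requests_per_second.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_server_requests_per_second
      target:
        type: AverageValue
        averageValue: "500"   # scale out when the average per-pod rate exceeds ~500 req/s
```

Because the target is an `AverageValue` over pods, adding replicas directly lowers the observed metric, which is what lets the controller converge.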
UK8S also provides its own cluster-scaling plug-in. By configuring a scaling group and its matching scaling conditions, the plug-in creates new virtual machines as nodes in time, which lets us bring up resources quickly and efficiently during business peaks.
APM link tracing based on Elastic
Under a microservice architecture, a single request often involves multiple services, which makes performance monitoring and troubleshooting complex. Different services may be developed by different teams, or even in different programming languages, and may be deployed on thousands of servers across multiple data centers.
We therefore need tools that help us understand system behavior and analyze performance problems, so that when a failure occurs we can locate and resolve it quickly.
There are many open-source APM components on the market, such as Zipkin, Pinpoint, and SkyWalking. We ultimately chose the open-source APM solution based on the Elastic Stack, because although there are many open-source monitoring projects, they often do not interoperate well. Elastic collects business logs with Filebeat, monitors application and service performance with Metricbeat, implements tracing between services with APM Server, and stores all the data uniformly in Elasticsearch. It ties logging, metrics, and tracing together, breaking down the barriers between projects, and helps both operations and development locate failures faster and keep the system stable.
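For a Java service, the trace side can be wired up by attaching the Elastic APM Java agent; a container-spec sketch, assuming the agent jar is baked into the image (all URLs and names are placeholders):

```yaml
# Deployment container fragment attaching the Elastic APM Java agent
containers:
- name: order-service
  image: registry.example.com/order-service:latest
  env:
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/opt/elastic-apm-agent.jar"   # load the agent at JVM start
  - name: ELASTIC_APM_SERVER_URL
    value: "http://apm-server.elastic:8200"          # placeholder APM Server address
  - name: ELASTIC_APM_SERVICE_NAME
    value: "order-service"
  - name: ELASTIC_APM_ENVIRONMENT
    value: "production"
```

The agent instruments common frameworks (servlets, Spring MVC, JDBC, HTTP clients) automatically, so traces flow to APM Server without code changes.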
Istio service governance
Considering application security, observability, continuous deployment, elastic scaling and performance, integration with open-source tools, support for an open-source control plane, and solution maturity, we finally chose Istio for service governance. This mainly involves the following parts:
1. Istio ingress gateway: the ingress gateway is logically equivalent to a load balancer at the edge of the mesh. It receives and processes inbound and outbound connections at the mesh edge, including the configuration of exposed ports and TLS, and thereby governs the cluster's north-south traffic.
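A minimal sketch of such an entry point (the host name is a placeholder; TLS settings would be added to the `servers` entry in the same way):

```yaml
# Istio Gateway accepting HTTP at the mesh edge
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
spec:
  selector:
    istio: ingressgateway     # bind to the default istio-ingressgateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "api.example.com"       # placeholder external host
```

A VirtualService that lists `public-gateway` in its `gateways` field then routes this edge traffic to in-mesh services.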
2. Mesh gateway: Istio's internal virtual gateway, which represents all the sidecars in the mesh and handles communication between all in-mesh services, i.e. governance of east-west traffic.
3. Traffic management: after removing the original Spring Cloud components for circuit breaking and intelligent routing, we implemented HTTP traffic management through a series of Kubernetes and Istio configurations. This includes using pod labels to group specific service instances (such as v1/v2 versions of an application) and schedule traffic between them, defining per-service load-balancing policies with Istio DestinationRules, redirecting based on source service and URL, and rate limiting with memquota and redisquota.
4. Telemetry: through Prometheus, telemetry data is collected to monitor the success rate of canary projects, the split between east-west and north-south traffic, peak service traffic, and the dynamic service topology.
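The version-grouping and traffic-scheduling described above can be sketched as a DestinationRule plus a weighted VirtualService (service, subset, and weight values are illustrative):

```yaml
# Group pods into v1/v2 subsets by label and set the LB policy
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN     # per-service load-balancing policy
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# Send 90% of traffic to v1 and 10% to the canary v2
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
  - order-service
  http:
  - route:
    - destination:
        host: order-service
        subset: v1
      weight: 90
    - destination:
        host: order-service
        subset: v2
      weight: 10
```

Shifting the weights step by step is what makes the gray release mentioned below possible without redeploying the application.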
At present, we have migrated all of our "cloud like" social e-commerce applications to UK8S; the development languages include Java, PHP-FPM, Node.js, and others. Combined with CI/CD, we can iterate services and launch new projects quickly, greatly improving development and operations efficiency. With complete logging, monitoring, link tracing, and alerting, failures can be located quickly, peaks can be predicted in advance from telemetry data, services scale automatically through HPA, and resources are allocated scientifically, greatly reducing the cost of computing resources. Through Istio service governance, traffic management is well implemented, and on that basis gray releases are now easy to perform.
Next, we will enrich the CI/CD pipeline with unit testing, code scanning, performance testing, and more to improve test efficiency; introduce ChatOps to enrich our operations toolbox; and implement multi-cloud management with Istio to further ensure business stability.
About the author: Wang Qiong, operations architect and O&M manager at "To Set Out for the Peripheral Tour", is responsible for the company's cloud-native adoption and enterprise containerization. He started working with Kubernetes in 2016, has since focused on the Kubernetes and service-mesh fields, and is committed to building a production-grade container service platform.