Elastic scaling is an important application management strategy for meeting business requirements, ensuring service quality, and balancing service cost. It dynamically adjusts the deployment scale of an application to the real-time business volume: expanding during business peaks so that the service is not overwhelmed by traffic, and shrinking during troughs to avoid wasting resources.
Because most cloud resources are provisioned on demand and billed pay-as-you-go, the cost advantage that cloud users gain from elastic scaling is obvious compared with a self-hosted IDC, and elastic scaling is the choice of most cloud users. How to use elastic scaling well has always been a question of great concern, and this article attempts to offer some thinking and optimization practice around that topic.
There are two ways to achieve elastic scaling: vertical elasticity ("scale up") and horizontal elasticity ("scale out").
1. Vertical elastic scaling
Vertical elastic scaling generally refers to scaling achieved by raising or lowering server specifications. This approach places almost no constraints on the application itself and can be used by most applications or components. However, it has clear limitations:
- It is difficult for cloud vendors to adjust server specifications dynamically without affecting the applications deployed on top, and completely transparent reconfiguration is generally not achievable;
- Vertical elasticity cannot break through the specification limit of a single physical device. Facing sudden, massive business growth, its capacity has a hard upper limit.
2. Horizontal elastic scaling
Horizontal elastic scaling, by contrast, adjusts the number of servers, and it places few special demands on the infrastructure. Besides removing the capacity ceiling, multi-replica deployment also brings higher reliability. Because it is so widely used in production systems, horizontal elasticity has become almost synonymous with elastic scaling, and it is what the rest of this article focuses on.
Microservices and elastic scaling
Although horizontal elasticity has many advantages, it places higher demands on the application itself than vertical elasticity does. Developers should consider the following questions before adopting it:
- Multi-replica deployment requires the application itself to be stateless. How do we extract state from the application and keep configuration synchronized across replicas?
- Elastic scaling makes application instances ephemeral. How do we ensure reliable calls between application instances?
These are exactly the problems that the microservice architecture sets out to solve, and Spring Cloud, as a widely used microservice framework, is no exception:
- First, with Spring Cloud, developers can carve the stateless parts out of a monolithic application and organize the business logic as services; stateless services can scale horizontally. In addition, Spring Cloud provides easy-to-use centralized configuration management to ensure that configuration is distributed and synchronized efficiently;
- Second, Spring Cloud's service registration and discovery mechanism lets services add or remove instances dynamically, and governance mechanisms such as circuit breaking further improve the reliability of remote calls.
In other words, one of the driving forces behind the birth of microservices is that developers want to use the cloud's elasticity to balance operating cost against service quality. Microservices are designed to exploit elastic scaling; the two are complementary and closely related.
Cloud-native elastic scaling
Application architecture support is only one of the preconditions for elastic scaling. To use it well, two other key points must be considered: when to trigger elastic scaling, and where to place the application instances it creates, that is, rule triggering and instance scheduling.
In the cloud-native ecosystem, Kubernetes (k8s) takes charge of application lifecycle management. For rule triggering and instance scheduling, k8s also provides the relevant capabilities, which are enough to complete the whole elastic scaling process.
In k8s, stateless applications are usually deployed as a Deployment, and the scaling process is controlled by the Horizontal Pod Autoscaler (HPA). The developer sets the target CPU or memory utilization and the replica range of the Deployment; HPA periodically computes the desired replica count from monitoring data and applies it, and instance scheduling is handed over to the k8s scheduler.
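As a minimal sketch (the application name and thresholds below are illustrative, not taken from this article), an HPA rule for a Deployment looks like this:

```yaml
# Illustrative HPA manifest; "demo-app", the replica range, and the
# 70% target are placeholders chosen for the sketch.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70%
```

Under the hood, HPA uses the documented formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max range above.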
How to optimize elastic scaling?
It seems that, with the help of the k8s HPA mechanism, adding elasticity to a microservice application is easy. But is that really enough? It is not so simple: in elastic scaling, the default mechanisms of Spring Cloud and k8s still have many shortcomings, and without a complete and robust solution it is easy to stumble when using them directly in a production system. What can we do to make scaling accurate and the process as smooth as silk? That is why we wrote down these best practices.
As a one-stop distributed application management platform, EDAS has designed its elastic scaling support systematically, covering application monitoring, management, and control, and has polished many function points along the way. The goal of EDAS is to take the burden of elastic scaling off users, so that it can truly land in production systems.
Following the two key points above, rule triggering and instance scheduling, let us look at how EDAS approaches and implements the optimization of elastic scaling.
1. Rule triggering

1) Triggering on application metrics
The most commonly used elastic scaling rules are triggered by monitoring data, and k8s natively supports triggering on CPU and memory utilization. However, these two indicators are often not enough. Compared with basic infrastructure metrics, application-level metrics reflect the business volume more directly and sensitively; they can be called the "golden metrics" for elastic scaling.
However, because k8s itself cannot obtain application monitoring data, such metrics can only be exposed through the custom metrics extension API. Users have to understand the k8s extension mechanism, which carries a learning cost. Moreover, rules based on monitoring data cannot scale the instance count from 0 to 1, which works against extreme cost control.
To address these pain points, the open source community has developed many projects, the most typical being KEDA (Kubernetes Event-Driven Autoscaling), which assists k8s elastic scaling through event streams.
KEDA project address: https://keda.sh/
KEDA can be installed easily in any k8s cluster. Its controller (operator) provides the ability to scale applications from zero replicas, and its metrics service connects to various open source and vendor monitoring sources through different scalers, feeding those metrics to the HPA controller to drive multi-replica elastic scaling.
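For illustration, a KEDA ScaledObject that scales a Deployment on a Prometheus metric might look like the following; the Deployment name, Prometheus address, query, and threshold are all assumptions made for this sketch:

```yaml
# Illustrative KEDA ScaledObject; every concrete value here is a
# placeholder, not a recommendation from the article.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: demo-app-scaler
spec:
  scaleTargetRef:
    name: demo-app            # the Deployment to scale
  minReplicaCount: 0          # KEDA can scale all the way down to zero
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(http_requests_total{app="demo-app"}[1m]))
        threshold: "100"      # target metric value per replica
```

The 0-to-1 and 1-to-0 transitions are handled by the KEDA operator itself; from 1 to N, KEDA hands the metric to the standard HPA controller, exactly as described above.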
2) The application elasticity strategy of EDAS
KEDA is very powerful, but its threshold is relatively high for ordinary users. To solve this, EDAS combines the application monitoring capability provided by ARMS and, while keeping KEDA's core functions, enhances them to make elastic rule configuration easier to use.
The elastic scaling of EDAS supports configuring native k8s HPA rules.
You can also use application golden metrics, such as service requests per second (QPS) and average response time (RT).
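In native k8s terms, such a golden-metric rule corresponds to an HPA `metrics` entry of type `Pods` (or `External`), backed by a custom metrics adapter. A hedged sketch of that `metrics` section, with a placeholder metric name:

```yaml
# Fragment of an autoscaling/v2 HPA spec. Requires a custom metrics
# adapter (e.g. a Prometheus adapter) to serve the metric; the metric
# name and target value are placeholders.
metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # per-pod QPS, assumed name
      target:
        type: AverageValue
        averageValue: "100"              # target ~100 QPS per replica
```

EDAS hides this plumbing: the user picks QPS or RT in the console, and the platform wires the ARMS metric into the scaling rule.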
In addition to metric-based elasticity, EDAS also supports scheduled elasticity. By setting a reasonable replica-count range for each time period, scheduled elasticity can guarantee the user experience to the greatest extent.
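On Alibaba Cloud, scheduled elasticity is commonly expressed with the open source kubernetes-cronhpa-controller. The sketch below assumes that controller is installed; the job names, cron schedules, and sizes are illustrative:

```yaml
# Illustrative CronHPA resource (requires kubernetes-cronhpa-controller);
# all names, schedules and sizes are placeholders.
apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: demo-app-cronhpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  jobs:
    - name: scale-up-for-peak
      schedule: "0 0 8 * * *"    # 6-field cron (with seconds): 08:00 daily
      targetSize: 10
    - name: scale-down-at-night
      schedule: "0 0 22 * * *"   # 22:00 daily
      targetSize: 2
```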
2. Instance scheduling
When an elastic scaling rule fires, it generates instance scheduling requests. Assigning those requests to nodes and deploying the application instances on them is a necessary step and a core capability of k8s. In elastic scaling scenarios, scheduling happens very frequently, because many new instances are created and old ones replaced, so choosing an appropriate scheduling strategy comes first.
1) Scheduling strategy
K8s provides rich configuration options for scheduling policies, such as node selectors, node affinity, and taints and tolerations. There is, however, no universally applicable set of scheduling rules; they must be formulated according to the actual business. Here are some common considerations when setting scheduling rules:
- When elastic scaling is triggered, the system is often already busy. Spreading new instances across different nodes as much as possible effectively avoids the service degradation or resource waste caused by uneven pressure across the cluster;
- Spreading instances evenly across multiple availability zones effectively improves the availability of the system;
- For closely related application instances, deployment on the same node or in the same availability zone can be considered, which reduces call overhead and improves stability.
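The first two considerations can be expressed directly in the pod template. The sketch below (labels and values are illustrative) spreads replicas across availability zones and prefers placing them on distinct nodes:

```yaml
# Illustrative pod-template fragment for a Deployment; the "app" label
# and weights are placeholders.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone   # spread across AZs
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: demo-app
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname  # prefer different nodes
            labelSelector:
              matchLabels:
                app: demo-app
```

Using the "preferred" (soft) form rather than the "required" form keeps scheduling from failing outright when the cluster is tight, which matters during a scale-out burst.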
For the third point, adjusting scheduling rules alone is not enough; it also relies on microservice governance so that associated instances route their calls nearby. This is one of the problems EDAS is working on. EDAS already provides direct support for deploying an application's instances on the same node or in the same availability zone; users only need to check the corresponding option when deploying.
2) Node resource elasticity

Another common problem in elastic scaling is the exhaustion of cluster node resources. Because k8s does not expand nodes on its own, even if elastic scaling produces accurate scheduling requests, k8s cannot place the new application instances. Anticipating this, users have to reserve some node resources for elasticity; but because of this reserved pool, what users pay is no longer purely consumption-based, which runs against the original intent of elastic scaling. We cannot help asking: how can node resources be used to the fullest and yet never run out?
The community's cluster-autoscaler project provides automatic scaling of cluster nodes, which to some extent solves the problem of applying for resources on demand. Container service providers offer corresponding cluster-autoscaler implementations, and Alibaba Cloud is no exception: automatic cluster scaling can be configured directly in the ACK console.
However, cluster-autoscaler also has its shortcomings:
- First, cluster-autoscaler only intervenes after a scheduling request has failed to be satisfied, and purchasing a new node takes a long time (possibly minutes), far longer than pod startup. This lowers the sensitivity of elastic scaling and raises the risk of service degradation;
- Second, during scale-in, application instances are released at random, so some remaining instances end up scattered across nodes as fragments. Before removing a node, cluster-autoscaler tries to migrate these instances away, which costs time, can cause stability problems, and is not conducive to cost control.
The root of the problem lies in scheduling: k8s must match instances to nodes, and that process carries too much complexity for users to consider and handle. Is removing the nodes of a k8s cluster, and the scheduling process along with them, the road to ultimate elasticity? The answer is yes, and Alibaba Cloud's Serverless Kubernetes (ASK) service provides a ready-made path in this direction.
With ASK, users no longer need to care whether node resources are sufficient. Application instances are scheduled in seconds and billed per use, which matches the needs of elastic scaling perfectly.
EDAS also supports taking over ASK clusters. Users can create serverless applications directly in EDAS and obtain "complete" elastic scalability.
This article introduced the elastic scaling of microservice applications in the cloud-native ecosystem, explored optimization directions and practices around the two key points of elastic scaling, and showed how EDAS views and solves these problems.
Elastic scaling runs through the whole application lifecycle and involves the coordination of multiple application management capabilities. For example, how to take an instance offline losslessly during scale-in is a very important part of the process. There are many similar scenarios, but space does not allow this article to expand on them.
For EDAS, elastic scaling is a comprehensive test of capability; only when every link is in place can the user's business be well protected. We believe that by integrating deeply with cloud-native technology and cloud products, EDAS can bring users a better experience, helping them use elastic scaling well with little effort and enjoy the dividends of the cloud.
This article is original content from Alibaba Cloud and may not be reproduced without permission.