Performance comparison and pressure test before and after the Nacos 2.0 upgrade

Time: 2021-10-25

Introduction: Nacos 2.0 improves performance by about 10 times by upgrading the communication protocol, the framework and the data model, and it solves the performance problems that gradually surfaced after the release of Nacos 1.0. This article runs pressure tests on Nacos 1.0, on the upgrade from Nacos 1.0 to Nacos 2.0, and on Nacos 2.0, and compares the results to show the performance improvement brought by Nacos 2.0.

Author: Xi Weng

Pressure test preparation

Environment preparation

To simplify Nacos deployment and upgrades and to display the core performance metrics, we purchased a three-node Nacos cluster with 2-core CPU + 4 GB memory from Alibaba Cloud Microservices Engine (MSE, https://cn.aliyun.com/product/aliware/mse).

Pressure test model

To show system performance at different scales, we apply pressure gradually: the load is split into three batches that are started one after another, and we observe how the cluster behaves under each batch. In addition, a Dubbo demo service is deployed outside the pressure cluster, and JMeter calls it continuously at 100 TPS to simulate the impact on real business calls under different pressure levels.
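The gradual pressurization can be sketched as a running total over the batches. The batch sizes below (6000, 4000 and 4000 providers) are the ones reported in the test process of this article; the function and constant names are illustrative only and are not part of Nacos or MSE.

```python
from itertools import accumulate

# Providers added by each pressure batch, as reported in this article.
BATCH_SIZES = [6000, 4000, 4000]


def cumulative_load(batch_sizes):
    """Return the total number of providers registered after each batch starts."""
    return list(accumulate(batch_sizes))


if __name__ == "__main__":
    for batch_no, total in enumerate(cumulative_load(BATCH_SIZES), start=1):
        print(f"after batch {batch_no}: {total} providers")
```

Each total corresponds to one observation point in the test: the cluster is inspected once it has settled under that many providers.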

During the pressure test, the server and the clients are upgraded at appropriate points. The server is upgraded with the one-click upgrade function provided by MSE, and the clients are upgraded by restarting them in batches, one batch after another.
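As a rough illustration of restarting clients batch by batch, the sketch below splits a client list into groups and restarts one group at a time. How the clients were actually grouped is not stated in this article, so the round-robin split, the `restart` callback and the client names are all assumptions made for the example.

```python
def split_into_batches(clients, batch_count):
    """Split clients into batch_count groups using a round-robin assignment."""
    if batch_count <= 0:
        raise ValueError("batch_count must be positive")
    return [clients[i::batch_count] for i in range(batch_count)]


def rolling_restart(clients, batch_count, restart):
    """Restart clients one batch at a time and return the restart order.

    `restart` is a hypothetical callback that restarts a single client
    with the new client version.
    """
    order = []
    for batch in split_into_batches(clients, batch_count):
        for client in batch:
            restart(client)
            order.append(client)
    return order


if __name__ == "__main__":
    demo_clients = ["provider-a", "provider-b", "provider-c", "provider-d", "provider-e"]
    rolling_restart(demo_clients, 3, lambda c: print("restarting", c))
```

Restarting one batch at a time keeps the other batches registered, so consumers still find providers while part of the fleet is down.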

Pressure test process

Nacos1.X Server + Nacos1.X Client

First, we start the first batch of the pressure cluster to load MSE Nacos 1.2.1. With 6000 providers, CPU usage settles at about 25% once the cluster is stable, and the cluster holds all 6000 instances steadily.


Next, we start the second batch, which adds 4000 providers for a total of 10000. CPU usage peaks at 60% and settles at about 45%, and the cluster still runs stably.

Under the first two batches the cluster showed no stability problems, so Dubbo calls stayed normal and no errors occurred.

After the third batch starts, the pressure totals 14000 providers. The cluster briefly registers 13000 instances, then the instance count drops and the CPU is saturated. Narrowing the time range shows that the reduced instance count keeps fluctuating within a small range.


At the same time, Dubbo calls began to fail. The consumer log shows that Dubbo providers were removed because the server could not sustain this level of pressure, so calls failed with a no-provider error.


Nacos2.X Server + Nacos1.X Client

Because the server double-writes instances during the upgrade, the number of instances stored on the server is twice the actual number while the upgrade is in progress. Given the results above, before upgrading you must either roll the load back to the first batch of 6000 instances, or upgrade the node specification and scale out the cluster. This article takes the rollback approach: we stop the pressure cluster and then start it again, letting the Nacos cluster return to normal before upgrading.

The monitoring chart shows that after the last two batches of pressure were stopped, the cluster quickly returned to normal and ran stably, and Dubbo calls recovered as well. We then upgraded the server with the MSE upgrade function. During the upgrade, the overhead of double writing caused large CPU jitter. In addition, double writing doubles the number of stored instances, which is effectively the limit pressure of 12000 instances, so the server still showed some jitter and some Dubbo calls failed. An upgrade performed below the limit pressure would not have this effect.
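The capacity reasoning above is simple arithmetic and can be sketched as follows. The sketch assumes that double writing stores every instance exactly twice, as the text describes; the constant and function names are illustrative and are not part of Nacos or MSE.

```python
# Assumption: during the upgrade every instance is stored exactly twice.
DOUBLE_WRITE_FACTOR = 2


def stored_instances_during_upgrade(actual_instances: int) -> int:
    """Number of instances the server holds while double writing is active."""
    return actual_instances * DOUBLE_WRITE_FACTOR


if __name__ == "__main__":
    # The three load levels used in this article's test.
    for actual in (6000, 10000, 14000):
        stored = stored_instances_during_upgrade(actual)
        print(f"{actual} actual instances -> {stored} stored during upgrade")
```

With 6000 actual instances the server holds 12000 during the upgrade, which lies between the 10000 providers that the 1.X server handled stably and the 14000 that overloaded it. This is consistent with the jitter observed during the upgrade.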

Once the server upgrade completes, double writing stops and its overhead disappears: CPU usage falls and stabilizes, the instance count no longer jitters, and Dubbo calls recover completely. As with the 1.X server, we then start the pressure cluster in two more batches to compare the two versions under the same pressure.


Because the clients are still on 1.X, the load on the server remains very high. After all the pressure is started, CPU usage reaches almost 100%. Although there is no large-scale instance drop as on the 1.X server, a small amount of instance jitter still appears after running for a while. Upgrading only the Nacos server to 2.0 therefore brings some improvement but does not fully solve the performance problem.

Nacos2.X Server + Nacos2.X Client

To fully unlock the performance of Nacos 2.0, the clients in the cluster must also be upgraded to 2.0 or later. As before, they are replaced in three batches. Because providers restart during this period, it is normal for the instance count on the server to drop and then recover. As the clients are upgraded, CPU usage falls noticeably. Once the cluster is stable, CPU usage has dropped from nearly 100% to 20%, and the cluster runs 14000 instances stably.


Pressure test results

From the tests above, we obtain the performance differences of a three-node cluster with 2-core CPU + 4 GB memory under different versions:

Server version       | Client version | Pressure scale (instances) | Cluster stability   | CPU usage
Nacos1.X             | Nacos1.X       | 14000                      | Completely unstable | 100%
Nacos2.X (upgrading) | Nacos1.X       | 6000                       | Some jitter         | 100%
Nacos2.X             | Nacos1.X       | 14000                      | Some jitter         | 100%
Nacos2.X             | Nacos2.X       | 14000                      | Stable              | 20%
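For readers who want to work with these results programmatically, the rows can be captured as records and filtered. This is a minimal sketch: the values are copied from the results above, while the `Result` type and the `stable_at` helper are names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    server: str
    client: str
    instances: int
    stability: str
    cpu_percent: int


# Rows copied from the results reported in this article.
RESULTS = [
    Result("Nacos1.X", "Nacos1.X", 14000, "completely unstable", 100),
    Result("Nacos2.X (upgrading)", "Nacos1.X", 6000, "some jitter", 100),
    Result("Nacos2.X", "Nacos1.X", 14000, "some jitter", 100),
    Result("Nacos2.X", "Nacos2.X", 14000, "stable", 20),
]


def stable_at(results, instances):
    """Return the combinations that stayed stable at the given instance count."""
    return [r for r in results if r.instances == instances and r.stability == "stable"]


if __name__ == "__main__":
    for r in stable_at(RESULTS, 14000):
        print(f"{r.server} server + {r.client} client: CPU {r.cpu_percent}%")
```

Filtering at 14000 instances leaves only the combination in which both server and clients run 2.X.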

These results show that Nacos 2.0 does greatly improve performance. New users are advised to adopt Nacos 2.0 directly. Existing users are advised to upgrade the server first and then gain the remaining benefit by gradually upgrading the clients. Finally, the monitoring view of the whole pressure test shows how the different versions performed at each stage:


More information

Visit https://www.aliyun.com/product/aliware/mse to learn more about MSE Nacos 2.0.
