As a registry, Spring Cloud Alibaba Nacos provides not only service registration and service discovery but also a mechanism for monitoring service availability. With this mechanism, Nacos can track the health status of each service instance and hand only healthy instances to service callers, which keeps the business system running normally.
Two health check mechanisms
Two health check mechanisms are provided in Nacos:
Client active reporting mechanism.
Server-side reverse detection mechanism.
How to understand these two mechanisms?
Imagine that a geological disaster suddenly strikes your area and you are buried under the rubble. The search and rescue team must know that you are there before they can rescue you. How can they find out?
First, you can shout from under the rubble: "Help! Help! I am here!" That tells the search and rescue team your location and your condition.
Second, the search and rescue team can use their professional detection equipment to discover that you are buried under the rubble.
These two approaches correspond to the two Nacos health check mechanisms. With client active reporting, the client reports its health status to the Nacos server at regular intervals. With server-side reverse detection, the Nacos server probes the client to determine whether it is healthy.
How to set up the health check mechanism?
The health check mechanism in Nacos cannot be selected directly. Instead, it is determined by the type of the service instance.
In other words, the two types of service instance in Nacos correspond to the two health check mechanisms:
Temporary instance (also known as non-persistent instance): corresponds to the client active reporting mechanism.
Permanent instance (also known as persistent instance): corresponds to the server-side reverse detection mechanism.
Take Taobao as an example. During the Double 11 (Singles' Day) promotion, traffic is much higher than usual. The service must add more instances to handle the high concurrency, and these instances are no longer needed once Double 11 is over, so temporary instances are the better fit. For the long-running instances of a service, permanent instances are more appropriate.
Client active reporting mechanism
A temporary instance actively reports its health status every 5 seconds. The data packet it sends is called a heartbeat packet, and the mechanism of sending heartbeat packets is called the heartbeat mechanism.
If more than 15 seconds pass without a heartbeat packet, the Nacos server marks the service instance as unhealthy. If more than 30 seconds pass without one, the Nacos server deletes the service instance from the service list.
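The two thresholds above can be sketched as a small classification function. This is only an illustration of the rule as described in this article, not Nacos source code; the class name `HeartbeatStatus` and its members are hypothetical.

```java
import java.time.Duration;

/** Classifies a temporary instance by the time elapsed since its last heartbeat. */
final class HeartbeatStatus {
    enum State { HEALTHY, UNHEALTHY, REMOVED }

    // Thresholds taken from the text: over 15 s marks unhealthy, over 30 s removes.
    static final Duration UNHEALTHY_AFTER = Duration.ofSeconds(15);
    static final Duration REMOVE_AFTER = Duration.ofSeconds(30);

    static State classify(Duration sinceLastHeartbeat) {
        if (sinceLastHeartbeat.compareTo(REMOVE_AFTER) > 0) {
            return State.REMOVED;
        }
        if (sinceLastHeartbeat.compareTo(UNHEALTHY_AFTER) > 0) {
            return State.UNHEALTHY;
        }
        return State.HEALTHY;
    }

    public static void main(String[] args) {
        // Prints: 5s -> HEALTHY, 16s -> UNHEALTHY, 31s -> REMOVED (one per line)
        for (long seconds : new long[] {5, 16, 31}) {
            System.out.println(seconds + "s -> " + classify(Duration.ofSeconds(seconds)));
        }
    }
}
```

With a 5-second reporting interval, an instance has to miss roughly three consecutive heartbeats before it is marked unhealthy, which leaves some tolerance for short network hiccups.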
When running a Nacos project, you can see the heartbeat packets actively reported by the client in the log, as shown in the following figure:
The figure shows the Nacos client reporting its health status every 5 seconds. The request information is as follows:
Server reverse detection mechanism
A permanent instance relies on server-side reverse detection for its health check. The detection period is 2000 milliseconds plus a random number (within 5000 milliseconds). If a probe fails, the service instance is marked as unhealthy, but unlike a temporary instance it is not deleted.
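The detection period described above can be sketched as follows. The class and constant names are hypothetical, and the exact bounds of the random part are an assumption (here 0 to 4999 ms); the article only says "within 5000 milliseconds".

```java
import java.util.Random;

/** Computes the delay before the next server-side probe: a fixed base plus random jitter. */
final class ProbeSchedule {
    static final long BASE_MS = 2000;          // fixed part, from the text
    static final int JITTER_BOUND_MS = 5000;   // random part is drawn from [0, 5000)

    static long nextDelayMillis(Random random) {
        return BASE_MS + random.nextInt(JITTER_BOUND_MS);
    }

    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 0; i < 3; i++) {
            System.out.println("next probe in " + nextDelayMillis(random) + " ms");
        }
    }
}
```

A plausible reason for the random component is to spread probes over time so that checks for many instances do not all fire at the same moment.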
At present, Nacos server reverse detection has three built-in detection protocols: HTTP detection, TCP detection and MySQL detection.
Generally speaking, HTTP and TCP detection cover most health check scenarios. MySQL detection is mainly used in special business scenarios. For example, when a primary database and a standby database are exposed through a service name and the caller needs to know whether the database it reaches is the primary, the health check is a MySQL command that tests whether the database is the primary.
By default, the persistent instance uses TCP detection, which can be observed on the Nacos console, as shown in the following figure:
By default, the check targets the IP and port that the instance registered with, as shown in the following figure:
The general logic of TCP detection is to open a channel to the registered instance and repeatedly probe its port to judge whether the instance is healthy.
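The idea behind a TCP probe can be sketched with a plain blocking socket: if a connection to the registered IP and port succeeds within a timeout, the instance counts as healthy. This is a simplified illustration, not the Nacos implementation; the class name `TcpProbe` and the timeout value are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Minimal TCP probe: healthy means the port accepts a connection within the timeout. */
final class TcpProbe {
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unreachable host, or timeout all count as a failed probe.
            return false;
        }
    }

    public static void main(String[] args) {
        // Host and port are placeholders for an instance's registered address.
        System.out.println(isReachable("127.0.0.1", 8080, 500));
    }
}
```

A registry would call such a probe on each detection cycle and update the instance's health flag from the result.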
HTTP detection needs to be manually configured on the Nacos console, as shown in the following figure:
We add the implementation code of the probe interface to the service instance:
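A probe endpoint only has to answer with a 200 status code. The following is a minimal sketch that uses the JDK's built-in HTTP server so that it stays self-contained; in a Spring service the same thing would normally be an ordinary controller method. The path `/healthCheck`, the port 8080 and the class name `ProbeEndpoint` are hypothetical and must match whatever is configured on the Nacos console.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Exposes a tiny HTTP probe endpoint that always answers 200 with the body "ok". */
final class ProbeEndpoint {
    /** Starts the server on the given port; port 0 lets the OS pick a free one. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/healthCheck", exchange -> {
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Probe endpoint listening on port " + server.getAddress().getPort());
    }
}
```

A real probe would usually check something meaningful, such as a database connection, before answering 200.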
Now we restart the service instance. In the service details, we can see that the HTTP probe we configured has taken effect and that the instance is reported as healthy, as shown in the following figure:
The Nacos server determines whether the instance is in a healthy state by checking whether the HTTP interface returns a 200 status code.
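The 200-status rule can be sketched from the registry's point of view as follows. This is an illustration of the rule stated above, not Nacos source code; the class name `HttpProbe`, the timeout and the URL in `main` are hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Minimal HTTP probe: healthy means the URL answers with status 200 within the timeout. */
final class HttpProbe {
    static boolean isHealthy(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("GET");
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            return connection.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            // Malformed URL, refused connection, or timeout all count as a failed probe.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder URL for an instance's probe endpoint.
        System.out.println(isHealthy("http://127.0.0.1:8080/healthCheck", 1000));
    }
}
```

Any other status code, including 5xx errors, is treated the same as an unreachable instance in this sketch.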
Health check mechanism in a cluster
The health check mechanism in a cluster can be summed up in one phrase: each node performs its own duties. Each service is assigned to one responsible registry node. For a temporary instance, the responsible node receives the heartbeat packets and synchronizes the health status to the other registry nodes. Permanent instances work similarly: when the responsible node detects that the health status of a service instance has changed, it synchronizes the new status to the other nodes. In this way the health check works across the whole cluster.
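The article does not say how a service is mapped to its responsible node. One common way to get a stable mapping is to hash the service name over the list of nodes, which the sketch below assumes purely for illustration; the class name `ResponsibleNode`, the node addresses and the hashing scheme are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Picks the registry node responsible for a service by hashing the service name. */
final class ResponsibleNode {
    static String forService(String serviceName, List<String> nodes) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(serviceName.hashCode(), nodes.size());
        return nodes.get(index);
    }

    /** A node runs health checks for a service only if it is the responsible one. */
    static boolean isResponsible(String self, String serviceName, List<String> nodes) {
        return self.equals(forService(serviceName, nodes));
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("10.0.0.1:8848", "10.0.0.2:8848", "10.0.0.3:8848");
        System.out.println("order-service is handled by " + forService("order-service", nodes));
    }
}
```

Because every node computes the same answer from the same node list, exactly one node checks each service and the others only receive the synchronized result.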
Nacos provides two health check mechanisms: client active reporting for temporary instances and server-side reverse detection for permanent instances. A temporary instance sends a heartbeat packet to the Nacos server every 5 seconds. After receiving the heartbeat packet, the server synchronizes the health status to the other registry nodes. Permanent instances support three detection protocols: TCP, HTTP and MySQL. The default is TCP, which judges whether the instance is healthy by repeatedly probing the instance's port.