Practice and understanding of the k8s health check mechanism

Time: 2021-8-16

I. Background
The company recently adopted DevOps (integrated development and operations) and migrated its original Docker Swarm cluster to a k8s cluster. The health check mechanism of the original cluster differs from that of the new one, so this article summarizes the differences.
II. Principle and necessity
An application health check, as the name suggests, checks the current state of the application (including its database, cache and socket connections) so that the cluster scheduler can tell whether the application is alive and act on it (restart it, reschedule it to another node, and so on). This matters most for applications with high stability requirements (the business cannot be interrupted) and a flexible deployment mechanism (rolling releases that do not interrupt the application during deployment and therefore do not affect business use).
III. Practice
1. In the original Docker Swarm cluster, the health check was configured by specifying the health check address in the Dockerfile, as follows:
# Health check
HEALTHCHECK --interval=120s --timeout=5s CMD curl --fail http://localhost:8080/api/pub… || exit 1
2. Logic of the application's /api/public/health/check endpoint (example):
(1) Controller
[Figure: controller code]
If the check fails, the controller must set a status code other than 200 on the HttpServletResponse so that the cluster treats the HTTP call as failed.
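The controller code was published as an image, so the following is only a minimal sketch of the rule just described, not the author's implementation. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` instead of a servlet controller so that it runs without extra dependencies. The class name `HealthEndpoint`, the `check` hook and the choice of 503 as the failure status are assumptions made for illustration; the path is the one named in the text.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the article's controller.
class HealthEndpoint {

    /** Maps the check result to an HTTP status: 200 when healthy, 503 otherwise. */
    static int statusFor(boolean healthy) {
        return healthy ? 200 : 503;
    }

    /**
     * Starts a server that exposes the health check path.
     * 'check' is an assumed hook into the service layer.
     */
    static HttpServer start(int port, BooleanSupplier check) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/api/public/health/check", exchange -> {
            boolean healthy;
            try {
                healthy = check.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false; // a check that throws counts as a failure
            }
            byte[] body = (healthy ? "OK" : "UNHEALTHY").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusFor(healthy), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Calling `HealthEndpoint.start(8080, () -> true)` would serve the endpoint on port 8080. Any non-2xx status makes `curl --fail` exit with an error, and a k8s HTTP probe treats any status outside 200 to 399 as a failure.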
(2) Service processing
[Figure: service-layer code]
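The service-layer code was also published only as an image. As a rough sketch of what such processing might look like, the class below runs a set of named checks (for example the database, cache and socket connections mentioned earlier) and reports which ones failed. `HealthService`, `register` and `failedChecks` are hypothetical names, and the real checks would call the actual dependencies rather than the lambdas shown in the usage note.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical service that aggregates individual dependency checks.
class HealthService {
    // LinkedHashMap keeps the checks in registration order.
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    /** Registers a named check; returns this so calls can be chained. */
    HealthService register(String name, BooleanSupplier check) {
        checks.put(name, check);
        return this;
    }

    /** Runs every check and returns the names of those that failed. */
    List<String> failedChecks() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // an exception from a dependency counts as a failure
            }
            if (!ok) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    /** The application is healthy only when no check failed. */
    boolean isHealthy() {
        return failedChecks().isEmpty();
    }
}
```

A caller could register checks with `new HealthService().register("database", () -> true).register("cache", () -> false)` and have the controller return a non-200 status whenever `isHealthy()` is false.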
3. Health check configuration for the new k8s cluster
[Figure: k8s health check configuration]
(1) Delete the health check instruction from the original Dockerfile. It does not take effect in the new k8s cluster.
(2) Readiness check policy (readiness probe), as shown in the figure above
The cluster uses this policy to judge whether the application has finished starting and can accept business traffic normally. Configure it according to how long the application takes to start; the values are in seconds.

(3) Liveness check policy (liveness probe), as shown in the figure above
The cluster uses this policy to judge whether the application is still alive and can accept business normally. Configure it according to the application's actual behavior; the values are in seconds.
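The configuration figure is not reproduced here, so as an illustration only, the two policies might correspond to the following fragment of a pod template. The probe field names are standard Kubernetes ones; the container name, image and every number are assumptions, the port is the one used in the Dockerfile check above, and the path is the endpoint named in the text.

```yaml
# Fragment of a Deployment's pod template; names and numbers are illustrative assumptions.
containers:
  - name: demo-app              # hypothetical container name
    image: demo-app:latest      # hypothetical image
    ports:
      - containerPort: 8080
    readinessProbe:             # gates traffic until the application is ready
      httpGet:
        path: /api/public/health/check
        port: 8080
      initialDelaySeconds: 30   # assumed startup time, in seconds
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
    livenessProbe:              # restarts the container when it stops responding
      httpGet:
        path: /api/public/health/check
        port: 8080
      initialDelaySeconds: 60   # assumed; should exceed the startup time
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
```

A failing readiness probe only removes the pod from the service endpoints, while a failing liveness probe restarts the container, so the liveness delay is usually set more generously than the readiness delay.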