Yesterday, after a new cluster was set up, a problem appeared: one of the master nodes stopped working. The cluster could still be used, but it now had a single point of failure. Today, while repairing it, the etcd health check failed.
When joining the node back into the cluster, an error appears indicating that the etcd health check failed. To investigate, review the kubeadm configuration information in the Kubernetes cluster:
```
[root@master ~]# kubectl describe configmaps kubeadm-config -n kube-system
ClusterStatus:
----
apiEndpoints:
  master-01:
    advertiseAddress: 10.0.0.11
    bindPort: 6443
  master-02:
    advertiseAddress: 10.0.0.12
    bindPort: 6443
  master-03:
    advertiseAddress: 10.0.0.13
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus

Events:  <none>
```
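Notice that master-02 still appears under apiEndpoints even though the node is gone. Depending on the kubeadm version, this stale ClusterStatus entry may also need to be removed by hand; a minimal sketch (the edit command is standard, and the block to delete is taken from the output above):

```
## Open the ConfigMap in an editor and delete the stale block:
[root@master ~]# kubectl edit configmaps kubeadm-config -n kube-system
##   master-02:                 <- remove this entry
##     advertiseAddress: 10.0.0.12
##     bindPort: 6443
```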
When the cluster was built, etcd was deployed as a containerized member on every master. After the problem on master-02 and its removal from the cluster, the old member record was still stored in etcd on each surviving master, so when you try to add the node again, the health check fails.
At this point you need to go inside a container and delete this stale etcd member by hand. First get the list of etcd pods in the cluster, then exec into one of them to get an sh shell:
```
[root@master ~]# kubectl get pods -n kube-system | grep etcd
[root@master ~]# kubectl exec -it -n kube-system etcd-master-03 -- sh
```
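If the pod names follow the node names, the output will look roughly like this (ages and restart counts are illustrative); note that only the surviving masters still run etcd pods:

```
[root@master ~]# kubectl get pods -n kube-system | grep etcd
etcd-master-01        1/1     Running   0     1d
etcd-master-03        1/1     Running   0     1d
```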
After entering the container, do the following:
```
## Configure the environment
$ export ETCDCTL_API=3
$ alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'

## Check the etcd cluster member list
$ etcdctl member list

## Delete the etcd cluster member master-02 (pass its member ID)
$ etcdctl member remove <member-id>

## View the etcd cluster member list again
$ etcdctl member list

## Exit the container
$ exit
```
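For reference, `etcdctl member list` output has this general shape; the hex member IDs below are made up, and the exact columns vary with the etcd version. Copy the real ID shown for master-02 into the remove command:

```
$ etcdctl member list
1a2b3c4d5e6f7a8b, started, master-01, https://10.0.0.11:2380, https://10.0.0.11:2379
9f8e7d6c5b4a3f2e, started, master-02, https://10.0.0.12:2380, https://10.0.0.12:2379
5e6f7a8b9c0d1e2f, started, master-03, https://10.0.0.13:2380, https://10.0.0.13:2379

$ etcdctl member remove 9f8e7d6c5b4a3f2e
```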
View the list and remove the master that no longer exists.
Join the master again, and this time it succeeds.
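For completeness, a control-plane join command has this general shape, assuming you join via master-01's endpoint. The token, CA hash, and certificate key below are placeholders; generate real values on an existing master with `kubeadm token create --print-join-command` and `kubeadm init phase upload-certs --upload-certs`:

```
## Placeholders: <token>, <hash>, and <cert-key> must come from your cluster.
$ kubeadm join 10.0.0.11:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <cert-key>
```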