Using Leader Election in Kubernetes to Achieve High Availability of Components


In Kubernetes, kube-scheduler and kube-controller-manager are usually deployed with multiple replicas to ensure high availability, but only one instance is actually working at any given time. They use the leaderelection package to make sure one leader is in the working state and, when the leader goes down, to elect a new leader from the other instances so the component keeps working.

This package is not only used by these two Kubernetes components; it also shows up in other projects such as cluster-autoscaler. Let's take a look at how to use the package and how it is implemented internally.


The following is a simple example. After compiling it, start several processes at the same time; only one of them does any work. When the leader process is killed, a new leader is elected and takes over the work, i.e. executes the run method.

The example comes from the example package in client-go:

package main

import (
	"context"
	"flag"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/google/uuid"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clientset "k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog"
)

func buildConfig(kubeconfig string) (*rest.Config, error) {
	if kubeconfig != "" {
		cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
		if err != nil {
			return nil, err
		}
		return cfg, nil
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	return cfg, nil
}

func main() {
	klog.InitFlags(nil)

	var kubeconfig string
	var leaseLockName string
	var leaseLockNamespace string
	var id string

	flag.StringVar(&kubeconfig, "kubeconfig", "", "absolute path to the kubeconfig file")
	flag.StringVar(&id, "id", uuid.New().String(), "the holder identity name")
	flag.StringVar(&leaseLockName, "lease-lock-name", "", "the lease lock resource name")
	flag.StringVar(&leaseLockNamespace, "lease-lock-namespace", "", "the lease lock resource namespace")
	flag.Parse()

	if leaseLockName == "" {
		klog.Fatal("unable to get lease lock resource name (missing lease-lock-name flag).")
	}
	if leaseLockNamespace == "" {
		klog.Fatal("unable to get lease lock resource namespace (missing lease-lock-namespace flag).")
	}

	// leader election uses the Kubernetes API by writing to a
	// lock object, which can be a LeaseLock object (preferred),
	// a ConfigMap, or an Endpoints (deprecated) object.
	// Conflicting writes are detected and each client handles those actions
	// independently.
	config, err := buildConfig(kubeconfig)
	if err != nil {
		klog.Fatal(err)
	}
	client := clientset.NewForConfigOrDie(config)

	run := func(ctx context.Context) {
		// complete your controller loop here
		klog.Info("Controller loop...")

		select {}
	}

	// use a Go context so we can tell the leaderelection code when we
	// want to step down
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// listen for interrupts or the Linux SIGTERM signal and cancel
	// our context, which the leader election code will observe and
	// step down
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, os.Interrupt, syscall.SIGTERM)
	go func() {
		<-ch
		klog.Info("Received termination, signaling shutdown")
		cancel()
	}()

	// we use the Lease lock type since edits to Leases are less common
	// and fewer objects in the cluster watch "all Leases"
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Name:      leaseLockName,
			Namespace: leaseLockNamespace,
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: id,
		},
	}

	// start the leader election code loop
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   15 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				run(ctx)
			},
			OnStoppedLeading: func() {
				klog.Infof("leader lost: %s", id)
				os.Exit(0)
			},
			OnNewLeader: func(identity string) {
				if identity == id {
					// we just acquired the lock ourselves
					return
				}
				klog.Infof("new leader elected: %s", identity)
			},
		},
	})
}

Description of the key startup flags:

kubeconfig: path to the kubeconfig file
lease-lock-name: name of the lock object
lease-lock-namespace: namespace the lock object is stored in
id: an identifier provided by the example to distinguish instances
logtostderr: a klog flag that sends log output to the console
v: log verbosity level

Start two processes at the same time:
Start process 1:

go run main.go -kubeconfig=/Users/silenceper/.kube/config -logtostderr=true -lease-lock-name=example -lease-lock-namespace=default -id=1 -v=4
I0215 19:56:37.049658   48045 leaderelection.go:242] attempting to acquire leader lease  default/example...
I0215 19:56:37.080368   48045 leaderelection.go:252] successfully acquired lease default/example
I0215 19:56:37.080437   48045 main.go:87] Controller loop...

Start process 2:

➜  leaderelection git:(master) ✗ go run main.go -kubeconfig=/Users/silenceper/.kube/config -logtostderr=true -lease-lock-name=example -lease-lock-namespace=default -id=2 -v=4
I0215 19:57:35.870051   48791 leaderelection.go:242] attempting to acquire leader lease  default/example...
I0215 19:57:35.894735   48791 leaderelection.go:352] lock is held by 1 and has not yet expired
I0215 19:57:35.894769   48791 leaderelection.go:247] failed to acquire lease default/example
I0215 19:57:35.894790   48791 main.go:151] new leader elected: 1
I0215 19:57:44.532991   48791 leaderelection.go:352] lock is held by 1 and has not yet expired
I0215 19:57:44.533028   48791 leaderelection.go:247] failed to acquire lease default/example

Here we can see that the process with id=1 holds the lock and runs the program, while the process with id=2 reports that it cannot acquire the lock and keeps retrying.

Now kill the process with id=1. After waiting for the lock to be released (up to LeaseDuration), the process with id=2 becomes the leader and starts doing the work:

I0215 20:01:41.489300   48791 leaderelection.go:252] successfully acquired lease default/example
I0215 20:01:41.489577   48791 main.go:87] Controller loop...

In-depth understanding

The basic principle is to use a configmap, endpoints, or lease resource to implement a distributed lock. The node that grabs the lock becomes the leader and renews the lease periodically, while the other processes keep trying to take it; if they fail, they wait for the next retry period. When the leader node fails, the lease expires and another node becomes the new leader.


Everything starts from leaderelection.RunOrDie:

func RunOrDie(ctx context.Context, lec LeaderElectionConfig) {
	le, err := NewLeaderElector(lec)
	if err != nil {
		panic(err)
	}
	if lec.WatchDog != nil {
		lec.WatchDog.SetLeaderElection(le)
	}
	le.Run(ctx)
}

It takes a LeaderElectionConfig as its parameter:

type LeaderElectionConfig struct {
	// Lock is the type of resource lock to use
	Lock rl.Interface
	// LeaseDuration is how long a lock is held before it can be taken over
	LeaseDuration time.Duration
	// RenewDeadline is the deadline for the leader to renew the lease
	RenewDeadline time.Duration
	// RetryPeriod is the interval between attempts to acquire the lock
	RetryPeriod time.Duration
	// Callbacks holds three functions executed on state changes:
	// 1. OnStartedLeading: the business code run once leadership is acquired
	// 2. OnStoppedLeading: run when the leader stops leading
	// 3. OnNewLeader: run when a new leader is elected
	Callbacks LeaderCallbacks

	// WatchDog is the associated health checker;
	// WatchDog may be nil if it is not needed/configured
	WatchDog *HealthzAdaptor
	// ReleaseOnCancel controls whether the leader releases the lock when it steps down
	ReleaseOnCancel bool
	// Name is the name of the resource lock for debugging
	Name string
}

LeaderElectionConfig.Lock can be stored in one of three resource types. There is also a MultiLock, which combines two of them: when saving to the first fails, the second is used. This can be seen in interface.go:

switch lockType {
case EndpointsResourceLock: // stored in an Endpoints object
	return endpointsLock, nil
case ConfigMapsResourceLock: // stored in a ConfigMap
	return configmapLock, nil
case LeasesResourceLock: // stored in a Lease
	return leaseLock, nil
case EndpointsLeasesResourceLock: // Endpoints first, falling back to Lease
	return &MultiLock{
		Primary:   endpointsLock,
		Secondary: leaseLock,
	}, nil
case ConfigMapsLeasesResourceLock: // ConfigMap first, falling back to Lease
	return &MultiLock{
		Primary:   configmapLock,
		Secondary: leaseLock,
	}, nil
default:
	return nil, fmt.Errorf("Invalid lock-type %s", lockType)
}

Taking the lease resource as an example, its stored content looks like this:

➜  ~ kubectl get lease example -n default -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2020-02-15T11:56:37Z"
  name: example
  namespace: default
  resourceVersion: "210675"
  selfLink: /apis/
  uid: a3470a06-6fc3-42dc-8242-9d6cebdf5315
spec:
  acquireTime: ...                          # when the lock was acquired
  holderIdentity: "2"                       # identity of the process holding the lock
  leaseDurationSeconds: ...                 # lease duration of the lock
  leaseTransitions: 1                       # number of leader changes
  renewTime: "2020-02-15T12:05:37.134655Z"  # when the lease was last renewed

Note the fields in the spec, commented individually above. The corresponding Go structure is:

type LeaderElectionRecord struct {
	HolderIdentity       string      `json:"holderIdentity"`       // identity of the process holding the lock, often the hostname
	LeaseDurationSeconds int         `json:"leaseDurationSeconds"` // lease duration of the lock
	AcquireTime          metav1.Time `json:"acquireTime"`          // when the lock was acquired
	RenewTime            metav1.Time `json:"renewTime"`            // when the lease was last renewed
	LeaderTransitions    int         `json:"leaderTransitions"`    // number of leader changes
}

Acquiring and renewing the lock

The Run method is the entry point for acquiring and renewing the lock:

// Run starts the leader election loop
func (le *LeaderElector) Run(ctx context.Context) {
	defer func() {
		runtime.HandleCrash()
		// run the OnStoppedLeading callback when this method exits
		le.config.Callbacks.OnStoppedLeading()
	}()
	// keep trying to acquire the lock; on success continue below,
	// otherwise keep retrying until ctx is cancelled
	if !le.acquire(ctx) {
		return // ctx signalled done
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()
	// the lock was acquired: this process is now the leader,
	// so run the business code in the callback
	go le.config.Callbacks.OnStartedLeading(ctx)
	// keep renewing the lease so the lock stays held
	le.renew(ctx)
}

Both le.acquire and le.renew call le.tryAcquireOrRenew, but they handle the result in opposite ways: le.acquire exits on success and keeps retrying on failure, while le.renew keeps going on success and exits on failure.
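That symmetry can be sketched with a small, hypothetical helper (not part of client-go): both loops wrap the same try function, but acquire stops on the first success while renew stops on the first failure.

```go
package main

import "fmt"

// loopUntil repeatedly calls try and returns the number of attempts made
// when try() first returns stopOn; this mirrors how acquire (stop on success)
// and renew (stop on failure) wrap the same tryAcquireOrRenew call.
func loopUntil(try func() bool, stopOn bool, maxTries int) int {
	for i := 1; i <= maxTries; i++ {
		if try() == stopOn {
			return i
		}
	}
	return maxTries
}

func main() {
	attempts := 0
	// acquire-style: fails twice, then succeeds on the third try
	fmt.Println(loopUntil(func() bool { attempts++; return attempts >= 3 }, true, 10))

	renewals := 0
	// renew-style: succeeds four times, then fails on the fifth
	fmt.Println(loopUntil(func() bool { renewals++; return renewals < 5 }, false, 10))
}
```

In client-go these loops also apply RetryPeriod between attempts and the RenewDeadline timeout, which the sketch leaves out.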

Let's look at the tryAcquireOrRenew method:

func (le *LeaderElector) tryAcquireOrRenew() bool {
	now := metav1.Now()
	// the content of the lock resource object
	leaderElectionRecord := rl.LeaderElectionRecord{
		HolderIdentity:       le.config.Lock.Identity(), // unique identity
		LeaseDurationSeconds: int(le.config.LeaseDuration / time.Second),
		RenewTime:            now,
		AcquireTime:          now,
	}

	// 1. obtain or create the ElectionRecord
	// step 1: get the existing lock from the k8s resource
	oldLeaderElectionRecord, oldLeaderElectionRawRecord, err := le.config.Lock.Get()
	if err != nil {
		if !errors.IsNotFound(err) {
			klog.Errorf("error retrieving resource lock %v: %v", le.config.Lock.Describe(), err)
			return false
		}
		// the resource object does not exist, so create the lock resource
		if err = le.config.Lock.Create(leaderElectionRecord); err != nil {
			klog.Errorf("error initially creating leader election record: %v", err)
			return false
		}
		le.observedRecord = leaderElectionRecord
		le.observedTime = le.clock.Now()
		return true
	}

	// 2. Record obtained, check the Identity & Time
	// step 2: compare the lock stored in k8s with the record observed last time
	if !bytes.Equal(le.observedRawRecord, oldLeaderElectionRawRecord) {
		le.observedRecord = *oldLeaderElectionRecord
		le.observedRawRecord = oldLeaderElectionRawRecord
		le.observedTime = le.clock.Now()
	}
	// check whether the held lock has expired and whether we hold it ourselves
	if len(oldLeaderElectionRecord.HolderIdentity) > 0 &&
		le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &&
		!le.IsLeader() {
		klog.V(4).Infof("lock is held by %v and has not yet expired", oldLeaderElectionRecord.HolderIdentity)
		return false
	}

	// 3. We're going to try to update. The leaderElectionRecord is set to it's default
	// here. Let's correct it before updating.
	// step 3: we will be the leader, in one of two cases:
	// we were already the leader, or we are becoming the leader for the first time
	if le.IsLeader() {
		// if we are already the leader, do not update AcquireTime and LeaderTransitions
		leaderElectionRecord.AcquireTime = oldLeaderElectionRecord.AcquireTime
		leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions
	} else {
		// if we become the leader for the first time, increment the transition count
		leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions + 1
	}

	// update the lock itself;
	// if the resource changed between Get and Update, the update fails
	if err = le.config.Lock.Update(leaderElectionRecord); err != nil {
		klog.Errorf("Failed to update lock: %v", err)
		return false
	}

	le.observedRecord = leaderElectionRecord
	le.observedTime = le.clock.Now()
	return true
}

What if concurrent operations happen at this step?

The key is the atomicity of k8s API operations.

The object returned by le.config.Lock.Get() carries a resourceVersion field, which identifies the current revision of the resource object and is updated on every write. If an update request carries a resourceVersion, the apiserver checks whether it matches the object's current value and accepts the update only if they match. This guarantees that no other write happened within this read-modify-write cycle, making the update operation atomic.
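A minimal sketch of this optimistic-concurrency check (object and update are hypothetical stand-ins illustrating the apiserver's behavior, not its code):

```go
package main

import "fmt"

// object mimics a Kubernetes resource whose resourceVersion
// changes on every write.
type object struct {
	resourceVersion int
	holder          string
}

// update succeeds only if the caller's observed resourceVersion still
// matches, mimicking the apiserver's optimistic concurrency check.
func update(o *object, observedVersion int, newHolder string) error {
	if o.resourceVersion != observedVersion {
		return fmt.Errorf("conflict: object modified (version %d != %d)", o.resourceVersion, observedVersion)
	}
	o.holder = newHolder
	o.resourceVersion++
	return nil
}

func main() {
	o := &object{resourceVersion: 1}

	// two clients observe the same version...
	v1, v2 := o.resourceVersion, o.resourceVersion

	// ...the first write wins and bumps the version,
	if err := update(o, v1, "node-1"); err != nil {
		fmt.Println("unexpected:", err)
	}
	// ...so the second, stale write is rejected: only one leader wins
	if err := update(o, v2, "node-2"); err != nil {
		fmt.Println(err)
	}
	fmt.Println("holder:", o.holder)
}
```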


leaderelection implements a distributed lock on top of the atomicity of k8s API operations, and a leader is elected through continuous competition. The pattern of running business code only on the elected leader is very common in k8s, and we can easily use this package when writing our own components to make them highly available. For example, deploy the component as a multi-replica Deployment: when the leader's pod exits, it restarts, and another pod may acquire the lock and continue the work.

Complete code:
