Nacos as configuration center — election mechanism

Time:2020-12-17

From the first two articles, we can see that Nacos is powerful from a usage perspective. Compared with our existing configuration support, it is friendlier and less intrusive to the project. That is my motivation for continuing to study it and seeing whether it can be introduced into the project. The following three topics are my main research directions:

  • Election mechanism
  • Data synchronization mechanism
  • Performance

The configuration center function of Nacos is based on the Raft protocol. Why Raft?

The answer is one word: simple. Compared with the Paxos protocol, Raft is much simpler. We should do the same when we design and develop solutions: a simple and effective design saves time and is easy to implement and maintain. Over time we should build up the ability to abstract the simplest, most direct solution from complex business requirements.

Without further ado, let's go straight to the election mechanism of the Raft protocol.

In Raft, a server plays one of the following roles at any time:

  1. Leader: handles all requests. The leader replica accepts the client's update request, processes it locally, and then synchronizes it to the other replicas.
  2. Follower: passively applies updates. It receives update requests from the leader and writes them to its local log file.
  3. Candidate: if a follower does not receive a heartbeat from the leader within a period of time, it concludes that the leader may have failed and starts a leader election. The replica stays in the candidate state until the election ends.

Elections here are very similar to those in a democratic society. Each new period of leadership is called a term.
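The roles and the term counter can be pictured as a small state machine. The following is a minimal sketch, not the Nacos code: `RaftRole` and `RaftNodeState` are hypothetical names chosen for illustration.

```java
// Hypothetical names for illustration; these are not Nacos classes.
enum RaftRole { FOLLOWER, CANDIDATE, LEADER }

final class RaftNodeState {
    RaftRole role = RaftRole.FOLLOWER; // every node starts as a follower
    long term = 0;                     // the term counter starts at 0

    // No heartbeat arrived in time: become a candidate for a new term.
    void onHeartbeatTimeout() {
        if (role == RaftRole.LEADER) {
            return; // a leader sends heartbeats, it does not wait for them
        }
        role = RaftRole.CANDIDATE;
        term++;
    }

    // The candidate collected votes from more than half of the cluster.
    void onElectionWon() {
        if (role == RaftRole.CANDIDATE) {
            role = RaftRole.LEADER;
        }
    }

    // A message carrying a higher term demotes the node to follower.
    void onHigherTerm(long remoteTerm) {
        if (remoteTerm > term) {
            term = remoteTerm;
            role = RaftRole.FOLLOWER;
        }
    }
}
```

A follower that times out becomes a candidate in term 1, becomes leader if it wins, and falls back to follower as soon as it sees a higher term.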
 

Having read this far, how do you think the Raft election process works? You could take five minutes to think it over and test your own ability to design a solution. Then look at how the experts implemented it and learn from the contrast with your own ideas. Before doing something, you should form your own ideas and opinions: think first, then act. This has two advantages:

1. You do not follow blindly; you can discard the shortcomings and learn the strengths;

2. You exercise your own ability to get things done, become more independent, and can become a core pillar of the team.

 
The election process is as follows:

  1. The system has just started. The term of every node is 0 and every node is a follower.
  2. The first node whose heartbeat timeout fires without having detected a heartbeat increments its term to 1, resets its timer (the election start time), votes for itself, and then requests votes from all other nodes.
  3. The other nodes are still at term 0 and their logs are no newer than the candidate's, so they will vote for it. Because they have received the candidate's vote request, these nodes also reset their heartbeat wait time and do not start a vote of their own before the timeout. This avoids votes being wasted because several nodes vote at the same time.
  4. The first node to request votes receives more than half of the votes and becomes the leader.
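"More than half" matters here: exactly half is not enough, otherwise two candidates could both win in a cluster with an even number of nodes. A minimal sketch of the tally, assuming each vote is represented by the id of the node it was cast for (`VoteTally` and its methods are hypothetical names, not Nacos APIs):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not a Nacos class.
final class VoteTally {
    // Smallest vote count that is strictly more than half of the cluster.
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    // Returns the id that reached a majority, or null if nobody did.
    static String decideLeader(List<String> votes, int clusterSize) {
        Map<String, Integer> counts = new HashMap<>();
        for (String votedFor : votes) {
            int count = counts.merge(votedFor, 1, Integer::sum);
            if (count >= majority(clusterSize)) {
                return votedFor;
            }
        }
        return null;
    }
}
```

With three nodes a candidate needs two votes; with four nodes it needs three, so a 2:2 split elects nobody and the round has to be repeated.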
     

1. Each time a follower receives a heartbeat from the leader, it resets its own heartbeat timer and starts timing again. If the heartbeat timer times out and the follower still has not received the leader's heartbeat, it changes from follower to candidate.

2. The candidate increments its current term, starts its election timer, and requests votes from the other nodes.

3. The other nodes compare the term and the log sequence number. They vote for the candidate only if its data is at least as new as their own.

4. If the candidate receives votes from more than half of the nodes, it becomes the leader. Otherwise, it has to wait for the election timeout and then launch a second round of voting.
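Step 3 can be sketched as follows. This follows the vote rule of standard Raft (compare terms first, then the last log term and index, and grant at most one vote per term); it is an illustration under those assumptions, not the Nacos implementation, and `Voter` is a hypothetical name.

```java
// Hypothetical class for illustration; models the standard Raft vote rule.
final class Voter {
    long currentTerm;
    long lastLogTerm;
    long lastLogIndex;
    String votedFor; // candidate this node voted for in currentTerm, or null

    boolean handleVoteRequest(String candidateId, long candidateTerm,
                              long candidateLastLogTerm, long candidateLastLogIndex) {
        if (candidateTerm < currentTerm) {
            return false; // the candidate is behind: refuse
        }
        if (candidateTerm > currentTerm) {
            currentTerm = candidateTerm; // newer term: any earlier vote no longer counts
            votedFor = null;
        }
        // The candidate's log must be at least as up to date as ours.
        boolean logUpToDate = candidateLastLogTerm > lastLogTerm
                || (candidateLastLogTerm == lastLogTerm
                    && candidateLastLogIndex >= lastLogIndex);
        if (logUpToDate && (votedFor == null || votedFor.equals(candidateId))) {
            votedFor = candidateId;
            return true;
        }
        return false;
    }
}
```

A fresh node grants its vote to the first candidate of term 1 and refuses a second candidate in the same term; a node whose log is ahead of the candidate's refuses even when the candidate's term is higher.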

Dynamic process: https://raft.github.io/

Do you have any questions when you see this?
     

Personal questions:

After the leader is elected, how do the followers learn which node is the leader?

How is the length of a term set? Is it the same for all nodes?

The following explains the Nacos implementation from the source code.

The Raft protocol is implemented in the RaftCore class.

Two inner classes of RaftCore, MasterElection and HeartBeat, are responsible for election and heartbeat.

1. Entrance to the election

    // GlobalExecutor: length of one timer tick
    public static final long TICK_PERIOD_MS = TimeUnit.MILLISECONDS.toMillis(500L);

    // RaftCore
    public void init() throws Exception {
        // ... other logic omitted
        Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));
        // register the election task and the heartbeat task
        GlobalExecutor.registerMasterElection(new MasterElection());
        GlobalExecutor.registerHeartbeat(new HeartBeat());
        Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
                GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
    }

    // GlobalExecutor: the election task runs at a fixed rate
    public static void registerMasterElection(Runnable runnable) {
        NAMING_TIMER_EXECUTOR.scheduleAtFixedRate(runnable, 0, TICK_PERIOD_MS, TimeUnit.MILLISECONDS);
    }

    // GlobalExecutor: the heartbeat task runs with a fixed delay between runs
    public static void registerHeartbeat(Runnable runnable) {
        NAMING_TIMER_EXECUTOR.scheduleWithFixedDelay(runnable, 0, TICK_PERIOD_MS, TimeUnit.MILLISECONDS);
    }

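You can see that both the election task and the heartbeat task are triggered every 500 ms. The election task is scheduled at a fixed rate, and the heartbeat task with a fixed delay between runs.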
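The two registration methods use different scheduling calls of the JDK's `ScheduledExecutorService`: `scheduleAtFixedRate` measures the period from the start of one run to the start of the next, while `scheduleWithFixedDelay` measures the delay from the end of one run to the start of the next. A small stand-alone sketch of both (`TickDemo` and `runTicks` are hypothetical names; the executor here is a plain single-thread scheduler, not the Nacos `NAMING_TIMER_EXECUTOR`):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Nacos.
final class TickDemo {
    // Schedules a task every periodMs and waits until it has run `runs` times.
    // Returns true if that many runs happened before a generous deadline.
    static boolean runTicks(long periodMs, int runs, boolean fixedRate) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch latch = new CountDownLatch(runs);
        Runnable task = latch::countDown;
        if (fixedRate) {
            // period counted from the start of each run
            timer.scheduleAtFixedRate(task, 0, periodMs, TimeUnit.MILLISECONDS);
        } else {
            // delay counted from the end of each run
            timer.scheduleWithFixedDelay(task, 0, periodMs, TimeUnit.MILLISECONDS);
        }
        try {
            return latch.await(periodMs * runs + 2_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.shutdownNow();
        }
    }
}
```

For a task that finishes quickly, as here, the two modes behave almost identically; they diverge only when a run takes a noticeable share of the period.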

2. Let’s take a look at how the heartbeat works

At step 1 in the source code, we can see that no election is held while leaderDueMs has not yet run out. Only after leaderDueMs has expired are leaderDueMs and heartbeatDueMs (the heartbeat detection countdown) reset and a vote request sent. There is one detail here: a random value is added at step 3 in the code. Have you thought about why this random value is added?

A: The random value makes leaderDueMs differ from node to node, that is, each node waits a different length of time before starting an election. This avoids simultaneous voting and raises the success rate of leader election. The node whose leaderDueMs reaches 0 first increments its term and sends out vote requests. Because its term (the old term + 1) is now larger than that of the other nodes, it collects their votes and becomes the leader. Without the random value, all nodes would start voting at the same moment with the same term + 1, and that round of the election would produce no leader.
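The countdown with a random offset can be sketched like this. Only the 500 ms tick comes from the code shown earlier; the 15 s base timeout and the 5 s random range are assumed values for illustration, and `ElectionTimer` is a hypothetical class, not the Nacos one.

```java
import java.util.Random;

// Hypothetical class for illustration; timeout values are assumptions.
final class ElectionTimer {
    static final long TICK_PERIOD_MS = 500L;       // tick length from the code above
    static final long LEADER_TIMEOUT_MS = 15_000L; // assumed base timeout
    static final int RANDOM_MS = 5_000;            // assumed upper bound of the random part

    private final Random random;
    long leaderDueMs;

    ElectionTimer(Random random) {
        this.random = random;
        resetLeaderDue();
    }

    // Base timeout plus a random offset, so that nodes expire at different times.
    void resetLeaderDue() {
        leaderDueMs = LEADER_TIMEOUT_MS + random.nextInt(RANDOM_MS);
    }

    // Called once per tick. Returns true when this node should start an election.
    boolean tick() {
        leaderDueMs -= TICK_PERIOD_MS;
        if (leaderDueMs > 0) {
            return false; // leader still considered alive
        }
        resetLeaderDue();
        return true;
    }

    // Counts ticks until the timer fires, assuming no heartbeat resets it.
    static int ticksUntilElection(ElectionTimer timer) {
        int ticks = 1;
        while (!timer.tick()) {
            ticks++;
        }
        return ticks;
    }
}
```

Under these assumed values a node fires somewhere between tick 30 and tick 40, so two nodes started at the same moment will usually not fire on the same tick.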

3. Specific process of election

Voting proceeds as follows. The node that initiates the vote sends a vote request to every other node, excluding itself. On receiving the request, each node executes step 3 in the code above: it checks whether the requester's term is larger than its own. If it is, the node votes for the requester, sets its own term to the term in the request, and resets leaderDueMs so that it does not start a vote itself. Finally, it returns its vote to the initiator. The initiator collects the returned votes, and if more than half of them are for it, it becomes the real leader. This is the end of the election.
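The receiving side described above can be sketched as follows. `VotingPeer` is a hypothetical class, the 15 s timeout is an assumed value, and the text does not say what a node does when the requester's term is not larger, so this sketch simply keeps whatever vote the node already holds in that case.

```java
// Hypothetical class for illustration; not the Nacos source.
final class VotingPeer {
    static final long LEADER_TIMEOUT_MS = 15_000L; // assumed value

    final String id;
    long term;
    String voteFor;   // id this peer has voted for, or null
    long leaderDueMs; // countdown until this peer starts its own vote

    VotingPeer(String id) {
        this.id = id;
    }

    // Handles a vote request and returns the id this peer is voting for.
    synchronized String receivedVote(String candidateId, long candidateTerm) {
        if (candidateTerm <= term) {
            // Requester is not ahead: keep the existing vote (assumption).
            return voteFor;
        }
        // Requester has a larger term: vote for it and adopt its term.
        voteFor = candidateId;
        term = candidateTerm;
        // Postpone our own vote so we do not compete with the requester.
        leaderDueMs = LEADER_TIMEOUT_MS;
        return voteFor;
    }
}
```

Because the term is adopted together with the vote, a second requester arriving with the same term is no longer ahead and cannot take the vote away.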

So the question is: how do the other nodes learn the result of the vote? Are they notified, and if so, how?

At this point, the other nodes have each chosen a node as leader and reset their own leaderDueMs, so they will not start an election.

4. Heartbeat process

After the first step, the due time is counted down in the same way for the heartbeat. The heartbeat interval here is much shorter than the election timeout. In addition, both the sender and the receiver of a heartbeat reset their election timer. Extending the time in this way prevents each node from initiating a vote request.

The code above answers the question of how the other nodes are informed after a node becomes leader: the leader announces itself to the other nodes through heartbeats. After receiving a heartbeat request, the other nodes update their leader. The code to receive the heartbeat request is as follows.
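As a rough sketch of that receiving logic (not the Nacos source): `HeartbeatReceiver` is a hypothetical class, both timeout values are assumed, and rejecting a heartbeat with a smaller term follows the general Raft rule.

```java
// Hypothetical class for illustration; timeout values are assumptions.
final class HeartbeatReceiver {
    static final long LEADER_TIMEOUT_MS = 15_000L;    // assumed value
    static final long HEARTBEAT_INTERVAL_MS = 5_000L; // assumed value

    long term;
    String leaderId;     // the leader this node currently follows, or null
    long leaderDueMs;    // countdown until this node starts an election
    long heartbeatDueMs; // countdown until the next heartbeat is due

    // Handles a heartbeat. Returns true if it was accepted.
    boolean receivedBeat(String senderId, long senderTerm) {
        if (senderTerm < term) {
            return false; // sender is a stale leader: ignore its heartbeat
        }
        // Accept the sender as leader and adopt its term.
        term = senderTerm;
        leaderId = senderId;
        // Reset both countdowns so this node does not start an election.
        leaderDueMs = LEADER_TIMEOUT_MS;
        heartbeatDueMs = HEARTBEAT_INTERVAL_MS;
        return true;
    }
}
```

A heartbeat from a leader with an older term is dropped, while one from a leader with the same or a newer term updates the recorded leader and pushes the election timeout back.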

This is the end of the introduction of the electoral mechanism.

Now I have three new questions:

1. A follower times out. Is there a problem?

2. The leader times out. Is there a problem?

3. How is split brain handled?

Question 1:

If a follower times out, it starts a new election. If it cannot reach the other nodes, it stays in the election state. If the connection recovers after some time, one of two things happens: either it completes the election and becomes a new leader, or it receives a heartbeat before sending its vote request and becomes a follower of the original leader. In the first case there are two leaders for a while. However, because the old leader's term is smaller, its heartbeats have no effect, and it is eventually synchronized as a follower of the new leader. This conclusion comes from analysis only.

Would the presence of two leaders affect the publishing of configuration information?

Question 2:

If the leader times out, a new election is held and produces a new leader. If the old leader recovers, it is synchronized as a follower through heartbeats.

Question 3:

From the answers to questions 1 and 2, we can see that the old leader is synchronized as a follower through timer resets and term comparison.