1、 Log storage
Each log entry stores the following:
Term: the term of the leader that created the entry
Operation content: the request sent by the client, that is, the command to be executed by the replicated state machine. In the KV system used as the example here, each operation acts on the content of a key.
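As a minimal sketch of the entry layout described above, a log entry can be modelled as a term plus an opaque command. The names `LogEntry`, `apply_entry` and the `("set", key, value)` command shape are assumptions made for illustration; they are not taken from any particular Raft implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    term: int                 # term of the leader that created the entry
    command: Tuple[str, ...]  # operation for the state machine (hypothetical format)


def apply_entry(state: Dict[str, str], entry: LogEntry) -> None:
    """Apply one KV command to the state machine."""
    op, key, *rest = entry.command
    if op == "set":
        state[key] = rest[0]
    elif op == "delete":
        state.pop(key, None)


log: List[LogEntry] = [
    LogEntry(1, ("set", "x", "1")),
    LogEntry(1, ("set", "y", "2")),
    LogEntry(2, ("set", "x", "3")),
]

state: Dict[str, str] = {}
for entry in log:
    apply_entry(state, entry)
```

Replaying the same entries in the same order always yields the same `state`, which is why replicating the log is enough to replicate the KV store.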
2、 Log status
A log entry can be in one of the following states:
Uncommitted: the entry has just been added to the system.
Committed: once an entry has been received by a majority of nodes, it is committed, that is, it can be applied to the state machine.
Applied: the entry has been applied to the state machine and the result can be returned to the client.
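One way to picture these states is to derive them from two counters kept by each node. The sketch below assumes 1-based log indices and the counter names `commit_index` and `last_applied` (modelled on the commitIndex used later in this article); the function name `entry_state` is hypothetical.

```python
def entry_state(index: int, commit_index: int, last_applied: int) -> str:
    """Classify a log entry by its index relative to the node's counters."""
    if index <= last_applied:
        return "applied"      # already executed by the state machine
    if index <= commit_index:
        return "committed"    # safe to apply, but not applied yet
    return "uncommitted"      # appended locally, not yet known to be committed


# A node with commit_index = 5 and last_applied = 4:
states = [entry_state(i, commit_index=5, last_applied=4) for i in (3, 5, 6)]
```

Under this model an entry only ever moves forward: uncommitted, then committed, then applied.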
3、 Log related messages
The AppendEntries message is sent by the leader and has two functions:
A. It sends the log entries generated from client commands to the followers, so that the logs converge to a consistent state;
B. It acts as a heartbeat, indicating that the cluster has a leader and there is no need to start an election.
In the first case, the leader sends the message after receiving a command from the client, and replies to the client once a majority of followers have received the entry;
In the second case, the leader sends the message periodically.
The relevant rules are as follows:
1. For leaders
- For a follower, if the leader's last log index is greater than or equal to that follower's nextIndex, send all log entries starting at nextIndex through the AppendEntries RPC
- If successful: update nextIndex and matchIndex for the follower
- If the AppendEntries RPC fails because the logs are inconsistent: decrement nextIndex and resend (section 5.3)
- If there is an N such that N > commitIndex, a majority of matchIndex[i] >= N, and log[N].term == currentTerm, then set commitIndex to N
The first rule (sending entries and adjusting nextIndex) is easy to understand
Suppose that the nodes are A, B, C, D and E from top to bottom, and the current leader is node A. As mentioned earlier, the leader records a nextIndex and a matchIndex for each follower. Assume the most optimistic case: the nextIndex recorded by A for B is 9 and the matchIndex is 8, so the heartbeat message is built from nextIndex 9. When B receives the message, it checks whether the entry just before nextIndex exists locally. Because B does not have entries 6-8, it returns false. A therefore steps nextIndex back, one failed attempt at a time, until it reaches 6, where the logs match. A then sends entries 6-8 to B and updates both matchIndex and nextIndex.
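The backtracking in this example can be sketched as a small simulation. It is a simplified model, not a full implementation: logs are reduced to lists of terms, the RPC is a direct function call, and the names `follower_accepts` and `replicate` as well as the concrete term values are assumptions for illustration.

```python
from typing import List, Tuple


def follower_accepts(follower_terms: List[int], prev_index: int, prev_term: int) -> bool:
    """Consistency check: the follower must hold an entry at prev_index with prev_term."""
    if prev_index == 0:
        return True
    return len(follower_terms) >= prev_index and follower_terms[prev_index - 1] == prev_term


def replicate(leader_terms: List[int], follower_terms: List[int],
              next_index: int) -> Tuple[int, int, int]:
    """Step next_index back until the logs match, then copy the missing suffix.

    Returns (next_index, match_index, attempts) after a successful append.
    """
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_terms[prev_index - 1] if prev_index > 0 else 0
        if follower_accepts(follower_terms, prev_index, prev_term):
            # Simplification: replace everything after prev_index with the leader's
            # entries (a real follower truncates only when an entry conflicts).
            del follower_terms[prev_index:]
            follower_terms.extend(leader_terms[prev_index:])
            match_index = len(leader_terms)
            return match_index + 1, match_index, attempts
        next_index -= 1  # log inconsistency: decrement and retry


leader = [1, 1, 1, 2, 2, 3, 3, 3]  # terms of entries 1..8 on A (assumed values)
follower = [1, 1, 1, 2, 2]         # B is missing entries 6..8
next_index, match_index, attempts = replicate(leader, follower, next_index=9)
```

With these inputs the first three attempts (nextIndex 9, 8 and 7) are rejected, and the fourth succeeds at nextIndex 6, after which B holds the same eight entries as A.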
Rule 2 is harder to understand:
If there is an N such that N > commitIndex, a majority of matchIndex[i] >= N, and log[N].term == currentTerm, then set commitIndex to N
The key point is the condition log[N].term == currentTerm. Put simply, the current leader cannot directly commit an entry whose term is not its own, that is, it cannot directly commit a predecessor's entry. For example:
The example (Figure 8 in the Raft paper) is a cluster of five servers, numbered S1-S5. The top row is the log index, the number in each box is the term of that entry, and the letters at the bottom label the scenarios a-e.
Scenario a: S1 is the leader in term 2, and the entry at index 2 has been replicated to S2;
Scenario b: S1 crashes, S5 is elected leader, the term grows to 3, and S5 accepts a new entry at index 2;
Scenario c: S5 crashes, S1 is elected leader again, the term grows to 4, and S1 replicates the entry with index 2 and term 2 to S3, so the entry is now on more than half of the servers.
The question arises in scenario c: the current term is 4, and the entry from term 2 is now stored on more than half of the servers. Should S1 commit it or not?
If S1 commits, the entry with index 2 and term 2 is applied to the state machine and cannot be undone;
If S1 then crashes, we reach scenario d: S5 can be elected leader, because under the log comparison strategy described earlier, the term of S5's last entry is 3, which is larger than that of the last entries of S2, S3 and S4.
Once S5 is elected leader in scenario d, it replicates its entry with index 2 and term 3 to the other servers. The entry at index 2 that S1 committed is overwritten, which violates consistency.
If S1 does not commit, it waits until an entry from term 4 is on more than half of the servers, and then commits the entries from earlier terms together with it. This is scenario e: if S1 crashes now, S5 cannot be elected leader, because the last entry of S2 and S3 has term 4, which is larger than S5's 3. S5 cannot win their votes, so it cannot overwrite the committed entries.
In summary: a leader cannot directly commit a predecessor's entry, even if that entry has been received by a majority of nodes. Instead, the predecessor's entry is committed indirectly, once an entry from the leader's current term has been received by a majority of nodes.
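The commit rule and the two outcomes discussed above can be checked with a short sketch. It assumes that `match_index` also counts the leader itself and that logs are reduced to lists of terms; the function name `advance_commit_index` and the concrete `match_index` values for scenarios c and e are illustrative assumptions.

```python
from typing import List


def advance_commit_index(log_terms: List[int], match_index: List[int],
                         commit_index: int, current_term: int) -> int:
    """Return the leader's new commitIndex.

    log_terms[i - 1] is the term of entry i; match_index holds the highest
    replicated index per server, including the leader itself.
    """
    majority = len(match_index) // 2 + 1
    # Try the highest candidate N first; stop at the first N that qualifies.
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


s1_log = [1, 2, 4]  # S1 as leader of term 4

# Scenario c: index 2 (term 2) is on S1, S2 and S3, but it is not from term 4.
c = advance_commit_index(s1_log, match_index=[3, 2, 2, 1, 1],
                         commit_index=1, current_term=4)

# Scenario e: the term-4 entry at index 3 has also reached S2 and S3.
e = advance_commit_index(s1_log, match_index=[3, 3, 3, 1, 1],
                         commit_index=1, current_term=4)
```

In scenario c the commit index stays at 1 even though index 2 is on a majority. In scenario e it jumps to 3, which commits the term-2 entry at index 2 along with it.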
4、 Other technical details
1. AppendEntries RPC message parameters
term: the leader's term
leaderId: the leader's ID
prevLogIndex: index of the log entry immediately preceding the new entries
prevLogTerm: term of the entry at prevLogIndex
entries: the log entries to replicate (empty for a heartbeat)
leaderCommitIndex: the leader's commitIndex
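To make the parameters concrete, the sketch below shows how a leader might fill in the message for one follower. The class `AppendEntriesArgs`, the helper `build_append_entries`, and the `(term, command)` tuple format are assumptions for illustration, with 1-based log indices.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command), hypothetical representation


@dataclass
class AppendEntriesArgs:
    term: int
    leader_id: str
    prev_log_index: int
    prev_log_term: int
    entries: List[Entry] = field(default_factory=list)  # empty means heartbeat
    leader_commit_index: int = 0


def build_append_entries(term: int, leader_id: str, log: List[Entry],
                         next_index: int, commit_index: int) -> AppendEntriesArgs:
    """Build the message for a follower whose next expected index is next_index."""
    prev_index = next_index - 1
    prev_term = log[prev_index - 1][0] if prev_index > 0 else 0
    return AppendEntriesArgs(term, leader_id, prev_index, prev_term,
                             list(log[prev_index:]), commit_index)


log = [(1, "set x 1"), (1, "set y 2"), (2, "set x 3")]
msg = build_append_entries(term=2, leader_id="A", log=log, next_index=3, commit_index=2)
heartbeat = build_append_entries(term=2, leader_id="A", log=log, next_index=4, commit_index=2)
```

Here `msg` carries the single entry at index 3 with prevLogIndex 2 and prevLogTerm 1, while `heartbeat` carries no entries because the follower is already up to date.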
2. How a follower processes the message
The first step is the term check, which is logic common to all messages:
If the other party's term is smaller than your own, return your own term and fail;
If the other party's term is larger than your own, then whatever role you are in, become a follower, update your term to the other party's term, and append the log;
If the terms are equal, a follower stays a follower and a candidate becomes a follower, and the log is appended. A leader should return failure.
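The three cases of the term check can be written as one small function. This is a sketch under the assumption that roles are plain strings and that "accept" only means the term check passed, so the log consistency check still follows; the name `check_term` is hypothetical.

```python
from typing import Tuple


def check_term(role: str, current_term: int, msg_term: int) -> Tuple[str, int, bool]:
    """Return (new_role, new_term, accept) for an incoming AppendEntries message."""
    if msg_term < current_term:
        # Stale sender: reject and reply with our own term.
        return role, current_term, False
    if msg_term > current_term:
        # Newer term: any role steps down and adopts the sender's term.
        return "follower", msg_term, True
    if role == "leader":
        # Same term, but we are the leader: return failure.
        return role, current_term, False
    # Same term: a follower stays a follower, a candidate becomes one.
    return "follower", current_term, True


results = [
    check_term("leader", 5, 4),
    check_term("leader", 5, 6),
    check_term("candidate", 5, 5),
    check_term("leader", 5, 5),
]
```

Only the two middle calls pass the term check; the first is rejected as stale and the last because a leader does not accept an append from another node in its own term.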
3. The problem of an ever-growing log
For a service that runs 24/7, if entries are only ever appended, disk space will eventually run out, and recovery after a failure becomes too slow. This creates the need for snapshots. The Raft algorithm takes this problem into account in its design; it is left to a later chapter for analysis.