Distributed protocol: Paxos


Posing the problem

How processes communicate with each other in a distributed system is a classic problem. There are two main approaches:

  • Shared memory
  • Message passing

This article does not cover shared memory; it introduces the Paxos algorithm from the message-passing side. In a distributed system built on the message-passing model, processes will inevitably slow down, be killed, or restart, and messages may be delayed, duplicated, or lost. Take a typical scenario: in a distributed database system, if every node starts in the same initial state and executes the same sequence of operations, all nodes end up in the same state. To guarantee this, a consensus algorithm has to be run for each command, so that every node sees the same sequence of commands (insert, update, delete). A general-purpose consensus algorithm applies to many scenarios and is an important problem in distributed computing, which is why research on consensus algorithms has not stopped since the 1980s.
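
To make the replication argument concrete, here is a minimal sketch in which two replicas replay the same command log and end up identical, while a replica that sees the commands in a different order diverges. The command format and the names apply_command and replay are assumptions made for this illustration only.

```python
def apply_command(state: dict, command: tuple) -> dict:
    """Apply one (op, key, value) command and return the new state."""
    op, key, value = command
    new_state = dict(state)
    if op in ("insert", "update"):
        new_state[key] = value
    elif op == "delete":
        new_state.pop(key, None)
    return new_state


def replay(initial: dict, log: list) -> dict:
    """Replay a whole command log starting from the initial state."""
    state = dict(initial)
    for command in log:
        state = apply_command(state, command)
    return state


# Hypothetical command log agreed on by all nodes.
log = [
    ("insert", "x", 1),
    ("update", "x", 2),
    ("insert", "y", 3),
    ("delete", "y", None),
]

replica_a = replay({}, log)
replica_b = replay({}, log)
# A node that sees the first two commands swapped ends up in a different state.
reordered = replay({}, [log[1], log[0]] + log[2:])
```

Here replica_a and replica_b both end as {"x": 2}, while the reordered replica ends as {"x": 1}. This is why the nodes need to agree on every command before executing it.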

In 1982, the renowned Leslie Lamport and two co-authors published the paper The Byzantine Generals Problem. Lamport later presented the Paxos algorithm, in a rather obscure way, in the paper The Part-Time Parliament. It opened the door to distributed consensus and is widely recognized as an effective way to solve the problem. The original paper is hard to follow; for an easier introduction, see Lamport’s Paxos Made Simple.

To understand a consensus algorithm, first pin down the actual requirements. Assume there is a group of processes that can each put forward proposals. A consensus algorithm must guarantee the following:

  • If no proposal has been put forward, no proposal can be chosen.
  • Among the proposals put forward, only one can be chosen.
  • Once a proposal has been chosen, other processes can learn it.

With these requirements in mind, let’s take a closer look at the Paxos algorithm.

Algorithm content

Paxos involves three roles: proposer, acceptor, and learner (a single process may play several roles). A proposer puts forward proposals, each consisting of a proposal number and a value (for example, proposal 001: increase the tax rate by 10%). An acceptor receives proposals and may accept them. If a proposal is accepted by a majority of acceptors, it is approved (chosen). A learner can only learn approved proposals from the acceptors.

If this is hard to picture, think of the relationship between representatives and the public. A representative can be both a proposer and an acceptor. Representatives put forward proposals at a meeting, and the other representatives decide whether to agree. If a majority agrees, the proposal passes, and the public learns the outcome from the representatives. With the roles clear, the earlier requirements can be restated:

  • Constraint 1: a resolution (value) can only be approved after it has been put forward by a proposer (a resolution that has not yet been approved is called a proposal).
  • Constraint 2: in one run of the Paxos algorithm (which may involve multiple proposals), only one value is approved.
  • Constraint 3: a learner can obtain a value only after it has been approved as a resolution.

Derivation process

To get a value approved, a proposer first sends the value to the acceptors, and the acceptors then accept it. To satisfy the constraint that only one value is approved, a proposal must be accepted by a majority of acceptors before its value becomes the resolution. This works because any two majorities share at least one acceptor, whether majorities are defined by count or by weight. If each acceptor can accept only one value, constraint 2 is guaranteed.
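
The claim that any two majorities share at least one acceptor can be checked by brute force for a small cluster. The sketch below assumes five acceptors with hypothetical names A to E and majorities defined by count; it enumerates every majority and verifies that each pair overlaps.

```python
from itertools import combinations


def majorities(acceptors):
    """Return every subset that contains more than half of the acceptors."""
    quorum = len(acceptors) // 2 + 1
    return [
        set(subset)
        for size in range(quorum, len(acceptors) + 1)
        for subset in combinations(acceptors, size)
    ]


acceptors = ["A", "B", "C", "D", "E"]  # hypothetical acceptor names
all_majorities = majorities(acceptors)

# Every pair of majorities must have a non-empty intersection.
every_pair_overlaps = all(
    q1 & q2 for q1, q2 in combinations(all_majorities, 2)
)
```

With five acceptors there are 16 majorities (10 of size three, 5 of size four, 1 of size five), and every_pair_overlaps is True. The overlapping acceptor is what prevents two different values from both being approved when each acceptor accepts only one value.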

This creates an obvious new constraint:

P1: an acceptor must accept the first proposal it receives.

Note that P1 is incomplete. If exactly half of the acceptors accept a proposal with value a and the other half accept one with value b, no majority forms and no value can be approved.

Constraint 2 does not require that only one proposal be approved, which implies that several proposals may be approved. As long as they all carry the same value, approving multiple proposals does not violate constraint 2. This gives constraint P2:

P2: once a proposal with value v is approved (chosen), every proposal approved (chosen) afterwards must also have value v.

Note: each proposal can be assigned a number in some way so that the proposals are totally ordered. “Afterwards” then refers to all proposals with larger numbers.

If both P1 and P2 hold, constraint 2 is guaranteed.

Approving a value means that multiple acceptors (a majority) have accepted it. P2 can therefore be strengthened:

P2a: once a proposal with value v is approved (chosen), any proposal subsequently accepted by an acceptor must have value v.

Because communication is asynchronous, P2a and P1 can conflict. Suppose a value has been approved, and then a proposer and an acceptor wake up from sleep and the proposer puts forward a proposal with a new value. According to P1 the acceptor should accept it, but according to P2a it should not. To resolve this contradiction, we change approach and restrict the proposer’s behavior instead:

P2b: once a proposal with value v is approved (chosen), any proposal subsequently issued by a proposer must have value v.

Since every proposal accepted by an acceptor must have been issued by a proposer, P2b implies P2a and is the stronger constraint.

However, P2b does not directly suggest how to implement it, so it needs to be strengthened further.

Suppose a proposal numbered m with value v has been approved (chosen), and consider under what circumstances every proposal numbered n (n > m) also has value v. Because m has been approved (chosen), there is a majority C of acceptors that have all accepted v. Since any majority shares at least one member with C, we can find a constraint P2c that implies P2b:

P2c: if a proposal numbered n with value v is issued, then there is a majority S of acceptors such that either no acceptor in S has accepted any proposal numbered less than n, or, among all proposals numbered less than n that acceptors in S have accepted, the one with the largest number has value v.
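
From the proposer’s side, P2c boils down to a simple rule for picking the value once a majority has answered. The sketch below is an illustration under assumptions: each reply is modeled as either None (the acceptor has accepted nothing yet) or a (number, value) pair, and the function name choose_value is hypothetical.

```python
from typing import List, Optional, Tuple

# A reply is None if the acceptor has accepted nothing so far,
# otherwise the (number, value) of the proposal it accepted last.
Reply = Optional[Tuple[int, str]]


def choose_value(replies: List[Reply], own_value: str) -> str:
    """Pick the value a new proposal may carry without violating P2c."""
    accepted = [reply for reply in replies if reply is not None]
    if not accepted:
        # No acceptor in the majority has accepted anything:
        # the proposer is free to use its own value.
        return own_value
    # Otherwise reuse the value of the highest-numbered accepted proposal.
    return max(accepted, key=lambda reply: reply[0])[1]
```

For example, choose_value([None, None, None], "x") returns "x", while choose_value([None, (3, "a"), (5, "b")], "x") returns "b", because proposal 5 is the highest-numbered one that the majority reports.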

To satisfy P2c, before issuing a proposal a proposer must first communicate with enough acceptors to form a majority and obtain the latest proposal each of them has accepted (the prepare phase). It then determines the value of its proposal from the information it gets back, forms the proposal, and starts the vote. When a majority of acceptors accept the proposal, it is approved (chosen), and the acceptors inform the learners. Refining this simple process further yields the Paxos algorithm.

Within one Paxos instance, every proposal needs a distinct number, and the numbers must be totally ordered. There are many ways to achieve this, such as concatenating a sequence number with the proposer’s name. How this is done is outside the scope of the Paxos algorithm itself.

Suppose an acceptor that has not yet accepted any proposal answers a proposer’s prepare request for proposal n, but then accepts another proposal with a smaller number (say n-1) before the vote on n starts. If n-1 and n have different values, this vote violates P2c. Therefore the acceptor’s answer in the prepare phase must also carry a promise: it will not accept any more proposals numbered less than n. This strengthens P1:

P1a: an acceptor accepts a proposal numbered n if and only if it has not responded to a prepare request with a number greater than n.
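
One possible way to build such numbers, sketched under the assumption that every proposer has a unique name, is to pair a counter with the proposer’s name and rely on tuple comparison. The names ProposalNumber and next_number are made up for this example.

```python
from typing import NamedTuple


class ProposalNumber(NamedTuple):
    """Compared field by field: counter first, proposer name as tie-breaker."""
    counter: int
    proposer_id: str


def next_number(highest_seen: ProposalNumber, proposer_id: str) -> ProposalNumber:
    """Return a number larger than any number this proposer has seen."""
    return ProposalNumber(highest_seen.counter + 1, proposer_id)


a = ProposalNumber(1, "p1")
b = ProposalNumber(1, "p2")
c = next_number(b, "p1")  # ProposalNumber(counter=2, proposer_id='p1')
```

Because the names are unique, two proposers can never produce the same number, and the ordering a < b < c is total.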
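
The acceptor side of P1a needs very little state. The following is a minimal sketch, assuming integer proposal numbers that start at 1; the class and method names are hypothetical and not taken from any library.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest prepare number answered so far (0 = none)
        self.accepted = None  # (number, value) of the last accepted proposal

    def on_prepare(self, n):
        """Answer a prepare request and promise to ignore smaller numbers."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", self.promised)

    def on_accept(self, n, value):
        """P1a: accept unless a prepare request with a larger number was answered."""
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False


acceptor = Acceptor()
first_reply = acceptor.on_prepare(5)      # ("promise", None)
late_vote = acceptor.on_accept(4, "old")  # False: breaks the promise made for 5
vote = acceptor.on_accept(5, "new")       # True
stale_reply = acceptor.on_prepare(3)      # ("reject", 5)
```

The acceptor only has to remember two things: the largest prepare number it has answered and the last proposal it accepted.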

Complete algorithm

A resolution is adopted in two phases:

  • Prepare phase:

    • The proposer selects a proposal number m and sends a prepare request to a majority of the acceptors (a subset containing more than half of them);
    • When an acceptor receives the prepare request, if its number m is greater than the number of every prepare request the acceptor has already replied to, the acceptor replies to the proposer with the proposal it last accepted (an ACK) and promises not to accept any proposal numbered less than m;

      For example, if an acceptor has already replied to prepare requests numbered 1, 2, 3, 4 and 5, it will reply to a prepare request numbered 6 and promise not to accept any proposal numbered less than 6.

  • Accept phase:

    • When the proposer has received replies to its prepare request from more than half of the acceptors, it sends an accept request to the acceptors that replied, containing the number m and the value determined according to P2c, written [m, value]. If P2c yields no previously accepted value, the value can be anything the proposer chooses.
    • An acceptor accepts the request on receipt, provided that doing so does not break a promise it has made to another proposer.
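
The two phases can be put together in a small in-memory simulation. This is a sketch under simplifying assumptions, not a production implementation: messages are plain method calls that are never lost or delayed, proposal numbers are integers starting at 1, there are no learners, and all names (Acceptor, propose) are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest prepare number answered so far
        self.accepted = None  # (number, value) of the last accepted proposal

    def prepare(self, n):
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run the prepare and accept phases once; return the chosen value or None."""
    quorum = len(acceptors) // 2 + 1

    # Prepare phase: collect promises and previously accepted proposals.
    promised, prior = [], []
    for acceptor in acceptors:
        ok, accepted = acceptor.prepare(n)
        if ok:
            promised.append(acceptor)
            if accepted is not None:
                prior.append(accepted)
    if len(promised) < quorum:
        return None

    # P2c: reuse the value of the highest-numbered accepted proposal, if any.
    if prior:
        value = max(prior, key=lambda proposal: proposal[0])[1]

    # Accept phase: ask the acceptors that promised to accept [n, value].
    votes = sum(acceptor.accept(n, value) for acceptor in promised)
    return value if votes >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "tax +10%")   # chosen: "tax +10%"
second = propose(acceptors, 2, "tax -5%")   # still "tax +10%", by P2c
stale = propose(acceptors, 1, "tax +0%")    # None: number 1 is now too small
```

The second proposer wanted a different value, but because a value had already been chosen, the prepare phase forced it to re-propose "tax +10%". The third attempt reuses a number that the acceptors have already promised to ignore, so it is rejected in the prepare phase.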

In practice, each proposer may issue multiple proposals, but as long as every participant follows the algorithm, correctness is guaranteed. If a proposer has already moved on to a larger proposal number, it is better to abandon the proposal with the smaller number. So when an acceptor receives a request numbered n but has already received a request with a number larger than n, it should notify the proposer that sent n so that the proposer can abandon that proposal.

Publishing resolutions

  • The obvious way is for every acceptor to send a message to all learners whenever it approves a value. In other words, getting one proposal through takes at least (number of acceptors) * (number of learners) messages, which is not very friendly to a distributed system.
  • For now we set aside the Byzantine generals problem, so learners can obtain the adopted resolution from other learners. The acceptors then only need to send their accept messages to one designated learner (think of it as the main learner), and the other learners ask it for the adopted resolution. This reduces the message volume, but the main learner is a single point of failure: if it goes offline, the system fails. A compromise is for the acceptors to send their accept messages to a subset of learners, which then notify all the other learners.
  • However, because message delivery is unreliable, it is possible that no learner receives the messages saying a resolution has been approved. When learners need to find out whether a resolution has been adopted, they can ask a proposer to issue a proposal again. Note that a learner may also be a proposer.
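
To compare the message volume of these strategies, here is a rough counting sketch. It assumes that every acceptor sends exactly one message per receiving learner and that each distinguished learner forwards the result once to every non-distinguished learner; both function names are made up for this example.

```python
def messages_broadcast(acceptors: int, learners: int) -> int:
    """Every acceptor notifies every learner directly."""
    return acceptors * learners


def messages_via_distinguished(acceptors: int, learners: int, distinguished: int) -> int:
    """Acceptors notify only the distinguished learners, which forward the result."""
    to_distinguished = acceptors * distinguished
    forwarded = distinguished * (learners - distinguished)
    return to_distinguished + forwarded


# Hypothetical cluster: 5 acceptors and 10 learners.
direct = messages_broadcast(5, 10)                  # 50
single = messages_via_distinguished(5, 10, 1)       # 5 + 9 = 14
subset = messages_via_distinguished(5, 10, 3)       # 15 + 21 = 36
```

Under these assumptions a single main learner needs the fewest messages but is a single point of failure, while a subset of three learners sits between the two extremes.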

Guaranteeing progress

According to the process above, when a proposer learns of a proposal with a larger number, it abandons its own. In other words, a proposal with a larger number terminates the earlier proposal’s run. If two proposers each respond by switching to a still larger number, they can fall into a livelock: they keep issuing ever larger proposal numbers and nothing is ever approved, which violates the algorithm’s progress requirement. Livelock can usually be resolved with a random sleep followed by a retry.
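
A random sleep and retry loop could look like the sketch below. The exponentially growing upper bound on the delay is an assumption added for illustration (the text only asks for a random sleep), and try_once stands for whatever function runs one complete proposal attempt and returns None on failure.

```python
import random
import time


def backoff_delay(attempt, base=0.05, cap=1.0, rng=random):
    """Random delay in seconds; the upper bound doubles per attempt up to cap."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def propose_with_retry(try_once, max_attempts=5, sleep=time.sleep, rng=random):
    """Call try_once(attempt) until it returns a value, sleeping randomly in between."""
    for attempt in range(max_attempts):
        result = try_once(attempt)
        if result is not None:
            return result
        # Sleep for a random time so competing proposers fall out of step.
        sleep(backoff_delay(attempt, rng=rng))
    return None


# Usage with a fake attempt function that fails twice before succeeding.
outcomes = iter([None, None, "chosen value"])
delays = []
result = propose_with_retry(lambda attempt: next(outcomes), sleep=delays.append)
```

In this usage example the sleep function is replaced by delays.append so that nothing actually blocks; result is "chosen value" after two recorded delays. Because each proposer waits a different random time, one of them usually gets through both phases before the other starts again.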

A better solution, however, is to elect a leader proposer and allow only the leader to issue proposals. Then, as long as the leader and more than half of the acceptors can communicate, proposals can be approved.

Due to the uncertainty of message delivery, though, multiple proposers may each believe they have become the leader. This places a high-availability requirement on the leader proposer.


By analyzing the message-passing model of distributed systems, this article has introduced the Paxos algorithm. Building on a 2PC-like mechanism and adding the majority (more-than-half) principle, Paxos improves on the problems found in 2PC and 3PC, such as synchronous blocking, split brain, and unbounded waiting. In that sense, Paxos can be regarded as an ideal realization of a distributed consensus protocol.