Consensus, linear consistency and sequential consistency

Time:2020-8-14

etcd provides linearly consistent reads, while ZooKeeper (ZK) provides sequentially consistent reads. With so many terms around, such as consensus and strong and weak consistency, the literature is often confusing to read. This document lists the “consistency terms” used in distributed systems. It cites many other articles, but adds more examples to aid understanding.

What is consistency

When talking about consistency, do you think of consistency in the CAP theorem, consistency in ACID, coherence in cache coherence protocols, or consensus in Raft / Paxos?

Consistency has different meanings in different fields. After all, the same Chinese word corresponds to several different English terms: consistency, coherence and consensus are all translated as “consistency”. Therefore, before we talk about consistency, it is necessary to distinguish these concepts; otherwise it is easy to get confused.

Coherence

Coherence appears only in the term cache coherence, translated as “cache consistency”. It studies the multi-core scenario, that is, how to keep the CPU cache data on multiple cores consistent. This is generally a single-machine concern, not a distributed-systems one. You can refer to this article.

Consensus

Consensus is more accurately translated as “agreement”: it is the process by which multiple proposers reach agreement. For example, Paxos and Raft are both consensus algorithms; distributed systems are their application scenario, and consistency is their goal.

A common misunderstanding is that every system using Raft or Paxos is linearly consistent (i.e. strongly consistent). In fact, a consensus algorithm only provides the foundation, and more work is needed on top of the algorithm to achieve linear consistency.

Because a distributed system involves multiple nodes, the larger the cluster, the more downtime, network delay and network partitions become the norm. Any of these problems may lead to data inconsistency between nodes. Paxos and Raft are consensus algorithms that address this consistency problem; they are used in distributed scenarios, not in single-machine scenarios such as “cache consistency”. This is why many articles also say that “Paxos is a consistency algorithm in distributed systems”.

The meaning of consistency is broader than that of consensus. Consistency refers to the state that multiple replicas present to the outside world; it includes sequential consistency, linear consistency and final (eventual) consistency. Consensus refers to the process of reaching agreement. However, note that reaching consensus does not mean that consistency has been achieved; in some cases it cannot be achieved.

Paxos and Raft

Paxos is a family of protocols that includes Basic Paxos, Multi-Paxos, Cheap Paxos and other variants. Raft can be regarded as a variant of Multi-Paxos: by simplifying the Multi-Paxos model, Raft provides a consensus algorithm that is easier to understand and to implement in practice.

Paxos is the first consensus algorithm proven to be complete. It can keep the nodes in a distributed network consistent provided that there are no malicious nodes, that is, provided that the Byzantine generals problem does not arise. In traditional distributed systems there is no need to worry about that problem: whether it is a distributed database, a message queue or distributed storage, your machines will not deliberately send wrong messages, and the most common failure is that a node stops responding. Under this premise, Paxos is sufficient.

Replicated state machine

In terms of implementation mechanism, consensus belongs to the category of replicated state machines. The replicated state machine is a very effective fault-tolerance technique built on a replicated log. Each server stores a log containing a sequence of commands, and its state machine executes these commands in order. Because every log contains the same commands in the same order, all nodes end up with the same data.

Therefore, ensuring system consistency is reduced to ensuring the consistency of the operation log. Log replication of this kind is widely used, for example in GFS, HDFS, ZooKeeper and etcd.
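The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `StateMachine` class, the `replay` helper and the two commands `set` and `add` are hypothetical names invented here, not the API of any of the systems mentioned above.

```python
from typing import Dict, List, Tuple

# A command is (operation, key, amount). Both operations are hypothetical.
Command = Tuple[str, str, int]


class StateMachine:
    """A deterministic key-value state machine driven by a command log."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, command: Command) -> None:
        op, key, amount = command
        if op == "set":
            self.data[key] = amount
        elif op == "add":
            self.data[key] = self.data.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied += 1


def replay(log: List[Command]) -> Dict[str, int]:
    """Apply every entry of the log in order and return the final state."""
    machine = StateMachine()
    for command in log:
        machine.apply(command)
    return machine.data


log = [("set", "x", 1), ("add", "x", 5), ("set", "y", 2)]
replica_a = replay(log)
replica_b = replay(list(log))                 # same commands, same order
reordered = replay([log[1], log[0], log[2]])  # same commands, different order
```

Two replicas that replay the same log reach the same state, while the replica that applies the same commands in a different order ends up with a different value for `x`. This is why agreeing on the log order is enough to keep deterministic replicas consistent.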

Blockchain

Another important application area of consensus algorithms is the popular blockchain. For example, proof of work (PoW), proof of stake (PoS), delegated proof of stake (DPoS) and proof of believability (PoB) are all consensus algorithms. This article lists 30 kinds of consensus algorithms.

ZK and etcd are usually called “traditional distributed systems”, in contrast to the “new distributed systems” of blockchain. Both involve multiple nodes working together, but blockchain has several special characteristics:

  1. Blockchain has to solve the Byzantine generals problem, whereas consistency algorithms such as Paxos cannot withstand fraudulent nodes.
  2. There is no central controller in a blockchain, and no single node can control or coordinate the generation of ledger data.
  3. If the consensus algorithm in a blockchain fails to reach agreement, anyone can hard fork and build another community or chain.
  4. In theory, the performance of a traditional distributed system can be scaled almost without limit, whereas a blockchain can process only a few to a few dozen transactions per second because of its relatively low efficiency.

Consistency

After introducing coherence and consensus, let’s look at consistency, that is, what we most often talk about in CAP, BASE and ACID.

In the simplest terms: client C1 updates a value K in the system from V1 to V2, and clients C2 / C3 / C4 ... need to read the latest value of K immediately.

Consistency means agreement, not correctness. If all nodes consistently give the same “wrong” answer, that still counts as consistency.

For different scenarios, the consistency requirements of user perspective are different, for example:

  • Banking system: you deposit a sum of money at the counter while your friend transfers money to you and your girlfriend spends money on Taobao at the same time. You may find this confusing, but you trust that your balance will be right in the end. The bank can be slow, but it must not make a mistake.
  • E-commerce system: you see a garment with a stock of 5 on Taobao and quickly place an order, but you are told “insufficient inventory, unable to buy”. You assume you were too slow and someone else got it first; you don’t care why the inventory showed 5.
  • Forum site: you register on a forum that requires a mobile phone verification code. You click send and nothing happens; a day later the message finally arrives. But it is only a small site, so you simply don’t register.

The above are exaggerated user scenarios. In real business, consistency comes in levels, such as strong consistency and weak consistency. Which one to use depends on the specific scenario and on what the system can tolerate.

Strong consistency and weak consistency are only general terms. More precisely, consistency can be divided into:

  • Linear consistency (linearizability), also called atomic consistency
  • Sequential consistency
  • Causal consistency
  • Final consistency (eventual consistency)

Strong consistency includes linear consistency and sequential consistency; the others, such as final consistency, are weak consistency.

For the definition of strong and weak, we can refer to the slide of Cambridge University

Strong consistency
– ensures that only consistent state can be seen.

* All replicas return the same value when queried for the attribute of an object. This may be achieved at a cost – high latency.

Weak consistency
 – for when the “fast access” requirement dominates.

* update some replica, e.g. the closest or some designated replica
* the updated replica sends update messages to all other replicas.
* different replicas can return different values for the queried attribute of the object
* the value should be returned, or “not known”, with a timestamp
* in the long term all updates must propagate to all replicas …….

In a strongly consistent cluster, a request to any node gets the same reply, at the cost of relatively high latency. Weak consistency has lower response latency, but it may return stale data. Final consistency is a form of weak consistency in which the replicas become consistent after a period of time.

Background

Suppose two ticket offices are selling the last ticket, and both have somehow confirmed that it exists. A passenger arrives at each office at almost the same moment to buy it, and from each office’s own “observation” its passenger arrived first. How can the two reach a consensus on the result? It seems easy: sell to the passenger who submitted the request first in physical time.

However, for two requests from different locations, it is not easy to judge which came first. The clocks of the two offices may disagree, and each clock may itself be inaccurate. According to the theory of relativity, time at different spatial positions is not even the same. Therefore, relying on absolute timestamps is not feasible. What we can do is agree on an order for the events.

This is also the key to many problems in distributed systems: establish a single global order for events that occur at different times and places, an order that every participant accepts. Once the order is fixed, the events can be processed one by one, which is no different from a single machine (ignoring sudden failures and considering only the consensus mechanism).

If a reliable physical clock is available, ordering is often easier to implement. The drift rate of a high-precision quartz clock is on the order of 10^-7, and that of the most accurate atomic clock is about 10^-13. Google’s distributed database Spanner uses the “TrueTime” scheme based on atomic clocks and GPS, which keeps the time deviation between data centers within a 10 ms confidence interval. Cost aside, this scheme is simple and effective. The clock error of ordinary computer systems is much larger, however, which makes it very challenging or even impossible for a distributed system to agree on an order this way.

Achieving ideal, strict consistency is very expensive. Only if the system never failed and communication between nodes took no time would the whole system be equivalent to a single machine. Therefore, people choose consistency of different strengths according to actual needs.

Sequential consistency

Linear consistency is stronger than sequential consistency, but sequential consistency was proposed earlier (1979), and linear consistency (1990) was built by strengthening it. So let’s introduce sequential consistency first.

Sequential consistency is also a kind of strong consistency. Its principle is rather obscure; see here.

Example 1: the following diagram satisfies sequential consistency, but does not satisfy linear consistency.

[Figure: operations of processes P1 and P2 on x and y over time]

  • The initial values of x and y are 0
  • Write(x, 4) means writing x = 4; Read(y, 2) means reading y and getting 2

In the diagram, the operations of processes P1 and P2 do not conflict in terms of consistency. From the perspective of these two processes, the order can be taken as follows:

Write(y,2), Read(x,0), Write(x,4), Read(y,2)

This order agrees with the order of reads and writes inside each process, but it is not the order observed under a global clock. From the global clock’s point of view, P2 read variable x after P1 had written it, yet P2 read the old value 0.

Example 2:

Suppose we have a distributed KV system. The following are the operations of four processes and their results:

A dash (-) represents duration: every write or read takes time from the moment the client initiates it to the moment it receives the response. A client that initiates a request earlier does not necessarily get its data earlier; it may get it later because of network delay.

Case 1:

A: --W(x,1)----------------------
B:  --W(x,2)----------------------
C:                      -R(x,1)-   --R(x,2)-
D:                 -R(x,1)-      --R(x,2)--

Case 2:

A: --W(x,1)----------------------
B:  --W(x,2)----------------------
C:                      -R(x,2)-   --R(x,1)-
D:                 -R(x,2)-      --R(x,1)--

Cases 1 and 2 above both satisfy sequential consistency. C and D read in the order 1-2 or in the order 2-1; as long as C and D agree on the order, the history is sequentially consistent. From a global point of view, case 1 looks more natural and case 2 looks “wrong”, because case 2 corresponds to this order:

B W(x,2) -> C R(x,2) -> D R(x,2) -> A W(x,1) -> C R(x,1) -> D R(x,1)

However, consistency does not guarantee correctness, so case 2 is still sequentially consistent. One more case:

Case 3:

A: --W(x,1)----------------------
B:  --W(x,2)----------------------
C:                      -R(x,2)-   --R(x,1)-
D:                 -R(x,1)-      --R(x,2)--

Case 3 is not sequentially consistent, because processes C and D read the two values in different orders.
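The three cases can be checked mechanically. The following is a minimal brute-force sketch, assuming a store of independent registers with initial value 0; the function name `is_sequentially_consistent` and the tuple encoding of operations are illustrative choices made here. It searches for one interleaving that keeps each process’s own order and in which every read returns the most recent write.

```python
from typing import Dict, List, Tuple

# An operation is (kind, key, value), where kind is "W" (write) or "R" (read).
Op = Tuple[str, str, int]


def is_sequentially_consistent(procs: Dict[str, List[Op]], initial: int = 0) -> bool:
    """Return True if some interleaving that preserves each process's program
    order makes every read return the latest written value."""
    names = list(procs)

    def search(pos: Tuple[int, ...], store: Dict[str, int]) -> bool:
        if all(pos[i] == len(procs[n]) for i, n in enumerate(names)):
            return True  # every operation has been placed in the order
        for i, n in enumerate(names):
            if pos[i] == len(procs[n]):
                continue  # this process has no operations left
            kind, key, value = procs[n][pos[i]]
            nxt = pos[:i] + (pos[i] + 1,) + pos[i + 1:]
            if kind == "W":
                if search(nxt, {**store, key: value}):
                    return True
            elif store.get(key, initial) == value:
                if search(nxt, store):
                    return True
        return False

    return search(tuple(0 for _ in names), {})


writes = {"A": [("W", "x", 1)], "B": [("W", "x", 2)]}
case1 = {**writes, "C": [("R", "x", 1), ("R", "x", 2)], "D": [("R", "x", 1), ("R", "x", 2)]}
case2 = {**writes, "C": [("R", "x", 2), ("R", "x", 1)], "D": [("R", "x", 2), ("R", "x", 1)]}
case3 = {**writes, "C": [("R", "x", 2), ("R", "x", 1)], "D": [("R", "x", 1), ("R", "x", 2)]}

results = [is_sequentially_consistent(c) for c in (case1, case2, case3)]
```

The checker ignores wall-clock time entirely, which is exactly what sequential consistency allows. Cases 1 and 2 pass and case 3 fails, because no single order of the two writes can explain both C’s and D’s reads. The search is exponential and only suitable for tiny histories like these.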

Back to case 2: the times at which C and D get their data differ and overlap. It is possible that while one of them has already read the later value, the other is still reading the earlier one, so different clients obtain different data at the same moment. However, this mode is widely used in practice.

For example, if you post two tweets on Twitter, your operations take a certain amount of time to propagate through the layers of caches. Different friends will see your tweets at different times, but every friend will see the two tweets in the same order, never out of order. One friend may already have seen the second tweet while another has only seen the first, but that does not matter: each will eventually see both items in the right order, which is harmless.

But sometimes the order is not satisfied, as in Example 3:

[Figure: timeline of write A0 and reads B0, B1, B2, C0 and C1 on x]

From the timeline, we can see that B0 occurs before A0 and reads x = 0, while B2 occurs after A0 and reads x = 1. The read operations B1, C0 and C1 overlap with the write operation A0 on the time axis, so each of them may read either the old value 0 or the new value 1. Note that C1 occurs after B1 (the two do not overlap on the timeline), yet B1 sees the new value of x and C1 sees the old value. From the user’s point of view, the value of x jumps backwards.

Linear consistency, by contrast, requires that any read returns the latest data, in agreement with the global clock. Compare with Example 1: the following history satisfies both sequential consistency and linear consistency.

![](http://vermouth-blog-image.os…)

Each read operation reads the latest written value of the variable. At the same time, the operation order seen by the two processes is the same as the order under the global clock, namely Write(y,2), Write(x,4), Read(x,4), Read(y,2).

ZooKeeper

One view is that ZooKeeper is finally (eventually) consistent: because of its multiple replicas and the Zab protocol, which only requires a majority to succeed, when one client process writes a new value, another client process is not guaranteed to read that value immediately, but it is guaranteed to read it eventually.

Another view is that ZooKeeper’s Zab protocol is similar to the Paxos protocol and therefore provides strong consistency.

However, neither statement is accurate. The ZooKeeper documentation clearly states that its consistency model is sequential consistency.

Suppose two requests, request1 and then request2, are submitted to ZooKeeper. Although some followers may not see them immediately after the requests are committed (as strong consistency would require), once a follower has synchronized with the leader it is guaranteed to see request1 first and then request2. The two requests never appear out of order, which is sequential consistency.

In fact, ZooKeeper’s consistency is more complex in its implementation: its read operations are sequentially consistent and its write operations are linearly consistent. This statement does not appear in the official ZooKeeper documentation, but it has been discussed in detail on the community mailing list, and the same view is mentioned in the paper “Modular Composition of Coordination Services”.

In summary, ZooKeeper can be understood as follows: taken as a whole (read operations + write operations) it is sequentially consistent, and its write operations are linearizable.

Linear consistency

Linear consistency is also called strong consistency, strict consistency or atomic consistency. It is the strongest consistency model that a program can implement and the one that users of distributed systems most expect. The C in CAP generally refers to it.

Under sequential consistency, processes only care about an order that everyone agrees on; that order does not need to match the global clock. Linear consistency is stricter: it moves from this partial order to a total order.

Its requirements are:

  1. Any read returns the latest value written to the data item.
  2. The operation order seen by all processes in the system agrees with the order under the global clock.

Continue with Example 3 above:

B1 sees the new value of x, while C1 sees the old value. That is, from the user’s point of view the value of x jumps backwards.

In a linearly consistent system, if B1 sees the value of x as 1, then C1 must also see 1. Every operation takes effect at some instant, which corresponds to a point on the timeline. If we connect these points, as the purple line in the figure below does, the line always moves forward along the time axis and never turns back. Any two operations can therefore be compared to decide which happened first and which happened later. For example, B1 occurs before A0 and C1 occurs after A0. In the sequential consistency model above, we could not compare the order of B1 and A0.

[Figure: the same timeline, with a purple line connecting the instants at which each operation takes effect]
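The “jump back” in Example 3 can be detected by a small brute-force check. The sketch below assumes a single register x with initial value 0. The start and end times are hypothetical, chosen only to match the description above (B1 and C1 overlap A0, and C1 starts after B1 ends); they are not taken from the original figure. A history is accepted if some total order respects real time (an operation that finished before another started comes first) and every read returns the latest write.

```python
from itertools import permutations
from typing import List, NamedTuple


class Op(NamedTuple):
    name: str
    kind: str     # "W" for write, "R" for read
    value: int    # value written, or value returned by the read
    start: float  # hypothetical invocation time
    end: float    # hypothetical response time


def is_linearizable(history: List[Op], initial: int = 0) -> bool:
    """Brute force over all total orders; only suitable for tiny histories."""
    for order in permutations(history):
        # Real-time rule: an operation placed later in the order must not
        # have finished before an operation placed earlier even started.
        respects_time = all(
            not (later.end < earlier.start)
            for i, later in enumerate(order)
            for earlier in order[:i]
        )
        if not respects_time:
            continue
        current, legal = initial, True
        for op in order:
            if op.kind == "W":
                current = op.value
            elif op.value != current:
                legal = False
                break
        if legal:
            return True
    return False


def example3(c1_value: int) -> List[Op]:
    return [
        Op("B0", "R", 0, 0, 2),
        Op("A0", "W", 1, 3, 8),
        Op("B1", "R", 1, 4, 5),
        Op("C1", "R", c1_value, 6, 7),
        Op("B2", "R", 1, 9, 10),
    ]


stale_c1 = is_linearizable(example3(0))  # C1 reads the old value after B1 saw 1
fresh_c1 = is_linearizable(example3(1))  # C1 reads the new value
```

With C1 reading 0, the write would have to take effect both before B1 and after C1, which contradicts B1 finishing before C1 starts, so the history is rejected. With C1 reading 1, the order B0, A0, B1, C1, B2 is valid.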

How is the theory of linear consistency reflected in software?

etcd and Raft

As mentioned above, ZooKeeper is linearly consistent for writes and sequentially consistent for reads. In etcd, both reads and writes are linearly consistent; that is, etcd provides the standard strong consistency guarantee.

etcd is based on Raft, and Raft is a consensus algorithm. Although consensus and consistency are subtly related and often discussed together, a consensus algorithm only provides the foundation. To achieve linear consistency, more work has to be done on top of the algorithm, for example in library encapsulation and code implementation. In Raft, for instance, there are two schemes for consistent reads, both of which ensure that the leader processing the read request is still the leader:

  • ReadIndex
  • LeaseRead

There is a lot of Raft-based software, such as etcd, TiDB and SOFAJRaft. All of it implements consistent reads in these two ways.

The leader election architecture of etcd is not described here; you can see this article. Here we explain ReadIndex and LeaseRead, that is, the concrete implementation of linearly consistent reads in etcd.

In the Raft algorithm, a successful write only means that the log entry has been agreed upon (and persisted to disk); it does not guarantee that the state machine has already applied the entry. In most Raft implementations the state machine applies log entries asynchronously, so reading the state machine at that moment may not reflect the latest data and is likely to return stale data.

For these reasons, a relatively simple and general strategy for linearly consistent reads is to record the committed index of the cluster at each read operation, and to read and return the data only once the applied index of the state machine is greater than or equal to that committed index. At that point the state machine has applied every log entry that was committed when the read request was initiated, so its state reflects the state at the time the read was initiated, which meets the requirement of a linearly consistent read. This is the ReadIndex algorithm.

How can we accurately obtain the committed index of the cluster? If the committed index is inaccurate, the ReadIndex algorithm built on it may return stale data.

To ensure the accuracy of the committed index, we need to:

  • Let the leader handle the read request;
  • If the follower receives a read request, forward the request to the leader;
  • Ensure that the current leader is still the leader.

The leader broadcasts a request to the other nodes. If it still receives responses from a majority of nodes, it is still the leader at this moment. This step is very important: without it, the leader may have been deposed because of a network partition or for other reasons, and if the read request were still processed by the stale leader, it could return stale data.

In this way, the committed index obtained from the leader is used as the ReadIndex of this read request.
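The read path described above can be sketched as a toy, single-process model. It is not etcd’s code: the `Leader` class, its method names and the `confirm_leadership` callback (which stands in for the heartbeat round to a majority) are all assumptions made for illustration, and the sketch applies pending entries inline where a real system would wait for the asynchronous apply loop.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Leader:
    """Toy model of a Raft leader that serves reads with ReadIndex."""

    def __init__(self, confirm_leadership: Callable[[], bool]) -> None:
        self.log: List[Tuple[str, int]] = []  # committed (key, value) entries
        self.commit_index = 0                 # number of committed entries
        self.apply_index = 0                  # number of entries applied
        self.state: Dict[str, int] = {}       # the state machine
        # Hypothetical callback standing in for a heartbeat round that a
        # majority of nodes must acknowledge.
        self.confirm_leadership = confirm_leadership

    def commit(self, key: str, value: int) -> None:
        """Record a write as committed; applying it happens later."""
        self.log.append((key, value))
        self.commit_index = len(self.log)

    def apply_next(self) -> None:
        if self.apply_index < self.commit_index:
            key, value = self.log[self.apply_index]
            self.state[key] = value
            self.apply_index += 1

    def read(self, key: str) -> Optional[int]:
        read_index = self.commit_index      # 1. record the committed index
        if not self.confirm_leadership():   # 2. confirm we are still leader
            raise RuntimeError("not the leader, retry on another node")
        while self.apply_index < read_index:
            self.apply_next()               # 3. catch up to read_index
        return self.state.get(key)          # 4. read the state machine


leader = Leader(confirm_leadership=lambda: True)
leader.commit("x", 1)
not_yet_applied = leader.state.get("x")  # state machine has not applied it
value = leader.read("x")                 # ReadIndex waits until it is applied

deposed = Leader(confirm_leadership=lambda: False)
try:
    deposed.read("x")
    rejected = False
except RuntimeError:
    rejected = True
```

Reading the state machine directly right after the commit misses the write, while the ReadIndex path returns it. A leader that cannot confirm its leadership refuses the read instead of answering with possibly stale data.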

Take network partition as an example

[Figure: a five-node cluster A, B, C, D, E split by a network partition into A, B and C, D, E]
As shown in the figure above:

  1. In the initial state, the cluster has five nodes, A, B, C, D and E, and A is the leader.
  2. A network partition splits the cluster into two parts: A and B on one side, C, D and E on the other. A keeps sending heartbeats to the other nodes, but because of the partition, C, D and E do not receive them. By default, A does nothing about the failed heartbeats to its followers (here, network timeouts), because the protocol does not explicitly require the heartbeat to be a two-way exchange in which the leader must receive a follower ACK.
  3. After the partition formed by C, D and E has not received a heartbeat from the leader for some time, an election timeout is triggered and C becomes the leader. The original 5-node cluster is now split into two clusters: the small cluster of A and B, with A as leader, and the large cluster of C, D and E, with C as leader.
  4. Now a client reads and writes. In the Raft algorithm, the client cannot sense a leader change in the cluster (let alone a network partition on the server side). When a client sends a read / write request to the cluster, it usually picks one of the nodes at random. Suppose the client first chooses node C and writes data successfully (the cluster containing C has committed the operation log), and then, for some client-side reason (such as disconnecting and reconnecting), chooses node A for a read. Because A does not know that the other three nodes have formed the majority of the cluster and have written new data, A cannot return accurate data, and the client reads stale data. If the client instead sends a write to node A, the write fails, because A cannot receive write responses from a majority of nodes.
  5. In this situation, the new cluster formed by C, D and E is the majority of the 5-node cluster, and reads and writes should happen there instead of in the small cluster (A and B). If node A could detect that it is no longer the leader of the cluster, it would stop processing read and write requests. Therefore, when the leader processes a read request, we can add a check quorum phase: the leader broadcasts to all nodes in the cluster and processes the read request only if it still receives responses from a majority. Receiving those responses shows that the leader is still the effective leader of the cluster and holds its complete data. Otherwise the read request fails, which forces the client to pick another node for its reads and writes.
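The majority check in step 5 is simple arithmetic. As a minimal sketch, assuming the leader counts itself among the nodes it can reach (the function name `has_quorum` is hypothetical):

```python
from typing import Iterable, Set


def has_quorum(cluster: Set[str], reachable: Iterable[str]) -> bool:
    """True if the reachable nodes (including the leader itself) form a
    strict majority of the full cluster membership."""
    acks = len(set(reachable) & cluster)
    return acks > len(cluster) // 2


cluster = {"A", "B", "C", "D", "E"}
old_leader_ok = has_quorum(cluster, {"A", "B"})       # A can only reach B
new_leader_ok = has_quorum(cluster, {"C", "D", "E"})  # C reaches D and E
```

With five nodes a majority is three, so A (2 of 5) fails the check and must reject the read, while C (3 of 5) passes it.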

In this way, the Raft algorithm guarantees C and P in CAP but not A: when the network is partitioned, not all nodes can respond to requests, and the minority partition cannot provide service, which violates availability. Therefore, Raft is a CP consistency algorithm.

Raft’s methods for ensuring linearly consistent read requests:

  1. The leader treats each read request as a log entry, commits it through log replication, applies it to the state machine, and then reads the data from the state machine and returns it. (one RTT, one disk write)
  2. Use a leader lease to ensure that there is only one leader in the whole cluster. After receiving a read request, the leader records the current commitIndex as the readIndex. Once applyIndex is greater than or equal to readIndex, the data in the state machine can be read and returned. (0 RTT, 0 disk write)
  3. Without a leader lease, the leader ensures that it is the only working leader in the cluster through two points: (1) at the beginning of each term, the newly elected leader may not know the commitIndex of the previous term, so it must first commit a no-op log entry in the new term; (2) after receiving a read request, the leader sends heartbeats to a majority of nodes to confirm its leader identity. After that, the read proceeds as with the leader lease. (one RTT, 0 disk write)
  4. Read from a follower: the follower first asks the leader for the readIndex. After receiving the follower’s request, the leader still has to confirm its leader identity with method 2 or 3, and then returns the current commitIndex as the readIndex. Once the follower has the readIndex, it waits until its local applyIndex is greater than or equal to readIndex, then reads the data from its state machine and returns it. (2 or 1 RTT, 0 disk write)
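Method 2 saves the network round trip by trusting a time-based lease. The following is a rough sketch under explicit assumptions: clock drift between nodes is bounded by `drift_margin`, followers will not elect a new leader within `election_timeout` of acknowledging a heartbeat, and the clock is injected so the example is deterministic. The class and method names are hypothetical and do not come from any particular Raft library.

```python
from typing import Callable


class LeaseLeader:
    """Toy model of the leader-lease check used for 0-RTT reads."""

    def __init__(self, election_timeout: float, clock: Callable[[], float],
                 drift_margin: float = 0.0) -> None:
        self.election_timeout = election_timeout
        self.clock = clock                # injected time source (assumption)
        self.drift_margin = drift_margin  # safety margin for clock drift
        self.lease_expires_at = float("-inf")

    def on_majority_heartbeat_ack(self, sent_at: float) -> None:
        """Extend the lease, measured from when the heartbeat was sent."""
        self.lease_expires_at = sent_at + self.election_timeout - self.drift_margin

    def lease_valid(self) -> bool:
        """While True, reads may skip the heartbeat round."""
        return self.clock() < self.lease_expires_at


now = [0.0]  # fake clock so the example does not depend on real time
leader = LeaseLeader(election_timeout=1.0, clock=lambda: now[0], drift_margin=0.1)
leader.on_majority_heartbeat_ack(sent_at=0.0)  # lease lasts until t = 0.9

now[0] = 0.5
valid_early = leader.lease_valid()  # inside the lease: serve locally
now[0] = 0.95
valid_late = leader.lease_valid()   # lease expired: fall back to method 3
```

The design trade-off is that the lease is only as safe as the clocks: if drift exceeds the margin, a deposed leader could still believe its lease is valid. The heartbeat-based method 3 avoids relying on time, at the cost of one round trip per read.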

Linearizability and serializability

Serializability is a concept from the database field, while linearizability comes from distributed systems and concurrent programming. In this era of distributed SQL, linearizability and serializability naturally often appear together.

  • Serializability: the I in ACID in the database field. The four isolation levels of a database are read uncommitted, read committed (RC), repeatable read (RR) and serializable.

Serializable means that the result of scheduling the operations of concurrent transactions is the same as the result of executing those transactions one by one. The simplest implementation is to queue all transactions and execute them one at a time. This obviously satisfies serializability; the problem is performance. Serializability is thus a concept tied to database transactions: a transaction contains multiple read and write operations, which involve multiple data objects.

  • Linearizability: concerns a single operation on a single data object. It belongs to the C in CAP. After a data item is updated, subsequent reads can see the new value immediately.
  • Strict serializability: satisfies both serializability and linearizability.

Take the simplest example with two transactions T1 and T2. T1 starts first, updates data object o and commits. Then T2 starts, reads data object o and commits. There are two possible schedules:

  1. T1, then T2: satisfies both serializability and linearizability.
  2. T2, then T1: satisfies serializability but not linearizability, because the data updated by T1 earlier cannot be read by T2.

Causal consistency

Causal consistency is a form of weak consistency, because it only constrains the order of events that are causally related.

Without causal consistency, the following can happen:

  • Xiahou Tiezhu posted the status “I lost my ring” in the circle of friends
  • Xiahou Tiezhu commented “I found it” under the same status
  • Zhuge Jianguo commented “great” under the same status
  • A keyboard warrior in the United States saw only “I lost my ring” and “great”, and began to flame Zhuge Jianguo
  • The keyboard warrior then saw “I lost my ring”, “I found it” and “great”, and realized that he had flamed the wrong person

Therefore, many systems adopt causal consistency to avoid this problem. For example, WeChat’s circle of friends uses causal consistency. You can refer to: https://www.useit.com.cn/thre…

Final consistency

“Final consistency” is probably the term you hear most often. It is also a weak consistency, but because it is acceptable to users in most scenarios, it is widely used.

Concept: it does not guarantee that copies of the same data on all nodes are identical at any given moment, but as time passes, the copies on different nodes always converge.

In short, after a period of time, the data on the nodes eventually reaches a consistent state. However, the bar for final consistency is very low. Besides protocols such as gossip that explicitly advertise final consistency, Redis primary / standby replication, MongoDB and even MySQL hot standby can all be regarded as finally consistent. Even if a replica fails and, 100 days later, I manually replay the recorded operation log on it to bring it back into agreement, that still fits the definition of final consistency. Some people say that final consistency is no consistency at all, because nobody knows when “finally” is.

The causal consistency mentioned above can be understood as a variant of final consistency. If process A notifies process B that it has updated a data item, subsequent accesses by process B return the updated value, and a write is guaranteed to supersede the earlier write. Accesses by a process C that has no causal relationship with process A follow the normal final consistency rules.

Final consistency has many branches; the following are its variants:

  • Causal consistency
  • Read your writes consistency
  • Session consistency
  • Monotonic read consistency
  • Monotonic write consistency

The E in the BASE theory discussed later stands for this final (eventual) consistency.

ACID theory

ACID is a set of principles for transaction processing, and its consistency generally refers to the consistency constraints of a database. ACID consistency is entirely about database rules, including constraints, cascades and triggers. These invariants must hold both before a transaction starts and after it ends, so that the integrity of the database is not broken. The C in ACID therefore means that the database is in a consistent state before and after each transaction, which prevents illegal transactions from corrupting it. For example, if the total balance of accounts A and B in a banking system is 100, then no matter how money is transferred between A and B, the sum remains 100.

Here C stands for consistency: transactions must follow the rules and constraints defined in the database, such as constraints, cascades and triggers. Therefore, any data written to the database must be valid, and every completed transaction moves the database from one valid state to another. No transaction can create an invalid data state. Note that this is different from the “consistency” defined in the CAP theorem.
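The bank example can be reproduced with Python’s built-in `sqlite3` module. The table layout, the `CHECK (balance >= 0)` rule and the `transfer` helper are assumptions made for this sketch. The point is that a transaction which would break a declared rule is rolled back as a whole, so the invariant (a total of 100) holds before and after every transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # declared rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 60), ("B", 40)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction; return False if a rule was violated."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def total(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


ok = transfer(conn, "A", "B", 30)        # A: 30, B: 70
rejected = transfer(conn, "A", "B", 50)  # would make A negative: rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits B first and then fails on the debit of A. Because both statements run in one transaction, the credit is undone as well, and the total stays at 100.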

ACID literally spells “acid”, and its chemical counterpart is a “base”, hence BASE. However, before discussing BASE we should first talk about CAP, since BASE is a compromise theory built on CAP.

Cap theory

In cap theory, C is what we often call the consistency in distributed systems. More precisely, it refers to one of the distributed consistency: linear consistency, also known as atomic consistency.

Cap theory is also a misused word. The correct definition of cap can refer to the cap FAQ. Many times we will use cap model to evaluate a distributed system, but this article will tell you the limitations of cap theory, because according to cap theory, many systems include mongodb and zookeeper Neither consistency (linear consistency) nor availability (any working node should be able to handle requests), but this does not mean that they are not excellent systems, but the limitations of cap theorem (without considering processing delay, fault tolerance, etc.).

BASE theory

Because consistency and availability in CAP mean strong consistency and high availability, BASE theory was proposed on top of CAP. BASE stands for basically available, soft state, and eventual consistency. Its core idea is that even when strong consistency cannot be achieved, each application can use an approach suited to its own business characteristics to reach eventual consistency. Eventual consistency is clearly weaker than the linear consistency in CAP. Many distributed systems are built on the “basically available” and “eventually consistent” ideas of BASE, for example asynchronous replication in MySQL / PostgreSQL.
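As a rough illustration of eventual consistency under asynchronous replication, here is a toy in-memory model. It is not how MySQL or PostgreSQL implement replication; the classes `Primary` and `Replica` and the function `replicate` are hypothetical names introduced only for this sketch. The primary acknowledges a write immediately and ships it to the replica later, so a read from the replica can be stale for a while but converges once replication catches up.

```python
from collections import deque


class Primary:
    """Accepts writes and queues them for later shipment (toy model)."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # writes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value             # acknowledged to the client right away
        self.pending.append((key, value))  # replicated asynchronously


class Replica:
    """Serves reads from whatever it has received so far."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Apply every pending write on the replica, in the original order."""
    while primary.pending:
        key, value = primary.pending.popleft()
        replica.data[key] = value
```

Between `write` and `replicate` the two nodes disagree, which is the window that linear consistency forbids and eventual consistency tolerates.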

Difference between ACID consistency and CAP consistency

ACID consistency is about database rules. If the table schema declares that a field value must be unique, a consistent system enforces that uniqueness in all operations. If a row referenced through a foreign key is deleted, the records that reference it should be deleted as well. That is what ACID consistency means.
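Both rules mentioned above, a unique field and a cascading foreign key, can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the `users` and `orders` tables and the helper functions are made up for the example, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a unique column and a cascading foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER REFERENCES users(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
    conn.execute("INSERT INTO orders VALUES (10, 1), (11, 1)")
    conn.commit()
    return conn


def try_insert_user(conn, user_id, email):
    """Return True if the row was inserted, False if a rule rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, email))
        return True
    except sqlite3.IntegrityError:
        return False


def delete_user(conn, user_id):
    """Delete a user; the cascade removes the orders that reference it."""
    with conn:
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))


def count_orders(conn):
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Inserting a second user with the same email is rejected, and deleting user 1 also removes both of that user's orders, so no operation leaves the tables in a state that breaks the declared rules.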

Consistency in CAP theory means that the copies of the same data on all the different servers are the same. This is a logical guarantee, not a physical one: because of the speed of light, replication across servers takes time. The cluster maintains this logical view by preventing clients from seeing data that has not yet been synchronized across nodes.

When ACID is provided across a distributed system, the two concepts are easily confused. Google’s Spanner provides ACID for a distributed system, and its design combines ACID and CAP: two-phase commit (2PC) plus a multi-replica synchronization mechanism (such as Paxos).

ACID / 2PC / 3PC / TCC / Paxos relationship

ACID is a set of principles for transaction processing that specifies atomicity, consistency, isolation, and durability. ACID, CAP, and BASE are all just theories: they describe goals or trade-offs to aim for in an implementation. ACID focuses on transactions, while CAP and BASE are general theories of distributed systems.

Distributed transactions can be implemented with 2PC, 3PC, TCC, and similar protocols. These add a coordinator that negotiates with the participants, and they also carry the idea of eventually reaching agreement.

Paxos and distributed transactions are not at the same level. Paxos solves the problem of consistency among multiple replicas: for example, log synchronization keeps the log on every node consistent and ensures that there is only one primary node. In short, 2PC guarantees the atomicity of a transaction across multiple data shards, while Paxos guarantees the consistency of the same data shard across multiple replicas. The two are therefore complementary rather than interchangeable. Paxos can also address the single point of failure of the 2PC coordinator: when the coordinator fails, a new coordinator is elected and continues to provide service. Paxos resembles 2PC in principle but differs in purpose. etcd also has transaction operations, such as mini-transactions.
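To make the role of 2PC concrete, here is a minimal in-memory sketch of the two phases. It is an assumption-laden toy, not a real protocol implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical, and there is no networking, logging, timeout handling, or coordinator failover (the part Paxos would cover).

```python
class Participant:
    """One data shard taking part in a transaction (hypothetical class)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether local checks will pass
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work could be committed.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit everywhere or abort everywhere."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if every vote was yes, otherwise abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote aborts the transaction on every shard, which is the atomicity across shards described above. Keeping each shard's replicas in agreement is a separate job, left to a protocol such as Paxos.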

References

For Raft, you can also look at MIT 6.824.

  • https://wudaijun.com/2018/09/…
  • https://zhuanlan.zhihu.com/p/…
  • https://www.itcodemonkey.com/…
  • http://zookeeper.apache.org/d…
  • http://comments.gmane.org/gma…
  • https://zhengyinyong.com/post…
  • https://www.sofastack.tech/bl…
  • https://feilengcui008.github….
  • http://codefever.github.io/20…
  • https://www.useit.com.cn/thre…
  • https://blog.csdn.net/chao201…
  • https://lentil1016.cn/consist…
  • https://www.jdon.com/artichec…