Distributed transactions


Distributed transaction solutions

With a single database, a transaction has the four ACID properties. But if one transaction operates on multiple databases, a database transaction can no longer guarantee consistency.

In other words, when data is modified in two databases, the operation on one database may succeed while the operation on the other fails, and a single database transaction cannot roll back both operations.
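A minimal sketch of this failure, using two independent in-memory SQLite databases to stand in for two database servers (the account table, the owner names and the simulated error are hypothetical). Each connection commits or rolls back only its own transaction, so the failure in database B cannot undo the write already committed in database A.

```python
import sqlite3

def make_db(owner, balance):
    """Create an in-memory database standing in for one database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (?, ?)", (owner, balance))
    conn.commit()
    return conn

db_a = make_db("alice", 100)  # hypothetical database A
db_b = make_db("bob", 0)      # hypothetical database B

def transfer(amount):
    # Local transaction on A: the connection context manager commits on success.
    with db_a:
        db_a.execute("UPDATE account SET balance = balance - ? WHERE owner = 'alice'", (amount,))
    # Local transaction on B: the exception makes the context manager roll back B only.
    with db_b:
        db_b.execute("UPDATE account SET balance = balance + ? WHERE owner = 'bob'", (amount,))
        raise RuntimeError("simulated failure in database B")

try:
    transfer(30)
except RuntimeError:
    pass

balance_a = db_a.execute("SELECT balance FROM account").fetchone()[0]
balance_b = db_b.execute("SELECT balance FROM account").fetchone()[0]
print(balance_a, balance_b)  # 70 0: the 30 units debited from A never reached B
```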

Distributed transactions solve the problem of data becoming inconsistent across the nodes that take part in the same transaction. When a transaction calls multiple services or database nodes, either all requests succeed, or they all fail and are rolled back. There are several ways to implement distributed transactions, such as two-phase commit (2PC), three-phase commit (3PC) and TCC (Try-Confirm-Cancel) compensating transactions.

Before looking at 2PC and 3PC, we need to understand the XA protocol. XA is a distributed transaction processing specification published by the X/Open organization. Among MySQL storage engines, only InnoDB currently supports XA transactions.

XA specification

The XA specification builds on the DTP (Distributed Transaction Processing) model, which standardizes the design of distributed transactions.

The DTP model defines three roles: AP, RM and TM. The AP (application program) is where a transaction starts and ends. The RM (resource manager) manages a resource such as a database and its connections. The TM (transaction manager) manages transactions globally, including their lifecycle and the allocation and coordination of resources.


XA standardizes the communication interface between the TM and the RMs, providing a two-way communication channel between a TM and multiple RMs so that the ACID properties can be guaranteed across multiple database resources.
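In MySQL, the RM side of the XA interface is exposed as SQL statements (XA START, XA END, XA PREPARE, XA COMMIT, XA ROLLBACK). The hypothetical helper below only builds the statement sequence a TM would send to one RM for one transaction branch; it does not open a connection, and the xid and the UPDATE statement are illustrative.

```python
def xa_branch_statements(xid, work, commit):
    """Return the MySQL XA statements a TM would issue for one branch on one RM."""
    if "'" in xid:
        raise ValueError("xid must not contain quotes")  # keeps the sketch injection-safe
    statements = [f"XA START '{xid}'"]        # associate the following work with the branch
    statements.extend(work)                   # the branch's ordinary SQL
    statements.append(f"XA END '{xid}'")      # end the branch's work
    statements.append(f"XA PREPARE '{xid}'")  # phase one: the RM prepares (votes)
    # Phase two: the TM's global decision.
    statements.append(f"XA COMMIT '{xid}'" if commit else f"XA ROLLBACK '{xid}'")
    return statements

stmts = xa_branch_statements(
    "order-42", ["UPDATE stock SET qty = qty - 1 WHERE sku = 'A1'"], commit=True
)
for s in stmts:
    print(s)
```

A TM would run such a sequence on every participating RM and send XA COMMIT only after every XA PREPARE has succeeded.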

Two-phase commit (2PC)

A distributed transaction implemented with the XA specification is a two-phase commit transaction. As the name suggests, the transaction is committed in two phases.

The first phase: the application initiates a transaction request to the transaction manager (TM), and the TM sends a prepare request to each participating resource manager (RM). Each RM opens a local database transaction and executes it, but does not commit after execution; instead, it returns a “ready” or “not ready” status to the TM. Once every participating node has returned its status, the protocol enters the second phase.


The second phase: if every RM returns “ready”, the TM sends a commit notification to each RM; each RM commits its local database transaction and returns the commit result to the TM.


In the second phase, if any RM returns “not ready”, the TM sends a rollback notification to all RMs. Each RM then rolls back its local database transaction, releases its resources and returns the result.


The idea of two-phase commit can be summarized as follows: the coordinator sends the transaction operation to the participants; each participant performs the operation and reports the result to the coordinator; based on the feedback from all participants, the coordinator decides whether the participants should commit or abort the operation.
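A minimal in-process sketch of that coordinator logic. Participants are plain objects with a hypothetical `can_commit` flag standing in for the outcome of their local work; there is no networking, persistent log or timeout handling.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    can_commit: bool = True  # hypothetical: whether the local work succeeds
    state: str = "init"      # init -> prepared/aborted -> committed/aborted

    def prepare(self) -> bool:
        # Phase 1: execute the local work and hold locks, but do NOT commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: everyone is ready, so commit globally.
        for p in participants:
            p.commit()
        return True
    # Phase 2: at least one participant is not ready, so roll back globally.
    for p in participants:
        p.rollback()
    return False

ok = two_phase_commit([Participant("db1"), Participant("db2")])
bad = [Participant("db1"), Participant("db2", can_commit=False)]
failed = two_phase_commit(bad)
print(ok, failed, [p.state for p in bad])  # True False ['aborted', 'aborted']
```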

Two-phase commit also has some defects.

First, synchronous blocking. Throughout the process, every RM node is blocked: the TM sends the global commit notification only after all nodes are ready. If the process takes a long time, many RM nodes hold resources for a long time, which hurts the performance of the whole node. Only the coordinator has a timeout mechanism; in the second phase, if an RM never receives the “rollback” or “commit” instruction from the TM, it blocks indefinitely.

Second, data can still become inconsistent. For example, when the global transaction is committed, some nodes may not receive the commit notification because of a network failure. Those nodes do not commit their part of the transaction, so the data becomes inconsistent.

Third, the TM is a single point of failure. If the TM fails, the whole system stalls. This is worst in the commit phase: the RMs keep their transaction resources locked while waiting for a message from the TM, which blocks the whole system.
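The second and third defects can be reproduced with a toy simulation, assuming two hypothetical RMs that have both prepared and a global commit whose message to one RM is dropped by the network. One RM commits, while the other stays “prepared”, still holding its locks and waiting for a TM message that never arrives.

```python
class RM:
    def __init__(self, name):
        self.name = name
        self.state = "prepared"   # phase 1 already succeeded; locks are held

    def on_commit(self):
        self.state = "committed"  # commit and release locks

def broadcast_commit(rms, lost):
    """Send the global commit; messages to RMs named in `lost` are dropped."""
    for rm in rms:
        if rm.name in lost:
            continue              # simulated network failure: this RM hears nothing
        rm.on_commit()

rms = [RM("order_db"), RM("stock_db")]
broadcast_commit(rms, lost={"stock_db"})
states = {rm.name: rm.state for rm in rms}
print(states)  # {'order_db': 'committed', 'stock_db': 'prepared'}
```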

Three-phase commit (3PC)

The three-phase commit protocol consists of the CanCommit, PreCommit and DoCommit phases. Let’s look at each of them.

First, the CanCommit phase.

The coordinator sends a CanCommit request to each participant, asking whether it can commit the transaction, and waits for the responses. A participant that receives the CanCommit request replies “yes” if it can execute the transaction successfully, and “no” otherwise.

Second, the PreCommit phase.

Based on the participants’ responses in the first phase (CanCommit), the coordinator decides whether the PreCommit operation (the pre-commit phase) can proceed.

  1. If all participants reply “yes”, the coordinator sends a PreCommit request to the participants, and the protocol enters the PreCommit phase.

  2. After receiving the PreCommit request, each participant executes the transaction operation and records undo and redo information in its transaction log. If it executes the operation successfully, it returns an ACK response and waits for the final instruction.

  3. If any participant replies “no”, or the coordinator times out waiting for a participant’s response, the transaction is aborted: the coordinator sends an “abort” message to all participants, and each participant aborts the transaction when it receives the message.

    If a participant does not receive the coordinator’s PreCommit message before its timeout, it also aborts the transaction.

The pre-commit phase ensures that all participants are in a consistent state before the final commit phase (DoCommit phase).

Third, the DoCommit phase.

In the DoCommit phase, the transaction is actually committed. Depending on the responses the coordinator received in the PreCommit phase, the protocol either executes the commit or aborts the transaction.

Executing the commit: if the coordinator receives ACK responses from all participants, it sends a DoCommit message to all of them. Each participant that receives the DoCommit message formally commits the transaction, releases all locked resources and sends an ACK response to the coordinator. Once the coordinator has received ACK responses from all participants, the transaction is complete.

Aborting the transaction: if the coordinator does not receive an ACK from every participant, it sends abort requests to all participants. Each participant that receives the abort message uses the undo information recorded in the PreCommit phase to roll back the transaction, releases all locked resources and sends an ACK message to the coordinator. After receiving the ACK messages from the participants, the coordinator ends the aborted transaction.
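A minimal in-process sketch of the three phases from the coordinator’s point of view, assuming hypothetical participants whose CanCommit vote and PreCommit ACK are given as flags. The `undo_log` list only stands in for the undo/redo records; no real rollback or networking happens.

```python
class Participant:
    def __init__(self, name, vote=True, ack=True):
        self.name = name
        self.vote = vote      # answer to CanCommit
        self.ack = ack        # whether the PreCommit work succeeds
        self.state = "init"
        self.undo_log = []    # stands in for undo/redo records

    def can_commit(self):
        # Phase 1: only a question; no transaction work is done yet.
        return self.vote

    def pre_commit(self):
        # Phase 2: execute the work and record undo information, but do not commit.
        self.undo_log.append(f"undo for {self.name}")
        self.state = "precommitted"
        return self.ack

    def do_commit(self):
        # Phase 3: commit for real and release locks.
        self.state = "committed"

    def abort(self):
        # Roll back using the undo records (if any), then release locks.
        self.undo_log.clear()
        self.state = "aborted"

def three_phase_commit(participants):
    votes = [p.can_commit() for p in participants]   # CanCommit
    if not all(votes):
        for p in participants:
            p.abort()
        return "aborted"
    acks = [p.pre_commit() for p in participants]    # PreCommit
    if not all(acks):
        for p in participants:
            p.abort()                                # transaction interruption
        return "aborted"
    for p in participants:
        p.do_commit()                                # DoCommit
    return "committed"

happy = three_phase_commit([Participant("a"), Participant("b")])
group = [Participant("a"), Participant("b", ack=False)]
broken = three_phase_commit(group)
print(happy, broken, [p.state for p in group])  # committed aborted ['aborted', 'aborted']
```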

3PC splits the preparation phase of 2PC into two phases. In the first phase, the coordinator only asks each resource node whether it can execute the transaction. In the second phase, the nodes start executing the transaction only after every node has reported that it can. Finally, the third phase performs the commit or rollback. In the third phase, if an RM does not receive the commit or rollback request from the TM, it commits the transaction by default after a timeout.

Therefore, 3PC uses its timeout mechanism to avoid the long-term blocking caused by a failed TM. However, it still cannot solve the problem of some nodes not being notified because of a network failure when the global transaction is finally decided. This is especially harmful for a rollback notification: a node that misses it times out and commits by default, leaving the data inconsistent.
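The participant-side timeout rule and the inconsistency it can cause fit in a few lines. This sketch assumes hypothetical state names and a scenario in which both participants have pre-committed, the coordinator decides to abort, and the abort message to participant B is lost.

```python
def on_timeout(state):
    """What a 3PC participant does when the next coordinator message never arrives."""
    if state in ("init", "voted_yes"):
        return "aborted"      # PreCommit never arrived: abort
    if state == "precommitted":
        return "committed"    # DoCommit/abort never arrived: commit by default
    return state              # already committed or aborted

coordinator_decision = "aborted"        # e.g. an ACK timed out in the PreCommit phase
final_a = coordinator_decision          # A received the abort message
final_b = on_timeout("precommitted")    # B's abort was lost, so B timed out
print(final_a, final_b)  # aborted committed: the participants disagree
```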

Compensating transactions (TCC)

Because of problems such as blocking, the XA-based commit protocols described above have noticeably low performance and low throughput, which makes it hard for them to meet a system’s concurrency requirements.

Besides the performance problems, JTA (the Java Transaction API, which builds on XA) can only handle distributed transactions that span multiple data sources within the same service. In a microservice architecture, a single transaction may span data sources that belong to different services, each committing its own database operations.

TCC is a distributed transaction solution that addresses these problems. It implements flexible distributed transactions based on eventual consistency. Unlike the two-phase commit defined by the XA specification, TCC implements a two-phase commit at the service layer.

TCC consists of three phases: Try, Confirm and Cancel.


  • Try phase: try to execute the business logic by calling the Try method of each service, mainly to reserve resources;
  • Confirm phase: once every Try method has succeeded, the TM calls the Confirm method of each service. This is the commit phase;
  • Cancel phase: if any Try method fails (for example, resources cannot be reserved or the code throws an exception), the TM calls the Cancel method of each service to roll back the global transaction and cancel the business operations.
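A minimal sketch of the three phases, assuming a hypothetical `Reservable` participant (stock or an account balance) where Try moves an amount from “available” to “frozen”, Confirm consumes the frozen amount and Cancel returns it. In this sketch the TM cancels only the participants whose Try actually succeeded, since the others hold no reservation.

```python
class Reservable:
    """Hypothetical TCC participant, e.g. stock or an account balance."""

    def __init__(self, name, available):
        self.name = name
        self.available = available
        self.frozen = 0

    def try_(self, amount):
        # Try: check and reserve resources without consuming them.
        if self.available < amount:
            raise ValueError(f"{self.name}: insufficient resources")
        self.available -= amount
        self.frozen += amount

    def confirm(self, amount):
        # Confirm: consume the reservation made in Try.
        self.frozen -= amount

    def cancel(self, amount):
        # Cancel: release the reservation made in Try.
        self.frozen -= amount
        self.available += amount

def run_tcc(steps):
    """steps: list of (participant, amount). Returns True if the global transaction commits."""
    tried = []
    for participant, amount in steps:
        try:
            participant.try_(amount)
        except ValueError:
            # Cancel phase: undo only the reservations that were made.
            for p, a in tried:
                p.cancel(a)
            return False
        tried.append((participant, amount))
    for p, a in tried:  # Confirm phase
        p.confirm(a)
    return True

stock = Reservable("stock", 5)
wallet = Reservable("wallet", 100)
first = run_tcc([(stock, 2), (wallet, 50)])   # both Try calls succeed
second = run_tcc([(stock, 1), (wallet, 80)])  # wallet Try fails, stock is cancelled
print(first, second, stock.available, wallet.available)  # True False 3 50
```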

The flow above only describes how the outcome of the Try phase leads to a commit or a rollback. You may wonder how TCC handles an exception in the Confirm or Cancel phase. In that case, TCC keeps retrying the failed Confirm or Cancel method until it succeeds.

However, TCC compensating transactions also have an obvious disadvantage: they are very intrusive to the business code.

First, the business design must provide for reserving resources. Second, a lot of business code must be written for the Try, Confirm and Cancel methods. Finally, every method must be idempotent, because failed Confirm and Cancel calls are retried. Such transactions are expensive to implement and maintain, but in general this is still one of the most commonly used distributed transaction solutions.
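A sketch of why retries require idempotency, assuming a hypothetical participant whose Confirm applies its change but whose reply is then lost on the network a configurable number of times. The participant records confirmed transaction IDs, so repeated Confirm calls apply the deduction only once; the retry loop is capped here, whereas the text describes retrying until success.

```python
class FlakyAccount:
    """Hypothetical participant whose Confirm reply can be lost after the work is done."""

    def __init__(self, frozen, lost_replies):
        self.frozen = frozen
        self.spent = 0
        self.confirmed = set()            # idempotency record: transaction IDs already confirmed
        self.lost_replies = lost_replies

    def confirm(self, tx_id, amount):
        if tx_id not in self.confirmed:   # apply the effect at most once per transaction
            self.frozen -= amount
            self.spent += amount
            self.confirmed.add(tx_id)
        if self.lost_replies > 0:         # simulated network failure after the work is done
            self.lost_replies -= 1
            raise ConnectionError("confirm reply lost")

def confirm_with_retry(participant, tx_id, amount, max_attempts=10):
    """Retry Confirm until it succeeds; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            participant.confirm(tx_id, amount)
            return attempt
        except ConnectionError:
            continue  # a real TM would back off and persist its progress
    raise RuntimeError("Confirm still failing, manual intervention needed")

account = FlakyAccount(frozen=50, lost_replies=2)
attempts = confirm_with_retry(account, "tx-1", 50)
print(attempts, account.frozen, account.spent)  # 3 0 50
```

Without the `confirmed` check, the three attempts would have deducted 150 instead of 50.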

Reliable messaging for eventual consistency

2PC and 3PC both implement distributed transactions in a centralized way and share two disadvantages: synchronous execution with poor performance, and possible data inconsistency. To address both problems, distributed messaging can be used to achieve eventual consistency of transactions.

In eBay’s distributed system architecture, the core idea its architects used to solve the consistency problem is to execute the distributed parts of a transaction asynchronously through messages or logs. The messages or logs can be stored in local files, databases or message queues, and retried according to business rules.

In the eventual-consistency scheme based on distributed messaging, message middleware (here, a message queue, MQ) is introduced to deliver messages between applications. In practice, Alibaba uses RocketMQ’s transaction messages for this.

Atomicity of the local transaction and message sending

Atomicity of the local transaction and message sending means: if the transaction initiator’s local transaction succeeds, the message must be sent; otherwise, the message must be discarded. In other words, the local transaction and the message sending either both succeed or both fail.

Achieving this atomicity is the key problem of the reliable-message eventual-consistency scheme.

The first scheme: first send the message, then operate the database

    begin transaction;
        send message;
        update database;
    commit transaction;

In this case, consistency between the database operation and the message cannot be guaranteed, because the message may be sent successfully while the database operation fails.

The second scheme: first operate the database, then send the message

    begin transaction;
        update database;
        send message;
    commit transaction;

If sending the message fails and throws an exception, the database transaction can be rolled back.

However, if sending the message throws a timeout exception, the exception rolls back the database transaction even though the message may actually have been delivered, which again leaves the data inconsistent.
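A sketch of the timeout case in the second scheme, assuming a hypothetical `FakeBroker` that records the message as delivered and then reports a timeout because its acknowledgment is lost. The SQLite connection’s context manager rolls back the order insert when the exception propagates, so the consumer receives a message about an order that does not exist.

```python
import sqlite3

class FakeBroker:
    """Hypothetical MQ client: the message arrives, but the sender may still see a timeout."""

    def __init__(self, timeout_after_delivery=False):
        self.delivered = []
        self.timeout_after_delivery = timeout_after_delivery

    def send(self, msg):
        self.delivered.append(msg)                    # the message reaches the broker...
        if self.timeout_after_delivery:
            raise TimeoutError("no ack from broker")  # ...but the ack is lost

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

def create_order_then_send(order_id, broker):
    # Second scheme: write the database first, send the message inside the same transaction.
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        broker.send({"order_id": order_id})

broker = FakeBroker(timeout_after_delivery=True)
try:
    create_order_then_send(1, broker)
except TimeoutError:
    pass

rows = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rows, len(broker.delivered))  # 0 1: message sent, order rolled back
```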


Local message table scheme

The local message table scheme was originally proposed by eBay. Its core idea is to keep the business data and the message consistent through a local transaction, then send the message to the message middleware with a scheduled task, and delete the message once its delivery has been confirmed.

The purpose of “sending a message” is to notify another system or module to update its data. The “transaction” in a message queue mainly solves data consistency between message producers and message consumers.

Take the familiar case of e-commerce. When shopping in an e-commerce app, users typically add goods to the shopping cart, place one order for several items, and finally pay to complete the purchase. Then they can happily wait for the goods to arrive.

This process contains a message queue step: after the order system creates an order, it sends a message to the shopping cart system to remove the ordered items from the cart. Because removing items from the cart is not a required step in the main order-and-payment flow, it is more reasonable to clean up the cart asynchronously through a message queue.


In this example, two microservices interact: the order service, which creates orders, and the shopping cart service, which cleans up shopping carts.

The interaction process is as follows:

  1. Create order

    In a local transaction, the order service creates the order and inserts a “clean up shopping cart” message log record. (The order table and the message table are kept consistent by the local transaction.)

    Here’s the pseudocode:

    begin transaction;
        insert order;
        insert “clean up shopping cart” message log;
    commit transaction;

    Here, storing the order and storing the cart-cleanup message log happen in the same local transaction, so the two operations are atomic.

  2. Scheduled task scans the log

    How do we make sure the message reaches the message queue? After step 1, the message is already in the message log table. An independent thread can periodically scan the message log table and send each message to the message middleware. Once the middleware reports that a message was sent successfully, the message log record is deleted; otherwise, the next run of the scheduled task retries it.

  3. Consume the message

    How do we make sure the consumer consumes the message? We can use MQ’s ack (message acknowledgment) mechanism. Consumers listen to MQ; after a consumer receives a message and finishes processing it, it sends an ack to MQ, meaning the message was consumed normally, and MQ stops pushing that message. Otherwise, MQ keeps redelivering the message to the consumer.

    When the shopping cart service receives the “clean up shopping cart” message, it cleans up the cart and acks the message to the middleware once cleanup succeeds; otherwise, the middleware redelivers the message. Because the message can be delivered more than once, the shopping cart service’s “clean up shopping cart” function must be idempotent.
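The three steps can be sketched with SQLite standing in for the order database. The table names, the JSON payload and the `publish` callback are hypothetical; to keep the sketch self-contained, a successful publish hands the message straight to the consumer instead of going through a real MQ.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE msg_log (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT);
""")

def create_order(order_id, user_id):
    # Step 1: the order row and the message-log row commit or roll back together.
    payload = json.dumps({"order_id": order_id, "user_id": user_id})
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, user_id))
        db.execute("INSERT INTO msg_log (payload) VALUES (?)", (payload,))

def relay_once(publish):
    # Step 2: one run of the scheduled task. A row is deleted only after publish succeeds.
    rows = db.execute("SELECT id, payload FROM msg_log ORDER BY id").fetchall()
    for msg_id, payload in rows:
        try:
            publish(json.loads(payload))
        except ConnectionError:
            return  # keep the row; the next run retries it
        with db:
            db.execute("DELETE FROM msg_log WHERE id = ?", (msg_id,))

carts = {7: ["sku-1", "sku-2"]}
handled = set()  # Step 3: consumer-side record of processed order IDs

def clean_cart(msg):
    # Idempotent consumer: a redelivered message is ignored.
    if msg["order_id"] in handled:
        return
    carts[msg["user_id"]] = []
    handled.add(msg["order_id"])

def mq_down(msg):
    raise ConnectionError("message queue unavailable")

create_order(1, 7)
relay_once(mq_down)     # publish fails: the log row stays
pending = db.execute("SELECT COUNT(*) FROM msg_log").fetchone()[0]
relay_once(clean_cart)  # next run succeeds
clean_cart({"order_id": 1, "user_id": 7})  # simulated redelivery is a no-op
remaining = db.execute("SELECT COUNT(*) FROM msg_log").fetchone()[0]
print(pending, remaining, carts[7])  # 1 0 []
```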

Distributed transactions with message queues

Transaction messages require the message queue to provide the corresponding features; both Kafka and RocketMQ provide transaction-related functionality.

Back to the order and shopping cart example, let’s see how to implement a distributed transaction with a message queue.


First, the order system opens a transaction on the message queue. Then it sends a “half message” to the message server. A half message does not mean the message content is incomplete; it contains the full content. The only difference from an ordinary message is that a half message is invisible to consumers until the transaction is committed.

After the half message is sent successfully, the order system executes its local transaction: it creates an order record in the order database and commits the database transaction. It then commits or rolls back the transaction message according to the result of the local transaction. If the order is created successfully, the transaction message is committed, and the shopping cart system can consume it and continue with the subsequent process. If order creation fails, the transaction message is rolled back and the shopping cart system never receives it. In this way, the consistency requirement of “either all succeed or all fail” is basically met.
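A toy, in-process model of the half-message flow, assuming a hypothetical `Broker` class and a set standing in for the order database; it is not the Kafka or RocketMQ client API. A half message stays invisible until the producer commits it after its local transaction, and is discarded on rollback.

```python
class Broker:
    """Toy broker: half messages are stored but stay invisible until committed."""

    def __init__(self):
        self.half = {}     # transaction ID -> half message
        self.visible = []  # messages the shopping cart service can consume

    def send_half(self, txn_id, msg):
        self.half[txn_id] = msg

    def end_transaction(self, txn_id, commit):
        msg = self.half.pop(txn_id)
        if commit:
            self.visible.append(msg)  # now deliverable to consumers
        # on rollback the half message is discarded

orders = set()  # stands in for the order database

def place_order(broker, txn_id, order_id, local_fails=False):
    broker.send_half(txn_id, {"order_id": order_id})  # 1. send the half message
    try:
        if local_fails:
            raise RuntimeError("insert into order table failed")  # simulated failure
        orders.add(order_id)                          # 2. local transaction
    except RuntimeError:
        broker.end_transaction(txn_id, commit=False)  # 3b. roll back the message
        return False
    broker.end_transaction(txn_id, commit=True)       # 3a. commit the message
    return True

broker = Broker()
place_order(broker, "t1", 100)
place_order(broker, "t2", 101, local_fails=True)
print(broker.visible, broker.half, orders)  # [{'order_id': 100}] {} {100}
```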

A careful reader may have noticed an unsolved problem in this flow: what if committing the transaction message, the step after the local transaction, fails? Kafka and RocketMQ solve this problem differently.

Kafka’s solution is relatively simple and crude: it throws an exception and leaves handling to the user. We can retry the commit in business code until it succeeds, or compensate by deleting the order created earlier. RocketMQ takes a different approach.

RocketMQ’s transaction implementation adds a transaction check-back mechanism to handle failed commits of transaction messages. If a network error occurs when the producer (the order system) commits or rolls back the transaction message, and the RocketMQ broker does not receive the commit or rollback request, the broker periodically asks the producer for the status of the corresponding local transaction, and commits or rolls back the transaction message based on the result.

To support this check-back mechanism, our business code must implement an interface for checking the local transaction status, which tells RocketMQ whether the local transaction succeeded or failed.

In our example, the check-back logic is simple: query the order database for the order ID carried in the message. If the order exists, return success; otherwise, return failure. RocketMQ then automatically commits or rolls back the transaction message according to the check-back result.

This check-back implementation does not depend on the node that sent the message, that is, on any data held by a particular order service instance. So even if the order service node that sent the transaction message goes down, RocketMQ can still perform the check-back through other order service nodes, which preserves the integrity of the transaction.
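A toy model of the check-back mechanism, assuming a hypothetical broker and a shared order store that every order-service node can read. The checker loosely mirrors the role of `checkLocalTransaction` in RocketMQ’s Java `TransactionListener`, but everything here is a self-contained simulation, not the RocketMQ API.

```python
from enum import Enum

class LocalTxState(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"
    UNKNOWN = "unknown"  # not produced by this checker; would mean "ask again later"

order_db = {100}  # shared order store that every order-service node can read

def check_local_transaction(msg):
    """Check-back handler: any order-service node can answer by looking up the order ID."""
    if msg["order_id"] in order_db:
        return LocalTxState.COMMIT
    return LocalTxState.ROLLBACK

class Broker:
    def __init__(self, checker):
        self.half = {}          # transaction ID -> half message awaiting commit/rollback
        self.visible = []       # messages consumers can read
        self.checker = checker  # check-back callback registered by the producer side

    def send_half(self, txn_id, msg):
        self.half[txn_id] = msg

    def check_back_pending(self):
        # Periodic task: resolve half messages whose commit/rollback never arrived.
        for txn_id, msg in list(self.half.items()):
            state = self.checker(msg)
            if state is LocalTxState.COMMIT:
                self.visible.append(msg)
                del self.half[txn_id]
            elif state is LocalTxState.ROLLBACK:
                del self.half[txn_id]
            # UNKNOWN: keep the half message and check again on the next run

broker = Broker(check_local_transaction)
broker.send_half("t1", {"order_id": 100})  # order 100 was created
broker.send_half("t2", {"order_id": 101})  # order 101 was not
# Assume both commit/rollback requests were lost in the network.
broker.check_back_pending()
print(broker.visible, broker.half)  # [{'order_id': 100}] {}
```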

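Combining the general transaction-message flow with RocketMQ’s check-back mechanism gives the complete flow for implementing a distributed transaction with RocketMQ transaction messages: send a half message, execute the local transaction, commit or roll back the message, and, if that commit or rollback is lost, let the broker check back the local transaction status.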


Rigid transactions and flexible transactions

A rigid transaction follows the ACID principles and provides strong consistency; database transactions are an example.

A flexible transaction uses different methods to reach eventual consistency depending on the business scenario. That is, we make trade-offs based on the characteristics of the business and tolerate data inconsistency for a certain period of time.

In summary, unlike rigid transactions, flexible transactions allow data to be inconsistent for a period of time but require eventual consistency. The eventual consistency of flexible transactions follows the BASE theory.

Dan Pritchett, an eBay engineer, proposed BASE as a design approach for distributed storage systems. BASE stands for Basically Available, Soft state and Eventually consistent.

  • Basically available: when the distributed system fails, it may give up the availability of some functions to keep its core functions available. For example, during the 618 shopping festival, some e-commerce platforms degrade non-core features.
  • Soft state: in flexible transactions, the system may be in an intermediate state that does not affect its overall availability. For example, with read-write splitting, there is a delay before writes to the primary database are replicated to the read replica; the data during that delay is in a soft state.
  • Eventually consistent: during a transaction, data may be inconsistent because of synchronization delays, but in the final state all data is consistent.

To support large-scale distributed systems, BASE gains high availability by sacrificing strong consistency and guaranteeing only eventual consistency, which relaxes the ACID principles. ACID and BASE are different outcomes of the trade-off between consistency and availability, but both guarantee data durability. ACID chooses strong consistency at the expense of availability. BASE, by contrast, preserves availability and allows data to be inconsistent for a period of time before it reaches a consistent state; that is, it gives up some consistency in favor of eventual consistency.

Two-phase and three-phase commit follow the ACID principles, while the reliable-message eventual-consistency scheme follows the BASE theory.


This work is licensed under a CC license. Reprints must credit the author and link to this article.