14,000 words on distributed transaction principles: master all of them and you will not be afraid of being asked in an interview

Time:2021-7-13

Preface

From the CPU to memory, disk, the operating system, and the network, unreliable factors exist at every level of a computer system. Engineers and scientists use various software and hardware methods to counter this unreliability and ensure that data and instructions are processed correctly. In networking there is TCP, the reliable transport protocol; in storage there are the RAID 5 and RAID 6 algorithms; and in databases there is the transaction mechanism based on the ARIES algorithm.

This article first introduces the ACID properties of single-database transactions, then points out the difficulties of operating on multiple data sources in a distributed scenario, which leads to the distributed transaction solutions commonly used in distributed systems. These solutions let business code keep ACID properties when operating on multiple data sources, just as it does with a single data source. The article closes with the implementation of global transactions in the AT mode of Seata, a mature distributed transaction framework in the industry.

1、 Single data source transaction & multi data source transaction

If, within a business flow, an application connects to and queries only one specific database through the connection driver and data source interface, it can rely on the transaction mechanism provided by that database (if the database supports transactions) to make its operations on the records reliable. Reliability here has four semantics:

  • Atomicity (A)
  • Consistency (C)
  • Isolation (I)
  • Durability (D)

The author will not explain these four semantics here. Understanding single-data-source transactions and their ACID properties is a prerequisite for reading this article. Implementing transactions inside a single database is itself a complex and delicate process. For example, MySQL's InnoDB engine implements them with the undo log, the redo log, and the ARIES algorithm. That is a large topic beyond the scope of this article; interested readers can study it on their own.

A single data source transaction can also be called a stand-alone transaction or a local transaction.

In a distributed scenario, a system is composed of multiple subsystems, each with its own data source. The subsystems call one another to build more complex services. In the currently popular microservice architecture, each subsystem is called a microservice, and each microservice maintains its own database to stay independent.

For example, an e-commerce system may consist of a shopping microservice, an inventory microservice, an order microservice, and so on. The shopping microservice implements the shopping business by calling the inventory microservice and the order microservice. When a user asks the shopping microservice to place an order, it calls the inventory microservice to deduct the stock of the corresponding goods and calls the order microservice to insert an order record. (To keep the later description of distributed transaction solutions simple, this is the simplest possible microservice partition and shopping flow; subsequent payment, logistics, and other business are not considered.) The e-commerce system model is shown in the following figure:

[Figure: e-commerce system model]

In the user shopping scenario, the business of the shopping service involves two databases: the inventory database (repo_db) and the order database (order_db). In other words, the shopping business is composed of calls to multiple data sources. As a consumer-facing system, an e-commerce system must make the shopping business highly reliable, and reliability here again carries the four ACID semantics.

However, a database's local transaction mechanism only applies to its own query operations (query is used in the broad sense here, covering insert, delete, update, and so on) and cannot affect the query operations of other databases. Therefore, the local transaction mechanism provided by a database cannot by itself guarantee the reliability of a global operation across multiple data sources.

This is why distributed transaction mechanisms for multi-data-source operations emerged.

A distributed transaction can also be called a global transaction.

2、 Common distributed transaction solutions

2.1 Distributed transaction model

To describe distributed transactions, the following terms are often used:

  • Transaction participant: for example, each database is a transaction participant
  • Transaction coordinator: a service program that accesses multiple data sources. For example, shopping service is the transaction coordinator
  • Resource Manager (RM): usually synonymous with transaction participants
  • Transaction manager (TM): usually synonymous with transaction coordinator

In the distributed transaction model, one TM manages multiple RMs, that is, one service program accesses multiple data sources. The TM is the global transaction manager: it coordinates the progress of the local transactions on all sides so that they commit or roll back together, ultimately achieving global ACID properties.

2.2 The Two Generals' Problem and idempotency

The Two Generals' Problem is a classic problem in networking that illustrates how subtle and complex protocol design is in computer networks. A simplified version is given here:

A white army is besieged in a valley, with a blue army on each side. The white army trapped in the valley outnumbers either blue army alone but is smaller than the two blue armies combined. If one blue army attacks alone, it will certainly lose; if both attack at the same time, they will win. The blue commander-in-chief is on the left side of the valley and wants the two blue armies to attack simultaneously, so he must send an order to the blue army on the right side stating the exact time of the attack. The only communication channel is to send a soldier across the valley where the white army is, and the soldier may be captured while crossing.

[Figure: three pictures of the Two Generals' Problem (top: the messenger returns; middle: the messenger is captured before delivering the order; bottom: the messenger is captured on the way back)]

The commander-in-chief can be confident of victory only after the messenger returns successfully (top picture). Now the question is: if the messenger does not come back, can the commander of the left blue army decide to attack at the time stated in the order?

The answer is uncertain. A messenger who does not come back may have met one of two fates:

1) He was captured before the order was delivered (middle picture). In this case the blue army on the right does not know when to attack;

2) He delivered the order but was captured on the way back (bottom picture). In this case the right blue army knows when to attack, but the left blue army does not know whether the right blue army knows the attack time.

Similar problems are common in computer networks. For example, a sender sends an HTTP request to a receiver, or a MySQL client sends an INSERT statement to the MySQL server, and then the call times out without a response. Did the write on the server succeed or fail? The answer is uncertain:

1) The request may never have reached the server because of a network failure, so the write failed;

2) The server may have received the request and written successfully, but gone down before sending a response to the client;

3) The server may have received the request, written successfully, and sent a response, but the response never reached the client because of a network failure.

In every one of these scenarios, the client sees the same result: its request gets no response. To make sure the server writes the data successfully, the client can only resend the request until it receives a response from the server.

This kind of problem is called the network Two Generals' Problem.

Because of the network Two Generals' Problem, a sender usually resends a message until it receives an acknowledgment from the receiver, and only then considers the message delivered. This often causes the same message to be delivered more than once. For example, when the order module in an e-commerce system calls the payment module to deduct money, a network failure may trigger the Two Generals' Problem and cause the deduction request to be sent repeatedly; deducting the money repeatedly is obviously unacceptable. Therefore, no matter how many times a deduction request is sent within one transaction, the receiver must perform the deduction exactly once. This guarantee is called idempotence of the receiver.
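The retry-plus-idempotence idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: `PaymentService`, `send_with_retry`, and the `lost` iterator (one boolean per network hop, where True means the hop drops the message) are hypothetical names invented for this example and do not belong to any real payment API.

```python
class PaymentService:
    """Receiver: applies each deduction request at most once."""

    def __init__(self, balance):
        self.balance = balance
        self.processed = set()  # request IDs already applied (a de-duplication table)

    def deduct(self, request_id, amount):
        if request_id in self.processed:
            return "ack"  # duplicate request: acknowledge again, do not deduct again
        self.balance -= amount
        self.processed.add(request_id)
        return "ack"


def send_with_retry(service, request_id, amount, lost):
    """Sender: resend until an ack arrives.

    `lost` yields one boolean per hop; True means that hop drops the message.
    Once the iterator is exhausted, the network is treated as reliable.
    """
    attempts = 0
    while True:
        attempts += 1
        if next(lost, False):
            continue  # the request was lost: the receiver never saw it
        service.deduct(request_id, amount)
        if next(lost, False):
            continue  # the ack was lost: the receiver DID process the request
        return attempts
```

With `lost = iter([True, False, True])`, the first request is dropped, the second is processed but its ack is dropped, and the third succeeds. The receiver sees the request twice, yet the balance is reduced only once because the second arrival is recognized by its request ID.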

2.3 Two-phase commit (2PC) and three-phase commit (3PC)

2PC is a simple model for implementing distributed transactions. It has two phases:

1) Prepare phase: the transaction coordinator sends a request to every transaction participant: "I am going to execute a global transaction. The resources involved in this transaction are distributed across your data sources. They are… Prepare your own resources (that is, execute your local transaction up to the point just before commit)." Each participant replies to the coordinator with yes (it is ready to commit the global transaction) or no (it cannot obtain the local resources required by the global transaction because they are locked by other local transactions), or it times out.

2) Commit phase: if all participants reply yes, the coordinator sends a commit request to all participants; each participant commits its local transaction and sends an ACK to the coordinator. If any participant replies no or times out, the coordinator sends a rollback request to all participants; each participant rolls back its local transaction and sends an ACK to the coordinator.

The flow chart of 2PC is as follows:

[Figure: 2PC flow chart]

As the figure above shows, to implement 2PC every participant needs to implement three interfaces:

  • Prepare(): the TM calls this interface to ask whether each local transaction is ready
  • Commit(): the TM calls this interface to ask each local transaction to commit
  • Rollback(): the TM calls this interface to ask each local transaction to roll back

These three interfaces can be understood, loosely rather than strictly, as the XA protocol. XA is a distributed transaction processing standard proposed by X/Open. MySQL, Oracle, and DB2 all implement the XA protocol, so they can be used to implement the 2PC transaction model.
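As a minimal sketch of the coordinator logic described in this section, the following in-memory model shows how the votes from the prepare phase decide between commit and rollback. The `Participant` class and `two_phase_commit` function are hypothetical illustrations; they are not the XA API, and real participants would be databases reached over the network.

```python
class Participant:
    """A toy resource manager exposing the three 2PC interfaces."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self):
        # Execute the local transaction up to the point just before commit.
        if not self.can_prepare:
            return False  # vote "no", e.g. the needed resources are locked
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(False)  # an error or timeout counts as "no"

    # Phase 2: commit only if every vote was "yes", otherwise roll back all.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled_back"
```

The sketch deliberately leaves out failure handling in phase two, which is exactly where the problems discussed next arise.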

2PC is easy to understand, but it has the following problems:

1) Poor performance. In the prepare phase, the coordinator must wait for every participant to reply before entering phase two. During this time the relevant resources on each participant are exclusively locked, and other local transactions on the participant that need these resources can only wait. This synchronous blocking reduces the local transaction concurrency of each participant;

2) If the coordinator goes down after the prepare phase completes, no participant receives a commit or rollback instruction, and all participants are left not knowing what to do;

3) In the commit phase, the coordinator sends a commit instruction to all participants. If a participant does not return an ACK, the coordinator cannot tell what happened inside that participant (it may have failed before committing, or it may have received the instruction and committed successfully while the returned ACK was lost due to a network failure), so it cannot decide whether to roll back all participants in the next step.

3PC appeared after 2PC. It turns the two-phase process into three phases: an inquiry phase, a prepare phase, and a commit-or-rollback phase, which are not detailed here. 3PC uses a timeout mechanism to address the synchronous blocking problem of 2PC, prevents resources from being locked forever, and further improves the reliability of the whole transaction process. However, 3PC still cannot fully handle similar downtime problems; it only makes data inconsistency across the data sources less likely.

Beyond its performance and reliability problems, 2PC is also limited in where it can be applied, because it requires the participants to implement the XA protocol. Databases that implement XA can act as participants in the 2PC process, but when multiple system services call each other through API interfaces, they do not follow the XA protocol, and 2PC does not apply. So 2PC is rarely used in distributed application scenarios.

Therefore, 2PC cannot be used in the e-commerce scenario described above, because the shopping service accesses repo_db and order_db only indirectly, by calling the repo service and the order service through RPC or REST interfaces. It could be used only if the shopping service configured repo_db and order_db directly as its own databases.

2.4 TCC scheme

The e-commerce microservice model used to describe the TCC solution is shown in the figure below. In this model, the shopping service is the transaction coordinator, and the repo service and the order service are the transaction participants.

[Figure: e-commerce microservice model for TCC, with the shopping service as coordinator and the repo service and order service as participants]

As mentioned above, 2PC requires the participants to implement the XA protocol and is usually used to solve transaction problems across multiple databases. When multiple system services call each other through API interfaces, they do not follow the XA protocol, and 2PC does not apply. Modern enterprises mostly adopt distributed microservices, so the more common need is to solve distributed transactions across multiple microservices.

TCC is a solution to distributed transactions across multiple microservices. TCC stands for Try, Confirm, and Cancel. In essence it is 2PC at the application level, and it is likewise divided into two phases:

1) Phase one: prepare. The coordinator calls the Try interface provided by each microservice to lock the resources involved in the global transaction. If locking succeeds, the Try interface returns yes to the coordinator.

2) Phase two: commit. If the Try interfaces of all services returned yes in phase one, the coordinator calls the Confirm interface of every service, and each service commits its transaction. If the Try interface of any service returned no or timed out in phase one, the coordinator calls the Cancel interface of every service.

The process of TCC is as follows:

[Figure: TCC flow chart]

Here is a key question: since TCC is 2PC at the service level, how does it overcome the inability of 2PC to cope with downtime? The answer is to retry, again and again. Because the Try operation locks all the resources involved in the global transaction and guarantees that every precondition of the business operation is met, a failure in either the Confirm phase or the Cancel phase can be retried until Confirm or Cancel succeeds (success means that every service has returned an ACK for Confirm or Cancel).

There is another key issue. While Confirm and Cancel are being retried (remember the network Two Generals' Problem), a Confirm or Cancel may be executed more than once. The Confirm and Cancel operations must therefore be idempotent, that is, within one global transaction each participant performs Confirm or Cancel only once. There are many ways to make them idempotent. For example, each participant can maintain a de-duplication table (implemented with a database table or an in-memory KV component) that records whether each global transaction (identified by its global transaction ID, the XID) has already performed Confirm or Cancel. If it has, the operation is not repeated.

TCC was proposed by the Alipay team and is widely used in financial systems. When we buy a fund with our bank account balance, we notice that the part of the balance used for the purchase is frozen first, so we can guess that this step is probably phase one of TCC.
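The Try / Confirm / Cancel flow and the XID-based de-duplication table described in this section can be sketched as follows. This is an illustrative in-memory model, not the API of any TCC framework: `ReservableResource`, `try_`, and `run_tcc` are hypothetical names, and the "freeze" in Try mirrors the frozen balance in the fund purchase example.

```python
class ReservableResource:
    """A participant whose Try freezes part of a quantity (stock, balance, ...)."""

    def __init__(self, available):
        self.available = available
        self.frozen = {}    # xid -> quantity reserved by Try
        self.finished = {}  # xid -> "confirmed" / "cancelled" (de-duplication table)

    def try_(self, xid, qty):
        if self.available < qty:
            return False  # vote "no": not enough resources to reserve
        self.available -= qty
        self.frozen[xid] = qty
        return True

    def confirm(self, xid):
        if xid not in self.finished:  # idempotent: later calls are no-ops
            self.frozen.pop(xid, None)  # the reserved quantity is consumed
            self.finished[xid] = "confirmed"
        return True

    def cancel(self, xid):
        if xid not in self.finished:  # idempotent: later calls are no-ops
            self.available += self.frozen.pop(xid, 0)  # release the reservation
            self.finished[xid] = "cancelled"
        return True


def run_tcc(xid, reservations):
    """reservations: list of (participant, quantity) pairs."""
    # Phase one: stop at the first participant that votes "no".
    all_yes = all(p.try_(xid, qty) for p, qty in reservations)
    # Phase two: a real coordinator would retry each call until it is acknowledged.
    for p, _ in reservations:
        if all_yes:
            p.confirm(xid)
        else:
            p.cancel(xid)
    return "confirmed" if all_yes else "cancelled"
```

Because `confirm` and `cancel` consult the `finished` table first, the coordinator can safely repeat either call for the same XID; a Cancel for a participant that never ran Try simply releases nothing.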

2.5 Transaction status table scheme

There is another transaction solution similar to TCC, implemented with the help of a transaction status table. Suppose a distributed transaction must carry out two steps: calling the repo service to deduct inventory and calling the order service to create an order. In this scheme, the coordinator (the shopping service) maintains a transaction status table as follows:

[Table: transaction status table]

The initial state is 1. The state is updated each time a service is called successfully, and once all services have been called successfully the state becomes 3.

With this table in place, a background task can scan the status of the transactions in it. If a distributed transaction has not reached state 3 within a configured time threshold, it has not completed successfully, so the task calls the repo service again to deduct inventory and the order service again to create the order, until every call succeeds and the transaction state becomes 3.

If the transaction still has not reached state 3 after repeated retries, its state can be set to error and handled by manual intervention.

Because service calls are retried, every service interface must be idempotent with respect to the global distributed transaction ID. The principle is the same as the idempotent implementation in Section 2.4.
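The background scan can be sketched as below. The original status table is not reproduced here, so the layout is an assumption: state 1 is the initial state, state 2 means the inventory deduction succeeded, state 3 means the order was also created, and -1 stands for the error state. The names `advance`, `scan_once`, and the `retries` column are hypothetical.

```python
INIT, DONE, ERROR = 1, 3, -1  # assumed state encoding, see the note above
STEPS = ["deduct_inventory", "create_order"]  # executed in this order


def advance(tx, services):
    """Call the services this transaction has not completed yet."""
    while tx["state"] != DONE:
        step = STEPS[tx["state"] - INIT]
        if not services[step](tx["xid"]):
            return False  # this call failed; leave the state where it is
        tx["state"] += 1  # record progress after each successful call
    return True


def scan_once(table, services, max_retries=3):
    """One pass of the background task over the transaction status table."""
    for tx in table:
        if tx["state"] in (DONE, ERROR):
            continue  # nothing to do for finished or abandoned transactions
        if advance(tx, services):
            continue
        tx["retries"] += 1
        if tx["retries"] >= max_retries:
            tx["state"] = ERROR  # give up and leave it for manual intervention
```

Each entry of `services` is a callable that takes the XID and returns True on success. Since a retried call may reach a service that already processed the same XID, those callables must be idempotent.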

2.6 Eventually consistent transaction scheme based on message-oriented middleware

2PC and 3PC, TCC, and the transaction status table all basically follow the idea of the XA protocol. In essence, each of these schemes has a transaction coordinator that coordinates the progress of the local transactions of every participant so that all local transactions commit or roll back together, ultimately achieving global ACID properties. During coordination, the coordinator has to collect the current state of each local transaction and issue the instructions for the next phase based on those states.

However, these global transaction schemes involve cumbersome operations, span a long time, or hold exclusive locks on the relevant resources for the duration of the global transaction, so the global transaction concurrency of the whole distributed system cannot be very high. This makes it hard to meet the transaction throughput requirements of high-concurrency scenarios such as e-commerce. Internet service providers have therefore explored many distributed transaction solutions that depart from the XA protocol. Eventually consistent global transactions implemented with message-oriented middleware are a classic one.

To show the essence of this solution, I will use the following e-commerce microservice architecture to describe it:

[Figure: e-commerce microservice architecture in which the user calls the order service directly and the order service calls the repo service]

In this model, the user no longer asks an integrating shopping service to place the order but requests the order service directly. The order service adds the order record and also calls the repo service to deduct inventory.

This eventually consistent transaction scheme based on message-oriented middleware is often misunderstood as the following implementation:

[Figure: the commonly misunderstood implementation]

The implementation process is as follows:

1) The order service is responsible for sending the inventory deduction message (repo_deduction_msg) to the MQ server; the repo service subscribes to the inventory deduction message on the MQ server and is responsible for consuming it.

2) After the user places an order, the order service first executes the query statement that inserts the order record and then sends repo_deduction_msg to the message middleware. Both steps run inside one local transaction. If "execute the query statement that inserts the order record" fails, the transaction is rolled back and "send repo_deduction_msg to the message middleware" never happens. Likewise, if "send repo_deduction_msg to the message middleware" fails and throws an exception, "execute the query statement that inserts the order record" is rolled back as well, and in the end nothing happens.

3) After receiving repo_deduction_msg, the repo service first executes the inventory deduction query statement and then sends the consumption-complete ACK back to the MQ server. Both steps run inside one local transaction. If "execute the inventory deduction query statement" fails and the transaction rolls back, "send the consumption-complete ACK to the MQ server" does not happen, and the MQ server, driven by its confirm mechanism, keeps pushing the message to the repo service until the whole transaction commits successfully. Likewise, if "send the consumption-complete ACK to the MQ server" fails and throws an exception, "execute the inventory deduction query statement" is rolled back, and the MQ server, driven by its confirm mechanism, keeps pushing the message to the repo service until the whole transaction commits successfully.

This looks very reliable. However, it ignores the network Two Generals' Problem and has the following defects:

1) The network Two Generals' Problem applies. In step 2 above, the order service's attempt to send repo_deduction_msg may fail. From the point of view of the sender, the order service, the message middleware may never have received the message; or the middleware may have received it while the ACK sent back to the order service was lost due to a network failure. It is therefore wrong for the order service to rashly roll back the transaction and undo "execute the query statement that inserts the order record", because the repo service may already have received repo_deduction_msg and deducted the inventory, which leaves the data of the order service and the repo service inconsistent.

2) The repo service and the order service both put network calls (communication with the MQ server) inside local database transactions. Network latency can turn these into long-running database transactions and reduce the concurrency of local database transactions.

The above is the misunderstood implementation. The correct implementation is as follows:

[Figure: correct implementation using a local message table, message middleware, and a de-duplication table]

In the scheme shown in the figure above, message middleware such as RabbitMQ is used to achieve eventual consistency for the distributed order-placing and inventory-deduction process. The figure is explained as follows:

1) In the order service,

Add the order record to the t_order table &&

Add the corresponding inventory deduction message to the t_local_msg table

These two steps must be completed in one transaction to guarantee atomicity. Similarly, in the repo service,

Check whether this inventory deduction has already been performed &&

If it has not, deduct the inventory &&

Write to the de-duplication table &&

Send the consumption-complete ACK back to the MQ server

These four steps must also be completed in one transaction to guarantee atomicity.

2) A background program in the order service continuously sends the messages in the message table to the message middleware. When a send succeeds, the corresponding message is deleted from the message table; when it fails, the program keeps retrying. Because of the network Two Generals' Problem, when a send from the order service to the middleware times out, the middleware may have received the message and failed to deliver its ACK, or it may not have received the message at all. The order service resends the message until the middleware's ACK arrives. The message may therefore be sent more than once, but that does not matter: as long as messages are not lost and not reordered, the repo service will de-duplicate them later.

3) The message middleware pushes repo_deduction_msg to the repo service. After processing the message successfully, the repo service sends an ACK to the middleware. Only after receiving the ACK does the middleware consider the message successfully processed; otherwise it pushes the message again. One case remains: the repo service processes the message successfully, but the ACK is lost in transit due to a network failure, so the middleware never receives it and pushes the message again. This too relies on the de-duplication in the repo service to avoid consuming the message twice.

4) Steps 2) and 3) give two reasons why the repo service may receive a message more than once: repeated production by the producer, and redelivery by the middleware. To keep the business idempotent, the repo service maintains a de-duplication table that records the IDs of successfully processed messages. Each time the repo service receives a message, it first checks whether the message has already been processed successfully; if so, it does not process it again.

With this design, messages are not lost on the sending side and are not consumed repeatedly on the receiving side. Together, this means no message is missed or duplicated, and the data in the two databases of the order service and the repo service is guaranteed to become eventually consistent.

The eventually consistent global transaction scheme based on message middleware is an innovative pattern that Internet companies developed for high-concurrency scenarios. It uses MQ to provide asynchronous calls, decoupling, and traffic peak shaving between microservices, supports high global transaction concurrency, and guarantees the eventual consistency of distributed data records.
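The local message table and the de-duplication table from this section can be sketched with two in-memory SQLite databases standing in for the order and repo databases. The table names t_order and t_local_msg come from the text; t_repo with columns `item` and `stock`, the t_dedup table, all other column names, and the `publish` callback that stands in for the message middleware are assumptions made for this example. The sketch simplifies one point: the ACK is returned after the local transaction commits, and a lost ACK is covered by de-duplication.

```python
import sqlite3


def open_order_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t_order (order_id TEXT PRIMARY KEY, item TEXT, qty INTEGER)")
    db.execute("CREATE TABLE t_local_msg (msg_id TEXT PRIMARY KEY, item TEXT, qty INTEGER)")
    return db


def open_repo_db(item, stock):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t_repo (item TEXT PRIMARY KEY, stock INTEGER)")
    db.execute("CREATE TABLE t_dedup (msg_id TEXT PRIMARY KEY)")
    db.execute("INSERT INTO t_repo VALUES (?, ?)", (item, stock))
    db.commit()
    return db


def place_order(db, order_id, item, qty):
    # The order row and the outgoing message commit or roll back together.
    with db:
        db.execute("INSERT INTO t_order VALUES (?, ?, ?)", (order_id, item, qty))
        db.execute("INSERT INTO t_local_msg VALUES (?, ?, ?)", (order_id, item, qty))


def relay(db, publish):
    # Background task: a message is deleted only after the middleware acks it.
    pending = db.execute("SELECT msg_id, item, qty FROM t_local_msg").fetchall()
    for msg in pending:
        if publish(msg):  # publish returns True only when an ACK came back
            with db:
                db.execute("DELETE FROM t_local_msg WHERE msg_id = ?", (msg[0],))


def consume(db, msg):
    # Repo service: check, deduct, and record the message ID in one transaction.
    msg_id, item, qty = msg
    with db:
        seen = db.execute("SELECT 1 FROM t_dedup WHERE msg_id = ?", (msg_id,)).fetchone()
        if seen is None:
            db.execute("UPDATE t_repo SET stock = stock - ? WHERE item = ?", (qty, item))
            db.execute("INSERT INTO t_dedup VALUES (?)", (msg_id,))
    return True  # ACK to the middleware, also for duplicates
```

If `publish` delivers a message but its ACK is lost, `relay` sends the same message again on the next pass, and `consume` ignores the second copy because its ID is already in t_dedup.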

3、 Implementation of Seata in AT mode

Chapter 2 presented several common theoretical models of distributed transactions. This chapter presents the implementation of Seata, an open-source distributed transaction framework from industry.

Seata provides users with the AT, TCC, Saga, and XA transaction modes. AT mode is Seata's primary transaction mode, so this chapter analyzes how Seata implements it. Using AT has one prerequisite: the databases used by the microservices must be relational databases that support transactions.

3.1 Overview of the Seata AT mode workflow

Seata's AT mode builds on the local transaction capability of relational databases. Through a data source proxy class, it intercepts and parses the SQL executed against the database and records custom rollback logs. If a rollback is needed, it replays these rollback logs. Although AT mode evolved from the XA transaction model (2PC), it removes the blocking constraint of the XA protocol and strikes a balance between consistency and performance.

AT mode evolved from the XA transaction model, and its overall mechanism is an improved version of the two-phase commit protocol. Its two basic phases are as follows:

1) Phase one: acquire the local lock, execute the local transaction so that the business data changes and the rollback log are committed in the same local transaction, and finally release the local lock;

2) Phase two: if the global transaction commits, the rollback logs are simply deleted asynchronously, which completes very quickly. If it must roll back, the rollback logs from phase one are used for reverse compensation.

This chapter describes how Seata's AT mode works, using the e-commerce microservice model shown in the following figure:

[Figure: e-commerce microservice model with shopping-service, repo-service, and order-service]

In the figure above, the coordinator shopping-service first calls the participant repo-service to deduct stock and then calls the participant order-service to create the order. The global transaction flow of this business process with Seata in AT mode is shown in the following figure:

[Figure: global transaction flow with Seata in AT mode]

The global transaction execution process described in the figure above is as follows:

1) shopping-service registers a global transaction with Seata, and Seata generates a global transaction ID (XID)

2) The local transactions of repo-service.repo_db and order-service.order_db are executed up to the pending-commit stage. Each local transaction contains the business operations on its own database and the write of an undo_log record into that database

3) repo-service.repo_db and order-service.order_db register their branch transactions with Seata, which brings them into the scope of the global transaction identified by the XID

4) The local transactions of repo-service.repo_db and order-service.order_db are committed

5) repo-service.repo_db and order-service.order_db report the commit status of their branch transactions to Seata

6) Seata collects the commit status of all branch transactions and decides whether the global transaction should be committed or rolled back

7) Seata notifies repo-service.repo_db and order-service.order_db to commit or roll back their branch transactions. A rollback is performed by compensation

Steps 1) to 5) belong to the first stage; steps 6) and 7) belong to the second stage.
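The seven steps above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions, not Seata's implementation: the names Branch, Coordinator, phase_one, phase_two and the will_fail flag are hypothetical, and a "database" is reduced to a flag plus a list of undo_log entries.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Branch:
    """A participant database such as repo_db or order_db (heavily simplified)."""
    name: str
    will_fail: bool = False            # hypothetical switch to simulate a failed local commit
    committed: bool = False
    undo_log: list = field(default_factory=list)

    def phase_one(self, xid: str) -> bool:
        """Steps 2-5: run the local transaction and report its status."""
        if self.will_fail:
            return False               # local transaction rolled back, nothing was written
        # The business update and the undo_log record are committed together.
        self.undo_log.append(xid)
        self.committed = True
        return True

    def phase_two(self, xid: str, commit: bool) -> None:
        """Step 7: finish the branch according to the global decision."""
        if xid not in self.undo_log:
            return                     # phase one never committed here, nothing to do
        if not commit:
            self.committed = False     # compensate using the undo_log record
        self.undo_log.remove(xid)      # the record is no longer needed either way


class Coordinator:
    """Plays the role of Seata in this sketch."""
    _ids = count(1)

    def run(self, branches) -> str:
        xid = f"xid-{next(self._ids)}"                      # step 1
        statuses = [b.phase_one(xid) for b in branches]     # steps 2-5
        decision = all(statuses)                            # step 6
        for b in branches:                                  # step 7
            b.phase_two(xid, decision)
        return "committed" if decision else "rolled back"
```

Calling Coordinator().run([Branch("repo_db"), Branch("order_db", will_fail=True)]) returns "rolled back" and leaves both branches uncommitted, which mirrors the all-or-nothing outcome described in steps 6) and 7).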

3.1 Details of the Seata AT mode workflow

In the above e-commerce scenario, the shopping service calls the inventory service to deduct inventory and calls the order service to create an order. Obviously, both calls need to be placed in one transaction, namely:

start global_trx

 call the deduct-inventory interface of the inventory service

 call the create-order interface of the order service

commit global_trx

In the inventory service database, there is the following inventory table t_repo:


In the order service database, there is the following order table t_order:


Now, the user with ID 40002 wants to buy a mouse with commodity code 20002. The content of the whole distributed transaction is as follows:

1) The inventory record for commodity code 20002 in the inventory table of the inventory service is updated to deduct the purchased quantity

2) A new record is added to the order table of the order service

The flow chart of the above operation in the first stage of AT mode is as follows:

As the first-stage process shows, once a branch's local transaction is committed in the first stage, the local records it locked are released. This is the biggest difference between AT mode and XA: in the two-phase commit of an XA transaction, the locked records are not released until the end of the second phase. AT mode therefore shortens the time records stay locked and improves the efficiency of distributed transaction processing. AT mode can release the locked records at the end of the first phase because Seata maintains an undo_log table in the database of each service, which records the images of the t_order / t_repo records before and after the operation. Even if an exception occurs in the second stage, a global rollback can be achieved simply by replaying the corresponding records in each service's undo_log.

undo_log table structure:

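To make the first stage more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not Seata code, and the schema is an assumption: the stock column of t_repo, the starting stock of 20 and the reduced three-column undo_log table are illustrative only. The point it shows is that the business update and the undo_log row (with the before and after images) are committed in the same local transaction.

```python
import json
import sqlite3


def phase_one_deduct(conn, xid, branch_id, commodity_code, qty):
    """Deduct stock and write the undo_log row in ONE local transaction."""
    with conn:  # sqlite3 commits on success and rolls back on an exception
        # Before image. A real database would take a row lock here
        # (SELECT ... FOR UPDATE), which SQLite does not support.
        before = conn.execute(
            "SELECT id, commodity_code, stock FROM t_repo WHERE commodity_code = ?",
            (commodity_code,),
        ).fetchone()
        conn.execute(
            "UPDATE t_repo SET stock = stock - ? WHERE commodity_code = ?",
            (qty, commodity_code),
        )
        # After image, looked up by primary key.
        after = conn.execute(
            "SELECT id, commodity_code, stock FROM t_repo WHERE id = ?",
            (before[0],),
        ).fetchone()
        rollback_info = json.dumps({"table": "t_repo", "before": before, "after": after})
        conn.execute(
            "INSERT INTO undo_log (xid, branch_id, rollback_info) VALUES (?, ?, ?)",
            (xid, branch_id, rollback_info),
        )


# Assumed sample data: record 10002 holds commodity 20002 with 20 items in stock.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_repo (id INTEGER PRIMARY KEY, commodity_code TEXT, stock INTEGER)")
conn.execute("CREATE TABLE undo_log (xid TEXT, branch_id INTEGER, rollback_info TEXT)")
conn.execute("INSERT INTO t_repo VALUES (10002, '20002', 20)")
conn.commit()

phase_one_deduct(conn, "xid-1", 1, "20002", 1)
```

After the call, the stock change is already visible to other local transactions, and the undo_log row is the only thing that still ties the record to the global transaction.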

After the first phase, Seata will receive the commit status of all branch transactions, and then decide whether to commit the global transaction or roll back the global transaction.

1) If all branch transactions were committed locally, Seata decides to commit globally. Seata sends a branch-commit message to each branch transaction. On receiving the message, each branch transaction puts it into a buffer queue and immediately returns success to Seata. Afterwards, each branch processes the buffered branch-commit messages at its own pace, which only means deleting the undo_log records of the corresponding branch transaction. No other commit work is needed, because the actual commit was already completed in the first phase (this is another difference between AT and XA). The process is shown in the figure below

The branch transaction can return success to Seata right away because the truly critical commit was already completed in the first stage; clearing the undo_log records is just cleanup. Even if the cleanup fails, it has no real impact on the distributed transaction as a whole.
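The "acknowledge first, clean up later" behaviour of the second-stage commit can be sketched as follows. The class and method names (BranchResourceManager, on_branch_commit, drain) are hypothetical, and the background worker is reduced to an explicit drain() call so the example stays deterministic.

```python
from collections import deque


class BranchResourceManager:
    """Branch-side handling of a global commit (illustrative sketch only)."""

    def __init__(self):
        self.undo_log = {}        # (xid, branch_id) -> rollback info
        self.pending = deque()    # buffered branch-commit messages

    def on_branch_commit(self, xid, branch_id):
        """Called when the coordinator announces a global commit."""
        self.pending.append((xid, branch_id))
        return "success"          # reply at once: the data was committed in stage one

    def drain(self):
        """Deferred cleanup: delete the undo_log rows of committed branches."""
        while self.pending:
            key = self.pending.popleft()
            self.undo_log.pop(key, None)   # a row that is already gone is harmless
```

Because on_branch_commit only enqueues a message, the coordinator never waits for the undo_log deletion, which is why the second stage of a global commit is cheap.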

2) If the local commit of any branch transaction fails, Seata decides to roll back globally and sends a rollback message to each branch transaction. Because undo_log records were written to each service's database in the first stage, a branch transaction only needs to compensate according to its undo_log records. The global transaction rollback process is shown in the following figure:

Here is a further explanation of steps 2 and 3 in the figure

1) Since the undo_log table structure was described above, the XID and branch_id can be used to find all undo_log records of the current branch transaction;

2) After obtaining the undo_log records of the current branch transaction, the branch compares the records in afterimage with the current table records. If they are inconsistent, other transactions have modified these records between the end of the first stage and now, so the branch transaction cannot be rolled back and reports the rollback failure to Seata. If they are consistent, no other transaction has modified these records in that period, and the branch transaction can be rolled back: the compensation SQL is computed from beforeimage and afterimage and executed, the corresponding undo_log records are deleted, and a successful rollback is reported to Seata.
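The two rollback steps above can be sketched like this. It is a simplified model rather than Seata's code: tables are plain dictionaries keyed by primary key, each undo_log record is a dictionary with assumed keys (xid, branch_id, row_id, before, after), each row is assumed to appear in at most one record per branch, and DirtyDataError is a hypothetical exception name.

```python
class DirtyDataError(Exception):
    """The row changed after stage one, so automatic compensation is unsafe."""


def rollback_branch(table, undo_log, xid, branch_id):
    # Step 1: find this branch's undo_log records by XID and branch_id.
    records = [r for r in undo_log if r["xid"] == xid and r["branch_id"] == branch_id]

    # Step 2a: compare every after image with the current row first,
    # so that a failed rollback leaves the table untouched.
    for r in records:
        if table.get(r["row_id"]) != r["after"]:
            raise DirtyDataError(f"row {r['row_id']} was modified by another transaction")

    # Step 2b: compensate in reverse order, then delete the undo_log record.
    for r in reversed(records):
        if r["before"] is None:
            del table[r["row_id"]]                   # undo an INSERT
        else:
            table[r["row_id"]] = dict(r["before"])   # undo an UPDATE or DELETE
        undo_log.remove(r)
    return "rolled back"
```

For example, with a row {"stock": 19} whose undo record holds before {"stock": 20} and after {"stock": 19}, the call restores the stock to 20. If another transaction has already changed the row to {"stock": 18}, DirtyDataError is raised and nothing is modified.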

Transactions have ACID characteristics, and global transaction solutions try to provide all four. The above description of Seata in AT mode already shows how AT achieves atomicity, consistency and durability. The following focuses on how AT ensures isolation between multiple global transactions.

In AT mode, when multiple global transactions operate on the same table, a global lock is used to ensure transaction isolation. The following describes the principle of the global lock in two scenarios: read isolation and write isolation

1) Write isolation (while one global transaction is modifying / writing / deleting a record, modifications / writes / deletions of the same record by another global transaction should be isolated, i.e. writes are mutually exclusive): write isolation applies when multiple global transactions update the same field of the same table, and prevents the data touched by a global transaction from being modified by other global transactions before that transaction has committed globally. The basic principle is that a local transaction must obtain the global lock before it commits in the first stage (when the local transaction starts, it places a local lock on the records involved). If the global lock cannot be obtained, the local transaction cannot commit; it keeps trying to obtain the global lock until the retry limit is exceeded, at which point it gives up, rolls back the local transaction, and releases the local lock it placed on the records.
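The "acquire the global lock before the local commit, otherwise retry and finally roll back" rule can be sketched as below. GlobalLockTable, commit_local_with_global_lock and the max_retries default are hypothetical names and values chosen for illustration; a real implementation would also wait between attempts.

```python
class GlobalLockTable:
    """Kept by the coordinator: maps a row key to the XID that owns its global lock."""

    def __init__(self):
        self.owner = {}

    def try_acquire(self, row_key, xid):
        # setdefault stores xid only when the row has no owner yet.
        return self.owner.setdefault(row_key, xid) == xid

    def release(self, row_key, xid):
        if self.owner.get(row_key) == xid:
            del self.owner[row_key]


def commit_local_with_global_lock(locks, row_key, xid, do_commit, do_rollback, max_retries=3):
    """Commit the local transaction only after the global lock has been obtained."""
    for _ in range(max_retries):
        if locks.try_acquire(row_key, xid):
            do_commit()       # the local commit also releases the local row lock
            return True
    do_rollback()             # give up: undo the local work and release the local lock
    return False
```

While gtrx_1 owns the global lock on ("t_repo", 10002), a call for gtrx_2 on the same key exhausts its retries, invokes do_rollback and returns False; once gtrx_1 releases the lock, the same call succeeds.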

Suppose two global transactions, gtrx_1 and gtrx_2, concurrently call the inventory service, and both intend to reduce the stock of the following record:

The sequence diagram of AT's write isolation process is as follows:

In the figure, 1, 2, 3 and 4 belong to the first stage, and 5 belongs to the second stage.

In the figure above, gtrx_1 and gtrx_2 both complete without a rollback. If gtrx_1 instead has to roll back in the second stage, it must start a new local transaction to obtain the local lock, and then use undo_log to compensate the record with id = 10002. At this point gtrx_2 is still waiting for the global lock while holding the local lock on the record with id = 10002, so the rollback of gtrx_1 fails (to roll back, gtrx_1 needs both the global lock and the local lock on the record with id = 10002). gtrx_1 keeps retrying the rollback until gtrx_2's attempts to acquire the global lock exceed the threshold. gtrx_2 then gives up on the global lock and rolls back locally, which releases its local lock on the record with id = 10002. Now gtrx_1 can finally take the local lock on that record and, holding both the local lock and the global lock, roll back successfully. Throughout the process the global lock stays with gtrx_1, so no dirty write occurs. The flow chart of the whole process is as follows:
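The rollback scenario for gtrx_1 and gtrx_2 can also be walked through step by step in code. This is a deterministic toy model under stated assumptions: Row, try_rollback and try_global_lock are hypothetical names, the stock values 20 and 19 are made up, the retry threshold of 3 is arbitrary, and real lock waits are replaced by a simple loop.

```python
class Row:
    def __init__(self, stock):
        self.stock = stock
        self.local_lock = None     # transaction holding the database row lock
        self.global_lock = None    # global transaction holding the global lock


def try_rollback(row, xid, before_stock):
    """A second-stage rollback needs the global lock AND the local lock."""
    if row.global_lock != xid or row.local_lock not in (None, xid):
        return False
    row.stock = before_stock       # compensate from undo_log
    row.global_lock = None         # rollback finished: release the global lock
    return True


def try_global_lock(row, xid):
    if row.global_lock in (None, xid):
        row.global_lock = xid
        return True
    return False


row = Row(stock=19)                # gtrx_1 already committed locally: 20 -> 19
row.global_lock = "gtrx_1"         # gtrx_1 still owns the global lock
row.local_lock = "gtrx_2"          # gtrx_2 holds the row lock and waits for the global lock

timeline = []
for _ in range(3):                 # gtrx_2 gets 3 attempts (assumed threshold)
    timeline.append(("gtrx_1 rollback", try_rollback(row, "gtrx_1", 20)))
    timeline.append(("gtrx_2 global lock", try_global_lock(row, "gtrx_2")))

row.local_lock = None              # gtrx_2 gives up; its local rollback frees the row lock
timeline.append(("gtrx_1 rollback", try_rollback(row, "gtrx_1", 20)))
```

Every entry in timeline except the last records a failed attempt; only after gtrx_2 releases the local lock does the rollback of gtrx_1 succeed and restore the stock to 20. At no point does gtrx_2 obtain the global lock, which is what rules out a dirty write.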

2) Read isolation (while one global transaction is modifying / writing / deleting a record, reads of that record by another global transaction should be isolated, i.e. reads and writes are mutually exclusive): when the isolation level of the database's local transactions is read committed, repeatable read or serializable (read uncommitted provides little isolation and is generally not used), the isolation level that the Seata AT global transaction model provides is read uncommitted. In other words, one global transaction can see data that another global transaction has not yet committed globally, which is a dirty read. This can also be seen from the first-stage and second-stage flow charts above. It is acceptable in an eventually consistent distributed transaction model.

If the AT model needs to reach the read committed isolation level, the SELECT FOR UPDATE statement can be proxied by Seata's SELECT FOR UPDATE executor. When a SELECT FOR UPDATE statement is executed, it applies for the global lock. If the global lock is held by another global transaction, the execution of the statement is rolled back, the local lock is released, and the statement is retried. During this process the query is blocked until the global lock is obtained (that is, until the other global transaction has finished with the record to be read), so it only reads data that has been committed by a global transaction. The process is shown in the figure below
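The retry loop behind a globally "read committed" SELECT FOR UPDATE can be sketched as follows. This is an assumption-laden illustration, not Seata's executor: select_for_update, its callback parameters and the max_retries default are hypothetical, and the wait between attempts is replaced by an on_wait hook so the demo is deterministic.

```python
def select_for_update(read_row, global_lock_owner, xid, max_retries=5, on_wait=lambda: None):
    """Return the row only once no OTHER global transaction holds its global lock."""
    for _ in range(max_retries):
        owner = global_lock_owner()
        if owner is None or owner == xid:
            return read_row()      # the value belongs to a finished global transaction
        # Real code would roll back the local read, release the local lock and sleep here.
        on_wait()
    raise TimeoutError("global lock was not released in time")


# Demo state (made-up values): gtrx_1 has written stock = 19 but is not globally finished.
state = {"owner": "gtrx_1", "stock": 19, "waits": 0}


def wait():
    state["waits"] += 1
    if state["waits"] == 2:        # gtrx_1 ends its second stage, here with a global rollback
        state["stock"], state["owner"] = 20, None


value = select_for_update(lambda: state["stock"], lambda: state["owner"], "gtrx_2", on_wait=wait)
```

The reader never returns the intermediate value 19; it waits until gtrx_1 has finished and then reads 20, which is the behaviour the global lock check is meant to provide.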

4、 Concluding remarks

The XA protocol is a distributed transaction processing standard proposed by X/Open. The essence of 2PC, 3PC, TCC, the local transaction table and Seata in AT mode, all discussed in this article, is that a transaction coordinator coordinates the progress of the local transactions of each participant, so that all local transactions are committed or rolled back together and the system achieves ACID characteristics globally. During coordination, the coordinator collects the current state of each local transaction and issues the instructions for the next stage according to these states. This idea is the essence of the XA protocol, so we can say that these transaction models comply, or roughly comply, with the XA protocol.

The eventually consistent transaction scheme based on message middleware is an innovative pattern that Internet companies developed for high-concurrency scenarios. MQ is used for asynchronous calls, decoupling and traffic peak shaving between microservices, and to ensure the eventual consistency of distributed data records. It clearly does not comply with the XA protocol.

For a given technology there may be an industry standard or protocol, but it is common in engineering for practitioners to deliver implementations that deviate from the standard, partly or even entirely, to meet the needs of a specific application scenario or simply for simplicity. The TCC scheme, the eventually consistent transaction scheme based on message middleware, and Seata in AT mode are all examples. New standards often emerge from such innovations.


Did you really not spot the business flaw in the correct solution given in Section 2.6 (the eventually consistent transaction scheme based on message middleware)? Take another look at this picture, carefully review the call direction between the two microservices, and leave your thoughts in the comments :-)
