Interviewer: do you know MySQL transaction isolation and mvcc multi version concurrency control?

Time:2020-12-2

Transaction isolation of database

Some time ago, there was a problem in the company, that is, the same batch of tasks created by us were not assigned to different instances for execution, which led to problems in online results.

The leader of another group said that the transaction was not opened, so it is OK to set the transaction.

As soon as there is a consistency problem in the database, if we start the transaction, I have some doubts about how the transaction of the database can ensure the consistency.

In looking at the following content, we can first think about a few questions.

What are the isolation levels of the database?

How to realize mvvc view of database?

What is the isolation level of database to solve?

After reading the above three questions, how many can I answer? No hurry. Let’s move on

Database transactions

The transaction of database is simply used to ensure the correctness of data. It has only two operations: transaction success or failure and rollback.

Why do you do that? This is because in general, when we conduct transaction operations, we will carry out a set of operations. For example, your usual financial transfer.

There are two operations in this transfer transaction:

  • Deduct money from one’s bank account
  • Add the money received to the corresponding account.

Now think about it. What would happen if we didn’t add a transaction?

  1. If the deduction is successful, the implementation of adding money to others fails. And the money has been deducted, the other party did not receive the money, what do you say?
  2. If you add money to the other party first, and you are not deducted successfully. Is this money subsidized by the bank? Hehe, the bank must be unhappy.

Therefore, we can only use transactions in this operation to ensure the success and failure of the execution. If it fails, we need to roll back and ensure that the operation of deduction will not be executed.

The acid of the transaction

These four features are referred to as acid

  • Atomicity: the same group of operations, either do or do not do, a group of one or two successful execution does not represent success, all success can. This is atomicity, do or don’t do (rollback if failure).
  • Consistency: data consistency, as the above example shows, if you deduct money and the other party does not add money, it will certainly not work.
  • Isolation: when multiple databases operate on the same data, they cannot affect each other. You can’t change the data space there.
  • Persistence: after the transaction result is committed, the change is permanent, and the subsequent operation or system failure cannot make this record lost.

Today’s main business is isolation. See how transactions ensure data isolation

Isolation level of the transaction

Different transaction isolation levels correspond to different data execution efficiency. The more strict the isolation is, the lower the execution efficiency is. The next four isolation levels are the more stringent.

  • Read uncommitted: when the data is not committed when the transaction is executed, other transactions can see the results
  • Read committed: refers to the data after its transaction is committed, other transactions can see the results.The view is created when SQL statements are executed. For specific views, see how the following data isolation is implemented
  • Repeatable read: during the execution of a transaction, the results it sees are consistent with what it sees when it starts. A view snapshot will be created at startup. In this transaction state, the contents of the view snapshot will be consistent, and other transaction changes will not be visible.Note the reading process. If it is an update, the current read will be used. That is to say, the update operation of other transactions will get the results to ensure data consistency
  • Serializable: as the name implies, it is to serialize multiple transactions (locking, read-write conflict, read-read conflict) in the process of reading and writing. After one transaction is finished, another transaction can be executed. The effect is that the parallelization is not good and the efficiency is low.

The default transaction isolation level in MySQL is repeatable. Use this command to view the current transaction level,

show variables like 'transaction_isolation';

#The following statement modifies the level of the transaction.

Set session transaction isolation level serializable; (parameters can be read uncommitted, read committed, repeatable, serializable)

Interviewer: do you know MySQL transaction isolation and mvcc multi version concurrency control?

How to start a transaction

In the program, we often default to auto commit, that is, a SQL operation is a transaction, but sometimes we need to combine multiple SQL, we need to explicitly open the transaction.

Start or start transaction is used to display the open statement. Similarly, commit is used at the end of the transaction, and rollbakc is used to roll back the failure.

Of course, if you don’t want SQL to commit automatically, we will turn it offset autocommit=0In this way, the transaction will not be committed automatically, and we need to execute commit manually

How to avoid long transactions

After the auto commit transaction is turned off, we need to commit the transaction ourselves. At this time, the execution of each statement is like this.

begin

SQL statement

commit

If we forget to commit a transaction when we are writing a program, and we do not commit until the next SQL, then we will have a long transaction.

Long transactions often cause a lot of congestion and lock timeout. If there are read-write operations (read-read conflict, read-write conflict, write conflict) in the transaction, the data will be locked and other transactions will have to wait.

Therefore, in the program, we should try to avoid the use of large transactions, but also to avoid accidental large transactions (errors) when we write programs.

The solution is that we will automatically commit to open, only when transactions need to be usedOpen transactions displayed

How to deal with a lot of transaction waiting in the program

It is very convenient to locate a long transaction problem in MySQL.

First, let’s find out what the long transaction is.


select t.*,to_seconds(now())-to_seconds(t.trx_started) idle_time from INFORMATION_SCHEMA.INNODB_TRX t G

This statement will show the start time of transaction execution. We can easily calculate how long the current transaction has been executed, and the idle above_ Time is the transaction time of execution

Suppose that all transactions executed over 30s are long transactions. You can use the following statement to filter long transactions over 30s.


select * from information_schema.innodb_trx where TIME_TO_SEC(timediff(now(),trx_started))>30

Through information_ schema.innodb_ TRX, we can locate long transactions.

It can decide whether the long transaction will be killed or continue to wait. If a deadlock occurs, the processing method is similar. Find the deadlock statement, and then execute the execution to kill a statement(A little violent)。

How is data isolation implemented

Note: this data isolation is based on repeatable read scenarios

In the repeatable read scenario, we understand that each time a transaction is started, a view is currently started, and this view is a view snapshot of the entire database.

Hey hey, is it that big as the database, why don’t we feel the consumption of creating snapshot time?

This is because the view snapshot created by the database takes advantage ofAll data has multiple versions to enable the ability to quickly create view snapshots

What about multiple versions of data?

Prepare the data

Don’t worry. Let’s get the data ready.

Now create a table and insert three pieces of data.


create table scores

(

id int not null

primary key,

score float null

);

INSERT INTO scores (id, score) VALUES (1, 3.5);

INSERT INTO scores (id, score) VALUES (2, 3.65);

INSERT INTO scores (id, Score) VALUES (3, 4);

Before using it, we should know two little knowledge points. Begin / start transaction and start transaction with consistent snapshot.

  • The creation of the begin / start transaction view is based on the start / start transaction before the SQL statement creates the view, such as the following example
begin

Select source from scores; // the view is created here, not at begin

commit
  • Start transaction with consistent snapshot: the view is created after the statement is executed.

After understanding the difference between the above two creation transactions, let’s see how the view creates multiple data versions. The following SQL is opened in two windows.

Transaction a | transaction B | result

Start transaction with consistent snapshot | open the transaction and create the view|

–|Start transaction with consistent snapshot | open the transaction and create the view

Select score from scores where id = 2 | — the value in transaction a is 3.65

–| update scores set scores = 10 where id = 2 | transaction B is modified to 10

–|Select score from scores where id = 2 | transaction B is displayed as 10

Select score from scores where id = 2 | — transaction a is displayed as 3.65

Select score from scores where id = 2 for update | — will be locked and wait for transaction B to release the lock (gap lock)

–Commit transaction B

Select score from scores where id = 2 for update

Select score from scores where id = 2 | — without for update, the result is still 3.65

Submit the result of a

The above process is two different requests, different operations on the same table in the database.

After transaction a executes start transaction with consistent snapshot, the view of a is created. At this time, the modification of transaction B cannot be seen. Even after transaction bcommit, as long as transaction a does not end, the result it sees is the value at its start time.

This is corresponding to the sentence that the results seen during the execution process are consistent with the results seen at the start-up without repeated submission

Snapshot multi version

As mentioned above, the snapshot is based on the entire database when the transaction is started, and the whole database is very large. How can MySQL make us insensitive and quickly create a snapshot.

Snapshot multi version you can think of as consisting of the following two parts.

  • Transaction ID: This is requested from InnoDB when the transaction is started. And be sure to note that it’s incremental.
  • row trx_ ID: this ID is actually the transaction ID. each time a transaction updates the data, the transaction ID is assigned to the transaction ID of the data version, and the transaction ID of this data version is called row TRX_ ID.

When there are multiple data versions in a row, there are multiple row TRX_ id 。 for instance

Statement operation corresponding to version value transaction ID

v1 | score =3 | 89| —

v2 | score =5| 90| update scores set score = 5 where id =3; select score from scores where id =3;|

v3 | score = 6 | 91 | update scores set score = 6 where id =3;|

V1 > V2 > V3, which involves three versions of iteration. In the middle, the undo log is used to save the updated records.

Note that after starting the snapshot, the value of V1 can be obtained in the case of repeated read isolation. Instead of the value directly stored in mysql, it is calculated by using the latest version of this record and the undo log log log. For example, the score value in V1 is calculated through V3 > V2 > V1.

Interviewer: do you know MySQL transaction isolation and mvcc multi version concurrency control?

Version calculation

The calculation rules of the next version are briefly described above, but in mysql, the version is not so simple to calculate. Let’s take a look at how to calculate,

We are paying attention to these two points

  • When a transaction is started, it will apply for a transaction ID from the InnoDB transaction system, which is strictly incremental.
  • Each row of data has multiple versions. The ID of this version is row TRX_ ID, and transactionUpdate data(a new version is generated only when the data is updated). A new data version is generated and the transaction ID is assigned to the transaction id = = row TRX_ ID,
  1. When a transaction is started, you can see the results of all transactions that have been committed. However, after the transaction is started, the changes of other transactions cannot be seen.
  2. When a transaction starts, there will be running transactions at the moment of creation in addition to those already committed. MySQL puts these running transaction IDS into an array.The smallest transaction ID in the arrayRecord as low water level, current systemMaximum created transaction ID + 1It is recorded as high water level.

Take a simple example.

a. One thing to note: getting the transaction ID and creating the array are not atomic operations, so there are transactions with transaction ID 8 and then transactions with active transaction ID 9 10 in MySQL.

b. If the transaction ID is lower than the low water level, it must be visible to the current transaction. If the transaction ID is higher than the transaction ID value of the high water level, it is not visible to the current transaction

c. The transaction ID is located between the low water level and the high water level.

  • If the transaction ID is in the active array, it indicates that the version is in execution, but the result has not been committed, so the transaction changes will not be seen by the transaction of course.
  • If the transaction ID is not in the active array, it means that the transaction has been committed, so it is visible. For example, 90, 91, 92 transactions are created now. 91 is executed relatively fast. The commit is completed, and 90 and 92 have not yet been committed. At this time, a new transaction ID of 93 is created, and the transactions in the active array are 90, 92, 93. You can see that 91 has been committed, and its transaction is still between the low water level and the high water level, but the result is visible to 93.

Generally speaking, the transaction results are visible when I create them, and then they are invisible after they are committed.

Read process

As mentioned above, the data in the old version view is calculated by the latest version and undo log. How can we calculate it?

Transaction a | transaction B | result

Start transaction with consistent snapshot transaction ID 89|

–|Start transaction with consistent snapshot transaction ID 92

Select score from scores where id = 2 | — the value in transaction a is 3.65

–| update scores set scores = 10 where id = 2 | transaction B is modified to 10

–|Select score from scores where id = 2 | transaction B is displayed as 10

Select score from scores where id = 2 | — transaction a is displayed as 3.65

Submit the result of a

Let’s look at this transaction operation.

Here is the flow of data changes.

  • Suppose there are two active transaction IDS 78, 88 before the start
  • When transaction a starts, it will put 78 88, including itself, into the active array.
  • Statement of transaction a operationselect score from scors where id =2Think of the results as V1 version data, such as row TRX_ id(be careful:row trx_ ID is the transaction ID assigned to row TRX after the data row has been updated) is 86 and saved.
  • When transaction B starts, it will find that the active array is 78, 88, 89, and its own 92
  • After transaction B executes the update statement statement, a new version V2 will be generated, and the data transformation is V1 — > v2. What changes in the middle are recordedundo logjournal. In this way, the data stored in ID 89 becomes historical data. Data version row TRX_ ID is 92
  • Transaction a queries the score data and finds the corresponding row TRX by querying the current V2 version view_ Id = 92, row TRX found_ If Id is on the high water level, discard this value and find V1, row TRX through v2_ The ID is 86, and 86 is greater thanLow water level, and lower thanHigh water level89 + 1. However, since 86 is not in the active array and belongs to a committed transaction, the current transaction can see the result, so transaction a can get the read value.

You see, after a few simple steps, we can get the transaction data we want to read, so no matter when transaction a queries, the result it gets is consistent with the data it reads.

You can see that with mvcc (multi version concurrency control), the process of reading the results of the current transaction will not be affected if the value of other transactions is changed.

We often say not to write a long transaction. From the above reading process, we can see that if a long transaction exists for a long time, there will be a lot of data versions. Then undo log logs need to be kept for a long time, and these rollback logs will take a lot of timeMemoryStorage space.

When no transaction needs to read the log and version data, the log can be deleted to free up memory space.

Update process

Transaction a | transaction B | result

Start transaction with consistent snapshot transaction ID 89|

–|Start transaction with consistent snapshot transaction ID 92

Select score from scores where id = 2 | — the value in transaction a is 3.65

–| update scores set scores = 10 where id = 2 | transaction B is modified to 10

–|Select score from scores where id = 2 | transaction B is displayed as 10

Select score from scores where id = 2 | — transaction a is displayed as 3.65

Select score from scores where id = 2 for update | — will be locked and wait for transaction B to release the lock (gap lock)

–Commit transaction B

Select score from scores where id = 2 for update

Select score from scores where id = 2 | — without for update, the result is still 3.65

Submit the result of a

The above mentioned reading process. In fact, in the transaction, we still have an update process. The update process is relatively simple. In the update process, we need to ensure the consistency of the data. We can’t say that someone else has modified it, and we can’t see it. That will cause data inconsistency.

In order to see the latest data, the update row operation will be locked (row lock). After locking, other transactions will update the row, and the latest value can only be obtained after other transactions commit. This process is calledCurrent reading

To get the latest value in the reading process, you can use the above statement select score from scores where id = 2 for update to see the current latest value.

summary

This section mainly combs the isolation level of transaction and the principle of mvcc multi version concurrency control.

Business in the interview is more than a point, such a topic can be a variety of changes, we have just mentioned the title of the three questions can be solved.

You try to answer?

The next issue will talk about the unreal reading in the database. Unreal reading is also a common problem in the interview.