Detailed explanation of MySQL mvcc mechanism


What is mvcc

Mvcc, full name: multi version concurrency control, i.e. multi version concurrency control. Mvcc is a method of concurrency control. Generally, it realizes concurrent access to database in database management system and transaction memory in programming language.

We know that in general, we use the InnoDB storage engine when using MySQL database. The InnoDB storage engine supports transactions. When multiple threads execute transactions at the same time, concurrency problems may occur. At this time, we need a method to control concurrency, and mvcc plays this role.

Lock and transaction isolation level of MySQL

Before understanding the principle of mvcc mechanism, you need to understand MySQL lock mechanism and transaction isolation level, leaving aside MyISAM storage engine. For InnoDB storage engine, there are row lock and table lock respectively. Table lock is to lock the whole table in one operation, so that the granularity of lock is the largest, but the performance is the lowest, and deadlock will not occur. Row lock is to lock one row at a time. In this way, the granularity of lock is small and the concurrency is high, but deadlock will occur.

InnoDB row locks are divided into shared locks (read locks) and exclusive locks (write locks). When a transaction adds a read lock to a row, other transactions are allowed to read the row, but they are not allowed to write, and other transactions are not allowed to write the row, but they can add a read lock.

When a transaction adds a write lock to a row, other transactions are not allowed to write to the row, but can read. At the same time, other transactions are not allowed to add a read-write lock to the row.

Let’s take a look at the transaction isolation levels of MySQL, which are divided into the following four types:

  1. Read uncommitted: a transaction can read data that has not been committed by other transactions, and dirty reads will occur. For example, there is a payroll. Transaction a starts first and then queries the salary of the employee with ID 1. Assuming that the salary at this time is 1000, transaction B also starts and performs the update operation. The salary of the employee with ID 1 is reduced by 100, but the transaction is not submitted. At this time, execute the query operation of transaction a to read the updated data of transaction B. if transaction B rolls back at this time, what transaction a reads is “dirty” data. When transaction a performs an update operation, unreal reading may also occur.
  2. Read committed: a transaction can only read the modified data of another committed transaction, and the transaction can query the latest value every time the data is modified and committed by other transactions. For the same example, when the transaction isolation level is read committed this time, and transaction B does not commit the transaction, transaction a cannot read the updated data of transaction B, thus avoiding the generation of dirty data. However, when transaction B commits and transaction a executes the same data, it will find that the data has changed. This is the so-called non repeatable reading, which means that the results obtained by executing the same query multiple times in the same transaction are inconsistent. At the same time, phantom reading still exists.
  3. Repeatable reading: after a transaction reads a record for the first time, even if other transactions modify the value of the record and commit, when the transaction reads the record again, it still reads the value read for the first time, rather than reading different data every time. This is repeatable reading. This isolation level solves the problem of non repeatable, but phantom reading will still occur.
  4. Serialization: in this isolation level, because the operations on the same record are serial, there will be no dirty reading, phantom reading, etc., but this is not a concurrent transaction.

Undo log of MySQL

The underlying layer of mvcc relies on MySQL’s undo log, which records database operations. Because undo log is a logical log, it can be understood that when a record is deleted, the undo log will record a corresponding insert record, and when a record is updated, the undo log will record an opposite update record. When a transaction fails and a rollback operation is required, You can roll back by reading the corresponding contents in the undo log, and mvcc makes use of the undo log.

Implementation principle of mvcc

The implementation of mvcc makes use of the implicit fields of the database, undo log and readview. First, let’s look at the implicit fields. In fact, MySQL implicitly records the DB behind each row of records in the table_ TRX_ ID (recently modified (modified / inserted) transaction ID), DB_ ROLL_ PTR (rollback pointer, pointing to the previous version of this record), DB_ ROW_ ID (self incrementing ID, if the data table has no primary key, it will be clustered and indexed by this ID by default).

Undo logs are divided into two types: insert undo log. The undo log generated when a new record is inserted is only required when the transaction is rolled back and can be discarded immediately after the transaction is committed. There is also update undo log, which is generated when the transaction is updated or deleted; Not only when the transaction is rolled back, but also when the snapshot is read; Therefore, it cannot be deleted casually. Only when the log is not involved in fast read or transaction rollback, the corresponding log will be uniformly cleared by the purge thread. Mvcc uses update undo log.

In fact, undo log records a version chain. Suppose there is a record in the database as follows:

Now, a transaction a modifies the record and changes the name to Tom. The operation flow at this time is as follows:

  • Transaction a first adds an uplink lock to the row record
  • Then copy the record to undo log as an old version
  • After copying, change the name of the row to Tom, and then change the DB of the row_ TRX_ The ID value is changed to the ID of transaction A. at this time, assuming that the ID of transaction a is 1, the DB of this row is changed_ POLL_ PTR points to the record copied to undo log
  • Release the lock after the transaction is committed

The situation is as follows:

At this time, another transaction B modifies the record, changing the age to 28. The operation flow is as follows:

  • Transaction B adds an uplink lock to the line change record
  • Copy the record of this row to the undo log as an old version. At this time, it is found that the undo log has been recorded. Then, a new undo log is inserted as the header of the linked list into the front of the undo log of this row
  • After copying, change the age of the row to 28, and then change the DB of the row_ TRX_ The ID value is changed to the ID of transaction B. at this time, assuming that the ID of transaction B is 2, the DB of this row is changed_ POLL_ PTR points to the record copied to undo log
  • Release lock after transaction commit

The situation is as follows:

From the above, we can see that the modification of the same row of records by different transactions or the same transaction will make the undo log of the row form a version chain. The head of the undo log chain is the latest old record, and the tail of the chain is the earliest old record.

Now let’s assume that transaction a and transaction B are not committed. At this time, a transaction C modifies the record with the name Tom, changes the age to 30, and then commits the transaction. The ID of transaction C is 3. Similarly, a record will be inserted into the undo log. At this time, the undo log version is the DB recorded at the beginning of the chain_ TRX_ ID is 3.

Now there is a transaction D, which queries the record with name Tom. At this time, snapshot reading will be enabled. Snapshot is a data snapshot triggered by the query operation at the beginning of the transaction. Unlocked reads are snapshot reads by default at the repeatable read isolation level. Compared with snapshot reads, there is another one called current reads, and update operations are current reads. During snapshot reading, a read view will be generated. At the moment when the transaction executes snapshot reading, a snapshot of the current database will be generated, and the ID of the currently active transaction will be recorded and maintained. Because the transaction ID is self increasing, the newer the transaction ID is, the larger the transaction ID is. The read view follows the visibility algorithm, and whether it is visible or not requires some judgment. In addition to recording the currently active transaction ID, the read view also records the currently created maximum transaction ID. when reading the snapshot, it needs to be compared with the read view to obtain the visibility result.

Read view mainly compares the ID of the current transaction with the ID of the active transaction in the system. The comparison rules are as follows:

First, in the read view, there will be an array of active transaction IDS in the system generated by the read view, which is temporarily called ID_ list

Then an ID will be recorded in the read view_ The smallest transaction ID in the list is temporarily called low_ id

Finally, the read view will also record a transaction ID that has not been allocated in the system at the generation time of the read view, that is, the current maximum transaction ID + 1, which is temporarily called high_ id

  • If the current transaction ID is less than low_ ID, the current transaction is visible
  • If the current transaction ID is greater than high_ ID, the current transaction is invisible
  • Current transaction is greater than low_ ID less than high_ ID, and then judge whether it is in ID_ If it is in the list, it means that the active transaction has not been committed and the current transaction is not visible, but it is visible for the active transaction itself. If it is not in the ID_ List, the current transaction is visible

If the visibility result is invisible, you need to go through dB_ ROLL_ PTR gets the DB of the record from undo log_ TRX_ ID is compared, and the version chain is traversed until a DB satisfying a specific condition is found_ TRX_ ID, then this DB_ TRX_ The old record where the ID is located is the latest old version that can be seen by the current transaction.

The above is the detailed explanation of the principle of MySQL mvcc mechanism. For more information about the principle of MySQL mvcc mechanism, please pay attention to other relevant articles of developeppaer!

Recommended Today


Supervisor [note] Supervisor – H view supervisor command help Supervisorctl – H view supervisorctl command help Supervisorctl help view the action command of supervisorctl Supervisorctl help any action to view the use of this action 1. Introduction Supervisor is a process control system. Generally speaking, it can monitor your process. If the process exits abnormally, […]