Explain in detail the implementation principle of transaction persistence in MySQL



When it comes to database transactions, it must be easy to jump out of your mind a pile of transaction related knowledge, such as acid characteristics of transactions, isolation levels, problems to be solved (dirty reading, non repeatable reading, phantom reading), etc., but few people may really know how these characteristics of transactions are realized and why there are four isolation levels.

We will continue to understand the principle of MySQL transaction isolation before we talk about the implementation of MySQL.

Of course, MySQL is broad and profound, and omissions in the article are inevitable. Criticism and correction are welcome.


The transaction implementation logic of MySQL is located in the engine layer, and not all engines support transactions. The following instructions are based on the InnoDB engine.

InnoDB data reading and writing principle

Before moving on, we need to understand how InnoDB reads and writes data. We know that the data of the database is stored in the disk, and then we also know that the cost of disk I / O is very high. If you have to access the disk every time you read and write data, the efficiency of the database will be very low. To solve this problem, InnoDB provides buffer pool as a buffer for accessing database data.

Buffer pool is located in memory and contains the mapping of some data pages in the disk. When data needs to be read, InnoDB will first try to read from the buffer pool. If it cannot be read, it will read from the disk and put it into the buffer pool; When writing data, the page of buffer pool will be written first, marked dirty, and put on the special flush list. These modified data pages will be refreshed to the disk at a later time (this process is called scrubbing, which is the responsibility of other background threads). As shown in the figure below:

The advantage of this design is that a large number of disk I / O can be converted into memory reading and writing, and the multiple modifications to a page can be merged into one I / O operation (brushing dirty once into the whole page), so as to avoid accessing the disk every time, which greatly improves the performance of the database.

Persistence definition

Persistence means that once a transaction is committed, its changes to the database should be permanent. Other subsequent operations or failures should not have any impact on the modification of this transaction.

Through the previous introduction, we know that InnoDB uses buffer pool # to improve the performance of reading and writing. However, the buffer pool is in memory and volatile. If MySQL suddenly goes down after a transaction is committed, and the modified data in the buffer pool has not been refreshed to the disk, the data will be lost and the durability of the transaction cannot be guaranteed.

In order to solve this problem, InnoDB introduces redo log to realize the persistence of data modification. When modifying the data, InnoDB will not only modify the data in the buffer pool, but also record the operation in the redo log, and ensure that the redo log falls on the disk earlier than the corresponding page (usually when the transaction is submitted), which is often called wal. If MySQL suddenly goes down and does not brush the data back to the disk, after restarting, MySQL will restore the data page that has not been refreshed to the disk through the redo log written to the disk.

Implementation principle: redo log

In order to improve performance, like data pages, redo log also includes two parts: one is the redo log buffer in memory, which is volatile; The second is the redo log file on the disk, which is persistent. Redo log is a physical log that records the physical pages in the database.

When the data is modified, InnoDB will not only modify the data in the buffer pool, but also record the operation in the redo log buffer; When the transaction is committed, the redo log buffer will be flushed and recorded in the redo log file. If MySQL goes down, you can read the data in redo log file and recover the database when restarting. In this way, you don’t need to brush dirty in real time every time you commit a transaction.

Write process


  • First modify the buffer pool, and then write the redo log buffer.
  • The redo log is written back to the disk before the data page: when the transaction is committed, the redo log buffer will be written to the redo log file, and the submission will be successful only if the writing is successful (there are other scenarios that trigger the writing, which will not be expanded here), while the data of the buffer pool will be written to the disk by the background thread at a later time.
  • When cleaning, you must ensure that the corresponding redo log has been dropped, that is, the so-called wal (pre written log), otherwise there will be the possibility of data loss.


When a transaction is committed, writing redo log has three main advantages over directly brushing dirty:

Brushing dirty is random I / O, but writing redo log is sequential I / O. sequential I / O is much faster than random I / O, so it is unnecessary.
Dirty brushing is based on the data page. Even if a page is only slightly modified, the whole page should be written; The redo log contains only the parts that are really modified. The amount of data is very small, and the invalid IO is greatly reduced.
For example, when the data is successfully written to the buffer, it is necessary to write to the buffer in the smallest size. For example, when the data is successfully written to the buffer, it is impossible to write to the buffer in one sector.

Write redo log first or modify data first

A DML may involve data modification and redo log recording. What is their execution order? Some articles on the Internet say to modify the data first and then record the redo log. Others say to record the redo log first and then change the data. What’s the real situation?

First of all, we know from the above description that the redo log buffer will write the redo log file when the transaction is committed, and the dirty brushing is at a later time. Therefore, we can be sure to record the redo log first and then modify the data page (wal, of course, writes the log first).

The next question is whether to write redo log buffer or modify buffer pool first. To understand this problem, we first need to understand the execution process of a DML in InnoDB. The execution process of a DML involves data modification, locking, unlocking, redo log recording and undo log recording, which also need to ensure the atomicity, while InnoDB ensures the atomicity of a DML operation through MTR (Mini transactions).

First, let’s look at the definition of MTR:

An internal phase of InnoDB processing, when making changes at the physical level to internal data structures during DML operations. A Mini-transactions (mtr) has no notion of rollback; multiple Mini-transactionss can occur within a single transaction. Mini-transactionss write information to the redo log that is used during crash recovery. A Mini-transactions can also happen outside the context of a regular transaction, for example during purge processing by background threads. See https://dev.mysql.com/doc/refman/8.0/en/glossary.html

MTR is a short atomic operation and cannot be rolled back because it is atomic in itself. The change of data page must be through MTR, which will record the modification of data page by DML operation in redo log.

Let’s take a brief look at the process of MTR:

  • When MTR is initialized, a copy of MTR will be initialized_ buf
  • When modifying data, redo log record will be generated while modifying the pages in the memory buffer pool and saved in MTR_ BUF.
  • Executing MTR_ When the commit function submits this MTR, it will_ The redo log record in the buffer is updated to the redo log buffer, and the dirty page is added to the flush list for subsequent dirty brushing. In the log buffer, every time a 496 byte log record is received, the group of log records will be packaged with a 12 byte block header and a 4-byte block tail to form a 512 byte log block, which is convenient for aligning 512 bytes when swiping the disk.

It can be seen that InnoDB first modifies the buffer pool and then writes the redo log buffer.

Process of recovering data

In any case, InnoDB will attempt to perform the recovery operation when it starts. Redo log is required to participate in the recovery process. If binlog is also enabled, binlog and undo log are also required to participate. It is possible that the data has been written to binlog, but the database crashed before the redo log was flushed (transaction is the feature of InnoDB engine, and the modified data is not necessarily submitted, while binlog is the feature of MySQL service layer, and the modified data will be recorded). At this time, the participation of redo log, binlog and undo log is needed to judge whether there are uncommitted transactions, Roll back or commit the uncommitted transaction.

Let’s briefly describe the process of recovering data only using redo log:

  • When starting InnoDB, find the location of the last checkpoint and use the checkpoint LSN to find the redo log larger than the LSN for log recovery.
  • If the intermediate recovery fails, it will not be affected. When recovering again, it will continue to recover from the location of the checkpoint successfully saved last time.

Recovery process: fault recovery includes three stages: analysis, redo and undo. The main task of the analysis phase is to use the information in the checkpoint and log to confirm the operation scope of the subsequent redo and undo phases, correct the dirty page set information recorded in the checkpoint through the log, and use the least LSN position involved as the starting position of the next redo redo redolsn. At the same time, modify the active transaction set (uncommitted transaction) recorded in the checkpoint as the rollback object of undo process; In the redo phase, the redo content in all logs is replayed from the redolsn obtained by analysis. Note that uncommitted transactions are also included here; In the last undo phase, all uncommitted transactions are rolled back by using undo information. Through the prevlsn of the log, the modifications of the transaction that need to be rolled back can be found in sequence. See for details http://catkang.github.io/2019/01/16/crash-recovery.html

What is LSN?

LSN, that is, log sequence number, or log sequence number, is a monotonically increasing 64 bit unsigned integer. Both redo log and data page save LSN, which can be used as the basis for data recovery. A larger LSN indicates that the changes described by the referenced log record occur later.

What is checkpoint?

Checkpoint indicates a save point. The modification of the data page before this point (log LSN < checkpoint LSN) has been written to the disk file. InnoDB will record the checkpoint every time it swipes the disk, and record the latest redo log LSN into the checkpoint LSN to facilitate the judgment of the starting point when recovering data.

The above is a detailed explanation of the implementation principle of transaction persistence in MySQL. For more information about the persistence of MySQL transactions, please pay attention to other relevant articles of developeppaer!