The principle of transaction persistence in MySQL



When it comes to database transactions, it’s easy for you to have a lot of knowledge about transactions in your mind, such as the acid characteristics of transactions, isolation levels, problems solved (dirty read, non repeatable read, magic read), etc. but few people really know how these characteristics of transactions are realized, and why there are four isolation levels.

We’ve seen that in previous articlesThe principle of transaction isolation in MySQLToday, let’s continue to talk about the implementation principle of MySQL persistence.

Of course, MySQL is extensive and profound, and the article’s omissions are inevitable. We welcome criticism and correction.


MySQL transaction implementation logic is located in the engine layer, and not all engines support transactions. The following description is based on InnoDB engine.

Principle of InnoDB reading and writing data

Before going on, we need to understand how InnoDB reads and writes data. We know that the data of the database is stored in the disk, and then we also know that the cost of disk I / O is very high. If you have to access the disk every time you read and write data, the efficiency of the database will be very low. To solve this problem, InnoDB provides buffer pool as a buffer to access database data.

Buffer pool is located in memory and contains the mapping of some data pages in disk. When the data needs to be read, InnoDB will first try to read from the buffer pool. If it cannot read, it will read from the disk and put it into the buffer pool; When writing data, the buffer pool page will be written first, and such page will be marked as dirty and put on the special flush list. These modified data pages will be refreshed to the disk at a later time (this process is called scrubbing, and is responsible by other background threads). As shown in the figure below:

The principle of transaction persistence in MySQL

The advantage of this design is that it can convert a large number of disk I / O to read and write in memory, and merge the multiple modifications of a page into one I / O operation (brushing the whole page once), so as to avoid accessing the disk every time, thus greatly improving the performance of the database.

Persistence definition

Persistence means that once a transaction is committed, its changes to the database should be permanent, and subsequent operations or failures should not have any impact on the modification of this transaction.

From the previous introduction, we know that InnoDB uses buffer pool to improve the performance of reading and writing. However, the buffer pool is in memory, which is volatile. If a transaction is submitted and MySQL suddenly goes down, and the modified data in the buffer pool has not been refreshed to the disk at this time, the data will be lost, and the persistence of the transaction cannot be guaranteed.

In order to solve this problem, InnoDB introduces redo log to realize the persistence of data modification. When the data is modified, InnoDB will not only modify the data in the buffer pool, but also record the operation in the redo log, and ensure that the redo log drops the disk earlier than the corresponding page (generally when the transaction is submitted), which is commonly known as wal. If MySQL suddenly goes down and does not brush the data back to the disk, after restart, MySQL will restore the data pages that have not been refreshed to the disk through the redo log that has been written to the disk.

Implementation principle: redo log

To improve performance, like data pages, redo log also includes two parts: one is the redo log buffer in memory, which is volatile; the other is the redo log buffer; The second is the redo log file on disk, which is persistent. Redo log is a physical log that records the physical pages in the database.

When the data is modified, InnoDB will not only modify the data in the buffer pool, but also record the operation in the redo log buffer; When a transaction is committed, the redo log buffer is flushed and recorded in the redo log file. If MySQL goes down, you can read the data in the redo log file and recover the database when you restart. In this way, there is no need to clean every transaction in real time.

Write process

The principle of transaction persistence in MySQL


  • First modify the buffer pool, then write the redo log buffer.
  • Redo log is written back to disk before the data page: when a transaction is submitted, the redo log buffer is written to the redo log file. Only when the transaction is successfully written can the submission be considered successful (there are other scenarios that trigger writing, so we won’t expand here). The data of buffer pool is written to disk by the background thread at a later time.
  • When cleaning dirty, you must make sure that the corresponding redo log has been dropped, that is, the so-called wal (pre written log), otherwise there will be the possibility of data loss.


When a transaction is committed, there are three main advantages of writing redo log over direct scrubbing

  • Brushing dirty is random I / O, but writing redo log is sequential I / O. sequential I / O is much faster than random I / O, so it is unnecessary.
  • Scrubbing is based on the data page. Even if a page has only a little modification, the whole page should be written; The redo log only contains the real modified part, so the amount of data is very small, and the invalid IO is greatly reduced.
  • When cleaning dirty data, you may need to clean many pages of data, which cannot guarantee atomicity (for example, if you write only part of the data, you will fail). The redo log buffer writes log block to the redo log file according to 512 bytes, that is, the size of a sector. The sector is the smallest unit of writing, so you can ensure that the writing will be successful.

Write redo log or modify data first

A DML may involve data modification and redo log recording. What is the execution order of them? Some of the articles on the Internet said that the data should be modified first and then the redo log should be recorded. Some said that the redo log should be recorded first and then the data should be changed. What’s the real situation?

First of all, we know from the above description that the redo log buffer will be written to the redo log file when the transaction is submitted, and the scrubbing is at a later time, so we can be sure thatFirst record the redo log, then modify the data page(wal, of course, is written in the log first.).

The next question is whether to write redo log buffer or modify buffer pool first. To understand this problem, we first need to understand the implementation process of a DML in InnoDB. The execution process of a DML involves data modification, locking, unlocking, redo log and undo log, which also needs to ensure atomicity. InnoDB uses MTR (Mini transactions) to ensure atomicity of a DML operation.

First, let’s look at the definition of MTR

An internal phase of InnoDB processing, when making changes at the physical level to internal data structures during DML operations. A Mini-transactions (mtr) has no notion of rollback; multiple Mini-transactionss can occur within a single transaction. Mini-transactionss write information to the redo log that is used during crash recovery. A Mini-transactions can also happen outside the context of a regular transaction, for example during purge processing by background threads.


MTR is a short atomic operation and cannot be rolled back because it is atomic. The change of data page must pass through MTR, which will record the modification of data page by DML operation in redo log.

Let’s take a brief look at the process of MTR

  • When MTR is initialized, an MTR is initialized_ buf
  • When modifying data, while modifying the pages in the memory buffer pool, a redo log record is generated and saved in the MTR_ In buf
  • When executing MTR_ When the commit function submits this MTR, it will_ The redo log record in the buf is updated to the redo log buffer, and the dirty pages are added to the flush list for subsequent flushing. In the log buffer, every time a 496 byte log record is received, the set of log records is packaged with a 12 byte block header and a 4-byte block tailer to form a 512 byte log block, which is convenient to align 512 byte log blocks when disk brushing.

It can be seen that InnoDB modifies the buffer pool first and then writes the redo log buffer.

The process of recovering data

In any case, InnoDB attempts to perform a recovery operation when it starts. During the recovery process, redo log is required. If binlog is also enabled, binlog and undo log are also required. It is possible that the data has been written to the binlog, but the database will crash before the redo log swipes the disk (transaction is a feature of InnoDB engine, and modifying the data does not necessarily commit, while binlog is a feature of MySQL service layer, and modifying the data will be recorded). At this time, the participation of redo log, binlog and undo log is needed to determine whether there are uncommitted transactions, The uncommitted transaction is rolled back or committed.

Let’s talk about the process of recovering data only by using redo log

  • When starting InnoDB, find the location of the latest checkpoint, and use checkpoint LSN to find redo log larger than the LSN for log recovery.
  • If the intermediate recovery fails, it doesn’t affect it. When recovering again, it will continue to recover from the last saved checkpoint.

Recovery process:Failure recovery includes three stages: analysis, redo and undo. The main task of the analysis phase is to confirm the operation range of the subsequent redo and undo phases by using the information in checkpoint and log, correct the dirty page set information recorded in checkpoint through log, and use the minimum LSN involved as the starting position of the next redo, redolsn. At the same time, the active transaction set (uncommitted transaction) recorded in checkpoint is modified as the rollback object of undo process; In the redo phase, starting from the redolsn obtained from analysis, all the redo contents in the log are replayed. Note that uncommitted transactions are also included here; Finally, in the undo phase, all uncommitted transactions are rolled back by using the undo information. Through the prevlsn of the log, the changes that need to be rolled back can be found sequentially.


What is LSN?

LSN, also known as log sequence number, is a monotonically increasing 64 bit unsigned integer. Both redo log and data page hold LSN, which can be used as the basis of data recovery. Larger LSN means that the changes described by the referenced log record occur later.

What is checkpoint?

Checkpoint represents a save point, and the modification of the data page before this point (log LSN < checkpoint LSN) has been written to the disk file. InnoDB records checkpoint after each disk swipe, and records the latest redo log LSN in checkpoint LSN, which is convenient for judging the starting point when recovering data.

Copyright notice

Please indicate the author and the source of the article
Author: Mr. X

If you think it’s good, please help collect it~