MySQL architecture


MySQL basic architecture (the execution process of an SQL statement)

MySQL can be divided intoServer layerandStorage engine tier.
The server layer includes connectors, query caches, analyzers, optimizers, executors, etc. all functions across storage engines are implemented in this layer.
The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in and supports InnoDB (the default storage engine after 5.5), MyISAM, memory and other storage engines.

Connector: it is responsible for user login to the database and user identity authentication, including verifying account password, authority and other operations.

Query Cache: after the client establishes a connection with the server, MySQL will query the cache first when executing the query statement to verify whether the SQL has been executed before. Previously executed statements and their results will be directly cached in memory in the form of key value pairs. (it is not recommended to use query cache. Query cache invalidation is very frequent. As long as a table is updated, all query caches on the table will be cleared. This function has been deleted in MySQL 8.0)

Analyzer: analyze the function of SQL statements, which is mainly divided into the following two steps
1. Lexical analysis and keyword extraction
2. Syntax analysis to judge whether the SQL statement is correct

Optimizer: automatically optimize SQL statements, modify the form of the syntax analysis tree, change the syntax analysis tree into a query tree, and determine the execution scheme.

Executor: executes statements. First, check whether the user has the permission to execute the query. If he has the permission, he will call the engine interface and return the execution result of the interface.

Storage engine InnoDB, MyISAM

There are several common storage engines:
InnoDB engine: the InnoDB engine provides support for database acid transactions. Row level locks and foreign key constraints are also provided. The goal of its design is to deal with the database system with large data capacity.
Myiasm engine: it does not support transactions, row level locks and foreign keys.
Memory engine: all data is in memory. The data processing speed is fast, but the security is not high.

Differences between MyISAM and InnoDB

About full-text indexing: MySQL full-text indexing only supports English before 5.7. After 5.7, an NGram full-text retrieval plug-in is built to support Chinese word segmentation, which is effective for both MyISAM and InnoDB engines.
The author still has questions about how to do it, and it is difficult to give an accurate answer. However, in more cases, we prefer to use elasticsearch and other tools to realize full-text retrieval.

Differences in index implementation under InnoDB and MyISAM

MyISAM index implementation
MyISAM engine uses B + tree as the index structure, and the data field of leaf node stores the address of data record.
MyISAM assigns row numbers in the order of data insertion, starting from 0, and then stores them on disk in the order of data insertion. Because the row is fixed length, you can skip the corresponding bytes from the beginning of the table to find the required row.
The index file of MyISAM only saves the line number of the data record, and then queries the required data back to the table through this line number

In MyISAM, there is no structural difference between the primary index and the secondary index, except that the key of the primary index is required to be unique, and the key of the secondary index can be repeated.

Therefore, the index retrieval algorithm in MyISAM is to first search the index according to the B + tree search algorithm. If the specified key exists, take out the value of its data field, and then take the value of the data field as the address to read the corresponding data record. The index mode of MyISAM is that the index and data storage are separated and non clustered.

InnoDB index implementation
Although InnoDB also uses B + tree as the index structure, the specific implementation is quite different from MyISAM. In InnoDB, the table data file itself is an index structure organized by B + tree. The leaf node data field of the tree saves complete data records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
Each leaf node of the cluster index contains the primary key value, transaction ID, rollback pointer for transaction and mvcc, and all remaining columns.
The leaf node of InnoDB’s secondary index stores not the row number (row pointer), but the primary key column. The disadvantage of this strategy is that the secondary index needs two index lookups, the first time to find the primary key in the secondary index, and the second time to find the required data rows through the primary key in the cluster index.

To sum up, both include the implementation of non clustered indexes, but the InnoDB engine supports clustered indexes.

Why do InnoDB and MyISAM engines use B + tree as index

Comparison between B-tree and B + tree:

  1. B tree is only suitable for random retrieval, while B + tree supports both random retrieval and sequential retrieval
  2. B + tree space utilization is higher, which can reduce I / O times and lower disk read-write cost.
    Generally speaking, the index itself is too large to be stored in memory, so the index is often stored on disk in the form of index file. In this way, disk I / O consumption will occur in the process of index lookup.
    The internal node of the B + tree does not have a pointer to the specific information of the keyword, but is only used as an index. Its internal node is smaller than that of the B tree. The disk block can accommodate more keywords in the node, and the more keywords can be found in the memory at one time. Relatively, the IO reading and writing times are reduced. The number of IO reads and writes is the biggest factor affecting the efficiency of index retrieval.
  3. The query efficiency of B + tree is more stable
    B-tree search may end at non leaf nodes. The closer to the root node, the shorter the search time of records. As long as the keyword is found, the existence of records can be determined. Its performance is equivalent to a binary search in the complete set of keywords.
    In B + tree, sequential retrieval is obvious. In random retrieval, the search of any keyword must go from root node to leaf node. The search path length of all keywords is the same, resulting in the same query efficiency of each keyword.
  4. B-tree not only improves the disk IO performance, but also does not solve the problem of low efficiency of element traversal.
    The leaf nodes of the B + tree are connected together in sequence by pointers. As long as the leaf nodes are traversed, the traversal of the whole tree can be realized, but the B tree does not support such an operation.
  5. When adding and deleting files (nodes), the B + tree is more efficient.
    Because the leaf node of B + tree contains all keywords and is stored in an orderly linked list structure, it can improve the efficiency of addition and deletion.

Comparison between B + tree and hash index:
The bottom implementation of B + tree is multi-channel balanced lookup tree, and the bottom of hash index is hash table.

  1. Hash index is faster to perform equivalent query, but it can not perform range query, and the equivalent query is unstable and the performance is unpredictable
    When a key value exists and repeats a lot, hash collision occurs, and the hash indexing efficiency may be very poor this time
    The query efficiency of B + tree is relatively stable. All queries are from root node to leaf node, and the height of the tree is low
  2. Hash index does not support sorting with index
  3. Hash index does not support fuzzy query and leftmost prefix matching of multi column index
  4. Hash index can not avoid back to table query at any time, and B + tree can complete query only through index when it meets some conditions

B + tree vs. binary tree:
Binary tree when the amount of data in the database is particularly large, the number of layers is also particularly large. The number of disk IO is determined by the height of the tree. The binary tree is not used to compress the height of the tree and reduce the number of disk IO.

Basic principles of indexing

The principle of index is very simple, which is to turn unordered data into ordered query
Sort the contents of the indexed column
Generate inverted table for sorting results
Spell the data address chain on the contents of the inverted table
When querying, first get the contents of the inverted table, and then take out the data address chain to get the specific data

InnoDB logical storage structure

From the logical storage structure of InnoDB, all data are logically stored in one space, which is called a tablespace

A table space consists of segments, extents, and pages
Pages are sometimes called blocks in some documents
Segment and area are logical concepts, and segment management is completed internally by InnoDB storage engine.

Table space: table space can be regarded as the highest level of the logical structure of InnoDB storage engine. All data is stored in table space
By default, InnoDB has a shared tablespace ibdata1, in which all data is stored

Segment: a table space is composed of multiple segments. Common segments include data segment, index segment, rollback segment, etc
InnoDB is organized by index, so data is index and index is data
The data segment is the leaf node of the B + tree
The index segment is the non leaf node of the B + tree

Area: an area is a space composed of consecutive pages. In any case, the size of each area is 1MB. In order to ensure the continuity of pages in a zone, the InnoDB storage engine requests 4 ~ 5 zones from the disk at a time, that is, there are 64 consecutive pages in a zone,

Row (page): the smallest unit of InnoDB storage disk management. The page size is 16kb by default. The records stored on each page are also hard defined. It is allowed to store up to 16kb / 2-200 rows of records, that is, 7992 rows of records

InnoDB crash recovery principle

reference material: following is only the author’s personal summary, and the specific implementation principle refers to the original text.

Basic knowledge

LSN: the amount of redo logs generated since the database was created. The larger this value is, the more the database is updated. It can also be understood as the update time. In addition, there is also an LSN on each data page, indicating the LSN when it was last modified. The larger the value, the later it was modified.
For example, the LSN of data page a is 100, the LSN of data page B is 200, the checkpoint LSN is 150, and the system LSN is 300, indicating that the current system has been updated to 300, and the data pages less than 150 have been brushed to the disk. Therefore, the latest data of data page a must be on the disk, while data page B may not be, and may still be in memory.

Redo log: Modern databases need to write redo logs. For example, to modify a piece of data, first write the redo log, and then write the data. After writing the redo log, it will directly return success to the client.
In this way, although the disk was written once more in the past, the performance has been greatly improved due to the conversion of random writes (write data) to sequential writes (write redo logs).
After the database hangs, you can scan the redo log to find the data pages that have not been flushed (the data pages may have been modified in memory before the crash, but there is no time to write to the disk) to ensure that the data is not lost.

Undo log: the database also provides a function similar to undo. When you find some wrong data to be modified, you can use the rollback instruction to rollback the previous operation. This function needs to be supported by Undo logs.
In addition, modern relational databases have implemented a mechanism similar to mvcc to improve concurrency (for the same record, the reading of different threads does not conflict, the reading and writing do not conflict, and only writing at the same time conflicts). In InnoDB, this also depends on undo logs.
In order to achieve unified management, unlike redo logs, undo logs have corresponding data pages in bufferpool, which are managed together with ordinary data pages. According to LRU rules, they will also be eliminated from memory and read from disk later. Like ordinary data pages, redo logs need to be written first to modify undo pages.

Checkpoint: in order to improve the performance of the database, the data page will not be brushed to the disk every time after memory modification.
The data pages before the checkpoint are guaranteed to be dropped, so the previous logs are useless (because the InnoDB redolog logs are recycled, this part of the logs can be overwritten). The data pages after the checkpoint may or may not be dropped,
Therefore, the log after checkpoint still needs to be used during crash recovery. InnoDB will regularly promote checkpoints according to the refresh of dirty pages, so as to reduce the time of database crash recovery. The checkpoint information is in the header of the first log file.

Crash recovery: the user has modified the data and received a successful message. However, for the database, the modified data may not be on the disk at this time. If the database hangs at this time, after restarting, the database needs to retrieve the modified data from the log and write it to the disk again to ensure that the user’s data is not lost.
This process of retrieving data from logs is the main task of crash recovery, which can also be called database roll forward.
Of course, in crash recovery, you also need to roll back uncommitted transactions and commit uncommitted transactions. Because the rollback operation needs the support of undo logs, and the integrity and reliability of undo logs need to be guaranteed by redo logs, the crash recovery first rolls forward redo, and then rolls back undo.

InnoDB crash recovery process

  1. Redo log roll forward database
    Rolling forward the database is mainly divided into two stages. The first is the log scanning stage, which is based on the space of the data page_ ID and page_ No distribute redo logs to hash_ Table to ensure that the logs of the same data page are distributed to the same hash bucket and sorted from small to large according to the LSN size.
    After scanning, the entire hash table is traversed, and the logs of each data page are applied in turn. After application, the state of the data page is restored to at least the state before the crash.

  2. Redo log parsing

  • First, through block_ The relationship between NO and LSN and the log checksum determine whether the last log is read. If the last log is read, it returns. (as mentioned earlier, even if the database is shut down normally, the crash recovery logic must be followed, so it will be returned here, because the checkpoint value of normal shutdown must point to the end of the log.)
    Otherwise, the log will be put into a recv_ In sys – > buf, some control information and checksum values are stored in the log header. They are only used for verification and positioning, and are not useful in real applications.
  • From recv_ Parse logs in sys – > buf. There are two types of logs: single_ Rec and multi_ Rec, the former means that only one operation is performed on one data page, and the latter means that multiple operations are performed on one or more data pages.
  • After parsing the corresponding log, press space_ ID and page_ No to hash (if the corresponding table space does not exist in memory, it means that the table has been deleted) and put it into the hash_ In the table (the real storage location of the log is still in the buffer pool), waiting for subsequent applications.
  1. Redo log application
    The main function is to traverse the hash_ Table, read from the disk, and apply the logs in the hash bucket for each data page in turn.
    After the redo roll forward database is executed, all data pages of the database have been in a consistent state, and the undo rollback database can be executed safely.
    Because there may be some uncommitted transactions or committed transactions when the database crashes, you need to decide whether to commit at this time.
    It is mainly divided into three steps. First, scan the undo log and re-establish the undo log linked list. Then, rebuild the transaction before the crash according to the linked list established in the previous step, that is, restore the transaction state at that time. Finally, rollback or commit is performed according to different transaction states.

  2. Undo log rollback database
    Step 1: traverse the entire undo log space. If an undo segment is found to be non empty, initialize the segment. If an undo slot is found to be non empty, initialize the slot. Then put different types of undo logs into different linked lists.
    Undo logs are mainly divided into two types: TRX_ UNDO_ Insert and TRX_ UNDO_ UPDATE。 The former is mainly used for insert operations, while the latter is used for update and delete operations.
    Undo logs are used for transaction rollback and mvcc snapshot reading.
    Due to TRX_ UNDO_ The insert data does not need to be provided to other threads, so TRX can be deleted as long as the transaction is committed_ UNDO_ Undo logs of type insert.
    TRX_ UNDO_ The update cannot be deleted after the transaction is committed. You need to ensure that there is no snapshot to use it before it can be cleaned up through the background purge thread.
    Step 2: in step 1, undo has been established in memory_ insert_ List and undo_ update_ List (each undo segment of the linked list is independent), so this step only needs to traverse all the linked lists to rebuild the transaction state.
    If the undo log status is TRX_ UNDO_ Active, the transaction status is TRX_ Active, if the status of undo log is TRX_ UNDO_ Prepared, the status of the transaction is TRX_ PREPARED。
    After the transaction is rebuilt, it is added to TRX according to the transaction ID_ sys->trx_ List.
    Step 3: count the total number of rows of data that need to be rolled back for all transactions that need to be rolled back (the transaction status is trx_active) and output it to the error log.
    The operation of the third step is called in two places. One is in recv_ recovery_ from_ checkpoint_ Finish, the other is in recv_ recovery_ rollback_ Active.
    The former mainly rolls back the operation on the data dictionary, that is, the operation of rolling back DDL statements, and the latter rolls back DML statements. The former must be completed before the database can provide services, and the latter can continue after the database provides services (i.e. crash recovery is completed) (processed by opening a new background thread trx_rollback_or clean_all_recovered).
    Because InnoDB believes that the data dictionary is the most important and must be rolled back to a consistent state. The data in the user table can be a little slower and can be recovered slowly after providing services to the outside world. Therefore, we often find that the database has been started, and then the rollback transaction information is constantly printed in the error log.

MySQL lock mechanism

MySQL locks can be divided into row locks, table locks and page locks according to lock granularity
Row lock:
Row level lock is the lock with the smallest locking granularity in mysql, which means that it is locked only for the rows of the current operation.
Row level locking can greatly reduce the conflict of database operations. Its locking granularity is the smallest, but the locking overhead is also the largest. Deadlock may occur.
Row level locks are divided into shared locks and exclusive locks according to the use mode.

A shared lock allows a transaction to read data and not modify data. If other transactions want to lock this row, only a shared lock can be added. That is, shared locks allow multiple threads to acquire a lock at the same time, and a lock can be owned by multiple threads at the same time.
An exclusive lock is a lock added when modifying data. Data can be read and modified. Once a transaction locks the data in this row, other transactions cannot add task locks to the data. That is, a lock can only be occupied by one thread at a certain time, and other threads must wait for the lock to be released before they can obtain the lock.

Table lock:
Table level locks are the most granular of MySQL locks, which means that the current operation locks the whole table. The resource overhead is less than row locks, and there will be no deadlock, but the probability of lock conflict is very high.
Supported by most MySQL engines, both MyISAM and InnoDB support table level locks, but InnoDB defaults to row level locks.

Page lock:
Page level lock is a kind of lock with locking granularity between row level lock and table level lock in MySQL. Table level locks are fast but have many conflicts, while row level locks have few conflicts but are slow.
Therefore, the compromised page level is taken to lock a group of adjacent records at a time. BDB supports page level locks.

Optimistic lock & pessimistic lock
The task of concurrency control in database management system (DBMS) is to ensure that when multiple transactions access the same data in the database at the same time, it does not destroy the isolation and unity of transactions and the unity of database.
Optimistic concurrency control (optimistic lock) and pessimistic concurrency control (pessimistic lock) are the main technical means of concurrency control
Optimistic concurrency control and pessimistic concurrency control should be narrowly understood as concepts in DBMS, and they should not be confused with the locking mechanisms provided in data (row lock, table lock, exclusive lock and shared lock). In fact, in DBMS, pessimistic locking is realized by using the locking mechanism provided by the database itself.

Pessimistic lock
Pessimistic concurrency control (pessimistic lock) is a method of concurrency control. It can prevent a transaction from modifying data in a way that affects other users.
If the operation performed by a transaction applies a lock to its row data, other transactions can perform operations that conflict with the lock only when the transaction releases the lock.
Pessimistic concurrency control is mainly used in the environment of fierce data contention and the environment where the cost of using locks to protect data in case of concurrency conflict is lower than the cost of rolling back transactions.

The specific process of locking is as follows:
Try to put an exclusive lock on any record before modifying it.
If locking fails, it indicates that the record is being modified. The current query may have to wait or throw an exception. The specific response method is determined by the developer according to the actual needs.
If you lock successfully, you can modify the record and unlock it after the transaction is completed.
If there are other operations to modify or add an exclusive lock to the record, we will wait for us to unlock or throw an exception directly.

Advantages and disadvantages of pessimistic lock:
Pessimistic locking actually adopts the strategy of “taking the lock first before accessing”, which provides a guarantee for the security of data processing.
However, in terms of efficiency, the additional locking mechanism generates additional overhead and increases the chance of deadlock.
And reduce the concurrency; When a transaction has a row of data, other things must wait for the transaction to be committed before operating this row of data.

Optimistic lock
Optimistic concurrency control (optimistic lock) is a method of concurrency control. It assumes that multi-user concurrent transactions will not affect each other during processing, and each transaction can process the affected part of data without locks.
Before committing data updates, each transaction will first check whether other transactions have modified the data after the transaction reads the data. If other transactions have updates, the committing transaction will be rolled back.

Compared with pessimistic lock, optimistic lock assumes that data will not cause conflict under normal circumstances, so it will formally detect the conflict of data when submitting and updating data. If a conflict is found, it will return the user’s error information and let the user decide how to do it.
Optimistic locking does not use the locking mechanism provided by the database when processing the database. The general way to implement optimistic locking is to record the data version.

Data version: a version ID added to the data. When reading the data, read out the value of the version ID together, and update the version ID every time the data is updated.
When we submit the update, we judge that the current version information of the corresponding record of the database table is compared with the version ID obtained for the first time. If the current version number of the database table is equal to the version ID value obtained for the first time, it will be updated, otherwise it will be considered as expired data.

Advantages and disadvantages of optimistic lock:
Optimistic concurrency control believes that the probability of data race between transactions is relatively small, so it should be done directly as far as possible and locked until the time of submission, so no locks and deadlocks will be generated.
However, if you simply do this directly, you may encounter unexpected results. For example, two transactions read a row in the database and write it back to the database after modification. At this time, you encounter a problem.

Characteristics of InnoDB lock

  1. When querying without index conditions, InnoDB does use table locks.
  2. Since the row lock of MySQL is a lock for the index, not for the record, although the records in different rows are accessed, if the same index key is used, the lock conflict will occur.
  3. When a table has multiple indexes, different transactions can use different indexes to lock different rows. In addition, InnoDB will use row locks to lock data whether using primary key indexes, unique indexes or ordinary indexes
  4. Even if the index field is used in the condition, whether to use the index to retrieve data is determined by MySQL by judging the cost of different execution plans. If MySQL thinks that full table scanning is more efficient, for example, for some small tables, it will not use the index
    In this case, InnoDB will use table locks instead of row locks.


The full name of mvcc is: multi version concurrency control, which provides concurrent access to the database and processes the memory read in the transaction to avoid the concurrency of write operations blocking read operations.
For a database that supports mvcc, when updating some data, it does not overwrite the old data with the new data, but marks the old data as outdated, and adds a data version elsewhere. Therefore, multiple versions of the same data are stored, but only one is the latest.
Mvcc provides a processing idea of time consistency. When reading transactions under mvcc, you usually use a timestamp or transaction ID to determine which state database and which version of data to access. Read transactions and write transactions are isolated from each other and will not affect each other. Assuming that the same data has both read transaction access and write transaction operation, in fact, the write transaction will create a new data version, while the read transaction accesses the old data version. The read transaction will not access the new data version until the write transaction is committed.
Mvcc has two implementations. The first implementation is to save multiple versions of data records in the database. When these different versions of data are no longer needed, the garbage collector will recycle these records. This method is adopted by PostgreSQL and Firebird / InterBase. SQL server uses a similar mechanism. The difference is that the old version data is not saved in the database, but in another database tempdb different from the main database. The second implementation method only saves the latest version of data in the database, but will dynamically reconstruct the old version of data when using undo. This method is used by Oracle and MySQL / InnoDB.

Mvcc implementation mechanism of InnoDB

Mvcc can be considered as a variant of row level lock. It can avoid locking operation in many cases, so the overhead is lower. Most mvcc implementations implement non blocking read operations, and write operations only lock the necessary rows. The mvcc implementation of InnoDB is realized by saving a snapshot of data at a certain point in time. No matter how long a transaction is executed, the data seen inside it is consistent. That is, transactions will not affect each other during execution. Let’s briefly describe the implementation of mvcc in InnoDB.

InnoDB’s mvcc is realized by saving two hidden columns behind each row of records: the creation time of the saved row and the expiration time (deletion time) of the saved row. Of course, the time here is not a timestamp, but the system version number. The system version number will increase every time a new transaction starts. At RR isolation level, mvcc operates as follows:

Select operation. InnoDB only finds data rows whose version is earlier than (including) the current transaction version. You can ensure that the rows read by the transaction are either records that exist before the transaction starts, or records inserted or modified by the transaction itself. The deleted version of the row is either undefined or greater than the current transaction version number. You can ensure that the rows read by the transaction are not deleted before the transaction starts.
Insert operation. Save the newly inserted row with the current version number as the row version number.
Delete operation. Save the deleted line with the current version number as the deletion ID.
Update operation. It is a combination of insert and delete operations. The current version number of the insert line is saved as the line version number, and delete saves the current version number to the original line as the deletion ID.
Because the old data is not really deleted, it must be cleaned up. InnoDB will start a background thread to clean up. The specific rule is to delete the rows whose deleted version number is less than the current system version. This process is called purge.

Recommended Today


Supervisor [note] Supervisor – H view supervisor command help Supervisorctl – H view supervisorctl command help Supervisorctl help view the action command of supervisorctl Supervisorctl help any action to view the use of this action 1. Introduction Supervisor is a process control system. Generally speaking, it can monitor your process. If the process exits abnormally, […]