By design: why does MySQL use B+ trees?



Every back-end engineer has probably been asked in an interview: “What is MySQL’s default storage engine? What data structure do MySQL indexes use?” Anyone who has prepared (and memorized the standard interview answers) can easily reply: “The default storage engine is InnoDB, and MySQL indexes use B+ trees.” But why did the engineers who wrote MySQL design it this way?


First, to be precise, MySQL itself is not tied to the B+ tree. What is tied to the B+ tree is MySQL’s default storage engine, InnoDB. A storage engine’s main job in MySQL is to store and retrieve data. Besides InnoDB, MySQL also supports other engines, such as MyISAM, as the underlying storage engine for tables. In both InnoDB and MyISAM, the index data structure is a B+ tree. The difference is that InnoDB uses a clustered index, so the actual row data sits in the leaf nodes of the B+ tree, whereas MyISAM builds a separate index for the table’s primary key, and its leaf nodes hold pointers to the actual data.

Next, let’s discuss why MySQL uses B+ trees.

1、 Start with disk I/O

1. Basic concept of disk

Let’s go back to the time when MySQL was being designed. For storage media there are two choices: the mechanical hard disk and the SSD. SSDs are fast but expensive, and a database has to be able to store terabytes of data. Even looking at it from 2021, putting all of that on SSDs costs too much, so the mechanical disk it is.

[Figure: structure of a traditional hard disk]

The figure above shows the structure of a traditional hard disk. It has one or more platters, and each platter can store data on both sides. A spindle runs through the middle, and all platters rotate around it. An actuator assembly carries several arms, each with a head at its tip that reads and writes data.

[Figure: tracks on a platter surface]

As shown in the figure above, each platter surface is divided into many narrow concentric rings on which the data is stored. These rings are called tracks. Depending on the disk’s specifications, the number of tracks ranges from hundreds to thousands. Each track can hold several KB of data, but the computer does not need to read or write that much every time.

Therefore, each track is further divided into arc segments, and each segment is a sector. The sector is the physical storage unit of the hard disk. By long-standing industry convention, each sector stores 512 bytes. In other words, even if the computer needs only one byte, it has to read the entire 512-byte sector into memory and then pick out the byte it wants.

[Figure: cylinders of a hard disk]

The cylinder is an abstract, logical concept. Put simply, the tracks at the same radius on every platter surface together form a cylinder, as shown in the figure above.

Note that the disk reads and writes data cylinder by cylinder. The heads start at the first cylinder that holds data and work through the different surfaces of that cylinder in turn. Only after all the heads on the same cylinder have finished does the head assembly move to the next cylinder. The reason is that selecting a head (choosing which surface to read from) only requires electronic switching, while selecting a cylinder requires mechanical movement of the heads, and mechanical switching is far slower than electronic switching. So, to read and write faster, data is accessed by cylinder rather than by platter surface, and this is why it pays to store related data on the same cylinder.
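To make the cylinder-first ordering concrete, here is a minimal Python sketch of the classic mapping from a CHS address to a linear block number. The geometry values (16 heads, 63 sectors per track) are assumptions chosen for illustration; they do not come from this article.

```python
HEADS_PER_CYLINDER = 16   # assumed geometry, for illustration only
SECTORS_PER_TRACK = 63    # assumed geometry, for illustration only

def chs_to_lba(cylinder, head, sector):
    """Map a CHS address to a linear block number (sectors are numbered from 1)."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)

# Block numbers grow along a track first, then across heads, then across cylinders.
first_block = chs_to_lba(0, 0, 1)
next_head = chs_to_lba(0, 1, 1)
next_cylinder = chs_to_lba(1, 0, 1)
print(first_block, next_head, next_cylinder)
```

Consecutive block numbers fill one track, then the track under the next head of the same cylinder, and only then move on to the next cylinder, which matches the read and write order described above.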

From the above, we get the formula for disk capacity:

Hard disk capacity = number of heads (recording surfaces) × number of cylinders × number of sectors per track × 512 bytes
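As a quick worked example of the capacity formula, the sketch below counts one head per recording surface and multiplies the terms out. The geometry passed in (16 heads, 1024 cylinders, 63 sectors per track) is an assumed example, not a figure from this article.

```python
SECTOR_BYTES = 512  # sector size used throughout the article

def chs_capacity_bytes(heads, cylinders, sectors_per_track):
    """Capacity = heads x cylinders x sectors per track x bytes per sector."""
    return heads * cylinders * sectors_per_track * SECTOR_BYTES

# Assumed example geometry.
capacity = chs_capacity_bytes(heads=16, cylinders=1024, sectors_per_track=63)
print(capacity, "bytes =", capacity // (1024 * 1024), "MiB")
```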

2. Disk reads and writes

Traditional hard disks locate data by CHS (cylinder, head, sector) addressing. A disk read or write can be split into three parts.

  • When the hard disk reads data, the read-write head first moves radially to the track that contains the target sector. The time this mechanical movement takes is called the seek time. Because the distance between the head’s starting position and the target track varies, the seek time varies too.
  • Once the head is over the right track, the platter rotates until the target sector passes under the head. This time is called the rotational latency.
  • Reading or writing the data itself also takes time, which is called the transfer time.

From this description it is easy to see that seek time and rotational latency account for most of the cost. Computers are always pushed to be faster, and a database depends on the computer’s storage, so the MySQL designers had to take the characteristics of disk reads and writes into account and design a data structure that makes queries faster.
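The three parts can be added up in a small sketch. The only derived quantity is the average rotational latency, which is half a revolution. The seek time, spindle speed and transfer time used in the example call are assumed values for illustration, not measurements.

```python
def avg_rotational_latency_ms(rpm):
    """On average the target sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def access_time_ms(seek_ms, rpm, transfer_ms):
    """Total time for one access = seek + rotational latency + transfer."""
    return seek_ms + avg_rotational_latency_ms(rpm) + transfer_ms

# Assumed example: 9 ms seek, 7200 rpm spindle, 0.1 ms transfer.
print(round(avg_rotational_latency_ms(7200), 3))
print(round(access_time_ms(9.0, 7200, 0.1), 3))
```

With numbers like these, the two mechanical terms dwarf the transfer term, which is the point the paragraph above makes.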

3. Sequential I/O vs random I/O

As we all know, what database software such as MySQL does boils down to saving data and querying data. Querying depends on how the data was saved, so the way data is stored inevitably affects query speed. Because the data lives on disk, the computer has to move it between disk and memory, and that means disk I/O. Disk I/O can be divided into two types according to how the disk is accessed:

  • Sequential (continuous) I/O: the starting sector address of this I/O is exactly adjacent to, or not far from, the ending sector address of the previous I/O.
  • Random I/O: the starting sector address of this I/O is far from the ending sector address of the previous I/O.

With sequential I/O, the head hardly needs to change tracks, or the track change is very short. With random I/O, a large number of requests keeps the head jumping between tracks, which greatly reduces efficiency. This is why sequential I/O is more efficient than random I/O.
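The two definitions can be expressed as a small classifier. This is only a sketch: the `max_gap` threshold that decides what counts as "not far" is an assumed parameter, and the request list is made-up example data.

```python
def classify_ios(requests, max_gap=8):
    """Label each (start_sector, sector_count) request as sequential or random.

    A request is sequential when its start sector lies within max_gap sectors
    of the end of the previous request. The first request has no predecessor,
    so it is treated as random here.
    """
    labels = []
    prev_end = None
    for start, count in requests:
        if prev_end is not None and abs(start - prev_end) <= max_gap:
            labels.append("sequential")
        else:
            labels.append("random")
        prev_end = start + count
    return labels

# Made-up request stream: three adjacent reads, a far jump, a near read, a jump back.
requests = [(0, 8), (8, 8), (16, 8), (5000, 8), (5010, 8), (120, 8)]
print(classify_ios(requests))
```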

Reads depend on how the data was stored, and queries usually carry conditions, so the data a query needs is often not contiguous. So the MySQL designers asked: can we design a storage layout that avoids random I/O, or at least reduces the number of random I/Os, to make queries faster?

2、 Faster lookup: trees

As a programmer you have certainly met the term “tree” (what, you haven’t? Go stand in the corner). Trees come up all the time in algorithm problems. Common kinds of trees include:

  • Binary search (sort) tree: BST
  • Balanced binary search tree: BBST
  • Red-black tree: RBT
  • B-tree (also written “B tree”)
  • B+ tree
  • B* tree
  • R-tree

This article will not go through the characteristics of each tree; a later article will cover them in detail. Since this article is meant to be accessible to everyone, let’s start with how searching a tree works.

1. Binary search (sort) tree

Everyone has heard of the binary tree. It has a root node, and under the root there is a left child and a right child, each of which can in turn be the root of a subtree. Add a few requirements and it becomes a binary search tree (BST), defined as follows:

  • If the left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node;
  • If the right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node;
  • The left and right subtrees are themselves binary search trees.

[Figure: a binary search tree with root node 5]

In the figure above, every node in the left subtree of root node 5 has a value less than 5, and every node in its right subtree has a value greater than 5. If we take 2 or 7 as the root of a subtree, the same conclusion holds: all values in the left subtree are less than the root’s value, and all values in the right subtree are greater.

Given these properties, suppose we want to find the value 3. The search path is:

  • 3 is less than the root node 5, so go to the left subtree; the current node becomes 2, the root of the left subtree;
  • 3 is greater than 2, so go to the right subtree; the current node becomes 3, the root of that right subtree;
  • 3 equals the current node 3, so the target is found.

This search path shows that we do not need to visit every node, and searching a binary search tree needs no extra space. Compared with scanning every element, finding a specific value this way is far more efficient. And it is not limited to numbers: character encodings such as ASCII and Unicode (UTF-8) give characters an ordering too, so the same method shortens lookups for strings as well as numbers.
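The lookup of 3 described above can be sketched in a few lines of Python. The tree here is built only from the keys the text mentions (5, 2, 7 and 3); any other nodes in the original figure are not reproduced, and the class and function names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # duplicate keys are ignored

def search(root, key):
    """Return (keys visited on the way down, whether key was found)."""
    path = []
    node = root
    while node is not None:
        path.append(node.key)
        if key == node.key:
            return path, True
        node = node.left if key < node.key else node.right
    return path, False

root = None
for k in [5, 2, 7, 3]:
    root = insert(root, k)

print(search(root, 3))
```

The same code works unchanged for strings, because Python compares them by code point, which is the ordering the paragraph above refers to.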

2. B-tree (B tree)

A binary search tree speeds up queries, but there is a problem. A database has to handle tens of millions of rows. With that much data, a binary search tree becomes very, very tall. Also, data is generally written in order, that is, with sequential I/O. But the data a query looks for is usually not laid out in that order, so queries involve random I/O to bring data into memory for comparison. Each step down a binary tree is often one random I/O, so if the tree is too tall, there are too many random I/Os and query efficiency drops.

At this point a clever reader will think: if we make the tree “short and fat” and so reduce the random I/O, won’t queries be faster?
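A rough estimate shows how much "short and fat" helps. The sketch below counts how many levels are needed before a tree with a given fanout can distinguish a given number of keys. It is a back-of-the-envelope model, not how any real engine sizes its trees, and the fanout of 1000 is an assumed value.

```python
def levels_needed(num_keys, fanout):
    """Smallest number of levels L such that fanout ** L >= num_keys."""
    levels, reachable = 0, 1
    while reachable < num_keys:
        reachable *= fanout
        levels += 1
    return levels

ten_million = 10_000_000
print(levels_needed(ten_million, 2))     # binary tree
print(levels_needed(ten_million, 1000))  # assumed fanout of 1000 per node
```

If each level costs roughly one random I/O, going from a fanout of 2 to a fanout in the hundreds or thousands cuts the I/O count per lookup from dozens to a handful.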

Enter the B-tree. (“B-tree” and “B tree” are the same thing; the hyphen is not a minus sign, and a B-tree is not a B+ tree.) A B-tree of order m (a node has at most m children) is defined as follows:

  • Each node has at most m child nodes.
  • Each non-leaf node (except the root) has at least ceil(m / 2) child nodes (at least 2 for order 3, at least 3 for order 5, and so on).
  • If the root is not a leaf node, it has at least two child nodes.
  • A non-leaf node with k children contains k - 1 keys.
  • All leaf nodes are on the same level; a leaf node holds only keys, and its child pointers are null.

Note: ceil rounds the result of a division up to the next integer. For example, 7 / 6 is about 1.17, so ceil(7 / 6) = 2.
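The limits in the definition above can be computed directly from the order m. The helper below is a small sketch; the function name and the dictionary keys are illustrative, and the limits apply to non-root internal nodes (the root is allowed as few as two children).

```python
import math

def btree_limits(order):
    """Child and key limits for a non-root internal node of a B-tree of the given order."""
    min_children = math.ceil(order / 2)
    return {
        "max_children": order,
        "min_children": min_children,
        "max_keys": order - 1,          # a node with k children holds k - 1 keys
        "min_keys": min_children - 1,
    }

for m in (3, 4, 5):
    print(m, btree_limits(m))
```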

[Figure: a B-tree of order 4]

The figure above shows a B-tree of order 4. To find the value 19, we follow this path:

  • 19 < 24, and the root has only the one key 24, so go straight to the left subtree A;
  • Within A, is 19 < 5? No, so subtree B is ruled out;
  • Is 5 < 19 < 13? No, so subtree C is ruled out;
  • Is 13 < 19 < 17? No, so subtree D is ruled out;
  • Is 17 < 19? Yes, so descend into subtree E;
  • E is a leaf node, so its child pointers are null. Check whether 19 is in leaf E: it is, so the value is found. If 19 were not in leaf E, the value would not exist in the tree.
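The walk described above can be sketched as follows. Only the keys named in the text (24 in the root, 5, 13 and 17 in node A, and 19 in leaf E) come from the article; every other key in this tree is a made-up placeholder, and the class and function names are illustrative. The sketch uses a binary search inside each node instead of the left-to-right comparisons in the text, but it picks the same child.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys stored in this node
        self.children = children or []   # empty list for a leaf

def btree_search(node, target):
    """Return (found, list of key lists of the nodes visited)."""
    path = []
    while True:
        path.append(node.keys)
        i = bisect_left(node.keys, target)   # first key >= target
        if i < len(node.keys) and node.keys[i] == target:
            return True, path
        if not node.children:                # leaf reached without a match
            return False, path
        node = node.children[i]              # child between keys[i - 1] and keys[i]

# Leaves B..E hang under A; all keys except 19 are placeholders.
b, c, d, e = BTreeNode([1, 3]), BTreeNode([8, 10]), BTreeNode([15]), BTreeNode([19, 21])
a = BTreeNode([5, 13, 17], [b, c, d, e])
right = BTreeNode([30], [BTreeNode([26, 28]), BTreeNode([35])])  # hypothetical right subtree
root = BTreeNode([24], [a, right])

print(btree_search(root, 19))
```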

As you can see, with a B-tree MySQL can pack more data into a tree that stays “short and fat”, while keeping the search efficiency of a binary search tree. If a B-tree is used as the index, the actual data for each key is stored in whichever node holds that key, internal or leaf.

3. B+ tree

If the B-tree already makes queries faster, why doesn’t MySQL use it as the index structure? Because the B+ tree is an upgraded version of the B-tree (more for the same price, so why not use it). Compared with the B-tree, the B+ tree is defined as follows:

  • The leaf nodes contain all the keys together with the actual data for those keys. The keys in the leaves are linked in ascending order: each leaf stores a pointer to the start of the leaf on its right.
  • All non-leaf nodes act purely as an index. A non-leaf node contains only the largest (or smallest) key of each of its subtrees and does not point to actual data.
  • An internal node with k subtrees contains k elements (k - 1 in a B-tree). These elements store no data and serve only as an index; all data is stored in the leaf nodes.
  • The largest element in the root equals the largest element of the whole B+ tree. No matter how many elements are inserted or deleted later, the largest element is always kept in the root.

[Figure: a B+ tree]

The figure above shows a B+ tree. This design gives us the following:

  • Because internal nodes hold no row data, a single node holds more elements, so the tree is even shorter and fatter and a query needs fewer I/Os;
  • Because all data sits in the bottom-level leaf nodes, every lookup descends to a leaf, so query performance is more stable;
  • Each leaf node points to the leaf on its right, so the leaves form an ordered linked list, which makes range queries easy.
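The effect of the leaf-level linked list on range queries can be seen in a toy model. This is a deliberately simplified sketch, not InnoDB's structure: it has a single index level holding the largest key of each leaf, it is built once from sorted data, and it does not support inserts or deletes. All names and the sample rows are made up.

```python
from bisect import bisect_left

class Leaf:
    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None   # pointer to the leaf on the right

class TinyBPlusTree:
    """Toy two-level tree: one index node over a chain of linked leaves."""

    def __init__(self, items, leaf_size=3):
        items = sorted(items)
        self.leaves = []
        for i in range(0, len(items), leaf_size):
            chunk = items[i:i + leaf_size]
            leaf = Leaf([k for k, _ in chunk], [v for _, v in chunk])
            if self.leaves:
                self.leaves[-1].next = leaf
            self.leaves.append(leaf)
        # The index holds only keys (the largest key of each leaf), no row data.
        self.index = [leaf.keys[-1] for leaf in self.leaves]

    def range_query(self, low, high):
        """Return all (key, value) pairs with low <= key <= high, in key order."""
        pos = bisect_left(self.index, low)   # first leaf that can contain low
        if pos == len(self.leaves):
            return []
        result = []
        leaf = self.leaves[pos]
        while leaf is not None:              # walk right along the leaf chain
            for key, value in zip(leaf.keys, leaf.values):
                if key > high:
                    return result
                if key >= low:
                    result.append((key, value))
            leaf = leaf.next
        return result

tree = TinyBPlusTree([(k, f"row{k}") for k in range(1, 11)])
print(tree.range_query(5, 8))
```

The query descends through the index once to find the starting leaf and then only follows `next` pointers, which is why a range scan does not need to climb back up the tree.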

4. Why not use hash

From the discussion above, if a B+ tree is used for MySQL’s data storage, the time complexity of a lookup is O(log n), proportional to the height of the tree. With a hash, looking up a specific value can be O(1). So why didn’t the MySQL designers go with that? Consider the following SQL:

SELECT * FROM class WHERE teacher = 'yuann' ORDER BY id DESC
SELECT * FROM class WHERE student_number > 50

Both statements involve sorting or a range condition. A hash index finds its target by computing a hash, and the result is a single point. An index built on hashing therefore cannot handle sorting or range queries efficiently: the query falls back to a full table scan that checks each row against the condition. A full table scan is clearly a bad outcome, so the MySQL designers did not choose hashing as the default index structure.
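The difference shows up even in a small in-memory model. In the sketch below a Python dict stands in for a hash index and a sorted list stands in for an ordered index such as a B+ tree; neither is how MySQL implements its indexes, and the table contents are made up. The condition mirrors `student_number > 50` from the SQL above.

```python
from bisect import bisect_right

# Made-up rows: class id -> student_number.
classes = {1: 30, 2: 55, 3: 48, 4: 62, 5: 50, 6: 51}

# "Hash index": student_number -> list of class ids. Buckets have no useful order.
hash_index = {}
for class_id, n in classes.items():
    hash_index.setdefault(n, []).append(class_id)

# "Ordered index": (student_number, class id) pairs kept sorted.
sorted_index = sorted((n, class_id) for class_id, n in classes.items())

def range_via_hash(threshold):
    """Every key must be examined, because hashing destroys key order."""
    examined, ids = 0, []
    for n, class_ids in hash_index.items():
        examined += 1
        if n > threshold:
            ids.extend(class_ids)
    return sorted(ids), examined

def range_via_sorted(threshold):
    """Binary-search to the first entry above threshold, then read to the end."""
    start = bisect_right(sorted_index, (threshold, float("inf")))
    matches = sorted_index[start:]
    return sorted(class_id for _, class_id in matches), len(matches)

print(range_via_hash(50))
print(range_via_sorted(50))
```

Both functions return the same ids, but the hash version has to look at every key while the ordered version touches only the matching entries.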

3、 Summary: why MySQL indexes use B+ trees

  • Reading or writing disk data involves three stages: head seek, platter rotation and data transfer. The first two are especially time-consuming, so reducing the number of head movements saves a lot of time. Each head movement corresponds roughly to one step down from a node to a child in a B-tree-family structure. Because these trees keep many keys in each node, the tree is lower and queries are faster.
  • Because all data in a B+ tree is stored in the leaf nodes, query performance is more stable than with a B-tree.
  • Range queries and sorting are very common in databases. Unlike a B-tree, a B+ tree only has to walk its linked leaf nodes for a range query, which reduces random I/O. A hash index, by contrast, falls back to a full table scan for range queries and sorting, which is very inefficient.


This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. When reprinting, please cite the original link. When using a picture, please keep its contents intact (it may be scaled appropriately) and attach the link to the article containing the picture where you quote it. The pictures are drawn with sigma.

Original author: yuann

Original link:It’s design — why does MySQL index use B + tree?

Release date: April 15, 2021
