What are the advantages of MySQL using B + tree index

Time:2022-5-9

Before understanding this problem, let’s first take a look at the storage structure of MySQL table, and then compare the differences between binary tree, multi tree, B tree and B + tree.

Storage structure of MySQL

Table storage structure

Unit: Table > segment > area > Page > row

In the database, no matter reading one row or multiple rows, the page where these rows are located is loaded. That is, the basic unit of storage space is page.
A page is the node of a tree B + tree. The minimum unit of database I / O operation is the page. The contents related to the database will be stored in the page structure.

B + tree index structure

  1. In a B + tree, each node is a page. Each time a new node is created, a page space will be applied for
  2. The nodes in the same layer are, and a two-way linked list is formed through the structure of pages
  3. The non leaf node is, which includes multiple index lines. Each index line stores index keys and pointers to the next page
  4. The leaf node is, which stores keywords and row records. There is a one-way linked list between the records inside the node (that is, inside the page structure)

B + tree page node structure

It has the following characteristics

  1. Divide all records into several groups, and each group will store multiple records,
  2. The page directory stores slots, which are equivalent to the index of grouped records. Each slot pointer points to the last record in different groups
  3. We locate the group through the slot, and then view the records in the group

The main function of page is to store records. Records in the page are stored in the form of single linked list.
The advantage of single linked list is that it is easy to insert and delete, but the disadvantage is that the retrieval efficiency is not high. In the worst case, it needs to traverse all nodes of the linked list. Therefore, a binary search method is provided in the page directory to improve the retrieval efficiency of records.

Retrieval process of B + tree

Let’s look at the retrieval process of B + tree

  1. Start from the root of the B + tree and find the leaf node layer by layer.
  2. Find the leaf node as the corresponding data page, load the data leaf into memory, and find a rough record group by binary search through the slot of page directory.
  3. In the grouping, the records are searched through linked list traversal.

Why use B + tree index

The database accesses data through pages. A page is a B + tree node. Accessing a node is equivalent to an I / O operation. Therefore, the faster you can find the node, the better the search performance.
The characteristic of B + tree is that it is short and fat enough, which can effectively reduce the number of visits to nodes, so as to improve the performance.

Next, let’s compare a binary tree, a multitree, a B tree and a B + tree.

Binary tree

Binary tree is a binary search tree with good search performance, which is equivalent to binary search.
But when n is large, the depth of the tree is high. The time of data query mainly depends on the number of disk IO. The deeper the binary tree is, the more times it is searched, and the worse the performance is.
In the worst case, it degenerates into a linked list, as shown below

In order to prevent the binary tree from degenerating into a linked list, people invented AVL tree (balanced binary search tree): the height difference between the left subtree and the right subtree of any node is up to 1

Multitree

A multi tree can have m nodes, which can effectively reduce the height. When the height becomes smaller, there will be fewer nodes and less I / O. the performance is better than that of a binary tree

B tree

B-tree is simply a multi fork tree. Each leaf stores data and a pointer to the next node.

For example, to find 9, the steps are as follows

  1. We compare it with the keyword (17, 35) of the root node. If 9 is less than 17, we get the pointer P1;
  2. Find disk block 2 according to pointer P1. The keyword is (8, 12). Because 9 is between 8 and 12, we get pointer P2;
  3. Find disk block 6 according to pointer P2, and the keyword is (9, 10), and then we find keyword 9.

B + tree

B + tree is an improvement of B tree. In short, only leaf nodes can store data, and non leaf nodes are stored pointers; All leaf nodes form an ordered linked list

The internal node of the B + tree does not have a pointer to the specific information of the keyword, so its internal node is smaller than that of the B tree. If all the keywords of the same internal node are stored in the same disk block, the more keywords the disk block can hold, the more keywords to be searched for when reading into memory at one time, and the relative IO reading and writing times are reduced

For example, to find keyword 16, the steps are as follows

  1. Compare with the keyword (1, 18, 35) of the root node, 16 is between 1 and 18, and get the pointer P1 (pointing to disk block 2)
  2. Find disk block 2 with the keyword (1, 8, 14). Because 16 is greater than 14, you get pointer P3 (pointing to disk block 7)
  3. Find disk block 7 with the keyword (14, 16, 17), and then we find keyword 16, so we can find the data corresponding to keyword 16.

The difference between B + tree and B tree:

  1. The non leaf node of B + tree does not exist, and only the index is stored. The non leaf node of B tree stores data
  2. B + tree query is more efficient. The B + tree uses a two-way linked list to connect all leaf nodes, which makes the interval query more efficient (because all data are in the leaf nodes of the B + tree, you only need to scan the leaf nodes once to scan the database), but the B tree needs to be traversed in order to complete the search of the query range.
  3. B + tree query efficiency is more stable. The B + tree must query the leaf node every time to find the data, and the data queried by the B + tree may not be in the leaf node or in the leaf node, which will lead to the instability of query efficiency
  4. The disk read and write cost of B + tree is less. The internal node of the B + tree does not point to the specific information of the keyword, so its internal node is smaller than that of the B + tree. Generally, the B + tree is shorter and fatter, and the height is small. The query generates less I / O.

This is why MySQL uses the B + tree. It’s so simple!

The above is the details of the advantages of MySQL using B + tree index. For more information about MySQL using B + tree index, please pay attention to other relevant articles of developeppaer!