Data structures, trees (3): multiway search trees, B-trees, and B+ trees


Multiway search tree

  1. Height of a complete binary tree: O(log_2 n), where 2 is the base of the logarithm.
  2. Height of a complete m-way search tree: O(log_m n), where m is the base of the logarithm and also the maximum number of children per node.
  3. An m-way search tree is mainly used when the data set is too large to fit in memory. Giving each node more children and storing more keys in each node lets each level hold more data, which lowers the height of the tree and reduces the number of disk accesses per search.
  4. The more children and keys each node holds, the lower the tree, but the slower it is to locate a key within a single node. Because a B-tree targets the disk I/O bottleneck, the in-memory cost of searching within one node is small enough to ignore.
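
To make the height comparison concrete, here is a minimal sketch that counts how many levels of branching are needed to tell n items apart for a given fan-out. The function name `levels_needed` and the sample sizes are illustrative assumptions, not part of the original text.

```python
def levels_needed(n_items: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= n_items.

    Uses integer arithmetic instead of math.log to avoid float rounding.
    """
    if n_items < 1 or fanout < 2:
        raise ValueError("need n_items >= 1 and fanout >= 2")
    levels, reach = 0, 1
    while reach < n_items:
        reach *= fanout
        levels += 1
    return levels


# One million items: a binary tree versus a 100-way tree.
print(levels_needed(1_000_000, 2))    # 20
print(levels_needed(1_000_000, 100))  # 3
```

If each level costs one disk access, the wider tree needs roughly 3 accesses where the binary tree needs about 20.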

B-tree

A B-tree is a kind of m-way search tree. It addresses the problem that an unbalanced m-way search tree grows taller than necessary, much as a binary search tree can degrade into a linked list. A B-tree keeps the m-way search tree balanced by adjusting nodes as the data changes: splitting nodes, merging nodes, and, when a level is full, splitting parent nodes upward to add a new level. The specific rules are as follows:

  1. The root has between 2 and M children (unless it is itself a leaf), and every other non-leaf node has between ⌈M/2⌉ and M children. If a split leaves a parent with more than M children, the parent is split in turn, and splitting continues upward until it reaches a node that does not need to be split, or the root. If the root itself is split, the result is two nodes, so a new root is created with these two nodes as its children, and the height of the tree increases by 1.
  2. The keys in each non-leaf node increase from left to right, and the i-th key is the smallest key in subtree i + 1. A node with k children stores k - 1 keys, so i runs from 1 to k - 1, where k is between 2 and M for the root and between ⌈M/2⌉ and M for other non-leaf nodes.
  3. All data items are stored in the leaf nodes. Non-leaf nodes store no data, only the keys that direct the search, that is, the index. This keeps non-leaf nodes small, so more of them fit in memory, which speeds up searching.
  4. All leaf nodes are at the same depth, and each leaf node holds between ⌈L/2⌉ and L data items.
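
The search rule implied by these properties can be sketched as follows. This is a minimal in-memory illustration, not a disk-based implementation; the class names `Internal` and `Leaf` and the sample tree are assumptions made for the example. Because key i is the smallest key in subtree i + 1, the child to follow is the one whose index equals the number of keys less than or equal to the search key.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Leaf:
    items: dict  # key -> data item (hypothetical representation)


@dataclass
class Internal:
    keys: list      # k - 1 sorted keys; keys[i] is the smallest key in children[i + 1]
    children: list  # k children (Internal or Leaf)


def search(node, key):
    """Descend from the root to a leaf and return the data item, or None."""
    while isinstance(node, Internal):
        # Number of keys <= key is exactly the index of the child to follow.
        node = node.children[bisect_right(node.keys, key)]
    return node.items.get(key)


root = Internal(
    keys=[10, 20],
    children=[
        Leaf({1: "a", 5: "b"}),
        Leaf({10: "c", 15: "d"}),
        Leaf({20: "e", 30: "f"}),
    ],
)
print(search(root, 15))  # d
print(search(root, 7))   # None
```

On disk, each step of the loop would correspond to reading one block, which is why the height of the tree determines the cost of a search.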

Choosing M and L

  1. M is the order of the B-tree, that is, the maximum number of children (paths) per node.
  2. L is the maximum number of data items stored in each leaf node.
  3. In a B-tree, each node occupies one disk block, so M and L are determined by the size of the disk block.

Disk block size and M calculation

  1. Each non-leaf node stores keys and pointers to its subtrees. In a B-tree of order M, each non-leaf node stores up to M - 1 keys and M pointers. Suppose each key is 8 bytes (for example, Java's long type is 8 bytes) and each pointer is 4 bytes. Then each non-leaf node of an order-M B-tree needs 8 * (M - 1) + 4 * M = 12M - 8 bytes.
  2. If each non-leaf node (disk block) may occupy no more than 8 KB, that is, 8192 bytes, then 12M - 8 <= 8192 gives a maximum M of 683: 12 * 683 - 8 = 8188 bytes, whereas M = 684 would need 8200 bytes.
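
The calculation of M can be checked with a few lines of code. This is a sketch under the sizes assumed in the text (8-byte keys, 4-byte pointers); the function names `node_bytes` and `max_order` are made up for the example.

```python
def node_bytes(m: int, key_bytes: int = 8, pointer_bytes: int = 4) -> int:
    """Bytes used by a non-leaf node holding m - 1 keys and m pointers."""
    return key_bytes * (m - 1) + pointer_bytes * m


def max_order(block_bytes: int, key_bytes: int = 8, pointer_bytes: int = 4) -> int:
    """Largest m for which node_bytes(m) still fits in one block."""
    # key*(m - 1) + ptr*m <= block  <=>  m <= (block + key) / (key + ptr)
    return (block_bytes + key_bytes) // (key_bytes + pointer_bytes)


m = max_order(8192)
print(m)                  # 683
print(node_bytes(m))      # 8188, fits in 8192
print(node_bytes(m + 1))  # 8200, does not fit
```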

Number of leaf node data items L

  1. If each data item is 256 bytes and the disk block size is again 8 KB (8192 bytes), each leaf node can store at most 8192 / 256 = 32 data items, so L = 32 and each leaf holds between 16 and 32 items.
  2. As a smaller example, take a B-tree of order 5 with M = L = 5: each non-leaf node contains up to 4 keys (M - 1 = 5 - 1 = 4) and up to 5 pointers to subtrees, and each leaf node stores up to 5 data items.
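
To see what these values of M and L mean for capacity, the following sketch computes L from the block size and then the largest and smallest number of data items a tree with a given number of non-leaf levels can hold, using the child-count and leaf-size rules above. The function names are hypothetical, and the bounds assume at least one non-leaf level.

```python
def max_leaf_items(block_bytes: int, item_bytes: int) -> int:
    """L: how many whole data items fit in one leaf block."""
    return block_bytes // item_bytes


def max_items(m: int, l: int, levels: int) -> int:
    """Most items when every non-leaf node has m children and every leaf is full."""
    return m ** levels * l


def min_items(m: int, l: int, levels: int) -> int:
    """Fewest items: root has 2 children, other non-leaf nodes ceil(m/2),
    and every leaf holds ceil(l/2) items. Requires levels >= 1."""
    half_m = (m + 1) // 2  # ceil(m / 2)
    half_l = (l + 1) // 2  # ceil(l / 2)
    return 2 * half_m ** (levels - 1) * half_l


print(max_leaf_items(8192, 256))  # 32
print(max_items(5, 5, 2))         # 125
print(min_items(5, 5, 2))         # 18
print(max_items(683, 32, 2))      # 14927648
```

With M = 683 and L = 32, two non-leaf levels are already enough for nearly 15 million data items when the tree is full.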


B+ tree

The structure of a B+ tree is basically the same as that of the B-tree described above. The difference is that the leaf nodes of a B+ tree are connected by pointers to form a linked list. This makes it convenient to traverse the leaves in order, whether to read all data items or to find all items whose keys fall within a given range. MySQL's InnoDB storage engine uses B+ trees for its indexes.
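
A range query over the linked leaves can be sketched like this. The `LinkedLeaf` class and the sample data are assumptions for illustration; a real implementation would first descend from the root to the leaf containing the lower bound, whereas this sketch simply starts from the first leaf.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedLeaf:
    keys: list    # sorted keys in this leaf
    values: list  # data items, parallel to keys
    next: Optional["LinkedLeaf"] = None  # pointer to the next leaf in key order


def range_scan(start_leaf, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by walking the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return  # keys are sorted, so nothing further can match
            if k >= lo:
                yield k, v
        leaf = leaf.next


# Three leaves chained left to right.
third = LinkedLeaf([20, 30], ["e", "f"])
second = LinkedLeaf([10, 15], ["c", "d"], third)
first = LinkedLeaf([1, 5], ["a", "b"], second)

print(list(range_scan(first, 5, 20)))
# [(5, 'b'), (10, 'c'), (15, 'd'), (20, 'e')]
```

The scan never returns to a non-leaf node once it has reached the leaf level, which is what makes range queries cheap in this layout.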
