B-tree and B+ tree

Time:2021-7-23

Reference: Detailed explanation of B-tree and B+ tree

Preface

The B+ tree is a storage structure that is often used for database indexes.

Prerequisites

Tree of order m: the maximum branching factor of the tree is m, that is, a node has at most m child nodes;

Root node: the node without a parent node;

Leaf node: a node without child nodes;

Internal node: a node that is neither the root nor a leaf node;

Binary search tree: every value in the left subtree is smaller than the value of the root node, and every value in the right subtree is larger than the value of the root node;

Balanced binary tree: a special case of the binary search tree in which the heights of the left and right subtrees of every node differ by at most 1;

1. B-tree

A B-tree is a balanced multi-way search tree. A B-tree of order m has the following characteristics:

  1. Every internal node must have at least ceil(m / 2) child nodes;
  2. When the root node is not a leaf node, it must have at least two child nodes;
  3. A node with k child nodes contains k - 1 keys, so a node holds at most m - 1 keys;
  4. All leaf nodes are at the same depth;
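
The four rules above can be turned into a small checker. The following is a minimal sketch, not a full B-tree implementation: the `BTreeNode` class and the `check_btree` function are hypothetical names introduced here for illustration, and the checker only verifies the structural rules listed above (it does not verify that keys are correctly ordered across subtrees).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    # Hypothetical node layout: sorted keys plus child pointers.
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def check_btree(root: BTreeNode, m: int) -> bool:
    """Return True if the tree satisfies the order-m structural rules."""
    leaf_depths = set()

    def visit(node: BTreeNode, depth: int, is_root: bool) -> bool:
        # Keys inside a node are sorted, and there are at most m - 1 of them.
        if node.keys != sorted(node.keys) or len(node.keys) > m - 1:
            return False
        if node.is_leaf:
            leaf_depths.add(depth)
            return True
        # A node with k children holds k - 1 keys.
        if len(node.children) != len(node.keys) + 1:
            return False
        # The root needs at least 2 children, other internal nodes ceil(m / 2).
        min_children = 2 if is_root else math.ceil(m / 2)
        if not (min_children <= len(node.children) <= m):
            return False
        return all(visit(c, depth + 1, False) for c in node.children)

    # All leaves must sit at the same depth.
    return visit(root, 0, True) and len(leaf_depths) == 1


# Usage: an order-3 tree with one root key and two leaves.
good = BTreeNode([10], [BTreeNode([5]), BTreeNode([15, 20])])
print(check_btree(good, m=3))
```

A tree whose leaves sit at different depths fails the final check even when every individual node looks valid.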

2. B+ tree

The B+ tree is a variant of the B-tree with some changes to its rules. The changes are as follows:

  1. A leaf node consists of an ordered array and a pointer to the leaf node on its right;
  2. A non-leaf node also consists of an ordered array, but each array element consists of an index value and a pointer:

    1. Pointer: points to a child node (a leaf node, on the lowest non-leaf level);
    2. Index value: the smallest index value in the node pointed to;
  3. A non-leaf node is a helper node, used only to locate the required leaf node quickly. Only the leaf nodes store the real data (a row of data);
  4. The leaf nodes, linked from left to right, behave like an ordered linked list;
  5. A node of order m contains up to m entries;
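
The layout described above can be sketched as two node types and a point lookup. This is a minimal illustration under the rules listed here, not a production index: the class names `Leaf` and `Internal` and the functions `find_leaf` and `lookup` are hypothetical, and insertion and node splitting are left out.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]                 # ordered array of keys
    rows: List[Any]                 # the real data, one row per key
    next: Optional["Leaf"] = None   # pointer to the leaf on the right


@dataclass
class Internal:
    keys: List[int]                              # keys[i] = smallest key under children[i]
    children: List[Union["Internal", Leaf]]      # one pointer per index value


def find_leaf(node: Union[Internal, Leaf], key: int) -> Leaf:
    """Descend from the root to the leaf that would contain `key`."""
    while isinstance(node, Internal):
        # Pick the last child whose smallest key is <= key
        # (clamped to the first child for keys below every index value).
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node


def lookup(root: Union[Internal, Leaf], key: int) -> Optional[Any]:
    """Return the row stored under `key`, or None if it is absent."""
    leaf = find_leaf(root, key)
    for k, row in zip(leaf.keys, leaf.rows):
        if k == key:
            return row
    return None


# Usage: two linked leaves under one non-leaf node.
right = Leaf([5, 7], ["row5", "row7"])
left = Leaf([1, 3], ["row1", "row3"], next=right)
root = Internal(keys=[1, 5], children=[left, right])
print(lookup(root, 5))
```

Note that the non-leaf node stores no rows at all; it only routes the search to the correct leaf.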

2.1 Why is the B+ tree suitable for databases?

  1. The B+ tree is convenient for range queries, which is the most important point.

You only need to locate the left end of the range. From there you traverse the leaf nodes to the right until you reach the right end of the range, which yields all the data in the range.

A range search in a B-tree requires an in-order traversal of the tree, while a B+ tree only has to walk the linked list of leaf nodes;
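
The leaf walk described above can be sketched as follows. This is a minimal sketch with assumed names: the `Leaf` class and `range_scan` function are hypothetical, and the descent from the root that locates the starting leaf is omitted, so the caller passes that leaf in directly.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]                 # ordered keys in this leaf
    rows: List[str]                 # one row per key
    next: Optional["Leaf"] = None   # pointer to the leaf on the right


def range_scan(start: Optional[Leaf], lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield (key, row) for lo <= key <= hi, walking leaves left to right.

    `start` is assumed to be the leaf that contains the left end of the range.
    """
    leaf = start
    while leaf is not None:
        for k, row in zip(leaf.keys, leaf.rows):
            if k > hi:
                return          # passed the right end of the range: stop
            if k >= lo:
                yield k, row
        leaf = leaf.next        # follow the sibling pointer


# Usage: three linked leaves, scan the keys 3..9.
c = Leaf([9, 11], ["r9", "r11"])
b = Leaf([5, 7], ["r5", "r7"], next=c)
a = Leaf([1, 3], ["r1", "r3"], next=b)
print(list(range_scan(a, 3, 9)))
```

The scan never goes back up to a non-leaf node; once the first leaf is found, only sibling pointers are followed.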

  2. The disk read and write cost of a B+ tree is lower.

The internal nodes of a B+ tree hold no pointers to the actual record data for their keys, so they are smaller than the internal nodes of a B-tree. If all the keys of one internal node are stored in the same disk block, that block can hold more keys, so more of the keys needed for a search are read into memory at one time. The number of I/O reads and writes is reduced accordingly;
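
A rough calculation makes the effect concrete. All sizes below are assumptions chosen for illustration (a 4096-byte disk block, 8-byte keys, 8-byte pointers, 100 bytes of record data stored next to each key in the B-tree case); real databases use different values, and the model ignores block headers and partially filled nodes.

```python
# Assumed sizes, for illustration only.
PAGE_SIZE = 4096      # bytes per disk block
KEY_SIZE = 8          # bytes per key
POINTER_SIZE = 8      # bytes per child pointer
ROW_SIZE = 100        # bytes of record data kept with each key in a B-tree node


def entries_per_page(entry_size: int, page_size: int = PAGE_SIZE) -> int:
    """How many fixed-size entries fit in one disk block."""
    return page_size // entry_size


def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest number of levels whose total fan-out covers n_keys."""
    levels, capacity = 1, fanout
    while capacity < n_keys:
        capacity *= fanout
        levels += 1
    return levels


# B+ tree internal entry: key + pointer only.
bplus_fanout = entries_per_page(KEY_SIZE + POINTER_SIZE)              # 256
# B-tree internal entry: key + pointer + record data.
btree_fanout = entries_per_page(KEY_SIZE + POINTER_SIZE + ROW_SIZE)   # 35

print(bplus_fanout, levels_needed(1_000_000, bplus_fanout))   # 256 3
print(btree_fanout, levels_needed(1_000_000, btree_fanout))   # 35 4
```

Under these assumptions the smaller internal entries let the B+ tree reach one million keys with one level fewer, which means one block read fewer per lookup.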

  3. The query efficiency of a B+ tree is more stable.

Non-leaf nodes do not point to the record content; they only index the keys stored in the leaf nodes. Any key lookup therefore has to follow a path from the root node to a leaf node. Every lookup has the same path length, so every record is queried with the same efficiency;
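
The difference in path length can be shown by counting the nodes touched per lookup. The sketch below uses plain dicts as stand-ins for nodes; the tree contents and the helper names `bplus_nodes_visited` and `btree_nodes_visited` are hypothetical and exist only for this comparison.

```python
from bisect import bisect_left, bisect_right

# Hypothetical B+ tree: the non-leaf node only routes, rows live in the leaves.
bplus_root = {
    "keys": [1, 5],
    "children": [
        {"keys": [1, 3], "children": None},
        {"keys": [5, 7], "children": None},
    ],
}

# Hypothetical B-tree: the root key 5 carries its own record.
btree_root = {
    "keys": [5],
    "children": [
        {"keys": [1, 3], "children": None},
        {"keys": [7, 9], "children": None},
    ],
}


def bplus_nodes_visited(node: dict, key: int) -> int:
    """A B+ tree lookup always descends to a leaf."""
    visited = 1
    while node["children"] is not None:
        i = max(bisect_right(node["keys"], key) - 1, 0)
        node = node["children"][i]
        visited += 1
    return visited


def btree_nodes_visited(node: dict, key: int) -> int:
    """A B-tree lookup stops as soon as the key is found in any node."""
    visited = 1
    while key not in node["keys"] and node["children"] is not None:
        node = node["children"][bisect_left(node["keys"], key)]
        visited += 1
    return visited


print([bplus_nodes_visited(bplus_root, k) for k in (1, 3, 5, 7)])  # [2, 2, 2, 2]
print([btree_nodes_visited(btree_root, k) for k in (1, 5, 7)])     # [2, 1, 2]
```

In the B-tree the cost depends on where the key happens to sit, while in the B+ tree it equals the height of the tree for every key.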