B-Tree & B+Tree

Time:2021-3-7

B-Tree

The maximum number of children that any node of a B-tree may have is called the order of the B-tree, usually denoted M. For search efficiency, M >= 3 is generally required. A B-tree of order M is either an empty tree or an M-ary tree satisfying the following conditions.
1) Each node has at most M branches (subtrees). The minimum number of branches depends on the node: a root that is not a leaf has at least two branches, and every node that is neither the root nor a leaf has at least ceil(M / 2) branches, where ceil denotes rounding up.
2) A node with n - 1 keywords has n branches. The n - 1 keywords are arranged in ascending order.
3) The structure of each node is as follows:

n     k1    k2    ...   kn
p0    p1    p2    ...   pn

Here n is the number of keywords in the node. ki (1 <= i <= n) is a keyword of the node, with ki < ki+1. pi is a child pointer of the node: every keyword in the subtree pointed to by pi (1 <= i < n) is greater than ki and less than ki+1, every keyword in the subtree pointed to by p0 is less than k1, and every keyword in the subtree pointed to by pn is greater than kn.

4) The keywords in the node are not equal to each other and are arranged from small to large.
5) All leaf nodes are on the same level. A leaf can be represented by a null pointer: it is the position a search reaches when it fails.
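
As a rough illustration of the conditions above, the following Python sketch models a node and a lookup. The names BTreeNode, search and branch_bounds are invented for this example and do not come from any library. One simplifying assumption: the lowest level of key-holding nodes is modelled with an empty child list instead of explicit null-pointer leaves.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from math import ceil
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    keys: List[int] = field(default_factory=list)               # k1 < k2 < ... < kn
    children: List["BTreeNode"] = field(default_factory=list)   # p0 .. pn, empty at the bottom

    @property
    def is_bottom(self) -> bool:
        # Assumption of this sketch: a node with no children sits on the last level.
        return not self.children


def search(node: Optional[BTreeNode], key: int) -> bool:
    """Return True if key is stored somewhere in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)         # first index with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True                         # in a B-tree a hit can occur at any level
        if node.is_bottom:
            return False                        # the "null pointer" case: search fails
        node = node.children[i]                 # descend into the subtree between keys[i-1] and keys[i]
    return False


def branch_bounds(m: int, is_root: bool) -> Tuple[int, int]:
    """(min, max) number of branches for a non-leaf node of an order-m B-tree."""
    return (2 if is_root else ceil(m / 2), m)


# A small order-3 tree: each node holds 1 or 2 keys.
root = BTreeNode(keys=[20], children=[BTreeNode(keys=[5, 10]), BTreeNode(keys=[30])])
```

With this tree, search(root, 20) succeeds at the root itself, search(root, 10) succeeds one level down, and search(root, 15) falls off the bottom and fails.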

B+ Tree

The B+ tree is a variant of the B-tree and is also a multiway search tree. It differs from the B-tree in the following ways:

1. A non-leaf node has the same number of subtree pointers as keywords;

2. The subtree pointer P[i] of a non-leaf node points to the subtree whose key values fall in [K[i], K[i+1]) (in a B-tree the interval is open);

3. All leaf nodes are linked together by a chain pointer;

4. All keywords appear in the leaf nodes;
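
The four points above can be sketched in Python roughly as follows. The class and function names (Leaf, Internal, find_leaf, lookup) and the string payloads are assumptions made for this illustration. The sketch follows the variant described here, in which an internal node has as many keys as children and child i covers [K[i], K[i+1]).

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]                  # every keyword and its data live in a leaf
    next: Optional["Leaf"] = None      # chain pointer to the right sibling


@dataclass
class Internal:
    keys: List[int]                                # K[i] is the lower bound of children[i]
    children: List[Union["Internal", Leaf]]        # same count as keys


def find_leaf(node: Union[Internal, Leaf], key: int) -> Optional[Leaf]:
    """Descend to the leaf whose range contains key; None if key is below every K[0]."""
    while isinstance(node, Internal):
        i = bisect_right(node.keys, key) - 1       # last i with K[i] <= key
        if i < 0:
            return None
        node = node.children[i]
    return node


def lookup(root: Union[Internal, Leaf], key: int) -> Optional[str]:
    """Even if key equals an internal key, the answer is always read from a leaf."""
    leaf = find_leaf(root, key)
    if leaf is None or key not in leaf.keys:
        return None
    return leaf.values[leaf.keys.index(key)]


# Three chained leaves under one internal node.
l2 = Leaf([12, 15], ["e", "f"])
l1 = Leaf([7, 9], ["c", "d"], next=l2)
l0 = Leaf([1, 4], ["a", "b"], next=l1)
root = Internal(keys=[1, 7, 12], children=[l0, l1, l2])
```

Here lookup(root, 7) matches the internal key 7 but still descends to the second leaf before returning "c", which is the behaviour points 2 and 4 describe.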

Why the B+ tree is more suitable for database indexes than the B-tree

1. Disk reads and writes cost less in a B+ tree: the internal nodes of a B+ tree hold no pointers to the records associated with their keywords, so they are smaller than the internal nodes of a B-tree. If all keywords of one internal node are stored in the same disk block, that block can hold more keywords, so a single read brings more of the keywords being searched into memory, and the number of I/O operations goes down accordingly.
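
A back-of-the-envelope calculation makes the effect concrete. All sizes below (a 4096-byte block, 8-byte keys, 8-byte child pointers, 8-byte record pointers) are assumed values chosen for illustration, and the level count is only a rough estimate of block reads per lookup, not a model of any particular database.

```python
def entries_per_block(block_bytes: int, key_bytes: int,
                      child_ptr_bytes: int, record_ptr_bytes: int = 0) -> int:
    """How many (key, pointers) entries fit into one disk block."""
    return block_bytes // (key_bytes + child_ptr_bytes + record_ptr_bytes)


def levels_needed(num_keys: int, fan_out: int) -> int:
    """Smallest h with fan_out**h >= num_keys, computed with integers only."""
    levels, capacity = 1, fan_out
    while capacity < num_keys:
        capacity *= fan_out
        levels += 1
    return levels


# B-tree internal entry: key + child pointer + pointer to the record.
btree_fan_out = entries_per_block(4096, 8, 8, 8)    # 4096 // 24 = 170
# B+ tree internal entry: key + child pointer only.
bplus_fan_out = entries_per_block(4096, 8, 8)       # 4096 // 16 = 256

btree_levels = levels_needed(10_000_000, btree_fan_out)   # 4
bplus_levels = levels_needed(10_000_000, bplus_fan_out)   # 3
```

Under these assumptions the B+ tree reaches ten million keys in three levels where the B-tree needs four, which is one block read saved per lookup.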

2. The query performance of a B+ tree is more stable: a non-leaf node does not point to the file content itself; it is only an index over the keywords in the leaf nodes. Every keyword search therefore follows a path from the root to a leaf. All such paths have the same length, so every lookup costs the same.

3. Because all data in a B+ tree is stored in the leaf nodes and the branch nodes serve only as an index, scanning the database is convenient: only the leaf nodes need to be scanned. In a B-tree the branch nodes also store data, so reading the data in order requires an in-order traversal of the tree. The B+ tree is therefore better suited to range queries, which is why it is usually used for database indexes.
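
The difference between the two scans can be sketched as follows. The names BTreeNode, Leaf, btree_range and bplus_range are hypothetical, and the B+ tree scan assumes the starting leaf has already been located by an ordinary root-to-leaf search.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class BTreeNode:
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None


def btree_range(node: BTreeNode, lo: int, hi: int) -> Iterator[int]:
    """B-tree: in-order traversal, moving up and down between levels."""
    n = len(node.keys)
    for i in range(n + 1):
        if node.children:
            lower = node.keys[i - 1] if i > 0 else None
            upper = node.keys[i] if i < n else None
            # children[i] holds keys strictly between lower and upper;
            # skip it when that range cannot overlap [lo, hi].
            if (upper is None or lo < upper) and (lower is None or lower < hi):
                yield from btree_range(node.children[i], lo, hi)
        if i < n and lo <= node.keys[i] <= hi:
            yield node.keys[i]


def bplus_range(start: Optional[Leaf], lo: int, hi: int) -> Iterator[int]:
    """B+ tree: walk the leaf chain from the leaf that contains lo."""
    leaf = start
    while leaf is not None:
        for key in leaf.keys:
            if key > hi:
                return
            if key >= lo:
                yield key
        leaf = leaf.next


# The same five keys stored both ways.
btree = BTreeNode([20], [BTreeNode([5, 10]), BTreeNode([30, 40])])
third = Leaf([40])
second = Leaf([20, 30], next=third)
first = Leaf([5, 10], next=second)
```

Both list(btree_range(btree, 8, 30)) and list(bplus_range(first, 8, 30)) return [10, 20, 30], but the B+ tree version only follows next pointers along one level.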