### Binary search tree

#### Concept

A binary search tree (BST), also known as a binary sort tree, has the following characteristics:

###### The values of all nodes in the left subtree are less than the value of the root node

###### The values of all nodes in the right subtree are greater than the value of the root node

###### The left and right subtrees of every node are themselves binary search trees

###### An in-order traversal of a binary search tree visits the nodes in increasing order
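The properties above can be checked with a small sketch. The type `BstNode` and the functions `bst_insert` and `bst_inorder` are hypothetical names chosen for this illustration; duplicates are simply ignored, which is one possible policy among several.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type used only for this illustration. */
typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Insert key below root and return the (possibly new) subtree root.
   Smaller keys go left, larger keys go right, duplicates are ignored. */
BstNode *bst_insert(BstNode *root, int key) {
    if (root == NULL) {
        BstNode *n = malloc(sizeof *n);
        assert(n != NULL);              /* sketch: abort on allocation failure */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* In-order traversal: left subtree, node, right subtree.
   Appends keys to out[] starting at index count; returns the new count. */
size_t bst_inorder(const BstNode *root, int *out, size_t count) {
    if (root == NULL)
        return count;
    count = bst_inorder(root->left, out, count);
    out[count++] = root->key;
    return bst_inorder(root->right, out, count);
}
```

Inserting 5, 2, 8, 1, 9, 3 and then calling `bst_inorder` fills the output array with 1, 2, 3, 5, 8, 9, whatever the insertion order was.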

#### Analysis of BST operation cost

###### Search cost:

Every search starts at the root node and follows a single path down toward a leaf. The number of key comparisons therefore depends directly on the shape of the tree.

When the left and right subtrees of every node have approximately the same height, the tree height is about log n. The average search length is proportional to log n, so the average time complexity is O(log n).

When keys are inserted in sorted order, the BST degenerates into a single-branch tree (effectively a linked list) of height n. The average search length is (n + 1) / 2, and the time complexity is O(n).
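The effect of insertion order on tree shape can be measured directly. The following is a minimal sketch; `BstNode`, `bst_insert`, `bst_height` and `bst_search_cost` are hypothetical names, and the insert is iterative so that a degenerate tree does not cause deep recursion.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Iterative insert; returns the root. Equal keys are sent to the right. */
BstNode *bst_insert(BstNode *root, int key) {
    BstNode *n = malloc(sizeof *n);
    assert(n != NULL);                  /* sketch: abort on allocation failure */
    n->key = key;
    n->left = n->right = NULL;
    if (root == NULL)
        return n;
    BstNode *cur = root;
    for (;;) {
        BstNode **next = (key < cur->key) ? &cur->left : &cur->right;
        if (*next == NULL) {
            *next = n;
            return root;
        }
        cur = *next;
    }
}

/* Number of nodes on the longest root-to-leaf path (0 for an empty tree). */
int bst_height(const BstNode *root) {
    if (root == NULL)
        return 0;
    int l = bst_height(root->left);
    int r = bst_height(root->right);
    return 1 + (l > r ? l : r);
}

/* Search for key. Returns the number of nodes visited and sets *found
   to 1 if the key is present, 0 otherwise. */
int bst_search_cost(const BstNode *root, int key, int *found) {
    int visited = 0;
    *found = 0;
    while (root != NULL) {
        visited++;
        if (key == root->key) {
            *found = 1;
            break;
        }
        root = (key < root->key) ? root->left : root->right;
    }
    return visited;
}
```

Inserting 1 to 7 in ascending order gives a tree of height 7, and searching for each key once visits 28 nodes in total, an average of (7 + 1) / 2 = 4. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives height 3, and no search visits more than 3 nodes.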

###### Insertion cost:

A new node is always inserted as a leaf, so the arrangement of the existing nodes does not change. The cost of inserting a node is exactly the same as that of searching for a key that is not in the tree.

###### Delete cost:

To delete a node P, first locate it, which incurs the search cost, and then adjust the shape of the tree. If P has at most one non-empty subtree, the adjustment is O(1): P's only child (if any) takes its place. If both subtrees exist, P is replaced by its in-order successor (the leftmost node of its right subtree) or its in-order predecessor (the rightmost node of its left subtree), and that node is then removed from its old position, where it has at most one child. Finding the replacement follows a single downward path, so the whole deletion costs no more than the height of the tree: O(log n) when the tree is balanced, and O(n) when it has degenerated.
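The deletion cases can be sketched as follows. This is one common formulation, assuming the in-order successor is used as the replacement when both subtrees exist; `BstNode`, `bst_insert` and `bst_delete` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct BstNode {
    int key;
    struct BstNode *left, *right;
} BstNode;

/* Plain BST insert (duplicates ignored), included to keep the sketch
   self-contained. */
BstNode *bst_insert(BstNode *root, int key) {
    if (root == NULL) {
        BstNode *n = malloc(sizeof *n);
        assert(n != NULL);              /* sketch: abort on allocation failure */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Delete key from the subtree rooted at root; returns the new subtree root. */
BstNode *bst_delete(BstNode *root, int key) {
    if (root == NULL)
        return NULL;                    /* key not present: nothing to do */
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* At most one child: splice the node out in O(1). */
        BstNode *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Two children: copy the in-order successor (leftmost node of the
           right subtree) into this node, then delete the successor, which
           has no left child. */
        BstNode *succ = root->right;
        while (succ->left != NULL)
            succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    }
    return root;
}
```

For a tree built from 50, 30, 70, 20, 40, 60, 80, deleting 50 moves 60 into the root, because 60 is the leftmost key of the right subtree.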

###### BST efficiency summary:

Search is O(log n) when the tree is balanced and O(n) in the worst case.

Insertion and deletion are simple algorithms whose time complexity is essentially that of search.

### Balanced binary search tree

In the worst case, a binary search tree is no more efficient than a sequential search. When the amount of stored data is large, the shape of the tree strongly affects how quickly a given key can be found. The root cause is that a plain BST is not balanced: the heights of the left and right subtrees can differ greatly. The remedy is to rebalance the tree algorithmically whenever it becomes unbalanced, and this is the idea behind the AVL tree.

#### Analysis of AVL operation cost

###### Search cost:

An AVL tree is a strictly balanced BST: the balance factor of every node (the height difference between its left and right subtrees) has absolute value at most 1. Searching works exactly as in a BST, but the degenerate single-branch case cannot occur, so the search cost stays at O(log n) even in the worst case.

###### Insertion cost:

An AVL tree must stay strictly balanced (|BF| <= 1), so whenever an insertion pushes the balance factor of some node beyond 1 in absolute value, a rotation must be performed. In fact, each insertion needs at most one rebalancing step (a single rotation or a double rotation). Overall, the cost of an insertion therefore remains O(log n), dominated by finding the insertion position.
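A compact sketch of AVL insertion with single and double rotations follows. It assumes each node stores the height of its subtree (storing the balance factor directly is an equally common choice); all names (`AvlNode`, `avl_insert`, `avl_rotate_left`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct AvlNode {
    int key;
    int height;                     /* height of this subtree; a leaf has 1 */
    struct AvlNode *left, *right;
} AvlNode;

int avl_height(const AvlNode *n) { return n ? n->height : 0; }

void avl_update(AvlNode *n) {
    int l = avl_height(n->left), r = avl_height(n->right);
    n->height = 1 + (l > r ? l : r);
}

/* Balance factor: left height minus right height. */
int avl_bf(const AvlNode *n) {
    return avl_height(n->left) - avl_height(n->right);
}

AvlNode *avl_rotate_right(AvlNode *y) {
    AvlNode *x = y->left;
    y->left = x->right;
    x->right = y;
    avl_update(y);                  /* y is now below x, so update it first */
    avl_update(x);
    return x;
}

AvlNode *avl_rotate_left(AvlNode *x) {
    AvlNode *y = x->right;
    x->right = y->left;
    y->left = x;
    avl_update(x);
    avl_update(y);
    return y;
}

/* Insert key and rebalance on the way back up; returns the subtree root. */
AvlNode *avl_insert(AvlNode *root, int key) {
    if (root == NULL) {
        AvlNode *n = malloc(sizeof *n);
        assert(n != NULL);          /* sketch: abort on allocation failure */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key);
    else if (key > root->key)
        root->right = avl_insert(root->right, key);
    else
        return root;                /* duplicate: nothing changes */

    avl_update(root);
    int bf = avl_bf(root);
    if (bf > 1) {                   /* left subtree too tall */
        if (avl_bf(root->left) < 0) /* left-right case: double rotation */
            root->left = avl_rotate_left(root->left);
        return avl_rotate_right(root);
    }
    if (bf < -1) {                  /* right subtree too tall */
        if (avl_bf(root->right) > 0) /* right-left case: double rotation */
            root->right = avl_rotate_right(root->right);
        return avl_rotate_left(root);
    }
    return root;
}
```

Inserting 1 to 7 in ascending order, which turns a plain BST into a chain, here produces a tree of height 3 with 4 at the root.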

###### Delete cost:

Deleting a node from an AVL tree follows the BST deletion algorithm, but afterwards the balance factor of every node on the path from the deleted node up to the root must be checked, so deletion is somewhat more expensive. A single deletion may need up to O(log n) rotations. The total time is therefore O(log n) for the search plus O(log n) for rebalancing, which is still O(log n), with roughly twice the constant factor.

###### AVL efficiency summary:

Search time stays at O(log n); there is no degenerate worst case.

Each insertion needs at most one rebalancing step (a single or double rotation), and its time complexity is O(log n).

Deletion is more expensive: it may need rotations all the way up the path to the root, although its time complexity is still O(log n).

### Red-black tree

The strict balancing strategy of the balanced binary tree (AVL) buys a stable O(log n) search time at the expense of the operations that build the search structure (insertion and deletion). But is it worth it?

Is there a compromise that keeps searches stable and efficient without paying too much to build and maintain the structure? The answer is the red-black tree (RBT).

#### Analysis of RBT operation cost

###### Search cost:

A red-black tree guarantees that the longest root-to-leaf path is at most twice as long as the shortest one. It is therefore not as strictly balanced as an AVL tree, but it is far better balanced than a plain BST. Its search cost is O(log n); in the worst case, when the longest path is close to twice the shortest, it is slightly slower than AVL.
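The factor of 2 can be turned into a height bound. The short derivation below assumes the standard red-black properties, which this section does not spell out: every root-to-leaf path contains the same number $b$ of black nodes, and a red node never has a red child. Here $h$ is the height of the tree and $n$ the number of keys.

$$
b \le \text{path length} \le 2b, \qquad n \ge 2^{b} - 1 \ge 2^{h/2} - 1 \;\Longrightarrow\; h \le 2\log_2(n + 1)
$$

A search therefore visits at most about twice as many nodes as it would in a perfectly balanced tree, which is still O(log n).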

###### Insertion cost:

Inserting a node into an RBT may require rotations and recoloring, but only the looser balance of the RBT has to be restored. An insertion therefore needs at most two rotations, the same as an AVL insertion. Recoloring may take O(log n) steps, but each step is very simple, so its cost is small.

###### Delete cost:

Deletion is where RBT clearly beats AVL: deleting a node requires at most three rotations.

###### RBT efficiency summary:

Search time is O(log n). In the worst case it is slightly worse than AVL, but still far better than a degenerate BST.

Insertions and deletions upset the balance of an RBT much less often than that of an AVL tree, because an RBT is not as tightly balanced. Rotations are therefore needed less frequently, and when they are needed, an insertion takes at most 2 rotations and a deletion at most 3 (fewer than an AVL deletion may need). Recoloring can take O(log n) steps, but each step is so simple that the cost is small in practice.

### B-tree / B+ tree

For a search structure held in memory, the red-black tree is very efficient (many practical applications use RBTs, often with further optimizations). But what about huge amounts of data? It is impractical to load all of it into memory and organize it as an RBT. Structures such as file directories in an operating system or file indexes in a database cannot be built entirely in memory; they must be built on disk. In that setting, is the RBT still a good choice?

On disk, moving from one node to another may require a disk read, after which the data is loaded into memory for comparison. Frequent disk I/O is inefficient, because mechanical movement is far slower than electronic operations. A binary tree is tall, so a lookup touches many nodes, and every binary search structure is therefore inefficient on disk. The B-tree is a good solution to this problem.

#### Operation cost analysis of B-tree

###### Search cost:

A B-tree is a balanced multi-way (m-way) search tree. Searching a B-tree involves two kinds of work. The first is following a pointer from one node to another, which means locating a disk address and reading the node; this is very expensive. The second is searching the ordered key sequence inside a node once it has been loaded into memory (binary search can be used); this is very cheap compared with the disk access. Because the height of a B-tree is very small, few nodes have to be read, and in this setting a B-tree is far more efficient than any binary structure. The B+ tree, a variant of the B-tree, is more efficient still.
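The two kinds of work can be separated in code. The sketch below reuses the field names of this article's node definition (`keynum`, `key`, `son`) on an in-memory tree; `node_lower_bound` and `btree_search` are hypothetical names, and the `node_reads` counter stands in for the disk reads an on-disk implementation would perform.

```c
#include <assert.h>
#include <stddef.h>

#define M 3                     /* order of the B-tree */

typedef struct Node {
    int keynum;                 /* number of keys currently in the node */
    int key[M];                 /* keys in ascending order */
    struct Node *parent;        /* pointer to the parent node */
    struct Node *son[M];        /* child pointers; all NULL in a leaf */
} Node;

/* Cheap in-memory step: binary search inside one node.
   Returns the index of the first key >= target (keynum if there is none). */
int node_lower_bound(const Node *n, int target) {
    int lo = 0, hi = n->keynum;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->key[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Expensive step: descend from node to node. Returns the node that holds
   target, or NULL if it is absent. *node_reads receives the number of
   nodes visited, each of which would be one disk read on disk. */
const Node *btree_search(const Node *root, int target, int *node_reads) {
    const Node *cur = root;
    *node_reads = 0;
    while (cur != NULL) {
        (*node_reads)++;
        int i = node_lower_bound(cur, target);
        if (i < cur->keynum && cur->key[i] == target)
            return cur;
        cur = cur->son[i];      /* NULL when cur is a leaf */
    }
    return NULL;
}
```

With a root holding 20 and 40 and three leaves holding 10, 30, and 50 and 60, a search for 60 reads two nodes, and so does an unsuccessful search for 35.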

###### Insertion cost:

Inserting into a B-tree may split nodes. When an insertion splits s nodes, the number of disk accesses is h (reading the nodes on the search path) + 2s (writing back the two halves of each split node) + 1 (writing back the new root, or the node that absorbed the insertion without splitting). The total is h + 2s + 1, which is at most 3h + 1 when s = h. Insertion is therefore expensive.

###### Delete cost:

Deleting from a B-tree may merge nodes. In the worst case the number of disk accesses is 3h: h reads to find the node containing the element, h - 1 reads to fetch the nearest sibling on each of levels 2 to h, h - 2 writes for the merges on levels 3 to h, and 3 writes for the modified root and the two nodes on level 2.
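The insertion and deletion counts are simple arithmetic, written out below as a sanity check. The function names are hypothetical; h is the height of the tree and s the number of nodes split by an insertion.

```c
#include <assert.h>

/* Disk accesses for an insertion that splits s nodes in a B-tree of
   height h: h reads along the search path, 2 writes per split node,
   and 1 final write (new root or the non-splitting node). */
int btree_insert_accesses(int h, int s) {
    assert(h >= 1 && s >= 0 && s <= h);
    return h + 2 * s + 1;
}

/* Worst-case disk accesses for a deletion in a B-tree of height h >= 2. */
int btree_delete_worst_accesses(int h) {
    assert(h >= 2);
    int find_reads    = h;      /* locate the element */
    int sibling_reads = h - 1;  /* nearest sibling on levels 2..h */
    int merge_writes  = h - 2;  /* merges on levels 3..h */
    int top_writes    = 3;      /* root and the two level-2 nodes */
    return find_reads + sibling_reads + merge_writes + top_writes;
}
```

For h = 3, an insertion costs between 4 accesses (no split) and 10 accesses (s = 3, which is 3h + 1), and the worst-case deletion costs 9 accesses (3h).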

###### Definition:

A B-tree of order m (m >= 3, where m bounds the number of keys and children a node may hold) has the following characteristics:

1. The root node has at most m subtrees (3 when m = 3, as in the definition below)

2. Type definition:

```c
#define M 3                /* order of the B-tree */

typedef struct Node {
    int keynum;            /* number of keys in the node, i.e. the size of the node */
    int key[M];            /* node data array */
    struct Node *parent;   /* pointer to the parent node */
    struct Node *son[M];   /* pointer array pointing to the child nodes */
} Node;
```

###### B-tree efficiency summary:

Because B-trees are designed around the disk storage structure, searching, deleting and inserting cost far less on disk than with any binary tree structure, since the number of disk reads and writes is reduced.

### Comparison of dynamic search tree structures

###### Balanced binary tree and red-black tree [AVL vs. RBT]

AVL and RBT are both optimizations of the binary search tree, and both perform much better than a plain BST. Each has its own strengths, and they suit different applications.

Structure comparison: AVL is highly balanced, while RBT is only roughly balanced. In degree of balance, AVL > RBT.

Search comparison: AVL search is O(log n) even in the worst case. RBT search is also O(log n), with a worst case slightly worse than AVL.

Insert delete comparison:

1. Inserting into or deleting from an AVL tree easily unbalances it, whereas an RBT has looser balance requirements. Under a large number of insertions and deletions, an RBT therefore needs to rebalance (by rotation and recoloring) less often than an AVL tree.

2. When rebalancing is needed, RBT has an extra recoloring step compared with AVL, and recoloring is O(log n). Because each step is simple, recoloring is still very fast in practice.

3. When an insertion unbalances the tree, both AVL and RBT need at most two rotations. When a deletion unbalances the tree, AVL may need up to O(log n) rotations, while RBT needs at most 3. Insertion cost is therefore about the same, but deletion is cheaper in an RBT.

4. For both AVL and RBT, most of the cost of insertion and deletion goes into finding the node to operate on, so the time complexity is essentially O(log n).

Overall evaluation: extensive practical experience shows that the overall statistical performance of RBT is better than that of the balanced binary tree (AVL).

###### B-tree and B+ tree [B-tree vs. B+ tree]

The B+ tree is a variant of the B-tree. Among on-disk search structures, the B+ tree is better suited to the disk storage structures of file systems.

Structure comparison:

A B-tree is a balanced multi-way search tree. Every node carries the payload for its keys (such as a pointer to the file's location on disk). A node with n keys has n + 1 pointers to child nodes.

Compared with the B-tree, the B+ tree has the following characteristics:

1. Data appears only in leaf nodes, whereas every node of a B-tree contains data;

2. Leaf nodes are linked together by pointers;

3. The height of a B+ tree is small, typically around 3.
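The linked leaves are what make range queries cheap: once the first leaf is found, the scan follows sibling pointers and never goes back up the tree. A minimal sketch, assuming a hypothetical `Leaf` type with a fixed capacity `LEAF_CAP` and a `next` pointer (internal nodes and the initial root-to-leaf descent are omitted):

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4              /* hypothetical leaf capacity */

typedef struct Leaf {
    int keynum;                 /* number of keys stored in this leaf */
    int key[LEAF_CAP];          /* keys in ascending order */
    struct Leaf *next;          /* right sibling, NULL for the last leaf */
} Leaf;

/* Collect every key in [lo, hi], starting from the leaf where a
   root-to-leaf search for lo would end. Writes at most cap keys to out
   and returns how many were written. */
size_t bplus_range_scan(const Leaf *start, int lo, int hi,
                        int *out, size_t cap) {
    size_t n = 0;
    for (const Leaf *leaf = start; leaf != NULL; leaf = leaf->next) {
        for (int i = 0; i < leaf->keynum; i++) {
            if (leaf->key[i] > hi)
                return n;       /* keys are sorted: nothing further matches */
            if (leaf->key[i] >= lo) {
                if (n == cap)
                    return n;   /* output buffer is full */
                out[n++] = leaf->key[i];
            }
        }
    }
    return n;
}
```

With three linked leaves holding (10, 20), (25, 30, 40) and (50, 60), scanning the range 20 to 50 returns 20, 25, 30, 40, 50 by walking the leaf chain from left to right.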

Search comparison:

1. For the same amount of data, a search in a B+ tree needs fewer disk I/O operations than one in an ordinary B-tree. Internal nodes of a B+ tree hold only keys, so more keys fit in each disk block and the tree is shallower. Since these trees are used for disk storage, the B+ tree performs better than the B-tree.

2. The search cost of a B+ tree is also more stable. All leaf nodes are on the same level and every lookup runs the full path from the root to a leaf, so within one B+ tree every key takes the same number of node visits. In a B-tree this is not guaranteed: a search may end at a non-leaf node.

Insertion and deletion comparison: B+ trees and B-trees are almost equally efficient at insertion and deletion.

Overall evaluation: in practice, especially for file structure storage, the B+ tree is used more widely and is more efficient than the B-tree.