Several common data structures

Time:2021-9-18

Binary tree

The defining property of the binary tree discussed here (a binary search tree) is that every key in a node's left subtree is less than the node's key, and every key in its right subtree is greater than the node's key.


However, in some scenarios, for example when the keys arrive in ascending order, every new node is attached as the right child of the previous one and the tree grows into a single chain.
A binary tree with this structure is unbalanced: it has too many levels and has in effect degenerated into a linked list, so a query may have to visit every node, which makes lookups inefficient.
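The degeneration described above can be reproduced with a few lines of Java. This is only a minimal sketch: the class name `BstDemo` and its helper methods are made up for illustration, and heights are counted in nodes.

```java
// Minimal binary search tree sketch; all names here are illustrative.
class BstDemo {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Insert key into the subtree rooted at node and return the subtree root.
    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node; // duplicate keys are ignored
    }

    // Lookup: go left for smaller keys, right for larger ones.
    static boolean contains(Node node, int key) {
        while (node != null) {
            if (key == node.key) return true;
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }

    // Height counted in nodes: an empty tree has height 0.
    static int height(Node node) {
        return node == null ? 0 : 1 + Math.max(height(node.left), height(node.right));
    }

    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(height(build(1, 2, 3, 4, 5))); // sorted input: one long chain
        System.out.println(height(build(3, 1, 4, 2, 5))); // mixed input: a bushier tree
    }
}
```

With the sorted input the height equals the number of keys, so a lookup costs as much as scanning a linked list.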

Balanced binary tree (AVL)

The balanced binary tree is derived from the binary tree. Its rule is that, for every node, the heights of the two subtrees differ by at most 1.

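Take the binary tree above as an example and insert the keys 1 to 5 in order. Inserting 3 leaves node 1 with an empty left subtree and a right subtree of height 2, so the tree rotates left and 2 becomes the root. Inserting 4 keeps the tree balanced. Inserting 5 unbalances node 3, which is rotated left so that 4 becomes the parent of 3 and 5.
In other words, whenever the height difference between two subtrees exceeds 1, the balanced binary tree rotates to keep the structure balanced.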

Depending on where the extra height sits, an imbalance falls into one of four cases. In each case "root" means the root of the unbalanced subtree.
LL: after a node is inserted or deleted, there are non-empty nodes under the left child of the root's left child, so the root's left subtree is 2 higher than its right subtree. The tree is unbalanced and is rebalanced with a single right rotation at the root.

LR: after a node is inserted or deleted, there are non-empty nodes under the right child of the root's left child, so the root's left subtree is 2 higher than its right subtree. The tree is unbalanced and is rebalanced with a left rotation at the left child followed by a right rotation at the root.

RR: after a node is inserted or deleted, there are non-empty nodes under the right child of the root's right child, so the root's right subtree is 2 higher than its left subtree. The tree is unbalanced and is rebalanced with a single left rotation at the root.

RL: after a node is inserted or deleted, there are non-empty nodes under the left child of the root's right child, so the root's right subtree is 2 higher than its left subtree. The tree is unbalanced and is rebalanced with a right rotation at the right child followed by a left rotation at the root.
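The four cases can be sketched as an insertion routine in Java. This is a simplified illustration, not a complete AVL implementation: deletion is left out, and the names `AvlDemo`, `rotateLeft`, `rotateRight` and `balance` are chosen for this example.

```java
// AVL insertion sketch covering the LL, LR, RR and RL cases; names are illustrative.
class AvlDemo {
    static final class Node {
        int key, height = 1; // height counted in nodes
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? 0 : n.height; }
    static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }
    // Positive when the left subtree is higher, negative when the right one is.
    static int balance(Node n) { return height(n.left) - height(n.right); }

    // Right rotation: the left child becomes the new subtree root.
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);
        update(x);
        return x;
    }

    // Left rotation: the right child becomes the new subtree root.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        update(x);
        update(y);
        return y;
    }

    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        else return n; // duplicate keys are ignored
        update(n);
        int b = balance(n);
        if (b > 1) { // left subtree is 2 higher
            if (balance(n.left) < 0) n.left = rotateLeft(n.left); // LR: first turn it into LL
            return rotateRight(n);                                // LL
        }
        if (b < -1) { // right subtree is 2 higher
            if (balance(n.right) > 0) n.right = rotateRight(n.right); // RL: first turn it into RR
            return rotateLeft(n);                                     // RR
        }
        return n;
    }
}
```

Inserting 1 to 5 in order with this sketch triggers the RR case twice and ends with 2 at the root, 1 on its left, and 4 (with children 3 and 5) on its right.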

The balanced binary tree pursues strict balance, so the amount of rebalancing needed after an insertion or deletion cannot be known in advance; a deletion in particular may trigger rotations at several levels on the way back to the root. If insertions and deletions are infrequent but lookups are frequent, the balanced binary tree is the preferred choice.

Red black tree

A red-black tree is also a self-balancing binary tree, but compared with the AVL tree its balance is looser. After an insertion or deletion, a red-black tree recolors nodes or rotates as needed so that the following rules hold:

  • Nodes are red or black.
  • The root is black.
  • All leaves are black (leaves are nil nodes).
  • Both children of every red node are black. (No path from a leaf to the root contains two consecutive red nodes.)
  • Every simple path from a given node to any of its descendant leaves contains the same number of black nodes (called the black height).
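The rules above can be turned into a small checker. The Java sketch below only validates the coloring rules on a tree that is built by hand; it does not insert, delete or rebalance, and it does not check key ordering. The class name `RedBlackCheck` and its methods are invented for this example, and `null` stands for a black nil leaf.

```java
// Checks the red-black coloring rules on a hand-built tree; names are illustrative.
class RedBlackCheck {
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right; // null stands for a black nil leaf
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) { return n != null && n.red; }

    // Returns the black height of the subtree (nil leaves count as 1),
    // or -1 if a red node has a red child or the black heights disagree.
    static int blackHeight(Node n) {
        if (n == null) return 1; // nil leaves are black
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l == -1 || r == -1 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    // The root must be black and every subtree must satisfy the rules.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) != -1;
    }
}
```

For example, a black root with two red children passes the check, while a black root with a single black child fails because the two sides have different black heights.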

Hash table

When it comes to hash tables, many people think of HashMap. A hash table is a data structure that uses hashing to establish a mapping f between a key and the storage location of its value. The mapping f is called the hash function, the contiguous storage space that holds the values is called the hash table, and the storage address computed from a key is called the hash address.

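Note that the hash value is not necessarily the same as the value returned by the hashCode() method in Java. A hash value is the output of a hash algorithm, while hashCode() returns an int that Java computes for an object: the default Object.hashCode() is implementation-dependent (it may be derived from the object's address, among other things), and classes such as String override it to compute the value from their contents.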
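To make the difference concrete, the sketch below derives a table index from hashCode() in two steps. The bit-spreading step and the power-of-two mask mirror what OpenJDK's HashMap does since Java 8, as far as I know; the class and method names (`HashIndexDemo`, `spread`, `indexFor`) are made up for this illustration.

```java
// From hashCode() to a table index; class and method names are illustrative.
class HashIndexDemo {
    // Mix the high 16 bits of hashCode() into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Map a hash to a slot of a table whose length is a power of two.
    static int indexFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        String key = "data";
        System.out.println("hashCode = " + key.hashCode());
        System.out.println("hash     = " + spread(key));
        System.out.println("index    = " + indexFor(spread(key), 16));
    }
}
```

The point of the sketch is that hashCode() is only the input: the hash address actually used depends on the table's own hash function and on the table length.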

1. Hash function

A hash function can be constructed in several ways:

  • Direct addressing method: take a linear function value of the key as the hash address
  • Digital analysis method: select a part of the key as the operation parameter to calculate the hash address
  • Square middle method: square the key and take the middle segment as the hash address
  • Folding method: divide the key into several segments of the same length, add the segments together, and then take the last few digits of the sum as the hash address
  • Division (remainder) method: divide the key by a fixed number and take the remainder as the hash address
  • Random number method: take the random function value of key as the hash address
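A few of the methods in this list can be written as one-line Java functions. These are minimal sketches for integer keys; the method names, the decimal segment length of 3 and the choice of which middle digits to keep are assumptions made for this example, not fixed parts of the methods.

```java
import java.util.Random;

// Sketches of several hash function constructions for integer keys; names are illustrative.
class HashFunctions {
    // Direct addressing: a linear function a * key + b of the key.
    static int direct(int key, int a, int b) {
        return a * key + b;
    }

    // Division (remainder) method: key mod p, kept non-negative.
    static int division(int key, int p) {
        return Math.floorMod(key, p);
    }

    // Mid-square: square the key and keep two digits from the middle
    // (here the thousands and hundreds digits of the square).
    static int midSquare(int key) {
        long square = (long) key * key;
        return (int) ((square / 100) % 100);
    }

    // Folding: cut the key into 3-digit segments, add them, keep the last 3 digits.
    static int folding(long key) {
        long sum = 0;
        for (long k = Math.abs(key); k > 0; k /= 1000) sum += k % 1000;
        return (int) (sum % 1000);
    }

    // Random number method: use the key as the seed so the result is repeatable.
    static int random(int key, int tableSize) {
        return new Random(key).nextInt(tableSize);
    }

    public static void main(String[] args) {
        System.out.println(division(12, 7));    // 12 mod 7
        System.out.println(midSquare(1234));    // 1234 * 1234 = 1522756
        System.out.println(folding(123456789)); // 789 + 456 + 123 = 1368
    }
}
```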

2. Hash collision

When different keys produce the same hash address after the hash function is applied, this is called a hash collision. Several schemes have been devised to resolve this conflict.

3. Hash collision solutions

  • Open addressing: when a collision occurs, probe for the next empty address. With open addressing, several different keys may probe toward the same empty address and compete for it, which is called clustering
  • Rehashing: after a collision occurs, apply another hash function, and repeat until a free hash address is obtained
  • Separate chaining (linked list method): each hash address keeps a singly linked list of the entries that map to it, and each list node stores a pointer to the next node. HashMap uses this structure (since Java 8 it also converts long chains into red-black trees)
  • Common overflow area: maintain two tables, a basic table and an overflow table. Entries that do not collide go into the basic table, and entries that collide go into the overflow table. A lookup first checks the slot for the hash address in the basic table and compares it with the search key (presumably the keys have to be compared once the hash addresses match). If they are not equal, the lookup continues in the overflow table
    PS: I consulted a lot of material on how several entries with the same hash address are stored in the overflow table, but could not find a clear answer. My guess is that the overflow table is also organized as linked lists
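Two of the schemes above, open addressing and separate chaining, can be sketched in Java as follows. Both tables are deliberately minimal: integer keys, a fixed capacity, the division method as hash function, and no deletion or resizing. All class and method names are invented for this example, and the chained table is not how HashMap is actually written.

```java
// Two collision-resolution sketches; all names are illustrative.
class CollisionDemo {
    // Open addressing with linear probing.
    static final class ProbingTable {
        private final Integer[] keys;
        private final String[] values;

        ProbingTable(int capacity) {
            keys = new Integer[capacity];
            values = new String[capacity];
        }

        // Returns the slot where the key was stored, or -1 if the table is full.
        int put(int key, String value) {
            int start = Math.floorMod(key, keys.length);
            for (int i = 0; i < keys.length; i++) {
                int slot = (start + i) % keys.length; // probe the next address
                if (keys[slot] == null || keys[slot] == key) {
                    keys[slot] = key;
                    values[slot] = value;
                    return slot;
                }
            }
            return -1;
        }

        String get(int key) {
            int start = Math.floorMod(key, keys.length);
            for (int i = 0; i < keys.length; i++) {
                int slot = (start + i) % keys.length;
                if (keys[slot] == null) return null; // an empty slot ends the probe
                if (keys[slot] == key) return values[slot];
            }
            return null;
        }
    }

    // Separate chaining: one singly linked list per hash address.
    static final class ChainedTable {
        private static final class Entry {
            final int key;
            String value;
            Entry next;
            Entry(int key, String value) { this.key = key; this.value = value; }
        }

        private final Entry[] buckets;

        ChainedTable(int capacity) { buckets = new Entry[capacity]; }

        void put(int key, String value) {
            int b = Math.floorMod(key, buckets.length);
            for (Entry e = buckets[b]; e != null; e = e.next) {
                if (e.key == key) { e.value = value; return; } // overwrite existing key
            }
            Entry e = new Entry(key, value);
            e.next = buckets[b]; // prepend to the chain
            buckets[b] = e;
        }

        String get(int key) {
            for (Entry e = buckets[Math.floorMod(key, buckets.length)]; e != null; e = e.next) {
                if (e.key == key) return e.value;
            }
            return null;
        }
    }
}
```

With a capacity of 7, the keys 3, 10 and 17 all hash to address 3. The probing table stores them in slots 3, 4 and 5, which is the clustering effect mentioned above, while the chained table keeps all three in the list at address 3.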

B-tree

The structure of the B-tree is shown in the figure. No index element is repeated in a B-tree: each key appears in exactly one node, so nodes at every level store the data that belongs to their keys. In addition, leaf nodes have no child pointers and store only keys and data.

For example, to query the data with the index value 12, the search sees that 12 < 17 in the first level, so it follows the P1 pointer to disk page 2 (for B-trees and B+ trees, a node is a disk page). It then scans that page in order, finds the index value 12, and directly returns the data stored with 12. If the key is not found in a node, the search goes down to the next level. This is why the B-tree is designed with non-repeating index elements: a key's data can be returned from whichever level holds it.

(The figure is quoted from https://blog.csdn.net/a764340703/article/details/82621781)
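The lookup for index value 12 can be sketched in Java as follows. Only the keys 12 and 17 and the P1 step come from the walkthrough above; the remaining keys, the data strings such as "d12", and the names `BTreeSearchDemo`, `Node` and `sample` are assumptions made for this example. Insertion and node splitting are not shown.

```java
// B-tree lookup sketch: data is stored next to the key at every level; names are illustrative.
class BTreeSearchDemo {
    static final class Node {
        final int[] keys;      // sorted keys in this node (one disk page)
        final String[] values; // data stored next to each key
        final Node[] children; // keys.length + 1 child pointers, or null for a leaf
        Node(int[] keys, String[] values, Node[] children) {
            this.keys = keys;
            this.values = values;
            this.children = children;
        }
    }

    static String search(Node node, int key) {
        while (node != null) {
            int i = 0;
            while (i < node.keys.length && key > node.keys[i]) i++; // scan the page in order
            if (i < node.keys.length && key == node.keys[i]) return node.values[i]; // found at this level
            node = (node.children == null) ? null : node.children[i]; // go down one level
        }
        return null; // reached a leaf without a match
    }

    // A hypothetical two-level tree: root page {17, 35} with three leaf pages.
    static Node sample() {
        Node p1 = new Node(new int[] {8, 12}, new String[] {"d8", "d12"}, null);
        Node p2 = new Node(new int[] {26, 30}, new String[] {"d26", "d30"}, null);
        Node p3 = new Node(new int[] {65, 87}, new String[] {"d65", "d87"}, null);
        return new Node(new int[] {17, 35}, new String[] {"d17", "d35"}, new Node[] {p1, p2, p3});
    }

    public static void main(String[] args) {
        System.out.println(search(sample(), 12)); // 12 < 17, follow P1, found in the leaf page
        System.out.println(search(sample(), 17)); // found in the root page, no descent needed
    }
}
```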

B+tree

The structure of the B+ tree is shown in the figure. The main differences between a B+ tree and a B-tree are:

  • Non-leaf nodes of a B+ tree hold redundant index keys: they store keys only, and each of those keys appears again in a leaf
  • All data of a B+ tree is stored in the leaf nodes. Since non-leaf nodes carry no data, a disk page of the same size holds more keys, so a tree of the same height can index more data
  • The leaf nodes of a B+ tree are linked to their neighbors with two-way pointers, which greatly improves the efficiency of range searches
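The benefit of linked leaves for range searches can be sketched in Java. The example is a simplified, hand-built two-level tree: the keys, the data strings and the names `BPlusRangeDemo`, `Leaf`, `Internal` and `range` are assumptions for illustration. Only the forward link is modeled; a backward link would serve descending scans in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// B+ tree range scan sketch over linked leaves; names are illustrative.
class BPlusRangeDemo {
    static final class Leaf {
        final int[] keys;
        final String[] values; // all data lives in the leaves
        Leaf next;             // link to the right neighbor
        Leaf(int[] keys, String[] values) { this.keys = keys; this.values = values; }
    }

    // Internal node of a two-level tree: it holds keys only, no data.
    static final class Internal {
        final int[] keys;    // keys[i] repeats the smallest key of leaves[i + 1]
        final Leaf[] leaves;
        Internal(int[] keys, Leaf[] leaves) { this.keys = keys; this.leaves = leaves; }
    }

    // Descend from the root to the leaf that may contain key.
    static Leaf findLeaf(Internal root, int key) {
        int i = 0;
        while (i < root.keys.length && key >= root.keys[i]) i++;
        return root.leaves[i];
    }

    // Collect the data for all keys in [lo, hi] by walking along the leaf level.
    static List<String> range(Internal root, int lo, int hi) {
        List<String> out = new ArrayList<>();
        for (Leaf leaf = findLeaf(root, lo); leaf != null; leaf = leaf.next) {
            for (int i = 0; i < leaf.keys.length; i++) {
                if (leaf.keys[i] > hi) return out; // past the range: stop
                if (leaf.keys[i] >= lo) out.add(leaf.values[i]);
            }
        }
        return out;
    }

    static Internal sample() {
        Leaf a = new Leaf(new int[] {5, 8}, new String[] {"d5", "d8"});
        Leaf b = new Leaf(new int[] {12, 17}, new String[] {"d12", "d17"});
        Leaf c = new Leaf(new int[] {26, 35}, new String[] {"d26", "d35"});
        a.next = b;
        b.next = c;
        return new Internal(new int[] {12, 26}, new Leaf[] {a, b, c});
    }

    public static void main(String[] args) {
        System.out.println(range(sample(), 8, 26));
    }
}
```

The scan descends from the root once, to the leaf that contains the lower bound, and then follows the leaf links until it passes the upper bound.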


More applications of the B+ tree structure in databases will be introduced in the MySQL chapter. Stay tuned.

A recommended data structure visualization website, by David Galles of the University of San Francisco:
https://www.cs.usfca.edu/~galles/visualization/Algorithms.html