[redis5 source code learning] April 16, 2019 skiplist


All videos: https://segmentfault.com/a/11…


Imagine the following scenario:

Interviewer: We have a sorted array 2, 5, 6, 7, 9. We need to look up 7. Design an algorithm.
Candidate: At first glance, anyone can see this is a binary search. O(log n), done.
Interviewer: Next, what if we change the array into a linked list (2 -> 5 -> 6 -> 7 -> 9)?
Candidate: That's easy: a balanced binary tree, O(log n) as well.
Interviewer: Then please write out the complete code by hand!
Candidate: (drops dead)

Imagine being handed a sheet of paper, a pen, and an editor. Could you implement a red-black tree or an AVL tree on the spot? It is very hard. It takes time, there are many details to consider, and you end up consulting a pile of references on algorithms and data structures, plus code from the Internet, which is a lot of trouble.

Back home, Xiao Ming was very sad and did not want to be tortured by binary trees any more. He wanted something to replace them. After unremitting effort, he finally found a substitute for the red-black tree: the skiplist.

The birth of skiplist

How to solve it?
First of all, the list starts in its initial state, with no elements:
[Figure: an empty skiplist]
Now we insert element 2 into the bottom layer, L1:
[Figure: element 2 inserted into layer L1]
Then we toss a coin. It comes up heads, so 2 is also inserted into layer L2:
[Figure: element 2 promoted to layer L2]
We toss again and get tails, so the insertion of element 2 stops; the figure above shows the list after this step. Next we insert element 5 in the same way, starting with layer L1:
[Figure: element 5 inserted into layer L1]
We keep tossing the coin. Heads promotes the element one more layer; tails stops the insertion and we move on to the next element.
In the end, the list looks like this:
[Figure: the resulting skiplist]
That is how a skiplist is built. With so few elements, the result may not look like an ideal skiplist. But anyone who has studied probability theory knows that as the number of elements n grows, the structure will, with high probability, come very close to the ideal one.
Easy, isn't it?
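The probability argument can be made a little more concrete. Assuming a fair coin and independent tosses, as in the walkthrough above, a rough sketch of what "close to ideal" means for a list of n elements:

```latex
\[
P(\text{height} \ge k) = \left(\tfrac{1}{2}\right)^{k-1},
\qquad
E[\text{nodes in layer } L_k] = \frac{n}{2^{\,k-1}},
\qquad
\text{number of layers} \approx \log_2 n .
\]
```

Each layer holds about half as many nodes as the one below it, which is the halving that binary search relies on.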
Back to the topic: how do we look up 7? It is simple. Starting from the top layer, we first compare 7 with 6 and find that 7 is larger, so we keep moving forward; the next node is equal, and we have found node 7. If we look up 5 instead, it is smaller than 6, so we drop down to layer L2 and continue from 2. Since 5 is larger than 2 and smaller than 6, we drop down one more layer and find 5.
Xiao Ming is good at drawing inferences from a single example. Since searching is this simple, let's look at insertion next. Once insertion, deletion, modification, and search are all solved, Mom never has to worry about my red-black tree again.

Adding, deleting, modifying and querying a skiplist
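The search walk can be sketched in C. This is a minimal illustration, not Redis's code: `Node`, `MAX_LEVEL`, and `skip_search` are hypothetical names, values are plain ints, and `head` is assumed to be a sentinel node that holds no value.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4 /* hypothetical cap, enough for a small example */

typedef struct Node {
    int value;
    struct Node *forward[MAX_LEVEL]; /* forward[i] is the next node in layer i+1 */
} Node;

/* Look up `target` in a skiplist with sentinel `head`, of which the lowest
 * `levels` layers are in use. Returns the node, or NULL if it is absent. */
Node *skip_search(Node *head, int levels, int target) {
    assert(levels >= 1 && levels <= MAX_LEVEL);
    Node *x = head;
    for (int i = levels - 1; i >= 0; i--) {
        /* Move right while the next node is still smaller than the target. */
        while (x->forward[i] != NULL && x->forward[i]->value < target)
            x = x->forward[i];
        /* Otherwise fall through: drop one layer, staying on the same node. */
    }
    x = x->forward[0]; /* the candidate sits right after the last smaller node */
    return (x != NULL && x->value == target) ? x : NULL;
}
```

The loop never moves backward: in every layer it goes right as far as it can, then down, so the work per layer stays small.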

Next, let's look at insertion. Suppose we want to insert 4. What should we do?
Starting from the top layer, in each layer we find the node just before the first node larger than 4. Then we toss the coin to pick a random number of layers and link the new node into each of them. Suppose the tosses give 4 layers; the list after the insertion is shown below.

[Figure: the skiplist after inserting 4 with four layers]

Notice that a new layer is added and that the new node is linked to its neighbors in every layer it occupies. That completes the insertion.
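A sketch of the insertion in C, under some simplifying assumptions: the height is passed in by the caller instead of being drawn from coin tosses (so the example is deterministic), values are plain ints, and `SkipList`, `Node`, and `skip_insert` are hypothetical names rather than anything from Redis.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVEL 8 /* hypothetical cap on the number of layers */

typedef struct Node {
    int value;
    struct Node *forward[MAX_LEVEL];
} Node;

typedef struct SkipList {
    Node head;  /* sentinel node; its value is unused */
    int levels; /* number of layers currently in use, 0 when empty */
} SkipList;

/* Insert `value` with a caller-chosen height in 1..MAX_LEVEL.
 * Returns the new node, or NULL if the allocation fails. */
Node *skip_insert(SkipList *sl, int value, int height) {
    assert(height >= 1 && height <= MAX_LEVEL);
    Node *update[MAX_LEVEL];
    Node *x = &sl->head;

    /* Step 1: in every layer, remember the last node smaller than value.
     * Unused layers have head.forward[i] == NULL, so update[i] is the head. */
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->value < value)
            x = x->forward[i];
        update[i] = x;
    }

    Node *n = calloc(1, sizeof(*n));
    if (n == NULL) return NULL;
    n->value = value;

    /* Step 2: splice the node in after its predecessor in each layer. */
    for (int i = 0; i < height; i++) {
        n->forward[i] = update[i]->forward[i];
        update[i]->forward[i] = n;
    }
    if (height > sl->levels) sl->levels = height; /* a new layer appeared */
    return n;
}
```

Starting from a zero-initialized list (`SkipList sl = {0};`), inserting 4 with height 4 into a list whose tallest node has 3 layers raises `levels` to 4, which matches the new layer described above.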

Delete operation:
The delete operation is similar to the insert operation, including the following three steps: 1. Find the node to be deleted; 2. Delete the node; 3. Adjust the pointer.
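The three steps can be sketched as follows. As before this is only an illustration with hypothetical names (`Node`, `skip_delete`), int values, and a sentinel `head`; the function unlinks the node and hands it back, leaving the decision of how to release it to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4 /* hypothetical cap on the number of layers */

typedef struct Node {
    int value;
    struct Node *forward[MAX_LEVEL];
} Node;

/* Unlink the first node holding `value`.
 * Returns the unlinked node, or NULL if the value is not in the list. */
Node *skip_delete(Node *head, int value) {
    assert(head != NULL);
    Node *update[MAX_LEVEL];
    Node *x = head;

    /* Step 1: find the predecessor of the target in every layer. */
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->value < value)
            x = x->forward[i];
        update[i] = x;
    }
    Node *target = update[0]->forward[0];
    if (target == NULL || target->value != value) return NULL;

    /* Steps 2 and 3: bypass the node in each layer that links to it. */
    for (int i = 0; i < MAX_LEVEL; i++) {
        if (update[i]->forward[i] == target)
            update[i]->forward[i] = target->forward[i];
    }
    return target;
}
```

The search phase is the same as for insertion; only the splice is reversed.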

At this point, insertion, deletion, and search in a skiplist are clear, but we only know the what, not the why. Xiao Ming does not give up: he wants to know how it is actually implemented, and he has some questions of his own along the way.

Four questions about skiplist

1. Why toss a coin?
Let's first describe the coin toss itself. The number of layers of a skiplist node is capped at 64 (32 before Redis 5.0). For a node to actually reach 64 layers, the list would need an enormous number of nodes. Redis sets the probability of each successful toss to 1/4, so a node reaches layer 64 only after 63 successes in a row, a probability of (1/4)^63 = (1/2)^126. A 64-bit machine does not have enough memory to store that many zskiplistNode structures, so a cap of 64 is high enough in practice. Raising it further would only waste memory in the header node. The purpose of the coin toss, then, is to keep most nodes as low as possible in order to save memory.
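A minimal sketch of such a biased coin toss, using the two numbers from the paragraph above (a cap of 64 and a promotion probability of 1/4). The names `random_level`, `SKIPLIST_MAXLEVEL`, and `SKIPLIST_P` are hypothetical, and the standard `rand()` stands in for whatever random source a real implementation uses.

```c
#include <assert.h>
#include <stdlib.h>

#define SKIPLIST_MAXLEVEL 64 /* cap on the number of layers, as stated in the text */
#define SKIPLIST_P 0.25      /* probability of going up one more layer */

/* Toss a biased coin until it comes up tails or the cap is reached.
 * Every node gets at least one layer. */
int random_level(void) {
    int level = 1;
    while (level < SKIPLIST_MAXLEVEL &&
           (double)rand() / ((double)RAND_MAX + 1.0) < SKIPLIST_P)
        level++;
    assert(level >= 1 && level <= SKIPLIST_MAXLEVEL);
    return level;
}
```

With these settings three out of four nodes stay at a single layer, and each additional layer is four times less likely than the previous one.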

2. What is a skiplist? Where is it used?
A skiplist is an ordered data structure in which each node maintains multiple pointers to other nodes, so that nodes can be reached quickly. In most cases a skiplist is as efficient as a balanced tree, and it is simpler to implement.
Redis uses the skiplist as one of the underlying implementations of the sorted set key. If a sorted set contains a large number of elements, or its members are long strings, Redis uses a skiplist as the underlying implementation.
If the skiplist is this good, it must be used all over Redis, right? The answer is no. Redis uses skiplists in only two places: to implement sorted set keys, and as an internal data structure in cluster nodes. Beyond that, skiplists have no other use in Redis.

3. How is the skiplist implemented?
Let's take a look at the skiplist source code:

    typedef struct zskiplistNode {
        sds ele;                        // element (sds is Redis's string type)
        double score;                   // score
        struct zskiplistNode *backward; // backward pointer, used to walk from the tail toward the header. Unlike forward pointers, which can skip several nodes at once, each node has only one backward pointer
        struct zskiplistLevel {
            struct zskiplistNode *forward; // forward pointer. Each layer has one, pointing toward the tail. It is used to walk from the header to the tail
            unsigned long span;            // span of this layer. The larger the span between two nodes, the farther apart they are; a forward pointer to NULL has span 0
        } level[];
    } zskiplistNode;
    // The level array of a node can hold several elements, each with a pointer to another node; the program uses them to speed up access
    // Generally speaking, the more layers there are, the faster other nodes can be reached
    // Every time a new node is created, the program picks a random value between 1 and 64 as the size of the level array, following a power law (the larger the number, the less likely it is). This size is the height of the node
    typedef struct zskiplist {
        struct zskiplistNode *header, *tail; // header and tail pointers
        unsigned long length;                // number of nodes
        int level;                           // number of layers of the tallest node
    } zskiplist;
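Because `level[]` is a flexible array member, the size of a node depends on its height and has to be computed when the node is allocated. A sketch of how that can look, assuming `char *` as a stand-in for `sds` and plain `malloc`; the helper name `create_node` is hypothetical and this is not presented as Redis's own allocation code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct zskiplistNode {
    char *ele; /* stand-in for Redis's sds type (assumption) */
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[]; /* flexible array member, sized at allocation time */
} zskiplistNode;

/* Hypothetical helper: allocate a node with `level` layers.
 * The node does not take ownership of `ele`. */
zskiplistNode *create_node(int level, double score, char *ele) {
    assert(level >= 1);
    /* Fixed part of the struct plus one zskiplistLevel per layer. */
    zskiplistNode *zn =
        malloc(sizeof(*zn) + (size_t)level * sizeof(struct zskiplistLevel));
    if (zn == NULL) return NULL;
    zn->ele = ele;
    zn->score = score;
    zn->backward = NULL;
    for (int i = 0; i < level; i++) {
        zn->level[i].forward = NULL;
        zn->level[i].span = 0;
    }
    return zn;
}
```

A node with one layer therefore costs less memory than a node with ten, which is why keeping most nodes low saves space.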

From this, we get the memory layout of a skiplist:
[Figure: memory layout of a skiplist]
The abstract memory structure is as follows:
[Figure: abstract memory structure of a skiplist]

What else? When we step through the sorted set (zset) code in GDB, we find that the program creates a dictionary, dict, before it creates the skiplist. What is the dict for? It is a hash table that maps each element of the zset to its score. With this mapping, looking up the score of an element takes O(1) time.

4. Why does Redis use a skiplist instead of a balanced tree?
The elements of a skiplist and of the various balanced trees (AVL, red-black tree, and so on) are kept in order, while those of a hash table are not. A hash table is therefore only good for single-key lookups, not for range queries. A range query means finding all nodes whose values lie between two given bounds.

For range queries, a balanced tree is more complicated than a skiplist. In a balanced tree, after finding the lower bound we still have to continue in in-order traversal order to visit the other nodes that do not exceed the upper bound. Unless the tree is specially modified, this in-order traversal is not easy to implement. On a skiplist a range query is very simple: after finding the lower bound, just walk the bottom-layer linked list.
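A sketch of such a range query in C, with the same simplifications as a toy skiplist of ints behind a sentinel `head`; `Node`, `MAX_LEVEL`, and `skip_range` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 4 /* hypothetical cap on the number of layers */

typedef struct Node {
    int value;
    struct Node *forward[MAX_LEVEL];
} Node;

/* Copy up to `cap` values v with lo <= v <= hi into out[], in order.
 * Returns how many values were written. */
size_t skip_range(Node *head, int lo, int hi, int *out, size_t cap) {
    assert(head != NULL && out != NULL);
    Node *x = head;

    /* Descend through the layers to the last node smaller than lo. */
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->value < lo)
            x = x->forward[i];
    }

    /* Then walk the bottom layer until the values exceed hi. */
    size_t n = 0;
    x = x->forward[0];
    while (x != NULL && x->value <= hi && n < cap) {
        out[n++] = x->value;
        x = x->forward[0];
    }
    return n;
}
```

Only the first phase uses the upper layers; once the lower bound is located, the rest is an ordinary linked-list walk.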

Insertion and deletion in a balanced tree may trigger subtree adjustments with complex logic, while insertion and deletion in a skiplist only modify the pointers of adjacent nodes, which is simple and fast.

In terms of memory, a skiplist is more flexible than a balanced tree. Each node of a balanced tree generally holds two pointers (to the left and right subtrees), while a skiplist node holds 1/(1-p) pointers on average, depending on the parameter p. With p = 1/4, as in the Redis implementation, each node holds 1.33 pointers on average, which is fewer than in a balanced tree.
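The 1/(1-p) figure follows from a geometric series. A short derivation, assuming each promotion succeeds independently with probability p and ignoring the cap on the number of layers:

```latex
\[
E[\text{pointers per node}]
  = \sum_{k=1}^{\infty} P(\text{height} \ge k)
  = \sum_{k=1}^{\infty} p^{\,k-1}
  = \frac{1}{1-p},
\]
\[
p = \tfrac{1}{4} \;\Rightarrow\; \frac{1}{1 - \tfrac{1}{4}} = \tfrac{4}{3} \approx 1.33,
\qquad
p = \tfrac{1}{2} \;\Rightarrow\; 2 .
\]
```

With a fair coin a skiplist node would carry as many pointers on average as a tree node; the smaller p is what brings the average below two.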

For single-key lookups, skiplists and balanced trees are both O(log n), roughly the same. A hash table, as long as the collision probability stays low, looks up a key in close to O(1) time and performs better. That is why most of the map or dictionary structures we use every day are based on hash tables.

In terms of implementation difficulty, a skiplist is much simpler than a balanced tree.

Final chapter

Finally, we need to know how the skiplist is used by its owner. Think about how commands such as zadd, zrange, and zrangebyscore in Redis make use of it.

If you want to know more about the skiplist source code, we recommend reading [redis learning notes] 2018-05-29 redis source code learning jump table.
