Introduction to Redis data structures, part 2: skip lists

Time:2021-6-14

This article is licensed under the Attribution 4.0 International (CC BY 4.0) license. You are welcome to reprint or modify it, but you must credit the source.

Author: nicksxs

Created on: January 4, 2020

Link to this article: Introduction to Redis data structures, part 2: skip lists

Skip list

A skip list is a data structure that rarely shows up in everyday code, and it is less familiar than arrays, linked lists, dictionaries, hashes, trees and the like. So let's start from the beginning, with the linked list; a skip list is, after all, also a kind of list. Note that the list here is an ordered one.
[Figure: an ordered linked list]
As the figure above shows, to find 23 in this linked list we have to start at the head and walk through 3, 5, 9 and so on until we reach 23. The time complexity is therefore O(n), that is, linear in the number of nodes.
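
The linear scan just described can be sketched in a few lines of C. This is only an illustration: the `ListNode` type, the helper names, and the sample values (in particular the 17 between 10 and 19) are made up for the example and are not taken from Redis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type for this sketch; not taken from Redis. */
typedef struct ListNode {
    int value;
    struct ListNode *next;
} ListNode;

/* Walks the ordered list from the head and returns the node holding
 * `target`, or NULL if it is absent. `*visited` receives the number of
 * nodes inspected, which grows linearly with the list length. */
static ListNode *list_find(ListNode *head, int target, int *visited) {
    assert(visited != NULL);
    *visited = 0;
    for (ListNode *n = head; n != NULL; n = n->next) {
        (*visited)++;
        if (n->value == target) return n;
        if (n->value > target) break;   /* ordered list: target cannot follow */
    }
    return NULL;
}

/* Builds the sample list 3 -> 5 -> 9 -> 10 -> 17 -> 19 -> 23 -> 28 in
 * caller-provided storage and returns its head. The values are assumed. */
static ListNode *list_sample(ListNode nodes[8]) {
    static const int values[8] = {3, 5, 9, 10, 17, 19, 23, 28};
    for (int i = 0; i < 8; i++) {
        nodes[i].value = values[i];
        nodes[i].next = (i + 1 < 8) ? &nodes[i + 1] : NULL;
    }
    return &nodes[0];
}
```

With this sample list, `list_find(head, 23, &visited)` has to inspect seven nodes before it reaches 23.
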
[Figure: the same list with an extra level of pointers linking every second node]
This second structure differs a little from the first one. It adds an extra pointer to every even-positioned node, linking those nodes into a second level. When looking for 23, we no longer check the nodes one by one but skip ahead: first 5, then 10, then 19, then 28. At that point 28 is larger than 23, so we go back to 19 and continue forward on the level below, in the original linked list.
[Figure]
This way we only have to look at roughly half of the nodes in front of the target, which is the same idea as binary search.
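
A minimal sketch of this two-level lookup, under the same assumptions as the prose: every second node carries an extra pointer. The `ExpressNode` type, the `express` field name, and the sample values are hypothetical and chosen only to reproduce the 5, 10, 19, 28 path described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node for this sketch: `next` is the ordinary link,
 * `express` links every second node (NULL on the other nodes). */
typedef struct ExpressNode {
    int value;
    struct ExpressNode *next;
    struct ExpressNode *express;
} ExpressNode;

/* Builds 3 -> 5 -> 9 -> 10 -> 17 -> 19 -> 23 -> 28 and links the
 * even-positioned nodes 5 -> 10 -> 19 -> 28 on the upper level. */
static ExpressNode *express_sample(ExpressNode nodes[8]) {
    static const int values[8] = {3, 5, 9, 10, 17, 19, 23, 28};
    for (int i = 0; i < 8; i++) {
        nodes[i].value = values[i];
        nodes[i].next = (i + 1 < 8) ? &nodes[i + 1] : NULL;
        nodes[i].express = NULL;
    }
    for (int i = 1; i + 2 < 8; i += 2)
        nodes[i].express = &nodes[i + 2];
    return &nodes[0];
}

/* Follows the upper-level pointer whenever it does not overshoot the
 * target, otherwise falls back to the base list. `*hops` receives the
 * number of pointers followed. */
static ExpressNode *express_find(ExpressNode *head, int target, int *hops) {
    assert(hops != NULL);
    *hops = 0;
    ExpressNode *n = head;
    while (n != NULL && n->value < target) {
        if (n->express != NULL && n->express->value <= target)
            n = n->express;      /* skip ahead on the upper level */
        else
            n = n->next;         /* step forward on the base list */
        (*hops)++;
    }
    return (n != NULL && n->value == target) ? n : NULL;
}
```

On this sample, `express_find(head, 23, &hops)` follows four pointers (3 to 5, 5 to 10, 10 to 19, 19 to 23), where the plain list needs six.
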
What we have seen so far is only a lead-in to the skip list. A real skip list does not look like this, because the structure above has a big problem: after an element is inserted, the pointers of the elements that follow have to be readjusted to keep the every-other-node rule. In the example above, the more data there is, the more levels there are and the more efficient the lookup becomes; but as the number of levels grows, maintaining such a strict level rule makes every insert more complex. The skip list in Redis avoids this by randomizing the number of levels: each time Redis inserts an element, it picks that element's level at random. Let's look at the code:

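/* ZSETs use a specialized version of Skiplists */
typedef struct zskiplistNode {
    sds ele;                            /* element value */
    double score;                       /* score used for ordering */
    struct zskiplistNode *backward;     /* previous node */
    struct zskiplistLevel {
        struct zskiplistNode *forward;  /* next node on this level */
        unsigned long span;             /* distance covered by forward */
    } level[];                          /* one entry per level of the node */
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;    /* head and tail of the list */
    unsigned long length;                   /* number of nodes */
    int level;                              /* highest level in use */
} zskiplist;

typedef struct zset {
    dict *dict;         /* element -> score lookup */
    zskiplist *zsl;     /* elements ordered by score */
} zset;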
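
As a rough illustration of how a lookup moves through structures shaped like these, here is a simplified sketch. It is not Redis code: `demoNode`, `demoList`, `DEMO_MAXLEVEL` and all the functions are hypothetical stand-ins that keep only a score and the per-level forward pointers (no `sds` element, no `backward`, no `span`), and the node heights are passed in by hand so the result is deterministic.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAXLEVEL 4   /* hypothetical cap, chosen small for the demo */

/* Simplified stand-ins for zskiplistNode / zskiplist. */
typedef struct demoNode {
    double score;
    struct demoLevel {
        struct demoNode *forward;
    } level[];
} demoNode;

typedef struct demoList {
    demoNode *header;   /* sentinel node with DEMO_MAXLEVEL levels */
    int level;          /* highest level currently in use */
} demoList;

static demoNode *demoCreateNode(int levels, double score) {
    demoNode *n = malloc(sizeof(*n) + levels * sizeof(struct demoLevel));
    assert(n != NULL);
    n->score = score;
    for (int i = 0; i < levels; i++) n->level[i].forward = NULL;
    return n;
}

static demoList *demoCreate(void) {
    demoList *l = malloc(sizeof(*l));
    assert(l != NULL);
    l->header = demoCreateNode(DEMO_MAXLEVEL, 0);
    l->level = 1;
    return l;
}

/* Inserts `score` with a caller-chosen height (1..DEMO_MAXLEVEL).
 * Only the predecessors on each level are relinked. */
static void demoInsert(demoList *l, double score, int height) {
    assert(height >= 1 && height <= DEMO_MAXLEVEL);
    demoNode *update[DEMO_MAXLEVEL];
    demoNode *x = l->header;
    for (int i = DEMO_MAXLEVEL - 1; i >= 0; i--) {
        while (x->level[i].forward && x->level[i].forward->score < score)
            x = x->level[i].forward;
        update[i] = x;   /* last node before the insert point on level i */
    }
    demoNode *n = demoCreateNode(height, score);
    for (int i = 0; i < height; i++) {
        n->level[i].forward = update[i]->level[i].forward;
        update[i]->level[i].forward = n;
    }
    if (height > l->level) l->level = height;
}

/* Starts on the highest level in use, moves right while the next score
 * is still smaller, then drops one level. Returns the match or NULL;
 * `*hops` counts the forward pointers followed before the final step. */
static demoNode *demoFind(const demoList *l, double score, int *hops) {
    assert(hops != NULL);
    demoNode *x = l->header;
    *hops = 0;
    for (int i = l->level - 1; i >= 0; i--) {
        while (x->level[i].forward && x->level[i].forward->score < score) {
            x = x->level[i].forward;
            (*hops)++;
        }
    }
    x = x->level[0].forward;
    return (x && x->score == score) ? x : NULL;
}

static void demoFree(demoList *l) {
    demoNode *x = l->header;
    while (x) {
        demoNode *next = x->level[0].forward;
        free(x);
        x = next;
    }
    free(l);
}
```

For example, after inserting 3, 5, 9, 10, 17, 19, 23, 28 with heights 1, 2, 1, 3, 1, 2, 1, 2, looking up 23 follows only two forward pointers (header to 10 on the top level, 10 to 19 on the middle level) before stepping onto 23. Note that an insert touches only the predecessors of the new node, which is why random heights are cheaper to maintain than a strict every-other-node rule.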

I forgot to mention that Redis uses the skip list in zset, its ordered set type. As you can see, zskiplist is the skip list structure: header holds the head of the skip list, tail holds its tail, and the length and the maximum level are stored as well. The individual nodes are represented by zskiplistNode, which contains the element value ele of type sds, a score of type double used for sorting, a backward pointer, and an array of zskiplistLevel entries. Each level contains a forward pointer and a span, where span is the distance covered by that level's forward pointer.

One more point. To make inserts and updates flexible, Redis gives each new node a random level height. But if the nodes were all assigned very high levels, or all very low ones, the skip list would lose its efficiency advantage, so Redis uses a small trick. Here is the code:

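#define ZSKIPLIST_P 0.25      /* Skiplist P = 1/4 */
int zslRandomLevel(void) {
    int level = 1;
    /* Keep the low 16 bits of random(); climb one level with probability P. */
    while ((random()&0xFFFF) < (ZSKIPLIST_P * 0xFFFF))
        level += 1;
    /* Never return more than ZSKIPLIST_MAXLEVEL. */
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}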

The level is incremented only while the bitwise AND of the random value and 0xFFFF is less than ZSKIPLIST_P * 0xFFFF, so each additional level is reached with a steadily decreasing probability.
A quick analysis: level + 1 happens when random() & 0xFFFF is below 1/4 of 0xFFFF, that is, with probability p = 1/4. With probability 1 - p = 3/4 the loop stops right away, so a node ends up on level 1 with probability 1 - p = 3/4, on level 2 with probability p(1 - p), on level 3 with probability p²(1 - p), and so on.
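
These probabilities can be checked empirically. The sketch below repeats the loop of zslRandomLevel many times and counts how often each level comes out. Several things here are assumptions made for the example: the `demo*` names, the cap `DEMO_MAXLEVEL` of 32 (the excerpt above does not show the value of ZSKIPLIST_MAXLEVEL), and the small xorshift generator that replaces random() so that runs are reproducible. Ignoring the cap, the levels follow a geometric distribution, so the average level should come out near 1 / (1 - p) = 4/3.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_P 0.25          /* same value as ZSKIPLIST_P in the excerpt */
#define DEMO_MAXLEVEL 32     /* assumed cap, for this sketch only */

/* Small xorshift generator used in place of random() so that runs are
 * reproducible; this substitution is ours, not part of Redis. */
static uint32_t demo_state = 2463534242u;

static uint32_t demoRandom(void) {
    demo_state ^= demo_state << 13;
    demo_state ^= demo_state >> 17;
    demo_state ^= demo_state << 5;
    return demo_state;
}

/* Same loop shape as zslRandomLevel: keep climbing with probability P. */
static int demoRandomLevel(void) {
    int level = 1;
    while ((demoRandom() & 0xFFFF) < (DEMO_P * 0xFFFF))
        level += 1;
    return (level < DEMO_MAXLEVEL) ? level : DEMO_MAXLEVEL;
}

/* Draws `samples` levels, stores in counts[k] how many draws produced
 * level k (index 0 stays unused), and returns the average level. */
static double demoLevelHistogram(long samples, long counts[DEMO_MAXLEVEL + 1]) {
    assert(samples > 0);
    long total = 0;
    for (int k = 0; k <= DEMO_MAXLEVEL; k++) counts[k] = 0;
    for (long s = 0; s < samples; s++) {
        int level = demoRandomLevel();
        counts[level]++;
        total += level;
    }
    return (double)total / (double)samples;
}
```

Calling `demoLevelHistogram(1000000, counts)` and dividing `counts[1]` and `counts[2]` by the sample count should give values close to 3/4 and 3/16, in line with 1 - p and p(1 - p) from the analysis above.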