Why is Redis so fast

Date: 2022-05-19

A common interview question. The answer can be built up from several angles: the underlying data structures behind Redis's different data types, the fully in-memory design, the I/O-multiplexed network model, the threading model, progressive rehash, and so on.

1. Memory-based implementation

Redis is an in-memory database: compared with a disk-based database, it can bypass the disk entirely. Memory is driven directly by the CPU through the memory controller integrated into the CPU itself, so memory access enjoys the best bandwidth available for communicating with the CPU.


2. Efficient data structures

Redis has five basic data types: string, list, hash, set, and sorted set. Each data type is backed by one or more underlying data structures, all chosen in pursuit of speed.


1) SDS simple dynamic string

SDS (simple dynamic string) is the underlying data structure used by the string type in Redis. It is a string that can be modified.


O(1) length query: in SDS, the len field stores the string length, so querying the length is O(1). A traditional C string must be traversed until the terminating '\0' is reached, which is O(n).
Space pre-allocation: when an SDS is modified, the program allocates not only the space the SDS needs, but also extra unused space.

Lazy space release: when an SDS is shortened, the program does not reclaim the excess memory; it records the number of spare bytes in the free field instead of releasing them. If an append operation follows, the unused space recorded in free can be used directly, reducing memory allocations.

Binary safety: an SDS is essentially a char *, but thanks to the sdshdr header structure it can store arbitrary binary data. A C string, by contrast, marks the end of the string with '\0': the first '\0' encountered is treated as the end and everything after it is ignored, so saving binary data such as an image or video in a traditional C string would truncate it. The buf of the SDS header is defined as a byte array, and the len member of the header, not a terminator, determines where the string ends, so an SDS can store any binary or text data, including '\0' bytes.
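To make these three properties concrete, here is a minimal sketch of the SDS idea in C. The struct below is a simplification, not Redis's exact sdshdr layout (real Redis uses several header sizes such as sdshdr8 and sdshdr16), and the names are illustrative:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A minimal SDS-like string: length and free space live in a header,
 * so strlen is O(1) and embedded '\0' bytes do not truncate the data. */
typedef struct sds {
    size_t len;   /* bytes actually used        */
    size_t free;  /* bytes allocated but unused */
    char  *buf;   /* byte array, may contain '\0' */
} sds;

sds sds_new(const char *data, size_t len) {
    sds s;
    s.len = len;
    s.free = len;                     /* pre-allocate as much again (sketch) */
    s.buf = malloc(len + s.free + 1); /* +1 for a trailing '\0' convenience */
    memcpy(s.buf, data, len);
    s.buf[len] = '\0';
    return s;
}

/* O(1): read the header instead of scanning for '\0'. */
size_t sds_len(const sds *s) { return s->len; }
```

A later append can consume `free` before any realloc, which is exactly the pre-allocation and lazy-release behavior described above.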

Supplement:

All Redis objects share a RedisObject header structure:

struct RedisObject {
    unsigned type : 4;      // 4 bits: object type
    unsigned encoding : 4;  // 4 bits: storage encoding
    unsigned lru : 24;      // 24 bits: LRU information
    int refcount;           // 4 bytes: reference count
    void *ptr;              // 8 bytes on a 64-bit system
} robj;

Different objects have different types (4 bits), and objects of the same type can use different storage encodings (4 bits).

To record the object's LRU information, a 24-bit lru field is used.

Each object carries a reference count refcount; when it reaches zero, the object is destroyed and its memory reclaimed. The ptr pointer points to the object's actual content (body).

A RedisObject header therefore occupies 16 bytes in total: 4 bits + 4 bits + 24 bits = 4 bytes, plus 4 bytes for refcount and 8 bytes for ptr.

Redis strings are stored in one of two forms: when the string is short (at most 44 bytes), it is stored as embstr; when it exceeds 44 bytes, it is stored as raw.

In the embstr form, the RedisObject header and the SDS object are laid out contiguously and allocated with a single malloc. The raw form instead needs two mallocs, and the two parts are generally not adjacent in memory.


When the string is small, the SDS header adds at least 3 bytes on top of the content (the smallest header stores len, alloc, and flags in 1 byte each), so the minimum space for a string object is 19 bytes (16 + 3).

If the total exceeds 64 bytes, Redis considers it a large string and stores it as raw instead of embstr. 64 − 19 − 1 (for the terminating '\0') = 44, which is why embstr can hold at most 44 bytes.
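The 44-byte threshold follows directly from that arithmetic; the constants below just restate the sizes from the text (16-byte object header, 3-byte minimal SDS header, 1 terminating byte, 64-byte unit):

```c
#include <assert.h>

/* Sizes taken from the discussion above. */
enum {
    OBJ_HEADER = 16, /* RedisObject header             */
    SDS_HEADER = 3,  /* smallest SDS header            */
    NUL_BYTE   = 1,  /* trailing '\0'                  */
    ALLOC_UNIT = 64  /* largest "small string" object  */
};

int embstr_limit(void) {
    return ALLOC_UNIT - OBJ_HEADER - SDS_HEADER - NUL_BYTE;
}
```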

2) Intset integer set

An intset is one of the underlying implementations of the set type. If a set satisfies two conditions, namely that it stores only integer elements and that the number of elements is small, Redis uses an intset to hold the data.
The intset element type can only be numeric, in one of three widths: int16_t, int32_t, or int64_t.
The elements are ordered and cannot repeat.
Like SDS, an intset occupies contiguous memory, much like an array. Its data structure is as follows:

typedef struct intset {
    uint32_t encoding;   // encoding mode
    uint32_t length;     // number of elements
    int8_t contents[];   // data
} intset;

The encoding field represents the encoding mode of the integer set. Redis provides macro definitions for three modes:

//Note: although contents[] is declared as int8_t, the data is not stored in that type
//INTSET_ENC_INT16: each element takes 2 bytes and can hold integers in the range -32768 ~ 32767
#define INTSET_ENC_INT16 (sizeof(int16_t)) 
//INTSET_ENC_INT32: each element takes 4 bytes and can hold integers in the range -2^31 ~ 2^31 - 1
#define INTSET_ENC_INT32 (sizeof(int32_t)) 
//INTSET_ENC_INT64: each element takes 8 bytes and can hold integers in the range -2^63 ~ 2^63 - 1
#define INTSET_ENC_INT64 (sizeof(int64_t)) 
The length field is used to hold the number of elements in the collection.

The contents field holds the integers. The array must contain no duplicates, is kept sorted in ascending order, and is read and written according to the current encoding mode.

Upgrade
The most noteworthy intset operation is the upgrade. When an added integer exceeds the range of the current encoding, the intset is upgraded to an encoding wide enough to hold it. For example, a set created as {1, 2, 3, 4} is stored as int16_t; adding a large integer that exceeds this range requires upgrading the set: the encoding field is changed to INTSET_ENC_INT32 and the data in contents is rearranged.
Redis provides the intsetUpgradeAndAdd function to upgrade the integer set and add the new element.


The upgrade process can be illustrated as follows:

//According to the set's original encoding, take each element out of the underlying array,
//then add it back to the set using the new encoding.
//After the conversion from the old encoding to the new one,
//the newly allocated space sits at the back end of the array,
//so the program moves elements starting from the back end.
//For example, suppose there are three elements encoded with curenc, arranged as follows:
// | x | y | z |
//After the program reallocates the array, it is expanded (? marks unused memory):
// | x | y | z | ? |   ?   |   ?   |
//The program then reinserts the elements starting from the back end:
// | x | y | z | ? |   z   |   ?   |
// | x | y |   y   |   z   |   ?   |
// |   x   |   y   |   z   |   ?   |
//Finally, the new element is written into the position marked by the last ?:
// |   x   |   y   |   z   |  new  |
//The above shows the case where the new element is larger than all existing elements,
//i.e. prepend == 0.

//When the new element is smaller than all existing elements (prepend == 1),
//the adjustment goes like this:
// | x | y | z | ? |   ?   |   ?   |
// | x | y | z | ? |   ?   |   z   |
// | x | y | z | ? |   y   |   z   |
// | x | y |   x   |   y   |   z   |
//When the new value is written, the old | x | y | bytes at the front are overwritten:
// |  new  |   x   |   y   |   z   |
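As a concrete sketch of one upgrade path (int16_t to int32_t, with the new value larger than all existing ones, i.e. prepend == 0), the back-to-front move can be written like this. `upgrade16to32_and_append` is an illustrative name, not Redis's actual API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: widen a sorted int16_t array in place to int32_t and append
 * a new value assumed larger than every existing element. */
int32_t *upgrade16to32_and_append(int16_t *old, uint32_t len, int32_t value) {
    int32_t *arr = realloc(old, (len + 1) * sizeof(int32_t));
    if (!arr) return NULL;
    /* The old 2-byte values occupy the front of the new buffer; widen
     * them from the back end so nothing is overwritten before it is read. */
    for (uint32_t i = len; i-- > 0; ) {
        int16_t v;
        memcpy(&v, (const char *)arr + i * sizeof(int16_t), sizeof(v));
        arr[i] = v;
    }
    arr[len] = value; /* prepend == 0: new element goes last */
    return arr;
}
```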

Find:
The search logic is already used by the insert operation; it is in fact a binary search, intsetSearch().
Delete:
First determine the element's encoding. If it does not fit the set's encoding, success is 0, meaning the deletion failed.
Otherwise call intsetSearch() to find the element's position pos.
Then move the elements starting at pos + 1 onto pos, overwriting the deleted element.
Finally decrement the element count and reallocate the memory.
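The find and delete steps can be sketched as follows: a binary search over the sorted array, and a memmove that shifts the tail forward (the real intsetSearch and intsetRemove also handle encodings and reallocation; the names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Binary search over a sorted array, in the spirit of intsetSearch().
 * Returns 1 and stores the index in *pos when found; returns 0 and
 * stores the insertion point otherwise. */
int intset_search(const int64_t *arr, uint32_t len, int64_t value, uint32_t *pos) {
    uint32_t lo = 0, hi = len;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (arr[mid] == value) { *pos = mid; return 1; }
        if (arr[mid] < value) lo = mid + 1; else hi = mid;
    }
    *pos = lo;
    return 0;
}

/* Deletion: shift everything after pos one slot forward.
 * (Real Redis then reallocates to shrink the array.) */
void intset_delete_at(int64_t *arr, uint32_t *len, uint32_t pos) {
    memmove(&arr[pos], &arr[pos + 1], (*len - pos - 1) * sizeof(arr[0]));
    (*len)--;
}
```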

3) Ziplist (compressed list)

The compressed list (ziplist) is one of the underlying implementations of the list, hash, and sorted set types.
When there are only a few elements, and each one is either a small integer or a short string, Redis uses the compressed list as the key's underlying implementation.

(figure: overall ziplist layout: zlbytes | zltail | zllen | entry1 | ... | entryN | zlend)

A ziplist is essentially a byte array: a linear data structure designed to save memory. It can contain any number of elements, and each element can be a byte array or an integer.
The meaning of each field is as follows:
1. zlbytes: the byte length of the compressed list, 4 bytes; a compressed list is therefore at most (2^32) - 1 bytes long;
2. zltail: the offset of the tail element from the start of the compressed list, 4 bytes;
3. zllen: the number of elements in the compressed list, 2 bytes. So what happens when the element count exceeds (2^16) - 1? Then the count can no longer be read from zllen; the whole compressed list must be traversed to obtain it;
4. entryX: the elements stored in the compressed list, each a byte array or an integer; the entry encoding is detailed later;
5. zlend: the end marker of the compressed list, 1 byte, always 0xFF.
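A sketch of how those fixed header fields can be read off the raw byte array (the helper names are illustrative, and the fixed offsets follow the field sizes listed above):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Header layout: zlbytes (4B) | zltail (4B) | zllen (2B) | entries... | zlend (0xFF).
 * memcpy avoids unaligned access; byte order matches however the
 * header was written. */
uint32_t zl_bytes(const unsigned char *zl) {
    uint32_t v; memcpy(&v, zl, 4); return v;
}
uint32_t zl_tail_offset(const unsigned char *zl) {
    uint32_t v; memcpy(&v, zl + 4, 4); return v;
}
uint16_t zl_len(const unsigned char *zl) {
    uint16_t v; memcpy(&v, zl + 8, 2); return v;
}
```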

Given this structure, the byte length and element count of a ziplist are easy to obtain. But how are the elements traversed? Each entry may store either a byte array or an integer value, so for any element, how do we tell which type it holds? And for a byte array, how do we learn its length?
To answer these questions we need the encoding structure of a compressed-list element: each entry consists of a previous_entry_length field, an encoding field, and a content field.


Traversal
The previous_entry_length field records the byte length of the previous element and takes 1 or 5 bytes. When the previous element is shorter than 254 bytes, previous_entry_length takes one byte; when it is 254 bytes or longer, previous_entry_length takes 5 bytes, where the first byte is the fixed flag 0xFE and the remaining four bytes hold the actual length of the previous element.

Because every element's previous_entry_length field stores the length of the element before it, traversing the compressed list toward the head is straightforward: the expression (p - previous_entry_length) yields the starting address of the previous element.
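The 1-byte/5-byte decoding described above can be sketched like this, with 0xFE as the 5-byte flag and little-endian length bytes; the function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Decode previous_entry_length at pointer p: return the previous
 * element's length and store the field's own size (1 or 5) in *fieldsize. */
uint32_t decode_prevlen(const unsigned char *p, unsigned *fieldsize) {
    if (p[0] < 0xFE) {  /* 1-byte form: length < 254 */
        *fieldsize = 1;
        return p[0];
    }
    *fieldsize = 5;     /* 5-byte form: p[1..4] hold the length */
    return (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
}
```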

Traversing toward the tail requires decoding the current element and computing its total length, which yields the address of the next element.
The encoding field indicates the current element's encoding, i.e. the data type (integer or byte array) stored in the content field, and the data content itself is stored in content. To save memory, the encoding field is also variable-length.


From the first two bits of the encoding field's first byte, it can be determined whether the content field stores an integer or a byte array (and, for a byte array, its maximum length). When content stores a byte array, the following bytes give the array's actual length; when content stores an integer, bits 3 and 4 determine the specific integer type; and when the encoding field marks an immediate value between 0 and 12, the data is stored directly in the last four bits of the encoding field and there is no content field.

Entry structure

For any compressed-list element, obtaining the previous element's length, determining the stored data type, and extracting the data all require nontrivial decoding, so the decoded result is cached: the structure zlentry represents a decoded compressed-list element.


Looking back at the element encoding, there are actually more than three variables: the length of the previous_entry_length field (prevrawlensize), the value stored in the previous_entry_length field (prevrawlen), the length of the encoding field (lensize), the content of the encoding field (len, together with the data type), and the address of the current element (p). The headersize field gives the header length of the current element, i.e. the sum of the previous_entry_length field length and the encoding field length.
The zipEntry function decodes a compressed-list element and stores the result in a zlentry structure.

Cascading updates in ziplist

The prevlen field of an entry records the length of the previous entry and has one of two sizes: 1 byte or 5 bytes.
When the length of the entry in front of an entry changes, that entry's prevlen field may need to grow to store the new length. If several consecutive entries are close to 254 bytes long, the prevlen of entry after entry may need expanding, and the so-called cascading update occurs.
The update is in essence a change in prevlen size, with two cases:
expansion (1 byte -> 5 bytes), and
contraction (5 bytes -> 1 byte). Ziplist does not handle the contraction case, because the 5-byte form can redundantly represent a length that would fit in 1 byte.

Cascading updates can occur when an element is inserted or deleted.


Cascading updates cause memory to be reallocated and data to be copied multiple times, which is very inefficient. However, the probability of this situation is very low, so Redis takes no measures to prevent cascading updates on delete and insert operations; it merely checks, at the end of each delete or insert, whether the following element's previous_entry_length field needs updating.

4) LinkedList (doubly linked list)

Because C has no built-in linked-list data structure, Redis implements its own. The linked list is one of the underlying implementations of the list type: when a list key holds many elements, or the elements are long strings, Redis uses the linked list as its underlying implementation.

typedef struct listNode {
    struct listNode *prev;  // predecessor pointer
    struct listNode *next;  // successor pointer
    void *value;            // node value
} listNode;

typedef struct listIter {   // linked list iterator
    listNode *next;
    int direction;          // traversal direction
} listIter;

typedef struct list {       // linked list
    listNode *head;                     // list head
    listNode *tail;                     // list tail
    void *(*dup)(void *ptr);            // copy function pointer
    void (*free)(void *ptr);            // free-memory function pointer
    int (*match)(void *ptr, void *key); // compare function pointer
    unsigned long len;                  // list length
} list;

The list structure provides the head pointer head, the tail pointer tail, and the length counter len, while the dup, free, and match members are the type-specific functions that make the linked list polymorphic:

  • dup copies the value held by a list node;
  • free releases the value held by a list node;
  • match compares the value held by a list node with another input value.

Characteristics

  • Double-ended: nodes carry prev and next pointers, so reaching a node's predecessor or successor is O(1).
  • Acyclic: the head node's prev pointer and the tail node's next pointer are NULL; list traversal ends at NULL.
  • Head and tail pointers: obtaining the head or tail node is O(1).
  • Length counter: the len attribute counts the nodes held by the list, so obtaining the node count is O(1).
  • Polymorphic: node values are stored as void * pointers, and type-specific behavior is supplied through the dup, free, and match attributes of the list structure, so the list can hold values of any type.
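A minimal sketch of how these structures are used, reduced to head, tail, and len with O(1) tail insertion (dup/free/match are omitted for brevity; this mirrors the idea of Redis's adlist, not its exact API):

```c
#include <assert.h>
#include <stdlib.h>

typedef struct listNode {
    struct listNode *prev, *next;
    void *value;
} listNode;

typedef struct list {
    listNode *head, *tail;
    unsigned long len;
} list;

/* O(1) push at the tail thanks to the tail pointer. */
void list_push_tail(list *l, void *value) {
    listNode *n = malloc(sizeof(*n));
    n->value = value;
    n->prev = l->tail;
    n->next = NULL;
    if (l->tail) l->tail->next = n; else l->head = n;
    l->tail = n;
    l->len++;
}

/* Walk forward from the head, summing int values. */
long list_sum(const list *l) {
    long sum = 0;
    for (listNode *n = l->head; n; n = n->next)
        sum += *(int *)n->value;
    return sum;
}
```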

5) quicklist

Although ziplist saves memory, its design carries two costs: it cannot hold too many elements, or access performance drops; and it cannot hold overly large elements, or memory reallocation and cascading updates become likely.
To address these shortcomings, the Redis codebase evolved two newer data structures, quicklist and listpack, both designed to preserve ziplist's memory savings while avoiding its potential performance degradation. Since Redis 3.2, the list type is stored in a quicklist to further improve performance.
A quicklist is a hybrid of ziplist and linked list: it cuts the linked list into segments, stores each segment compactly as a ziplist, and strings the ziplists together with bidirectional pointers.


Macroscopically, a quicklist is a doubly linked list, with that structure's advantages: insertion and deletion are convenient, and although they are O(n) in general, no large memory copy is needed, which improves efficiency; access at either end is O(1).
Microscopically, a quicklist is a chain of entry nodes; the entries of each node are stored contiguously and in order, and an entry can be located in O(log n) by binary search.

The general structure of QuickList is shown in the figure below:


The implementation of quicklist operations is quite involved.
See: https://blog.csdn.net/men_wen/article/details/70229375

6) Dict dictionary

A dict (dictionary) is one of the underlying implementation structures of the hash type (a ziplist is used when the amount of data is small). Redis itself is named the Remote Dictionary Server: it is in effect one very large dictionary whose keys are usually strings, while the values can be strings, sets, zsets, hashes, lists, and other types. The dict data structures are defined as follows.

The hash table node dictEntry is defined as follows:

typedef struct dictEntry {
    void *key;                // key; void * can point to any type

    union {                   // the union provides optimized storage for numeric types
        void     *val;
        uint64_t  u64;
        int64_t   s64;
    } v;

    struct dictEntry *next;   // next pointer, for chaining
} dictEntry;

Redis's dictionary is implemented with hash tables. A hash table contains multiple hash table nodes, and each node represents one key-value pair of the dictionary.

typedef struct dictht {
    dictEntry **table;        // array pointer; each element points to a dictEntry

    unsigned long size;       // allocated size of this dictht; always 2^n

    unsigned long sizemask;   // sizemask = size - 1; the mask applied to the hash value, i.e. 2^n - 1

    unsigned long used;       // number of elements currently stored
} dictht;

The complete dictionary dict implementation is composed of two hash tables dictht and several variables, as follows:

typedef struct dict {
    dictType *type;     // type-specific functions for this table, e.g. the hash function and key comparator

    void *privdata;     // private data that can be passed to the dict

    dictht ht[2];       // each dict holds two dichts; the second is used for rehash

    int rehashidx;      // rehash progress; -1 when no rehash is in progress

    int iterators;      // number of currently running iterators
} dict;

These three data structures show roughly how a dict is composed. The data (key-value pairs) is stored in dictEntry nodes. A hash table is a dictht structure, recording the table's size, used count, and other information. Finally, every Redis dict contains two dichts by default: when one hash table meets the conditions that require expansion, the other table is allocated and the elements are rehashed into it. Rehashing means recomputing the hash of every key and storing it at the appropriate location of the second table. Redis does not do this in one centralized pass; it completes it gradually during subsequent inserts, deletes, updates, and lookups. This is called progressive rehash.


Dictionary insertion process

  1. First compute the hash of the key with the hash function (Redis uses the MurmurHash2 algorithm):
    hash = dict->type->hashFunction(key)

  2. Then compute the index from the hash value and sizemask (x below can be 0 or 1):
    index = hash & dict->ht[x].sizemask

  3. The index computed above is simply the subscript into the dictEntry * array. If no key-value pair is stored at that subscript, the pair is stored there directly. Otherwise, chaining is used: the new pair is inserted at the head of the chain (the list keeps no tail pointer, so head insertion is O(1)).
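The three steps above can be sketched as a toy chained hash table. The hash function here is FNV-1a standing in for MurmurHash2, and all names are illustrative; the table size is a power of two so that `index = hash & sizemask`:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    const char *key;
    long val;
    struct entry *next;
} entry;

typedef struct {
    entry **table;
    unsigned long size, sizemask, used;
} ht;

static uint64_t toy_hash(const char *key) {     /* FNV-1a, a stand-in hash */
    uint64_t h = 14695981039346656037ULL;
    while (*key) { h ^= (unsigned char)*key++; h *= 1099511628211ULL; }
    return h;
}

ht *ht_new(unsigned long size) {                /* size must be 2^n */
    ht *t = malloc(sizeof(*t));
    t->table = calloc(size, sizeof(entry *));
    t->size = size; t->sizemask = size - 1; t->used = 0;
    return t;
}

void ht_insert(ht *t, const char *key, long val) {
    unsigned long idx = toy_hash(key) & t->sizemask; /* step 1 + 2 */
    entry *e = malloc(sizeof(*e));
    e->key = key; e->val = val;
    e->next = t->table[idx];                    /* step 3: head insertion, O(1) */
    t->table[idx] = e;
    t->used++;
}

entry *ht_find(ht *t, const char *key) {
    for (entry *e = t->table[toy_hash(key) & t->sizemask]; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e;
    return NULL;
}
```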

When the hash table's collision rate is too high, the chains become long and query efficiency drops, so the hash table must be expanded. Conversely, if the hash table was sized large while it stores few key-value pairs, memory is wasted, and the table should be shrunk. Expansion and contraction are both carried out through the rehash process.

Rehash steps are as follows:

  1. Allocate space for the dict's hash table ht[1]. How much space depends on the operation type and the current number of key-value pairs ht[0].used:
    (1) for an expansion, the size of ht[1] is the first value of the form 2^n that is greater than or equal to ht[0].used * 2;
    (2) for a contraction, the size of ht[1] is the first value of the form 2^n that is greater than or equal to ht[0].used.

  2. Recompute the hash value and index of every key in ht[0] and migrate each key-value pair to its place in ht[1]. Note that this is completed gradually: if the dictionary is large, the migration takes time, and the Redis server would be unavailable for that whole period if it were done in one pass.

  3. When all key-value pairs of ht[0] have been migrated to ht[1] (ht[0] is now empty), make ht[1] the new ht[0] and create a fresh empty table at ht[1], ready for the next rehash.

Progressive rehash:
For Redis, if the hash table holds many keys, rehashing them all at once could take a long time and block the server, so Redis spreads the rehash work across subsequent insert, delete, update, and lookup operations.
Access policy during rehash:
By default, every access to the hash table looks in table 0 first, then decides whether to also check table 1 depending on whether a rehash is in progress.

The gradual process is as follows:
i. Allocate space for ht[1]; the dictionary now holds both ht[0] and ht[1].
ii. Set rehashidx to 0, marking the official start of the rehash.
iii. During the rehash, every operation on the dictionary, besides its own work, also rehashes all key-value pairs at index rehashidx of ht[0] over to ht[1], then increments rehashidx.
iv. As operations on the dictionary continue, rehashidx eventually reaches ht[0].size. At that point all key-value pairs of ht[0] have been migrated to ht[1], and the program resets rehashidx to -1, marking the rehash as complete.

Note that during a rehash, both ht[0] and ht[1] may hold key-value pairs at the same time, so a query must check both hash tables. An insertion, however, goes directly into ht[1].
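One progressive step can be sketched as below, in the spirit of Redis's _dictRehashStep: migrate the bucket at rehashidx, skip empty buckets, and swap the tables when the end is reached. Types are simplified and hashes are precomputed per entry; in real Redis the completion check and swap live in dictRehash:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct entry {
    unsigned long hash;       /* precomputed hash of the key (sketch) */
    struct entry *next;
} entry;

typedef struct {
    entry **table;
    unsigned long size, used; /* size is a power of two */
} table;

typedef struct {
    table ht[2];
    long rehashidx;           /* -1 when no rehash is in progress */
} dict;

void rehash_step(dict *d) {
    if (d->rehashidx < 0) return;
    /* skip empty buckets */
    while ((unsigned long)d->rehashidx < d->ht[0].size &&
           d->ht[0].table[d->rehashidx] == NULL)
        d->rehashidx++;
    if ((unsigned long)d->rehashidx >= d->ht[0].size) { /* rehash finished */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];                             /* ht[1] becomes ht[0] */
        d->ht[1].table = NULL; d->ht[1].size = d->ht[1].used = 0;
        d->rehashidx = -1;
        return;
    }
    /* migrate every entry in this one bucket to ht[1] */
    entry *e = d->ht[0].table[d->rehashidx];
    while (e) {
        entry *next = e->next;
        unsigned long idx = e->hash & (d->ht[1].size - 1);
        e->next = d->ht[1].table[idx];                   /* head insertion */
        d->ht[1].table[idx] = e;
        d->ht[0].used--; d->ht[1].used++;
        e = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
}
```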

Finally, under what conditions does Redis expand or shrink the hash table?

  1. When the server is not executing a BGSAVE or BGREWRITEAOF command and the hash table's load factor is greater than or equal to 1, an expansion is performed.

  2. When the server is executing BGSAVE or BGREWRITEAOF, an expansion is performed only when the load factor is greater than or equal to 5. (Load factor = number of nodes currently stored in the hash table / hash table size. The threshold is raised during BGSAVE or BGREWRITEAOF because Redis forks a child process for those commands, and most operating systems use copy-on-write to keep the fork cheap; large-scale data migration at that time would force many page copies. Put bluntly, avoiding expansion then saves memory and improves efficiency.)

  3. When the load factor drops below 0.1, the table is shrunk.

7) Skiplist (jump table)

The sorting capability of the sorted set type is implemented through the skip list data structure.
A skip list is an ordered data structure. Each node maintains multiple pointers to other nodes, which makes fast access to nodes possible.
On top of a plain linked list, the skip list adds multiple levels of indexes; by hopping between index positions, data can be located in just a few steps.


A skip list is a collection of several linked lists. It is a probabilistic data structure used as a replacement for balanced trees, more precisely for self-balancing binary search trees (BSTs). (The more ordered the inserts into a plain BST, the worse it performs; in the worst case it degenerates into a linked list. A self-balancing BST keeps insert, delete, and lookup at O(log n) in every case, e.g. the AVL tree, splay tree, 2-3 tree, and the red-black tree derived from it, but the self-balancing logic is complex and fiddly to implement, and under high concurrency its locking adds considerable overhead.)
A skip list is a data structure with a simple design yet efficiency comparable to a self-balancing BST.

A skip list has the following properties:

i. It consists of multiple layers: layer 1 is the lowest, layer 2 the next, and so on. The number of layers never exceeds a fixed maximum Lmax.
ii. Each layer is an ordered linked list with a header node, and the layer-1 list contains all elements in the skip list.
iii. If an element appears in layer k, it also appears in layers 1 through k-1, and it appears in layer k+1 with some probability p.

This is clearly a space-for-time idea, similar to an index: layer k can be viewed as the (k-1)-th level of index, speeding up the search. To avoid taking too much space, the layers above the first do not store actual data, only pointers (a pointer to the next element in the same layer, and a pointer to the same element one layer down).

When looking up an element, traversal starts from the head node of the top-level list. Taking an ascending skip list as an example: if the value of the current node's next node is smaller than the target, keep moving right; if the next node's value is greater than the target, drop down to the next layer and continue there. Repeat the right-and-down moves until a node equal to the target value is found.
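The right-and-down search can be sketched on a hand-built skip list (fixed-size forward arrays instead of Redis's dynamic levels; names are illustrative):

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLEVEL 4

typedef struct node {
    int value;
    struct node *forward[MAXLEVEL]; /* next node at each level */
} node;

/* Returns the node holding value, or NULL. head is a sentinel node. */
node *sl_search(node *head, int value, int toplevel) {
    node *x = head;
    for (int lvl = toplevel - 1; lvl >= 0; lvl--) {
        /* move right while the next value is still smaller... */
        while (x->forward[lvl] && x->forward[lvl]->value < value)
            x = x->forward[lvl];
        /* ...then the loop drops down one level */
    }
    x = x->forward[0];
    return (x && x->value == value) ? x : NULL;
}
```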


Insertion probability
As mentioned earlier, an element in layer k of the skip list appears in layer k+1 with a certain probability p; this probability is applied during insertion.
After the search procedure above finds the insertion position for the new element, it is first inserted into layer 1. Whether it is also inserted into layers 2, 3, 4, and so on is decided with random numbers.
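Level selection can be sketched as repeated coin flips: keep promoting the new node one level with probability p until a flip fails or the maximum is reached. Redis's zslRandomLevel uses p = 0.25 and a 32-level cap; the sketch below mirrors that:

```c
#include <assert.h>
#include <stdlib.h>

#define MAXLEVEL 32
#define P_NUM    1   /* promotion probability p = 1/4 */
#define P_DEN    4

/* Pick the level count for a newly inserted node. */
int random_level(void) {
    int level = 1;
    while (level < MAXLEVEL && rand() % P_DEN < P_NUM)
        level++;
    return level;
}
```

With p = 1/4, a node reaches level k with probability (1/4)^(k-1), so high towers are rare and the expected pointer count per node stays small.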

Implementation in Redis
In Redis the skip list is called zskiplist and is one of the underlying data structures of the sorted set (zset) type. Besides the zskiplist, a zset also uses the key-value hash table dict. The default implementation of a small sorted set is actually the more compact ziplist (compressed list), but two parameters in redis.conf control when it is converted to the skiplist implementation:

zset-max-ziplist-entries 128
zset-max-ziplist-value 64

When the number of elements in the sorted set exceeds zset-max-ziplist-entries, or the length of any element exceeds zset-max-ziplist-value, the set is automatically converted from ziplist to skiplist.

typedef struct zskiplistNode {
    robj *obj;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned int span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;

The node of a zskiplist is the structure zskiplistNode, with the following fields.

obj: the data stored in this node.
score: the score associated with the data; zset sorts data in ascending score order.
backward: pointer to the previous node in the list, i.e. the backward pointer.
level[]: an array of zskiplistLevel structures, each representing one layer of the skip list. Each layer has two fields:
forward: pointer to the next node in that layer's list, i.e. the forward pointer.
span: the number of nodes this forward pointer skips over (not counting the current node).

The zskiplist is the skip list itself, with the following fields.
header and tail: the head pointer and tail pointer.
length: the number of nodes in the skip list, not counting the header node.
level: the number of layers in the skip list.

(Figure: structure of a zskiplist)

As the figure shows, the first (bottom) layer of the zskiplist is a doubly linked list (via the backward pointer), while the higher layers remain singly linked. This supports the occasional need to traverse the data in reverse order.

In addition, each forward pointer stores the number of nodes it skips (span), because zset supports rank-based operations such as ZREVRANK (query a member's rank) and ZREVRANGE (query members by rank range). With span values available, the rank can be accumulated cheaply during the search itself.

These are the two differences between zskiplist and the textbook skip list described above, and both bring practical convenience.

3. Single thread model

This does not mean that Redis has only one thread.
"Single-threaded" means that the execution of key-value read and write commands happens on a single thread.

What are the benefits of single threading?

1. No performance cost of creating threads;
2. No CPU cost of context switching between threads;
3. No contention between threads: no lock acquisition and release, no deadlocks, no locking concerns at all;
4. Clearer code and simpler processing logic.

(The official answer: "Because Redis is a memory-based system, the CPU is not its bottleneck; the bottleneck is most likely machine memory or network bandwidth." Since a single thread is easy to implement and the CPU is not a bottleneck, the single-threaded design follows naturally.)

4. I / O multiplexing model

Why does Redis use I/O multiplexing? Because Redis runs commands in a single thread, all operations are performed sequentially. Read and write operations that wait on client input or output are blocking: an I/O call usually cannot return immediately, so blocking on one connection would leave the whole process unable to serve any other client. I/O multiplexing solves this: it lets a single-threaded (single-process) server application handle events from multiple clients at the same time (about I/O multiplexing: https://www.cnblogs.com/reecelin/p/13537734.html).

Redis's I/O multiplexing uses one thread to check the ready state of multiple sockets, recording and tracking the status of each socket (I/O stream) in that single thread. The figure below shows Redis's I/O multiplexing model:

(Figure: Redis I/O multiplexing model)

As shown in the figure above, Redis's I/O multiplexing model works as follows:
(1) When a client socket connects to the server, a corresponding socket descriptor is created (a socket descriptor is a kind of file descriptor); each socket connection corresponds to one file descriptor (FD).
(2) When multiple clients connect, Redis uses the I/O multiplexer to register each client's FD in a listening list (a queue) and monitor the read/write readiness of all these FDs at once. When a client performs accept, read, write, close, and similar operations, the multiplexer wraps the operation in an event bound to the corresponding FD.
(3) When a file event occurs on a socket, the I/O multiplexing module passes the FDs that produced events to the file event dispatcher.
(4) On receiving a socket FD from the multiplexer, the file event dispatcher routes the socket to the event handler matching the type of event the socket produced.
(5) The whole file event handler runs on a single thread, but thanks to the I/O multiplexing module it can monitor many FDs simultaneously. As soon as a client socket becomes readable or writable, the handler executes immediately, avoiding I/O blocking and improving network performance.
(6) As the figure shows, Redis's I/O multiplexing is an implementation of the Reactor design pattern.

About the Reactor pattern: https://www.jianshu.com/p/188ef8462100
NIO multi-reactor implementation: https://blog.csdn.net/qq_32445015/article/details/104584433

Based on the Reactor pattern, Redis developed its own network event processor, called the file event handler. It consists of sockets, an I/O multiplexer, a file event dispatcher, and event handlers.

The I/O multiplexer monitors multiple sockets at the same time. When a monitored socket becomes ready for accept, read, write, close, and similar operations, a corresponding file event is generated. The multiplexer pushes all sockets that produced events into a queue and hands them, one socket at a time, to the file event dispatcher. After receiving a socket, the dispatcher calls the event handler matching the type of event the socket produced.

There are several types of event handlers:
Connection response handler: handles connection requests from clients;
Command request handler: executes commands sent by clients, such as SET, LPUSH, etc.;
Command reply handler: returns command results to clients, e.g. the results of SET, GET, and so on.

Event types:
AE_READABLE, used with two event handlers:
When a client connects to the server, the server associates the connection response handler with the AE_READABLE event of the socket;
When a client sends a command to the server, the server associates the command request handler with the AE_READABLE event of the socket.
AE_WRITABLE: when the server needs to send data back to a client, it associates the command reply handler with the AE_WRITABLE event of the socket.


5. Redis global hash dictionary

As mentioned earlier, Redis as a whole is one big hash table that stores every key-value pair, regardless of which of the five data types the value uses. A hash table is essentially an array; each slot is called a hash bucket, and the entry in each bucket holds a pointer to the actual value.

(Figure: Redis global hash table)

As mentioned in the SDS section, every Redis object carries a redisObject header. There are five object types, but whatever the type, Redis never stores a value directly: it wraps it in a redisObject. The redisObject is important: type checking, internal encoding, memory reclamation (reference counting), shared objects, and similar features all depend on it. When we create a key-value pair in Redis, at least two objects are created: one object for the key and one for the value.

That is, each entry in the global hash table stores the redisObject of a key-value pair, and the actual data is reached through the redisObject's pointer.

Redis resolves conflicts with chained hashing: elements hashed to the same bucket are kept in a linked list. When chains grow too long, lookup performance degrades, so Redis keeps two global hash tables and performs a progressive rehash: it enlarges the number of buckets to reduce collisions, and spreads the rehash work across many requests to avoid a long blocking pause.