Redis data structure (3) – linked list

Time:2020-1-13

Based on Redis 5.0.

The encodings used by the Redis list type are ziplist and quicklist; older versions also used linkedlist.

1. linkedlist

Since the 3.2 release, linkedlist is no longer used for lists, but it is described here for comparison.

typedef struct listNode {
    struct listNode *prev; // previous node
    struct listNode *next; // next node
    void *value;           // node value
} listNode;

typedef struct list {
    listNode *head;                     // head node
    listNode *tail;                     // tail node
    unsigned long len;                  // number of nodes in the linked list
    void *(*dup)(void *ptr);            // node value copy function
    void (*free)(void *ptr);            // node value release function
    int (*match)(void *ptr, void *key); // node value comparison function
} list;


  • Doubly linked: every node has prev and next pointers, so getting the predecessor or successor of a node is O(1).
  • Acyclic: the prev pointer of the head node and the next pointer of the tail node are NULL, so a traversal ends when it reaches NULL.
  • Head and tail pointers: through the head and tail pointers of the list, getting the head node or the tail node is O(1).
  • Length counter: the list keeps the number of nodes in len, so getting the node count is O(1).
  • Polymorphic: nodes store their values as void * pointers, and type-specific functions can be set through the dup, free, and match fields of the list structure, so a linked list can hold values of various types.
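
The properties above can be illustrated with a minimal sketch. This is not the adlist.c implementation: the names demoNode, demoList, demoListAppend, demoListRelease, and the field free_fn are hypothetical, and only the head/tail/len layout follows the structures shown earlier.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct demoNode {
    struct demoNode *prev;
    struct demoNode *next;
    void *value;
} demoNode;

typedef struct demoList {
    demoNode *head;
    demoNode *tail;
    unsigned long len;
    void (*free_fn)(void *ptr); /* optional type-specific value destructor */
} demoList;

/* Append a value at the tail in O(1) by using the tail pointer.
 * Returns 0 on success, -1 if the node cannot be allocated. */
int demoListAppend(demoList *l, void *value) {
    assert(l != NULL);
    demoNode *n = malloc(sizeof(*n));
    if (n == NULL) return -1;
    n->value = value;
    n->next = NULL;    /* acyclic: the tail's next pointer is NULL */
    n->prev = l->tail; /* NULL when the list is empty */
    if (l->tail) l->tail->next = n;
    else l->head = n;
    l->tail = n;
    l->len++;          /* the counter keeps length queries O(1) */
    return 0;
}

/* Free every node, calling the value destructor when one is set. */
void demoListRelease(demoList *l) {
    assert(l != NULL);
    demoNode *cur = l->head;
    while (cur) {
        demoNode *next = cur->next;
        if (l->free_fn) l->free_fn(cur->value);
        free(cur);
        cur = next;
    }
    l->head = l->tail = NULL;
    l->len = 0;
}
```

A caller would zero-initialize a demoList, append values of any type through void * pointers, and read len directly instead of walking the nodes.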

StringObject here means a redisObject of type string; this article refers to it as StringObject throughout.

2. ziplist

// ziplist.c
unsigned char *ziplistNew(void) {
    unsigned int bytes = ZIPLIST_HEADER_SIZE+ZIPLIST_END_SIZE;
    unsigned char *zl = zmalloc(bytes);
    ZIPLIST_BYTES(zl) = intrev32ifbe(bytes);
    ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE);
    ZIPLIST_LENGTH(zl) = 0;
    zl[bytes-1] = ZIP_END;
    return zl;
}


Field | Type | Length | Description
zlbytes | uint32_t | 4 bytes | Records the number of bytes occupied by the entire compressed list, including the 4 bytes of zlbytes itself. Used when reallocating memory for the compressed list and when calculating the position of zlend.
zltail | uint32_t | 4 bytes | Records the byte offset of the tail node from the start address of the compressed list. With this offset, the address of the last node can be found without traversing the list.
zllen | uint16_t | 2 bytes | Records the number of nodes in the compressed list. When the value is less than UINT16_MAX (65535), it is the actual node count; when it equals 65535, the entire list must be traversed to count the nodes.
entry | zlentry | depends on node content | The nodes contained in the compressed list.
zlend | uint8_t | 1 byte | Fixed value 0xFF (255), marking the end of the compressed list. No normal node starts with 255, so checking whether the first byte of a node equals 255 tells whether the end of the list has been reached.

Example: a ziplist holding three entries (entry1 to entry3), explained as follows:

  1. The zlbytes value is 80, meaning the total length of the compressed list is 80 bytes.
  2. The zltail value is 60, meaning that the start address of the tail node entry3 is obtained by adding the offset 60 to a pointer p that points to the start of the compressed list.
  3. The zllen value is 3, meaning the list contains 3 entry nodes.
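
As a rough sketch of how these three header fields could be read from the raw bytes, assuming the fields are stored little-endian (which is what the intrev32ifbe calls in ziplistNew suggest). The names zlHeader, zlReadHeader, zlTailEntry, and zlEndMarkerOk are hypothetical helpers, not functions from ziplist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the ziplist header. */
typedef struct zlHeader {
    uint32_t zlbytes; /* total bytes, header and end marker included */
    uint32_t zltail;  /* offset of the tail entry from the list start */
    uint16_t zllen;   /* number of entries (65535 means "count by traversal") */
} zlHeader;

/* Read a 32-bit little-endian integer regardless of host byte order. */
static uint32_t readLE32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

zlHeader zlReadHeader(const unsigned char *zl) {
    assert(zl != NULL);
    zlHeader h;
    h.zlbytes = readLE32(zl);     /* bytes 0..3 */
    h.zltail = readLE32(zl + 4);  /* bytes 4..7 */
    h.zllen = (uint16_t)(zl[8] | (zl[9] << 8)); /* bytes 8..9 */
    return h;
}

/* Tail entry address: start of the list plus the zltail offset. */
const unsigned char *zlTailEntry(const unsigned char *zl) {
    return zl + zlReadHeader(zl).zltail;
}

/* The end marker 0xFF is the last byte of the list. */
int zlEndMarkerOk(const unsigned char *zl) {
    zlHeader h = zlReadHeader(zl);
    assert(h.zlbytes >= 11); /* 10-byte header plus 1-byte end marker */
    return zl[h.zlbytes - 1] == 0xFF;
}
```

For the example above, zlReadHeader would yield zlbytes 80, zltail 60, and zllen 3, and zlTailEntry would return the list start plus 60.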

entry: each entry consists of the prevrawlen, encoding, and data fields.

2.1 prevrawlen:

prevrawlen records the length of the previous node. The field itself is either 1 byte or 5 bytes long.

  1. If the previous node is shorter than 254 bytes, prevrawlen is 1 byte long and its value is the length of the previous node.
  2. If the previous node is 254 bytes or longer, prevrawlen is 5 bytes long: the first byte is 0xFE (decimal 254) and the next four bytes hold the length of the previous node.

Because prevrawlen records the length of the previous node, the program can compute the start address of the previous node from the start address of the current node with pointer arithmetic. Traversal of a compressed list from tail to head relies on this.

2.2 encoding

encoding records the type and length of the data field (see ziplist.c on GitHub for details).

  1. When the highest two bits are 00, encoding is 1 byte long and the data is a byte array of at most 63 bytes (2^6 - 1). The remaining 6 bits hold the data length.
  2. When the highest two bits are 01, encoding is 2 bytes long and the data is a byte array of at most 16383 bytes (2^14 - 1). The remaining 14 bits hold the data length.
  3. When the highest two bits are 10, encoding is 5 bytes long and the data is a byte array of at most 4294967295 bytes (2^32 - 1). The remaining 6 bits of the first byte are unused and the following four bytes hold the data length.
  4. When the highest two bits are 11, encoding is 1 byte long and the data is stored as an integer:

  • 11000000: the data is a 2-byte int16_t.
  • 11010000: the data is a 4-byte int32_t.
  • 11100000: the data is an 8-byte int64_t.
  • 11110000: the data is a 3-byte (24-bit) signed integer.
  • 1111xxxx, with xxxx between 0001 and 1101: the integers 0 to 12, where 0001 represents 0, 0010 represents 1, and so on. In this range the integer is stored in the encoding byte itself (xxxx minus 1), so the entry has no data field.
  • 11111110: the data is a 1-byte (8-bit) signed integer.
  • Note: encoding never takes the value 11111111, because 11111111 is reserved for zlend, the end marker of the ziplist.

For example:

  • Entry with value ‘hello’:


  • Entry with Integer ‘2’:

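
The two example entries can also be written out byte by byte from the prevrawlen and encoding rules described earlier. The sketch below assumes that 'hello' is the first entry of the list (so its prevrawlen is 0) and that the integer 2 follows it directly; demoWriteShortString and demoWriteSmallInt are hypothetical helpers, not functions from ziplist.c.

```c
#include <assert.h>
#include <string.h>

/* Write a string entry whose length fits the 00xxxxxx encoding (<= 63 bytes)
 * and whose previous entry is shorter than 254 bytes.
 * Returns the number of bytes written. */
size_t demoWriteShortString(unsigned char *dst, unsigned char prevlen,
                            const char *s) {
    size_t len = strlen(s);
    assert(len <= 63 && prevlen < 254);
    dst[0] = prevlen;            /* prevrawlen, 1-byte form */
    dst[1] = (unsigned char)len; /* encoding 00xxxxxx: the length itself */
    memcpy(dst + 2, s, len);     /* data */
    return 2 + len;
}

/* Write an integer in the range 0..12 using the 1111xxxx encoding.
 * The entry has no data field. Returns the number of bytes written. */
size_t demoWriteSmallInt(unsigned char *dst, unsigned char prevlen, int v) {
    assert(v >= 0 && v <= 12 && prevlen < 254);
    dst[0] = prevlen;                          /* prevrawlen, 1-byte form */
    dst[1] = (unsigned char)(0xF0 | (v + 1));  /* xxxx = value + 1 */
    return 2;
}
```

Under these assumptions the 'hello' entry is the 7 bytes 00 05 68 65 6c 6c 6f, and the entry for 2 that follows it is the 2 bytes 07 f3.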

2.3 chain update

When a new node is inserted into a ziplist, or the content of a node grows, more memory has to be requested (the ziplistResize function in ziplist.c, which ultimately calls the zrealloc function in zmalloc.c). If the existing block cannot be extended in place, a new block of sufficient size is allocated and the current ziplist data is copied into it.
The prevlen field records the length of the previous node. Suppose entry1, the node before entry2, is shorter than 254 bytes, so the prevlen of entry2 needs only 1 byte. If the node before entry2 becomes 254 bytes or longer (because the content of entry1 changes, because a new node is inserted between entry1 and entry2, or because entry1 is deleted and entry0 becomes the node before entry2), the 1-byte prevlen of entry2 can no longer hold the length and must be expanded to 5 bytes, so Redis has to reallocate memory. If the original length of entry2 was between 250 and 253 bytes, it reaches 254 bytes or more after the expansion, which in turn forces the prevlen of entry3 to expand. In the worst case, where every following node is in the same situation as entry1 and entry2, Redis has to reallocate the compressed list again and again (the __ziplistCascadeUpdate function in ziplist.c walks the nodes in a while loop and reallocates memory for each node that has to grow).
Chained updates are expensive and can cause performance problems, but the probability that one occurs is very low.
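
To make the chain reaction concrete, here is a small simulation that works only on entry lengths, not on real ziplist memory. It assumes that every entry after the first was shorter than 254 bytes before the change, so each of them starts with a 1-byte prevlen; demoCascade is a hypothetical name and the code does not reproduce __ziplistCascadeUpdate.

```c
#include <assert.h>
#include <stddef.h>

/* len[0] is the total length of the entry that just grew; len[1..n-1] are the
 * total lengths of the entries after it. Widens prevlen (1 -> 5 bytes) in
 * every entry that needs it, updating len[] in place.
 * Returns the number of entries that had to be widened. */
size_t demoCascade(size_t *len, size_t n) {
    assert(len != NULL);
    size_t widened = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (len[i] < 254) break;  /* the next entry's 1-byte prevlen still fits */
        assert(len[i + 1] < 254); /* precondition: it had a 1-byte prevlen */
        len[i + 1] += 4;          /* prevlen grows from 1 byte to 5 bytes */
        widened++;
    }
    return widened;
}
```

For lengths {300, 250, 253, 100, 252}, the growth of the first entry widens three entries: 250 becomes 254, 253 becomes 257, and 100 becomes 104, where the chain stops because 104 is still below 254.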

2.4 advantages and disadvantages
  • In a linkedlist, the prev and next pointers of every node take up 16 bytes (two 8-byte pointers on a 64-bit system), and each listNode is allocated separately, which aggravates memory fragmentation.
  • A ziplist is one contiguous block of memory with high storage efficiency, but it is costly to modify: a realloc may copy a large amount of data, and the longer the ziplist, the worse the performance.

3. quicklist

quicklist is the internal implementation of the Redis list: a doubly linked list of ziplists. Each quicklist node holds one ziplist, which combines the advantages of linkedlist and ziplist.

// quicklist.h
/* quicklistNode is a 32 byte struct describing a ziplist for a quicklist.
 * We use bit fields keep the quicklistNode at 32 bytes.
 * count: 16 bits, max 65536 (max zl bytes is 65k, so max count actually < 32k).
 * encoding: 2 bits, RAW=1, LZF=2.
 * container: 2 bits, NONE=1, ZIPLIST=2.
 * recompress: 1 bit, bool, true if node is temporarry decompressed for usage.
 * attempted_compress: 1 bit, boolean, used for verifying during testing.
 * extra: 10 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    struct quicklistNode *prev; // points to the previous quicklist node
    struct quicklistNode *next; // points to the next quicklist node
    unsigned char *zl; // data pointer: points to a ziplist if the node is not compressed, otherwise to a quicklistLZF structure
    unsigned int sz; // total size in bytes of the ziplist that zl refers to (uncompressed size)
    unsigned int count : 16; // number of data items in the ziplist
    unsigned int encoding : 2; // encoding: 1 -- RAW (plain ziplist), 2 -- LZF (compressed, quicklistLZF)
    unsigned int container : 2; // reserved field, data storage method: 1 -- NONE, 2 -- ZIPLIST. It was designed to say whether a node stores data directly or in a ziplist or other structure (the data container). In the current implementation it is always 2, meaning a ziplist is the container.
    unsigned int recompress : 1; // set to 1 when a compressed node has been decompressed temporarily for reading and must be compressed again afterwards
    unsigned int attempted_compress : 1; // used only by tests
    unsigned int extra : 10; // reserved for future use
} quicklistNode;
/* quicklistLZF is a 4+N byte struct holding 'sz' followed by 'compressed'.
 * 'sz' is byte length of 'compressed' field.
 * 'compressed' is LZF data with total (compressed) length 'sz'
 * NOTE: uncompressed length is stored in quicklistNode->sz.
 * When quicklistNode->zl is compressed, node->zl points to a quicklistLZF */
typedef struct quicklistLZF { // a compressed ziplist
    unsigned int sz; // number of bytes occupied by the LZF-compressed data
    char compressed[]; // flexible array member holding the compressed ziplist bytes
} quicklistLZF;
/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
 * 'count' is the number of total entries.
 * 'len' is the number of quicklist nodes.
 * 'compress' is: -1 if compression disabled, otherwise it's the number
 * of quicklistNodes to leave uncompressed at ends of quicklist.
 * 'fill' is the user-requested (or default) fill factor. */
typedef struct quicklist {
    quicklistNode *head; // points to the head node of the quicklist
    quicklistNode *tail; // points to the tail node of the quicklist
    unsigned long count; // total number of data items in all ziplists
    unsigned long len; // number of quicklist nodes (that is, of ziplists)
    int fill : 16; // ziplist size limit, set by list-max-ziplist-size
    unsigned int compress : 16; // node compression depth, set by list-compress-depth
} quicklist;

How long should the ziplist on each quicklist node be? For example, the same 12 data items could be stored in a quicklist of 3 nodes whose ziplists hold 4 items each, or in a quicklist of 6 nodes whose ziplists hold 2 items each.
This is another trade-off that needs a balance point. Looking only at storage efficiency:

  • The shorter the ziplist on each quicklist node, the more memory fragmentation there is. Many small fragments may become unusable, which lowers storage efficiency. In the extreme case, the ziplist on each quicklist node holds a single data item and the structure degenerates into an ordinary doubly linked list.
  • The longer the ziplist on each quicklist node, the harder it is to allocate a large contiguous block for it. Memory may contain many small free blocks that add up to a lot of space, yet no single block is large enough for the ziplist. This also lowers storage efficiency. In the extreme case, the whole quicklist has a single node and all data items live in its ziplist, so the structure degenerates into a ziplist.

So the ziplist on a quicklist node should be kept at a reasonable length. What is reasonable depends on the application scenario, so Redis provides the configuration parameter list-max-ziplist-size to let users tune it for their own workload.
The parameter can be positive or negative.
A positive value limits the ziplist on each quicklist node by number of data items. For example, a value of 5 means that the ziplist of each quicklist node holds at most 5 data items.
A negative value limits the ziplist on each quicklist node by number of bytes. In this case only the values -1 to -5 are allowed, with the following meanings:

  • -5: the ziplist on each quicklist node cannot exceed 64 KB. (Note: 1 KB = 1024 bytes.)
  • -4: the ziplist on each quicklist node cannot exceed 32 KB.
  • -3: the ziplist on each quicklist node cannot exceed 16 KB.
  • -2: the ziplist on each quicklist node cannot exceed 8 KB. (-2 is the Redis default.)
  • -1: the ziplist on each quicklist node cannot exceed 4 KB.
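
The positive and negative cases can be summarized in a short check. This is a simplified sketch of the rule as described here, not the logic in quicklist.c, which may apply additional safety limits; demoSizeLimit and demoNodeAllowsInsert are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Byte limits for fill values -1..-5; the index is (-fill) - 1. */
static const size_t demoSizeLimit[] = {4096, 8192, 16384, 32768, 65536};

/* Returns 1 if a node that currently holds `count` items in `bytes` bytes may
 * still accept one more item of `entryBytes` bytes under the given fill
 * setting, and 0 otherwise. */
int demoNodeAllowsInsert(int fill, unsigned int count, size_t bytes,
                         size_t entryBytes) {
    assert(fill != 0 && fill >= -5);
    if (fill > 0) {
        /* Positive fill: limit by number of items. */
        return count + 1 <= (unsigned int)fill;
    }
    /* Negative fill: limit by ziplist size in bytes. */
    return bytes + entryBytes <= demoSizeLimit[(-fill) - 1];
}
```

With fill set to 5, a node holding 4 items accepts one more and a node holding 5 does not; with the default -2, a node is full once the next item would push its ziplist past 8192 bytes.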

In addition, the list type is designed to hold long sequences of data. For example, the tutorial on the official Redis website about writing a simple Twitter clone with PHP and Redis uses lists to store Twitter-like timeline data.
When a list is very long, the data at both ends is the most likely to be accessed, while the data in the middle is accessed less often (and accessing it is also slow). For workloads that match this pattern, the list type offers an option to compress the nodes in the middle and save more memory. The Redis configuration parameter list-compress-depth controls this setting.
The parameter gives the number of nodes at each end of a quicklist that are left uncompressed. Note that it counts nodes of the quicklist doubly linked list, not data items inside a ziplist. When the ziplist on a quicklist node is compressed, it is compressed as a whole.
The values of list-compress-depth have the following meanings:

  • 0: a special value meaning no compression. This is the Redis default.
  • 1: one node at each end of the quicklist is left uncompressed and the nodes in the middle are compressed.
  • 2: two nodes at each end of the quicklist are left uncompressed and the nodes in the middle are compressed.
  • 3: three nodes at each end of the quicklist are left uncompressed and the nodes in the middle are compressed.
  • And so on.

Since 0 is a special value, it is easy to see that the head and tail nodes of a quicklist are never compressed, which keeps access at both ends of the list fast.
To compress its internal nodes, quicklist uses LZF, a lossless compression algorithm.
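
The position rule for list-compress-depth can be sketched as a single predicate. It covers only which nodes are eligible by position; whether Redis actually stores a given node compressed may depend on further conditions not covered here. demoNodeIsCompressed is a hypothetical name.

```c
#include <assert.h>

/* Returns 1 if the node at `index` (0-based, counted from the head) of a
 * quicklist with `len` nodes falls in the compressed middle section for the
 * given compress depth, and 0 otherwise. */
int demoNodeIsCompressed(unsigned long index, unsigned long len,
                         unsigned int depth) {
    assert(index < len);
    if (depth == 0) return 0;              /* 0 disables compression */
    if (index < depth) return 0;           /* within `depth` nodes of the head */
    if (len - 1 - index < depth) return 0; /* within `depth` nodes of the tail */
    return 1;
}
```

For a quicklist of 5 nodes and a depth of 1, nodes 0 and 4 stay uncompressed and nodes 1 to 3 are compressed; with a depth of 2, only node 2 is compressed.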

// server.h
/* List defaults */
#define OBJ_LIST_MAX_ZIPLIST_SIZE -2
#define OBJ_LIST_COMPRESS_DEPTH 0

The content above draws on:
Redis Design and Implementation
Redis source code analysis series
Redis internal data structures explained series
Probably the most detailed explanation of the Redis memory model and its applications