The secrets of Redis’s ziplist

Time:2021-1-25

Reference for this blog:

Redis deep Adventure: core principles and application practice

Detailed explanation of redis internal data structure (4) — ziplist

Zip list of redis

In the last blog, I gave a quick introduction to SDS in Redis and pointed out an important but often overlooked reason for Redis’s speed: its carefully designed data structures. In this blog, I’d like to continue that topic and introduce another underlying data structure of Redis: ziplist.

Redis has five basic data types. In addition to the string covered in the last blog, there are list, hash, Zset and set. Of these, list, hash and Zset all use ziplist, directly or indirectly, so it is well worth understanding.

What does ziplist mean

When I first started reading about ziplist, the word “zip” felt very familiar; I seemed to see it every day on my computer. So I looked it up on Baidu.
No wonder it looked so familiar: “zip” means “compress”, so ziplist can be translated as “compressed list”.

Why ziplist

There are two reasons:

  • An ordinary doubly linked list stores two pointers per node. When the stored values are small, the pointers can take more memory than the data itself, so the overhead outweighs the payload. Redis keeps its data resident in memory, and memory is precious, so the Redis developers try hard to minimize memory usage. That is why ziplist exists.
  • The nodes of a linked list are generally not contiguous in memory, which makes traversal relatively slow. A ziplist is a single contiguous block, which avoids this problem.

Take a look at the existence of ziplist

zadd programmings 1.0 go 2.0 python 3.0 java

Create a Zset with three elements, and then look at the data structure it uses:

debug object  programmings
"Value at:0x7f404ac30c60 refcount:1 encoding:ziplist serializedlength:36 lru:2689815 lru_seconds_idle:9"
HSET website google "www.g.cn"

Create a hash with only one element. Take a look at the data structure it uses

debug object website
"Value at:0x7f404ac30ac0 refcount:1 encoding:ziplist serializedlength:30 lru:2690274 lru_seconds_idle:14"

It is clear that both Zset and hash adopt the ziplist data structure.

When certain conditions are met, Zset and hash no longer use the ziplist data structure:

debug object website
"Value at:0x7f404ac30ac0 refcount:1 encoding:hashtable serializedlength:180 lru:2690810 lru_seconds_idle:2"

As you can see, the underlying data structure of hash becomes hashtable.

I won’t repeat the experiment for Zset; interested readers can try it themselves.

As for what this conversion condition is, let’s talk about it later.

If you are curious, you will probably check the underlying data structure of list too, and find that it is not ziplist:

LPUSH languages python
debug object languages
"Value at:0x7f404c4763d0 refcount:1 encoding:quicklist serializedlength:21 lru:2691722 lru_seconds_idle:22 ql_nodes:1 ql_avg_node:1.00 ql_ziplist_max:-2 ql_compressed:0 ql_uncompressed_size:19"

As you can see, the underlying data structure of list is quicklist, not ziplist.

In older versions of Redis (before 3.2), the underlying data structure of list was ziplist or linkedlist. Newer versions replace both with quicklist, and quicklist itself is built from ziplists, so list can be said to use ziplist indirectly. What exactly is quicklist? That is outside the scope of this blog, so let’s leave it for now.

Explore ziplist

Ziplist source code: ziplist.c

The comments in the ziplist source code are written very clearly. If your English is good, you can read them directly; if it is not, or if you would rather not dig through the source, read on.

Ziplist layout

...

This is the ziplist layout described in the comments. Let’s look at these fields one by one:

  • zlbytes: a 32-bit unsigned integer holding the total number of bytes occupied by the ziplist (including the 4 bytes of zlbytes itself);
  • zltail: a 32-bit unsigned integer holding the offset of the last entry from the start of the ziplist, so that the last entry can be located quickly;
  • zllen: a 16-bit unsigned integer recording the number of entries;
  • entry: the stored elements, each of which can be a byte array or an integer;
  • zlend: the last byte of the ziplist, an end marker with the fixed value 255 (0xFF).

Redis can access the fields of ziplist through the following macro definitions:

//Suppose char *zl points to the first byte of the ziplist
//Value of the zlbytes field
#define ZIPLIST_BYTES(zl)       (*((uint32_t*)(zl)))

//Value of the zltail field (zl + 4)
#define ZIPLIST_TAIL_OFFSET(zl) (*((uint32_t*)((zl)+sizeof(uint32_t))))

//Value of the zllen field (zl + (4 * 2))
#define ZIPLIST_LENGTH(zl)      (*((uint16_t*)((zl)+sizeof(uint32_t)*2)))

//Pointer to the first byte of the tail entry in the ziplist
#define ZIPLIST_ENTRY_TAIL(zl)  ((zl)+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl)))

//Pointer to the zlend byte, whose value is always 255 (0xFF)
#define ZIPLIST_ENTRY_END(zl)   ((zl)+intrev32ifbe(ZIPLIST_BYTES(zl))-1)
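To make the header fields concrete, here is a minimal sketch that builds an empty ziplist by hand and reads the fields back. It is not the Redis implementation: the helper names (put_le32, get_le32, empty_ziplist) are my own, and I write the integers byte by byte in little-endian order, which is what the intrev32ifbe calls in the macros above imply.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ZL_HEADER_SIZE 10   /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END         0xFF /* value of the zlend byte */

/* Store a 32-bit value at p in little-endian byte order. */
void put_le32(unsigned char *p, uint32_t v) {
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Read a 32-bit little-endian value from p. */
uint32_t get_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Build an empty ziplist: header plus the end byte, 11 bytes in total.
 * Returns NULL if allocation fails; the caller frees the result. */
unsigned char *empty_ziplist(void) {
    unsigned char *zl = malloc(ZL_HEADER_SIZE + 1);
    if (zl == NULL) return NULL;
    put_le32(zl, ZL_HEADER_SIZE + 1);   /* zlbytes: total size          */
    put_le32(zl + 4, ZL_HEADER_SIZE);   /* zltail: where entries start  */
    zl[8] = 0;                          /* zllen: no entries yet        */
    zl[9] = 0;
    zl[ZL_HEADER_SIZE] = ZL_END;        /* zlend                        */
    return zl;
}
```

With this layout, the last byte of the buffer is always found at offset zlbytes - 1, which is exactly what ZIPLIST_ENTRY_END computes.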

The composition of entry

From the ziplist layout we know that our data is stored in the entries of the ziplist. Let’s take a look at how an entry is composed.

An entry is made up of three fields, in order: prevlen, encoding and entry data. Let’s take a look at them:

  • prevlen: the length in bytes of the previous entry, used to find the start of the previous entry quickly. If the current entry starts at address x, then (x - prevlen) is the start of the previous entry;
  • encoding: the encoding of the current entry. This field is rather complex, so we will come back to it later;
  • entry data: the actual stored data.

prevlen

The prevlen field is variable length:

  • When the length of the previous entry is less than 254 bytes, prevlen takes one byte;
  • When the length of the previous entry is greater than or equal to 254 bytes, prevlen takes 5 bytes. The first byte is fixed at 254 (0xFE) as a marker for this case, and the following 4 bytes hold the length of the previous entry.
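The two cases can be sketched in a few lines of C. This is only an illustration, not the Redis code: the function names are hypothetical, and I assume the 4-byte length is stored in little-endian order, as the other integer fields of the ziplist are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREVLEN_BIG_MARKER 254  /* 0xFE: a 4-byte length follows */

/* Write prevlen at p and return the number of bytes used (1 or 5).
 * p must have room for 5 bytes. */
size_t prevlen_encode(unsigned char *p, uint32_t len) {
    assert(p != NULL);
    if (len < PREVLEN_BIG_MARKER) {
        p[0] = (unsigned char)len;          /* short form: one byte */
        return 1;
    }
    p[0] = PREVLEN_BIG_MARKER;              /* long form: marker + 4 bytes */
    p[1] = (unsigned char)(len & 0xFF);
    p[2] = (unsigned char)((len >> 8) & 0xFF);
    p[3] = (unsigned char)((len >> 16) & 0xFF);
    p[4] = (unsigned char)((len >> 24) & 0xFF);
    return 5;
}

/* Read prevlen from p into *len and return the number of bytes consumed. */
size_t prevlen_decode(const unsigned char *p, uint32_t *len) {
    assert(p != NULL && len != NULL);
    if (p[0] < PREVLEN_BIG_MARKER) {
        *len = p[0];
        return 1;
    }
    *len = (uint32_t)p[1] | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    return 5;
}
```

Note that a length of exactly 254 already needs the 5-byte form, because the byte value 254 is reserved as the marker.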

encoding

Next comes the encoding field. Before that, you may want to step onto the balcony for some fresh air, drink some hot water, take a deep breath and brace yourself, because this field is complicated. If you really can’t follow it, just skip this part.

To save space, Redis uses a rather complicated design for the encoding field. Redis uses encoding to determine the type of the stored data. Let’s look at how it does that:

  1. 00xxxxxx: a short string of at most 63 bytes; the remaining six bits store the length of the string in bytes;
  2. 01xxxxxx xxxxxxxx: a medium-length string; the remaining 14 bits store the length of the string;
  3. 10000000 aaaaaaaa bbbbbbbb cccccccc dddddddd: a large string whose length needs 4 extra bytes. The first byte starts with 10; its remaining 6 bits are unused and set to zero;
  4. 11000000: int16;
  5. 11010000: int32;
  6. 11100000: int64;
  7. 11110000: int24;
  8. 11111110: int8;
  9. 11111111: the end of the ziplist, that is, the zlend value 0xFF;
  10. 1111xxxx: a very small integer stored in the encoding byte itself. xxxx can only be 0001~1101 (1~13), and the stored value is xxxx minus 1, that is, 0~12.

In the tenth case the composition of the entry changes: there is no separate entry data field, only prevlen and encoding, because the value is already stored in the encoding byte.

It can be seen that Redis tells whether the stored data is a string (byte array) or an integer from the first two bits of the encoding field: 00, 01 and 10 mean a string, and 11 means an integer. For a string, the first two bits also determine how many bits are used to hold the string’s length; for an integer, the remaining bits determine its width.
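If the list of bit patterns is hard to digest, it may help to see it as a small classifier. The sketch below only looks at the first encoding byte and follows the ten cases listed in this article; the enum and function names are hypothetical and are not taken from the Redis source.

```c
#include <assert.h>

typedef enum {
    ENC_STR_6BIT,   /* 00xxxxxx                 */
    ENC_STR_14BIT,  /* 01xxxxxx xxxxxxxx        */
    ENC_STR_32BIT,  /* 10000000 + 4 length bytes */
    ENC_INT16,      /* 11000000 */
    ENC_INT32,      /* 11010000 */
    ENC_INT64,      /* 11100000 */
    ENC_INT24,      /* 11110000 */
    ENC_INT8,       /* 11111110 */
    ENC_INT_IMM,    /* 1111xxxx, xxxx in 0001..1101 */
    ENC_END,        /* 11111111 */
    ENC_INVALID     /* any other 11xxxxxx pattern */
} enc_kind;

/* Classify an entry by its first encoding byte. */
enc_kind classify_encoding(unsigned char b) {
    switch (b & 0xC0) {                /* the first two bits */
    case 0x00: return ENC_STR_6BIT;
    case 0x40: return ENC_STR_14BIT;
    case 0x80: return ENC_STR_32BIT;
    default:   break;                  /* 11: integer or end marker */
    }
    switch (b) {
    case 0xC0: return ENC_INT16;
    case 0xD0: return ENC_INT32;
    case 0xE0: return ENC_INT64;
    case 0xF0: return ENC_INT24;
    case 0xFE: return ENC_INT8;
    case 0xFF: return ENC_END;
    default:   break;
    }
    if (b >= 0xF1 && b <= 0xFD) return ENC_INT_IMM;
    return ENC_INVALID;
}

/* Value of a 4-bit immediate integer: the low four bits minus 1 (0..12). */
int immediate_value(unsigned char b) {
    assert(b >= 0xF1 && b <= 0xFD);
    return (b & 0x0F) - 1;
}
```

The order of the checks matters: the exact integer patterns and the end marker have to be tested before the 1111xxxx range, because they share the same prefix.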

The structure of entry

So much for entries. What comes next may surprise you. The source code defines a structure for an entry, and there is a very important comment above it:

/* We use this function to receive information about a ziplist entry.
 * Note that this is not how the data is actually encoded, is just what we
 * get filled by a function in order to operate more easily. */
typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;

Focus on the comment above. In a word: although this structure is defined, it is not how an entry is actually stored, because storing entries this way would take far too much memory. It is only filled in by a function when Redis needs to work with a decoded entry.

The storage form of ziplist

Unlike the SDS introduced in the last blog, Redis does not wrap ziplist in a structure. Instead, it defines a series of macros that operate on the raw data. In other words, a ziplist is just a contiguous block of bytes, and the ziplist layout and entry layout described above are only abstract concepts.

Why not always ziplist

Earlier in this article we showed by experiment that, once certain conditions are met, the underlying storage structure of Zset and hash is no longer ziplist. Since ziplist is so powerful and the Redis developers put so much effort into its design, why can’t Zset and hash use ziplist all the time?
Because ziplist is stored compactly with no spare space, every newly inserted element requires the memory to be expanded, which ends in one of two ways:

  • a new block of memory is allocated and the original data is copied into it;
  • the original block is extended in place.

Therefore, ziplist is not suitable for storing large strings or too many elements.
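The cost of inserting into a compact buffer can be sketched like this. It is a generic illustration with a hypothetical function name, not the ziplist insertion code (which also has to update the header and the prevlen of the following entry): realloc either extends the block in place or moves it, and then everything after the insertion point has to be shifted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Insert n bytes at byte offset `offset` into a tightly packed buffer of
 * *len bytes. Returns the (possibly moved) buffer and updates *len, or
 * returns NULL on allocation failure, leaving the original buffer valid. */
unsigned char *packed_insert(unsigned char *buf, size_t *len, size_t offset,
                             const unsigned char *data, size_t n) {
    assert(len != NULL && offset <= *len);
    /* No spare capacity: the block must grow on every insert. */
    unsigned char *grown = realloc(buf, *len + n);
    if (grown == NULL) return NULL;
    /* Shift the tail to open a gap, then copy the new bytes in. */
    memmove(grown + offset + n, grown + offset, *len - offset);
    memcpy(grown + offset, data, n);
    *len += n;
    return grown;
}
```

Both the possible copy inside realloc and the memmove grow with the size of the buffer, which is why this design only pays off while the list stays small.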

Ziplist storage boundary

So under what conditions do Zset and hash stop using ziplist as their underlying storage structure? The limits can be set in the configuration file:

hash-max-ziplist-entries 512 # if the number of hash elements exceeds 512, it must be stored in a standard structure
hash-max-ziplist-value 64 # if the length of the key / value of any element of the hash exceeds 64, it must be stored in a standard structure
zset-max-ziplist-entries 128 # if the number of elements of the Zset exceeds 128, it must be stored in a standard structure
zset-max-ziplist-value 64 # if the length of any element of the Zset exceeds 64, it must be stored in a standard structure
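Read as code, the rule is a simple pair of comparisons. The sketch below is my own reading of the comments above, with hypothetical type and function names; it is not how Redis structures the check internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical holder for one pair of limits from the configuration. */
typedef struct {
    size_t max_entries;  /* e.g. hash-max-ziplist-entries */
    size_t max_value;    /* e.g. hash-max-ziplist-value   */
} ziplist_limits;

/* Returns 1 if the collection can no longer stay in a ziplist,
 * that is, if either limit is exceeded; returns 0 otherwise. */
int must_convert(const ziplist_limits *lim, size_t entries,
                 size_t longest_value) {
    assert(lim != NULL);
    return entries > lim->max_entries || longest_value > lim->max_value;
}
```

With the default hash limits of 512 and 64, a hash with exactly 512 fields whose longest value is 64 bytes would still fit under this reading; one more field, or one more byte, and it has to be converted.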

As for this configuration, I’m just passing it along and haven’t experimented with it myself. After all, hardly anyone changes it. Interested readers can experiment with it.

What if there are too many ziplist elements

As described in the layout section, ziplist uses two bytes (16 bits) to record the number of entries. What if there are too many entries for two bytes to count? In that case zllen stays at its maximum value, 65535, and the real number of entries can only be obtained by traversing the whole ziplist.
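Here is a minimal sketch of that fallback. The function names are hypothetical, and the entry size calculation is my own simplification based on the prevlen and encoding rules described earlier; I also assume that multi-byte string lengths in the encoding field are stored big-endian, which is how I read the Redis source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZL_HEADER 10    /* zlbytes (4) + zltail (4) + zllen (2) */
#define ZL_END    0xFF  /* value of the zlend byte */

/* Total size in bytes of the entry at p: prevlen + encoding + data. */
size_t zl_entry_size(const unsigned char *p) {
    size_t prevlensize = (p[0] < 254) ? 1 : 5;
    const unsigned char *e = p + prevlensize;
    size_t lensize = 1, len = 0;

    switch (e[0] & 0xC0) {
    case 0x00: len = e[0] & 0x3F; break;
    case 0x40: lensize = 2;
               len = ((size_t)(e[0] & 0x3F) << 8) | e[1];
               break;
    case 0x80: lensize = 5;
               len = ((size_t)e[1] << 24) | ((size_t)e[2] << 16) |
                     ((size_t)e[3] << 8) | (size_t)e[4];
               break;
    default:   /* 11xxxxxx: integer, size depends on the exact byte */
        switch (e[0]) {
        case 0xC0: len = 2; break;   /* int16 */
        case 0xD0: len = 4; break;   /* int32 */
        case 0xE0: len = 8; break;   /* int64 */
        case 0xF0: len = 3; break;   /* int24 */
        case 0xFE: len = 1; break;   /* int8  */
        default:   len = 0; break;   /* 4-bit immediate, no data field */
        }
        break;
    }
    return prevlensize + lensize + len;
}

/* Number of entries: trust zllen unless it is saturated at 65535,
 * in which case walk the entries until the end byte. */
size_t zl_count(const unsigned char *zl) {
    assert(zl != NULL);
    uint16_t stored = (uint16_t)(zl[8] | (zl[9] << 8));
    if (stored < UINT16_MAX) return stored;

    size_t n = 0;
    const unsigned char *p = zl + ZL_HEADER;
    while (*p != ZL_END) {
        p += zl_entry_size(p);
        n++;
    }
    return n;
}
```

The common case is a constant-time read of the header; only a ziplist that has reached 65535 entries or more pays for the full walk.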

As you can see, Redis is not as simple as it looks. There is a lot to study, and it gets complicated. Before we dig in, we may feel that we have mastered Redis completely, but once we start learning, we find that what we knew was only superficial. The more you know, the more you realize you don’t know.

This blog ends here.
