All videos: https://segmentfault.com/a/11…
The name ziplist may be unfamiliar at first, but list, hash and zset certainly are not. Ziplist (compressed list) is one of the underlying implementations of these data structures:
- list: before Redis 3.2 it uses ziplist or linkedlist; from 3.2 onward it uses quicklist
- hash: uses ziplist when the amount of data is small, and hashtable (dict) when it is large
- zset: uses ziplist when the amount of data is small, and skiplist when it is large
As we can see, list, hash and sorted set (zset) all use ziplist when they hold only a small amount of data, and convert to a more complex structure as the data grows. From this we can guess that ziplist is a lightweight, simple and memory-efficient data structure that solves the problem of storing small amounts of data in Redis.
The structure of ziplist
Redis's design mostly trades time for space. Because Redis keeps its data in memory, and memory is a precious resource, saving space usually pays off more than saving time. When learning data structures and algorithms we often compare arrays with linked lists: an array occupies one contiguous block of memory, whereas every linked-list node carries a pointer field next to its data field, so the space utilization of an array is clearly higher than that of a linked list. With this in mind, if we were to design ziplist ourselves, what would we need?
- A continuous memory space is needed to store real data
- Some additional information fields are needed to record its length, end flag, total amount of data and other auxiliary information
Ziplist stores its data in the following layout. Does it match your expectations?
zlbytes | zltail | zllen | entry | entry | ... | entry | zlend
The meaning of each field is as follows:
- zlbytes: 4 bytes. Records the total number of bytes occupied by the compressed list; used when the list's memory is reallocated
- zltail: 4 bytes. Records the offset of the last entry from the start of the compressed list, so the tail can be located quickly
- zllen: 2 bytes. Records the number of entries
- entry: holds the actual data
- zlend: 1 byte. Marks the end of the compressed list with the hexadecimal value 0xFF
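The header fields above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_header` is made up for illustration, and the sketch assumes the header integers are stored little-endian, as Redis does in ziplist.c. The sample buffer is an empty ziplist, where zltail simply points just past the 10-byte header.

```python
import struct

# zlbytes (uint32), zltail (uint32), zllen (uint16), little-endian, no padding.
ZIPLIST_HEADER = struct.Struct("<IIH")
ZIP_END = 0xFF


def parse_header(buf: bytes) -> dict:
    """Read the fixed-size ziplist header and sanity-check the buffer."""
    zlbytes, zltail, zllen = ZIPLIST_HEADER.unpack_from(buf, 0)
    if zlbytes != len(buf):
        raise ValueError("zlbytes does not match the buffer size")
    if buf[-1] != ZIP_END:
        raise ValueError("missing 0xFF end marker")
    return {"zlbytes": zlbytes, "zltail": zltail, "zllen": zllen}


# An empty ziplist: the 10-byte header followed directly by zlend.
empty = ZIPLIST_HEADER.pack(11, 10, 0) + bytes([ZIP_END])
header = parse_header(empty)
```

Even with no entries the structure costs 11 bytes, which is the fixed overhead every ziplist pays for its bookkeeping fields.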
The actual data is stored in the entries. A ziplist entry can hold two kinds of data:
- String (byte array)
- Integer
For example, when a zset holds only a small amount of data, each element's member name and score are stored in consecutive ziplist entries, ordered by score from smallest to largest.
So the question is: when we read an entry, how do we know whether to read it as a string or as an integer? We need to know the type of the data stored in the current entry, which calls for an encoding field that identifies it.
In addition, when we search for an element, we need to traverse it. How do we traverse it in ziplist? Think back to the way we traverse an array:
Traversing an ordinary array relies on the data type of its elements to find the next element. Take an array of int (the array name also acts as a pointer): to reach the next element we just advance the pointer by the size of the element type. With subscripts, going from p[0] to p[1] advances the address by 1 * sizeof(int) bytes; with pointers, p + 1 does exactly the same.
In ziplist, however, the entries have neither a uniform type nor a uniform length, so how do we move the pointer? A dedicated field takes care of this: previous_entry_length, which records the length of the previous entry and makes it possible to traverse the compressed list.
Finally, the most important field is content, which stores real data
Putting these together, each entry consists of three fields:
- previous_entry_length: records the length of the previous entry in the compressed list. Occupies 1 or 5 bytes
- encoding: records the type and length of the data stored in the current entry. Occupies 1, 2, or 5 bytes
- content: where the actual data is stored
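To make the three fields concrete, here is a small sketch that assembles entries by hand. The helper names are hypothetical, and the sketch only covers the simplest string form: a string of at most 63 bytes, whose encoding field is a single byte holding the length. It assumes the Redis rule that previous_entry_length takes 1 byte for lengths below 254, and otherwise the flag byte 0xFE followed by a 4-byte little-endian length.

```python
import struct

ZIP_BIG_PREVLEN = 0xFE  # flag byte that introduces the 5-byte form


def encode_prevlen(prev_len: int) -> bytes:
    """Encode previous_entry_length in its 1-byte or 5-byte form."""
    if prev_len < ZIP_BIG_PREVLEN:
        return bytes([prev_len])
    return bytes([ZIP_BIG_PREVLEN]) + struct.pack("<I", prev_len)


def make_short_string_entry(prev_len: int, data: bytes) -> bytes:
    """Build previous_entry_length + encoding + content for a short string."""
    if len(data) > 63:
        raise ValueError("this sketch only covers the 1-byte string encoding")
    encoding = bytes([len(data)])  # top two bits 00, low six bits = length
    return encode_prevlen(prev_len) + encoding + data


first = make_short_string_entry(0, b"a")             # first entry: no predecessor
second = make_short_string_entry(len(first), b"bc")  # remembers the size of `first`
```

Each entry carries only one or two bytes of overhead here, compared with the pointers a linked-list node would need.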
Traversal of ziplist
Traversal is the basis of the search operation, so it is the key to understanding any data structure.
Forward traversal of ziplist: first, the pointer p sits at the beginning of the first entry, that is, at its previous_entry_length field. Because previous_entry_length may take up 1 byte or 5 bytes, we need a way to tell the two cases apart. The rule is as follows:
- If the length of the previous entry is less than 254 bytes, previous_entry_length takes 1 byte
- If the length of the previous entry is greater than or equal to 254 bytes, previous_entry_length takes 5 bytes: the first byte is the fixed flag 0xFE (254), and the remaining four bytes hold the length of the previous entry
- Since the current pointer has type unsigned char * (see the source code), p + 1 moves it by exactly 1 byte (8 bits). So we only need to read the first byte at the pointer, p[0]. If its value lies in the binary range 00000000 ~ 11111101 (0 ~ 253), previous_entry_length takes a single byte and p + 1 is the address of the encoding field. If its value is 11111110 (254, i.e. 0xFE), previous_entry_length takes five bytes and p + 5 is the address of the encoding field. The value 11111111 (255) never appears here, because 0xFF is reserved for zlend.
- Now the pointer is at the start of the encoding field. How does this field store both the data type and the length? To save space, strings (byte arrays) and integers are told apart by the top two bits of its first byte: 00, 01 and 10 mark a string whose length is stored in 6, 14 and 32 bits respectively, and 11 marks an integer.
- From this scheme we can see that reading the top two bits of the byte at the current pointer is enough to know the length of the encoding field (00 and 11: 1 byte; 01: 2 bytes; 10: 5 bytes). Accordingly, p + 1, p + 2 or p + 5 moves the pointer to the content field.
- The encoding field also tells us how long the content is (for example, 2 bytes for an int16), so advancing the pointer by that length takes us past the content field, to the start of the next entry. Every following entry is handled the same way, and this completes the forward traversal of ziplist.
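The forward walk described in these steps can be sketched in Python as follows. This is an illustration rather than the Redis implementation: the function names are hypothetical, the byte values of the integer encodings (0xC0, 0xD0, 0xE0, 0xF0, 0xFE and the 4-bit immediates 0xF1 to 0xFD) are taken from the comments in ziplist.c, and the sample buffer is built by hand.

```python
import struct

ZIP_END = 0xFF
HEADER_SIZE = 10  # zlbytes (4) + zltail (4) + zllen (2)

# Integer encodings: encoding byte -> number of content bytes.
INT_SIZES = {0xC0: 2, 0xD0: 4, 0xE0: 8, 0xF0: 3, 0xFE: 1}


def decode_prevlen(buf, p):
    """Return (length of previous entry, size of the prevlen field)."""
    if buf[p] < 0xFE:
        return buf[p], 1
    return struct.unpack_from("<I", buf, p + 1)[0], 5


def decode_encoding(buf, p):
    """Return (kind, size of the encoding field, size of the content)."""
    b = buf[p]
    top = b >> 6  # the top two bits select the form
    if top == 0b00:
        return "str", 1, b & 0x3F
    if top == 0b01:
        return "str", 2, ((b & 0x3F) << 8) | buf[p + 1]
    if top == 0b10:
        return "str", 5, struct.unpack_from(">I", buf, p + 1)[0]
    if b in INT_SIZES:
        return "int", 1, INT_SIZES[b]
    return "imm", 1, 0  # 0xF1..0xFD: small integer stored in the encoding byte


def walk_forward(buf):
    """Yield the value of every entry, from the first entry up to zlend."""
    p = HEADER_SIZE
    while buf[p] != ZIP_END:
        _, prev_size = decode_prevlen(buf, p)
        p += prev_size                      # now at the encoding field
        kind, enc_size, length = decode_encoding(buf, p)
        enc_byte = buf[p]
        p += enc_size                       # now at the content field
        content = buf[p:p + length]
        if kind == "str":
            yield bytes(content)
        elif kind == "int":
            yield int.from_bytes(content, "little", signed=True)
        else:
            yield (enc_byte & 0x0F) - 1
        p += length                         # now at the next entry, or at zlend


# Hand-built ziplist holding "a", 1, "bc", 300 (for example two zset members
# with their scores).
entries = bytes.fromhex("000161" "03f2" "02026263" "04c02c01")
sample = struct.pack("<IIH", 24, 19, 4) + entries + bytes([ZIP_END])
values = list(walk_forward(sample))
```

Note that the walk never consults zllen: the 0xFF end marker alone tells it when to stop.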
Backward traversal relies on the previous_entry_length field. We first read the zltail field in the header, which gives us the position of the last entry, and then repeatedly subtract the value of the current entry's previous_entry_length from the pointer, which moves it back one entry at a time until it reaches the head of the ziplist. The principle is simple, so I won't go into more detail. This is also why ziplist is well suited to traversal from back to front.
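A minimal sketch of this backward walk, under the same assumptions as before: the function names are hypothetical, the header integers are little-endian, and the sample buffer is built by hand. The walk stops at the entry whose previous_entry_length is 0, which is the first entry.

```python
import struct

ZIP_END = 0xFF


def decode_prevlen(buf, p):
    """Return the length of the entry that precedes the entry at offset p."""
    if buf[p] < 0xFE:
        return buf[p]
    return struct.unpack_from("<I", buf, p + 1)[0]


def walk_backward(buf):
    """Yield (offset, raw entry bytes) from the tail entry back to the head."""
    zltail = struct.unpack_from("<I", buf, 4)[0]  # second header field
    p, end = zltail, len(buf) - 1                 # the tail entry ends at zlend
    if buf[p] == ZIP_END:                         # empty ziplist
        return
    while True:
        prev_len = decode_prevlen(buf, p)
        yield p, bytes(buf[p:end])
        if prev_len == 0:                         # the first entry has no predecessor
            return
        p, end = p - prev_len, p                  # step back by one entry


# Hand-built ziplist with four entries: "a", 1, "bc", 300.
entries = bytes.fromhex("000161" "03f2" "02026263" "04c02c01")
sample = struct.pack("<IIH", 24, 19, 4) + entries + bytes([ZIP_END])
visited = list(walk_backward(sample))
```

Going backward, a single subtraction reaches the previous entry, whereas the forward walk has to decode both previous_entry_length and encoding before it can step.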
The root cause of Redis encoding conversion
- Ziplist borrows the idea of an array, while skiplist borrows the idea of a linked list. In a ziplist, a lookup has to traverse the entries, and an insertion or deletion has to shift all the following entries backward or forward, so every one of these operations is O(n). Compared with dict + skiplist, where lookup is O(1) and insertion and deletion are O(log n), ziplist has no advantage in speed. However, as mentioned earlier, Redis generally trades time for space. A skiplist has to maintain a large number of pointers and repeats its nodes across multiple levels (it takes about 2n space, see the previous note for details), so in terms of space ziplist wins outright.
- Still, ziplist's time complexity is a real weakness, so the goal is to keep that weakness contained while making the most of its strengths. The Redis developers weighed this carefully. Take zset as an example: ziplist encoding is used only when both of the following conditions hold (these are the default thresholds); otherwise skiplist encoding is used
- The zset stores no more than 128 elements
- Every member stored in the zset is no longer than 64 bytes
- Because ziplist then only ever handles a small amount of data, its O(n) operations stay cheap in practice. This keeps the cost of its time complexity to a minimum while making full use of its space savings, and that is the fundamental reason encoding conversion exists.
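The two conditions can be written down as a tiny decision function. This only mirrors the rule as stated above; it is not how Redis structures the check internally. The function name is hypothetical, and the constants are named after the zset-max-ziplist-entries and zset-max-ziplist-value settings in redis.conf, using their default values.

```python
ZSET_MAX_ZIPLIST_ENTRIES = 128  # default of zset-max-ziplist-entries
ZSET_MAX_ZIPLIST_VALUE = 64     # default of zset-max-ziplist-value, in bytes


def choose_zset_encoding(members):
    """Pick the encoding for a zset holding the given member names."""
    if len(members) > ZSET_MAX_ZIPLIST_ENTRIES:
        return "skiplist"
    if any(len(m) > ZSET_MAX_ZIPLIST_VALUE for m in members):
        return "skiplist"
    return "ziplist"


small = choose_zset_encoding([b"alice", b"bob"])
too_many = choose_zset_encoding([b"m%d" % i for i in range(129)])
too_long = choose_zset_encoding([b"x" * 65])
```

Either condition failing is enough to push the whole zset to skiplist, which keeps every ziplist small enough for its O(n) operations to stay cheap.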
That covers the key points of ziplist. Readers interested in the source code for adding, deleting and modifying entries can study ziplist.c in depth; there is little value in copying and pasting it into this article. While studying I read a lot of material of uneven quality. What I want to say is that when we learn something new, we should not only know what it looks like, but also ask why it is designed this way, why the alternatives were not chosen, and what its comparative advantages are, rather than simply piling up concepts. Learning without thinking things through for yourself has little effect.