Introduction: This article collates and explains the material on data structures and objects in Redis Design and Implementation (Second Edition). It covers only one kind of object, the string object, and introduces its two string encodings, raw and embstr, in detail. It also includes some of my own thoughts and views, and I hope more readers will join in to discuss, share, and exchange ideas.
Yunzhe technology – database team
String objects can store integers, floating-point numbers, and strings. The specific strategies are:
When storing an integer, the encoding used is int, and the value is stored directly as a long integer;
When storing a string, if the length of the string is less than or equal to 32 bytes, it is stored with the embstr encoding; if the length is greater than 32 bytes, it is stored as an SDS with the raw encoding (32 bytes is the figure used in the book; the exact limit depends on the Redis version, for example 39 bytes in Redis 3.0 and 44 bytes since Redis 3.2);
When storing a floating-point number, the number is first converted into a string. If the length of the converted string is less than or equal to 32 bytes, it is stored with the embstr encoding; otherwise, it is stored as an SDS with the raw encoding.
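The selection rules above can be summarized in a short sketch. This is a simplified Python model, not Redis source code: `choose_encoding`, `is_long_integer`, and the constants are hypothetical names, the 32-byte limit is the figure quoted in this article, and the integer check only approximates how Redis parses numbers.

```python
# Simplified model of how a string object's encoding is chosen.
# Names and the 32-byte limit are assumptions taken from the article text.
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1
EMBSTR_LIMIT = 32  # bytes


def is_long_integer(raw: bytes) -> bool:
    """True if raw is the canonical decimal form of a value in the 64-bit long range."""
    try:
        text = raw.decode("ascii")
        number = int(text)
    except ValueError:  # also covers UnicodeDecodeError
        return False
    # The round-trip check rejects forms such as "+1", " 1", "01" or "1_000".
    return str(number) == text and LONG_MIN <= number <= LONG_MAX


def choose_encoding(raw: bytes) -> str:
    if is_long_integer(raw):
        return "int"
    if len(raw) <= EMBSTR_LIMIT:
        return "embstr"
    return "raw"


print(choose_encoding(b"100"))      # int
print(choose_encoding(b"3.14"))     # embstr (a float is kept as a short string)
print(choose_encoding(b"x" * 33))   # raw
```

On a real server the result should always be confirmed with OBJECT ENCODING, since the limit differs between versions.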
The following figure shows the structure of a string object. The object structure (redisObject) is on the left, and the raw-encoded SDS data structure (sdshdr) is in the middle and on the right. An example is shown in the figure:
Raw encoding, simple dynamic string SDS
Redis does not use traditional C strings directly. Instead, it builds its own type, the simple dynamic string (SDS).
When Redis prints log information or outputs error messages, the output is a string literal that will never be modified, so a traditional C string is used to hold it. When Redis needs to store a string value that can be modified, it uses the SDS structure.
In addition to storing string values in the database, SDS is also used as a buffer: the AOF buffer in the AOF module and the input buffer in the client state are all implemented by SDS.
Structure of SDS
The SDS structure diagram is as follows:
sdshdr is the name of the data structure that represents an SDS, where:
The buf attribute is a byte array used to save the string. The arrow in the figure points to the string content actually saved, which ends with the null character '\0';
The len attribute records the actual number of bytes used in the buf array, which is equal to the length of the string saved in the SDS;
The free attribute records the number of unused bytes in the buf array.
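As a rough illustration of the three fields, here is a small Python model of the header. It is only a sketch: the real sdshdr is a C struct, and the names `Sds` and `sds_new` are hypothetical. It mirrors the len/free/buf layout described in the book; newer Redis versions use a different header layout.

```python
from dataclasses import dataclass


@dataclass
class Sds:
    len: int        # bytes of buf in use (the string length)
    free: int       # unused bytes available in buf
    buf: bytearray  # string bytes, then one NUL terminator, then free space


def sds_new(init: bytes, free: int = 0) -> Sds:
    """Build an Sds holding init, with `free` spare bytes after the terminator."""
    buf = bytearray(init) + b"\x00" + bytes(free)
    return Sds(len=len(init), free=free, buf=buf)


s = sds_new(b"Redis", free=5)
print(s.len, s.free, len(s.buf))  # 5 5 11
# The terminator is counted in neither len nor free.
assert len(s.buf) == s.len + s.free + 1
assert s.buf[s.len] == 0
```

The extra byte for the terminator is what keeps the stored text readable by ordinary C string functions.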
1. The string length can be obtained in O(1) time
The len attribute of an SDS records the length of the string, whereas a traditional C string must be traversed from start to end to find its length. Compared with a traditional C string, the complexity of obtaining the string length in Redis is therefore reduced from O(N) to O(1).
Even if you repeatedly execute the STRLEN command (which returns the string length) on a very long string, it will not noticeably affect performance.
2. Eliminate buffer overflows
With a traditional C string, if you modify the content so that the new string is longer than the space originally allocated, and you forget to allocate more memory first, a buffer overflow occurs. See the following figure for details:
With SDS, when the content stored in the buf byte array needs to be modified (added to or removed from), the API first checks whether the SDS has enough space by looking at the free and len attributes. If not, the API automatically expands the SDS and only then performs the modification. See "space preallocation" below for the strategy used when expanding space automatically.
3. Reduce the number of memory reallocations required to modify the string length
For traditional C strings:
If the operation lengthens the string, such as append, the program must first expand the underlying array through memory reallocation before performing the operation; otherwise, a buffer overflow will occur.
If the operation shortens the string, such as trim, the program must afterwards release the space the string no longer uses through memory reallocation; otherwise, a memory leak will occur.
For the SDS structure in Redis:
Memory reallocation involves complex algorithms and is usually a time-consuming operation. Redis is a database with strict speed requirements whose data is modified frequently. If every string modification required a memory reallocation, performance would be seriously affected.
With SDS, the buf array can contain unused bytes, and the number of these bytes is recorded by the free attribute. This reduces the number of memory reallocations required to modify the string length.
Space preallocation and lazy space release
Using the unused space tracked by the free attribute, SDS implements two optimization strategies: space preallocation and lazy space release.
1. Space preallocation strategy: reduces the memory reallocations caused by string growth operations
When the content of the SDS needs to be modified and the space needs to be expanded, the program will not only allocate the necessary space for the SDS modification, but also allocate additional unused space for the SDS.
The amount of extra unused space allocated is determined by the following formula:
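If the length of the SDS (i.e. the value of the len attribute) will be less than 1MB after the modification, the program allocates unused space of the same size as len, so the len attribute and the free attribute have the same value. For example, if len becomes 13 bytes after the modification, the program also allocates 13 bytes of unused space, and the actual length of the buf array becomes 13 + 13 + 1 = 27 bytes (the extra byte holds the null terminator).
If the length of the SDS will be greater than or equal to 1MB after the modification, the program allocates 1MB of unused space. For example, if len becomes 30MB, the actual length of the buf array becomes 30MB + 1MB + 1 byte.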
Suppose content is appended continuously to the end of a string. Once the overall size of the string is 1MB or more, appending even a single byte makes the program allocate an additional 1MB of space. When another byte is appended, the program does not allocate another 1MB but uses the existing free space.
That is, before expanding the space, the SDS API checks whether the unused space is sufficient. If it is, no expansion takes place.
With the space preallocation strategy, SDS reduces the number of memory reallocations required to grow a string N times in a row from exactly N to at most N.
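To make the effect of preallocation concrete, the following sketch models the growth rule described above and counts reallocations for a run of appends. It is a simplified Python simulation, not the Redis implementation; `free_after_growth` and `count_reallocs` are hypothetical names, and only the bookkeeping (lengths and free space) is tracked, not real memory.

```python
MB = 1024 * 1024


def free_after_growth(new_len: int) -> int:
    """Unused bytes reserved when buf has to grow to hold new_len bytes."""
    return new_len if new_len < MB else MB


def count_reallocs(appends: int, chunk: int, prealloc: bool) -> int:
    """Append `chunk` bytes `appends` times to an empty string; count reallocations."""
    length, free, reallocs = 0, 0, 0
    for _ in range(appends):
        if free < chunk:
            # Not enough room: reallocate, optionally reserving extra space.
            new_len = length + chunk
            free = free_after_growth(new_len) if prealloc else 0
            reallocs += 1
        else:
            # Enough room: reuse the existing free space, no reallocation.
            free -= chunk
        length += chunk
    return reallocs


print(free_after_growth(13))                     # 13, so buf is 13 + 13 + 1 = 27 bytes
print(count_reallocs(1000, 1, prealloc=False))   # 1000
print(count_reallocs(1000, 1, prealloc=True))    # 9
```

In this model, appending one byte a thousand times reallocates only when the length reaches 1, 3, 7, 15 and so on, which is why the count drops from 1000 to 9.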
2. Lazy space release strategy: reduces the memory reallocations caused by string shortening operations
When the string in an SDS is shortened, the program does not immediately use memory reallocation to reclaim the bytes that are no longer needed. Instead, it records the number of these bytes in the free attribute and keeps them for future use.
Of course, SDS also provides corresponding API functions that really release this unused space when needed, so lazy release does not lead to wasted memory.
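A minimal sketch of lazy release follows, again tracking only the bookkeeping. The class and method names are hypothetical; in the Redis source the explicit release is, to my knowledge, done by a function named sdsRemoveFreeSpace, but the code below does not claim to reproduce it.

```python
class LazySds:
    """Tracks only len/free and a reallocation counter, not the bytes themselves."""

    def __init__(self, length: int) -> None:
        self.len = length
        self.free = 0
        self.reallocs = 0

    def shorten(self, new_len: int) -> None:
        """Shrink the string; the released bytes are kept as free space."""
        if not 0 <= new_len <= self.len:
            raise ValueError("new_len must be between 0 and the current length")
        self.free += self.len - new_len
        self.len = new_len

    def remove_free_space(self) -> None:
        """Explicitly give the unused bytes back; this is the step that reallocates."""
        if self.free:
            self.free = 0
            self.reallocs += 1


s = LazySds(11)
s.shorten(5)
print(s.len, s.free, s.reallocs)   # 5 6 0
s.remove_free_space()
print(s.len, s.free, s.reallocs)   # 5 0 1
```

If the string grows again before the free space is removed, the saved bytes can be reused without any reallocation at all.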
4. Binary safety
The characters in a C string must conform to some encoding (such as ASCII), and apart from the end of the string, the string cannot contain null characters. If it does, the first null character the program reads is mistaken for the end of the string. These restrictions mean that a C string can only hold text data, not binary data such as pictures, audio, video, or compressed files.
To make sure that Redis can be used in many different scenarios, the SDS APIs are binary safe. All SDS APIs treat the data stored in the buf array as binary data. The program places no restrictions, filters, or assumptions on the data: it is read back exactly as it was written.
This is why the buf attribute of SDS is called a byte array: Redis does not use this array to save characters, but to save a series of binary data.
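The difference can be seen with a value that contains an embedded null byte. The sketch below is a Python illustration, not C: `c_strlen` is a hypothetical helper that imitates how a C string function scans for the terminator, and `sds_len` stands in for the len attribute.

```python
def c_strlen(buf: bytes) -> int:
    """Length as a C string function sees it: scan until the first NUL byte."""
    n = 0
    while n < len(buf) and buf[n] != 0:
        n += 1
    return n


payload = b"Redis\x00Cluster"      # binary data with an embedded NUL
buf = payload + b"\x00"            # SDS still appends a terminator
sds_len = len(payload)             # what the len attribute would record

print(c_strlen(buf))               # 5: a C string reader stops after "Redis"
print(sds_len)                     # 13: SDS knows the full length
print(buf[:sds_len] == payload)    # True: read back exactly what was written
```

The scan in `c_strlen` is also the O(N) traversal mentioned earlier, while reading the recorded length is a single lookup.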
5. Compatible with some C string functions
SDS follows the C convention of ending a string with a null character. The advantage is that some functions from the C string library can be reused directly on an SDS that holds text, which avoids unnecessary code duplication.
If a string object stores a string whose length is less than or equal to 32 bytes, the embstr encoding is used. The embstr encoding is an optimized encoding designed specifically for short strings. Both embstr-encoded and raw-encoded string objects are composed of a redisObject structure and an sdshdr structure.
The difference is that creating a raw-encoded string object calls the memory allocation function twice, once for the redisObject structure and once for the sdshdr structure, whereas creating an embstr-encoded string object calls the memory allocation function only once to obtain one contiguous block of memory that holds the redisObject and sdshdr structures one after the other. The structure of an embstr-encoded string object is as follows:
The difference between the two
When an embstr encoded string object executes a command, it produces the same effect as when a raw encoded string object executes a command. However, using an embstr encoded string object to save short string values has the following advantages:
1. The embstr encoding reduces the number of memory allocations required to create a string object from two (for the raw encoding) to one;
2. To release an embstr encoded string object, you only need to call the memory release function once, while to release a raw encoded string object, you need to call the memory release function twice;
3. All the data of an embstr-encoded string object is stored in one contiguous block of memory, so the structure is more compact. With the raw encoding the data is scattered: the redisObject structure and the sdshdr structure are linked by a pointer. An embstr-encoded object can therefore make better use of the CPU cache than a raw-encoded one.
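The allocation counts in points 1 and 2 can be illustrated with a toy allocator. This is a sketch under stated assumptions: `CountingAllocator`, `create_raw`, `create_embstr`, and `release` are hypothetical names, and the two header sizes are placeholder values chosen for illustration only.

```python
ROBJ_SIZE = 16    # assumed size of the redisObject header, for illustration
SDSHDR_SIZE = 8   # assumed size of the len and free fields, for illustration


class CountingAllocator:
    """Stand-in for malloc/free that only counts calls."""

    def __init__(self) -> None:
        self.mallocs = 0
        self.frees = 0

    def malloc(self, size: int) -> bytearray:
        self.mallocs += 1
        return bytearray(size)

    def free(self, block: bytearray) -> None:
        self.frees += 1


def create_raw(alloc: CountingAllocator, value: bytes) -> list:
    robj = alloc.malloc(ROBJ_SIZE)                     # object header
    sds = alloc.malloc(SDSHDR_SIZE + len(value) + 1)   # separate SDS block
    return [robj, sds]


def create_embstr(alloc: CountingAllocator, value: bytes) -> list:
    # Header, SDS header, data and terminator live in one contiguous block.
    return [alloc.malloc(ROBJ_SIZE + SDSHDR_SIZE + len(value) + 1)]


def release(alloc: CountingAllocator, blocks: list) -> None:
    for block in blocks:
        alloc.free(block)


raw_alloc, emb_alloc = CountingAllocator(), CountingAllocator()
release(raw_alloc, create_raw(raw_alloc, b"hello"))
release(emb_alloc, create_embstr(emb_alloc, b"hello"))
print(raw_alloc.mallocs, raw_alloc.frees)   # 2 2
print(emb_alloc.mallocs, emb_alloc.frees)   # 1 1
```

The single contiguous block in `create_embstr` is also what gives the cache-locality advantage described in point 3.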
Int-encoded and embstr-encoded string objects are converted to raw-encoded string objects when certain conditions are met. The OBJECT ENCODING command shows the underlying encoding of the value stored at a key.
Int to raw
For an int-encoded string object, if we execute a command on the object that makes it store a string value instead of an integer value, the encoding of the string object changes from int to raw.
127.0.0.1:6379> set a 100 // set a = 100
OK
127.0.0.1:6379> object encoding a // view the encoding of the value stored in key a
"int"
127.0.0.1:6379> append a 'a' // append 'a' to the value of key a; the stored value now becomes a string
(integer) 4
127.0.0.1:6379> get a // query the value of key a
"100a"
127.0.0.1:6379> object encoding a // the encoding has changed to raw, indicating that the stored value is a string
"raw"
An int-encoded string object stores an integer of type long, ranging from -2^63 to 2^63-1. When the stored integer is within this range, it is encoded as int. When the value is outside this range, it is stored as a string and encoded as embstr.
127.0.0.1:6379> set number1 9223372036854775807
OK
127.0.0.1:6379> object encoding number1
"int"
127.0.0.1:6379> set number2 9223372036854775808
OK
127.0.0.1:6379> object encoding number2
"embstr"
127.0.0.1:6379> set number3 -9223372036854775808
OK
127.0.0.1:6379> object encoding number3
"int"
127.0.0.1:6379> set number4 -9223372036854775809
OK
127.0.0.1:6379> object encoding number4
"embstr"
Embstr to raw
Embstr-encoded string objects cannot be modified in place: Redis does not implement any modification routines for embstr-encoded string objects, only for int-encoded and raw-encoded ones. Embstr-encoded strings are therefore effectively read-only.
When any modification command is executed on an embstr encoded string object, the program will first convert the encoding of the object from embstr to raw, and then execute the modification command. So once the embstr encoded string is modified, its data structure will change to raw encoded format.
127.0.0.1:6379> set a 'ab'
OK
127.0.0.1:6379> object encoding a
"embstr"
127.0.0.1:6379> append a 'c'
(integer) 3
127.0.0.1:6379> get a
"abc"
127.0.0.1:6379> object encoding a
"raw"
The above is a partial summary of the data structure and object material in Redis Design and Implementation (Second Edition). You are welcome to join the discussion.
Yunzhe technology, an enterprise focusing on cloud hosting (MSP) services