Read the source code with Dabin – redis 7 – simple dynamic string of object coding

Time:2021-1-14

Redis does not directly use the traditional C string representation (a character array terminated by a null character). Instead, it builds its own string type called the simple dynamic string (SDS) and uses SDS as its default string representation.

In Redis, C strings are used only as string literals in places where the string never needs to be modified, such as printing a log message:

serverLog(LL_WARNING,"SIGTERM received but errors trying to shut down the server, check the logs for more information");

When Redis needs more than a string literal, that is, a string value that can be modified, it uses SDS to represent the string. For example, in the database, key-value pairs containing string values are implemented with SDS underneath.

Take the simple SET command as an example. Executing the following command:

redis> SET msg "hello world"
OK

Redis then creates a new key-value pair in the database, where:

  • The key of the key-value pair is a string object, and the underlying implementation of the object is an SDS that stores the string "msg".
  • The value of the key-value pair is also a string object, and the underlying implementation of the object is an SDS that stores the string "hello world".

In addition to storing string values in the database, SDS is also used as a buffer: the AOF buffer in the AOF module and the input buffer in the client state are both implemented with SDS.

Next, let’s take a closer look at SDS.

1 definition of SDS

In sds.h, we find the following definition:

typedef char *sds;

As you can see, sds is equivalent to the char * type. This is because SDS needs to be compatible with traditional C strings, so its type is set to char *. However, note that an SDS is not the same as a plain char *: it also includes a header structure. There are five types of headers in total. The source code is as follows:

struct __attribute__ ((__packed__)) sdshdr5 { /* never used directly; documents the type 5 layout */
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 { /* strings shorter than 2^8 bytes */
    uint8_t len; /* length of the string stored in the SDS */
    uint8_t alloc; /* allocated capacity, excluding the header and null terminator */
    unsigned char flags; /* 1 byte: the lower 3 bits store the SDS type, the upper 5 bits are unused */
    char buf[]; /* the actual string data */
};
struct __attribute__ ((__packed__)) sdshdr16 { /* strings shorter than 2^16 bytes */
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 { /* strings shorter than 2^32 bytes */
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 { /* strings shorter than 2^64 bytes */
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};

There are five header types so that strings of different lengths can each use a header of matching size, which improves memory utilization.
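To make the size trade-off concrete, here is a minimal sketch of how a header type could be chosen from a string's length. The names pick_hdr_type, hdr_size and the HDR_* tags are hypothetical and exist only for this illustration. The thresholds simply follow the field widths in the structures above, and the 5-bit case assumes the length is packed into the flags byte as the sdshdr5 comment describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch; the values are illustrative. */
enum hdr_type { HDR_5, HDR_8, HDR_16, HDR_32, HDR_64 };

/* Pick the smallest header whose length field can hold `len`. */
enum hdr_type pick_hdr_type(uint64_t len) {
    if (len < (1u << 5))    return HDR_5;   /* fits in 5 bits of flags */
    if (len < (1u << 8))    return HDR_8;
    if (len < (1u << 16))   return HDR_16;
    if (len < (1ull << 32)) return HDR_32;
    return HDR_64;
}

/* Header overhead in bytes for each type: len + alloc + flags. */
size_t hdr_size(enum hdr_type t) {
    switch (t) {
    case HDR_5:  return 1;          /* flags only */
    case HDR_8:  return 1 + 1 + 1;
    case HDR_16: return 2 + 2 + 1;
    case HDR_32: return 4 + 4 + 1;
    default:
        assert(t == HDR_64);
        return 8 + 8 + 1;
    }
}
```

Under these assumptions, a 100-byte string carries 3 bytes of header, whereas always using the 64-bit layout would cost 17 bytes per string.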

The complete structure of an SDS consists of two parts that are adjacent in memory:

  • Header: contains the length of the string (len), the maximum capacity (alloc) and flags (sdshdr5 has only flags).
  • buf: a character array. Its length is equal to the maximum capacity plus 1, and it stores the actual string data.

Figure 1-1 shows an example of an SDS:


In the example, the fields are described as follows:

  • alloc: the space allocated for the SDS. In the figure, the allocated space is 10 bytes.
  • len: the size of the string stored in the SDS. The figure shows a string of 5 bytes.
  • buf: the length of this array is equal to the maximum capacity plus 1, and it stores the actual string data. In the figure, the first five bytes of the array store the five characters 'H', 'e', 'l', 'l' and 'o', followed by the null character '\0'.

SDS follows the convention that a C string ends with a null character, and the null character is not counted in the len attribute of the SDS. Operations such as appending the null character to the end of the string are performed automatically by the SDS functions (in sds.c).

Moreover, because SDS follows the null-terminator convention, some functions from the C string library can be reused directly.

For example, since an sds value s is a pointer to buf, we can pass it directly to the printf() function:
printf("%s", s);
In this way, we can print the string stored in the SDS with a standard C function, without writing a dedicated print function for SDS.
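The layout described above can be sketched with a toy type modeled on sdshdr8. This is an illustration under stated assumptions rather than the Redis implementation: toy_hdr8, toy_sds, toy_new, toy_len and toy_free are hypothetical names, only the 8-bit header is handled, and the packed attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy header modeled on sdshdr8; the name toy_hdr8 is hypothetical. */
struct __attribute__((__packed__)) toy_hdr8 {
    uint8_t len;          /* bytes in use, excluding the terminator */
    uint8_t alloc;        /* capacity, excluding header and terminator */
    unsigned char flags;  /* unused in this sketch */
    char buf[];
};

typedef char *toy_sds;

/* Allocate header + data + '\0' in one block and return a pointer to buf. */
toy_sds toy_new(const char *init) {
    size_t n = strlen(init);
    assert(n <= UINT8_MAX);   /* must fit the 8-bit len field */
    struct toy_hdr8 *h = malloc(sizeof(*h) + n + 1);
    if (h == NULL) return NULL;
    h->len = (uint8_t)n;
    h->alloc = (uint8_t)n;
    h->flags = 0;
    memcpy(h->buf, init, n);
    h->buf[n] = '\0';         /* keep the C string convention */
    return h->buf;
}

/* Step back from buf to the header and read the stored length. */
size_t toy_len(toy_sds s) {
    const struct toy_hdr8 *h =
        (const struct toy_hdr8 *)(s - sizeof(struct toy_hdr8));
    return h->len;
}

/* Free the whole block, which starts at the header, not at buf. */
void toy_free(toy_sds s) {
    if (s != NULL) free(s - sizeof(struct toy_hdr8));
}
```

Because toy_new() returns a pointer to buf, a value created with toy_new("hello") can be handed to printf("%s", s) or strlen(s) unchanged, while toy_len(s) reads the length from the header without scanning the string.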

2 advantages of SDS over C strings

In C, a character array of length N + 1 is used to represent a string of length N, and the last element of the character array is always the null character '\0'.

This string representation cannot meet the requirements of Redis for string security, efficiency and functionality. Redis therefore designed SDS to meet these needs. Next, we look at the advantages of SDS over C strings from the following aspects:

  1. Get the string length;
  2. Buffer overflow;
  3. The number of memory reallocation when modifying string;
  4. Binary security;

2.1 constant complexity to get string length

Because a C string does not record its own length, a program that wants the length of a C string must traverse the whole string until it encounters the null character that marks the end. The complexity of this operation is O(n).

For Redis, running the STRLEN command against a very long string stored this way could easily hurt system performance.

Unlike a C string, an SDS records the length of the string it stores in the len attribute, so the complexity of getting the length of an SDS is only O(1).

Moreover, the work of setting and updating the length of SDS is automatically completed by the API of SDS during execution, so there is no need to modify the length manually when using SDS.

By using SDS, Redis reduces the complexity of getting the string length from O(n) to O(1), which ensures that getting the string length does not become a performance bottleneck for Redis.

2.2 prevent buffer overflow

A C string does not record its own length, which not only makes getting the length more expensive but also makes it easy to cause a buffer overflow.

The C strcat() function appends the contents of the src string to the end of the dest string:

char *strcat(char *dest, const char *src);

Because a C string does not record its own length, strcat() assumes that the caller has already allocated enough memory for dest to hold all the contents of the src string. Once this assumption does not hold, a buffer overflow occurs.

For example, suppose a program has two C strings s1 and s2 that are next to each other in memory. s1 stores the string "redis" and s2 stores the string "MySQL". The storage structure is shown in Figure 2-1:


If we execute the following statement:

strcat(s1, " 666");

The content of s1 is modified to "redis 666". However, if enough space was not allocated for s1 before strcat() was executed, the data of s1 overflows into the space where s2 is located once strcat() runs, and the content saved in s2 is accidentally modified, as shown in Figure 2-2:


Unlike C strings, the space allocation strategy of SDS eliminates this kind of buffer overflow: when an SDS API needs to modify an SDS, it first checks whether the space of the SDS is sufficient for the modification. If not, the API automatically expands the SDS to the required size and then performs the modification. So with SDS there is no need to adjust the space manually, and there is no buffer overflow problem.
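The check-then-grow behavior described above can be sketched as follows. This is a simplified stand-in, not the Redis code: toy_str, toy_make_room and toy_cat are hypothetical names, and the length and capacity live in an ordinary struct instead of a header placed directly before buf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an SDS (hypothetical, for illustration only). */
struct toy_str {
    size_t len;    /* bytes in use */
    size_t alloc;  /* capacity, excluding the terminator */
    char *buf;
};

/* Ensure at least `addlen` free bytes; returns 0 on success, -1 on failure. */
int toy_make_room(struct toy_str *s, size_t addlen) {
    if (s->buf != NULL && s->alloc - s->len >= addlen)
        return 0;                          /* enough room already */
    size_t newalloc = s->len + addlen;
    char *p = realloc(s->buf, newalloc + 1);
    if (p == NULL) return -1;
    s->buf = p;
    s->alloc = newalloc;
    return 0;
}

/* Append t to s, growing first so the copy can never overrun buf. */
int toy_cat(struct toy_str *s, const char *t) {
    size_t n = strlen(t);
    if (toy_make_room(s, n) != 0) return -1;
    memcpy(s->buf + s->len, t, n + 1);     /* copies the terminator too */
    s->len += n;
    assert(s->len <= s->alloc);
    return 0;
}
```

Starting from an empty struct toy_str s = {0, 0, NULL}, calling toy_cat(&s, "redis") and then toy_cat(&s, " 666") leaves "redis 666" in s.buf, and the caller never has to size the destination by hand.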

2.3 reduce memory reallocation times

The length of a C string, slen, and the length of its underlying array, salen, always satisfy the following relationship:

salen = slen + 1; // 1 is the length of the null character

Therefore, every time a C string is lengthened or shortened, a memory reallocation must be performed on the array that holds it:

  • Growing the string. The program must first extend the underlying array through memory reallocation. If this step is omitted, a buffer overflow may occur.
  • Shortening the string. The program must release the space that is no longer used through memory reallocation. If this step is omitted, a memory leak may occur.

Memory reallocation involves complex algorithms and may require system calls, so it is usually a time-consuming operation.

For Redis, all time-consuming operations should be optimized. For this reason, SDS optimizes string growth and shortening with two strategies: space preallocation and lazy space release.

2.3.1 space preallocation

Space preallocation means: when the space of an SDS needs to be expanded, the program allocates not only the space that is required but also additional unused space for the SDS.

About the spatial expansion of SDS, the source code is as follows:

/* sds.c/sdsMakeRoomFor() */
...
newlen = (len+addlen); /* new length of the SDS */
/* SDS_MAX_PREALLOC is the preallocation maximum, defined in sds.h as 1024*1024 */
if (newlen < SDS_MAX_PREALLOC)
    newlen *= 2;
else
    newlen += SDS_MAX_PREALLOC;
...

As can be seen from the source code, space expansion can be divided into two situations:

  • New length less than the preallocation maximum. In this case, the program doubles the new length, so the SDS gets as much unused space as its new length. For example, take a string s1 that is 10 bytes long. When the string "redis" is appended to s1, the new length is 15 bytes, and the program also allocates 15 bytes of unused space. The buf array of s1 therefore becomes 15 + 15 + 1 = 31 bytes.
  • New length greater than or equal to the preallocation maximum. In this case, because the string is already large, the program does not preallocate that much space and adds only the maximum, SDS_MAX_PREALLOC (1 MB). For example, take a 3 MB string s2. When a 2 MB string is appended to s2, the program allocates the 5 MB needed for the new length plus 1 MB of unused space. The buf array of s2 therefore becomes 3 MB + 2 MB + 1 MB + 1 byte.
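The two cases above can be checked with a small function that reproduces only the growth rule from the excerpt. The names toy_grown_alloc and TOY_MAX_PREALLOC are hypothetical. The result is the capacity excluding the terminator, so the buf array is one byte larger.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PREALLOC (1024 * 1024)  /* mirrors SDS_MAX_PREALLOC: 1 MB */

/* Capacity to request when a string of `len` bytes grows by `addlen` bytes. */
size_t toy_grown_alloc(size_t len, size_t addlen) {
    size_t newlen = len + addlen;
    assert(newlen >= len);              /* guard against size_t overflow */
    if (newlen < TOY_MAX_PREALLOC)
        newlen *= 2;                    /* small string: double the new length */
    else
        newlen += TOY_MAX_PREALLOC;     /* large string: add a fixed 1 MB */
    return newlen;
}
```

With these definitions, toy_grown_alloc(10, 5) yields 30, which gives the 31-byte array from the first example, and growing a 3 MB string by 2 MB yields 6 MB, matching the second.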

Through this preallocation strategy, Redis reduces the number of memory reallocations required by string growth operations, so that repeated growth does not degrade performance.

2.3.2 lazy space release

Preallocation corresponds to the growth of the string, while space release corresponds to the shortening of the string.

Lazy space release means: when an SDS is shortened, the program does not immediately reclaim the bytes freed by shortening, but keeps them for future use.

For example, we use the sdstrim() function, which strips the specified characters from both ends of a string, on the following SDS:


For the SDS in the figure above, execute:
sdstrim(s, "l"); // strip 'l' characters from both ends of the SDS string

The SDS will be modified as shown in Figure 2-4:


As you can see, after sdstrim() executes, the SDS does not release the extra 3 bytes, but keeps them as unused space in the SDS for later use.

Through the lazy space release strategy, SDS avoids the memory reallocation otherwise needed when shortening a string, and also speeds up growth operations that may follow.

In addition, SDS provides a corresponding API (sdsRemoveFreeSpace()) so that the unused space of an SDS can actually be released when necessary, which avoids wasting memory.
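The difference between shortening an SDS and actually giving memory back can be sketched as below. As before, this is a simplified illustration: toy_str, toy_truncate and toy_shrink_to_fit are hypothetical names and do not correspond to Redis functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for an SDS (hypothetical, for illustration only). */
struct toy_str {
    size_t len;    /* bytes in use */
    size_t alloc;  /* capacity, excluding the terminator */
    char *buf;
};

/* Shorten to `newlen` bytes without touching the allocation. */
void toy_truncate(struct toy_str *s, size_t newlen) {
    assert(newlen <= s->len);
    s->len = newlen;
    s->buf[newlen] = '\0';   /* alloc is unchanged: freed bytes stay reserved */
}

/* Explicitly give unused bytes back; returns 0 on success, -1 on failure. */
int toy_shrink_to_fit(struct toy_str *s) {
    char *p = realloc(s->buf, s->len + 1);
    if (p == NULL) return -1;
    s->buf = p;
    s->alloc = s->len;
    return 0;
}
```

After toy_truncate() the capacity is still available for a later append without any reallocation. Only the explicit toy_shrink_to_fit() call pays for a realloc(), which is the trade-off lazy release is designed around.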

summary

  1. Redis uses C strings only as string literals. In most cases, it uses SDS as its string representation.
  2. Compared with C strings, SDS has several advantages: constant-time retrieval of the string length, protection against buffer overflow, and fewer memory reallocations when modifying strings.