Reading redis source code — SDS (1)

Time:2021-9-15

Simple dynamic string SDS

Redis does not use C strings (character arrays terminated by a null character) directly. Instead, it builds its own abstract string type, the simple dynamic string (SDS).

Key-value pairs that contain string values are implemented with SDS:

127.0.0.1:6379> set msg "hello world"
OK

The key is a string object whose underlying implementation is an SDS that holds the string "msg".
The value is also a string object whose underlying implementation is an SDS that holds the string "hello world".

127.0.0.1:6379> rpush fruits "apple" "banana" "cherry"
(integer) 3
The value here is a list object that contains three string objects, each backed by an SDS.

In addition to storing strings, SDS is also used as a buffer: both the AOF buffer and the input buffer in the client state are implemented with SDS.

Definition of SDS

/*
 *Type alias that points to the buf property of sdshdr
 */
typedef char *sds;

/*
 *Saves the structure of a string object
 */
struct sdshdr {
    
    //Length of occupied space in buf
    int len;

    //Length of free space remaining in buf
    int free;

    //Data space
    char buf[];
};

buf is a char array, and its last byte still holds the terminator '\0'.
For example, to store the five-character string "Redis",
buf actually holds 'R', 'e', 'd', 'i', 's' and '\0'.
The terminator is not counted in len. The advantage of keeping it is that SDS can directly reuse some of the C string library functions.
For example, an SDS that holds text can be printed directly with printf.
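
The layout described above can be tried out with a minimal sketch. The helper names (demo_sds_new, demo_sds_header, demo_sds_print) are hypothetical, and plain malloc stands in for Redis's zmalloc; the point is only that the pointer handed to callers is the buf field itself, so it can be passed to printf like any C string, while the header sits immediately before it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

struct sdshdr {
    int len;    /* bytes in use in buf, terminator not counted */
    int free;   /* unused bytes in buf */
    char buf[]; /* string bytes followed by '\0' */
};

/* Hypothetical helper: build an SDS from a C string with no spare space. */
static sds demo_sds_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1); /* n + 1 copies the terminator too */
    return sh->buf;
}

/* Step back from buf to the header that precedes it. */
static struct sdshdr *demo_sds_header(sds s) {
    return (void *)(s - sizeof(struct sdshdr));
}

/* Print an SDS that holds text exactly like a C string. */
static void demo_sds_print(sds s) {
    printf("%s (len=%d, free=%d)\n", s,
           demo_sds_header(s)->len, demo_sds_header(s)->free);
}
```

Because callers only ever hold the buf pointer, the memory has to be released through the header pointer, for example free(demo_sds_header(s)) in this sketch.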

A C string uses a character array of length N + 1 to represent a string of length N. This simple representation does not meet Redis's requirements for string safety and efficiency.

Getting the string length in constant time

A C string does not record its own length, so getting the length means traversing the whole string, which is O(N). An SDS records the length in its len attribute, so getting the length of an SDS is O(1).

Therefore, even executing strlen on a very long string key returns the length directly and does not affect system performance.

127.0.0.1:6379> strlen msg
(integer) 11

Eliminate buffer overflow

C strings easily cause buffer overflows.

The <string.h>/strcat function appends the contents of the src string to the end of the dest string:
char *strcat(char *dest, const char *src);

Because a C string does not record its own length, strcat assumes that, by the time the user calls it, enough memory has already been allocated for dest to hold the contents of src. If that assumption does not hold, a buffer overflow occurs.

(C always assumes that its programmers know best, and that whatever they do is meaningful and correct.)

When the SDS API needs to modify an SDS, it first checks whether the SDS has enough space for the modification. If not, the API automatically expands the SDS to the required size before performing the modification.

SDS initialization

sds sdsnewlen(const void *init, size_t initlen) {

    struct sdshdr *sh;

    //Select the appropriate memory allocation method according to whether there is initialization content
    // T = O(N)
    if (init) {
        //Zmalloc does not initialize the allocated memory
        sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
    } else {
        //Zcalloc initializes all allocated memory to 0
        sh = zcalloc(sizeof(struct sdshdr)+initlen+1);
    }

    //Memory allocation failed, return
    if (sh == NULL) return NULL;

    //Set initialization length
    sh->len = initlen;
    //The new SDS does not reserve any space
    sh->free = 0;
    //If there are initialization contents specified, copy them to the buf of sdshdr
    // T = O(N)
    if (initlen && init)
        memcpy(sh->buf, init, initlen);
    //End with \0
    sh->buf[initlen] = '\0';

    //Return the buf part, not the entire sdshdr
    return (char*)sh->buf;
}

//Append a C string to the end of an SDS
sds sdscat(sds s, const char *t) {
    return sdscatlen(s, t, strlen(t));
}

//Concatenation function: append len bytes of t to the end of s
sds sdscatlen(sds s, const void *t, size_t len) {
    
    struct sdshdr *sh;
    
    //Original string length
    size_t curlen = sdslen(s);

    //Expand SDS space
    // T = O(N)
    s = sdsMakeRoomFor(s,len);

    //Out of memory? Direct return
    if (s == NULL) return NULL;

    //Copy the contents of t to the end of the string
    // T = O(N)
    sh = (void*) (s-(sizeof(struct sdshdr)));
    memcpy(s+curlen, t, len);

    //Update properties
    sh->len = curlen+len;
    sh->free = sh->free-len;

    //Add the new terminator
    s[curlen+len] = '\0';

    //Return the new SDS
    return s;
}

//Expansion function
sds sdsMakeRoomFor(sds s, size_t addlen) {

    struct sdshdr *sh, *newsh;

    //Get the current free space of s
    size_t free = sdsavail(s);

    size_t len, newlen;

    //The current free space of s is enough: no expansion needed, return directly
    if (free >= addlen) return s;

    //Get the length of the space s currently occupies
    len = sdslen(s);
    sh = (void*) (s-(sizeof(struct sdshdr)));

    //Minimum length s needs
    newlen = (len+addlen);

    //Work out how much space to allocate for s based on the new length
    if (newlen < SDS_MAX_PREALLOC)
        //If the new length is less than SDS_MAX_PREALLOC,
        //allocate twice the required length
        newlen *= 2;
    else
        //Otherwise, allocate the required length plus SDS_MAX_PREALLOC
        newlen += SDS_MAX_PREALLOC;
    // T = O(N)
    newsh = zrealloc(sh, sizeof(struct sdshdr)+newlen+1);

    //Out of memory, allocation failed, return
    if (newsh == NULL) return NULL;

    //Update the free length of the SDS
    newsh->free = newlen - len;

    //Return the SDS
    return newsh->buf;
}

//Prototype of memcpy (here as declared in the MSVC headers, with SAL annotations)
void* __cdecl memcpy(
    _Out_writes_bytes_all_(_Size) void* _Dst,
    _In_reads_bytes_(_Size)       void const* _Src,
    _In_                          size_t      _Size
);

//Get the free space of an SDS
static inline size_t sdsavail(const sds s) {
    struct sdshdr *sh = (void*)(s-(sizeof(struct sdshdr)));
    return sh->free;
}

//Number of bytes of memory in use
static size_t used_memory = 0;
//Thread-safe mode: 0 = off (used_memory is updated without locking), 1 = on (updated under the mutex)
static int zmalloc_thread_safe = 0;
//Mutex that protects updates to used_memory
pthread_mutex_t used_memory_mutex = PTHREAD_MUTEX_INITIALIZER;

void *zrealloc(void *ptr, size_t size) {
#ifndef HAVE_MALLOC_SIZE
    void *realptr;
#endif
    size_t oldsize;
    void *newptr;

    if (ptr == NULL) return zmalloc(size);
#ifdef HAVE_MALLOC_SIZE
    oldsize = zmalloc_size(ptr);
    newptr = realloc(ptr,size);
    if (!newptr) zmalloc_oom_handler(size);

    update_zmalloc_stat_free(oldsize);
    update_zmalloc_stat_alloc(zmalloc_size(newptr));
    return newptr;
#else
    realptr = (char*)ptr-PREFIX_SIZE;
    oldsize = *((size_t*)realptr);
    newptr = realloc(realptr,size+PREFIX_SIZE);
    if (!newptr) zmalloc_oom_handler(size);

    *((size_t*)newptr) = size;
    update_zmalloc_stat_free(oldsize);
    update_zmalloc_stat_alloc(size);
    return (char*)newptr+PREFIX_SIZE;
#endif
}

//Update used_memory under the mutex; used when zmalloc runs in thread-safe mode
#define update_zmalloc_stat_add(__n) do { \
    pthread_mutex_lock(&used_memory_mutex); \
    used_memory += (__n); \
    pthread_mutex_unlock(&used_memory_mutex); \
} while(0)

#define update_zmalloc_stat_sub(__n) do { \
    pthread_mutex_lock(&used_memory_mutex); \
    used_memory -= (__n); \
    pthread_mutex_unlock(&used_memory_mutex); \
} while(0)

#endif //Closes a conditional-compilation block whose opening is not shown in this excerpt

//After allocating memory, zmalloc and zcalloc use this to increase the number of bytes of memory in use
#define update_zmalloc_stat_alloc(__n) do { \
    size_t _n = (__n); \
    if (_n&(sizeof(long)-1)) _n += sizeof(long)-(_n&(sizeof(long)-1)); \
    if (zmalloc_thread_safe) { \
        update_zmalloc_stat_add(_n); \
    } else { \
        used_memory += _n; \
    } \
} while(0)

#define update_zmalloc_stat_free(__n) do { \
    size_t _n = (__n); \
    if (_n&(sizeof(long)-1)) _n += sizeof(long)-(_n&(sizeof(long)-1)); \
    if (zmalloc_thread_safe) { \
        update_zmalloc_stat_sub(_n); \
    } else { \
        used_memory -= _n; \
    } \
} while(0)


As you can see, update_zmalloc_stat_alloc increases used_memory after memory is allocated, and update_zmalloc_stat_free decreases used_memory after memory is freed. The parameter __n is the number of bytes added or removed. Each macro handles two cases: when thread-safe mode is on (zmalloc_thread_safe is nonzero), used_memory is updated under the mutex; when it is off, used_memory is updated directly without locking.

As for the line

    if (_n&(sizeof(long)-1)) _n += sizeof(long)-(_n&(sizeof(long)-1));

its job is to round _n up to a multiple of sizeof(long) whenever the allocated or freed size is not already one, which ensures that used_memory stays a multiple of sizeof(long).
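
The rounding step can be checked in isolation. The sketch below lifts the same arithmetic into a function with the hypothetical name demo_round_up_long; it is not part of Redis. On a platform where sizeof(long) is 8, an input of 13 gives 13 & 7 = 5, so 8 - 5 = 3 is added and the result is 16.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of sizeof(long), using the same
 * arithmetic as the macros above. */
static size_t demo_round_up_long(size_t n) {
    size_t mask = sizeof(long) - 1;

    /* The mask trick only works when sizeof(long) is a power of two. */
    assert((sizeof(long) & mask) == 0);

    /* n & mask is the remainder of n divided by sizeof(long). */
    if (n & mask) n += sizeof(long) - (n & mask);
    return n;
}
```

Values that are already a multiple of sizeof(long), including 0, are returned unchanged.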
     
     
     
#define PREFIX_SIZE (sizeof(size_t))
//zmalloc: allocate memory, reserving an extra PREFIX_SIZE bytes in front of the block to record the number of bytes allocated
 
void *zmalloc(size_t size) {
    void *ptr = malloc(size+PREFIX_SIZE);

    if (!ptr) zmalloc_oom_handler(size);
#ifdef HAVE_MALLOC_SIZE
    update_zmalloc_stat_alloc(zmalloc_size(ptr));
    return ptr;
#else
    *((size_t*)ptr) = size;
    update_zmalloc_stat_alloc(size+PREFIX_SIZE);
    return (char*)ptr+PREFIX_SIZE;
#endif
}

//zfree: free space allocated by zmalloc and update the number of bytes of memory in use
void zfree(void *ptr) {
#ifndef HAVE_MALLOC_SIZE
    void *realptr;
    size_t oldsize;
#endif

    if (ptr == NULL) return;
#ifdef HAVE_MALLOC_SIZE
    update_zmalloc_stat_free(zmalloc_size(ptr));
    free(ptr);
#else
    realptr = (char*)ptr-PREFIX_SIZE;
    oldsize = *((size_t*)realptr);
    update_zmalloc_stat_free(oldsize+PREFIX_SIZE);
    free(realptr);
#endif
}
     
To know how large a block is when it is freed, so that used_memory can be updated, an extra sizeof(size_t) bytes are allocated in front of each block and used to record the allocated size.
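
The size-prefix idea can be isolated in a small sketch. It is a simplified, single-threaded illustration with hypothetical names (demo_malloc, demo_free, demo_used_memory), and it leaves out the sizeof(long) rounding and the out-of-memory handler that the Redis code above includes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PREFIX_SIZE (sizeof(size_t))

/* Bytes currently allocated through demo_malloc, prefixes included. */
static size_t demo_used_memory = 0;

/* Allocate size bytes plus a hidden prefix that remembers size. */
static void *demo_malloc(size_t size) {
    void *real = malloc(size + DEMO_PREFIX_SIZE);
    if (real == NULL) return NULL;
    *((size_t *)real) = size;
    demo_used_memory += size + DEMO_PREFIX_SIZE;
    /* The caller sees only the memory after the prefix. */
    return (char *)real + DEMO_PREFIX_SIZE;
}

/* Free a block from demo_malloc, reading its size back from the prefix. */
static void demo_free(void *ptr) {
    if (ptr == NULL) return;
    void *real = (char *)ptr - DEMO_PREFIX_SIZE;
    size_t size = *((size_t *)real);
    assert(demo_used_memory >= size + DEMO_PREFIX_SIZE);
    demo_used_memory -= size + DEMO_PREFIX_SIZE;
    free(real);
}
```

The counter returns to zero once every block has been freed, without the caller ever having to pass a size to demo_free.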

Reduce the number of memory reallocations caused by modifying strings

In C, every time a string is lengthened or shortened, the program must perform a memory reallocation on the array that holds the string. This step has to be handled by the programmer in the program.

In a typical program, if string lengths are modified infrequently, performing a memory reallocation on every modification is acceptable.

However, Redis cares about speed. If memory had to be reallocated every time a string's length changed, the time cost would be unacceptable.

To avoid this, SDS decouples the string length from the length of the underlying array by tracking unused space (int free). In an SDS, the length of the buf array is therefore not necessarily the number of characters plus one: the array can also contain unused bytes, and the number of these bytes is recorded in the free attribute.

Using unused space, SDS implements two optimization strategies: space preallocation and lazy space release.

Space preallocation

Space preallocation optimizes SDS string growth. When the SDS API modifies an SDS and needs to expand its space, the program allocates not only the space the modification requires but also additional unused space.

Allocation algorithm

1. If the length of the SDS (the value of the len attribute) will be less than 1MB after the modification, the program allocates unused space of the same size as len, so the len and free attributes end up with the same value.

For example, if the len of the SDS becomes 13 bytes after the modification,
the program also allocates 13 bytes of unused space,
and the actual length of the buf array becomes 13 + 13 + 1 = 27 bytes (the extra byte holds the terminator).

2. If the length of the SDS will be greater than or equal to 1MB after the modification, the program allocates 1MB of unused space.

For example, if the len of the SDS becomes 30MB after the modification,
the program allocates 1MB of unused space,
and the actual length of the buf array becomes 30MB + 1MB + 1 byte.
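
The two rules can be written as a small function to check the arithmetic in the examples. The names demo_buf_size_after_growth and DEMO_MAX_PREALLOC are hypothetical; the 1MB threshold is assumed to correspond to SDS_MAX_PREALLOC in the sdsMakeRoomFor code shown earlier.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_PREALLOC (1024 * 1024) /* 1MB, the threshold described above */

/* Given the length the string will have after the modification, return the
 * number of bytes buf must hold: used bytes + unused bytes + 1 terminator. */
static size_t demo_buf_size_after_growth(size_t newlen) {
    /* Rule 1: below 1MB, reserve as much unused space as is in use.
     * Rule 2: at 1MB or above, reserve a flat 1MB. */
    size_t free_bytes = (newlen < DEMO_MAX_PREALLOC) ? newlen : DEMO_MAX_PREALLOC;

    /* Guard against size_t overflow in the sum below. */
    assert(newlen <= (size_t)-1 - free_bytes - 1);

    return newlen + free_bytes + 1;
}
```

Calling it with 13 gives 27, and calling it with 30MB gives 31MB plus one byte, matching the two examples.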

Lazy space release

Lazy space release optimizes SDS string-shortening operations. When the SDS API needs to shorten the string stored in an SDS, the program does not immediately use memory reallocation to reclaim the bytes left over after shortening. Instead, it records the number of these bytes in the free attribute and keeps them for future use.
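
A minimal sketch of this bookkeeping follows. demo_sds_new and demo_sds_shorten are hypothetical helpers written for illustration, not functions from the Redis source, and plain malloc stands in for zmalloc. Shortening touches only len, free and the terminator; the allocation itself stays the same size.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

struct sdshdr {
    int len;    /* bytes in use in buf */
    int free;   /* unused bytes in buf */
    char buf[];
};

/* Hypothetical helper: build an SDS from a C string with no spare space. */
static sds demo_sds_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1);
    return sh->buf;
}

/* Hypothetical helper: keep only the first newlen bytes of s.
 * No reallocation happens; the dropped bytes move from len to free. */
static void demo_sds_shorten(sds s, size_t newlen) {
    struct sdshdr *sh = (void *)(s - sizeof(struct sdshdr));
    assert(newlen <= (size_t)sh->len);
    sh->free += sh->len - (int)newlen;
    sh->len = (int)newlen;
    s[newlen] = '\0';
}
```

Shortening "hello world" to 5 bytes leaves len at 5 and free at 6, so a later append of up to 6 bytes would need no reallocation.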

Binary safety

The characters in a C string must conform to some encoding (such as ASCII), and the string cannot contain null characters anywhere except at its end; otherwise the first null character the program reads is mistaken for the end of the string. These restrictions mean that a C string can only hold text data, not binary data such as pictures, audio, video, or compressed files.

To make Redis suitable for different scenarios, the SDS APIs are all binary safe: they treat the data stored in the buf array as binary data, and the program does not restrict, filter, or make assumptions about its content.

This is why the buf attribute of SDS is called a byte array: what it stores is not characters but binary data.

SDS uses the value of the len attribute, not a null character, to determine where the data ends, so data that contains '\0' can be stored without any problem.
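
The difference between the two ways of finding the end can be seen with data that contains an embedded null byte. The sketch below uses hypothetical helpers (demo_sds_newlen, demo_sdslen) modeled on the sdsnewlen and sdsavail code shown earlier, with plain malloc in place of zmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* Hypothetical helper: build an SDS from an arbitrary byte range. */
static sds demo_sds_newlen(const void *init, size_t initlen) {
    assert(init != NULL);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)initlen;
    sh->free = 0;
    memcpy(sh->buf, init, initlen); /* copies every byte, '\0' included */
    sh->buf[initlen] = '\0';
    return sh->buf;
}

/* Length taken from the header, not from scanning for '\0'. */
static size_t demo_sdslen(const sds s) {
    const struct sdshdr *sh = (const void *)(s - sizeof(struct sdshdr));
    return (size_t)sh->len;
}
```

For the 13 bytes "Redis\0Cluster", demo_sdslen reports 13 while strlen on the same pointer stops at the embedded null and reports 5; comparing the stored bytes with memcmp shows that nothing was lost.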

Compatibility with some C string functions

Although the SDS APIs are binary safe, an SDS still ends its data with the null character '\0', so that an SDS holding text data can reuse some of the functions defined in the <string.h> library.

For example, strcasecmp can compare the string stored in an SDS with another C string:
strcasecmp(sds->buf,"hello world")

Likewise, the text stored in an SDS can be appended to a C string:
strcat(c_string,sds->buf)
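
Since an sds value is already a pointer to buf, the two calls can be sketched in runnable form as follows. The helper names are hypothetical, plain malloc stands in for zmalloc, and strcasecmp is assumed to come from <strings.h> as on POSIX systems. The append helper adds a size check of its own, because strcat itself does none.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h> /* strcasecmp on POSIX systems */

typedef char *sds;

struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* Hypothetical helper: build an SDS from a C string with no spare space. */
static sds demo_sds_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + n + 1);
    if (sh == NULL) return NULL;
    sh->len = (int)n;
    sh->free = 0;
    memcpy(sh->buf, init, n + 1);
    return sh->buf;
}

/* Compare an SDS holding text with a C string, ignoring case. */
static int demo_sds_equals_nocase(const sds s, const char *cstr) {
    return strcasecmp(s, cstr) == 0;
}

/* Append the text in an SDS to a C buffer of destsize bytes.
 * Returns 0 on success, -1 if the result would not fit. */
static int demo_append_sds(char *dest, size_t destsize, const sds s) {
    if (strlen(dest) + strlen(s) + 1 > destsize) return -1;
    strcat(dest, s);
    return 0;
}
```

This only works for SDS values that hold text: with binary data, these C functions stop at the first embedded '\0'.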

SDS API
