Redis String Object Practical Notes

Time:2019-9-12

String object

The string is the most commonly used data type in Redis. Both keys and values are strings, which makes the type very convenient to use. Although the values are collectively called strings, Redis automatically selects a storage encoding based on the actual value. A string object can use one of three encodings: int, raw, or embstr.

Redis object

Redis uses a unified data structure to represent an object, which is defined as follows:

typedef struct redisObject {
 unsigned type:4;
 unsigned encoding:4;
 // Use LRU algorithm to clear objects in memory when memory exceeds its limit
 unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
       * LFU data (least significant 8 bits frequency
       * and most significant 16 bits access time). */
 // The number of references to this object
 int refcount;
 // Object's value pointer
 void *ptr;
} robj;

Among them, the type field represents the type of the object and has seven values:

/* A redis object, that is a type able to hold a string / list / set */

/* The actual Redis Object */
#define OBJ_STRING 0 /* String object. */
#define OBJ_LIST 1 /* List object. */
#define OBJ_SET 2 /* Set object. */
#define OBJ_ZSET 3 /* Sorted set object. */
#define OBJ_HASH 4 /* Hash object. */

/* The "module" object type is a special one that signals that the object
 * is one directly managed by a Redis module. In this case the value points
 * to a moduleValue struct, which contains the object value (which is only
 * handled by the module itself) and the RedisModuleType struct which lists
 * function pointers in order to serialize, deserialize, AOF-rewrite and
 * free the object.
 *
 * Inside the RDB file, module types are encoded as OBJ_MODULE followed
 * by a 64 bit module type ID, which has a 54 bits module-specific signature
 * in order to dispatch the loading to the right module, plus a 10 bits
 * encoding version. */
#define OBJ_MODULE 5 /* Module object. */
#define OBJ_STREAM 6 /* Stream object. */

Then the encoding field, representing the actual encoding type of the object value, has 11 values:

/* Objects encoding. Some kind of objects like Strings and Hashes can be
 * internally represented in multiple ways. The 'encoding' field of the object
 * is set to one of this fields for this object. */
#define OBJ_ENCODING_RAW 0 /* Simple dynamic string */
#define OBJ_ENCODING_INT 1 /* Integer of type long */
#define OBJ_ENCODING_HT 2 /* Dictionary (hash table) */
#define OBJ_ENCODING_ZIPMAP 3 /* Zipmap (compressed dictionary) */
#define OBJ_ENCODING_LINKEDLIST 4 /* No longer used: old list encoding, a doubly linked list */
#define OBJ_ENCODING_ZIPLIST 5 /* Ziplist (compressed list) */
#define OBJ_ENCODING_INTSET 6 /* Integer set */
#define OBJ_ENCODING_SKIPLIST 7 /* Skip list and dictionary */
#define OBJ_ENCODING_EMBSTR 8 /* embstr-encoded simple dynamic string */
#define OBJ_ENCODING_QUICKLIST 9 /* Linked list of ziplists */
#define OBJ_ENCODING_STREAM 10 /* Radix tree of listpacks */

As mentioned earlier, string objects use only three of these encodings: OBJ_ENCODING_INT (a long integer), OBJ_ENCODING_RAW (a simple dynamic string), and OBJ_ENCODING_EMBSTR (an embstr-encoded simple dynamic string).

OBJ_ENCODING_INT

When the value of a string object is an integer that fits in a long, the object uses the OBJ_ENCODING_INT encoding.

An integer that is too large to fit in a long is not stored as OBJ_ENCODING_INT. It is kept as an ordinary string and encoded according to its length, like any other string value.
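The selection rule described in this article can be sketched in a few lines of C. This is only an illustration: guess_string_encoding and is_canonical_integer are hypothetical helpers, not Redis functions, and the sketch uses strtol where Redis uses its own stricter parser. It assumes that only canonical integers (no leading zeros, spaces, or plus sign) qualify for the int encoding.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define EMBSTR_LIMIT 44 /* same value as OBJ_ENCODING_EMBSTR_SIZE_LIMIT */

/* Hypothetical helper: accepts only canonical decimal integers such as
 * "0", "42" or "-7"; rejects "", "-", "007", "-0", "+1" and "12a". */
static int is_canonical_integer(const char *s) {
    const char *p = s;
    if (*p == '-') p++;
    if (*p == '\0') return 0;                     /* "" or "-" */
    if (*p == '0') return p == s && p[1] == '\0'; /* only "0" itself */
    for (; *p != '\0'; p++)
        if (*p < '0' || *p > '9') return 0;
    return 1;
}

/* Hypothetical helper: returns the name of the encoding this article's
 * rules would select for the given value. */
static const char *guess_string_encoding(const char *s) {
    if (is_canonical_integer(s)) {
        errno = 0;
        (void)strtol(s, NULL, 10);
        if (errno != ERANGE) return "int";        /* fits in a long */
    }
    /* Not an integer, or too large for a long: choose by length. */
    return strlen(s) <= EMBSTR_LIMIT ? "embstr" : "raw";
}
```

For example, under these assumptions "12345" maps to int, "hello" maps to embstr, and a 45-byte string maps to raw. A 23-digit number overflows a long and is therefore treated as a 23-byte string.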

OBJ_ENCODING_RAW

When the value is a string longer than 44 bytes, the object uses the OBJ_ENCODING_RAW encoding, in which ptr points to a separately allocated SDS. The SDS structure is described in the SDS section below.

OBJ_ENCODING_EMBSTR

When the value is a string of 44 bytes or fewer, the object uses the OBJ_ENCODING_EMBSTR encoding. The differences between OBJ_ENCODING_EMBSTR and OBJ_ENCODING_RAW are as follows:

  • An OBJ_ENCODING_RAW object requires two memory allocations, one for the redisObject and one for the SDS. An OBJ_ENCODING_EMBSTR object requires only one.
  • Likewise, freeing an OBJ_ENCODING_RAW object takes two calls, while freeing an OBJ_ENCODING_EMBSTR object takes one.
  • An OBJ_ENCODING_EMBSTR object keeps the redisObject and the SDS in one contiguous block of memory. An OBJ_ENCODING_RAW object does not.

/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 44 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
 if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
  return createEmbeddedStringObject(ptr,len);
 else
  return createRawStringObject(ptr,len);
}
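The arithmetic behind the limit of 44 can be checked with a small sketch. It assumes a 64-bit platform, LRU_BITS equal to 24, a GCC or Clang compiler (for the packed attribute), and the 3-byte sdshdr8 header that Redis 3.2 and later use for short strings, rather than the older two-int header shown in the SDS section of this article. The names demoObject, demoSdsHdr8, and embstr_alloc_size are hypothetical mirrors written for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LRU_BITS 24 /* assumed value */

/* Mirror of the redisObject layout shown earlier (16 bytes on LP64). */
typedef struct demoObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS;
    int refcount;
    void *ptr;
} demoObject;

/* Mirror of the 8-bit SDS header: 3 bytes before the character data. */
struct __attribute__((__packed__)) demoSdsHdr8 {
    uint8_t len;         /* used length */
    uint8_t alloc;       /* allocated length, excluding header and '\0' */
    unsigned char flags; /* header type */
    char buf[];
};

/* Bytes needed for one embstr allocation holding a string of `len` bytes:
 * object header + SDS header + characters + terminating '\0'. */
static size_t embstr_alloc_size(size_t len) {
    return sizeof(demoObject) + sizeof(struct demoSdsHdr8) + len + 1;
}
```

Under these assumptions a 44-byte string needs 16 + 3 + 44 + 1 = 64 bytes, which exactly fills the 64-byte jemalloc arena mentioned in the comment above. With the older 8-byte header, the same calculation gives room for only 39 bytes, which was the limit in Redis 3.0.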

SDS

Strings are used everywhere in Redis, but Redis is written in C, whose strings differ from those of Java. A C string is a character array of length N+1 that ends with the null character '\0'. To get the length of a C string, you have to scan it until you reach '\0', so the operation is O(N).

For a very large string this is unacceptable: single-threaded Redis could block for a long time just to compute a length. Redis therefore needs a more efficient string type.

Redis implements its own string type called SDS (simple dynamic string). It stores two extra fields, the length of the string and the number of unused bytes in the character array, so the length can be read in O(1). The buffer still ends with the null character '\0'.

struct sdshdr {
 // String length
 int len;
 // Number of Unused Characters in Character Array
 int free;
 // Array of characters that hold strings
 char buf[];
};
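A minimal sketch of how such a header gives O(1) length lookup is shown here. The type toy_sds and the functions toy_sds_new and toy_sds_len are hypothetical names modelled on the struct above; they are not the Redis SDS API.

```c
#include <stdlib.h>
#include <string.h>

/* Toy header modelled on the struct above; all names are illustrative. */
struct toy_sds {
    int len;    /* string length */
    int free;   /* unused bytes in buf */
    char buf[]; /* characters followed by '\0' */
};

/* Allocates header and characters in one block and records the length.
 * Returns NULL if the allocation fails; the caller frees with free(). */
static struct toy_sds *toy_sds_new(const char *init) {
    size_t n = strlen(init); /* the only O(N) scan, done once */
    struct toy_sds *s = malloc(sizeof(*s) + n + 1);
    if (s == NULL) return NULL;
    s->len = (int)n;
    s->free = 0;
    memcpy(s->buf, init, n + 1); /* copies the terminating '\0' too */
    return s;
}

/* O(1): reads the stored length instead of scanning for '\0'. */
static int toy_sds_len(const struct toy_sds *s) {
    return s->len;
}
```

Because buf still ends with '\0', it can be passed to standard C string functions, while toy_sds_len never has to walk the buffer.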

Capacity expansion mechanism

SDS expands automatically when the character array is too small to hold the new content.

When a C string is appended to an SDS and the array is too small, SDS first grows to exactly the length of the new, combined string and then reserves the same amount again as unused space. The character array therefore ends up with 2 * (new string length) + 1 bytes, where the extra byte holds the terminator '\0'. When the new string length is 1 MB or more, however, the reserved unused space is fixed at 1 MB.
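The growth rule can be written as a small size calculation. This is a sketch of the policy as described here, not the Redis source: grown_buf_size and PREALLOC_MAX are hypothetical names, and the function only computes the resulting array size without reallocating anything.

```c
#include <stddef.h>

#define PREALLOC_MAX (1024 * 1024) /* 1 MB cap on reserved free space */

/* Returns the size in bytes of the character array after appending
 * add_len bytes to a string with cur_len used and cur_free unused bytes. */
static size_t grown_buf_size(size_t cur_len, size_t cur_free, size_t add_len) {
    size_t new_len = cur_len + add_len;

    /* Enough unused space already: the array is not reallocated. */
    if (add_len <= cur_free)
        return cur_len + cur_free + 1;

    /* Reserve as much free space as the new length, capped at 1 MB. */
    size_t spare = new_len < PREALLOC_MAX ? new_len : PREALLOC_MAX;
    return new_len + spare + 1; /* +1 for the terminating '\0' */
}
```

For example, appending 6 bytes to a 5-byte string with no free space gives a new length of 11 and an array of 2 * 11 + 1 = 23 bytes. Growing to a 2 MB string reserves only 1 MB extra, giving 3 MB + 1 byte.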

The reason for this mechanism is that Redis, as a NoSQL database, modifies strings frequently. Preallocating space effectively gives each SDS a buffer, which reduces the number of reallocations. Java's StringBuilder is based on the same idea.

Epilogue

I have read two books about Redis. Both explain how to use Redis, but neither covers its design and implementation. That led to an embarrassing interview, because interviewers like to ask about underlying principles. In the future, when learning a technology, I will not rely on practice-oriented books alone; it is better to understand the principles first.

Reference material

This is the summary of the string section in Redis Design and Implementation.
