Redis (1) – memory model

Time: 2021-6-21


Preface

Redis is one of the most popular in-memory databases today. Because it reads and writes data in memory, reads and writes are far faster than on disk, which makes Redis an indispensable part of many high-concurrency websites.

When using Redis, we work with its five object types (string, hash, list, set and sorted set). This rich set of types is one of Redis's advantages over Memcached. Once you understand how the five object types are used and how they behave, going one step further and understanding the Redis memory model helps in several ways:

1. Estimating Redis memory usage. Memory is still relatively expensive, so it cannot be used without restraint. A reasonable estimate of how much memory Redis will need lets you choose an appropriate machine configuration and save costs while still meeting requirements.

2. Optimizing memory footprint. Understanding the Redis memory model helps you choose more suitable data types and encodings so that Redis memory is used more efficiently.

3. Analyzing and solving problems. When Redis runs into problems such as blocking or high memory usage, knowing the memory model helps you find the cause quickly and fix it.

This article introduces the memory model of Redis (using version 3.0 as an example): the memory Redis occupies and how to query it, the memory allocator (jemalloc), simple dynamic strings (SDS), redisObject, and how the different object types are encoded in memory. It then shows several applications of the Redis memory model.

Later articles in this series will cover Redis high availability, including master-slave replication, Sentinel and Cluster. Stay tuned.

1、 Redis memory statistics

As the saying goes, a craftsman must first sharpen his tools. Before discussing Redis memory, let us look at how to measure Redis memory usage.

After connecting to the server with redis-cli (all examples use redis-cli unless otherwise noted), you can view memory usage with the info command:

info memory

[Screenshot: output of info memory]

The info command displays a wide range of information about the Redis server, including basic server information, CPU, memory, persistence and client connections; the memory argument restricts the output to memory-related information.

The most important fields in the output are described below:

(1) used_memory: the total amount of memory allocated by the Redis allocator (in bytes), including any virtual memory (swap) in use. The Redis allocator is introduced later. used_memory_human shows the same value in a more readable format.

(2) used_memory_rss: the memory the Redis process occupies from the operating system's point of view (in bytes), consistent with what top and ps report. Besides the memory allocated by the allocator, used_memory_rss also includes the memory the process itself needs to run and memory fragmentation, but it does not include virtual memory.

So used_memory is measured from Redis's perspective and used_memory_rss from the operating system's. They differ for opposite reasons: memory fragmentation and the process's own memory make the former smaller than the latter, while virtual memory can make the former larger than the latter.

In practice, Redis usually holds a large amount of data, so the memory the process itself needs is much smaller than the data and the fragmentation. The ratio of used_memory_rss to used_memory therefore serves as a measure of memory fragmentation in Redis; this ratio is mem_fragmentation_ratio.

(3) mem_fragmentation_ratio: the memory fragmentation ratio, i.e. used_memory_rss / used_memory.

mem_fragmentation_ratio is usually greater than 1, and the larger it is, the more fragmentation there is. A mem_fragmentation_ratio below 1 indicates that Redis is using virtual memory; since virtual memory lives on disk, it is much slower than RAM. If this happens, investigate promptly, and if memory is insufficient, act quickly, for example by adding Redis nodes, adding memory to the Redis server, or optimizing the application.

In general, a mem_fragmentation_ratio of about 1.03 is healthy (for jemalloc). The ratio in the screenshot above is very large because no data has been stored in Redis yet, so the memory of the Redis process itself makes used_memory_rss much larger than used_memory.

(4) mem_allocator: the memory allocator used by Redis, chosen at compile time. It can be libc, jemalloc or tcmalloc; the default is jemalloc, which is what the screenshot shows.

2、 Redis memory breakdown

As an in-memory database, Redis stores its data (key-value pairs) mainly in memory. As the previous section showed, however, other parts of Redis occupy memory as well.

The memory consumed by Redis can be divided into the following parts:

1. Data

For a database, data is the most important part. The memory it occupies is counted in used_memory.

Redis stores data as key-value pairs. Values (objects) come in five types: string, hash, list, set and sorted set. These are the types Redis exposes externally; internally, each type may have two or more encodings. In addition, Redis does not simply place raw data in memory; it wraps objects in structures such as redisObject and SDS. The rest of this article focuses on these details of data storage.

2. Memory required for the process itself to run

The main Redis process needs memory to run, for example for its code and constant pool. This part is only a few megabytes, which is negligible in most production environments compared with the memory used by data. It is not allocated by jemalloc, so it is not counted in used_memory.

Note: besides the main process, child processes created by Redis also occupy memory, such as the child processes created for AOF rewriting or RDB persistence. This memory does not belong to the Redis process and is counted in neither used_memory nor used_memory_rss.

3. Buffer memory

Buffer memory includes client buffers, the replication backlog buffer, the AOF buffer and so on. Client buffers hold the input and output buffers of client connections; the replication backlog buffer supports partial resynchronization; the AOF buffer holds the most recent write commands during AOF rewriting. You do not need the details of these buffers until you study the corresponding features. This memory is allocated by jemalloc, so it is counted in used_memory.

4. Memory fragmentation

Memory fragmentation arises as Redis allocates and frees memory. For example, if data changes frequently and values vary greatly in size, space freed by Redis may not be returned to physical memory, yet Redis cannot use it effectively either; this is memory fragmentation. Fragmentation is not counted in used_memory.

How much fragmentation occurs depends on how the data is operated on and on the characteristics of the data. It also depends on the allocator: a well-designed allocator keeps fragmentation as low as possible. jemalloc, discussed later, controls fragmentation well.

If fragmentation on a Redis server is already high, a safe restart can reduce it: after restarting, Redis reloads the data from the persistence file and lays it out in memory again, choosing a suitable memory unit for each piece of data, which reduces fragmentation.

3、 Details of Redis data storage

1. Overview

Redis data storage involves the memory allocator (such as jemalloc), simple dynamic strings (SDS), the five object types with their internal encodings, and redisObject. Before covering each in detail, let us look at how these concepts relate to one another.

The following figure shows the data model involved in executing set hello world.

[Figure: data model for set hello world]

Image source: https://searchdatabase.techta…

(1) dictEntry: Redis is a key-value database, so every key-value pair has a dictEntry, which stores pointers to the key and the value; next points to the next dictEntry and is unrelated to this key-value pair.

(2) Key: as the upper right corner of the figure shows, the key ("hello") is not stored as a plain C string but in an SDS structure.

(3) redisObject: the value ("world") is stored neither as a plain C string nor directly in an SDS like the key, but in a redisObject. In fact, values of all five types are stored through redisObject: its type field indicates the type of the value object, and its ptr field points to the object's address. Note that even though the string object is wrapped in a redisObject, its content is still stored in an SDS.

Besides type and ptr, redisObject has other fields not shown in the figure, such as the field that specifies the object's internal encoding; these are described in detail later.

(4) jemalloc: dictEntry, redisObject and SDS objects all need a memory allocator (such as jemalloc) to allocate memory for them. Take dictEntry as an example: it consists of three pointers, occupying 24 bytes on a 64-bit machine, and jemalloc allocates a 32-byte memory unit for it.

Next, we introduce jemalloc, redisObject, SDS, and the object types with their internal encodings.

2、jemalloc

Redis chooses its memory allocator at compile time; it can be libc, jemalloc or tcmalloc, and the default is jemalloc.

As Redis's default allocator, jemalloc does a relatively good job of reducing memory fragmentation. On 64-bit systems, jemalloc divides memory into three ranges: small, large and huge. Each range is further divided into many memory units of different sizes, and when Redis stores data, jemalloc picks the unit whose size fits best.

The memory units used by jemalloc are shown in the following figure:

[Figure: jemalloc memory unit sizes (small, large, huge)]

Image source: http://blog.csdn.net/zhengpei…

For example, to store a 130-byte object, jemalloc places it in a 160-byte memory unit.

3、redisObject

As mentioned earlier, Redis objects come in five types; whatever the type, Redis does not store the object directly but wraps it in a redisObject.

redisObject is very important: object types, internal encodings, memory reclamation and shared objects all depend on it. The following walks through its structure to explain how it works.

redisObject is defined as follows (details may vary slightly between Redis versions):

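#define REDIS_LRU_BITS 24  /* width of the lru field in Redis 3.0 */

typedef struct redisObject {
    unsigned type:4;              /* REDIS_STRING, REDIS_LIST, ... */
    unsigned encoding:4;          /* internal encoding of the object */
    unsigned lru:REDIS_LRU_BITS;  /* lru time (relative to server.lruclock) */
    int refcount;                 /* reference count */
    void *ptr;                    /* points to the actual data */
} robj;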

The meaning and purpose of each redisObject field are as follows:

(1)type

The type field indicates the object's type and occupies 4 bits. Its current values are REDIS_STRING, REDIS_LIST, REDIS_HASH, REDIS_SET and REDIS_ZSET (sorted set).

The type command gets an object's type by reading the type field of its redisObject, as shown in the figure below:

[Screenshot: type command output]

(2)encoding

encoding indicates the object's internal encoding and occupies 4 bits.

Every type supported by Redis has at least two internal encodings; strings, for example, have three: int, embstr and raw. The encoding field lets Redis choose a different encoding for an object depending on how it is used, which greatly improves flexibility and efficiency. Take the list object: it has two encodings, ziplist (compressed list) and linkedlist (double-ended linked list). When a list has few elements, Redis prefers the ziplist, because it uses less memory than a linked list and, being one contiguous block, can be loaded into the CPU cache faster. When the list grows large, the ziplist is converted into a linked list, which is better suited to storing many elements.

The object encoding command shows an object's encoding, as in the following figure:

[Screenshot: object encoding command output]

The encodings available for each of the five object types, and the conditions for using them, are described later.

(3)lru

lru records the last time the object was accessed by a command. The number of bits it occupies varies between versions (for example, 24 bits in 4.0 and 22 bits in 2.6).

Comparing the lru time with the current time gives the object's idle time. The object idletime command displays this idle time (in seconds); notably, running object idletime does not change the object's lru value.

[Screenshot: object idletime command output]

Besides being reported by object idletime, the lru value also matters for memory reclamation: if maxmemory is set and the eviction policy is volatile-lru or allkeys-lru, then when Redis memory usage exceeds maxmemory, Redis gives priority to releasing the objects with the longest idle time.

(4)refcount

Refcount and shared objects

refcount records how many times the object is referenced; it is an integer. refcount is used mainly for reference counting and memory reclamation: when an object is created, refcount is initialized to 1; each time a new program uses the object, refcount is incremented; each time a program stops using it, refcount is decremented; when refcount reaches 0, the memory occupied by the object is freed.

Objects that are used multiple times in Redis (refcount > 1) are called shared objects. To save memory, when the same object appears repeatedly, Redis does not create a new object but reuses the existing one; the reused object is a shared object. Currently, only string objects with integer values can be shared.

Implementation of shared objects

Redis currently shares only integer-valued string objects. This is a trade-off between memory and CPU time: sharing reduces memory use, but deciding whether two objects are equal costs extra time. For integer values the check is O(1); for ordinary strings it is O(n); for hashes, lists, sets and sorted sets it is O(n^2).

Although only integer-valued string objects can be shared, shared objects can appear inside all five types (for example, as elements of a hash or list).

In the current implementation, when the Redis server starts it creates 10,000 string objects holding the integer values 0 to 9999; whenever Redis needs a string object with a value in that range, it uses the shared object directly. The number 10,000 can be changed by adjusting the value of REDIS_SHARED_INTEGERS (OBJ_SHARED_INTEGERS in 4.0).

The reference count of a shared object can be viewed with the object refcount command, as shown in the figure below. The results confirm that only integers between 0 and 9999 are shared.

[Screenshot: object refcount command output]

(5)ptr

ptr points to the actual data; in the earlier example, set hello world, ptr points to the SDS containing the string world.

(6) Summary

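To sum up, the structure of redisObject relates to object types, encodings, memory reclamation and shared objects. On a 64-bit system a redisObject occupies 16 bytes: the three bit-fields together take 32 bits (4 bytes), followed by the 4-byte refcount and the 8-byte ptr:

4 bit + 4 bit + 24 bit + 4 bytes + 8 bytes = 16 bytes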

4、SDS

Redis does not use C strings (character arrays terminated by the null character '\0') as its default string representation; it uses SDS, short for simple dynamic string.

(1) SDS structure

The structure of SDS is as follows

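struct sdshdr {
    int len;    /* number of bytes of buf in use */
    int free;   /* number of unused bytes in buf */
    char buf[]; /* string bytes, followed by a '\0' terminator */
};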

buf is a byte array that stores the string; len is the number of bytes of buf in use, and free is the number of unused bytes. Two examples follow.

[Figure: two example SDS structures]

Image source: Redis Design and Implementation

Given this structure, the length of the buf array is free + len + 1 (the 1 is the null character at the end of the string). The total space occupied by an SDS is therefore: size of free + size of len + length of buf = 4 + 4 + free + len + 1 = free + len + 9.

(2) Comparison between SDS and C string

Compared with C strings, the extra free and len fields bring several benefits:

  • Getting the string length: O(1) for SDS, O(n) for C strings.
  • Buffer overflow: with the C string API, if a string grows (for example with strcat) and you forget to reallocate memory, a buffer overflow is easy to cause. Because SDS records its length, the SDS API reallocates memory automatically whenever an operation could overflow the buffer, eliminating buffer overflows.
  • Memory reallocation when modifying strings: for C strings, modifying a string requires reallocating memory (freeing and then allocating), because without reallocation, growing the string overflows the buffer and shrinking it leaks memory. SDS records both len and free, which decouples the string length from the length of the underlying array and enables two optimizations: space preallocation (allocating more memory than currently needed) greatly reduces the chance of reallocating when the string grows, and lazy space release greatly reduces the chance of reallocating when the string shrinks.
  • Binary data: SDS can store it, C strings cannot. A C string ends at the first null character, and binary data (such as images) may contain null bytes, so C strings cannot handle it correctly. SDS uses len to mark the end of the string, so it has no such problem.

In addition, because buf in SDS still ends with '\0' like a C string, SDS can reuse some functions from the C string library. Note, however, that this works only when the SDS holds text data, not binary data (where '\0' does not necessarily mark the end).

(3) Application of SDS and C string

Redis always uses SDS rather than C strings when storing objects. For example, in the command set hello world, both hello and world are stored as SDS. In the command sadd myset member1 member2 member3, both the key ("myset") and the set elements ("member1", "member2" and "member3") are stored as SDS. Besides objects, SDS is also used for various buffers.

C strings are used only where the string never changes, such as when printing logs.

4、 Redis object types and internal encodings

As mentioned earlier, Redis supports five object types, and each has at least two encodings. This design has two advantages: on the one hand, the interface is separated from the implementation, so internal encodings can be added or changed without affecting users; on the other hand, the encoding can be switched to suit different scenarios, improving efficiency.

The internal encodings supported by each Redis object type are shown in the figure below (the figure is for Redis 3.0; later versions add further encodings, which are not covered here, and all encodings in this chapter are as of 3.0):

[Figure: internal encodings of each object type in Redis 3.0]

Image source: Redis Design and Implementation

Encoding conversion in Redis follows these rules: it happens when data is written, and it is irreversible, so an object can only be converted from a compact (small-memory) encoding to a larger one.

1. String

(1) Overview

String is the most basic type: all keys are strings, and the elements of the other, more complex types are strings too.

A string value cannot exceed 512 MB.

(2) Internal coding

Strings have three internal encodings, used in the following scenarios:

  • int: an 8-byte long integer. When a string value is an integer, it is stored as a long.
  • embstr: strings of at most 39 bytes. Both embstr and raw use a redisObject plus an SDS to hold the data. The difference is that embstr allocates memory only once (so the redisObject and SDS are contiguous), whereas raw allocates twice (once for the redisObject and once for the SDS). Compared with raw, embstr therefore needs one fewer allocation on creation and one fewer free on deletion, and keeps all of the object's data together, which is convenient to access. Its drawback is equally clear: if the string grows and needs reallocation, the whole redisObject and SDS must be reallocated together. For this reason, embstr in Redis is read-only.
  • raw: strings longer than 39 bytes.

An example is shown in the figure below

[Screenshot: int, embstr and raw encodings]

The boundary between embstr and raw is 39 bytes because a redisObject is 16 bytes and an SDS is 9 bytes plus the string length. With a 39-byte string, an embstr takes exactly 16 + 9 + 39 = 64 bytes, which fits one 64-byte jemalloc memory unit.

(3) Code conversion

When an int value is no longer an integer, or grows beyond the range of long, it is automatically converted to raw.

Because embstr is read-only, modifying an embstr object first converts it to raw and then applies the change. So any modified embstr object becomes raw, whether or not it reaches 39 bytes. An example is shown in the figure below:

[Screenshot: a modified embstr object becoming raw]

2. List

(1) Overview

A list stores multiple ordered strings, each called an element; a list can hold up to 2^32 - 1 elements. Redis lists support pushing and popping at both ends and fetching elements at a given index (or range), so they can serve as arrays, queues, stacks and so on.

(2) Internal coding

The internal encoding of a list can be ziplist (compressed list) or linkedlist (double-ended linked list).

Double-ended linked list: it consists of one list structure and multiple listNode structures; a typical layout is shown below:

[Figure: double-ended linked list structure]

Image source: Redis Design and Implementation

As the figure shows, the double-ended linked list keeps both a head pointer and a tail pointer, and each node has pointers to the previous and next nodes. The list also records its length. dup, free and match set type-specific functions for node values, so linked lists can hold values of various types. Each node in the list points to a redisObject of type string.

Compressed list (ziplist): developed by Redis to save memory, it is a sequential data structure made of a series of specially encoded contiguous memory blocks, unlike a double-ended linked list in which every node is linked by pointers. Its exact structure is fairly complex and is omitted here (see another article). Compared with a double-ended linked list, a ziplist saves memory, but modification, insertion and deletion are more complex. A ziplist is therefore used when there are few nodes; when there are many nodes, a double-ended linked list is more cost-effective.

Ziplists are used not only for lists but also for hashes and sorted sets, so they are very widely used.

(3) Code conversion

A ziplist is used only when both of the following conditions hold: the list has fewer than 512 elements, and every string element in the list is shorter than 64 bytes. If either condition fails, a double-ended linked list is used, and the encoding can only be converted from ziplist to linked list, never in the opposite direction.

The following figure shows list encoding conversion:

[Figure: list encoding conversion]

The 64-byte limit on individual strings helps keep the lengths of nodes fairly uniform. Here 64 bytes refers to the string itself, excluding the SDS structure, because a ziplist stores strings in contiguous memory blocks and does not need an SDS structure to record the length. The same 64-byte limit appears again below for other ziplist-based encodings, for the same reason.

3. Hash

(1) Overview

Hash (as a data structure) is not only one of the five object types Redis provides (along with string, list, set and sorted set) but also the data structure Redis uses as a key-value database. For clarity, in the rest of this article "inner hash" refers to the hash object type, and "outer hash" refers to the data structure Redis uses as a key-value database.

(2) Internal coding

The inner hash can be encoded as a ziplist or a hashtable; Redis's outer hash always uses a hashtable.

Ziplists were described earlier. Compared with a hash table, a ziplist suits hashes with few, short elements; its advantage is compact storage and lower memory use. Although element operations become O(n) instead of O(1), the hash has so few elements that the operation time shows no noticeable disadvantage.

Hashtable: a hashtable consists of one dict structure, two dictht structures, one array of dictEntry pointers (called the bucket array) and multiple dictEntry structures.

Under normal conditions (i.e. when the hashtable is not rehashing), the parts relate as shown below:

[Figure: hashtable structure without rehashing]

Adapted from Redis Design and Implementation

The parts are introduced from the bottom up:

dictEntry

The dictEntry structure stores a key-value pair. It is defined as follows:

#include <stdint.h>

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
    } v;
    struct dictEntry *next;   /* next entry in the same bucket */
} dictEntry;

The fields serve the following purposes:

  • key: the key of the key-value pair;
  • v: the value of the key-value pair, implemented as a union; it can hold a pointer to the value, an unsigned 64-bit integer or a signed 64-bit integer;
  • next: points to the next dictEntry, used to resolve hash collisions by chaining.

On a 64-bit system, a dictEntry occupies 24 bytes (key, v and next take 8 bytes each).

bucket

The bucket array is an array whose elements are pointers to dictEntry structures. Redis sizes the bucket array as the smallest power of two that is larger than the number of dictEntries. For example, with 1000 dictEntries the bucket array has 1024 slots; with 1500 it has 2048.

dictht

The dictht structure is defined as follows:

typedef struct dictht {
    dictEntry **table;       /* bucket array */
    unsigned long size;      /* number of buckets */
    unsigned long sizemask;  /* always size - 1 */
    unsigned long used;      /* number of entries stored */
} dictht;

The fields serve the following purposes:

  • table: a pointer to the bucket array;
  • size: the size of the hash table, i.e. the number of buckets;
  • used: the number of dictEntries in use;
  • sizemask: always size - 1; together with the hash value, it determines where a key is stored in table.

dict

A plain hash table can be built with just dictht and dictEntry. Redis, however, adds a dict structure on top of dictht. Its definition and purpose are described below.

The dict structure is defined as follows:

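typedef struct dictType dictType;  /* type-specific callbacks, defined elsewhere */

typedef struct dict {
    dictType *type;   /* functions for handling this kind of key/value */
    void *privdata;   /* extra argument passed to the type functions */
    dictht ht[2];     /* ht[1] is used only while rehashing */
    int rehashidx;    /* -1 when no rehash is in progress */
} dict;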

The type and privdata fields make the dictionary polymorphic, so it can handle different kinds of key-value pairs.

The ht and rehashidx fields are used for rehashing, i.e. when the hash table needs to grow or shrink. ht is an array of two dictht structures, which is why a Redis hash has one dict and two dictht structures. Normally all data lives in ht[0], and ht[1] is used only during rehashing: the data in ht[0] is rehashed into ht[1], then ht[1] is assigned to ht[0] and ht[1] is reset to empty.

So besides dictht and dictEntry, a Redis hash also has a dict structure, both to support different kinds of key-value pairs and to support rehashing.

(3) Code conversion

As mentioned earlier, the inner hash in Redis may use either a hash table or a ziplist.

A ziplist is used only when both of the following conditions hold: the hash has fewer than 512 elements, and the key and value strings of every key-value pair are shorter than 64 bytes. If either condition fails, a hash table is used, and the encoding can only be converted from ziplist to hash table, never in the opposite direction.

The following figure shows encoding conversion for the inner hash:

[Figure: inner hash encoding conversion]

4. Set

(1) Overview

Like a list, a set stores multiple strings, but with two differences: elements in a set are unordered, so they cannot be accessed by index, and a set cannot contain duplicate elements.

A set can hold up to 2^32 - 1 elements. Besides the usual add, delete, modify and query operations, Redis also supports intersection, union and difference across multiple sets.

(2) Internal coding

The internal encoding of a set can be intset (integer set) or hashtable.

Hash tables were described above and are not repeated here; note that when a set uses a hash table, all values are set to NULL.

The integer set structure is defined as follows:

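#include <stdint.h>

typedef struct intset {
    uint32_t encoding;  /* element width: int16_t, int32_t or int64_t */
    uint32_t length;    /* number of elements */
    int8_t contents[];  /* elements, sorted ascending, stored at that width */
} intset;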

encoding indicates the type of the values stored in contents: although contents is declared as int8_t, the values it actually stores are int16_t, int32_t or int64_t, as determined by encoding. length is the number of elements.

An integer set suits sets whose elements are all integers and few in number. Compared with a hash table, its advantage is compact storage and lower memory use. Although element operations go from O(1) to O(n) (lookups use binary search over the sorted array, but insertions and deletions must shift elements), the small size of the set means the operation time shows no noticeable disadvantage.

(3) Code conversion

A set uses the integer set only when both of the following conditions hold: the set contains no more than 512 elements (the default of set-max-intset-entries), and all of its elements are integer values. If either condition fails, the hash table is used. The conversion is one-way: an integer set can be converted to a hash table, but never the reverse.

The following figure shows how set encoding conversion works:

[Figure: set encoding conversion, intset to hashtable]
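Here is a minimal Java sketch of the set rule. It assumes the Redis 3.0 default of 512 for set-max-intset-entries. SetEncodingModel is hypothetical. Its looksLikeLong check approximates Redis's integer test by accepting only the canonical decimal form of a signed 64-bit integer. Strings such as "007" or "+5" therefore count as non-integers.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of how a Redis 3.0 set chooses between intset and hashtable. */
class SetEncodingModel {
    static final int MAX_INTSET_ENTRIES = 512; // assumed default of set-max-intset-entries

    private final Set<String> members = new HashSet<>();
    private String encoding = null; // decided by the first member added

    /** Approximation: canonical decimal longs only (round trip rejects "+5", "007", "-0"). */
    static boolean looksLikeLong(String s) {
        try {
            return Long.toString(Long.parseLong(s)).equals(s);
        } catch (NumberFormatException e) {
            return false; // not numeric, or outside the range of a long
        }
    }

    void add(String member) {
        if (encoding == null) {
            encoding = looksLikeLong(member) ? "intset" : "hashtable";
        }
        if (!members.add(member)) return; // duplicates change nothing
        if (encoding.equals("intset")
                && (!looksLikeLong(member) || members.size() > MAX_INTSET_ENTRIES)) {
            encoding = "hashtable"; // one-way conversion
        }
    }

    String encoding() { return encoding; }
}
```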

5. Ordered set

(1) Overview

Like a set, an ordered set cannot contain duplicate elements; unlike a set, its elements are ordered. A list orders elements by their index, whereas an ordered set assigns each element a score and sorts by that score.

(2) Internal coding

The internal encoding of an ordered set can be a ziplist or a skip list. The ziplist is also used by lists and hashes and was described earlier, so it is not repeated here.

The skip list is an ordered data structure. Each node keeps multiple pointers to other nodes, which allows fast access to any node. Besides the skip list, the other typical implementation of an ordered data structure is the balanced tree. In most cases a skip list is about as efficient as a balanced tree, and it is much simpler to implement.

A skip list supports node lookup in O(log N) time on average and O(N) in the worst case, and it also supports sequential (in-order) traversal. Redis implements the skip list with two structures, zskiplist and zskiplistNode. The first stores information about the skip list as a whole (head node, tail node, length, and so on), and the second represents a single node. Their detailed definitions are fairly complex and are omitted here.
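For readers who want to see the idea in code, here is a minimal skip list in Java. It is a simplified sketch, not Redis's zskiplist. It borrows the Redis 3.0 constants (maximum level 32, promotion probability 0.25) and orders nodes by score, breaking ties by member. It omits the backward pointers and span fields Redis keeps for reverse traversal and rank queries, and it has no delete operation. The class name SkipList is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Minimal skip list ordered by (score, member); a simplification of Redis's zskiplist. */
class SkipList {
    static final int MAX_LEVEL = 32; // same cap as Redis 3.0's ZSKIPLIST_MAXLEVEL
    static final double P = 0.25;    // same promotion probability as ZSKIPLIST_P

    static final class Node {
        final double score;
        final String member;
        final Node[] forward;        // forward[i] = next node on level i
        Node(double score, String member, int level) {
            this.score = score;
            this.member = member;
            this.forward = new Node[level];
        }
    }

    private final Node head = new Node(Double.NEGATIVE_INFINITY, null, MAX_LEVEL);
    private final Random random = new Random(42); // fixed seed only for reproducible runs
    private int level = 1;
    private int length = 0;

    /** Each extra level is kept with probability P, so tall nodes are rare. */
    private int randomLevel() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && random.nextDouble() < P) lvl++;
        return lvl;
    }

    /** True if node a sorts before (score, member): by score, ties broken by member. */
    private static boolean before(Node a, double score, String member) {
        return a.score < score || (a.score == score && a.member.compareTo(member) < 0);
    }

    /** Inserts (score, member); the caller guarantees member is not already present. */
    void insert(double score, String member) {
        Node[] update = new Node[MAX_LEVEL];
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {       // descend from the top level
            while (x.forward[i] != null && before(x.forward[i], score, member)) x = x.forward[i];
            update[i] = x;                             // last node before the insert point on level i
        }
        int lvl = randomLevel();
        if (lvl > level) {
            for (int i = level; i < lvl; i++) update[i] = head;
            level = lvl;
        }
        Node n = new Node(score, member, lvl);
        for (int i = 0; i < lvl; i++) {                // splice the node into each of its levels
            n.forward[i] = update[i].forward[i];
            update[i].forward[i] = n;
        }
        length++;
    }

    /** Expected O(log n): skip ahead on high levels, then step down. */
    boolean contains(double score, String member) {
        Node x = head;
        for (int i = level - 1; i >= 0; i--) {
            while (x.forward[i] != null && before(x.forward[i], score, member)) x = x.forward[i];
        }
        x = x.forward[0];
        return x != null && x.score == score && x.member.equals(member);
    }

    /** In-order traversal: a plain walk along level 0. */
    List<String> members() {
        List<String> out = new ArrayList<>();
        for (Node x = head.forward[0]; x != null; x = x.forward[0]) out.add(x.member);
        return out;
    }

    int length() { return length; }
}
```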

(3) Code conversion

An ordered set uses the ziplist only when both of the following conditions hold: it contains no more than 128 elements, and every member is no longer than 64 bytes (the defaults of zset-max-ziplist-entries and zset-max-ziplist-value). If either condition fails, the skip list is used. The conversion is one-way: a ziplist can be converted to a skip list, but never the reverse.

The following figure shows how ordered set encoding conversion works:

[Figure: ordered set encoding conversion, ziplist to skiplist]

5、 Application examples

With the Redis memory model in hand, the following examples show how to apply it.

1. Estimating redis memory usage

To estimate how much memory the data in Redis occupies, you need a thorough understanding of the Redis memory model. This includes the hash table, SDS, redisObject, the encodings of each object type, and so on.

Take the simplest type, string, as an example.

Suppose there are 90000 key-value pairs, where each key and each value is 7 bytes long (and neither keys nor values are integers). Let's estimate the space these 90000 pairs occupy. First, determine the encoding used for the string values: embstr (Redis 3.0 uses embstr for strings of up to 39 bytes).

The memory occupied by the 90000 key-value pairs has two parts: the space taken by the 90000 dictEntry structures, and the bucket space the hash table needs for them.

The space occupied by each dictEntry includes:

  1. The dictEntry itself: 24 bytes, for which jemalloc allocates a 32-byte block.
  2. The key: 7 bytes, so SDS(key) needs 7 + 9 = 16 bytes, and jemalloc allocates a 16-byte block.
  3. The redisObject: 16 bytes, and jemalloc allocates a 16-byte block.
  4. The value: 7 bytes, so SDS(value) needs 7 + 9 = 16 bytes, and jemalloc allocates a 16-byte block.
  5. In total, each dictEntry needs 32 + 16 + 16 + 16 = 80 bytes.

Bucket space: the bucket array size is the smallest power of 2 greater than 90000, i.e. 131072, and each bucket takes 8 bytes (the size of a pointer on a 64-bit system).

Therefore, the memory occupied by these 90000 key-value pairs can be estimated as 90000 × 80 + 131072 × 8 = 8248576 bytes.
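The arithmetic above can be packaged as a small Java estimator. This sketch uses the section's own assumptions:

- a 64-bit build;
- Redis 3.0's 9-byte SDS overhead;
- jemalloc size classes for small requests of 8 bytes, then multiples of 16 up to 128 bytes.

Larger classes are not modeled. StringMemoryEstimate and its methods are hypothetical names.

```java
/** Hypothetical estimator reproducing this section's arithmetic for short string keys/values. */
class StringMemoryEstimate {
    static final int DICT_ENTRY = 24;   // key ptr + value union + next ptr on a 64-bit build
    static final int REDIS_OBJECT = 16; // redisObject header
    static final int SDS_OVERHEAD = 9;  // Redis 3.0 sdshdr: int len + int free + trailing '\0'
    static final int BUCKET_PTR = 8;    // one pointer per hash-table bucket

    /** Assumed jemalloc small size classes: 8, then multiples of 16 up to 128 bytes. */
    static long jemallocSize(long request) {
        if (request <= 8) return 8;
        if (request <= 128) return (request + 15) / 16 * 16;
        throw new IllegalArgumentException("size classes above 128 bytes are not modeled");
    }

    /** Smallest power of two that is >= n. */
    static long bucketCount(long n) {
        long b = 1;
        while (b < n) b <<= 1;
        return b;
    }

    /** dictEntry + SDS(key) + redisObject + SDS(value), each rounded to its size class. */
    static long perEntry(int keyLen, int valueLen) {
        return jemallocSize(DICT_ENTRY)
             + jemallocSize(keyLen + SDS_OVERHEAD)
             + jemallocSize(REDIS_OBJECT)
             + jemallocSize(valueLen + SDS_OVERHEAD);
    }

    static long estimate(long pairs, int keyLen, int valueLen) {
        return pairs * perEntry(keyLen, valueLen) + bucketCount(pairs) * BUCKET_PTR;
    }

    public static void main(String[] args) {
        System.out.println(estimate(90000, 7, 7)); // prints 8248576
        System.out.println(estimate(90000, 8, 8)); // prints 11128576
    }
}
```

The second call reproduces the 8-byte comparison discussed later in this section. It shows how a single extra byte pushes each SDS into the next jemalloc size class.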

The following program verifies the estimate against a running Redis instance:

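import redis.clients.jedis.Jedis;

public class RedisTest {

  // Assumes a local Redis server on the default port that no other client is writing to
  public static Jedis jedis = new Jedis("localhost", 6379);

  public static void main(String[] args) throws Exception {
    long m1 = getMemory();
    insertData();
    long m2 = getMemory();
    System.out.println(m2 - m1); // bytes added by the 90000 key-value pairs
    jedis.close();
  }

  public static void insertData() {
    for (int i = 10000; i < 100000; i++) {
      jedis.set("aa" + i, "aa" + i); // Both key and value are 7 bytes long and are not integers
    }
  }

  // Reads used_memory from the "memory" section of INFO instead of relying on its line position
  public static long getMemory() {
    String memoryAllLine = jedis.info("memory");
    for (String line : memoryAllLine.split("\r\n")) {
      if (line.startsWith("used_memory:")) {
        return Long.parseLong(line.substring(line.indexOf(':') + 1).trim());
      }
    }
    throw new IllegalStateException("used_memory not found in INFO memory");
  }
}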

Output: 8247552

The difference between the estimate and the measured value is about 1.2 parts in 10000, which is accurate enough for working out how much memory is needed. The error arises because Redis had already allocated some bucket space before the 90000 entries were inserted. That space was not yet in use and is not included in the measured difference.

For comparison, suppose the key and value lengths grow from 7 bytes to 8 bytes. Each SDS then becomes 17 bytes, jemalloc allocates 32 bytes for it, and each dictEntry grows from 80 bytes to 112 bytes. The estimated memory for 90000 key-value pairs is then 90000 × 112 + 131072 × 8 = 11128576 bytes.

To verify this in Redis, modify only the insertData method:

public static void insertData(){
  for(int i = 10000; i < 100000; i++){
    jedis.set("aaa" + i, "aaa" + i); // Both key and value are 8 bytes in length and are not integers
  }
}

Output: 11128576, which matches the estimate exactly.

The memory footprint of types other than string can be estimated the same way, taking into account the encoding used by each specific type.

2. Optimize memory footprint

Understanding the Redis memory model helps a great deal in optimizing Redis memory usage. Here are several optimization scenarios.

(1) Optimization using jemalloc characteristics

Take the 90000 key-value pairs from the previous section as an example. Because jemalloc allocates memory in discrete size classes, changing a key or value string by a single byte can cause a large change in memory usage, and this can be exploited in design.

For example, with an 8-byte key the SDS is 17 bytes and jemalloc allocates 32 bytes. If the key is shortened to 7 bytes, the SDS is 16 bytes and jemalloc allocates only 16 bytes, which halves the space each key occupies.

(2) Use integer / long integer

If a value is an integer within the range of a long, Redis stores it with the int encoding (8 bytes) instead of as a string, which saves more space. Therefore, wherever an integer or long integer can be used instead of a string, prefer it.

(3) Shared objects

Shared objects reduce the number of objects (and therefore redisObjects) that must be created, which saves memory. Currently, the shared objects in Redis include only the 10000 integers from 0 to 9999. The number of shared integers can be increased by adjusting the REDIS_SHARED_INTEGERS macro. This is a compile-time constant, so Redis must be rebuilt. For example, if REDIS_SHARED_INTEGERS is set to 20000, the integers from 0 to 19999 can be shared.

Consider this scenario: a forum stores the view count of each post in Redis, and most view counts fall between 0 and 20000. In this case, raising REDIS_SHARED_INTEGERS appropriately lets these values use shared objects and saves memory.
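The following Java sketch ties the int-encoding and shared-object rules together by classifying a string value roughly the way Redis 3.0 would. It is hypothetical and simplified:

- SHARED_INTEGERS mirrors REDIS_SHARED_INTEGERS (10000).
- The 39-byte embstr limit is the Redis 3.0 value.
- The integer check accepts only canonical decimal longs.
- Some additional conditions in the real implementation are ignored.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical classifier approximating how Redis 3.0 encodes a string value. */
class StringEncodingSketch {
    static final int SHARED_INTEGERS = 10000; // REDIS_SHARED_INTEGERS in Redis 3.0 (compile-time)
    static final int EMBSTR_SIZE_LIMIT = 39;  // embstr limit in Redis 3.0 (44 in later versions)

    /** Canonical decimal form of a signed 64-bit integer ("+5", "007", "-0" are rejected). */
    static boolean looksLikeLong(String s) {
        try {
            return Long.toString(Long.parseLong(s)).equals(s);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static String classify(String value) {
        if (looksLikeLong(value)) {
            long v = Long.parseLong(value);
            // Small non-negative integers reuse a preallocated object; others get int encoding.
            return (v >= 0 && v < SHARED_INTEGERS) ? "shared int" : "int";
        }
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        return bytes <= EMBSTR_SIZE_LIMIT ? "embstr" : "raw";
    }
}
```

In the forum example, a view count of 123 would be classified as "shared int", while 15000 would need its own int-encoded object unless the shared range is enlarged.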

(4) Avoid over design

However, keep in mind the trade-off between memory space and design complexity, because design complexity affects how complex and maintainable the code is.

If the amount of data is small, it is not worth making the code harder to develop and maintain just to save memory. Take the 90000 key-value pairs above: the memory saved is only a few MB. When the data reaches tens or hundreds of millions of entries, however, memory optimization is worth considering.

3. Monitor the memory fragmentation ratio

The memory fragmentation ratio is an important metric and matters a great deal when optimizing Redis memory.

If the memory fragmentation ratio is too high (about 1.03 is normal with jemalloc), there are many memory fragments and memory is being seriously wasted. In that case, consider restarting the Redis service so that the data is rearranged in memory and fragmentation is reduced.

If the memory fragmentation ratio is below 1, Redis does not have enough physical memory, and part of its data has been moved to virtual memory (swap). Virtual memory is much slower to access than physical memory (by 2 to 3 orders of magnitude), so Redis may become very slow. In that case, add physical memory (by adding server nodes or increasing the memory of a single machine), or reduce the amount of data in Redis.

To reduce the amount of data in Redis, choose suitable data types and use shared objects. Also set a reasonable maxmemory limit and maxmemory-policy, so that once memory reaches the limit, keys are reclaimed according to different priorities.
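The ratio itself is easy to compute from INFO memory. The following Java sketch is hypothetical. It parses the INFO text and divides used_memory_rss by used_memory, which is the quotient Redis reports as mem_fragmentation_ratio. It then classifies the result. The "high" threshold is a caller-supplied assumption; only the normal value of about 1.03 comes from this article.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: fragmentation ratio from the text returned by INFO memory. */
class FragmentationCheck {
    /** Parses "key:value" lines; section headers starting with '#' and blank lines are skipped. */
    static Map<String, String> parseInfo(String info) {
        Map<String, String> fields = new HashMap<>();
        for (String line : info.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (line.startsWith("#") || colon < 0) continue;
            fields.put(line.substring(0, colon), line.substring(colon + 1).trim());
        }
        return fields;
    }

    /** used_memory_rss / used_memory, i.e. resident memory versus memory Redis asked for. */
    static double ratio(Map<String, String> fields) {
        double rss = Double.parseDouble(fields.get("used_memory_rss"));
        double used = Double.parseDouble(fields.get("used_memory"));
        return rss / used;
    }

    /** highThreshold is an operator's choice, not a value defined by Redis. */
    static String diagnose(double ratio, double highThreshold) {
        if (ratio < 1.0) return "below 1: data is probably being swapped out";
        if (ratio > highThreshold) return "high: consider restarting Redis to reduce fragmentation";
        return "normal";
    }
}
```

With the Jedis client from the earlier example, the input string could come from jedis.info("memory").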

Author: Programming myth
Link: Learning redis in depth (1): redis memory model
Source: reprinted
