Reducing Redis's memory consumption shortens the time needed to create and load snapshots, makes loading and rewriting AOF files more efficient, and shortens the time replicas need to synchronize. (Snapshots and AOF rewriting are covered in the notes on persistence options; replica synchronization is covered in the notes on replication, failure handling, transactions, and performance optimization.) It also lets Redis store more data without additional hardware.
Redis provides a set of configuration options for lists, sets, hashes, and sorted sets. These options let Redis store shorter structures (hereinafter "short structures") in a more space-efficient way.
When a list, hash, or sorted set has few elements and each element is small, Redis can store it in a compact format called a ziplist (compressed list). A compressed list is an unstructured representation of these three types. Normally Redis represents a list as a doubly linked list, a hash as a hash table, and a sorted set as a hash table plus a skip list. A compressed list instead stores the data in serialized form: the data must be decoded on every read and partially re-encoded on every write, and a write may also require moving data in memory.
Compressed list representation
This section uses the list, the simplest of these structures, for comparison.
Doubly linked list
When a list is not stored as a compressed list, Redis uses a doubly linked list. Each node of the linked list holds three pointers:
- A pointer to the previous node
- A pointer to the next node
- A pointer to the string value held by the node
The string value consists of three parts:
- The length of the string
- The number of free bytes remaining in the string's buffer
- The null-terminated string itself
Each string stored this way therefore carries at least 21 bytes of overhead, assuming 4-byte pointers and integers: four bytes for each of the three pointers, four bytes for each of the two integers, and one byte for the null terminator.
A compressed list is a sequence of nodes (not real linked-list nodes). Each node consists of two length values and a string:
- First length value: the length of the previous node, used to traverse from back to front (usually stored in one byte)
- Second length value: the length of the current node (usually stored in one byte)
- String: exactly as many bytes as its length, with no null terminator
Each string stored in a compressed list therefore needs as little as 2 bytes of extra overhead.
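The two overhead figures above can be compared with a little arithmetic. The following is a minimal sketch that only models the per-item numbers quoted in the text (21 bytes and 2 bytes); it does not measure real Redis memory usage, and the sample data is made up for illustration.

```python
# Per-item overhead figures quoted in the text (4-byte pointers and integers).
LINKED_LIST_OVERHEAD = 3 * 4 + 2 * 4 + 1   # three pointers, two integers, null byte
ZIPLIST_OVERHEAD = 1 + 1                   # two one-byte length fields


def list_bytes(values, overhead):
    """Bytes used by the values themselves plus a fixed per-item overhead."""
    return sum(len(v) + overhead for v in values)


# Illustrative data: 100 strings of 8 bytes each.
items = [b"item-%03d" % i for i in range(100)]
linked = list_bytes(items, LINKED_LIST_OVERHEAD)
compact = list_bytes(items, ZIPLIST_OVERHEAD)
print(linked, compact)
```

For short values like these, the fixed overhead dominates the linked-list total, which is why the compact representation pays off mainly for small items.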
Encoding with compressed list
Configuration options for using compressed lists in different structures
    # Limits on lists using the compressed list representation
    list-max-ziplist-entries 512
    list-max-ziplist-value 64
    # Limits on hashes using the compressed list representation
    hash-max-ziplist-entries 512
    hash-max-ziplist-value 64
    # Limits on sorted sets using the compressed list representation
    zset-max-ziplist-entries 512
    zset-max-ziplist-value 64
The ...-entries options set the maximum number of elements a list, hash, or sorted set may contain while it is encoded as a compressed list.
The ...-value options set the maximum size in bytes of each node in the compressed list. As soon as either limit is exceeded, Redis converts the list, hash, or sorted set from the compressed list encoding to its regular structure, and memory consumption rises accordingly. Even if the structure later satisfies the limits again, it is not converted back to a compressed list.
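The two limits can be modeled with a small predicate. This is only a sketch of the rule described above, not Redis's actual implementation; the function name is hypothetical and the defaults mirror the configuration values shown earlier.

```python
def fits_compact_encoding(values, max_entries=512, max_value=64):
    """Return True while both limits hold (a model of the rule, not Redis code).

    values      -- the byte strings stored in the structure
    max_entries -- stands in for the ...-entries option
    max_value   -- stands in for the ...-value option (bytes per element)
    """
    return len(values) <= max_entries and all(len(v) <= max_value for v in values)


print(fits_compact_encoding([b"a"] * 512))    # within both limits
print(fits_compact_encoding([b"a"] * 513))    # one element too many
print(fits_compact_encoding([b"x" * 65]))     # one element too large
```

Note that this model is stateless, whereas Redis is not: once a structure has been converted, it stays converted even if it shrinks back under the limits.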
The OBJECT command lets you inspect the internals of the Redis object stored at a given key. It is usually used for debugging, or to check whether a key is using one of the special space-saving encodings. When Redis is used as a cache, an application can also use the information returned by OBJECT to implement its own key eviction policy.
- OBJECT REFCOUNT <key>: returns the number of references to the value stored at the given key. It is mainly used for debugging.
- OBJECT ENCODING <key>: returns the internal representation used for the value stored at the given key.
- OBJECT IDLETIME <key>: returns how long, in seconds, the given key has been idle (neither read nor written).
Integer set encoding for sets
If every member of a set can be interpreted as a decimal integer (within the platform's signed integer range) and the set has few enough members, Redis stores the set as a sorted array of integers, called an integer set (intset). An integer set not only reduces memory consumption but can also speed up the standard set operations.
Configuration options for integer sets
    # Limit on sets using the integer set representation
    set-max-intset-entries 512
When the integer set contains more elements than the limit set by the configuration option, the integer set is converted to a hash table representation.
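The eligibility rule for integer sets can be sketched the same way. The code below is a model of the conditions described in the text, not Redis's implementation; the function names are hypothetical, and the 64-bit signed range is an assumption about the platform.

```python
# Assumption: the platform's signed integer range is 64 bits wide.
INT64_MIN, INT64_MAX = -2 ** 63, 2 ** 63 - 1


def is_decimal_int(member):
    """True if the string is a canonical decimal integer within the signed range."""
    try:
        value = int(member)
    except ValueError:
        return False
    # Reject non-canonical forms such as "007", "+1" or " 5".
    return str(value) == member and INT64_MIN <= value <= INT64_MAX


def fits_intset(members, max_entries=512):
    """Model of the rule: few enough members, and every member is an integer."""
    return len(members) <= max_entries and all(is_decimal_int(m) for m in members)


print(fits_intset({"1", "2", "-3"}))
print(fits_intset({"1", "abc"}))
```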
Performance problems caused by long compressed list and large integer set
|Compressed list length (nodes)|Performance|
|< 1,000|Little difference|
|5,000 ~ 10,000|Starts to decline|
|50,000|Declines noticeably|
|> 100,000|Too slow to be usable|
It is recommended to limit compressed lists to 1024 elements, with each element no larger than 64 bytes. For most uses of hashes, this configuration balances low memory consumption against good performance.
Since version 3.2, Redis lists are backed by quicklist by default. This data structure combines the advantages of the doubly linked list and the compressed list, so lists already strike a good balance without this tuning.
When designing for Redis, also keep names short: key names, hash fields, members of sets and sorted sets, and list entries. When the number of stored entries reaches millions or billions, this can save megabytes to gigabytes of space.
Sharding essentially means dividing data into smaller parts according to some simple rule, and then deciding where to send each piece of data based on the part it belongs to. This technique can expand the available storage space and increase the load that can be handled.
Next, we apply sharding to hashes, sets, and sorted sets, and explain how to implement some of the standard functionality of these data structures. In this scheme, the program no longer stores the value X in key Y; instead, it stores X in a key that identifies the shard.
Sharding lists
Sharding a list without Lua scripts is very difficult, so we will introduce how to use Lua scripts to build a sharded list that supports blocking and non-blocking push and pop operations at both ends.
Sharding sorted sets
Sharded versions of commands such as ZREMRANGEBYSCORE have to operate on every shard of the sorted set to compute the final result, so they cannot run as fast as the same operation on an ordinary sorted set. Sharding a sorted set is therefore of limited use.
If a large sorted set must keep complete information but only the N highest-scoring and N lowest-scoring elements are ever queried, you can shard the sorted set in the same way as the hashes described below, and additionally maintain one sorted set of the highest scores and one of the lowest scores. Use the ZADD command to add new elements to these two sorted sets, and the ZREMRANGEBYRANK command to keep the number of elements in each within the limit.
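The ZADD plus ZREMRANGEBYRANK pattern can be sketched as follows. This is a minimal illustration, assuming `conn` is a redis-py client (version 3.0 or later, where `zadd` takes a mapping); the function name and the `:high` / `:low` key suffixes are hypothetical.

```python
def track_extremes(conn, base, member, score, limit=100):
    """Keep the `limit` highest and `limit` lowest scoring members in two sorted sets.

    `conn` is assumed to be a redis-py client; key names are illustrative only.
    """
    high, low = base + ":high", base + ":low"

    conn.zadd(high, {member: score})
    # Ranks are ordered by ascending score, so this removes everything
    # except the `limit` highest-scoring members.
    conn.zremrangebyrank(high, 0, -(limit + 1))

    conn.zadd(low, {member: score})
    # This removes everything from rank `limit` upward, keeping only
    # the `limit` lowest-scoring members.
    conn.zremrangebyrank(low, limit, -1)
```

A call such as `track_extremes(conn, "scores", "user:42", 1337)` would then run alongside the write to the sharded structure that holds the complete data.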
When sharding a hash, the keys stored in the hash can serve as the source of information: a hash function computes a numeric hash value for each key. The required number of shards is then calculated from the total number of keys to be stored and the number of keys to keep in each shard. Finally, the hash value and the number of shards together determine which shard a key is stored in.
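The calculation described above might look like the following sketch. CRC32 is used as the hash function purely as an assumption for illustration (any stable hash would do), the shard count is a plain ceiling division, and the function name and key format are hypothetical.

```python
import binascii


def shard_key(base, key, total_elements, shard_size):
    """Return the name of the shard that `key` belongs to.

    total_elements -- expected total number of keys
    shard_size     -- desired number of keys per shard
    """
    key = str(key)
    if key.isascii() and key.isdigit():
        # Numeric keys can be assigned to shards directly by range.
        shard_id = int(key) // shard_size
    else:
        # Number of shards needed, rounded up, and at least one.
        shards = max(1, (total_elements + shard_size - 1) // shard_size)
        shard_id = binascii.crc32(key.encode("utf-8")) % shards
    return "%s:%s" % (base, shard_id)


print(shard_key("user", 1500, 10000, 1000))
print(shard_key("user", "alice", 10000, 1000))
```

Numeric keys are mapped by range rather than by hash here so that sequential IDs end up together in the same shard.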
In practice, we usually do not consider the total number of keys when sharding. Instead, we analyze the existing data and fix a shard count, shard_num. To find the shard for a given key, we only need to compute cal_hash(key) % shard_num to obtain its shard_id. However, hashing this way with a function such as MD5 has a problem, which the book mentions: when the number of shards changes, a large share of keys map to a different shard_id than before, so their data has to be migrated to the new shard. To avoid this, we need a consistent hashing algorithm, which keeps the amount of migrated data as small as possible when the number of shards changes and still distributes the data fairly evenly across the shards.
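The migration problem with modulo hashing is easy to observe. The sketch below uses CRC32 in place of cal_hash (an assumption; the text does not fix a hash function) and counts how many keys change shard when shard_num goes from 4 to 5. The key names are made up.

```python
import binascii


def shard_id(key, shard_num):
    """cal_hash(key) % shard_num, with CRC32 standing in for cal_hash."""
    return binascii.crc32(key.encode("utf-8")) % shard_num


def moved_fraction(keys, old_num, new_num):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_id(k, old_num) != shard_id(k, new_num))
    return moved / len(keys)


keys = ["user:%d" % i for i in range(10000)]
print(moved_fraction(keys, 4, 5))
```

A key keeps its shard only when its hash value gives the same remainder modulo 4 and modulo 5, which holds for roughly one hash value in five, so most keys have to move. Consistent hashing is designed to keep that fraction much smaller.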
Storing strings in hashes
If you find that many related short strings or numbers are stored in string keys, and those keys are named sequentially in the form namespace:id, consider storing the values in sharded hashes instead. In some cases this can reduce memory consumption significantly.
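As a rough sketch of this idea, the helpers below turn a namespace:id key into a shard name plus a hash field, and then read and write through `HSET` / `HGET`. They assume `conn` is a redis-py client and that the id part is numeric; the function names and the shard size of 1024 are illustrative choices, not part of the original notes.

```python
def split_key(full_key, shard_size=1024):
    """Map 'namespace:id' to (shard hash name, field) using the numeric id."""
    namespace, _, ident = full_key.rpartition(":")
    return "%s:%d" % (namespace, int(ident) // shard_size), ident


def sharded_set(conn, full_key, value, shard_size=1024):
    """Store the value in the shard's hash instead of a standalone string key."""
    shard, field = split_key(full_key, shard_size)
    return conn.hset(shard, field, value)


def sharded_get(conn, full_key, shard_size=1024):
    """Read the value back from the shard's hash."""
    shard, field = split_key(full_key, shard_size)
    return conn.hget(shard, field)


print(split_key("counter:2050"))
```

The memory saving depends on each shard staying in the compact encoding, so the shard size should not exceed the configured hash-max-ziplist-entries value.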
Sets can be sharded in a similar way: hash the member to obtain its shard ID, then adapt the corresponding commands to operate on the shards.
If the members are integers and the maximum value is relatively small, then besides using the value directly to obtain the shard ID, you can also use a bitmap to record whether each value is in the "set".
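A bitmap-backed "set" of small integers can be sketched with `SETBIT` and `GETBIT`. The example assumes `conn` is a redis-py client; the function names are hypothetical.

```python
def bitmap_add(conn, name, member_id):
    """Mark an integer member as present by setting its bit to 1."""
    conn.setbit(name, member_id, 1)


def bitmap_remove(conn, name, member_id):
    """Mark an integer member as absent by clearing its bit."""
    conn.setbit(name, member_id, 0)


def bitmap_contains(conn, name, member_id):
    """True if the member's bit is set."""
    return conn.getbit(name, member_id) == 1
```

Each member costs one bit, but the string grows to cover the largest ID that has been set, which is why this only suits cases where the maximum value is relatively small.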
If the number of members is too large to store in full but some error can be tolerated, you can use a Bloom filter to record whether each member is in the "set". If the filter says a member is absent, it is definitely absent; if it says a member is present, there is a small probability that it is actually absent.
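To make the Bloom filter behaviour concrete, here is a minimal in-memory sketch. It is not tied to Redis and not taken from the book; the class name, the bit array size, the number of hash functions, and the use of SHA-256 to derive positions are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by hashing the item with different seeds.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(b"%d:%s" % (seed, data)).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Present only if every derived bit is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
seen.add("alice")
print("alice" in seen)
```

The same bit positions could be stored in a Redis string with `SETBIT` and `GETBIT` instead of a local `bytearray`.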
Packing and storing binary bits and bytes
As mentioned earlier, when string keys named in a namespace:id style are used to store short strings or counters, sharded hashes can effectively reduce the memory needed to store the data. However, if what we store is a short, fixed-length piece of data for each of a series of sequential IDs, there is a storage method that uses even less memory than sharded hashes.
The earlier notes on common commands for Redis data structures introduced four commands that can be used to pack and update Redis strings efficiently:
- GETRANGE: reads part of a stored string
- SETRANGE: sets part of a stored string
- GETBIT: gets the value of a single bit in a string
- SETBIT: sets a single bit in a string
With these four commands, we can use Redis strings to store counters, fixed-length strings, Boolean values, and similar data in as compact a format as possible, without compressing the data.
Deciding the format of the location information to store
We take storing users' location information as an example. The amount of memory used per user determines the precision of the location:
- 1 byte: accurate to the country
- 2 bytes: accurate to the country and state/province
- 3 bytes: accurate to the postal code
- 4 bytes: accurate to latitude and longitude (within about 2 m)
Here we use 2 bytes to store the location. First, we keep the ISO3 country (or region) codes of all countries (or regions) in an array, and use the first byte to store the index of the country (or region) in that array. Then we use a map from each country (or region) to an array of its states/provinces, and use the second byte to store the index of the state/province in the corresponding array.
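The two-byte encoding can be sketched as follows. The country and state tables are a tiny made-up subset for illustration only, and reserving index 0 for "unknown" is an assumption of this sketch rather than something stated above.

```python
import bisect

# Illustrative subsets only; a real table would list every country and state.
COUNTRIES = sorted("CAN MEX USA".split())
STATES = {
    "CAN": sorted("AB BC ON QC".split()),
    "USA": sorted("CA NY TX".split()),
}


def index_of(sorted_items, item):
    """1-based position of item in a sorted list, or 0 if it is not present."""
    i = bisect.bisect_left(sorted_items, item)
    if i < len(sorted_items) and sorted_items[i] == item:
        return i + 1
    return 0


def get_code(country, state=None):
    """Encode a location as two bytes: country index, then state index."""
    cindex = index_of(COUNTRIES, country)
    sindex = index_of(STATES.get(country, []), state) if state else 0
    return bytes([cindex, sindex])


print(get_code("USA", "NY"))   # b'\x03\x02'
print(get_code("CAN"))         # b'\x01\x00'
```

Because each index must fit in one byte, this layout supports at most 255 countries and 255 states per country, with 0 meaning "not set".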
Storing the packed data
Once a location has been encoded as two bytes, it can be stored in a string key with the SETRANGE command. However, we need to consider the total number of users: with 750 million users, storing everyone's data takes 1.5 GB of memory, but a Redis string key can hold at most 512 MB. In addition, when a write lands beyond the end of an existing string, Redis may need to allocate more memory to hold the new data, so writing near the end of a long string can take much longer than a simple SETBIT call. To solve these problems, we shard the data across multiple string keys.
We can store the location information of 2^20 users in each string, which gives a little over one million entries per string, and each such string needs 2 MB of memory.
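Finding where a given user's two bytes live is a matter of integer division. The sketch below assumes sequential integer user IDs and a redis-py client passed in as `conn`; the function names and the `location:` key prefix are hypothetical.

```python
USERS_PER_SHARD = 2 ** 20   # users stored in each string key
BYTES_PER_USER = 2          # size of one packed location


def locate(user_id):
    """Return (string key, byte offset) for a user's packed location."""
    shard_id, position = divmod(user_id, USERS_PER_SHARD)
    return "location:%d" % shard_id, position * BYTES_PER_USER


def set_location(conn, user_id, code):
    """Write the two-byte code at the user's offset with SETRANGE."""
    key, offset = locate(user_id)
    conn.setrange(key, offset, code)


print(locate(0))
print(locate(2 ** 20 + 5))
print(USERS_PER_SHARD * BYTES_PER_USER)   # bytes per full shard
```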
Aggregating the sharded strings
Aggregating the location information of all users
Find the largest user ID stored so far and compute the largest shard ID from it. Then traverse the data of every user in each string shard (using GETRANGE), look up the country (or region) and state/province from the indexes held in each two-byte pair, and tally the results.
Aggregating the location information of specified users
For each specified user ID, compute its shard ID and its offset within the shard, and use GETRANGE to fetch the corresponding two bytes. Then look up the country (or region) and state/province from the indexes held in those two bytes, and tally the results.
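The per-user aggregation can be sketched like this. It assumes `conn` is a redis-py client that returns bytes (the default, without `decode_responses`), the same hypothetical `location:` key layout with 2^20 users per shard, and that a pair of zero bytes means no location was stored. The tally is keyed by the raw index pair; mapping the indexes back to names is left to the lookup tables.

```python
from collections import Counter

USERS_PER_SHARD = 2 ** 20   # assumed users per string key
BYTES_PER_USER = 2


def aggregate_users(conn, user_ids):
    """Count users per (country index, state index) pair for the given IDs."""
    totals = Counter()
    for user_id in user_ids:
        shard_id, position = divmod(user_id, USERS_PER_SHARD)
        offset = position * BYTES_PER_USER
        # GETRANGE takes an inclusive end offset, so this reads two bytes.
        code = conn.getrange("location:%d" % shard_id, offset, offset + 1)
        if len(code) == BYTES_PER_USER and code != b"\x00\x00":
            totals[(code[0], code[1])] += 1
    return totals
```

This issues one round trip per user; for large batches, reading whole blocks of a shard at a time would cut the number of calls considerably.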
This article was first published on the official account "full Fu machine". The notes are open source on GitHub: reading-notes/redis-in-action