Redis in Action - 12. Reducing memory consumption

Time:2021-2-15

Introduction

Reducing Redis's memory consumption shortens the time needed to create and load snapshots, makes loading and rewriting AOF files more efficient, and shortens the time a replica needs to synchronize with its master (snapshots and AOF rewriting are covered in the notes on persistence options; replica synchronization is covered in the notes on replication, failure handling, transactions, and performance optimization). It also lets Redis store more data without additional hardware. P208

Short structures P208

Redis provides a set of configuration options for lists, sets, hashes, and sorted sets. These options let Redis store short structures in a more space-efficient way. P208

When a list, hash, or sorted set has few elements and each element is small, Redis can store it in a compact format called a ziplist (compressed list). A ziplist is an unstructured representation of these three types of objects. It differs from Redis's usual representations, in which a list is a doubly linked list, a hash is a hash table, and a sorted set is a hash table plus a skip list: a ziplist stores its data in serialized form. This serialized data must be decoded every time it is read and partially re-encoded every time it is written, and a write may also require moving data around in memory. P209

Ziplist representation P209

This section uses the list, the simplest of the three structures, for comparison.

Doubly linked list P209

When a list is not stored as a ziplist, it is stored as a doubly linked list. Each node of the linked list has three pointers: P209

  • A pointer to the previous node
  • A pointer to the next node
  • A pointer to the string value held by the node

The string value itself consists of three parts: P209

  • The length of the string
  • The number of free bytes remaining in the string's buffer
  • The string data itself, terminated by a null character

So every string stored in an uncompressed list carries at least 21 bytes of overhead: 4 bytes for each of the three pointers, 4 bytes for each of the two integers, and 1 byte for the null character at the end of the string (these sizes correspond to a 32-bit platform). P209

Ziplist P209

A ziplist is a sequence of nodes (not real node objects, but entries packed one after another). Each node consists of two length values and a string: P209

  • First length value: the length of the previous node, used for traversing from back to front (usually stored in one byte)
  • Second length value: the length of the current node (usually stored in one byte)
  • String: the string data, exactly as many bytes as the stored length, with no terminating null character

So every string stored in a ziplist carries at least 2 bytes of overhead. P210
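The two overhead figures above can be compared with a little arithmetic. The following is only a rough sketch: it uses the per-item figures quoted in the text (21 bytes and 2 bytes) and ignores every other cost, such as allocator overhead and the headers of the structures themselves. The sample data is made up.

```python
# Per-string overhead figures quoted in the text (4-byte pointers and integers).
LINKED_LIST_OVERHEAD = 3 * 4 + 2 * 4 + 1  # 3 pointers + 2 integers + null byte = 21
ZIPLIST_OVERHEAD = 1 + 1                  # two one-byte length fields = 2


def total_bytes(values, overhead):
    """Payload bytes plus a fixed per-item overhead."""
    return sum(len(v) + overhead for v in values)


items = [b"abc"] * 1000  # hypothetical data: 1000 three-byte strings
linked = total_bytes(items, LINKED_LIST_OVERHEAD)
packed = total_bytes(items, ZIPLIST_OVERHEAD)
print(linked, packed)
```

For short strings the fixed overhead dominates, which is why the compact encoding pays off most on small items.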

Using the ziplist encoding P210

Configuration options that control ziplist use for the different structures P210

# Limits for encoding lists as ziplists
list-max-ziplist-entries 512
list-max-ziplist-value 64

# Limits for encoding hashes as ziplists
hash-max-ziplist-entries 512
hash-max-ziplist-value 64

# Limits for encoding sorted sets as ziplists
zset-max-ziplist-entries 512
zset-max-ziplist-value 64

The ...-entries options set the maximum number of elements a list, hash, or sorted set may contain while it is encoded as a ziplist; the ...-value options set the maximum size, in bytes, of each node in the ziplist. As soon as either limit is exceeded, Redis converts the list, hash, or sorted set from the ziplist encoding to its regular structure, and memory consumption increases accordingly. Even if the structure later satisfies the limits again, it is not converted back to a ziplist. P210
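The rule described by these two options can be written down as a small predicate. This is a sketch of the rule only, not of Redis's internal check: the function name `fits_ziplist` is made up, and the defaults simply mirror the configuration values shown above.

```python
def fits_ziplist(values, max_entries=512, max_value=64):
    """Return True if a collection stays within both ziplist limits.

    `values` is a list of byte strings. For a hash, both the fields and
    the values would have to respect `max_value`.
    """
    if len(values) > max_entries:
        return False
    return all(len(v) <= max_value for v in values)


print(fits_ziplist([b"a"] * 512))   # within both limits
print(fits_ziplist([b"a"] * 513))   # too many entries
print(fits_ziplist([b"x" * 65]))    # one value is too large
```

On a live server, the actual encoding of a key can be checked with OBJECT ENCODING.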

Debugging P210

The Redis OBJECT command lets you inspect the internal Redis object associated with a given key. It is usually used for debugging, or to check whether a key is using one of the special space-saving encodings. When Redis is used as a cache, the information returned by OBJECT can also inform the key eviction policy.

  • OBJECT REFCOUNT <key>: returns the number of references to the value stored at the given key; mainly used for debugging
  • OBJECT ENCODING <key>: returns the internal representation used to store the value at the given key
  • OBJECT IDLETIME <key>: returns the time, in seconds, that the given key has been idle (neither read nor written)

Intset encoding of sets P211

If every member of a set can be interpreted as a decimal integer (within the platform's signed integer range) and the set has few enough members, Redis stores the set as a sorted integer array, called an intset (integer set). An intset not only reduces memory consumption but also speeds up all standard set operations. P211

Configuration option for intsets P211

# Limit for encoding sets as intsets
set-max-intset-entries 512

When a set contains more members than this option allows, the intset is converted to a hash table representation. P212
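The two conditions for the intset encoding (integer-only members and a bounded member count) can be sketched as follows. This is an approximation under stated assumptions: the function name `fits_intset` is made up, a 64-bit signed range is assumed for "the platform's signed integer range", and the strict pattern (no leading zeros, no plus sign, no spaces) is an assumption about what counts as a decimal integer.

```python
import re

# Canonical decimal integers only: "0", "42", "-7" (assumed rule).
_INT_RE = re.compile(r"(0|-?[1-9][0-9]*)")
INT64_MIN, INT64_MAX = -2 ** 63, 2 ** 63 - 1  # assumed platform range


def fits_intset(members, max_entries=512):
    """Return True if every member is a small enough integer string
    and the member count does not exceed `max_entries`."""
    if len(members) > max_entries:
        return False
    for member in members:
        if not _INT_RE.fullmatch(member):
            return False
        if not INT64_MIN <= int(member) <= INT64_MAX:
            return False
    return True


print(fits_intset(["1", "-5", "42"]))
print(fits_intset(["1", "a"]))
```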

Performance problems caused by long ziplists and large intsets P212

Ziplist nodes     Performance
< 1000            Little difference
5000 ~ 10000      Starts to decline
50000             Declines noticeably
> 100000          Too slow to be usable

It is recommended to limit ziplists to 1024 elements, each no larger than 64 bytes. For most uses of hashes, this configuration offers both low memory consumption and good performance. P214

Notes

Since version 3.2, Redis lists are backed by a quicklist by default. This data structure combines the advantages of the doubly linked list and the ziplist, so lists generally need no further tuning of this kind.

When designing for Redis, we should also keep names short: key names, hash fields, members of sets and sorted sets, and list entries. Once the number of stored entries reaches the millions or billions, this can save megabytes to gigabytes of space. P214

Sharded structures P214

Sharding essentially means dividing data into smaller parts according to some simple rule, and then deciding where to send each piece of data based on the part it belongs to. This technique can expand the available storage space and increase the load that can be handled. P214

Next, we apply the concept of sharding to hashes, sets, and sorted sets, and explain how to implement some of the standard functionality of these data structures. With sharding, the program no longer stores value X in key Y, but in key Y:<shardid>. P214

Sharding lists P214

Sharding a list without Lua scripting is very difficult, so we will introduce how to build a sharded list with Lua scripts, supporting blocking and non-blocking push and pop operations at both ends of the list.

Sharding sorted sets P215

Sharded versions of commands such as ZRANGE, ZRANGEBYSCORE, ZRANK, ZCOUNT, ZREMRANGEBYRANK, and ZREMRANGEBYSCORE must operate on every shard of the sorted set to compute the final result, so they cannot run as fast as the same operations on an ordinary sorted set. Sharding a sorted set is therefore of little benefit.

If the complete data must be kept in a large sorted set, but operations are only performed on the N highest-scoring and N lowest-scoring elements, you can shard the sorted set in the same way as the hash sharding described below, and additionally maintain one sorted set of the highest scores and one of the lowest scores. New elements are added to these two sorted sets with ZADD, and ZREMRANGEBYRANK ensures that neither exceeds the size limit. P215
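A minimal sketch of keeping the two extra sorted sets trimmed, assuming a redis-py style connection object `conn` whose `zadd(name, mapping)` and `zremrangebyrank(name, start, stop)` methods behave like the Redis commands of the same names. The function name and the `:high` / `:low` key suffixes are made up for illustration.

```python
def add_to_extremes(conn, base, member, score, limit):
    """Track only the `limit` highest and `limit` lowest scoring members."""
    high, low = base + ":high", base + ":low"
    conn.zadd(high, {member: score})
    conn.zadd(low, {member: score})
    # Ranks run from the lowest score (rank 0) to the highest (rank -1).
    # Keep the top `limit` entries: remove everything ranked below them.
    conn.zremrangebyrank(high, 0, -(limit + 1))
    # Keep the bottom `limit` entries: remove everything ranked above them.
    conn.zremrangebyrank(low, limit, -1)
```

Both trimming calls are no-ops while a sorted set holds `limit` or fewer members, so the function can be called unconditionally after every insert.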

Sharding hashes P215

When sharding a hash, the keys stored in the hash (its fields) can serve as the source of information: a hash function computes a numeric hash value for each key. The required number of shards is then computed from the total number of keys to be stored and the number of keys each shard should hold. Finally, the hash value and the number of shards together determine which shard a key is stored in. P215
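The three steps above (hash the key, compute the number of shards, pick the shard) can be sketched as follows. This is an illustrative sketch, not the book's exact code: the function name, the use of CRC32, the ceiling division for the shard count, and the shortcut for purely numeric keys are assumptions.

```python
import binascii


def shard_key(base, key, total_elements, shard_size):
    """Return the name of the sharded hash that should hold `key`."""
    key = str(key)
    if key.isascii() and key.isdigit():
        # Numeric keys: consecutive IDs land in the same shard.
        shard_id = int(key) // shard_size
    else:
        # Number of shards needed to hold all elements (ceiling division).
        shards = max(1, (total_elements + shard_size - 1) // shard_size)
        shard_id = binascii.crc32(key.encode("utf-8")) % shards
    return "%s:%s" % (base, shard_id)


print(shard_key("ip2city", 12345, 100000, 1000))
print(shard_key("users", "alice", 10000, 1000))
```

The returned name would then be used as the hash key in commands such as HSET and HGET, with the original key as the field.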

My thoughts

In practice, we usually do not consider the total number of keys when sharding. Typically, we analyze the existing data and fix a number of shards, shard_num; when the shard for a key key is needed, cal_hash(key) % shard_num gives the corresponding shard_id. However, hashing with functions such as CRC32 and MD5 in this way has a problem, which the book also mentions: when the number of shards changes, a large number of keys get a new shard_id that differs from the old one, so their data must be migrated to the new shard. To avoid this, we need a consistent hashing algorithm, which keeps the amount of migrated data as small as possible when the number of shards changes while still distributing the data fairly evenly across the shards.
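One common way to realize consistent hashing is a hash ring with virtual nodes. The sketch below is one possible implementation, not the method of the book or of any particular library; the class name, the choice of MD5, and the default of 100 virtual nodes per shard are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, shards, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted positions on the ring
        self._owner = {}   # position -> shard name
        for shard in shards:
            self.add(shard)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, shard):
        """Place `replicas` virtual nodes for `shard` on the ring."""
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (shard, i))
            self._owner[point] = shard
            bisect.insort(self._points, point)

    def get(self, key):
        """Return the shard owning the first position after the key's hash."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["shard0", "shard1", "shard2", "shard3"])
print(ring.get("user:42"))
```

When a shard is added to such a ring, the only keys that change owner are those that move to the new shard; with cal_hash(key) % shard_num, most keys would change shard.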

Storing strings in hashes P217

If many related short strings or numbers are stored in string keys, and these keys are named consecutively in the form namespace:id, consider storing the values in a sharded hash instead. In some cases this can significantly reduce memory consumption. P217
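A small sketch of how a namespace:id string key could be mapped to a sharded hash and a field. The function name is made up, the id is assumed to be numeric, and the shard size of 512 is an assumption chosen to match the hash-max-ziplist-entries value shown earlier, so that each shard can stay in the compact encoding.

```python
def hash_location(string_key, shard_size=512):
    """Map 'namespace:id' to (sharded hash key, field)."""
    namespace, _, ident = string_key.rpartition(":")
    shard_id = int(ident) // shard_size
    return "%s:%d" % (namespace, shard_id), ident


print(hash_location("user:12345"))
```

Instead of `SET user:12345 value`, the value would then be written with HSET to the returned hash key and field.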

Sharding sets P218

Sets can be sharded in the same way as hashes: derive a shard ID from the member, and then modify the corresponding commands to operate on the shards.

If the members are integers with a relatively small maximum value, then besides using the member directly to obtain the shard ID, you can also use a bitmap to record whether each member is in the "set". P221
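The bitmap idea can be tried without a server. The class below is a pure-Python stand-in that imitates the semantics of SETBIT and GETBIT on a single string (bit 0 is the most significant bit of the first byte, missing bytes read as zero); the class name is made up and this is not how Redis stores the data internally.

```python
class Bitmap:
    """In-memory imitation of a Redis string used as a bitmap."""

    def __init__(self):
        self._bits = bytearray()

    def setbit(self, offset, value):
        """Set the bit at `offset` and return its previous value."""
        byte, bit = divmod(offset, 8)
        if byte >= len(self._bits):
            # Grow with zero bytes, as Redis does for strings.
            self._bits.extend(b"\x00" * (byte + 1 - len(self._bits)))
        mask = 0x80 >> bit
        old = 1 if self._bits[byte] & mask else 0
        if value:
            self._bits[byte] |= mask
        else:
            self._bits[byte] &= ~mask & 0xFF
        return old

    def getbit(self, offset):
        byte, bit = divmod(offset, 8)
        if byte >= len(self._bits):
            return 0
        return 1 if self._bits[byte] & (0x80 >> bit) else 0


members = Bitmap()
members.setbit(10, 1)          # "add" integer member 10
print(members.getbit(10), members.getbit(11))
```

Each possible member costs one bit, so a set over the integers 0 to 8 million needs about 1 MB regardless of how many members it has.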

If the number of members is too large to store them all, but a certain error rate is tolerable, you can use a Bloom filter to record whether each member is in the "set" (if the filter reports that a member is absent, it is definitely absent; if it reports that a member is present, there is a small probability that it is actually absent).
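A minimal Bloom filter sketch in pure Python illustrates the one-sided error described above. The class name, the bit array size, the number of hash functions, and the use of SHA-256 to derive bit positions are all assumptions; in Redis the bit array could be kept in a string and updated with SETBIT and GETBIT.

```python
import hashlib


class BloomFilter:
    """Approximate set membership with no false negatives."""

    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive `num_hashes` bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            data = ("%d:%s" % (i, item)).encode("utf-8")
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


seen = BloomFilter()
seen.add("user:1")
print(seen.might_contain("user:1"))
```

The false positive rate grows as more items are added, so the bit array size and hash count have to be chosen for the expected number of members.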

Packing bits and bytes P221

As mentioned earlier, when string keys named in the form namespace:id store short strings or counters, a sharded hash can effectively reduce the memory needed for this data. However, if what we store is short, fixed-length data for consecutive IDs, there is a storage method that uses even less memory than a sharded hash. P221

The notes on common commands for Redis data structures introduced four commands that can be used to pack and update Redis strings efficiently: P221

  • GETRANGE: used to read part of the stored string
  • SETRANGE: used to set part of the content stored in the string
  • GETBIT: used to get the value of a binary bit in a string
  • SETBIT: used to set a binary bit in a string

With these four commands, we can use Redis strings to store counters, fixed-length strings, Boolean values, and similar data in as compact a format as possible, without compressing the data. P221

Deciding the format of the location information to store P221

As an example, we store users' location information. The amount of memory used per user determines the achievable precision: P221

  • 1 byte: accurate to country
  • 2 bytes: accurate to country and state / Province
  • 3 bytes: accurate to postal code
  • 4 bytes: accurate to latitude and longitude (2m)

Here we use 2 bytes to store the location. First, we keep an array of the ISO3 country (or region) codes of all countries (or regions), and store a country's (or region's) index in this array in the first byte. Then we keep a map from each country (or region) to an array of its states/provinces, and store the state's/province's index in the corresponding array in the second byte. P222
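The two-byte format can be sketched as follows. The tables are tiny, hypothetical stand-ins for the full ISO3 and state/province lists, and shifting every index by one so that 0 means "unknown" is an assumption of this sketch.

```python
# Hypothetical, heavily abbreviated lookup tables.
COUNTRIES = ["CAN", "USA", "CHN"]
STATES = {
    "CAN": ["AB", "BC", "ON"],
    "USA": ["CA", "NY", "WA"],
}


def encode_location(country, state=None):
    """Pack a country and an optional state into 2 bytes (0 = unknown)."""
    c = COUNTRIES.index(country) + 1 if country in COUNTRIES else 0
    regions = STATES.get(country, [])
    s = regions.index(state) + 1 if state in regions else 0
    return bytes([c, s])


def decode_location(code):
    """Reverse of encode_location for codes it produced."""
    c, s = code[0], code[1]
    country = COUNTRIES[c - 1] if c else None
    state = STATES[country][s - 1] if country and s else None
    return country, state


print(encode_location("USA", "NY"), decode_location(encode_location("USA", "NY")))
```

With one byte per index, each table is limited to 255 entries plus the "unknown" value.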

Storing packed data P223

Once we have the two bytes encoding a location, we can store them in a string key with the SETRANGE command. However, we must take the total number of users into account. With 750 million users, 1.5 GB of memory is needed to store everyone's data, but a Redis string key can hold at most 512 MB. In addition, when Redis writes to an existing string beyond its current end, it may need to allocate more memory to hold the new data, so writing near the end of a long string can take much longer than a simple SETBIT call. To solve both problems, we shard the data across multiple string keys. P223

We can store the location information of 2^20 users in each string, which amounts to just over 1 million entries per string; each such string needs 2 MB of memory. P223
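Given a user ID, the string shard and the byte offset inside it follow from simple integer arithmetic. In this sketch the function name and the `location` key prefix are made up; user IDs are assumed to be non-negative integers.

```python
USERS_PER_SHARD = 2 ** 20  # users stored in each string, as in the text
BYTES_PER_USER = 2         # size of one packed location


def locate_user(user_id, base="location"):
    """Return (string key, byte offset) for a user's 2-byte record."""
    shard_id, position = divmod(user_id, USERS_PER_SHARD)
    return "%s:%d" % (base, shard_id), position * BYTES_PER_USER


print(locate_user(5))
print(locate_user(2 ** 20 + 3))
print(USERS_PER_SHARD * BYTES_PER_USER)  # bytes per full shard
```

The returned key and offset are what would be passed to SETRANGE when writing a record and to GETRANGE (with an end offset one byte further) when reading it.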

Aggregating over sharded strings P224

Aggregating the location information of all users P224

Find the largest user ID stored so far and compute the largest shard ID from it. Then traverse every user's data in every string shard (read with GETRANGE), use the two bytes as indexes to look up the corresponding country (or region) and state/province, and tally the results.
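The tallying step for one shard can be sketched on a plain byte string, such as the contents of one string key fetched in blocks with GETRANGE. The function name is made up, and treating the pair of zero bytes as "no data for this user" is an assumption of this sketch (SETRANGE pads skipped positions with zero bytes).

```python
from collections import Counter


def aggregate_shard(blob, counts=None):
    """Count 2-byte location codes in one shard's data.

    Returns a Counter keyed by (country index, state index).
    """
    counts = Counter() if counts is None else counts
    # Step through complete 2-byte records; ignore a trailing odd byte.
    for offset in range(0, len(blob) - 1, 2):
        code = blob[offset:offset + 2]
        if code != b"\x00\x00":
            counts[(code[0], code[1])] += 1
    return counts


totals = aggregate_shard(b"\x02\x02\x00\x00\x02\x02\x03\x00")
print(totals)
```

Passing the same Counter to the function for every shard accumulates the totals over all users; the index pairs are then translated back to names with the lookup tables.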

Aggregating the location information of specified users P226

For each specified user ID, compute its shard ID and its offset within the shard, read the two bytes with GETRANGE, use them as indexes to look up the corresponding country (or region) and state/province, and tally the results.

This article was first published on the official account full Fu machine; the notes are open source on GitHub: reading-notes/redis-in-action