Redis learning note 1 – data structure


Redis is an efficient NoSQL database that stores data as key-value pairs and is commonly used as a high-speed distributed cache.

Redis key design tips

As a cache, Redis finds the required data quickly by key, and it is most often used in front of a database. The key design in Redis can therefore follow the design of the database tables.
Taking a user table as an example, the database design is as follows:

user_id  user_name  password  email
1        zhangsan   secret1   [email protected]
2        lisi       secret2   [email protected]

The suggestions for key design are as follows:

  1. Key segments are separated by colons
  2. The table name serves as the key prefix (first segment)
  3. The field name of the table's primary key serves as the second segment
  4. The field value of the table's primary key serves as the third segment
  5. The name of the field to query serves as the fourth segment

Suppose we want to quickly look up a user's user_name (zhangsan) by user_id (1).
The cache data is designed as follows:

set user:user_id:1:user_name zhangsan

If user_name is an indexed field and user information is frequently queried by user name, a key can also be built on that field.
The cache data is designed as follows:

set user:user_name:zhangsan:user_id 1

The other user information can then be queried by user ID.
Of course, key design still has to be adapted to the business needs; this is only a general approach. For example, a key combined with a date can be used for date-related statistics.
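
The colon-separated scheme above can be wrapped in a small helper so that every caller builds keys the same way. This is only a sketch: the class name RedisKeys and the method build are hypothetical, not part of any Redis client.

```java
import java.util.StringJoiner;

/** Builds cache keys of the form table:pkField:pkValue:queryField. */
final class RedisKeys {
    private RedisKeys() {}

    // Joins the four segments with colons, following the suggestions above.
    static String build(String table, String pkField, Object pkValue, String queryField) {
        StringJoiner joiner = new StringJoiner(":");
        joiner.add(table)
              .add(pkField)
              .add(String.valueOf(pkValue))
              .add(queryField);
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Key for looking up the user name by user ID.
        System.out.println(build("user", "user_id", 1, "user_name"));
        // Key for looking up the user ID by user name.
        System.out.println(build("user", "user_name", "zhangsan", "user_id"));
    }
}
```

Centralizing the format in one place makes it easier to change the convention later without hunting for hand-built key strings.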

Redis data type

Basic types

The basic data types of Redis are string, list, hash (map), set, and sorted set.

String is the most basic data type. Besides text, it can hold single values such as int, long, and float, and it can be used as a counter. Many projects also serialize an object to JSON, store it as a string, and deserialize it when it is read. If you use this approach, document the fields in detail in code comments or documentation, so that you and other maintainers do not misread the structure later.
Usage scenarios include:

  1. Normal string cache
  2. Int, long, float and other basic types
  3. Counter (incr, incrby, decr, decrby, etc.)
  4. Distributed lock (setnx, short for SET if Not eXists)
  5. Bitmap (setbit, getbit, bitcount, bitop, etc.), which can be used for large-scale statistics or a Bloom filter

A list is an ordered sequence, similar to List in Java. It supports a rich set of operations such as lpush, rpush, lpop, brpop, and lrange.
Common usage scenarios include:

  1. Distributed message queuing
  2. Cache hot articles and products on the front page of the portal
  3. Paging data

The hash type is similar to HashMap in Java and stores field-value pairs. You can map a row of a database table directly to a hash, or store a POJO that has no nested types.
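
As an illustration of mapping one table row to a hash, the sketch below flattens a row of the user table into field-value pairs. The class UserRow and the key format returned by hashKey are assumptions made for this example; the resulting map is the kind of argument a client call such as hmset would take.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Flat POJO mirroring one row of the user table (no nested types). */
final class UserRow {
    final long userId;
    final String userName;
    final String password;
    final String email;

    UserRow(long userId, String userName, String password, String email) {
        this.userId = userId;
        this.userName = userName;
        this.password = password;
        this.email = email;
    }

    // Redis key for this row's hash; the format is an assumed naming choice.
    String hashKey() {
        return "user:user_id:" + userId;
    }

    // One hash field per table column, with every value stored as a string.
    Map<String, String> toHash() {
        Map<String, String> hash = new LinkedHashMap<>();
        hash.put("user_id", Long.toString(userId));
        hash.put("user_name", userName);
        hash.put("password", password);
        hash.put("email", email);
        return hash;
    }
}
```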

A set is an unordered collection of unique elements and can be used much like Set in Java. It supports efficient intersection, union, and difference operations (sinter, sunion, sdiff).

Sorted Set
A sorted set is similar to LinkedHashSet in Java, except that it is not ordered by insertion. Instead, each element is associated with a score of type double, and elements are ordered by score. Based on this feature, typical scenarios for a sorted set include:

  1. Leaderboard (hot search): items on the front page of a portal can be ranked by hits, by time, by likes, and so on.
  2. Weighted message queue: set different scores according to the importance of the message.
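
The leaderboard scenario can be modeled in plain Java to show how scores drive the ordering. This is an in-memory stand-in, not a Redis client: the class Leaderboard is hypothetical, incrementScore plays the role of ZINCRBY, and top plays the role of reading the highest-ranked members. Breaking ties by member name is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a sorted-set leaderboard (not a Redis client). */
final class Leaderboard {
    private final Map<String, Double> scores = new HashMap<>();

    // Adds delta to the member's score, starting from delta if the member is new.
    void incrementScore(String member, double delta) {
        scores.merge(member, delta, Double::sum);
    }

    // Returns up to n members, highest score first.
    List<String> top(int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue()); // higher score first
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```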

Advanced type

Bitmaps are implemented with the bit operations of the basic string type, including setbit, getbit, bitcount, and bitop. They can be used for large-scale statistics or a Bloom filter. For example, to count daily active users, let each bit represent one user, with the user ID as the bit index (assuming IDs are of type int). All bits start at 0, and the corresponding bit is set to 1 when the user logs in. Counting the bits that are 1 gives the number of daily active users.
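
The daily-active-user idea can be tried out locally with java.util.BitSet standing in for the Redis string. The class DailyActiveUsers is hypothetical; each method notes the Redis command it corresponds to.

```java
import java.util.BitSet;

/** In-memory model of the daily-active-user bitmap (BitSet stands in for a Redis string). */
final class DailyActiveUsers {
    private final BitSet bits = new BitSet();

    // Corresponds to SETBIT key userId 1: mark the user as active today.
    void markActive(int userId) {
        bits.set(userId);
    }

    // Corresponds to GETBIT key userId: was the user active today?
    boolean isActive(int userId) {
        return bits.get(userId);
    }

    // Corresponds to BITCOUNT key: number of bits set to 1.
    int count() {
        return bits.cardinality();
    }
}
```

Marking the same user twice does not change the count, which is why a login can simply set the bit each time.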

A Bloom filter works by hashing each item with several hash functions to obtain several values, then setting the bits at those positions to 1 in a bitmap. To query an item, check the bits at the same positions: if any of them is 0, the item is definitely not in the set; if all of them are 1, the item is probably in the set, since false positives are possible. In Java, a library can be used to implement a Bloom filter.
A Redis bitmap can serve as the underlying bit array of a Bloom filter.
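
The following is a minimal Bloom filter sketch, again using java.util.BitSet in place of a Redis bitmap. The class SimpleBloomFilter is hypothetical. As a simplification, it derives the bit positions from two base hashes (double hashing) instead of using fully independent hash functions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Minimal Bloom filter over an in-memory bit array. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Sets one bit per hash function.
    void add(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(value, i));
        }
    }

    // false: definitely absent. true: probably present (false positives are possible).
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int index(String value, int i) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int h1 = Arrays.hashCode(bytes);
        int h2 = fnv1a(bytes);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash, used as the second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811C9DC5;
        for (byte b : data) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

An item that was added always answers true. An item that was never added usually answers false, but may answer true when its positions collide with bits set by other items.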

Redis HyperLogLog is used for cardinality statistics, that is, counting distinct elements. Its advantage is that the space needed stays fixed and small even when the number or volume of input elements is very large. Each HyperLogLog key needs only 12 KB of memory to count close to 2^64 distinct elements. A set, by contrast, consumes more memory the more elements it holds.

For example, suppose we count the daily UV (unique visitors) of each page of a website. The straightforward approach is to record the user IDs for each page in a set, but with many users, storing the IDs takes a lot of space. With HyperLogLog, 12 KB is enough to count close to 2^64 distinct elements, with a standard error of about 0.81%. HyperLogLog provides three commands:

  1. pfadd: adds elements to a HyperLogLog
  2. pfcount: returns the approximate number of distinct elements in a HyperLogLog
  3. pfmerge: merges multiple HyperLogLogs into one

Please refer to the following articles for specific use and principle.

  • Use examples and principles
  • Use of hyperloglog
  • Implementation principle of hyperloglog

I’m ashamed that I didn’t fully understand the principle part.

The difference between a Bloom filter implemented with a bitmap and count statistics implemented with HyperLogLog:

  1. A Bloom filter can tell whether an item may exist in the set, but cannot count the items
  2. HyperLogLog can only estimate the number of distinct items in the set, but cannot tell whether a given item exists

Pipeline
A Redis client executes a command in four steps:

1. Send command → 2. Command queuing → 3. Command execution → 4. Return result

The time spent on steps 1 and 4 together is called the round trip time (RTT). It consists mainly of network transmission and is especially noticeable when network latency is high. Redis provides batch commands such as mget and mset to reduce the number of round trips. However, most commands do not have a batch form (for example, hgetall has no mhgetall counterpart). If a large number of commands must be executed one by one, the accumulated RTT is large and reduces the throughput of the service.
With pipelining, the client packages multiple commands, sends them to the server together, and then reads all the results. This improves the throughput of the Redis service. Pipelining must be supported by both the client and the server, and there is no dedicated command for it. Note also that a single pipeline should not carry too many commands, otherwise it causes network congestion and client delay.

import java.util.Collections;
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
Jedis redis = new Jedis(ip, port);
Pipeline pipe = redis.pipelined(); // create a pipeline
for (int i = 0; i < 10000; i++) {
    // Queue the command on the client; nothing is sent to the server yet.
    // hmset takes a field-value map; the field name "data" is a placeholder.
    pipe.hmset("key" + i, Collections.singletonMap("data", "data" + i));
}
List<Object> results = pipe.syncAndReturnAll(); // send all queued commands at once and return the results
// pipe.sync(); // alternative: send the queued commands without returning results
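
Because a single pipeline should not carry too many commands, one option is to split the work into fixed-size batches and send one pipeline per batch. The sketch below shows only the batching arithmetic in plain Java, without a Redis client. The class PipelineBatches and the cost model (one round trip per batch, server execution time ignored) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Helpers for splitting a large command list into pipeline-sized batches. */
final class PipelineBatches {
    private PipelineBatches() {}

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    // Rough network cost: one round trip per batch, ignoring server execution time.
    static long estimatedNetworkMillis(int commandCount, int batchSize, long rttMillis) {
        long roundTrips = (commandCount + batchSize - 1) / batchSize; // ceiling division
        return roundTrips * rttMillis;
    }
}
```

Under this model, 10000 commands at an assumed RTT of 1 ms cost about 10000 ms of network time when sent one by one, and about 10 ms when sent in batches of 1000.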

Redis transaction
A Redis transaction executes multiple commands at a time, with the following three important guarantees:

  1. Batched commands are placed in a queue before the exec command is sent
  2. The transaction is executed once the exec command is received. If any command in the transaction fails, the other commands are still executed
  3. While the transaction is executing, commands submitted by other clients are not inserted into its command sequence

From start to execution, a transaction goes through three stages:

  1. Start the transaction
  2. Queue the commands
  3. Execute the transaction

Transaction command example (MULTI starts the transaction, EXEC executes it):

MULTI
SET book-name "Mastering C++ in 21 days"
GET book-name
SADD tag "C++" "Programming" "Mastering Series"
EXEC

Differences between native batch commands (mget, mset), pipelines, and transactions:

  1. Native batch commands are atomic, but each one is a single command that operates on multiple keys.
  2. A pipeline packages multiple commands, sends them to the server together, and retrieves all the results at once. It is not atomic.
  3. A transaction executes its commands together as one sequence, and no commands from other clients are inserted in the middle. If a command fails during execution, it is skipped and the remaining commands still run. Redis has no rollback mechanism.
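
The queue-then-execute behavior, including the absence of rollback, can be illustrated with a toy model over an in-memory map. This is not how Redis is implemented and does not use a Redis client. The class ToyTransaction is hypothetical, and commands are represented as Java functions purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of MULTI/EXEC semantics over an in-memory map (not a Redis client). */
final class ToyTransaction {
    private final Map<String, String> store;
    private final List<Function<Map<String, String>, Object>> queued = new ArrayList<>();

    ToyTransaction(Map<String, String> store) {
        this.store = store;
    }

    // Between MULTI and EXEC, commands are only queued, not executed.
    void queue(Function<Map<String, String>, Object> command) {
        queued.add(command);
    }

    // EXEC: run every queued command in order. A failing command is recorded
    // as an error, the remaining commands still run, and nothing is rolled back.
    List<Object> exec() {
        List<Object> results = new ArrayList<>();
        for (Function<Map<String, String>, Object> command : queued) {
            try {
                results.add(command.apply(store));
            } catch (RuntimeException e) {
                results.add("ERR " + e.getMessage());
            }
        }
        queued.clear();
        return results;
    }
}
```

If the second of three queued commands throws, the result list still has three entries: the first and third hold normal results, and the write made by the first command remains in the map.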

Pub / sub
Redis pub/sub is a messaging pattern: publishers (pub) send messages and subscribers (sub) receive them. A client can subscribe to any number of channels.

A subscriber listens for messages:

SUBSCRIBE channel_name

A publisher sends a message:

PUBLISH channel_name "Message"

Lua script
Redis uses a Lua interpreter to execute scripts. Starting with version 2.6, Redis supports Lua through an embedded interpreter. The most common command for executing scripts is EVAL.
The basic syntax of the EVAL command is as follows:

EVAL script numkeys key [key ...] arg [arg ...]

Script example:

EVAL "return {KEYS[1],KEYS[2],ARGV[1],ARGV[2]}" 2 key1 key2 first second