Redis high availability model


This article analyzes several common ways of deploying Redis and the advantages and disadvantages of each.

1、 Common modes of use

The common ways of using Redis include:

• Redis single replica;
• Redis multiple replicas (master-slave);
• Redis Sentinel;
• Redis Cluster;
• self-developed Redis solutions.

2、 Advantages and disadvantages of each mode

1. Redis single replica

Redis single replica uses a deployment architecture with a single Redis node. There is no standby node synchronizing data in real time, and no data persistence or backup strategy is provided. It suits pure cache scenarios with low data reliability requirements.

Advantages:

• simple architecture and easy deployment;
• cost-effective: a cache does not need a standby node (the availability of the single instance can be ensured with supervisor or crontab). To meet the availability needs of the business, a standby node can still be added at the cost of extra resources, but only one instance serves requests at any time;
• high performance.

Disadvantages:

• data reliability is not guaranteed;
• cached data is lost when the process restarts. Even if a standby node solves the availability problem, it does not solve cache warm-up, so this mode is not suitable for businesses with high data reliability requirements;
• performance is limited by the processing capacity of a single CPU core (Redis executes commands on a single thread), and the CPU is the main bottleneck. The mode therefore suits scenarios with simple commands and little sorting or computation. Memcached can also be considered as an alternative.

2. Redis multiple replicas (master-slave)

Redis multiple replicas uses a master-slave deployment architecture. Compared with a single replica, its main feature is real-time data synchronization between the master and slave instances, together with data persistence and backup strategies. The master and slave instances are deployed on different physical servers. Depending on the company's infrastructure, both can serve requests at the same time, with reads and writes separated.
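The read/write separation described above can be sketched as a small router. This is only an illustration under stated assumptions: the `ReadWriteRouter` and `InMemoryClient` names are hypothetical, and the clients are assumed to expose `get` and `set` methods the way a typical Redis client does. The in-memory stand-in does not replicate data; it only makes the routing visible.

```python
import itertools


class InMemoryClient:
    """Hypothetical stand-in for a Redis client; only get/set are modelled."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.reads = 0

    def set(self, key, value):
        self.data[key] = value
        return True

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slave is configured.
        self._readers = itertools.cycle(list(slaves) or [master])

    def set(self, key, value):
        # All writes go to the master; slaves receive them via replication.
        return self.master.set(key, value)

    def get(self, key):
        # Round-robin over the read-only nodes.
        return next(self._readers).get(key)
```

Because replication is asynchronous, a read routed to a slave may briefly return stale data, so reads that must see the latest write are usually still sent to the master.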

Advantages:

• high reliability: on the one hand, the master/standby architecture makes it possible to switch over when the master fails, promoting the slave to take over from the master and keep the service running smoothly; on the other hand, enabling data persistence and configuring a sensible backup strategy effectively guards against operator mistakes and abnormal data loss;
• read/write separation: slave nodes extend the read capacity of the master node and help handle highly concurrent read traffic.

Disadvantages:

• fault recovery is complex. Without a Redis HA system (which has to be developed), when the master node fails someone must manually promote a slave to master, notify the business side to change its configuration, and make the other slaves replicate from the new master. The whole process requires human intervention and is cumbersome;
• the write capacity of the master is limited by a single machine; sharding can be considered;
• the storage capacity of the master is limited by a single machine; pika can be considered;
• the drawbacks of native replication are pronounced in earlier versions. For example, after replication is interrupted the slave issues PSYNC, and if partial resynchronization fails a full synchronization follows. While the master performs the full backup, it may stall for milliseconds or even seconds. Because of the copy-on-write (COW) mechanism, in extreme cases the master runs out of memory and the process exits abnormally or crashes. Generating the backup file on the master consumes server disk I/O and CPU (compression) resources. Sending a backup file of several gigabytes causes the server's outbound bandwidth to surge and blocks requests. Upgrading to the latest version is recommended.

3. Redis Sentinel

Redis Sentinel is the native high availability solution provided by the community edition. Its deployment architecture consists of two parts: the Redis Sentinel cluster and the Redis data cluster.

The Redis Sentinel cluster is a distributed cluster made up of several Sentinel nodes. It provides failure detection, automatic failover, a configuration center and client notification. The number of Sentinel nodes should be odd, that is 2n + 1 (n >= 1).

Advantages:

• the Redis Sentinel cluster is easy to deploy;
• it solves the high availability switchover problem of the Redis master-slave mode;
• Redis data nodes can be scaled out linearly with little effort, which breaks through Redis's single-thread bottleneck and meets high-capacity or high-performance business requirements;
• one set of Sentinels can monitor a single group of Redis data nodes or several groups.

Disadvantages:

• deployment is more complicated than the Redis master-slave mode, and the principles are harder to understand;
• resources are wasted: the slave nodes among the Redis data nodes act only as backups and do not serve requests;
• Redis Sentinel mainly handles high availability switchover for the master among the Redis data nodes. Failure detection for a data node distinguishes subjective offline from objective offline. For slave nodes, Sentinel only marks the node as subjectively offline and does not perform failover;
• it does not solve read/write separation, which remains relatively complex to implement.
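The difference between subjective and objective offline can be sketched as follows. This is a simplified model rather than Sentinel's actual implementation: the function names and the `last_ok_by_sentinel` mapping (Sentinel name to the time of its last valid reply, in milliseconds) are assumptions made for illustration.

```python
def subjectively_down(last_ok_ms, now_ms, down_after_ms):
    """One Sentinel's local view: no valid reply for longer than down_after_ms."""
    return now_ms - last_ok_ms > down_after_ms


def objectively_down(last_ok_by_sentinel, now_ms, down_after_ms, quorum, is_master=True):
    """A master is objectively down once at least `quorum` Sentinels agree.

    Slaves are only ever marked subjectively down, so they never reach
    the objective state and never trigger a failover.
    """
    if not is_master:
        return False
    votes = sum(
        subjectively_down(last_ok, now_ms, down_after_ms)
        for last_ok in last_ok_by_sentinel.values()
    )
    return votes >= quorum


# Hypothetical observations: s1 and s2 have not heard from the node recently.
observations = {"s1": 0, "s2": 5_000, "s3": 39_000}
master_is_down = objectively_down(observations, now_ms=40_000,
                                  down_after_ms=30_000, quorum=2)
```

With these sample values two of the three Sentinels consider the node subjectively down, which meets a quorum of 2.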


Recommendations:

• if the data nodes belong to the same business, one Sentinel cluster can monitor several groups of Redis data nodes; otherwise, use one Sentinel cluster per group of Redis data nodes;
• the <quorum> in the "sentinel monitor <master-name> <ip> <port> <quorum>" configuration is best set to half the number of Sentinel nodes plus 1. When Sentinels are deployed across multiple IDCs, the number of Sentinels in a single IDC should not exceed (number of Sentinels - quorum);
• set the following parameters sensibly to prevent false switchovers and to control how sensitive switching is:
  a. quorum
  b. down-after-milliseconds 30000
  c. failover-timeout 180000
  d. maxclients
  e. timeout
• keep the server clocks of all deployed nodes synchronized as far as possible, otherwise the order of log entries becomes confusing;
• in this mode, pipelines and multi-key operations are recommended to reduce the number of round trips (RTT) and improve request efficiency;
• build your own configuration center so that clients can easily look up and connect to instances.
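The quorum arithmetic and the two timing parameters above can be put together in a short sketch. The helper names are hypothetical; the master name, address and port passed in the usage lines are placeholder values, and the two defaults are the values listed above.

```python
def recommended_quorum(sentinel_count):
    """Half of the Sentinel nodes plus one."""
    return sentinel_count // 2 + 1


def max_sentinels_per_idc(sentinel_count, quorum):
    """Upper bound on the Sentinels placed in one IDC: Sentinel count minus quorum."""
    return sentinel_count - quorum


def render_sentinel_conf(master_name, ip, port, sentinel_count,
                         down_after_ms=30000, failover_timeout_ms=180000):
    """Build the monitoring lines of a sentinel.conf for one master."""
    quorum = recommended_quorum(sentinel_count)
    return "\n".join([
        f"sentinel monitor {master_name} {ip} {port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
    ])


# Placeholder values: five Sentinels watching one master.
quorum = recommended_quorum(5)
per_idc_limit = max_sentinels_per_idc(5, quorum)
conf = render_sentinel_conf("mymaster", "127.0.0.1", 6379, sentinel_count=5)
```

For five Sentinels this gives a quorum of 3 and at most 2 Sentinels per IDC, so losing a whole IDC still leaves enough Sentinels to reach the quorum.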

4. Redis Cluster

Redis Cluster is the distributed cluster solution provided by the community edition of Redis. It addresses the need to distribute Redis: when a single machine hits memory, concurrency or traffic bottlenecks, Redis Cluster spreads the load across nodes.

A Redis Cluster should be configured with at least 6 nodes (3 masters and 3 slaves). The master nodes serve reads and writes; the slave nodes act as standbys, do not serve requests, and are used only for failover.

Redis Cluster partitions data with virtual slots. Every key is mapped by a hash function to an integer slot in the range 0 ~ 16383. Each node is responsible for a subset of the slots and for the key-value data mapped to those slots.
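As a rough illustration of the slot mapping, the sketch below follows the rule given in the Redis Cluster specification: the slot is CRC16 of the key modulo 16384, and when the key contains a non-empty "{...}" hash tag only the tag is hashed. The function names are chosen for this example; `binascii.crc_hqx` with an initial value of 0 is used here as the CRC16 (XMODEM) implementation.

```python
import binascii

SLOT_COUNT = 16384


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed.

    If the key contains '{' followed later by '}' with at least one
    character in between, only that substring is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key) -> int:
    """Map a key to its slot in the range 0..16383."""
    if isinstance(key, str):
        key = key.encode("utf-8")
    return binascii.crc_hqx(hash_tag(key), 0) % SLOT_COUNT


# Keys that share a hash tag land in the same slot.
same_slot = key_slot("{user1000}.following") == key_slot("{user1000}.followers")
```

Hash tags are the usual way to force related keys into one slot so that multi-key commands can operate on them together.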

Advantages:

• decentralized architecture with no central node;
• data is distributed across nodes by slot, shared among the nodes, and the distribution can be adjusted dynamically;
• scalability: the cluster can scale linearly to more than 1000 nodes, and nodes can be added or removed dynamically;
• high availability: the cluster stays available when some nodes are unavailable. Adding slaves as standby copies of the data enables automatic failover. Nodes exchange status information through the gossip protocol, and the promotion of a slave to master is decided by a voting mechanism;
• lower operation and maintenance costs, and better system scalability and availability.

Disadvantages:

• the client implementation is complex. The driver has to be a smart client that caches the slot mapping and updates it in time, which raises development difficulty. Immature clients affect business stability. At present only JedisCluster is relatively mature, and its exception handling is still imperfect, for example the common "Max redirect exception";
• a node that is blocked for some reason (for longer than cluster-node-timeout) is judged to be offline, which triggers a failover that is not actually necessary;
• data is replicated asynchronously, so strong consistency is not guaranteed;
• when several businesses share one cluster, hot and cold data cannot be told apart from statistics and resource isolation is poor, so the businesses easily affect each other;
• slaves act as "cold standbys" in the cluster and cannot relieve read pressure, although slave utilization can be improved through careful SDK design;
• batch operations on keys, such as MSET and MGET, are restricted. Currently only keys in the same slot can be handled in one batch operation. Because keys in different slots cannot be queried across slots, operations such as MSET, MGET and SUNION are awkward to use;
• transaction support is limited. Only transactions on multiple keys on the same node are supported; when the keys are distributed across different nodes, transactions cannot be used;
• the key is the smallest unit of data partitioning, so a single large key-value object such as a hash or list cannot be spread across nodes;
• multiple database spaces are not supported. A standalone Redis supports 16 databases by default; in cluster mode only one database space, DB 0, can be used;
• replication supports only one level: a slave can only replicate a master, and nested tree-shaped replication is not supported.

Recommendations:

• avoid hot keys, which turn a single master node into the bottleneck of the system;
• avoid big keys, which saturate network cards and cause slow queries;
• the client retry time should be greater than cluster-node-timeout;
• with Redis Cluster, avoid pipelines and multi-key operations where possible to reduce the situations that trigger Max redirect exceptions.
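One common way to work around the batch restriction is to group keys by slot on the client side and issue one batch command per group. The sketch below only does the grouping; `group_by_slot` is a hypothetical helper, and the slot function repeats the CRC16 modulo 16384 rule (with hash tags) from the Redis Cluster specification.

```python
import binascii
from collections import defaultdict


def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the key, or of its non-empty {hash tag}, modulo 16384."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    end = raw.find(b"}", start + 1) if start != -1 else -1
    if end > start + 1:
        raw = raw[start + 1:end]
    return binascii.crc_hqx(raw, 0) % 16384


def group_by_slot(keys):
    """Split keys into per-slot groups; each group is safe for one MGET/MSET."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


# Example keys; the two tagged keys share a slot, the others are hashed whole.
groups = group_by_slot(["foo", "bar", "{user1000}.following", "{user1000}.followers"])
```

Each group can then be sent as a separate batch command. Many small groups mean many round trips, which is why hash tags are normally used to keep related keys together.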

5. Self-developed Redis solutions

A self-developed Redis high availability solution mainly differs in how it handles the configuration center, failure detection and failover. It usually has to be customized for the actual production environment of the business.

Advantages:

• high reliability and availability;
• a high degree of autonomy and control;
• a close fit to the actual needs of the business, with good scalability and compatibility.

Disadvantages:

• complex implementation and high development cost;
• supporting infrastructure has to be built, such as monitoring, a domain name service, and a database for storing metadata;
• high maintenance cost.
