Principle of redis sentinel mode

Time:2022-4-18

Technologies related to Redis high availability:

  • Persistence: single-machine backup (from memory to disk)
  • Master-slave replication: multi-machine hot standby, read load balancing, fault recovery
  • Sentinel: automated fault recovery
  • Cluster: write load balancing, horizontal scaling of storage capacity

Why sentinel mode

  1. With persistence alone, the service cannot be restored while the server is offline
  2. With master-slave replication, a slave can be promoted to master manually after the master goes offline, but the failover cannot be completed automatically

Sentinel mode

Main functions

  1. Monitoring: sentinel constantly checks whether the master and slave nodes are working normally.
  2. Notification: if a monitored Redis instance has a problem, sentinel can notify the system administrator or other programs through its API (pub/sub).
  3. Automatic failover: if a master goes offline, sentinel starts a failover. One of the slaves of that master is selected as the new master, and the other slaves start replicating from it. The application can learn the new master address through sentinel's notification mechanism.
  4. Configuration provider: clients can use sentinel as the authoritative source of configuration to obtain the latest master address. If a failover occurs, the sentinel cluster notifies clients of the new master address so that they can refresh their Redis configuration.
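
As an illustration of the configuration provider role, a client can ask a sentinel for the current master address with the SENTINEL get-master-addr-by-name command. The following is a minimal sketch using only the Python standard library rather than a Redis client library; the helper names (encode_command, read_master_addr, query_master) and the sample address are assumptions made for this example, and error handling is deliberately thin.

```python
import io
import socket
from typing import BinaryIO, Optional, Tuple


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_master_addr(reply: BinaryIO) -> Optional[Tuple[str, int]]:
    """Parse the reply to SENTINEL get-master-addr-by-name.

    Returns (ip, port), or None for a null reply or an error.
    """
    header = reply.readline().strip()
    if header != b"*2":
        return None
    fields = []
    for _ in range(2):
        length = int(reply.readline().strip()[1:])  # "$<len>" bulk header
        fields.append(reply.read(length + 2)[:length].decode())  # drop CRLF
    return fields[0], int(fields[1])


def query_master(host: str, port: int, master_name: str,
                 timeout: float = 1.0) -> Optional[Tuple[str, int]]:
    """Ask one sentinel (hypothetical host/port) for the master address."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(
            "SENTINEL", "get-master-addr-by-name", master_name))
        with sock.makefile("rb") as reply:
            return read_master_addr(reply)


# Offline demonstration with a canned reply instead of a live sentinel.
demo = read_master_addr(io.BytesIO(b"*2\r\n$9\r\n127.0.0.1\r\n$4\r\n6379\r\n"))
```

In practice a client would try each known sentinel in turn and then connect directly to the returned address, since sentinel does not proxy data traffic.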

Main configuration

  1. sentinel monitor <master-name> <ip> <redis-port> <quorum>: the Redis master node to monitor

    Sentinel is a configuration provider for Redis, not a proxy. Clients only obtain the addresses of the data nodes from sentinel and then connect to them directly, so the IP here must be reachable by Redis clients.

    The Redis source code ships a sentinel configuration template: sentinel.conf
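
To make the four fields of the directive concrete, here is a small sketch that parses one sentinel monitor line into a typed record. The master name mymaster, the address and the quorum of 2 are sample values, and MonitorDirective and parse_monitor are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class MonitorDirective:
    master_name: str  # logical name that clients and sentinels refer to
    ip: str           # address of the master, reachable by clients
    port: int         # Redis port of the master
    quorum: int       # sentinels that must agree before odown is declared


def parse_monitor(line: str) -> MonitorDirective:
    """Parse a 'sentinel monitor <name> <ip> <port> <quorum>' line."""
    parts = line.split()
    if len(parts) != 6 or [p.lower() for p in parts[:2]] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    name, ip, port, quorum = parts[2:]
    return MonitorDirective(name, ip, int(port), int(quorum))


directive = parse_monitor("sentinel monitor mymaster 127.0.0.1 6379 2")
```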

Starting sentinel

$ redis-server sentinel.conf --sentinel
  1. Initialize an ordinary Redis server
  2. Load the sentinel-specific configuration, such as the command table and parameters. Sentinel uses the definitions in sentinel.c, whereas an ordinary Redis server uses those in redis.c
  3. In addition to the general server state, sentinel also keeps sentinel-related state

Preparation

Sentinel and master: sentinel monitors the master and discovers the other sentinels and the slaves through it

Sentinel establishes two asynchronous network connections to the master:

  1. Command connection: used to send commands to the Redis master data node, for example the INFO command, whose reply provides:

    1. The master's runtime information, used to update the local master dictionary (the same dict data structure that implements Redis hashes)
    2. The slave information (role, IP, port, connection status, priority and replication offset), used to update the local slave dictionary
  2. Subscription connection: subscribes to the __sentinel__:hello channel, used to discover other sentinels. The messages in the channel include:

    1. The sending sentinel's own information (IP, port, runid, epoch)
    2. Information about the monitored master node (name, IP, port, epoch)

Sentinel and slave: sentinel automatically discovers the slaves

  1. Sentinel sends the INFO command to the master node and obtains the information of all its slaves from the reply

  2. Sentinel establishes a command connection and a subscription connection with each slave
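
The slave discovery step can be sketched as parsing the replication section of an INFO reply. The sample text below is hand-written in the format a master returns (slaveN:ip=...,port=...,state=...,offset=...,lag=...); the addresses and offsets are made up, and parse_info_replication is a name chosen for this example.

```python
def parse_info_replication(text: str) -> dict:
    """Extract plain key:value fields and the slave list from INFO output."""
    info = {"slaves": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=1204,lag=0
            fields = dict(item.split("=", 1) for item in value.split(","))
            info["slaves"].append({
                "ip": fields["ip"],
                "port": int(fields["port"]),
                "state": fields["state"],
                "offset": int(fields["offset"]),
            })
        else:
            info[key] = value
    return info


SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6380,state=online,offset=1204,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=1190,lag=1
"""

info = parse_info_replication(SAMPLE)
```

Each entry in info["slaves"] gives the address a sentinel would then connect to with its own command and subscription connections.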

Sentinel: automatic discovery mechanism

  1. Using the pub/sub mechanism, sentinel subscribes to the __sentinel__:hello channel of every master and slave data node, and thereby automatically discovers the other sentinel nodes that monitor the same master
  2. Every 2s, sentinel publishes a message to __sentinel__:hello containing the latest master configuration it currently holds. If a sentinel finds that its own configuration version is lower than the one it received, it updates its master configuration with the new one
  3. After discovering another sentinel, a sentinel establishes a command connection to it and uses that connection to exchange opinions about the state of the master data node
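
The version comparison in step 2 can be sketched as follows. The comma-separated field order used here (sentinel IP, port, runid, current epoch, then master name, IP, port, config epoch) mirrors the fields listed earlier but should be treated as an assumption of this example, as should the names Hello, MasterView, parse_hello and apply_hello.

```python
from dataclasses import dataclass


@dataclass
class Hello:
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


@dataclass
class MasterView:
    """What the local sentinel currently believes about one master."""
    name: str
    ip: str
    port: int
    config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split one hello message into its eight fields (assumed layout)."""
    f = payload.split(",")
    if len(f) != 8:
        raise ValueError(f"malformed hello message: {payload!r}")
    return Hello(f[0], int(f[1]), f[2], int(f[3]),
                 f[4], f[5], int(f[6]), int(f[7]))


def apply_hello(view: MasterView, hello: Hello) -> bool:
    """Adopt the advertised master address only if its version is newer."""
    if hello.master_name != view.name:
        return False
    if hello.master_config_epoch <= view.config_epoch:
        return False
    view.ip = hello.master_ip
    view.port = hello.master_port
    view.config_epoch = hello.master_config_epoch
    return True


view = MasterView("mymaster", "10.0.0.1", 6379, config_epoch=1)
hello = parse_hello("10.0.0.9,26379,abc123,5,mymaster,10.0.0.2,6380,5")
updated = apply_hello(view, hello)
```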

Monitoring

  1. Periodic monitoring of Redis data nodes

    1. Every 10 seconds, each sentinel sends the INFO command to the master and slave nodes
    2. Every 2 seconds, each sentinel publishes to the __sentinel__:hello channel of the master node to exchange information (pub/sub), namely its own information and its current view of the master
    3. Every 1 second, each sentinel sends the PING command to the other sentinels and to the Redis master and slave servers as a heartbeat, which is the basis for judging whether a node is alive
  2. Subjectively down and objectively down (fault detection)

    1. Subjectively down (sdown): the current sentinel instance considers a Redis server "unavailable". If sentinel receives no valid reply (+PONG, -LOADING or -MASTERDOWN) from the Redis master data node within down-after-milliseconds (30s by default), it marks the master as subjectively down (it sets the SRI_S_DOWN flag in the flags field of the master structure)
    2. Objectively down (odown): enough sentinel instances consider the master to be in the sdown state, so the master enters odown. Odown can be understood simply as the sentinel cluster having judged the master "unavailable", which triggers the failover mechanism. The sentinel sends the SENTINEL is-master-down-by-addr command to the other sentinel nodes to ask for their view of the data node; once the number of sentinels that consider the node offline reaches the quorum, the node is objectively down
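
The two-stage judgment above can be sketched in a few lines. This is a simplified model, not the logic in sentinel.c: time is passed in explicitly as milliseconds, peer opinions are given as a list of booleans, and MasterState and is_odown are names chosen for this example.

```python
VALID_REPLY_PREFIXES = ("+PONG", "-LOADING", "-MASTERDOWN")


class MasterState:
    """Tracks the last valid PING reply seen from one master."""

    def __init__(self, down_after_ms: int, now_ms: int) -> None:
        self.down_after_ms = down_after_ms
        self.last_valid_reply_ms = now_ms

    def on_reply(self, reply: str, now_ms: int) -> None:
        # Only the three replies listed above count as a sign of life.
        if reply.startswith(VALID_REPLY_PREFIXES):
            self.last_valid_reply_ms = now_ms

    def is_sdown(self, now_ms: int) -> bool:
        return now_ms - self.last_valid_reply_ms > self.down_after_ms


def is_odown(local_sdown: bool, peer_says_down: list, quorum: int) -> bool:
    """Objectively down once the local opinion plus agreeing peers reach quorum."""
    if not local_sdown:
        return False
    return 1 + sum(1 for down in peer_says_down if down) >= quorum


master = MasterState(down_after_ms=30_000, now_ms=0)
master.on_reply("+PONG", now_ms=1_000)
sdown_later = master.is_sdown(now_ms=31_001)
odown = is_odown(sdown_later, [True, False], quorum=2)
```

With a quorum of 2, the local sentinel plus one agreeing peer is enough to move the master from sdown to odown.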

Handling

  1. Sentinel election (based on the Raft algorithm): select a leader

    Voting: a sentinel records its vote by updating its local leader and leader_epoch

    1. A sentinel that has already voted becomes a follower and will not start an election of its own within twice the failover timeout (3 minutes by default); a sentinel that has not voted becomes a candidate and goes to step 2
    2. Set the failover state to start, increment the epoch by 1, and offset the timeout by a random time within 1s
    3. Send the SENTINEL is-master-down-by-addr command to the other nodes to request their votes. The command carries the candidate's own epoch
    4. The election must complete within twice the failover timeout
**Differences between the sentinel election algorithm and Raft:**

1. An election is held before every failover
2. A quorum parameter is added. The number of votes a candidate needs must not only exceed half of the sentinels but also reach the value configured by the quorum parameter
3. The leader does not announce its election to the other sentinels. After the leader promotes a slave to master, the other sentinels detect that the new master is working normally and clear the objectively-down flag of the old master, so they do not need to enter the failover process themselves
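
The voting rule and the two-part winning condition can be sketched as below. This is a simplified model under stated assumptions: each sentinel grants at most one vote per epoch on a first-come basis, and a candidate wins only with both a majority and at least quorum votes. Voter and wins_election are names invented for the example.

```python
from typing import Optional, Tuple


class Voter:
    """One sentinel's voting state: who it voted for and in which epoch."""

    def __init__(self) -> None:
        self.current_epoch = 0
        self.leader: Optional[str] = None
        self.leader_epoch = 0

    def request_vote(self, candidate_id: str, epoch: int) -> Tuple[Optional[str], int]:
        """Handle a vote request; reply with the leader voted for and its epoch."""
        if epoch > self.current_epoch:
            self.current_epoch = epoch
        # Grant the vote only if no vote has been cast yet for this epoch.
        if self.leader_epoch < epoch and self.current_epoch <= epoch:
            self.leader = candidate_id
            self.leader_epoch = epoch
        return self.leader, self.leader_epoch


def wins_election(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A candidate needs a majority AND at least `quorum` votes."""
    majority = total_sentinels // 2 + 1
    return votes >= max(majority, quorum)


voter = Voter()
first = voter.request_vote("sentinel-1", epoch=1)
second = voter.request_vote("sentinel-2", epoch=1)  # too late in this epoch
```

Here second still names sentinel-1, which is how a late candidate learns that the vote has already gone elsewhere.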

⚠️ Notes

1. Sentinel cannot perform automatic failover when only a minority of the sentinel processes are working normally.

2. Under normal circumstances, an odd number of sentinels should be configured to avoid tied votes during a switch

   **A voting process in which two nodes compete**

   | Sentinel 1 | Sentinel 2 |
   | -------------------- | -------------------- |
   | Votes for itself | |
   | | Votes for itself |
   | Asks S2 to vote for it | |
   | | Asks S1 to vote for it |
   | Receives the request from S2 and rejects it | Receives the request from S1 and rejects it |
  2. Failover (switching the Redis master data node)

    1. The leader sentinel selects a suitable Redis slave as the new master

      Slave selection criteria:

      1. Healthy nodes:

        1. Online
        2. Communicated successfully recently (replied to the PING command within the last 5s)
        3. The data is relatively fresh (the time disconnected from the master does not exceed 10 * down-after-milliseconds)
      2. The slave node with the highest priority (the lowest slave-priority value)
      3. The slave node with the largest replication offset (the most complete copy)
      4. The slave node with the smallest runid, as a final tie-breaker
    2. Send the SLAVEOF no one command to the selected slave to make it the new master node (its existing data is not deleted, but it no longer accepts data changes from the old master). Sentinel then sends it the INFO command once per second until its role becomes master
    3. Send the SLAVEOF <new-master-ip> <new-master-port> command to the remaining slave nodes to make them slaves of the new master node
    4. Let the remaining slaves replicate the data of the new master. The sentinel parallel-syncs option (sentinel.conf) specifies how many slave nodes start replicating from the new master at the same time. The larger the parallel-syncs value, the sooner all slaves finish replication, but the greater the network and disk load on the master node; in addition, a slave is unavailable while it loads the RDB file sent by the master
    5. Reconfigure the original master node as a slave and keep monitoring it. Once this node comes back online, it is ordered to replicate from the new master
    6. After all the failover work is complete, the leader sentinel publishes the +switch-master message and resets the master. The reset releases all slave objects of the original master and all other sentinel objects monitoring that master, and then creates new slave objects

      Whether a slave can return data to clients during failover depends on slave-serve-stale-data (redis.conf)
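
The slave selection criteria from step 1 can be sketched as a filter followed by a sort. The Slave record and its field names are assumptions made for this example; the rule that a priority of 0 excludes a slave from promotion comes from Redis's documented behaviour rather than from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    runid: str
    online: bool
    ms_since_last_ping_reply: int
    ms_disconnected_from_master: int
    priority: int      # lower value means preferred; 0 means never promote
    repl_offset: int   # larger value means more data replicated


def select_slave(slaves: List[Slave], down_after_ms: int) -> Optional[Slave]:
    """Pick the promotion candidate, or None if no slave is healthy."""
    healthy = [
        s for s in slaves
        if s.online
        and s.ms_since_last_ping_reply <= 5_000
        and s.ms_disconnected_from_master <= 10 * down_after_ms
        and s.priority != 0
    ]
    if not healthy:
        return None
    # Order: priority ascending, offset descending, runid ascending.
    return min(healthy, key=lambda s: (s.priority, -s.repl_offset, s.runid))


candidates = [
    Slave("c", True, 100, 0, 100, 500),
    Slave("b", True, 100, 0, 100, 900),
    Slave("a", True, 100, 0, 100, 900),
    Slave("z", True, 9_000, 0, 1, 999),   # stale PING reply, filtered out
    Slave("y", False, 0, 0, 1, 999),      # offline, filtered out
]
chosen = select_slave(candidates, down_after_ms=30_000)
```

Expressing the ranking as a single tuple key keeps the tie-breaking order explicit: a later criterion only matters when all earlier ones are equal.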

Running SENTINEL failover <master-name> on a sentinel forces that sentinel node to perform a failover without voting with the other nodes

Sentinel drawbacks

  1. In sentinel mode, writes can only go to the single master data node reported by sentinel, so write load cannot be balanced
  2. During persistence, flushing to disk can block the master node, which lowers the success rate of service requests
  3. Every node holds the full data set, so storage capacity is limited by a single machine
  4. Partition problem: suppose the original master, redis 3, is disconnected from redis 1 and redis 2. Redis 1 and redis 2 then perform a failover, and redis 1 is selected as the master. Both redis 1 and redis 3 can now accept write requests, but the data cannot be synchronized between them and becomes inconsistent

Why not use cluster mode

  1. The client must implement a smart client and handle redirection
  2. Batch operations are limited: cross-slot queries are not supported, so batch operations are poorly supported
  3. Transaction support is limited. Only transactions on multiple keys located on the same node are supported; when the keys are distributed across different nodes, transactions cannot be used
  4. The key is the smallest unit of data partitioning, so a single large key-value object such as a hash or a list cannot be spread across different nodes
  5. Multiple database spaces are not supported. A standalone Redis supports 16 databases by default, whereas cluster mode can only use one database space, DB 0
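
To see why cross-slot batch operations are a constraint, it helps to compute which slot a key lands in. Redis Cluster maps a key to one of 16384 slots using CRC16 of the key, hashing only the part inside the first {...} hash tag when one is present. The sketch below uses binascii.crc_hqx with an initial value of 0 as the CRC16 variant; key_slot and same_slot are names chosen for this example.

```python
import binascii


def key_slot(key: str) -> int:
    """Return the cluster hash slot (0..16383) for a key."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Use the hash tag only if there is at least one byte between braces.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


def same_slot(*keys: str) -> bool:
    """True if a multi-key command on these keys would stay within one slot."""
    return len({key_slot(k) for k in keys}) == 1


tagged = same_slot("{user1}:name", "{user1}:age")
```

Keys that share a hash tag always hash to the same slot, which is the usual workaround for multi-key commands and transactions in cluster mode.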
