Full analysis of Redis master-slave replication

Time:2021-9-3

How does Redis master-slave replication work? Do you know how it keeps data synchronized while maintaining high performance?

    • https://redis.io/topics/replication
      Note that the following is based on Redis 5, the latest version at the time of writing. The noun "slave" and the related configuration items have been officially renamed to "replica"; both terms refer to the same thing, the slave node.

Basic process of master-slave replication

# Master-Replica replication. Use replicaof to make a Redis instance a copy of
# another Redis server. A few things to understand ASAP about Redis replication.
#
#   +------------------+      +---------------+
#   |      Master      | ---> |    Replica    |
#   | (receive writes) |      |  (exact copy) |
#   +------------------+      +---------------+
#
# 1) Redis replication is asynchronous, but you can configure a master to
# stop accepting writes if it appears to be not connected with at least
# a given number of replicas.
# 2) Redis replicas are able to perform a partial resynchronization with the
# master if the replication link is lost for a relatively small amount of
# time. You may want to configure the replication backlog size (see the next
# sections of this file) with a sensible value depending on your needs.
# 3) Replication is automatic and does not need user intervention. After a
# network partition replicas automatically try to reconnect to masters
# and resynchronize with them.
#
# replicaof
Basic process of replication between the master and the replica


  • While the connection between the master and the replica is stable, the master keeps performing incremental synchronization (partial resync), sending incremental data to the replica. After receiving the data, the replica updates its own dataset and reports its processing status to the master once per second with a REPLCONF ACK ping.
  • If the replica is disconnected from the master and reconnects, it tries to send a PSYNC command to the master. If the conditions are met (for example, the replica refers to a known replication history and the backlog is sufficient), incremental synchronization (partial resync) continues. Otherwise, the master triggers a full synchronization (full resync) of the replica.

From the basic process above, we can see that a network problem can lead to a full synchronization (full resync), which seriously delays the replica in catching up with the master's data.
So how can this be mitigated?
There are two angles: a master-slave response time strategy and a master-slave space accumulation strategy.

Master-slave response time strategy
  • 1. The master pings its replicas every repl-ping-replica-period seconds (10 by default), so that a replica can tell whether the link to the master is still alive.
repl-ping-replica-period 10
  • 2. repl-timeout is the replication timeout between the replica (slave) and the master. The default value is 60s. It applies in three cases:
  • a) From the replica's perspective, no RDB data from the master arrives during the bulk transfer of a full synchronization.
  • b) From the replica's perspective, no data or ping from the master is received.
  • c) From the master's perspective, no REPLCONF ACK ping (which carries the replication offset) is received from the replica.
    When Redis detects that repl-timeout (60s by default) has elapsed, it closes the master-slave connection, and the replica then tries to re-establish it.
repl-timeout 60
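
The two settings above are related: the comments in redis.conf ask for repl-timeout to be greater than repl-ping-replica-period, otherwise a timeout is detected whenever traffic between master and replica is low. The following is a minimal Python sketch of that check and of the timeout test itself. The function names are hypothetical and the logic is a simplified model, not Redis's implementation.

```python
def check_timeout_config(repl_ping_replica_period: int, repl_timeout: int) -> bool:
    """Return True if the timeout is strictly larger than the ping period."""
    return repl_timeout > repl_ping_replica_period


def link_timed_out(last_interaction: float, now: float, repl_timeout: int = 60) -> bool:
    """Simplified model: the link counts as dead once nothing has been
    received from the peer for more than repl_timeout seconds."""
    return (now - last_interaction) > repl_timeout


# Values taken from the configuration lines shown above.
print(check_timeout_config(10, 60))                        # True
print(link_timed_out(last_interaction=100.0, now=130.0))   # False
print(link_timed_out(last_interaction=100.0, now=161.0))   # True
```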

Master-slave space accumulation strategy

After the master accepts a write, it writes the data to the replication buffer (used mainly as the data transmission buffer for master-slave replication) and also to the replication backlog.
When a replica disconnects and reconnects, it sends PSYNC, which includes the replication ID and the offset it has processed so far. If the requested history can be found in the replication backlog, an incremental synchronization (partial resync) is triggered; otherwise, the master performs a full synchronization (full resync) of the replica.
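
The backlog part of that decision can be sketched as a range check. This is a simplified Python model under stated assumptions: the replication ID is taken to match already, the function name is hypothetical, and the offsets in the example are made up for illustration.

```python
def can_partial_resync(psync_offset: int,
                       backlog_active: bool,
                       backlog_first_byte_offset: int,
                       backlog_histlen: int) -> bool:
    """Return True if the requested offset still lies inside the window of
    replication-stream bytes that the backlog currently holds."""
    if not backlog_active:
        return False
    backlog_end = backlog_first_byte_offset + backlog_histlen
    return backlog_first_byte_offset <= psync_offset <= backlog_end


# Hypothetical state: a full 1 MB backlog whose oldest byte is offset 3,951,425.
BACKLOG_SIZE = 1024 * 1024
FIRST_BYTE = 3_951_425

print(can_partial_resync(4_500_000, True, FIRST_BYTE, BACKLOG_SIZE))  # True
print(can_partial_resync(3_000_000, True, FIRST_BYTE, BACKLOG_SIZE))  # False
```

A replica whose offset has already fallen out of the window (the second call) has to go through a full resync.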

# Set the replication backlog size. The backlog is a buffer that accumulates
# replica data when replicas are disconnected for some time, so that when a replica
# wants to reconnect again, often a full resync is not needed, but a partial
# resync is enough, just passing the portion of data the replica missed while
# disconnected.
#
# The bigger the replication backlog, the longer the time the replica can be
# disconnected and later be able to perform a partial resynchronization.
#
# The backlog is only allocated once there is at least a replica connected.
#
# repl-backlog-size 1mb

Relevant parameters of the replication backlog:

#Incremental synchronization window
repl-backlog-size 1mb 
repl-backlog-ttl 3600
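
A rough way to reason about repl-backlog-size: the backlog holds the most recent bytes of the replication stream, so the disconnection it can cover is approximately its size divided by the master's write rate. The Python sketch below does that arithmetic. The function names and the 100 KB/s write rate are assumptions for illustration; measure the real rate on your own master.

```python
import math


def backlog_window_seconds(backlog_bytes: int, write_bytes_per_sec: float) -> float:
    """Approximate disconnection time a backlog of this size can absorb."""
    return backlog_bytes / write_bytes_per_sec


def backlog_bytes_needed(write_bytes_per_sec: float, max_disconnect_seconds: float) -> int:
    """Approximate backlog size needed to survive a disconnection of this length."""
    return math.ceil(write_bytes_per_sec * max_disconnect_seconds)


ONE_MB = 1024 * 1024          # the repl-backlog-size shown above
WRITE_RATE = 100 * 1024       # hypothetical: 100 KB of replication stream per second

print(backlog_window_seconds(ONE_MB, WRITE_RATE))   # 10.24
print(backlog_bytes_needed(WRITE_RATE, 60))         # 6144000
```

Under these assumed numbers a 1mb backlog covers only about ten seconds of disconnection, and tolerating a full minute would take roughly 6 MB.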

Full synchronization (full resync) workflow

Workflow of full synchronization:

  • Replica sends PSYNC.
    (it is assumed that the conditions for full synchronization are met)
  • The master handles the full synchronization with a background save (BGSAVE): it forks a child process that writes the snapshot dump.rdb. At the same time, the master starts buffering all new write commands received from clients in the replication buffer.
  • The master transmits the RDB data to the replica over the network.
  • The replica saves the RDB data to disk and then loads it into memory (it deletes its old data and blocks while loading the new data).
    (followed by incremental synchronization)
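
The steps above can be mimicked with a toy in-memory model. This Python sketch is only an illustration of the ordering (snapshot, buffer, load, replay). The class and method names are hypothetical, and nothing here touches a real Redis instance or a real RDB file.

```python
class ToyMaster:
    def __init__(self):
        self.data = {}
        self.replication_buffer = []      # writes received while a snapshot is in flight
        self.snapshot_in_progress = False

    def write(self, key, value):
        self.data[key] = value
        if self.snapshot_in_progress:
            self.replication_buffer.append((key, value))

    def start_full_sync(self):
        """Stand-in for BGSAVE: take a point-in-time copy of the dataset."""
        self.snapshot_in_progress = True
        self.replication_buffer = []
        return dict(self.data)

    def finish_full_sync(self):
        """Hand over the writes buffered during the snapshot transfer."""
        self.snapshot_in_progress = False
        pending, self.replication_buffer = self.replication_buffer, []
        return pending


class ToyReplica:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        self.data = dict(snapshot)        # old data is discarded

    def apply(self, writes):
        for key, value in writes:
            self.data[key] = value


master, replica = ToyMaster(), ToyReplica()
replica.data = {"stale": "value"}
master.write("a", "1")
snapshot = master.start_full_sync()
master.write("b", "2")                    # arrives while the snapshot is being sent
replica.load_snapshot(snapshot)
replica.apply(master.finish_full_sync())
print(replica.data == master.data)        # True
```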

If the master's disk is slow and the network bandwidth is good, diskless mode can be used (note that this is experimental in Redis 5):

# Default is no; change it to yes to enable diskless mode
repl-diskless-sync no
repl-diskless-sync-delay 5

By default, a replica keeps serving client requests (possibly with stale data) while a full synchronization is in progress or while it is disconnected from the master.

replica-serve-stale-data yes

During the window in which the replica loads the RDB data into memory, it blocks client connections.

Allow writes only with N attached replicas

The master uses asynchronous replication by default, which means it acknowledges a client's write without waiting for the replicas. It can, however, be configured to accept a write only if at least N replicas are connected with a lag of at most M seconds; otherwise it returns an error to the client.

# Not enabled by default
min-replicas-to-write
min-replicas-max-lag
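
The acceptance rule can be sketched as follows. This is a simplified Python model with a hypothetical function name. The values 3 and 10 are example settings, and the lag list stands in for the per-replica lag that the master tracks from REPLCONF ACK pings.

```python
from typing import Sequence


def write_allowed(replica_lags: Sequence[int],
                  min_replicas_to_write: int,
                  min_replicas_max_lag: int) -> bool:
    """Return True if the master would accept a write under this model."""
    # Setting either option to 0 disables the feature.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True
    good = sum(1 for lag in replica_lags if lag <= min_replicas_max_lag)
    return good >= min_replicas_to_write


# Example settings: require 3 replicas with a lag of at most 10 seconds.
print(write_allowed([0, 1, 2], 3, 10))     # True
print(write_allowed([0, 1, 25], 3, 10))    # False: only two replicas are close enough
print(write_allowed([], 0, 10))            # True: feature disabled
```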

In addition, a client can use the WAIT command, which works like an ACK mechanism: it blocks until the specified number of replicas have acknowledged the preceding writes or the timeout (in milliseconds) expires, and it returns the number of replicas that acknowledged.

127.0.0.1:9001> set a x
OK
127.0.0.1:9001> wait 1 1000
1
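
From application code, the same pattern might look like the sketch below. It assumes a client object that exposes set(key, value) and wait(num_replicas, timeout_ms) in the style of redis-py. The helper name and the exception type are hypothetical. Note that WAIT only reports how many replicas acknowledged: if the count is too low, the write has still been applied on the master.

```python
class NotEnoughReplicas(Exception):
    """Raised when fewer replicas than requested acknowledged a write."""


def set_with_ack(client, key, value, num_replicas=1, timeout_ms=1000):
    """Write a key, then block until num_replicas replicas acknowledge it
    or timeout_ms elapses. Returns the number of acknowledging replicas."""
    client.set(key, value)
    acked = client.wait(num_replicas, timeout_ms)
    if acked < num_replicas:
        # The write is NOT rolled back here; the caller decides what to do.
        raise NotEnoughReplicas(f"{acked} of {num_replicas} replicas acknowledged")
    return acked
```

With a redis-py connection this would be called as set_with_ack(r, "a", "x", num_replicas=1, timeout_ms=1000), mirroring the redis-cli session above.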

Failover

The replication ID mainly identifies the history of the dataset that originates from the current master.
There are two replication IDs: master_replid and master_replid2.

127.0.0.1:9001> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=9011,state=online,offset=437,lag=1
master_replid:9ab608f7590f0e5898c4574299187a52ad0db7ec
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:437
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:437
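
Output like the above is easy to process in a script, for example to see how many bytes each replica is behind the master. The Python sketch below parses the text format shown here. The function names are hypothetical, and the sample is an excerpt of the output above.

```python
def parse_info_replication(text: str) -> dict:
    """Parse 'key:value' lines; slaveN lines become nested dicts."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("slave") and "=" in value:
            # e.g. slave0:ip=127.0.0.1,port=9011,state=online,offset=437,lag=1
            value = dict(field.split("=", 1) for field in value.split(","))
        info[key] = value
    return info


def replica_lag_bytes(info: dict) -> dict:
    """Bytes of replication stream each replica still has to process."""
    master_offset = int(info["master_repl_offset"])
    return {name: master_offset - int(fields["offset"])
            for name, fields in info.items() if isinstance(fields, dict)}


sample = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=9011,state=online,offset=437,lag=1
master_repl_offset:437
"""

print(replica_lag_bytes(parse_info_replication(sample)))   # {'slave0': 0}
```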

When the master goes down and one of the replicas is promoted to master, the new master starts a new epoch (a new history) and generates a new replication ID, master_replid.
At the same time, the old master_replid is stored in master_replid2.

# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=9021,state=online,offset=34874,lag=0
slave1:ip=127.0.0.1,port=9001,state=online,offset=34741,lag=0
master_replid:dfa343264a79179c1061f8fb81d49077db8e4e5f
master_replid2:9ab608f7590f0e5898c4574299187a52ad0db7ec
master_repl_offset:34874
second_repl_offset:6703
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:34874

In this way, when the other replicas connect to the new master, they do not need another full synchronization. They can continue incrementally from the old history and then move on to the data of the new epoch.
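
The replication ID part of the new master's decision can be sketched like this. It is a simplified Python model with a hypothetical function name, using the IDs and second_repl_offset from the output above. The backlog window still has to contain the requested offset as well, which this sketch does not check.

```python
def replication_id_accepted(req_replid: str, req_offset: int,
                            master_replid: str, master_replid2: str,
                            second_repl_offset: int) -> bool:
    """Return True if the replica's history is one this master can continue."""
    if req_replid == master_replid:
        return True
    # The previous ID is only valid up to the offset at which the switch happened.
    return req_replid == master_replid2 and req_offset <= second_repl_offset


NEW_ID = "dfa343264a79179c1061f8fb81d49077db8e4e5f"
OLD_ID = "9ab608f7590f0e5898c4574299187a52ad0db7ec"

print(replication_id_accepted(OLD_ID, 6000, NEW_ID, OLD_ID, 6703))      # True
print(replication_id_accepted(OLD_ID, 7000, NEW_ID, OLD_ID, 6703))      # False
print(replication_id_accepted("f" * 40, 6000, NEW_ID, OLD_ID, 6703))    # False
```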

How does replica handle expired keys?

  • A replica does not actively delete expired keys. It deletes a key only after the master has expired it (on access or through active expiration) or evicted it under LRU or another memory eviction policy, and has sent the resulting synthesized DEL command to the replica.
  • This leaves a time gap during which a logically expired key still exists on the replica. To cover it, the replica uses its logical clock: when a client attempts to read an expired key, the replica reports that the key does not exist.
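
The two points above can be illustrated with a toy store. This is a Python sketch with hypothetical names: reads hide a logically expired key, but the key is physically removed only when a DEL arrives from the master.

```python
class ToyReplicaStore:
    def __init__(self, clock):
        self._clock = clock          # callable returning the current time
        self._values = {}
        self._expire_at = {}

    def load(self, key, value, expire_at=None):
        self._values[key] = value
        if expire_at is not None:
            self._expire_at[key] = expire_at

    def get(self, key):
        expire_at = self._expire_at.get(key)
        if expire_at is not None and self._clock() > expire_at:
            return None              # logically expired: hidden, but not deleted
        return self._values.get(key)

    def holds(self, key):
        """True if the key is still physically stored."""
        return key in self._values

    def apply_del(self, key):
        """DEL propagated by the master: the only path that removes a key here."""
        self._values.pop(key, None)
        self._expire_at.pop(key, None)


now = [1000.0]                       # controllable clock for the demo
store = ToyReplicaStore(clock=lambda: now[0])
store.load("session", "abc", expire_at=1010.0)
print(store.get("session"))          # abc
now[0] = 1011.0
print(store.get("session"))          # None
print(store.holds("session"))        # True
store.apply_del("session")
print(store.holds("session"))        # False
```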

@SvenAugustus(https://www.flysium.xyz/)