Redis full replication and partial replication

Time:2020-11-22

Redis master slave replication

  • A Redis instance is either a master node or a slave node
  • By default, a Redis instance is a master node
  • Each slave node can only have one master node, and the master node can have multiple slaves at the same time
  • The replicated data flow is unidirectional and can only be replicated from the master node to the slave node
  • The slaveof command can be configured dynamically at runtime or written to the configuration file in advance
  • The master-slave replication process has the following steps:
    • Save master node information: After slaveof is executed, the slave node only saves the master node's address information and returns immediately.
    • Establish the master-slave socket connection: When the slave node finds a new master node, it tries to establish a network connection to it. The slave node creates a socket dedicated to receiving the replication commands sent by the master node. If the connection cannot be established, a scheduled task retries indefinitely until the connection succeeds or slaveof no one is executed to cancel replication.
    • Send the ping command: Once connected, the slave node sends a ping request as the first communication. The ping checks whether the socket between master and slave is usable and whether the master node can currently process commands. If the slave node receives no response after sending ping, or the master node is blocked and cannot respond, the slave node drops the replication connection and the next scheduled task initiates a reconnection.
    • Authentication: If the master node sets the requirepass parameter, password verification is required. The slave node must set the masterauth parameter to the same password as the master node. If verification fails, replication is terminated and the slave node starts the replication process again.
    • Synchronize the dataset: Once master and slave are communicating normally, the master node sends all the data it holds to the slave node for the first replication.
    • Continuous command replication: After the master node has synchronized its current data to the slave node, replication is established. From then on, the master node continuously sends write commands to the slave node to keep the master and slave data consistent.
  • Start two instances on ports 6380 and 6381
  • Run the following command on 6381:
127.0.0.1:6381> slaveof 127.0.0.1 6380

From Redis 5.0.0 onward, use replicaof <masterip> <masterport> instead (slaveof is kept as an alias).
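The connection steps listed above can be sketched as a small simulation. This is only an illustration of the order of the checks, not Redis code: FakeMaster, replica_handshake, and the step labels are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeMaster:
    """Hypothetical stand-in for a master node (not a Redis API)."""
    reachable: bool = True
    responds_to_ping: bool = True
    requirepass: Optional[str] = None


def replica_handshake(master: FakeMaster, masterauth: Optional[str] = None) -> List[str]:
    """Return the steps a slave passes through in one connection attempt."""
    steps = ["save master address"]  # slaveof returns right after this step
    if not master.reachable:
        # A scheduled task keeps retrying until success or `slaveof no one`.
        return steps + ["connect failed: retry on next scheduled task"]
    steps.append("socket connected")
    if not master.responds_to_ping:
        return steps + ["ping failed: drop connection, retry on next scheduled task"]
    steps.append("ping ok")
    if master.requirepass is not None:
        if masterauth != master.requirepass:
            return steps + ["auth failed: replication terminated"]
        steps.append("auth ok")
    steps.append("sync dataset")
    steps.append("continuous command replication")
    return steps


if __name__ == "__main__":
    for step in replica_handshake(FakeMaster(requirepass="secret"), masterauth="secret"):
        print(step)
```

Each early return corresponds to one of the failure cases in the step descriptions; the retry loop itself is left out to keep the sketch short.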
  • 6380 start

  • 6381 start

  • View info replication

Data synchronization

  • Full replication: Generally used for the first replication. Early versions of Redis supported only full replication. It sends all of the master node's data to the slave node in one go; when the dataset is large, this puts heavy load on the master node, the slave node, and the network.
  • Partial replication: Handles data loss caused by brief network interruptions during master-slave replication. When the slave node reconnects to the master node, if conditions permit, the master node resends only the lost data. Because the resent data is far smaller than the full dataset, this avoids the high overhead of full replication.

Replication offset

  • master_repl_offset: The master and slave nodes participating in replication each maintain their own replication offset. After processing a write command, the master node adds the byte length of the command to its offset. The value is reported as master_repl_offset in info replication.
  • slave0: The slave node reports its own replication offset to the master node every second, so the master node also stores the replication offset of each slave node.
  • slave_repl_offset: After receiving a command sent by the master node, the slave node also adds the command's byte length to its own offset.
  • Comparing the replication offsets of the master and slave nodes shows whether their data is consistent.
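As a rough sketch of that comparison, the following parses the text returned by info replication on a master node and reports how many bytes each slave is behind. The function names and the SAMPLE values are illustrative assumptions, not output captured from the instances in this article.

```python
def parse_info_replication(text: str) -> dict:
    """Parse the `key:value` lines of INFO replication output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def replica_lag_bytes(info: dict) -> dict:
    """Return {slave name: bytes behind the master} from a master's INFO."""
    master_offset = int(info["master_repl_offset"])
    lag = {}
    for key, value in info.items():
        # Slave lines look like: slave0:ip=...,port=...,state=...,offset=...,lag=...
        if key.startswith("slave") and "offset=" in value:
            fields = dict(pair.split("=", 1) for pair in value.split(","))
            lag[key] = master_offset - int(fields["offset"])
    return lag


# Illustrative values only.
SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6381,state=online,offset=2287,lag=1
master_repl_offset:2301
"""

if __name__ == "__main__":
    print(replica_lag_bytes(parse_info_replication(SAMPLE)))
```

A difference of 0 means the slave has acknowledged everything the master has written so far.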

Replication backlog buffer

  • The replication backlog buffer is a fixed-length queue stored on the master node, 1MB by default. It is created when a slave node connects to the master node. When the master node processes a write command, it not only sends the command to the slave nodes but also writes it to the replication backlog buffer.
  • Because the buffer is essentially a fixed-length FIFO queue, it holds the most recently replicated data, which partial replication uses to resend commands a slave node has missed.
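  • The related fields in info replication:
    • repl_backlog_active: 1 means the replication backlog buffer is enabled
    • repl_backlog_size: Maximum buffer length in bytes (1048576 here)
    • repl_backlog_first_byte_offset: Starting offset of the buffered data, used to calculate the range currently available in the buffer (1 here)
    • repl_backlog_histlen: Length in bytes of the valid data currently held (2301 here)
    • master_replid: Replication ID of the master node; a slave node reports the same value as its master instance
    • master_replid2: Second replication ID; if no switchover has happened, that is, the master instance has not changed, it keeps its initial value of all zeros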
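To make the relationship between these fields concrete, here is a toy model of a fixed-length backlog. It is a sketch under the assumption that the first available offset equals master_repl_offset - histlen + 1, which matches the sample values above (1 and 2301); ReplBacklog and its methods are hypothetical names, not Redis internals.

```python
class ReplBacklog:
    """Toy model of the fixed-length replication backlog (not Redis code)."""

    def __init__(self, size: int = 1048576):
        self.size = size                 # corresponds to repl_backlog_size
        self.buf = bytearray()           # holds only the most recent bytes
        self.master_repl_offset = 0      # total bytes ever written

    def feed(self, data: bytes) -> None:
        """Append a propagated write command; drop the oldest bytes if full."""
        self.buf += data
        self.master_repl_offset += len(data)
        if len(self.buf) > self.size:
            del self.buf[: len(self.buf) - self.size]

    @property
    def histlen(self) -> int:            # corresponds to repl_backlog_histlen
        return len(self.buf)

    @property
    def first_byte_offset(self) -> int:  # corresponds to repl_backlog_first_byte_offset
        return self.master_repl_offset - self.histlen + 1

    def read_from(self, offset: int):
        """Return the bytes from `offset` onward, or None if no longer held."""
        if offset < self.first_byte_offset or offset > self.master_repl_offset + 1:
            return None
        return bytes(self.buf[offset - self.first_byte_offset:])


if __name__ == "__main__":
    backlog = ReplBacklog(size=10)
    backlog.feed(b"abcdefgh")
    backlog.feed(b"ijkl")                # overflows: the oldest 2 bytes are dropped
    print(backlog.first_byte_offset, backlog.histlen, backlog.read_from(11))
```

When read_from returns None, the requested data has already been overwritten, which is the situation where partial replication is not possible.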

PSYNC command

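  • The slave node uses the PSYNC command for both partial and full replication. The master node log below shows a partial resynchronization being refused because the replication ID does not match: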
30227:M 05 Aug 2019 18:52:44.698 * Replica 127.0.0.1:6381 asks for synchronization
30227:M 05 Aug 2019 18:52:44.698 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'e7d71fb600183a175afadbd1354e97edddb2541a', my replication IDs are 'e24f6e42917e7c162ec45a713b0ee3872005ee8b' and '0000000000000000000000000000000000000000')

Analysis of the 6381 slave node log

31771:S 06 Aug 2019 12:21:40.213 * DB loaded from disk: 0.000 seconds
31771:S 06 Aug 2019 12:21:40.213 * Before turning into a replica, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
#Start successfully
31771:S 06 Aug 2019 12:21:40.213 * Ready to accept connections
#Start to connect the master node
31771:S 06 Aug 2019 12:21:40.214 * Connecting to MASTER 127.0.0.1:6380
#Start syncing
31771:S 06 Aug 2019 12:21:40.214 * MASTER <-> REPLICA sync started
31771:S 06 Aug 2019 12:21:40.214 * Non blocking connect for SYNC fired the event.
31771:S 06 Aug 2019 12:21:40.214 * Master replied to PING, replication can continue...
#Try incremental synchronization
31771:S 06 Aug 2019 12:21:40.214 * Trying a partial resynchronization (request 668b25f85e84c5900e1032e4b5e1f038f01cfa49:5895).
#Full synchronization
31771:S 06 Aug 2019 12:21:40.215 * Full resync from master: c88cd043d66193e867929d9d5fadc952954371e5:0
31771:S 06 Aug 2019 12:21:40.215 * Discarding previously cached master state.
31771:S 06 Aug 2019 12:21:40.240 * MASTER <-> REPLICA sync: receiving 224 bytes from master
31771:S 06 Aug 2019 12:21:40.241 * MASTER <-> REPLICA sync: Flushing old data
31771:S 06 Aug 2019 12:21:40.241 * MASTER <-> REPLICA sync: Loading DB in memory
31771:S 06 Aug 2019 12:21:40.241 * MASTER <-> REPLICA sync: Finished with success
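Every log line above has the same layout: process ID, a role letter, a timestamp, a level mark, and the message. A minimal parser for that layout might look like the sketch below; the regular expression is inferred from the lines shown in this article rather than from a format specification, and parse_log_line is a hypothetical helper name. Parsing the month name with %b assumes an English locale.

```python
import re
from datetime import datetime

# Layout inferred from the log lines in this article.
LOG_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)

ROLES = {"M": "master", "S": "slave", "C": "child process"}


def parse_log_line(line: str):
    """Split one log line into its fields; return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m["pid"]),
        "role": ROLES.get(m["role"], m["role"]),
        "time": datetime.strptime(m["ts"], "%d %b %Y %H:%M:%S.%f"),
        "level": m["level"],
        "msg": m["msg"],
    }


if __name__ == "__main__":
    start = parse_log_line(
        "31771:S 06 Aug 2019 12:21:40.215 * Full resync from master: "
        "c88cd043d66193e867929d9d5fadc952954371e5:0"
    )
    end = parse_log_line(
        "31771:S 06 Aug 2019 12:21:40.241 * MASTER <-> REPLICA sync: Finished with success"
    )
    print("full resync took", end["time"] - start["time"])
```

Subtracting two parsed timestamps gives the elapsed time between log entries, which is how the duration of a synchronization phase can be estimated.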

Full replication

  • Full replication is the earliest replication mode supported by Redis, and it is the stage every master-slave pair must go through when replication is first established. The commands that trigger full replication are sync and PSYNC.
      1. The slave node sends the PSYNC command to synchronize data. Because this is the first replication, the slave node has neither a replication offset nor the master node's run ID, so it sends PSYNC ? -1.
      2. From PSYNC ? -1, the master node determines that this is a full replication and replies with +FULLRESYNC {runid} {offset}.
      3. The slave node receives the response from the master node and saves the run ID and offset.
      4. The master node executes bgsave to save an RDB file locally.
31651:M 06 Aug 2019 11:08:40.802 * Starting BGSAVE for SYNC with target: disk
31651:M 06 Aug 2019 11:08:40.802 * Background saving started by pid 31676
31676:C 06 Aug 2019 11:08:40.805 * DB saved on disk
31676:C 06 Aug 2019 11:08:40.806 * RDB: 0 MB of memory used by copy-on-write
31651:M 06 Aug 2019 11:08:40.886 * Background saving terminated with success
31651:M 06 Aug 2019 11:08:40.886 * Synchronization with replica 127.0.0.1:6381 succeeded
      5. The master node sends the RDB file to the slave node, which saves it locally as its own data file. After receiving the RDB, the slave node prints the relevant log:
31645:S 06 Aug 2019 11:08:40.886 * MASTER <-> REPLICA sync: receiving 224 bytes from master
      6. From the moment the slave node starts receiving the RDB snapshot until it finishes, the master node still serves read and write commands. The master node therefore stores the write commands from this period in the replication client buffer. After the slave node has loaded the RDB file, the master node sends the buffered data to the slave node to keep the master and slave data consistent.
    • The buffer limit is configured in redis.conf:
client-output-buffer-limit replica 256mb 64mb 60
    • If the master node takes too long to create and transfer the RDB, the replication client buffer on the master node can easily overflow under heavy write traffic. With the default configuration shown above, if buffer usage stays above 64MB for 60 seconds or exceeds 256MB outright, the master node closes the replication client connection and the full synchronization fails.
    • For the master node, full replication is considered complete once all the data has been sent:
31651:M 06 Aug 2019 11:08:40.886 * Synchronization with replica 127.0.0.1:6381 succeeded
      7. After receiving all the data from the master node, the slave node clears its old data.
31645:S 06 Aug 2019 11:08:40.886 * MASTER <-> REPLICA sync: Flushing old data
      8. After clearing its data, the slave node loads the RDB file. For large RDB files this step is still time-consuming. The total RDB loading time can be calculated from the time difference between the log entries.
31645:S 06 Aug 2019 11:08:40.886 * MASTER <-> REPLICA sync: Loading DB in memory
31645:S 06 Aug 2019 11:08:40.886 * MASTER <-> REPLICA sync: Finished with success
      9. After the slave node has loaded the RDB successfully, if AOF persistence is enabled on it, it immediately runs bgrewriteaof so that the AOF file is usable right after full replication.
  • Why full replication is time-consuming:
    • bgsave time on the master node
    • Network transfer time of the RDB file
    • Time for the slave node to clear its old data
    • Time for the slave node to load the RDB file
    • Possible AOF rewrite time
  • Since Redis 3.0, each log line carries a role identifier after the process ID:
    • M: log from the master node
    • S: log from a slave node
    • C: log from a child process
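The client-output-buffer-limit replica 256mb 64mb 60 rule from the full replication steps (a hard limit, a soft limit, and a soft-limit duration) can be sketched as follows. This is a simplified model written for illustration: OutputBufferLimit is a hypothetical class, and the exact boundary handling (greater-or-equal for sizes, strictly greater for time) is an assumption.

```python
from typing import Optional

MB = 1024 * 1024


class OutputBufferLimit:
    """Toy model of `client-output-buffer-limit replica <hard> <soft> <seconds>`."""

    def __init__(self, hard: int = 256 * MB, soft: int = 64 * MB, soft_seconds: float = 60):
        self.hard = hard
        self.soft = soft
        self.soft_seconds = soft_seconds
        self.soft_since: Optional[float] = None  # when usage first reached the soft limit

    def should_disconnect(self, used_bytes: int, now: float) -> bool:
        """Decide whether the master would close the replication client."""
        if used_bytes >= self.hard:
            return True                  # hard limit: close immediately
        if used_bytes < self.soft:
            self.soft_since = None       # back under the soft limit: reset the timer
            return False
        if self.soft_since is None:
            self.soft_since = now        # soft limit reached: start timing
        return now - self.soft_since > self.soft_seconds


if __name__ == "__main__":
    limit = OutputBufferLimit()
    for t, used in [(0, 100 * MB), (30, 100 * MB), (61, 100 * MB)]:
        print(t, limit.should_disconnect(used, now=float(t)))
```

The timer only matters while usage stays at or above the soft limit; a single dip below it restarts the count.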

Partial replication

  • Partial replication is an optimization Redis introduced to reduce the high overhead of full replication. It is implemented with PSYNC {runid} {offset}. If a network interruption or command loss occurs while a slave node is replicating from the master node, the slave node asks the master node to resend the lost command data. If that data is still in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps master and slave consistent. The resent data is generally far smaller than the full dataset.
      1. When the network between the master and slave nodes is interrupted for longer than repl-timeout, the master node considers the slave node to have failed and closes the replication connection.
31767:M 06 Aug 2019 14:13:26.096 # Connection with replica 127.0.0.1:6381 lost.
      2. While the master-slave connection is down, the master node still serves commands, but because the replication connection is interrupted, the commands cannot be sent to the slave node. The replication backlog buffer inside the master node still holds the most recent write command data. By default it caches at most 1MB, which can be checked with info replication.
      3. When the slave node's network is restored, the slave node connects to the master node again.
Slave node log:
31934:S 06 Aug 2019 14:20:54.745 * MASTER <-> REPLICA sync started
31934:S 06 Aug 2019 14:20:54.745 * Non blocking connect for SYNC fired the event.
31934:S 06 Aug 2019 14:20:54.745 * Master replied to PING, replication can continue...
31934:S 06 Aug 2019 14:20:54.745 * Trying a partial resynchronization (request c88cd043d66193e867929d9d5fadc952954371e5:9996).
31934:S 06 Aug 2019 14:20:54.746 * Successful partial resynchronization with master.
31934:S 06 Aug 2019 14:20:54.746 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

Master node log:
31767:M 06 Aug 2019 14:21:49.065 * Replica 127.0.0.1:6381 asks for synchronization
31767:M 06 Aug 2019 14:21:49.066 * Partial resynchronization request from 127.0.0.1:6381 accepted. Sending 0 bytes of backlog starting from offset 10066.
      4. Once the master-slave connection is restored, the slave node sends the replication offset and master run ID it saved earlier as PSYNC parameters to the master node, requesting partial replication.
31938:S 06 Aug 2019 14:21:49.065 * Trying a partial resynchronization (request c88cd043d66193e867929d9d5fadc952954371e5:10066).
      5. After receiving the PSYNC command, the master node first checks whether the runid parameter matches its own. If it matches, the slave node was previously replicating from this master node. The master node then looks up its replication backlog buffer using the offset parameter. If the data after that offset is still in the buffer, it sends a +CONTINUE response to the slave node, indicating that partial replication is possible. The slave node logs the response as follows:
31938:S 06 Aug 2019 14:21:49.066 * Successful partial resynchronization with master.
31938:S 06 Aug 2019 14:21:49.066 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
      6. The master node sends the data in the replication backlog buffer to the slave node, starting from the offset, so that master-slave replication returns to its normal state. The amount of data sent can be read from the master node's log:
31767:M 06 Aug 2019 14:21:49.065 * Replica 127.0.0.1:6381 asks for synchronization
31767:M 06 Aug 2019 14:21:49.066 * Partial resynchronization request from 127.0.0.1:6381 accepted. Sending 0 bytes of backlog starting from offset 10066.
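The master node's choice between partial and full replication can be sketched as one function. This is a simplified model, not the Redis implementation: it ignores the second replication ID (master_replid2), handle_psync is a hypothetical name, and the range check assumes the slave asks for the next byte it needs. The example values are taken from the logs above, except first_byte_offset=1 and master_repl_offset=10065, which are assumptions chosen to be consistent with "Sending 0 bytes of backlog starting from offset 10066".

```python
def handle_psync(my_replid: str, first_byte_offset: int, master_repl_offset: int,
                 req_replid: str, req_offset: int):
    """Return (reply, bytes_of_backlog_to_send) for PSYNC <replid> <offset>."""
    # The requested offset must still be inside the backlog, or be exactly
    # the next byte the master will write (nothing was missed).
    in_backlog = first_byte_offset <= req_offset <= master_repl_offset + 1
    if req_replid == my_replid and in_backlog:
        return ("CONTINUE", master_repl_offset + 1 - req_offset)
    return ("FULLRESYNC", 0)


if __name__ == "__main__":
    master_id = "c88cd043d66193e867929d9d5fadc952954371e5"
    # Reconnect after a short interruption: same ID, offset still available.
    print(handle_psync(master_id, 1, 10065, master_id, 10066))
    # First replication: the slave knows neither the ID nor an offset.
    print(handle_psync(master_id, 1, 10065, "?", -1))
```

Any mismatch in the ID, or an offset that has already been pushed out of the backlog, falls through to full replication.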

Heartbeat

  • After the master and slave nodes establish replication, they keep a long-lived connection and send heartbeat commands to each other
  • The master-slave heartbeat mechanism works as follows:
      1. Both the master and slave nodes have a heartbeat mechanism. Each acts as a client of the other when communicating. In client list, the master node's connection shows flags=M and a slave node's connection shows flags=S.
      2. By default, the master node sends a ping command to the slave node every 10 seconds to check whether the slave node is alive and connected. The interval is controlled by repl-ping-replica-period 10.
      3. The slave node sends the replconf ACK {offset} command every second in its main thread to report its current replication offset to the master node. The master node uses the replconf command to track how long it has been since the slave node last responded, which appears as lag in the info replication statistics. lag is the number of seconds since the slave node's last communication, and normally stays between 0 and 1. If it exceeds the repl-timeout setting (60 seconds by default), the master node marks the slave node as offline and disconnects the replication client. Even after the master node has marked the slave node as offline, heartbeat detection resumes if the slave node recovers.
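The master-side bookkeeping behind lag and the repl-timeout check can be modelled roughly as below. ReplicaLink and its method names are hypothetical, and time is passed in explicitly so the example is deterministic; this is a sketch of the rule described above, not Redis code.

```python
class ReplicaLink:
    """Master-side view of one slave connection (toy model, not Redis code)."""

    def __init__(self, repl_timeout: float = 60.0):
        self.repl_timeout = repl_timeout  # corresponds to repl-timeout
        self.ack_offset = 0               # last offset reported by the slave
        self.last_ack_time = 0.0          # when the last ACK arrived

    def on_replconf_ack(self, offset: int, now: float) -> None:
        """Record a `replconf ACK {offset}` heartbeat from the slave."""
        self.ack_offset = offset
        self.last_ack_time = now

    def lag(self, now: float) -> int:
        """Whole seconds since the last ACK, like `lag` in info replication."""
        return int(now - self.last_ack_time)

    def is_offline(self, now: float) -> bool:
        """True once the slave has been silent for longer than repl_timeout."""
        return now - self.last_ack_time > self.repl_timeout


if __name__ == "__main__":
    link = ReplicaLink()
    link.on_replconf_ack(10066, now=100.0)
    print(link.lag(now=100.4), link.is_offline(now=161.0))
```

A new ACK after a timeout simply resets last_ack_time, which mirrors the point that heartbeat detection continues once the slave recovers.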

Asynchronous replication

  • The master node not only handles reads and writes but also synchronizes write commands to the slave nodes. Write commands are sent asynchronously: after processing a write command, the master node returns the result to the client immediately and does not wait for replication to the slave nodes to complete.
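The consequence of asynchronous sending is that a slave can briefly lag behind a write the client has already seen acknowledged. The sketch below illustrates this with two in-memory objects; ToyMaster and ToySlave are hypothetical names and no networking is involved.

```python
from collections import deque


class ToyMaster:
    """In-memory stand-in for a master node (illustration only)."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # write commands not yet delivered to the slave

    def set(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"             # replied before the slave has the write


class ToySlave:
    """In-memory stand-in for a slave node (illustration only)."""

    def __init__(self):
        self.data = {}

    def apply_from(self, master: ToyMaster) -> None:
        """Apply every write the master has queued, in order."""
        while master.pending:
            key, value = master.pending.popleft()
            self.data[key] = value


if __name__ == "__main__":
    master, slave = ToyMaster(), ToySlave()
    print(master.set("k", "v"))   # the client already has its reply
    print(slave.data.get("k"))    # the slave has not applied the write yet
    slave.apply_from(master)
    print(slave.data.get("k"))
```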

Read-write separation

  • For read-heavy workloads, part of the read traffic can be directed to the slave nodes to reduce the pressure on the master node. Write operations must still go only to the master node
  • Before implementing read-write separation, consider whether Redis Cluster or another distributed solution would be a better fit
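A client-side router for this pattern could look like the following sketch. It is an assumption-laden illustration: ReadWriteRouter is a hypothetical class, READ_COMMANDS is a small illustrative subset, and the node addresses are plain strings rather than real connections. Anything not known to be a read goes to the master, which is the safer default.

```python
from itertools import cycle

# Illustrative subset of read-only commands; everything else goes to the master.
READ_COMMANDS = {"GET", "MGET", "EXISTS", "TTL", "LRANGE", "HGET"}


class ReadWriteRouter:
    """Pick a node address for a command: reads to slaves, the rest to master."""

    def __init__(self, master: str, slaves: list):
        self.master = master
        self._slaves = cycle(slaves) if slaves else None  # round-robin over slaves

    def route(self, command: str) -> str:
        parts = command.split()
        name = parts[0].upper() if parts else ""
        if name in READ_COMMANDS and self._slaves is not None:
            return next(self._slaves)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("127.0.0.1:6380", ["127.0.0.1:6381"])
    print(router.route("SET k v"))
    print(router.route("GET k"))
```

Because replication is asynchronous, a read routed to a slave may return slightly stale data, so reads that must see the latest write should still go to the master.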
