Redis replication process details

Time:2020-9-11

Redis replication consists of two steps: synchronization and command propagation.

  • Synchronization brings the database state of the slave server up to the current database state of the master server.
  • Command propagation restores consistency when a modification on the master server has left the master and slave databases in different states.

Synchronization

Redis uses the PSYNC command to synchronize data between master and slave. Synchronization is either a full replication or a partial replication.

Full replication: generally used for the initial replication. The master node sends all of its data to the slave node in one pass. When the data set is large, this is costly for both nodes and for the network.

Partial replication: used to recover data lost when the network drops briefly during master-slave replication. When the slave node reconnects to the master node, the master node resends the missing data if conditions permit. Because the resent data is far smaller than the full data set, this avoids the high overhead of full replication.

The PSYNC command relies on the following components:

  • The replication offsets kept separately by the master and slave nodes
  • The replication backlog buffer of the master node
  • The run ID of the master node

Every node taking part in replication maintains its own replication offset. After processing a write command, the master node adds the byte length of the command to a running total, which is reported as the master_repl_offset field of info replication.
After receiving a command from the master node, the slave node also advances its own offset, and it reports that offset to the master node every second.
Comparing the replication offsets of the master and slave nodes shows whether their data is consistent.
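
As a rough illustration of that comparison, the sketch below parses the text of info replication as seen on a master and computes how many bytes each slave is behind. The function names and the sample text are made up for this example; the sample was not captured from a real server.

```python
def parse_info_replication(text):
    """Parse the text of info replication into a dict of raw string fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def slave_lag_bytes(fields):
    """Return {slave name: bytes behind the master} from master-side fields."""
    master_offset = int(fields["master_repl_offset"])
    lags = {}
    for key, value in fields.items():
        # Slave lines look like: slave0:ip=...,port=...,state=...,offset=...,lag=...
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            lags[key] = master_offset - int(attrs["offset"])
    return lags


# Illustrative sample only (hypothetical values).
SAMPLE = """\
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=1055130,lag=1
master_repl_offset:1055214
"""

lags = slave_lag_bytes(parse_info_replication(SAMPLE))
print(lags)
```

A difference of zero means the slave has received everything the master has written so far; a small, short-lived difference is normal because replication is asynchronous.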

The replication backlog buffer is a fixed-length queue stored on the master node. Its default size is 1MB, and it is created once the master node has connected slaves. When the master node handles a write command, it not only sends the command to the slave nodes but also writes it to the replication backlog buffer.

Because the size of the replication backlog buffer is limited, it holds only the most recently replicated data. That data is what makes recovery possible during partial replication or when replicated commands are lost.
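
To make the "fixed length, most recent data only" behaviour concrete, here is a minimal ring buffer sketch. It is not Redis's implementation: the class name, method names and the offset convention (an offset is simply the count of bytes already received) are assumptions chosen for this example.

```python
class ReplicationBacklog:
    """Fixed-size ring buffer that keeps only the most recent bytes."""

    def __init__(self, size=1024 * 1024):
        self.size = size
        self.buf = bytearray(size)
        self.master_offset = 0  # total number of bytes ever written
        self.histlen = 0        # number of valid bytes currently held

    def feed(self, data):
        """Append replicated bytes, overwriting the oldest ones when full."""
        for b in data:
            self.buf[self.master_offset % self.size] = b
            self.master_offset += 1
        self.histlen = min(self.size, self.histlen + len(data))

    def first_offset(self):
        """Offset of the oldest byte still held in the buffer."""
        return self.master_offset - self.histlen

    def read_from(self, offset):
        """Return the bytes from offset onward, or None if they are gone."""
        if offset < self.first_offset() or offset > self.master_offset:
            return None
        return bytes(self.buf[i % self.size]
                     for i in range(offset, self.master_offset))


backlog = ReplicationBacklog(size=8)
backlog.feed(b"abcdef")
backlog.feed(b"ghij")        # overwrites "a" and "b"
print(backlog.read_from(6))  # data a slave at offset 6 is missing
print(backlog.read_from(1))  # too old: no longer in the buffer
```

When read_from returns None, the requested data has already been overwritten, which is the situation in which a partial replication is not possible.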

After starting, each Redis node dynamically assigns itself a 40-character hexadecimal string as its run ID. The main purpose of the run ID is to identify the Redis node uniquely. For example, the slave node saves the run ID of the master node so that it knows which master node it is replicating.
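
Only to show what such an identifier looks like, the snippet below generates a random 40-character hexadecimal string. The function name is hypothetical, and Redis uses its own generator internally; this is just the same format.

```python
import secrets


def generate_run_id():
    # 20 random bytes are rendered as 40 hexadecimal characters.
    return secrets.token_hex(20)


run_id = generate_run_id()
print(run_id, len(run_id))
```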

Full synchronization

Execution of the slaveof command

  • 1) The slave node sends the PSYNC command to synchronize data. Because this is the first replication, the slave node has neither a replication offset nor the master node's run ID, so the command it sends is PSYNC ? -1.
  • 2) From PSYNC ? -1 the master node determines that a full replication is required and replies with +FULLRESYNC.
  • 3) The slave node receives the response from the master node and saves the run ID and the offset.
  • 4) The master node executes bgsave to save an RDB file locally. For RDB knowledge, please refer to redis RDB persistence details.
  • 5) The master node sends the RDB file to the slave node, and the slave node saves the received RDB file locally and uses it directly as its own data file. After receiving the RDB file, the slave node prints a log entry that shows the amount of data sent by the master node.

Pay attention to master nodes that hold a large amount of data, for example when the generated RDB file exceeds 6GB. If transferring the RDB file takes longer than the value configured for repl-timeout, the slave node gives up receiving the RDB file and cleans up the temporary file it has already downloaded, so the full replication fails.

  • 6) From the moment the master node starts saving the RDB snapshot until the slave node has received it, the master node keeps serving read and write commands. The master node therefore saves the write commands issued during this period in the replication client buffer. After the slave node has loaded the RDB file, the master node sends the data in this buffer to the slave node so that the master and slave data stay consistent.

If the master node takes too long to create and transfer the RDB file, its replication client buffer may overflow. The default configuration is client-output-buffer-limit slave 256mb 64mb 60. If the buffer usage stays above 64MB for 60 seconds continuously, or exceeds 256MB at any moment, the master node closes the replication client connection directly and the full synchronization fails.
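
The hard limit and soft limit rule can be sketched as follows. This is a simplified model, not Redis source code: the class and method names are hypothetical, and the caller supplies the current time in seconds.

```python
MB = 1024 * 1024


class OutputBufferLimit:
    """Model of a 'hard soft seconds' output buffer limit."""

    def __init__(self, hard_bytes, soft_bytes, soft_seconds):
        self.hard_bytes = hard_bytes
        self.soft_bytes = soft_bytes
        self.soft_seconds = soft_seconds
        self.soft_since = None  # time at which usage first reached the soft limit

    def should_close(self, used_bytes, now):
        # Hard limit: close as soon as it is reached.
        if used_bytes >= self.hard_bytes:
            return True
        # Soft limit: close only if usage has stayed above it for too long.
        if used_bytes >= self.soft_bytes:
            if self.soft_since is None:
                self.soft_since = now
            return now - self.soft_since > self.soft_seconds
        # Usage dropped below the soft limit, so the timer restarts.
        self.soft_since = None
        return False


limit = OutputBufferLimit(256 * MB, 64 * MB, 60)
print(limit.should_close(100 * MB, now=0))    # above soft limit, timer starts
print(limit.should_close(100 * MB, now=61))   # still above after 61 seconds
print(limit.should_close(300 * MB, now=62))   # above hard limit
```

For a master with a large data set or a heavy write load, this is the reason the limit is usually raised before a full replication is attempted.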

  • 7) After receiving all the data from the master node, the slave node clears its old data. This step is recorded in the slave node's log.
  • 8) After clearing its data, the slave node loads the RDB file. For larger RDB files this step is also time-consuming. The total RDB loading time can be worked out from the time difference between the log entries.
  • 9) The master server that receives the synchronization command executes the bgsave command, generates an RDB file in the background, and uses a buffer to record every write command executed from that point on.
  • 10) When the bgsave command on the master server completes, the master server sends the RDB file generated by bgsave to the slave server. The slave server receives and loads the RDB file, which brings its database to the state the master server had when it executed bgsave.
  • 11) The master server sends all the write commands recorded in the buffer to the slave server. The slave server executes these write commands, which brings its database to the current state of the master server.
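
The snapshot plus buffered writes idea behind the steps above can be shown with a toy in-memory model. Everything here is an assumption for illustration: the Master and Slave classes, their method names, and the use of a plain dict copy as a stand-in for bgsave and the RDB file.

```python
class Master:
    def __init__(self):
        self.db = {}
        self.pending = None  # writes buffered while a snapshot is in flight

    def write(self, key, value):
        self.db[key] = value
        if self.pending is not None:
            self.pending.append((key, value))

    def start_full_sync(self):
        """Stand-in for bgsave: copy the data set and start buffering writes."""
        self.pending = []
        return dict(self.db)

    def finish_full_sync(self):
        """Hand over the writes buffered since the snapshot was taken."""
        buffered, self.pending = self.pending, None
        return buffered


class Slave:
    def __init__(self):
        self.db = {"stale": "x"}

    def load_snapshot(self, snapshot):
        self.db.clear()            # clear old data first
        self.db.update(snapshot)   # then load the snapshot

    def apply(self, writes):
        for key, value in writes:
            self.db[key] = value


master, slave = Master(), Slave()
master.write("a", 1)
snapshot = master.start_full_sync()
master.write("b", 2)  # arrives while the snapshot is being transferred
master.write("a", 3)
slave.load_snapshot(snapshot)
slave.apply(master.finish_full_sync())
print(slave.db == master.db)
```

Without the buffered writes, the slave would end up with the state at snapshot time only and would miss everything written during the transfer.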

By analyzing all processes of full replication, readers will find that full replication is a very time-consuming and laborious operation. Its time cost mainly includes:

  • Time for the master node to run bgsave
  • Time to transfer the RDB file over the network
  • Time for the slave node to clear its old data
  • Time for the slave node to load the RDB file
  • Possible AOF rewrite time

Full synchronization not only takes a long time, it also involves several persistence operations and a large network transfer, which consume a lot of CPU, memory and network resources on the servers hosting the master and slave nodes. Full synchronization cannot be avoided for the first replication, but in every other scenario it should be avoided in favor of partial synchronization.

Partial synchronization

Partial replication is the optimization Redis provides against the high overhead of full replication. It is implemented with PSYNC {runid} {offset}. If the network drops briefly or commands are lost while the slave node is replicating the master node, the slave node asks the master node to resend the missing command data. If that data is still present in the master node's replication backlog buffer, it is sent directly to the slave node, which keeps master and slave consistent. The resent data is generally far smaller than the full data set, so the cost is very small.

  • 1) When the network between the master and slave nodes is interrupted for longer than repl-timeout, the master node considers the slave node to have failed and breaks the replication connection.
  • 2) While the master-slave connection is interrupted, the master node still responds to commands, but because the replication connection is down those commands cannot be sent to the slave node. The replication backlog buffer on the master node still keeps the most recent write command data. Its maximum size is 1MB by default.
  • 3) When the master-slave network is restored, the slave node connects to the master node again.
  • 4) Once the connection is restored, the slave node sends the replication offset and the master run ID it saved earlier to the master node as PSYNC parameters, requesting a partial replication.
  • 5) After receiving the PSYNC command, the master node first checks whether the runid parameter matches its own run ID. If it matches, the slave node was previously replicating this master node. The master node then looks in its replication backlog buffer using the offset parameter. If the data after that offset is still in the buffer, it sends a +CONTINUE response to the slave node, indicating that partial replication is possible.
  • 6) The master node sends the data in the replication backlog buffer to the slave node starting from the offset, after which master-slave replication returns to its normal state.
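
The decision the master node makes in step 5 can be sketched as below. This is a simplified model rather than the real protocol handler: the function names are hypothetical, replies are plain strings, and the offset is treated simply as the position from which the slave needs data.

```python
def build_psync(master_run_id=None, offset=None):
    """Command a slave would send: first replication or reconnection."""
    if master_run_id is None:
        return "PSYNC ? -1"
    return f"PSYNC {master_run_id} {offset}"


def handle_psync(command, my_run_id, backlog_start, master_offset):
    """Master-side choice between partial and full replication."""
    _, run_id, offset = command.split()
    offset = int(offset)
    same_master = run_id == my_run_id
    in_backlog = backlog_start <= offset <= master_offset
    if same_master and in_backlog:
        return "+CONTINUE"
    # Unknown run ID or data already evicted from the backlog.
    return f"+FULLRESYNC {my_run_id} {master_offset}"


RUN_ID = "a" * 40  # placeholder run ID for the example

print(handle_psync(build_psync(), RUN_ID, 100, 500))
print(handle_psync(build_psync(RUN_ID, 300), RUN_ID, 100, 500))
print(handle_psync(build_psync(RUN_ID, 50), RUN_ID, 100, 500))
```

The third call shows why the size of the replication backlog buffer matters: an offset that has already fallen out of the buffer forces a full replication even though the run ID matches.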

Heartbeat detection

After the master and slave nodes establish replication, they maintain long connections and send heartbeat commands to each other, as shown in the figure below.

The judgment mechanism of master-slave heartbeat is as follows:

  • 1) Both the master and slave nodes have a heartbeat detection mechanism. Each acts as a client of the other for communication. In the output of the client list command, the connection to the master node is shown with flags=M and the connection to a slave node with flags=S.
  • 2) By default, the master node sends a ping command to the slave node every 10 seconds to check whether the slave node is alive and connected. The sending frequency is controlled by the repl-ping-slave-period parameter.
  • 3) The slave node sends the replconf ack {offset} command from its main thread every second to report its current replication offset to the master node.

The replconf command not only monitors the network status between master and slave in real time, it also reports the replication offset of the slave node. The master node uses the offset reported by the slave node to check whether replicated data has been lost. If the slave node has lost data, the master node fetches the missing data from its replication backlog buffer and sends it to the slave node.
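
A minimal sketch of the bookkeeping a master could do with these reports is shown below. The HeartbeatMonitor class and its methods are invented for this example, and the 60 second value is assumed to match the default repl-timeout.

```python
class HeartbeatMonitor:
    """Tracks the last offset and report time received from each slave."""

    def __init__(self, repl_timeout=60):
        self.repl_timeout = repl_timeout
        self.slaves = {}  # slave id -> (acknowledged offset, time of report)

    def on_replconf_ack(self, slave_id, offset, now):
        self.slaves[slave_id] = (offset, now)

    def bytes_behind(self, slave_id, master_offset):
        """How much data the slave has not yet acknowledged."""
        return master_offset - self.slaves[slave_id][0]

    def timed_out(self, now):
        """Slaves that have been silent for longer than repl_timeout."""
        return [sid for sid, (_, seen) in self.slaves.items()
                if now - seen > self.repl_timeout]


monitor = HeartbeatMonitor(repl_timeout=60)
monitor.on_replconf_ack("slave-1", offset=1000, now=100.0)
print(monitor.bytes_behind("slave-1", master_offset=1200))
print(monitor.timed_out(now=130.0))
print(monitor.timed_out(now=161.0))
```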

Asynchronous replication and command propagation

The master node is responsible not only for reading and writing data but also for synchronizing write commands to the slave nodes. Write commands are sent asynchronously: after processing a write command, the master node returns the result to the client immediately and does not wait for the slave nodes to finish replicating it.

This asynchronous process is handled by command propagation, which not only sends write commands to all slave servers, but also queues them into the replication backlog buffer.
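
The asynchronous behaviour can be illustrated with a toy master that replies to the client before any slave has applied the write. The class and its methods are hypothetical, commands are encoded as plain text rather than in the real wire protocol, and the backlog here is unbounded for brevity.

```python
from collections import deque


class PropagatingMaster:
    def __init__(self):
        self.db = {}
        self.backlog = bytearray()  # stand-in for the replication backlog buffer
        self.slave_queues = {}      # slave name -> commands waiting to be sent

    def attach_slave(self, name):
        self.slave_queues[name] = deque()

    def set(self, key, value):
        self.db[key] = value
        payload = f"SET {key} {value}\r\n".encode()
        # Propagation: record in the backlog and queue for every slave.
        self.backlog += payload
        for queue in self.slave_queues.values():
            queue.append(payload)
        # The reply goes out now; no slave has applied the write yet.
        return "OK"


m = PropagatingMaster()
m.attach_slave("slave-1")
reply = m.set("k", "v")
print(reply, len(m.slave_queues["slave-1"]), bytes(m.backlog))
```

Because the reply does not wait for the queues to drain, a slave can briefly lag behind the master, which is exactly the gap that the replication offsets make visible.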
