In-depth analysis of the Redis high availability series: AOF and RDB persistence


Welcome to follow the official account “code farmer rich brother”, dedicated to sharing original articles and interview guides on back-end technology (high-concurrency architecture, distributed cluster systems, message queue middleware, networking, microservices, Linux, TCP/IP, HTTP, MySQL, Redis) and Python.


Overview of Redis high availability

Before introducing Redis high availability, let's first clarify what high availability means in the context of Redis.

For web servers, high availability refers to the proportion of time during which the server can be accessed normally, measured by how long it provides normal service (99.9%, 99.99%, 99.999%, and so on). In the context of Redis the meaning is broader. Besides keeping the service running normally (for example through master-slave separation and fast disaster recovery), we also need to consider data capacity expansion and protection against data loss.
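To make the availability percentages above concrete, the following minimal sketch converts each target into a yearly downtime budget. The function name is made up for illustration, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: leap years are ignored


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


for target in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each extra nine cuts the allowed downtime by a factor of ten, from a few hours per year down to a few minutes.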

In Redis, the technologies used to achieve high availability are mainly persistence, replication, Sentinel, and Cluster. Their functions and the problems they solve are described below.

  • Persistence: persistence is the simplest high availability method (sometimes it is not even classified as one). Its main function is data backup: data is stored on disk so that it is not lost when the process exits.
  • Replication: replication is the foundation of highly available Redis; Sentinel and Cluster both build on it. Replication mainly provides multi-machine backup of data, load balancing of read operations, and simple failure recovery. Defects: failure recovery cannot be automated; write operations cannot be load balanced; storage capacity is limited by a single machine.
  • Sentinel: on top of replication, Sentinel provides automatic failure recovery. Defects: write operations cannot be load balanced; storage capacity is limited by a single machine.
  • Cluster: with Cluster, Redis solves the problems of write load balancing and single-machine storage limits, giving a fairly complete high availability solution.

Overview of Redis persistence

All Redis data lives in memory. If the process goes down suddenly, all of it is lost, so there must be a mechanism that keeps Redis data from being lost on failure. That mechanism is Redis persistence.

Redis provides two persistence methods:

  • RDB: stores a snapshot of your data at specified time intervals.
  • AOF: records every write operation received by the server. When the server restarts, these commands are re-executed to recover the original data.

Because AOF persistence is closer to real time, that is, less data is lost when the process exits unexpectedly, AOF is currently the mainstream persistence method, but RDB persistence still has its place.

RDB persistence and AOF persistence are introduced in turn below.

RDB persistence

RDB is the default persistence method. According to a configured policy, it periodically generates a snapshot of the data in memory and saves it to disk.

Each snapshot writes the full in-memory data set to disk; it does not incrementally sync only the dirty data. If the data set is large and writes are frequent, this inevitably causes a lot of disk I/O, which may seriously affect performance.

1. Working principle

  • Redis calls fork() to create a child process.
  • The child process writes the data to a temporary RDB file.
  • When the child process finishes writing the new RDB file, it replaces the old RDB file with it.
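The three steps above can be sketched in Python roughly as follows. This is only an illustration of the fork, temporary file, and rename pattern, not Redis's implementation: the function name is hypothetical, JSON stands in for the binary RDB format, and os.fork() restricts the sketch to Unix-like systems.

```python
import json
import os
import tempfile


def background_save(data: dict, target_path: str) -> int:
    """Fork a child that dumps `data` to a temp file and renames it over target_path.

    Returns the child's exit status (0 on success).
    """
    pid = os.fork()
    if pid == 0:
        # Child process: it sees the parent's memory as it was at fork time.
        status = 1
        try:
            directory = os.path.dirname(os.path.abspath(target_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="temp-", suffix=".rdb")
            with os.fdopen(fd, "w") as tmp:
                json.dump(data, tmp)      # JSON is a stand-in for the RDB format
                tmp.flush()
                os.fsync(tmp.fileno())    # make sure the bytes reach the disk
            os.replace(tmp_path, target_path)  # atomic rename on POSIX
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code path
    # Parent process: a real server would keep serving clients; the sketch just waits.
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else -1
```

Writing to a temporary file first means a crash in the middle of a save leaves the previous snapshot untouched.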

2. Trigger mechanism

RDB persistence can be triggered manually or automatically.

  1. save command (manual trigger)

When a client sends the save command to the Redis server, the server blocks until the RDB file has been created, because Redis processes all requests on a single main thread. While it is blocked, the server cannot process any other command request, so save is not recommended in production.

  2. bgsave command (manual trigger)

Unlike save, bgsave runs asynchronously. When bgsave is executed, the Redis main process forks a child process that saves the data to an RDB file. Once the data is written, the child replaces the original file and then notifies the main process that the save is complete.

  3. Automatic trigger

Besides manual triggering, Redis can also trigger RDB persistence automatically.

A save m n line in the configuration file means that bgsave is triggered automatically when the data set has been modified at least n times within m seconds.

3. RDB automatic persistence configuration

#Time strategy
save 900 1
save 300 10
save 60 10000

#File name
dbfilename dump.rdb

#File save path
dir /etc/redis/data/

#Whether the main process stops accepting writes when a background save fails
stop-writes-on-bgsave-error yes

#Whether to compress the RDB file
rdbcompression yes

#Whether to verify the checksum when loading
rdbchecksum yes
The RDB persistence configuration is fairly simple. The options are explained below:

save 900 1: if at least 1 write occurs within 900 seconds, a snapshot (which can be understood as a backup) is triggered.
save 300 10: if at least 10 writes occur within 300 seconds, a snapshot is generated.
The remaining rule works the same way. Why configure so many rules? Because the read and write load on Redis is never evenly distributed over time. To balance performance against data safety, we can freely customize when a backup is triggered, so configure the rules according to your own Redis write pattern.
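As a rough illustration of how several save rules combine, the sketch below treats a snapshot as due when any one rule is satisfied. The function and variable names are made up for this example, and the exact comparison operators are an assumption rather than a copy of the Redis source.

```python
from typing import List, Tuple

# (seconds, changes) pairs mirroring: save 900 1 / save 300 10 / save 60 10000
SAVE_RULES: List[Tuple[int, int]] = [(900, 1), (300, 10), (60, 10000)]


def should_snapshot(seconds_since_last_save: float,
                    changes_since_last_save: int,
                    rules: List[Tuple[int, int]] = SAVE_RULES) -> bool:
    """Return True when at least one save rule is satisfied."""
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in rules
    )
```

With these rules a busy instance (10000 writes in a minute) snapshots after about 60 seconds, while a nearly idle one with a single write waits the full 900 seconds.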

stop-writes-on-bgsave-error yes: this is also an important option. When the background save fails, the main process stops accepting new writes, in order to protect the consistency of the persisted data. If your business has a complete monitoring system, you can disable this option; otherwise keep it enabled.

rdbcompression yes: configures whether RDB files are compressed. I recommend leaving it off: Redis is a CPU-intensive server, and compression costs extra CPU. Compared with disk space, CPU is the more valuable resource.

rdbchecksum yes: whether to enable RDB file checksums, which take effect both when writing and when reading the file. Turning the checksum off gives roughly a 10% performance improvement when writing and loading the file, but data corruption can then no longer be detected.

dbfilename dump.rdb: the RDB file name.

dir ./: the directory where RDB and AOF files are stored.

Of course, disabling RDB is easy. Just add save "" after the last save line.

4. Execution flow chart

[Figure: bgsave execution flow chart]

1) The Redis parent process first checks whether a save is currently executing or a bgsave/bgrewriteaof child process (described in detail later) is running. If so, the bgsave command returns immediately. bgsave and bgrewriteaof child processes are not allowed to run at the same time, mainly for performance reasons: two concurrent child processes doing heavy disk writes at the same time may cause serious performance problems.

2) The parent process calls fork to create a child process. The parent is blocked during the fork, and Redis cannot execute any client command.

3) Once the fork returns, the bgsave command replies with “Background saving started” and the parent is no longer blocked; it can respond to other commands.

4) The child process generates a temporary snapshot file from the parent's memory snapshot, and then atomically replaces the original RDB file with it.

5) The child process sends a signal to the parent to indicate completion, and the parent updates its statistics.

5. Data recovery: loading data when Redis starts

RDB files are loaded automatically when the server starts; there is no dedicated command. However, AOF has higher priority: when AOF is enabled, Redis loads the AOF file first to recover data.

Only when AOF is disabled does Redis detect and load the RDB file at startup. The server is blocked while the RDB file loads, until loading completes.

Consequently, if Redis holds a large amount of data in memory, recovery takes a long time. In production it is therefore common to limit the memory of a single Redis instance and deploy multiple nodes with Redis Cluster.

The automatic loading can be seen in the Redis startup log:
[Figure: Redis startup log]

When Redis loads an RDB file, it verifies the file. If the file is damaged, an error is printed in the log and Redis fails to start.
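The idea of refusing to load a damaged snapshot can be sketched as below. This is a simplified illustration with hypothetical function names: Redis stores a CRC64 at the end of the RDB file, whereas the sketch substitutes CRC32 from Python's zlib module because it is available in the standard library.

```python
import struct
import zlib


def pack_snapshot(payload: bytes) -> bytes:
    """Append a 4-byte little-endian CRC32 of the payload (stand-in for CRC64)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def load_snapshot(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum does not match."""
    if len(blob) < 4:
        raise ValueError("snapshot too short to contain a checksum")
    payload, stored = blob[:-4], struct.unpack("<I", blob[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: snapshot is corrupted")
    return payload
```

Raising on mismatch mirrors the behavior described above: it is safer to fail at startup than to serve silently corrupted data.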


AOF persistence

RDB snapshots are not very durable. If the machine suddenly goes down, loses power, or the process is accidentally killed, the most recent data is lost. The AOF file provides a more reliable form of persistence. Whenever Redis receives a command that modifies the data set, it appends the command to the AOF file. When Redis restarts, the commands in the AOF file are re-executed to rebuild the data.

1. Working principle

Since every Redis write command has to be recorded, AOF does not need to be triggered. The AOF execution process consists of:

  • Append: append the Redis write command to the buffer aof_buf;
  • File write and file sync: sync the contents of aof_buf to disk according to the configured synchronization policy;
  • Rewrite: periodically rewrite the AOF file to compact it.

[Figure: AOF execution flow (append, file write and sync, rewrite)]

2. AOF persistence configuration

#Enable AOF
appendonly yes

#File name
appendfilename "appendonly.aof"

#Synchronization policy
appendfsync everysec

#Whether to skip fsync while an AOF rewrite is in progress
no-appendfsync-on-rewrite no

#Rewrite trigger configuration
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

#Whether to load an AOF file that is truncated at the end
aof-load-truncated yes

#Whether to fsync incrementally while rewriting
aof-rewrite-incremental-fsync yes

3. AOF synchronization strategy

Synchronization happens in two steps:

  • After receiving a write command, Redis first appends it to the AOF buffer aof_buf instead of writing directly to the file system. Because the AOF buffer lives in memory and is extremely fast to write to, this avoids hitting the disk on every command, which would make disk I/O the bottleneck of Redis.
  • The data in the AOF buffer is actually persisted to disk by calling the system function fsync(). Because the data sits in the in-memory buffer first, whatever has not yet been flushed is lost if the power fails or the machine goes down. A reasonably reliable mechanism is therefore needed to make sure the data reaches the disk.

When Redis write commands are written to disk is configured through appendfsync.
Its three values represent three strategies:

  • always: after the command is written to the AOF buffer, fsync is called immediately to sync it to the AOF file, and the thread returns only after fsync completes. In this mode every write command is synced to the AOF file, and disk I/O becomes the performance bottleneck.
  • no: after the command is written to the AOF buffer, the system write operation is called, but no fsync is performed on the AOF file; syncing is left to the operating system, usually with a period of about 30 seconds. How long data stays in the buffers is largely uncontrollable, so durability cannot be guaranteed.
  • everysec: after the command is written to the AOF buffer, the system write operation is called and the thread returns once write completes; a dedicated thread calls fsync once per second. everysec is a compromise between the two strategies above, balancing performance and data safety, so it is the Redis default and the configuration we recommend.
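The difference between the three strategies can be sketched with a small writer class. This is a simplified illustration, not Redis's implementation: the class and attribute names are hypothetical, and the everysec fsync runs inline here instead of on the dedicated background thread described above.

```python
import os
import time


class AofWriter:
    """Toy append-only log writer with always / everysec / no fsync policies."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.buffer = bytearray()           # plays the role of aof_buf
        self.last_fsync = time.monotonic()
        self.fsync_count = 0                # exposed so the behavior can be observed

    def append(self, command: str) -> None:
        """Step 1: buffer the command in memory only."""
        self.buffer += command.encode() + b"\n"

    def flush(self) -> None:
        """Step 2: write() the buffer, then fsync() depending on the policy."""
        if self.buffer:
            os.write(self.fd, bytes(self.buffer))  # data reaches the OS page cache
            self.buffer.clear()
        now = time.monotonic()
        due = self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        )
        if due:
            os.fsync(self.fd)                      # data reaches the disk
            self.last_fsync = now
            self.fsync_count += 1

    def close(self) -> None:
        os.close(self.fd)
```

With policy "no" the class never calls fsync at all, which is exactly why the amount of data at risk depends on the operating system's own flush schedule.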

4. AOF file rewrite

As write operations accumulate, the AOF file grows larger and larger. For example, if you increment a counter 100 times, the data set only holds the final value of the counter, but the AOF file records all 100 operations. In fact a single command is enough to restore that value, so the 100 commands in the AOF file can be reduced to one. Redis therefore supports rebuilding the AOF file in the background without interrupting service.
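The counter example can be reproduced with a toy command log. The sketch below assumes a log that only contains SET, INCR, and DEL on integer values, and its helper names are illustrative; it shows the idea of rebuilding the log from the final state, not the real AOF format.

```python
from typing import Dict, List


def replay(commands: List[str]) -> Dict[str, int]:
    """Re-execute a command log to rebuild the data set."""
    state: Dict[str, int] = {}
    for line in commands:
        parts = line.split()
        if parts[0] == "SET":
            state[parts[1]] = int(parts[2])
        elif parts[0] == "INCR":
            state[parts[1]] = state.get(parts[1], 0) + 1
        elif parts[0] == "DEL":
            state.pop(parts[1], None)
        else:
            raise ValueError(f"unsupported command: {line}")
    return state


def rewrite(commands: List[str]) -> List[str]:
    """Produce a minimal log: one SET per key that still exists."""
    return [f"SET {key} {value}" for key, value in replay(commands).items()]


old_log = ["INCR counter"] * 100
new_log = rewrite(old_log)
print(len(old_log), "->", new_log)  # 100 -> ['SET counter 100']
```

Replaying the old log and the new log yields the same data set, which is the property a rewrite has to preserve.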

AOF rewriting process:

[Figure: AOF rewrite flow chart]

Two points about the rewrite process deserve special attention:

(1) The rewrite is performed by a child process forked from the parent process;

(2) Write commands executed by Redis during the rewrite also need to be appended to the new AOF file, so Redis introduces the aof_rewrite_buf buffer.

Following the figure above, the file rewrite proceeds as follows:

1) The Redis parent process first checks whether a bgsave or bgrewriteaof child process is currently running. If a bgrewriteaof child is running, the bgrewriteaof command returns immediately. If a bgsave child is running, the bgrewriteaof is executed after the bgsave completes. As mentioned earlier, this is mainly for performance reasons.

2) The parent process calls fork to create a child process; the parent is blocked during the fork.

3.1) Once the fork returns, the bgrewriteaof command replies with “Background append only file rewriting started” and the parent is no longer blocked; it can respond to other commands. All Redis write commands are still written to the AOF buffer and synced to disk according to the appendfsync policy, which keeps the original AOF mechanism correct.

3.2) Because fork uses copy-on-write, the child process only sees the memory data as it was at the time of the fork. Since the parent is still responding to commands, Redis uses the AOF rewrite buffer (aof_rewrite_buf in the figure) to hold this new data, so that it is not lost while the new AOF file is being generated. In other words, while bgrewriteaof is running, Redis write commands are appended to both aof_buf and aof_rewrite_buf.

4) The child process writes the new AOF file from the memory snapshot, applying the command merging rules.

5.1) After the child process finishes writing the new AOF file, it sends a signal to the parent, and the parent updates its statistics, which can be viewed with info persistence.

5.2) The parent process writes the contents of the AOF rewrite buffer to the new AOF file, which ensures that the database state saved in the new AOF file matches the current state of the server.

5.3) The old file is replaced with the new AOF file, which completes the AOF rewrite.
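The role of the rewrite buffer in the steps above can be simulated in a single process. In this sketch the class and method names are hypothetical, a deep copy stands in for the copy-on-write view the forked child gets, and Python lists stand in for the AOF file and the buffer.

```python
import copy


class MiniServer:
    """Single-process simulation of an AOF rewrite with a rewrite buffer."""

    def __init__(self):
        self.data = {}
        self.aof = []             # stands in for the AOF file on disk
        self.rewrite_buf = None   # stands in for aof_rewrite_buf; None = no rewrite running
        self._snapshot = None

    def set(self, key, value):
        self.data[key] = value
        command = ("SET", key, value)
        self.aof.append(command)              # the normal AOF path stays active (step 3.1)
        if self.rewrite_buf is not None:
            self.rewrite_buf.append(command)  # also buffered during a rewrite (step 3.2)

    def start_rewrite(self):
        # Step 2: "fork". The child sees memory exactly as it is right now.
        self._snapshot = copy.deepcopy(self.data)
        self.rewrite_buf = []

    def finish_rewrite(self):
        # Step 4: the child writes one command per key from its snapshot.
        new_aof = [("SET", key, value) for key, value in self._snapshot.items()]
        # Step 5.2: the parent appends the writes that arrived during the rewrite.
        new_aof.extend(self.rewrite_buf)
        # Step 5.3: the new file replaces the old one.
        self.aof = new_aof
        self.rewrite_buf = None
        self._snapshot = None


def replay(aof):
    """Rebuild the data set from a list of ("SET", key, value) commands."""
    state = {}
    for _, key, value in aof:
        state[key] = value
    return state


server = MiniServer()
for i in range(100):
    server.set("counter", i)
server.start_rewrite()
server.set("late", 1)          # a write that arrives while the rewrite is running
server.finish_rewrite()
assert replay(server.aof) == server.data
```

Without the buffer, the write to "late" would be missing from the new file, because the snapshot was taken before it happened.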

Rewrite triggers:

  1. Manual trigger: call the bgrewriteaof command directly. It executes much like bgsave: a forked child process does the actual work, and the server blocks only during the fork.
  2. Automatic trigger: controlled by the auto-aof-rewrite-percentage and auto-aof-rewrite-min-size settings.

auto-aof-rewrite-percentage 100: Redis remembers the size of the AOF file after the last rewrite (if no rewrite has happened since startup, it uses the size of the AOF file at startup). When the current file size has grown beyond the remembered size by the specified percentage, a rewrite is triggered.

auto-aof-rewrite-min-size 64mb: a minimum file size must be set as well. The file is rewritten only when it is larger than this value, which prevents rewrites of a file that has reached the percentage but is still very small.

To disable automatic rewriting, set the percentage to 0:

auto-aof-rewrite-percentage 0: disables automatic AOF rewriting
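How the two settings interact can be sketched as a single decision function. The function name and the exact comparison operators are assumptions made for illustration; only the two configuration values come from the text above.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size: int, base_size: int,
                        percentage: int = 100, min_size: int = 64 * MB) -> bool:
    """Decide whether an automatic AOF rewrite is due.

    current_size: current AOF size in bytes
    base_size:    AOF size after the last rewrite (or at startup), in bytes
    """
    if percentage == 0:
        return False                     # automatic rewriting is disabled
    if current_size <= min_size:
        return False                     # file is still too small to bother
    base = base_size if base_size > 0 else 1
    growth = current_size * 100 // base - 100   # growth over the base, in percent
    return growth >= percentage
```

With the defaults, a file that was 64 MB after the last rewrite is rewritten again once it reaches 128 MB, while a 10 MB file is never rewritten no matter how fast it grew.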

5. Data recovery: loading data when Redis starts

As mentioned earlier, when AOF is enabled, Redis loads the AOF file first to recover data;

The RDB file is loaded to recover data only when AOF is disabled.

When AOF is enabled and the AOF file exists, the Redis startup log looks like this:
[Figure: Redis startup log]

Persistence scheme selection

1. Advantages and disadvantages of RDB and AOF

RDB and AOF each have their own advantages and disadvantages.

RDB persistence

  • Advantages: RDB files are compact and small, fast to transfer over the network, and well suited to full replication; recovery is much faster than with AOF. One of the most important advantages of RDB over AOF is its relatively small impact on performance.
  • Disadvantages: the fatal weakness of RDB is its snapshot-based persistence, which means real-time persistence is impossible. Now that data matters more and more, losing a large amount of it is often unacceptable, which is why AOF persistence has become mainstream. In addition, RDB files must follow a specific format and compatibility is poor (for example, an old version of Redis cannot read RDB files written by a newer version).

AOF persistence

  • In contrast to RDB persistence, AOF has the advantages of second-level persistence and good compatibility, and the disadvantages of large files, slow recovery, and a larger impact on performance.

2. Performance and Practice

From the analysis above, both RDB snapshots and AOF rewrites require fork, which is a heavyweight operation that blocks Redis. To avoid affecting the responsiveness of the Redis main process, we need to reduce this blocking as much as possible.

  • Reduce the frequency of fork. For example, trigger RDB snapshots and AOF rewrites manually;
  • Limit the maximum memory used by Redis so that fork does not take too long;
  • Use more powerful hardware;
  • Configure the Linux memory allocation policy sensibly, so that fork does not fail because of insufficient physical memory.
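One way to keep an eye on fork cost is to read the latest_fork_usec field from the output of the INFO command. The sketch below only parses INFO-style text; the sample values, the function names, and the 100 ms threshold are illustrative assumptions, not measurements or recommendations from the article.

```python
def parse_info(text: str) -> dict:
    """Parse INFO-style 'key:value' lines into a dict, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


# Illustrative sample only; real values come from running INFO against a server.
SAMPLE_INFO = """\
# Persistence
rdb_bgsave_in_progress:0
aof_rewrite_in_progress:0
# Stats
latest_fork_usec:59477
"""


def fork_too_slow(info_text: str, threshold_ms: float = 100.0) -> bool:
    """Return True when the most recent fork took longer than threshold_ms."""
    fields = parse_info(info_text)
    return int(fields["latest_fork_usec"]) / 1000.0 > threshold_ms


print(fork_too_slow(SAMPLE_INFO))  # False
```

A check like this could feed an alert, so that growing fork times are noticed before they turn into visible latency spikes.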

What should we do in production? Here is some of my own practical experience.

  • If the data in Redis is not particularly sensitive or can be regenerated in other ways, persistence can be turned off; if the data is lost, it can be recovered by those other means;
  • Define your own policy to check Redis regularly, and then manually trigger backups and rewrites;
  • If multiple instances are deployed on a single machine, prevent them from running persistence and rewrite operations at the same time, to avoid contention for memory, CPU, and I/O; make persistence run serially;
  • Add master-slave replication and dedicate one slave to backups, while the other machines serve client commands normally;
  • RDB persistence and AOF persistence can be used together.


This concludes the persistence part of the Redis high availability series. Persistence mainly consists of the RDB and AOF technologies. Based on the principles and processes described above, you can choose the persistence scheme that fits your own production needs.
