How does redis persistence work? A comparative analysis of RDB and AOF

Time:2021-1-24

In these articles, we introduce the mechanisms behind Redis high availability. To achieve high availability, Redis relies mainly on the following:

  • Data persistence
  • Master-slave replication
  • Automatic fault recovery
  • Clustering

In this article, we start with the foundation of Redis high availability: data persistence. Master-slave replication and automatic fault recovery both depend on persistence, and persisted data can also serve as a backup to keep data safe.

Redis is an in-memory database: all of its data lives in memory, so if the instance goes down, that data is lost. Ensuring the integrity and safety of data is therefore one of the key mechanisms for improving service availability.

Redis provides a well-developed persistence mechanism that writes in-memory data to disk, making it easy to back data up and to recover it quickly.

In this article, we analyze how Redis achieves data persistence, how RDB and AOF differ, and which scenarios each one suits.

Persistence modes

Redis provides two main ways of persisting data:

  • RDB: generates a point-in-time data snapshot file
  • AOF: appends every write command to a log file in real time

The two suit different scenarios, and we will analyze each in turn.

RDB

Introduction

RDB stands for Redis Database. An RDB file is also known as a Redis data snapshot.

We can make Redis generate an RDB snapshot file locally by running the save or bgsave command. The RDB file contains the instance's entire dataset as it was at the moment the snapshot was taken. Its advantages are as follows:

  • RDB files are written in a compact, compressed format, so an RDB file is smaller than the memory the whole instance occupies
  • When an instance recovers from downtime, the RDB file loads very quickly, so the data can be restored in a short time

Its disadvantages are also obvious:

  • Because it is a snapshot of the data at a certain point in time, it does not contain writes made after the snapshot, and those are lost if the instance goes down
  • Generating an RDB file is relatively expensive and consumes a lot of CPU and memory
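To make this trade-off concrete, here is a toy Python model of a snapshot-based store. It is not Redis's RDB format. `ToySnapshotStore`, its methods, and the use of `pickle` plus `zlib` as a stand-in for the compact RDB encoding are all illustrative assumptions. The model shows both sides of the lists above: the snapshot is compact and restores in one step, but anything written after the last snapshot is gone after a crash.

```python
import pickle
import zlib


class ToySnapshotStore:
    """Toy key-value store with RDB-style snapshots (NOT the real RDB format)."""

    def __init__(self):
        self.data = {}
        self.snapshot_blob = None  # stands in for the RDB file on disk

    def set(self, key, value):
        self.data[key] = value

    def save(self):
        # Serialize the whole dataset as it is right now and compress it,
        # loosely mirroring a compact point-in-time snapshot file.
        self.snapshot_blob = zlib.compress(pickle.dumps(self.data))

    def crash_and_recover(self):
        # A crash wipes memory; only the last snapshot survives and is reloaded.
        if self.snapshot_blob is None:
            self.data = {}
        else:
            self.data = pickle.loads(zlib.decompress(self.snapshot_blob))


store = ToySnapshotStore()
for i in range(1000):
    store.set(f"user:{i}", "active")
store.save()
store.set("order:1", "paid")  # written after the snapshot was taken
store.crash_and_recover()

print(len(store.data))          # 1000
print("order:1" in store.data)  # False: the post-snapshot write is lost
```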

Therefore, RDB is more suitable for the following scenarios:

  • Full data synchronization between master and slave
  • Database backup
  • Business scenarios that are not sensitive to data loss, where the instance needs to recover its data quickly after downtime

Full data synchronization between a Redis master and its slaves is carried out with RDB files, which we will cover in detail in a later article.

This makes RDB well suited to data backup: we can have Redis generate RDB files regularly and then back up the snapshot files.
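The backup step can be sketched in Python. The script below copies a finished RDB file to a timestamped name and keeps only the newest few copies. The function name `backup_rdb`, the paths and the retention count are hypothetical. `dump.rdb` is only Redis's default `dbfilename`. Redis writes each new snapshot to a temporary file and renames it into place when it is done, so copying the finished file is generally safe.

```python
import shutil
import tempfile
import time
from pathlib import Path


def backup_rdb(rdb_path: str, backup_dir: str, keep: int = 7) -> Path:
    """Copy an RDB file to backup_dir under a timestamped name (hypothetical helper)."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src = Path(rdb_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"  # e.g. dump-20210124-030000.rdb
    shutil.copy2(src, dest)
    # Names sort chronologically, so everything but the newest `keep` is pruned.
    backups = sorted(dest_dir.glob(f"{src.stem}-*{src.suffix}"))
    for old in backups[:-keep]:
        old.unlink()
    return dest


# Demo with a fake RDB file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    rdb = Path(tmp) / "dump.rdb"
    rdb.write_bytes(b"REDIS-fake-snapshot")  # stand-in for a real RDB file
    copied = backup_rdb(str(rdb), str(Path(tmp) / "backups"))
    copied_name = copied.name
    copied_ok = copied.read_bytes() == rdb.read_bytes()
```

In practice a scheduler such as cron would run a script like this, and a real setup would also move the copy off the machine. That part is omitted here.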

Generating RDB periodically

Redis also provides configuration items that trigger RDB generation automatically:

# Save if at least 1 write occurred within 900 seconds (15 minutes)
save 900 1
# Save if at least 10 writes occurred within 300 seconds (5 minutes)
save 300 10
# Save if at least 10000 writes occurred within 60 seconds
save 60 10000

If any of the above conditions is met, Redis automatically generates a new RDB file in the background, narrowing the gap between the RDB contents and the instance's current data.
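The rule behind these save points can be sketched as a small predicate. This is a simplified model, not Redis's source. `should_bgsave` and its arguments are hypothetical names. The real server evaluates the condition periodically and also handles retries after a failed save.

```python
from typing import Iterable, Tuple

# Save points from the configuration above: (seconds, minimum number of writes).
SAVE_POINTS = [(900, 1), (300, 10), (60, 10000)]


def should_bgsave(seconds_since_last_save: float,
                  changes_since_last_save: int,
                  save_points: Iterable[Tuple[int, int]] = SAVE_POINTS) -> bool:
    """True if any (seconds, changes) save point is satisfied.

    A save point fires when enough time has passed since the last save AND
    enough writes have accumulated since then.
    """
    return any(
        seconds_since_last_save >= seconds and changes_since_last_save >= changes
        for seconds, changes in save_points
    )


print(should_bgsave(120, 50))    # False: too early for 300/10, too few writes for 60/10000
print(should_bgsave(320, 50))    # True: "save 300 10" is satisfied
print(should_bgsave(61, 20000))  # True: "save 60 10000" is satisfied
```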

Copy On Write

Both save and bgsave generate an RDB file, but save runs in the foreground. The whole instance is blocked while the RDB file is being generated, and no request can be processed until it finishes. For an instance with a large amount of memory, generating an RDB file takes a long time, which is clearly unacceptable.

So we usually run bgsave, which lets Redis generate the RDB file in the background while it continues to serve client requests, without blocking the entire instance.

But generating an RDB file in the background is not free. To write a snapshot of in-memory data to a file in the background, Redis calls fork to create a child process. It then relies on the copy-on-write mechanism that the operating system applies to the memory the two processes share.

The fork system call creates a child process that initially shares the same physical memory pages with the parent. The child therefore sees exactly the same data as the parent had at the moment of the fork.

Although the child and parent share memory, the operating system must copy the parent's memory page table to the child during the fork. If the Redis instance occupies a lot of memory, its page table is also large and takes longer to copy, and the copy consumes considerable CPU. Until it completes, the parent process is blocked and cannot serve client requests.
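A back-of-the-envelope estimate shows why large instances fork slowly. The sketch assumes 4 KiB pages and 8-byte page-table entries, which is typical of x86-64 Linux without huge pages. It also ignores the much smaller upper page-table levels. Under those assumptions the last-level page table grows linearly with memory. The helper `page_table_bytes` is illustrative.

```python
def page_table_bytes(memory_bytes: int, page_size: int = 4096, pte_size: int = 8) -> int:
    """Approximate size of the last-level page-table entries mapping memory_bytes.

    Assumes 4 KiB pages and 8-byte entries; upper page-table levels are ignored.
    """
    pages = -(-memory_bytes // page_size)  # ceiling division
    return pages * pte_size


GiB = 1024 ** 3
MiB = 1024 ** 2
for gib in (1, 10, 64):
    size_mib = page_table_bytes(gib * GiB) / MiB
    print(f"{gib:>3} GiB instance -> ~{size_mib:.0f} MiB of page table to copy")
# prints ~2 MiB, ~20 MiB and ~128 MiB respectively
```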

After the fork, the child process scans all of its memory data and writes it to the RDB file.

Meanwhile, the parent process keeps serving client requests. When a write command modifies data, the operating system copies the affected memory page and gives the parent its own private copy, which is no longer shared with the child. This is where the name copy-on-write comes from. In this way, the memory of the parent and child gradually diverges: the parent's changes land in newly copied pages, while the child's view of memory is unaffected.
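The following POSIX-only Python sketch mimics what bgsave does. It forks, the child writes the data it sees to a file, and the parent keeps modifying its own copy. `bgsave_like` and the JSON file are illustrative stand-ins for Redis's C implementation and the RDB format. Python hides the page-level copying, and its reference counting causes more pages to be copied than Redis would. The observable effect is the same, though: the child's snapshot reflects the data at the moment of the fork.

```python
import json
import os
import tempfile


def bgsave_like(data: dict, path: str) -> int:
    """Fork a child that writes `data`, as it was at fork time, to `path` (POSIX only)."""
    pid = os.fork()
    if pid == 0:
        # Child: shares the parent's pages copy-on-write and sees fork-time data.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)  # leave immediately, skipping the parent's cleanup code
    return pid  # Parent: returns at once and keeps serving


data = {"counter": 1}
path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
child = bgsave_like(data, path)
data["counter"] = 2  # the parent keeps writing; the OS copies the touched pages
os.waitpid(child, 0)

with open(path) as f:
    snapshot = json.load(f)
print(snapshot)  # {'counter': 1}
print(data)      # {'counter': 2}
```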

So generating an RDB file not only consumes CPU but can also require up to twice the memory. This worst case occurs when nearly every page is modified while the snapshot is being written.

Running the info command in Redis shows how long the last fork took, in the latest_fork_usec field of the Stats section. We can use this value to judge whether fork time meets our expectations. We should also make sure the Redis machine has enough CPU and memory, and schedule RDB generation sensibly.
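For monitoring, `latest_fork_usec` gives the duration of the most recent fork in microseconds. The sketch below parses INFO-style `key:value` text and flags a slow fork. The sample text, its value and the 100 ms threshold are made up for illustration. In practice the text would come from running INFO through a client or redis-cli.

```python
def parse_info(text: str) -> dict:
    """Parse the `key:value` lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


# Hypothetical excerpt of INFO stats output; the numbers are invented.
sample = "# Stats\r\ntotal_commands_processed:123456\r\nlatest_fork_usec:5321\r\n"

info = parse_info(sample)
fork_ms = int(info["latest_fork_usec"]) / 1000
print(f"last fork took {fork_ms:.1f} ms")  # last fork took 5.3 ms
if fork_ms > 100:  # arbitrary example threshold; tune it to your latency budget
    print("warning: fork is slow; check memory size and machine resources")
```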

AOF

Introduction

AOF stands for append only file. Unlike RDB, AOF records the details of every write command, including the command name and its complete arguments. Whenever a write command is executed, it is appended to the AOF file in real time.
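AOF stores each command in RESP (the Redis serialization protocol), the same format clients use on the wire. Each command is an array header `*<count>`, followed by one bulk string `$<length>` per argument. A minimal encoder is sketched below. It assumes only bulk-string arguments, and the name `encode_aof_entry` is hypothetical. Note that lengths are byte lengths, which matters for non-ASCII values.

```python
def encode_aof_entry(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings, as it would appear in an AOF."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg.encode("utf-8")
        parts.append(f"${len(raw)}\r\n".encode() + raw + b"\r\n")
    return b"".join(parts)


print(encode_aof_entry("SET", "balance:42", "100"))
# b'*3\r\n$3\r\nSET\r\n$10\r\nbalance:42\r\n$3\r\n100\r\n'
```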

We can enable AOF in the configuration file:

# Enable AOF
appendonly yes
# AOF file name
appendfilename "appendonly.aof"
# fsync policy: when AOF data is flushed to disk
appendfsync everysec

Disk flush policy

Once AOF is enabled, Redis records every write command in the file and persists it to disk. To balance data safety against performance, Redis offers several policies that control when the file is flushed to disk:

  • appendfsync always: flushes to disk after every write. This has the greatest performance impact and the heaviest disk I/O, but offers the highest data safety
  • appendfsync everysec: flushes to disk once per second. The performance impact is relatively small, and a node that goes down loses at most about one second of data
  • appendfsync no: leaves flushing to the operating system. This has the least performance impact but the lowest data safety; how much data a node loses when it goes down depends on how the operating system flushes to disk
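The three policies can be sketched with a toy append-only writer. This is a simplification, not Redis's code. `ToyAppendLog` is a hypothetical class. Real Redis performs the everysec fsync from a background thread. Here it happens inline on the first write after a second has elapsed, so the last second of writes may stay unsynced until the next append or `close`.

```python
import os
import tempfile
import time


class ToyAppendLog:
    """Toy append-only log illustrating the three appendfsync policies."""

    POLICIES = ("always", "everysec", "no")

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy!r}")
        self.policy = policy
        self.f = open(path, "ab")
        self.last_fsync = time.monotonic()

    def append(self, entry: bytes) -> None:
        self.f.write(entry)
        self.f.flush()  # move bytes from Python's buffer into the OS page cache
        if self.policy == "always":
            # Force the data to disk before the write is acknowledged.
            os.fsync(self.f.fileno())
        elif self.policy == "everysec":
            # Simplification: fsync inline once at least a second has passed
            # since the previous fsync (Redis does this in a background thread).
            now = time.monotonic()
            if now - self.last_fsync >= 1.0:
                os.fsync(self.f.fileno())
                self.last_fsync = now
        # "no": never fsync explicitly; the OS decides when dirty pages hit disk.

    def close(self) -> None:
        self.f.flush()
        os.fsync(self.f.fileno())
        self.f.close()


aof_path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
log = ToyAppendLog(aof_path, policy="always")
log.append(b"*2\r\n$4\r\nINCR\r\n$1\r\nx\r\n")  # "INCR x" already in RESP form
log.close()
```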

As the above shows, AOF's advantage over RDB is that its data file is updated more promptly and holds more complete data. Data can therefore be recovered as completely as possible, and the risk of data loss is reduced.

If AOF is enabled and both an RDB file and an AOF file exist, Redis uses the AOF file for data recovery.
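Recovering from an AOF amounts to re-executing the logged commands in order. The sketch below parses RESP-encoded commands and replays a few of them into a dictionary. `parse_resp_commands`, `replay` and the tiny command subset (SET, DEL, INCR) are illustrative assumptions. Real Redis replays the file through its normal command implementations.

```python
def parse_resp_commands(blob: bytes) -> list:
    """Parse consecutive RESP arrays of bulk strings (the AOF command format)."""
    commands, pos = [], 0
    while pos < len(blob):
        if blob[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at offset {pos}")
        end = blob.index(b"\r\n", pos)
        count = int(blob[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(count):
            if blob[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at offset {pos}")
            end = blob.index(b"\r\n", pos)
            length = int(blob[pos + 1:end])
            start = end + 2
            args.append(blob[start:start + length].decode("utf-8"))
            pos = start + length + 2  # skip the payload and its trailing \r\n
        commands.append(args)
    return commands


def replay(blob: bytes) -> dict:
    """Rebuild a toy keyspace by re-executing SET/DEL/INCR in order; other commands are ignored."""
    db = {}
    for name, *args in parse_resp_commands(blob):
        cmd = name.upper()
        if cmd == "SET":
            db[args[0]] = args[1]
        elif cmd == "DEL":
            for key in args:
                db.pop(key, None)
        elif cmd == "INCR":
            db[args[0]] = str(int(db.get(args[0], "0")) + 1)
    return db


aof = (b"*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n"   # SET a 1
       b"*2\r\n$4\r\nINCR\r\n$1\r\na\r\n"             # INCR a
       b"*3\r\n$3\r\nSET\r\n$1\r\nb\r\n$1\r\nx\r\n"   # SET b x
       b"*2\r\n$3\r\nDEL\r\n$1\r\nb\r\n")             # DEL b
print(replay(aof))  # {'a': '2'}
```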

But its disadvantages are also obvious:

  • Over time, the AOF file grows larger and larger
  • Flushing the AOF file to disk adds disk I/O load, which can affect Redis performance (even with once-per-second flushing, and far more with appendfsync always)

AOF rewriting

To address the first problem, Redis provides AOF rewriting, which can be configured to trigger automatically once the AOF file becomes large. Redis regenerates a new, smaller AOF file from the instance's current data. Like bgsave, this is done in a forked child process, and the rewrite also consumes a lot of CPU.
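A toy example shows why rewriting shrinks the file. A counter incremented a thousand times needs a thousand log entries, but only one SET to recreate its final value. The `rewrite_aof` helper below is hypothetical and works by replaying a command history. Real Redis instead writes the new file from its current in-memory dataset in a forked child, so it never has to read the old AOF.

```python
def rewrite_aof(commands: list) -> list:
    """Toy AOF rewrite: replay the history, then emit one SET per surviving key."""
    db = {}
    for name, *args in commands:
        cmd = name.upper()
        if cmd == "SET":
            db[args[0]] = args[1]
        elif cmd == "DEL":
            for key in args:
                db.pop(key, None)
        elif cmd == "INCR":
            db[args[0]] = str(int(db.get(args[0], "0")) + 1)
    return [["SET", key, value] for key, value in sorted(db.items())]


history = [["SET", "counter", "0"]] + [["INCR", "counter"]] * 1000 + [
    ["SET", "tmp", "x"],
    ["DEL", "tmp"],
]
compacted = rewrite_aof(history)
print(len(history), "->", len(compacted))  # 1003 -> 1
print(compacted)                           # [['SET', 'counter', '1000']]
```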

# Trigger a rewrite when the AOF has grown by at least this percentage since the last rewrite
auto-aof-rewrite-percentage 100
# Do not rewrite automatically until the AOF exceeds this size
auto-aof-rewrite-min-size 64mb
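The automatic-rewrite condition implied by these two settings can be modeled in Python. This is a sketch under assumptions. `should_rewrite_aof` is a hypothetical name. `base_size` stands for the AOF size recorded after the last rewrite, or at startup. `64mb` in the Redis configuration means 64 * 1024 * 1024 bytes. A percentage of 0 disables automatic rewriting.

```python
MB = 1024 * 1024  # Redis config units: "1mb" is 1024 * 1024 bytes


def should_rewrite_aof(current_size: int, base_size: int,
                       percentage: int = 100, min_size: int = 64 * MB) -> bool:
    """Model of the auto-aof-rewrite-percentage / auto-aof-rewrite-min-size check."""
    if percentage == 0 or current_size <= min_size:
        return False  # disabled, or the file is still below the minimum size
    base = base_size or 1  # guard against division by zero
    growth = current_size * 100 // base - 100  # percent growth since base_size
    return growth >= percentage


print(should_rewrite_aof(100 * MB, 64 * MB))  # False: grew only about 56%
print(should_rewrite_aof(128 * MB, 64 * MB))  # True: doubled since the last rewrite
print(should_rewrite_aof(50 * MB, 10 * MB))   # False: still under the 64mb minimum
```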

Because AOF minimizes the risk of data loss, it generally suits business scenarios that are sensitive to data loss, such as those involving financial transactions.

Performance impact

If AOF is set to flush on every write, Redis's write performance drops sharply. Every write command must be written to the file and flushed to disk before it returns, and under a heavy write load this greatly increases disk I/O. Performance and data safety cannot both be maximized, so although Redis offers per-write flushing, it is rarely used in practice.

Usually, we choose to flush once per second. This keeps write performance good and loses at most about one second of data when the instance goes down, striking a balance between performance and safety.

Summary

Our summary of RDB and AOF is shown in the table below.

[Table: comparison of RDB and AOF]

We need to choose the appropriate persistence method for each business scenario. We can also combine RDB and AOF, drawing on the strengths of each, to ensure both the safety of Redis data and its performance.

Author: kaito
Link: http://kaito-kidd.com/2020/06…