About the author: Lv Lei, a member of the Better team and a senior DBA at Meituan-Dianping. The Better team took part in TiDB Hackathon 2019, where its project "Fast PITR based on TiDB Binlog" won the Best Contribution Award.
Anyone who has maintained a database knows how important backups are, especially for key business systems. TiDB's native backup and recovery scheme has been proven with many customers, but for systems with a huge business volume it has the following pain points:
- When the cluster holds a large amount of data, frequent full backups are impractical.
- Traditional TiDB Binlog consumes a lot of disk space, and replaying a long binlog history takes a long time.
- Binlog has a strict forward dependency: losing the binlog at any point in time makes automatic recovery of all subsequent data impossible.
- Increasing TiDB's `tikv_gc_life_time` retains more snapshot versions, but it cannot be extended indefinitely, and too many versions hurt performance and consume cluster space.
Figure 1: Native binlog backup and recovery
We have been running TiDB online for more than two years, from the 1.0 RC through 1.0 GA, 2.0, 2.1, and now 3.0, and have felt its rapid progress and performance improvements first-hand. These backup and recovery pain points, however, hold TiDB back in critical businesses. That is why we chose this topic: fast point-in-time recovery (PITR) based on TiDB Binlog. It incrementally merges the TiDB binlog to achieve fast PITR at minimal cost, solving several pain points of TiDB's existing backup and recovery schemes.
- In line with the 80/20 rule common in the Internet industry, only about 20% of the data changed each day is updated frequently. Our own measurements of trillion-level online DML show a create:update:delete ratio of roughly 15:20:2, with updates accounting for more than 50%. In row-mode binlog, by keeping only the first before-image and the last after-image of each row, we obtain a very lightweight "differential backup", as shown in the figure.
Principle
We split the binlog by time, for example one segment per day, and merge each segment according to the principle above into a backup set, which is a set of independent files. Since conflicts within each backup set have already been eliminated during the merge stage, the volume is compressed, and row-level concurrency can be used during replay to speed it up. Combined with a full backup, this allows quick recovery to the target point in time, completing the PITR function. Another advantage of this merging is that the generated backup sets and the native binlog files back each other up: a backup set can always be regenerated from the native binlog files.
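The row-level concurrent replay this enables can be sketched roughly as follows. Since each key appears at most once in a merged backup set, events can be fanned out to workers by key hash and applied independently; the `Event` type, worker model, and function names here are our own illustration, not the actual tool's code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Event is an illustrative stand-in for one merged row change in a backup set.
type Event struct {
	Key string // primary or unique key of the row
	SQL string // statement to apply downstream
}

// replayParallel applies events concurrently. Events are routed to workers
// by key hash, so the (at most one) event per key is applied exactly once,
// and events for different keys proceed in parallel.
func replayParallel(events []Event, workers int, apply func(Event)) {
	chans := make([]chan Event, workers)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan Event, 64)
		wg.Add(1)
		go func(ch <-chan Event) {
			defer wg.Done()
			for e := range ch {
				apply(e)
			}
		}(chans[i])
	}
	h := fnv.New32a()
	for _, e := range events {
		h.Reset()
		h.Write([]byte(e.Key))
		chans[h.Sum32()%uint32(workers)] <- e
	}
	for _, ch := range chans {
		close(ch)
	}
	wg.Wait()
}

func main() {
	evs := []Event{{"k1", "INSERT ..."}, {"k2", "UPDATE ..."}, {"k3", "DELETE ..."}}
	replayParallel(evs, 2, func(e Event) { fmt.Println("apply", e.Key) })
}
```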
Figure 3: Parallel binlog replay
The binlog segmentation supports flexible start and end points:

- `-start-datetime string`: start recovery from start-datetime; an empty string means starting from the beginning of the first file
- `-start-tso int`: like start-datetime, but in pd-server TSO format
- `-stop-datetime string`: end recovery at stop-datetime; an empty string means never stop
- `-stop-tso int`: like stop-datetime, but in pd-server TSO format
On this basis, we made some further optimizations.
Figure 4: After optimization
Backup sets use the same format as TiDB binlog, so they can themselves be merged again as needed into a new backup set, further accelerating the overall recovery process.
MapReduce model
Since all changes to the same key (primary key or unique index key) must be merged into a single event, the latest merged data of each row has to be kept in memory. If the binlog contains changes to a large number of distinct keys, this consumes a lot of memory. We therefore designed a MapReduce-style model to process the binlog data.
Figure 5: Binlog merge mode
- Map stage: the PITR tool reads the binlog files, splits the output by database name + table name, and then hashes each key into different small files. This way, all changes to the same row are saved in the same file, which makes them easy to process in the reduce stage.
- Reduce stage: merge the small files according to the rules below, deduplicate the events, and generate the backup set files.
| Original event type | New event type | Merged event type |
| --- | --- | --- |
| INSERT | DELETE | Nil |
| INSERT | UPDATE |INSERT |
| UPDATE | DELETE | DELETE |
| UPDATE | UPDATE | UPDATE |
| DELETE | INSERT | UPDATE |
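The merge table above can be sketched as a small Go function; the type and function names here are illustrative, not the real tool's code.

```go
package main

import "fmt"

// EventType is the DML type recorded for a row in a binlog segment.
type EventType int

const (
	Nil EventType = iota // the changes cancel out entirely
	Insert
	Update
	Delete
)

// merge folds a newer event for a key into the event already accumulated
// for that key, following the merge table: e.g. an INSERT followed by a
// DELETE leaves no net change, while a DELETE followed by an INSERT is
// equivalent to an UPDATE.
func merge(prev, next EventType) (EventType, error) {
	switch {
	case prev == Insert && next == Delete:
		return Nil, nil
	case prev == Insert && next == Update:
		return Insert, nil
	case prev == Update && next == Delete:
		return Delete, nil
	case prev == Update && next == Update:
		return Update, nil
	case prev == Delete && next == Insert:
		return Update, nil
	}
	return Nil, fmt.Errorf("unexpected event sequence: %v then %v", prev, next)
}

func main() {
	merged, _ := merge(Delete, Insert)
	fmt.Println(merged == Update)
}
```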
- Finally, the backup set is replayed to the downstream database in parallel with the official reparo tool.
The binlog files output by Drainer contain only the column data of each row and lack the necessary table schema information (primary key / unique key). The tool therefore needs to obtain the initial table schemas and keep them up to date while processing DDL binlog entries. DDL handling is implemented mainly in the DDL handle structure.
Figure 6: DDL processing
First, the historical DDL information stored in TiKV is fetched through TiDB's RESTful API, and the initial table schemas for binlog processing are built from these historical DDLs; the schemas are then updated whenever a DDL binlog entry is processed.
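For example, the DDL history kept in TiKV can be listed through the TiDB server's HTTP status port (10080 by default); the address below is a placeholder for a real TiDB instance.

```shell
# List the historical DDL jobs that TiDB stores in TiKV.
# 127.0.0.1:10080 stands in for your TiDB server's status address.
curl -s http://127.0.0.1:10080/ddl/history
```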
Because DDL comes in many forms with complex syntax, a complete DDL processing module could not be finished in such a short time. We therefore used tidb-lite to embed a mocktikv-mode TiDB into the program, execute each DDL against it, and then re-fetch the table schema.
- Fast recovery: merging discards intermediate states, which both eliminates unnecessary replay operations and enables row-level concurrency.
- Disk space savings: our tests show a binlog compression ratio of about 30%.
- High degree of completion: the program runs smoothly and was demonstrated live on site.
- Table-level recovery: since the backup set is stored per table, a single table can be flexibly restored on demand at any time.
- High compatibility: component compatibility was considered from the start of the design, and the PITR tool works with most of the TiDB ecosystem tools.
With only two days for the Hackathon, time was tight and the workload heavy. We completed the features above, but a few others had to be left unfinished.
Combining incremental and full backups
Figure 7: Scheme outlook
An incremental backup set is logically a set of INSERT + UPDATE + DELETE statements, while a full backup set consists of the CREATE SCHEMA + INSERT statements generated by mydumper. We can move the INSERT statements from the incremental backup into the full backup set, then import the full backup set into the downstream TiKV cluster with TiDB Lightning, whose import is 5-10 times faster than logical recovery. What remains is an even lighter incremental backup set (UPDATE + DELETE) that directly implements the PITR function.
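A minimal sketch of this split, assuming the merged backup set is available as a list of typed events (the type and function names are our own illustration):

```go
package main

import "fmt"

// EventType marks how a row changed in the merged incremental backup set
// (illustrative; not the real tool's types).
type EventType int

const (
	Insert EventType = iota
	Update
	Delete
)

// splitForLightning partitions a merged backup set: INSERT rows are folded
// into the full backup for bulk import with TiDB Lightning, while the
// UPDATE+DELETE remainder stays as a lighter incremental set for replay.
func splitForLightning(events []EventType) (inserts, rest []EventType) {
	for _, e := range events {
		if e == Insert {
			inserts = append(inserts, e)
		} else {
			rest = append(rest, e)
		}
	}
	return inserts, rest
}

func main() {
	ins, rest := splitForLightning([]EventType{Insert, Update, Delete, Insert})
	fmt.Println(len(ins), len(rest))
}
```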
The PITR tool is essentially a binlog merge process. To guarantee data consistency within a processed binlog range, in theory the merge should stop whenever a DDL occurs, generate a backup set, and then continue merging from that breakpoint. A DDL therefore produces two backup sets, which lowers the binlog compression ratio.
To speed up recovery, we can preprocess DDLs. For example, if a DROP TABLE for some table appears within a binlog segment, the drop can be hoisted to the front so that the program ignores that table's binlog from the start. Such "hoisting" or "deferring" preprocessing can improve backup and recovery efficiency.
Figure 8: DDL preprocessing
With Li Kun's help we formed the Better team: Huang Xiao, Gao Haitao, me, and Wang Xiang from PingCAP. Thanks to my teammates for carrying me along to the Best Contribution Award. The competition was thrilling (we nearly went off the rails), and we only got the code working near the very end, so I strongly recommend that future Hackathon participants finish their work as early as possible. In just two days we learned and gained a lot, and we saw many excellent teams and cool projects; we still have a long way to go. I hope this project will continue to be maintained, and I look forward to even better teams and works at next year's Hackathon.