If you want to understand the backup principle and common problem analysis of xtrabackup, just read this article

Time:2022-5-8

Summary:This article is from Huawei cloud MySQL R & D team. It mainly shares the backup process of MySQL backup tool xtrabackup, the optimization and improvement made by Huawei cloud database team, as well as the problems and solutions that may be encountered in use.

This article is shared from Huawei cloud communityHuawei cloud takes you to explore the backup principle and common problem analysis of xtrabackup》, author: gaussdb database.

This article is from Huawei cloud MySQL R & D team. It mainly shares the backup process of MySQL backup tool xtrabackup, the optimization and improvement made by Huawei cloud database team, as well as the problems and solutions that may be encountered in use. The content discussed in this article is mainly aimed at Huawei cloud RDS for MySQL and the user built community MySQL database. I hope it will help you understand and use xtrabackup, and be more calm in the face of xtrabackup problems in the future.

1、 Introduction to xtrabackup

Xtrabackup is an open source backup tool developed by percona team for physical hot backup of MySQL database,It has the characteristics of fast backup speed, supporting backup data compression, automatic verification of backup data, supporting streaming output, and almost no impact on business in the backup process, is a MySQL backup tool commonly used by various cloud manufacturers.

Currently, there are two versions of xtrabackup: xtrabackup 2.4 X and 8.0 x. Used to back up MySQL 5.0 respectively X and MySQL 8.0 X version. Next, we will introduce how xtrabackup backs up MySQL Community Edition and the backup principle of xtrabackup on Huawei cloud

2、 Xtrabackup backup of community MySQL

Xtrabackup is designed for percona MySQL and supports the backup of the official community version of MySQL. The process is shown in the following figure:
If you want to understand the backup principle and common problem analysis of xtrabackup, just read this article

Figure 1: xtrabackup backup official MySQL process

  1. Compatibility check:Xtrabackup community version only supports MyISAM, InnoDB, CSV and MRG_ MyISAM four storage engine tables, other storage engine tables will not be backed up; In this step, by querying tables, if it is found that the storage engine of the table is not one of the above four engines, a warning will be printed, indicating that xtrabackup will not back up the table.
  2. Start the redo background backup thread:Start the redo background backup thread and back up all incremental redo logs from the last checkpoint LSN of the backup instance until the end of the backup task.
  3. Load all InnoDB tablespaces:Open and scan the data files of all InnoDB tables, check the first page of all table spaces, and initialize the memory structure of all tables.
  4. Backup InnoDB table:Traverse the memory structure of the table constructed in step 3, back up the data file of each InnoDB table, and check whether the data of each page is correct during the backup process.
  5. Apply backup lock flush tables with read lock (ftwrl):Ftwrl lock is a MySQL instance level read lock. The locking process is complex, and after locking, all update operations and DDL of all tables will be blocked.
  6. Back up non InnoDB tables:Because we have locked the instance in step 5, it is safe to back up the non InnoDB table at this time, and there must be no write business at this time.
  7. Record the current gtid information of binlog:Note that we still hold the global read lock at this time. This step is mainly to facilitate us to use this backup set to quickly create a standby machine.
  8. Stop the redo backup thread.
  9. Release the lock resource and the backup ends.

It should be noted that xtrabackup 2.4 X and 8.0 X is different in steps 7 and 8. The difference is MySQL 8.0 The reasons for X are described below.

3、 Huawei cloud RDS for MySQL backup

When backing up a community MySQL instance, xtrabackup will add a global read lock (ftwrl) to the instance, which has a great impact on the business of the database. In serious cases, it will even cause the database to “hang”, which is unacceptable to customers. Therefore, Huawei cloud MySQL team has optimized this process, which mainly includes two points:

  1. For MySQL 5 X and 0 X added backup lock: lock tables for backup
  2. For MySQL 5 X added binlog lock: lock binlog for backup

After optimization, the backup process of MySQL by Huawei cloud xtrabackup is as follows:
If you want to understand the backup principle and common problem analysis of xtrabackup, just read this article

Figure 2 process diagram of xtrabackup backing up Huawei cloud MySQL

Compared with ftwrl lock,The backup lock lock tables for backup has little impact on the customer instance. The locking process is simple, and the DML operation of InnoDB table is not affected during lockingHowever, all update operations and DDL operations of non InnoDB tables are still not allowed.

After backing up all table files, xtrabackup needs to obtain binlog gtid information.

• for MySQL 5.0 X version, xtrabackup 2.4 The binary log will be locked and the binary log information will be obtained.
• for MySQL 8.0 X version, Huawei cloud xtrabackup 8.0 X follows the official consistent backup point query method. Xtrabackup query log_ Status, the MySQL server will add lightweight locks to redo log, binlog, etc. to obtain a consistent backup point. This process is very short and has little impact on the operation of the instance. MySQL 8.0. The backup consistency point of X will tell us the consistent redo log LSN and the gtid of binlog; After querying the backup consistency point, xtrabackup will back up the last binlog file to determine whether the arbitration transaction needs to be rolled back during recovery; Finally, the redo log backup thread task will stop when the read redo log LSN is greater than the queried redo log LSN of the backup consistency point.

Because xtrabackup 2.4 X and 8.0 X has differences in processing binlog and recovery process, which will be described in detail in subsequent articles.

4、 Common problems and Solutions

Huawei cloud has used xtrabackup to provide backup services for almost all MySQL instances of the company. In the process of use, we actively keep in touch with the community, report some problems in the process of use to percona community, and help xtrabackup evolve in a better direction. In addition, for some fatal problems found, if the community fails to repair them in time, Huawei cloud database team will repair them in time to ensure the correctness of backup data.

The following is a summary of the problems we may encounter in each stage of the backup process using xtrabackup, and an analysis of their causes and corresponding solutions,

1. Compatibility check stage

  • Problem phenomenon:After xtrabackup is started, it immediately “hangs” for a long time. After checking the log, it is found that the redo log backup thread has not been started.

reason:Failed to acquire MDL lock during xtrabackup compatibility check. Xtrabackup compatibility is checked by querying imformation_ schema. Tables the plug-in table implementation:

“SELECT CONCAT(table_schema, ‘/’, table_name), engine FROM information_schema.tables WHERE engine NOT IN (‘MyISAM’, ‘InnoDB’, ‘CSV’, ‘MRG_MYISAM’) AND table_schema NOT IN (‘performance_schema’, ‘information_schema’, ‘mysql’)”

When querying each table, you need to obtain the MDL lock of the corresponding table. If there are long-term DML or DDL statements in the MySQL instance, or more serious MDL deadlock occurs, the above query will be blocked in the MDL lock waiting stage. At this time, xtrabackup will “hang” for a long time.

terms of settlement:If the reason for waiting for the lock is only due to the blockage of other SQL statements, wait for the execution of other SQL statements to complete; If a deadlock occurs, analyze the cause of the deadlock and release the deadlock; Huawei cloud RDS for MySQL provides MDL lock view function, which can help users analyze MDL deadlock of business.

2. Redo log backup phase

  • Problem phenomenon 1:Redo log rollback, backup failed, xtrabackup reports the following error message:
    “xtrabackup: error:it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or log files being too small.\n”);”

reason:During the backup process, if the host’s business load is very high, resulting in the fast redo log writing speed, the redo log backup thread of xtrabackup will backup faster than the redo log writing speed. Because the MySQL redo log file is written in the round robin way, the newly written log overwrites the logs written before but not backed up, so the backup fails.

terms of settlement:It is recommended to back up in the low peak period or increase the file size of redo log.

  • Problem phenomenon 2:The backup failed due to DDL operation. The error information is as follows:

“An optimized (without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.

PXB will not be able take a consistent backup. Retry the backup operation”

reason:During the backup process, the MySQL instance has a DDL operation to create an index, because the index creation will not write redo. If you continue the backup, it will cause data inconsistency. Therefore, xtrabackup backup failure is the expected behavior in this scenario.

terms of settlement:Do not create an index during the backup process. If necessary, it is recommended to directly bring the index in the table creation statement, or use the lock DDL parameter for backup (block new DDL operations on the instance).

  • Problem phenomenon 3:Undo truncate causes the backup to fail. The xtrabackup error message is as follows:

“An undo ddl truncation (could be automatic) operation has been performed.”

reason:During xtrabackup backup, if an undo truncate occurs in the MySQL instance, the undo log written to the new undo file (with different space IDS) may be lost, resulting in problems with the recovered data. The official repaired this problem in xtrabackup version 8.0.14 (based on MySQL 8.0.21). The repair method is redo backup thread. When parsing redo log, if it is found that the operation is the truncate operation of undo log, the backup will fail. Unfortunately, this repair does not completely solve the problem. In the following two scenarios, the recovered data of the community version of xtrabackup may still be inconsistent:

  1. MySQL version is lower than MySQL 8.0.21;
  2. During the backup process, the user creates a new undo tablespace.

terms of settlement:Turn off the truncate operation of undo tablespace during backup and prohibit users from creating undo tablespace, which can effectively prevent inconsistent recovery of backup data; In addition, Huawei cloud xtrabackup has further repaired this problem, which can effectively prevent this kind of phenomenon.

3. Load tablespace phase

  • Problem phenomenon 1:Xtrabackup reports an error: too many open files

reason:The operating system allows a limited number of files to be opened at the same time. Xtrabackup will open all table files at the same time in the load tablespace stage. If the number of tables opened by xtrabackup exceeds this limit, the backup will fail.

terms of settlement:Increase the operating system to allow the configuration of the maximum number of files to be opened at the same time, or use the lock DDL parameter (block new DDL operations on the instance).

  • Problem phenomenon 2:Rename table causes the backup to fail. The error message is as follows:

“Trying to add tablespace ‘xxxx’ with id xxx to the tablespace memory cache, but tablespace xxxx already exists in the cache!;”

reason:The whole process of opening the tablespace in xtrabackup is unlocked. If a rename table occurs, there is a probability that the same tablespace will be loaded repeatedly. At this time, xtrabackup will detect the duplicate tablespace ID, so the backup fails.

terms of settlement:Generally speaking, loading a table space is a fast operation, and renaming a table is not a very frequent operation. In this case, you can try again (percona xtrabackup 2.4. X only supports single thread loading of table space, while Huawei cloud xtrabackup supports multi thread loading of table space).

4. Backup InnoDB table stage

  • Problem phenomenon:InnoDB table data file is damaged and backup failed. The error information is as follows:

“xtrabackup: Database page corruption detected at page xxxx, retrying.”

reason:When backing up InnoDB table data files, xtrabackup will check the checksum of each page. If the checksum is found to be wrong, the backup fails. At this time, it indicates that the data of MySQL instance has been damaged (such as disk silence error).

terms of settlement:You need to restore the previous backup data or other methods to repair the data before the backup can succeed. In subsequent articles, we will also introduce the data repair methods in detail.

5、 Conclusion

This paper mainly compares and introduces the backup principle of xtrabackup, the backup community version of MySQL and the improvement of Huawei cloud, and shares the troubleshooting and solutions of common problems of xtrabackup. In the future, we will also bring you more in-depth analysis and more practical use skills, hoping to be helpful for everyone to understand and use xtrabackup. We will also continue to provide customers with better database services and guard their data security at all times.

Finally, let me tell you the good news. Since the annual package of MySQL is 19.9 yuan, it helps enterprises go to the cloud without worry,Welcome to experience

Click follow to learn about Huawei’s new cloud technology for the first time~