[MySQL] explain MySQL backup strategy in detail

Time: 2021-07-12

1 about backup

1.1 why backup

  • Disaster recovery: as it runs, a database will eventually hit all kinds of problems: hardware failures, data corruption caused by bugs, unavailability caused by server downtime or other reasons. There are also human mistakes: a DELETE with a forgotten WHERE condition, an ALTER TABLE run against the wrong table, an accidental DROP TABLE. Even if you have not met these problems yet, by Murphy's law you eventually will.
  • Rollback: a bug or a security breach can cause heavy losses, and you then need to roll the data back to an earlier state. Common examples are blockchain exchanges rolling back after transactions are hacked, and games rolling back wholesale after a vulnerability is exploited.
  • Audit: sometimes you need to know what the data looked like at a certain point in time, perhaps for a year-end audit or a lawsuit.
  • Testing: a basic testing requirement is to regularly pull production data into the test environment. With backups in place, pulling that data is very convenient.

1.2 what are the backup methods

1.2.1 logical backup

Logical backup is the most common approach and is typically used when the amount of data is relatively small.

Advantages of logical backup:

  • Backup and recovery are relatively simple. For example, mysqldump handles the backup on its own, with no extra software to install, and the resulting file can be restored directly with the mysql command.
  • Backup and recovery can be done remotely, that is, the backup command can be executed from another machine.
  • The backup data is very intuitive; after the backup you can use sed, grep and other tools to extract or modify the data.
  • It is independent of the storage engine. Because the data is extracted through MySQL itself, the dump does not distinguish between engines, so it is easy to move a table from the MyISAM engine to the InnoDB engine.
  • It avoids the impact of file corruption. If you copy the raw data files directly, a single corrupted file can leave you with a corrupted backup. With a logical backup, as long as MySQL can execute SELECT statements you can get a reliable backup, which is very useful when a data file is damaged.

Disadvantages of logical backup

  • Because the data must go through the MySQL server, a backup uses more CPU and may take longer.
  • In some scenarios a logical backup is larger than the database files, because data stored as text is not always more compact than the storage engine's format. Compression produces a smaller backup, but costs CPU. (If there are many indexes, the logical backup will be smaller than a physical one.)
  • Recovery takes longer: restoring from a logical backup needs extra resources for lock acquisition, index building, conflict checking and log flushing.

Common methods of logical backup:

  • mysqldump is MySQL's built-in backup tool; it is versatile and very widely used. Some extra parameters are usually added when using it, which are described later.
  • SELECT ... INTO OUTFILE creates a logical backup as delimiter-separated data such as CSV, which can be more practical in some cases (see the sketch after this list).
  • mydumper supports multi-threaded backups. Its backup files separate table structure from data, which is very effective when restoring only some tables or some rows.
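A minimal sketch of the SELECT ... INTO OUTFILE approach, assuming a hypothetical table named employees and that the server's secure_file_priv setting allows writing to /tmp:

-- Export one table as CSV (table name and path are assumptions)
SELECT * FROM employees
INTO OUTFILE '/tmp/employees.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- Restore it with the matching LOAD DATA statement
LOAD DATA INFILE '/tmp/employees.csv' INTO TABLE employees
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';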

1.2.2 physical backup

Physical backup is very common when the amount of data is large.

Advantages of physical backup:

  • Backup is fast, because a physical backup is essentially a file copy, so it can be as fast as copying the files.
  • Recovery is fast. You only need to copy the files into the data directory to complete the recovery, with no locking checks or index building.
  • Recovery is simple. For tables using the MyISAM engine you do not even need to stop the database; just copy the files into the data directory. For InnoDB, if each table has its own tablespace, you can also do it without stopping the server, by discarding and importing tablespaces (not entirely safe).

Disadvantages of physical backup:

  • There is no official physical hot-backup tool. Without official tooling the probability of problems is higher, so use third-party tools with care.
  • InnoDB's raw files are usually larger than a logical backup. InnoDB tablespaces often contain a lot of unused space, and InnoDB does not release space after rows are deleted, so the files may be large even when the amount of data is small. Besides the data itself, the files also contain indexes, logs and other information.
  • Physical backups are not always portable across platforms or versions. MySQL's files are closely tied to the operating system and the MySQL version; if the target environment differs from the original, problems may occur.

Common methods of physical backup:

  • xtrabackup is the most commonly used physical backup tool. It is open-sourced by Percona and can take non-blocking backups of the InnoDB and XtraDB storage engines (MyISAM still needs to be locked), producing a consistent backup.
  • Direct file copy / filesystem snapshot: this is very efficient for the MyISAM engine; you only need to execute FLUSH TABLES WITH READ LOCK and then copy the files. It is harder for the InnoDB engine, because InnoDB relies heavily on asynchronous techniques and keeps merging logs and buffered data even after FLUSH TABLES WITH READ LOCK, so to back up InnoDB this way you must make sure the checkpoint is up to date. A sketch of the copy approach follows.
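A minimal sketch of the direct-copy approach for MyISAM tables, assuming the data directory is /var/lib/mysql and, for the snapshot variant, an LVM volume /dev/vg0/mysql_data; all names here are assumptions:

# Session 1: keep this mysql client open so the global read lock stays held
mysql> FLUSH TABLES WITH READ LOCK;

# Session 2: while the lock is held, snapshot or copy the files
lvcreate --snapshot --size 1G --name mysql_snap /dev/vg0/mysql_data
# or, without LVM, a plain copy:
cp -a /var/lib/mysql /backup/mysql-$(date +%F)

# Back in session 1, release the lock as soon as the snapshot/copy is done
mysql> UNLOCK TABLES;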

1.3 why backup binlog

If a DBA tells you that the database can be restored to any state within the last two months, it means the binlog of that database has been kept for at least two months. Benefits of backing up the binlog:

  • Recovery to any point in time becomes possible
  • It can be used to flash back data after a misoperation
  • It can be used for auditing

When you need to do data recovery, you will be very glad to have a binlog backup. Of course, the prerequisite for recovering data from the binlog is that the binlog format is row. As for cost, the main resource it consumes is hard disk space. Our recommended binlog configuration is:

# Record the change of each row of data
binlog_format = row
# When replaying relayed events, the standby also writes them to its own binlog
log_slave_updates = 1
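For actually backing up the binlog, besides copying rotated files off the server, a sketch using mysqlbinlog's remote streaming mode is shown below; the host, credentials and starting file name are placeholders:

# Continuously stream binlogs from the primary to the backup host
mysqlbinlog --read-from-remote-server --host=<master_host> --user=<user> --password=<pwd> \
    --raw --stop-never mysql-bin.000001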

1.4 replication and backup

Master-slave replication gives you one more copy of the data, but replication is not backup and cannot replace it. Suppose a drop table is executed on the primary: it is immediately replicated to the standby, which performs the same operation, so when such an accident happens there is no way to use the standby for data recovery.

Delayed replication cannot replace backup either, but it can speed up recovery, which makes it a very useful strategy.

In practice, in order not to affect the primary, we often take backups on the standby and record the replication position, which also makes it easy to build a new standby later. When backing up on a standby, note that master-slave replication does not guarantee data consistency between primary and standby. In fact, a replication-based MySQL cluster does not guarantee internal consistency at all, and there is no perfect solution at present; the common approach is to check consistency with pt-table-checksum, as sketched below.
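A hedged sketch of such a consistency check with pt-table-checksum; the database name, host and credentials are placeholders:

# Run on the primary; checksums are written to percona.checksums and
# compared against each replica through replication
pt-table-checksum --replicate=percona.checksums \
    --databases=<dbname> h=<master_host>,u=<user>,p=<password>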

2. Full backup

This section introduces the most common logical backup tool, mysqldump, and the physical backup tool xtrabackup. If mysqldump does not satisfy you, mydumper can be used instead.

2.1 mysqldump

mysqldump is the most widely used tool, but to use it well you need to add some extra parameters. mysqldump has many options, which are not all covered here; it is recommended to consult the official mysqldump documentation directly. Some mysqldump options require the SELECT, RELOAD and LOCK TABLES privileges.

2.1.1 common examples

2.1.1.1 full database backup of InnoDB
mysqldump --opt --single-transaction --master-data=2 --default-character-set=utf8 -h<host> -u<user> -p<password> -A > backup.sql
  • --opt turns on a group of options of the mysqldump command at the same time (quick, add-drop-table, add-locks, extended-insert, lock-tables), giving a fast dump whose output can also be loaded back into a MySQL server quickly. When backing up a large table, the quick behaviour prevents mysqldump from buffering the whole table in memory.
  • --single-transaction sets the transaction isolation level to REPEATABLE READ and starts a transaction before dumping, so that every query inside that transaction reads the same snapshot of the data. Note that this parameter only works for engines that support transactions; for MyISAM tables consistency cannot be guaranteed.
  • -A exports all databases.
  • --default-character-set=charset specifies which character set to use when exporting data.
  • --master-data=2 records the primary's binlog file name and position at the time of the backup and writes them into the dump file as a commented-out line (shown below), which is used when the backup file is used to build a new replica.
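The commented-out line written by --master-data=2 appears near the top of the dump file and looks like this; the file name and position are illustrative:

-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=1234;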
2.1.1.2 full database backup of MyISAM
mysqldump --opt --lock-all-tables --master-data=2 --default-character-set=utf8 -h<host> -u<user> -p<password> -A > backup.sql
  • --lock-all-tables locks all tables for the duration of the backup. Because MyISAM does not support transactions, the only way to get a consistent backup is to lock the tables.
2.1.1.3 compress while backing up
mysqldump -h<host> -u<user> -p<password> -A | gzip >> backup.sql.gz
2.1.1.4 back up multiple databases
mysqldump -h<host> -u<user> -p<password> --databases <dbname1> <dbname2> > backup.sql

2.1.2 recovery

Recovery is relatively simple: just execute the SQL file directly.

mysql -h<host> -u<user> -p<password> < backup.sql
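If the dump was taken with --master-data=2, the binlog coordinates needed for point-in-time recovery or for setting up replication can be read from the head of the file; a small sketch:

# The CHANGE MASTER comment sits near the top of the dump
head -n 80 backup.sql | grep "CHANGE MASTER TO"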

2.1.3 execution process of mysqldump

Turning on the general_log lets you see what mysqldump actually executes. Taking the --single-transaction --opt -A parameters as an example:

FLUSH /*!40101 LOCAL */ TABLES
FLUSH TABLES WITH READ LOCK
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
START TRANSACTION
SHOW VARIABLES LIKE 'gtid\_mode'
SHOW MASTER STATUS
UNLOCK TABLES
...
SHOW CREATE DATABASE IF NOT EXISTS `employees`
SAVEPOINT sp
...
SELECT /*!40001 SQL_NO_CACHE */ * FROM `departments`
....

2.2 xtrabackup

2.2.1 installation method

For more installation methods, refer to the official xtrabackup documentation.

Here we install it with rpm:

yum install http://www.percona.com/downloads/percona-release/redhat/0.1-6/percona-release-0.1-6.noarch.rpm
yum update percona-release
# qpress is used for compression and decompression
yum install percona-xtrabackup-24 qpress

2.2.2 usage

2.2.2.1 add backup account and authorize
CREATE USER 'backup'@'localhost' IDENTIFIED BY 'backup';
GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'backup'@'localhost';
FLUSH PRIVILEGES;
2.2.2.2 full backup
innobackupex --defaults-file=/etc/my.cnf --user=<user> --password=<pwd> --no-timestamp --compress --compress-threads=4 --parallel=4 <directory to back up to>
  • --no-timestamp do not create a subfolder named after the current time; by default a folder named with the current timestamp is created inside the backup directory
  • --compress compress the backup
  • --compress-threads=N number of compression threads
  • --parallel=N number of backup threads
2.2.2.3 recovery
# Step 1: decompress
innobackupex --decompress <directory of backup files> --parallel=4

# Step 2: apply the log
innobackupex --apply-log <directory of backup files> --parallel=4

# Step 3: copy the backup files back into the data directory
innobackupex --datadir=<MySQL data directory> --copy-back <directory of backup files> --parallel=4
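After --copy-back the restored files are owned by the user who ran innobackupex, so before starting MySQL the ownership usually has to be fixed. A small sketch, assuming the data directory is /var/lib/mysql and MySQL runs as the mysql user:

# Step 4 (after copy-back): fix ownership and start the service
chown -R mysql:mysql /var/lib/mysql
systemctl start mysqld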

3. Incremental backup

When the data becomes huge, a common strategy is to do regular incremental backup. For example: do a full backup on Monday, and then do an incremental backup from Tuesday to Sunday.

Incremental backup only contains changed data sets, which is generally not larger than the original data, so it can reduce the server overhead, backup time and backup space.

Of course, there are risks in using incremental backup. Each incremental backup is based on the previous one, which means that if any one of them has a problem, all the backups that depend on it may become unusable.

Here are some incremental backup methods:

3.1 using xtrabackup for incremental backup

Xtrabackup allows incremental backup. The command is as follows:

innobackupex --defaults-file=/etc/my.cnf --user=<user> --password=<pwd> --no-timestamp --compress --incremental --incremental-basedir=<directory of the full backup> <directory for the incremental backup>
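A hedged sketch of the Monday-full / daily-incremental cycle described above, where each incremental is based on the previous day's backup; all directory names are assumptions:

# Monday: full backup
innobackupex --user=<user> --password=<pwd> --no-timestamp /backup/full
# Tuesday: incremental based on the full backup
innobackupex --user=<user> --password=<pwd> --no-timestamp \
    --incremental --incremental-basedir=/backup/full /backup/inc1
# Wednesday: incremental based on Tuesday's incremental
innobackupex --user=<user> --password=<pwd> --no-timestamp \
    --incremental --incremental-basedir=/backup/inc1 /backup/inc2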

Recovery:

# Step 1: decompress the full backup
innobackupex --decompress <directory of the full backup>

# Step 2: apply the log to the full backup
innobackupex --apply-log --redo-only <directory of the full backup>

# Step 3: decompress the incremental backup
innobackupex --decompress <directory of the incremental backup>

# Step 4: merge the incremental data into the full backup
innobackupex --apply-log --redo-only <directory of the full backup> --incremental-dir=<directory of the incremental backup>

# Step 5: apply the log to the merged data
innobackupex --apply-log <directory of the full backup>

# Step 6: copy the backup files back into the data directory
innobackupex --datadir=<MySQL data directory> --copy-back <directory of the full backup>

3.2 using binlog for incremental backup

Incremental backup with the binlog is easy: execute FLUSH LOGS to rotate the log, then copy the old binlog files to the backup directory.

When recovering, just run mysqlbinlog --start-position=<position of the backup set> <binlog file> | mysql -u<user> -p and that's it.
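Putting the full backup and the binlog together, a point-in-time recovery sketch; the position, file names and stop time are illustrative:

# 1. Restore the last full backup
mysql -u<user> -p < backup.sql
# 2. Replay the binlog from the backup position up to just before the misoperation
mysqlbinlog --start-position=154 --stop-datetime="2021-07-12 10:00:00" \
    mysql-bin.000042 mysql-bin.000043 | mysql -u<user> -p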

4. Delay synchronization

Delayed replication is a common variation of master-slave replication. When a misoperation happens it is very useful, whether you use the delayed replica to recover data or simply to skip the offending statement.

For example, suppose a drop is executed on the primary. Find the binlog file and position of that command on the primary, stop replication on the delayed replica, run START SLAVE UNTIL MASTER_LOG_FILE='<corresponding file>', MASTER_LOG_POS=<position of the statement just before the misoperation>;, wait for replication to reach that point, skip the offending statement, and then resume replication. A sketch of this procedure is shown below.
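A sketch of the procedure on the delayed replica; the file name and position are illustrative:

STOP SLAVE;
-- Replay events only up to the one just before the accidental drop
START SLAVE UNTIL MASTER_LOG_FILE='mysql-bin.000042', MASTER_LOG_POS=120;
-- Once the SQL thread has stopped there, skip the drop itself and resume
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;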

Common delayed replication topologies are as follows:

One master with three replicas


Sometimes, to reduce the load on the primary, the delayed replica is attached behind a standby node


Delayed replication is configured as follows:

STOP SLAVE;
CHANGE MASTER TO MASTER_DELAY = N;  -- N is the delay in seconds
START SLAVE;

5. Data verification

Besides taking backups, it is very important to verify that the backup data is actually usable. Imagine needing to perform a recovery and suddenly finding that all of your past backups are invalid; that would be painful. Many people write a backup script, add it to a scheduled task, and then stop paying attention as long as the task runs and files appear in the backup directory; they only discover that something is wrong with the backup data when they actually need to use it.

At present there is no very convenient way to verify backup files automatically. The usual approach is to pull them out regularly and run a recovery drill; doing such a drill once a month, for example, effectively improves the trustworthiness of the backups and is a very down-to-earth practice.

For the verification itself: with a logical backup you usually spot-check the data of a few tables to see whether it matches expectations; with a physical backup, first use a command such as mysqlcheck to look for corrupted tables, and only then check the table data if nothing is damaged. A small sketch follows.
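A sketch of the corruption check on an instance restored from a physical backup; credentials are placeholders:

# Scan every table in every database for corruption
mysqlcheck -u<user> -p<password> --all-databases --check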

6. Summary

  1. Logical and physical backups can be used together, with different tools for different backup cycles. The full backup interval should not be too long; take a full backup at least once a week
  2. If the amount of data is large, incremental backup can be used to reduce the backup size, but note that incremental backup carries more risk
  3. Enable the binlog, set it to row mode, set log_slave_updates = 1, and preferably back up the binlog regularly
  4. If conditions permit, add a delayed replica; it works wonders in emergency recovery
  5. Do data verification regularly, otherwise you may only find out that the backup files are invalid when you need to restore, and by then it is too late for regret