Repair scheme for inconsistency between active and standby MySQL gtids

Time:2021-10-19

Scheme 1: rebuild replicas

MySQL version 5.6 and above introduces new global transaction ID (gtid) support in replication. When performing MySQL and MySQL 5.7 backups with gtid mode enabled, percona xtrabackup will automatically store the gtid value in xtrabackup_ binlog_ Info. This information can be used to create new (or repair damaged) gtid based copies.

prerequisite

Percona xtrabackup needs to be installed on the MySQL machine

advantage

It is relatively safe and easy to operate

shortcoming

  • When the amount of data is large, the backup takes a long time
  • When the database has read-write separation, the read request undertaken by slave needs to be transferred to the master

Operation steps

Master

Use the xtrabackup tool on the master to back up the current database. The user executing this command needs to have the permission to read the MySQL data directory


innobackupex --default-file=/etc/my.cnf --user=root -H 127.0.0.1 --password=[PASSWORD] /tmp

Copy the backup file to the slave machine

Slave

Execute this command on the slave machine to prepare the backup files


innobackupex --default-file=/etc/my.cnf --user=root -H 127.0.0.1 --password=[PASSWORD] --apply-log /tmp/[TIMESTAMP]

Back up and delete the slave data directory


systemctl stop mysqld
mv /data/mysql{,.bak}

Copy the backup to the target directory, give corresponding permissions, and then restart slave


innobackupex --default-file=/etc/my.cnf --user=root -H 127.0.0.1 --password=[PASSWORD] --copy-back /tmp/[TIMESTAMP]
chmod 750 /data/mysql
chown mysql.mysql -R /data/mysql
systemctl start mysqld

View the gtid of the last backup that has been performed, as shown in the following example


$ cat /tmp/[TIMESTAMP]/xtrabackup_binlog_info
mysql-bin.000002  1232    c777888a-b6df-11e2-a604-080027635ef5:1-4

The gtid will also be printed after the innobackupex backup is completed


innobackupex: MySQL binlog position: filename 'mysql-bin.000002', position 1232, GTID of the last change 'c777888a-b6df-11e2-a604-080027635ef5:1-4'

Log in to MySQL with root and configure as follows


NewSlave > RESET MASTER;
NewSlave > SET GLOBAL gtid_purged='c777888a-b6df-11e2-a604-080027635ef5:1-4';
NewSlave > CHANGE MASTER TO
       MASTER_HOST="$masterip",
       MASTER_USER="repl",
       MASTER_PASSWORD="$slavepass",
       MASTER_AUTO_POSITION = 1;
NewSlave > START SLAVE;

Check whether the copy status of the slave is normal


NewSlave > SHOW SLAVE STATUS\G
     [..]
     Slave_IO_Running: Yes
     Slave_SQL_Running: Yes
     [...]
     Retrieved_Gtid_Set: c777888a-b6df-11e2-a604-080027635ef5:5
     Executed_Gtid_Set: c777888a-b6df-11e2-a604-080027635ef5:1-5

We can see that the replica has retrieved a new transaction with number 5, so transactions from 1 to 5 have been on this replica. In this way, we have completed the construction of a new replica.

Scheme 2: use percona toolkit for data repair

PT toolkit contains two tools, Pt table checksum and Pt table sync, which are mainly used to detect whether the master and slave are consistent and repair data inconsistencies.

prerequisite

The percona toolkit tool needs to be installed on the MySQL machine

advantage

The repair speed is fast and there is no need to stop from the library

shortcoming

The operation is complex, and the database shall be backed up before operation
The table to be repaired needs to have a unique constraint

Operation steps

Background example

IP relation correspondence


| IP | Role |
| ---- | ---- |
| 192.168.100.132 | Master |
| 192.168.100.131 | Slave |

Suppose the table structure to be restored is as follows


mysql> show create table test.t;
+-------+-------------------------------------
| Table | Create Table                                                                 |
+-------+-------------------------------------
| t   | CREATE TABLE `t` (
 `id` int(11) NOT NULL,
 `content` varchar(20) DEFAULT NULL,
 PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+-------------------------------------

When the active and standby are consistent, the data of master and slave are as follows


mysql> select * from test.t;
+----+---------+
| id | content |
+----+---------+
| 1 | a    |
| 2 | b    |
+----+---------+
2 rows in set (0.00 sec)

In extreme cases, in case of the following inconsistency between active and standby, the situation is as follows:

  1. The master adds a new record with ID 3, as shown below, but it is not synchronized to the slave. At the same time, it automatically fails to the slave.
  2. Old slave has served as a new master for a period of time, and new records are added to the table.

After restarting the old master, the data of the old master is as follows:


old_master> select * from test.t;
+----+---------+
| id | content |
+----+---------+
| 1 | a    |
| 2 | b    |
| 3 | c    |
+----+---------+
3 rows in set (0.00 sec)

The data of new master is as follows:


new_master> select * from test.t;
+----+---------+
| id | content |
+----+---------+
| 1 | a    |
| 2 | b    |
| 3 | cc   |
| 4 | dd   |
+----+---------+
4 rows in set (0.00 sec)

At this time, if the old master is configured as the slave of the new master, an error will be reported, such as the following error


...Last_IO_Error: binary log: 'Slave has more GTIDs than the master has, using the master's SERVER_UUID.

You can see that the gtid of the old master has reached 255


Executed_Gtid_Set: 5b750c75-86c2-11eb-af71-000c2973a2d5:1-10,
60d082ee-86c2-11eb-a9df-000c2988edab:1-255

The gtid of the new master is only 254


mysql> show master status\G
*************************** 1. row ***************************
       File: mysql-bin.000001
     Position: 4062
   Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 5b750c75-86c2-11eb-af71-000c2973a2d5:1-2,
60d082ee-86c2-11eb-a9df-000c2988edab:1-254
1 row in set (0.00 sec)

At this time, we configure the old master to skip errors and restore the old master to a state that can be copied from the new master normally


old_master> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)

old_master> set gtid_next='60d082ee-86c2-11eb-a9df-000c2988edab:254'; --Specify the version of the next transaction,the GTID you want to skip
Query OK, 0 rows affected (0.00 sec)

old_master> begin;
Query OK, 0 rows affected (0.00 sec)

old_master> commit;                          -- Inject an empty transaction
Query OK, 0 rows affected (0.00 sec)

old_master> set gtid_next='AUTOMATIC';  -- Restore to automaic GTID
Query OK, 0 rows affected (0.00 sec)

old_master> start slave;
Query OK, 0 rows affected (0.13 sec)

Then we can see that the replication is normal on the old master


mysql> show slave status\G
      ...
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      ...
      Executed_Gtid_Set: 5b750c75-86c2-11eb-af71-000c2973a2d5:1-10,
60d082ee-86c2-11eb-a9df-000c2988edab:1-255
        Auto_Position: 1
     Replicate_Rewrite_DB:
         Channel_Name:
      Master_TLS_Version:

Finally, we clear the slave on the new master_ master_ info


new_master> reset slave all for channel '';
Query OK, 0 rows affected (0.00 sec)

new_master> show slave status\G;
Empty set (0.01 sec)

Check consistency

Next, we need to verify the master-slave consistency, execute Pt table checksum on the new master, the rows is 4, and there is a diffs


[[email protected] ~]# pt-table-checksum h='127.0.0.1',u='mha',p='[PASSWORD]',P=3306 --no-check-binlog-format --databases test
Checking if all tables can be checksummed ...
Starting checksum ...
      TS ERRORS DIFFS   ROWS DIFF_ROWS CHUNKS SKIPPED  TIME TABLE
03-29T19:24:18   0   1    4     1    1    0  0.322 test.t

Two way synchronization (the synchronization operation will modify the data, and the data backup will be performed before the operation)

During synchronization, Pt table sync will modify data on the master. The parameters of Pt table sync are as follows:

pt-table-sync --databases test --bidirectional --conflict-column='*' --conflict-comparison 'newest' h='192.168.100.132',u='mha',p='[PASSWORD]',P=3306 h='192.168.100.131' --print
--Database specifies the database to be executed
--Bidirectional is bidirectional synchronization
--Conflict column compares the column when a conflict occurs
--Conflict comparison strategy
--Print output comparison results
--Dry run test run
--Execute execute test

#The DSN on the left is slave
#The DSN on the right is master

Here, we specify – conflict name =’content ‘as the comparison column, and generally use the business primary key as the column. You can see that the statement to be executed is printed out


[[email protected] ~]# pt-table-sync --databases test --bidirectional --conflict-column='content' --conflict-comparison 'newest' h='192.168.100.132',u='mha',p='[PASSWORD]',P=3306 h='192.168.100.131' --print
/*192.168.100.132:3306*/ UPDATE `test`.`t` SET `content`='cc' WHERE `id`='3' LIMIT 1;
/*192.168.100.132:3306*/ INSERT INTO `test`.`t`(`id`, `content`) VALUES ('4', 'dd');

Next, execute the statement


[[email protected] ~]# pt-table-sync --databases test --bidirectional --conflict-column='content' --conflict-comparison 'newest' h='192.168.100.132',u='mha',p='[PASSWORD]',P=3306 h='192.168.100.131' --execute

Then perform data comparison on the master again, and you can see that the data is normal


[[email protected] ~]# pt-table-checksum h='127.0.0.1',u='mha',p='[PASSWORD]',P=3306 --no-check-binlog-format --databases test
Checking if all tables can be checksummed ...
Starting checksum ...
      TS ERRORS DIFFS   ROWS DIFF_ROWS CHUNKS SKIPPED  TIME TABLE
03-30T12:09:57   0   0    4     0    1    0  0.330 test.t

The above is the details of the repair scheme for the inconsistency between the active and standby MySQL gtid. For more information about the repair of the inconsistency between the active and standby MySQL gtid, please pay attention to other developeppaer articles!