Mgr architecture scheme across data centers in MySQL financial application scenario (2)

Time:2022-5-25
  • The original content of greatsql community cannot be used without authorization. Please contact Xiaobian and indicate the source for reprint.

How to deploy multiple sets of Mgr clusters in multiple data centers and realize rapid failover.

The previous article introduced how to deploy multiple sets of Mgr clusters in multiple data centers and build replication channels between clusters. In this way, once the primary AZ is unavailable, you can switch to the Mgr cluster of the standby AZ after verifying the data, which is very convenient.

In this article, we continue to introduce how to use async replication auto failover to realize automatic failover.

1. What is async replication auto failover

When a cluster fails, the new feature of “over MySQL 8.0” can be implemented automatically from the main MySQL database, which is more convenient. The literal translation is “asynchronous replication automatic failover”, but in fact, it also supports semi synchronous replication scenarios.

You can use MySQL Server's new asynchronous connection failover mechanism to automatically establish an asynchronous (source to replica) replication connection to a new source after the existing connection from a replica to its source fails. The connection fails over if the replication I/O thread stops due to the source stopping or due to a network failure. The asynchronous connection failover mechanism can be used to keep a replica synchronized with multiple MySQL servers or groups of servers that share data. To activate asynchronous connection failover for a replication channel set SOURCE_CONNECTION_AUTO_FAILOVER=1 on the CHANGE MASTER TO statement for the channel, and set up a source list for the channel using the asynchronous_connection_failover_add_source and asynchronous_connection_failover_delete_source functions.

See the official document 17.4.9 switching sources with asynchronous connection failover for details

2. Mgr based database architecture scheme of two places and three centers

Under the architecture of two places and three centers, the following deployment scheme can be adopted

Mgr architecture scheme across data centers in MySQL financial application scenario (2)

In this architecture scheme, mgr-b can use asynchronous replication or enhanced semi synchronous replication channel to replicate data from mgr-a, which depends on the network conditions between the two AZS.

In the financial application scenario, this network condition can generally be guaranteed, so the enhanced synchronization mode is preferred.

The asynchronous replication mode will be adopted in the off-site City, while the asynchronous replication mode in the AZ will have a large delay rate.

In the above scheme, both mgr-b and C face a problem: how to quickly switch and realize automatic failover after the host instance pointed to by the replication source is unavailable.

In the past, switching can only be realized by third-party tools.

After MySQL 8.0.22 added the “async replication auto failover” feature, there was no such trouble.

Its working mechanism is to set multiple replication sources on a replication channel. It also supports setting different weights for multiple sources. When the primary replication source is found to be faulted and abnormally interrupted (it will try to reconnect several times first), it can automatically switch to the new replication source. When the original replication source is restored, it will be switched back if its weight is higher.

3. Configuring async replication auto failover

The deployment process is very simple, just a few commands.

3.1. Create a replication channel

In the normal way, create a replication channel on the slave instance (taking mgr-b as an example in this case)

[[email protected] mgrB-1][(none)]> CHANGE REPLICATION SOURCE TO
MASTER_HOST='172.16.16.10', MASTER_PORT=3306, 
MASTER_USER='repl', MASTER_PASSWORD='repl',
MASTER_AUTO_POSITION=1,
SOURCE_ CONNECTION_ AUTO_ Failover = 1, # here is the key, which means automatic failover is enabled
MASTER_ RETRY_ Count = 3, # up to 3 retries
MASTER_ CONNECT_ Retry = 10 # each retry interval is 10 seconds
FOR CHANNEL 'MGR-A';

#Briefly explain the following parameters
- SOURCE_ CONNECTION_ AUTO_ Failover = 1 # here is the key, indicating that automatic failover is enabled
- MASTER_ RETRY_ Count = 3 # means the maximum number of retries is 3. The default is 86400
- MASTER_ CONNECT_ Retry = 10 # means the interval between retries is 10 seconds. The default is 60 seconds

Confirm that the added replication channel is effective:

[[email protected] mgrB-1][(none)]> SELECT * FROM performance_schema.replication_applier_status\G
*************************** 1. row ***************************
              CHANNEL_NAME: mgr-a
             SERVICE_STATE: ON
           REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0
*************************** 2. row ***************************
              CHANNEL_NAME: group_replication_applier
             SERVICE_STATE: ON
           REMAINING_DELAY: NULL
COUNT_TRANSACTIONS_RETRIES: 0

3.2. Add multiple replication sources to the replication channel

Next, add multiple replication sources to the replication channel (just call the UDF multiple times):

[[email protected] mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.10',3306,null,60);
[[email protected] mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.11',3306,null,60);
[[email protected] mgrB-1][(none)]> SELECT asynchronous_connection_failover_add_source('MGR-A','172.16.16.12',3306,null,60);

Briefly explain the following parameters

  • Mgr-a # refers to the replication channel with the same name as the above replication channel
  • 172.16.16.10 # indicates the IP address of the replication source
  • 3306 # indicates the port of the replication source
  • Null # indicates network_ Namespace, the future feature, can be emptied now
  • 60 # indicates the weight of the replication source. We have described the role of different weights above. The larger the value, the more chance you have to be the replication source

Confirm that multiple replication sources are effective:

[[email protected] mgrB-1][(none)]> SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+--------------+------+-------------------+--------+--------------+
| CHANNEL_NAME | HOST         | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME |
+--------------+--------------+------+-------------------+--------+--------------+
| mgr-a        | 172.16.16.10 | 3306 |                   |     60 |              |
| mgr-a        | 172.16.16.11 | 3306 |                   |     60 |              |
| mgr-a        | 172.16.16.12 | 3306 |                   |     60 |              |
+--------------+--------------+------+-------------------+--------+--------------+

Start the replication channel:

[[email protected] mgrB-1][(none)]> START REPLICA FOR CHANNEL 'MGR-A';

Confirm that the status of replication channel and Mgr are normal:

[[email protected] mgrB-1][(none)]> SELECT * FROM performance_schema.replication_connection_status\G
*************************** 1. row ***************************
                                      CHANNEL_NAME: mgr-a
                                        GROUP_NAME:
                                       SOURCE_UUID: b084f8a1-96a8-11eb-9a70-525400fb993a
                                         THREAD_ID: 3084
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 5974
                          LAST_HEARTBEAT_TIMESTAMP: 2021-05-29 18:53:13.879720
                          RECEIVED_TRANSACTION_SET: 476c0276-be03-11eb-bd34-525400e802e2:21-31:1000016-1000017
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE:
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: 476c0276-be03-11eb-bd34-525400e802e2:31
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2021-05-27 17:19:43.201000
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2021-05-27 17:19:43.203315
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2021-05-27 17:19:43.203349
                              QUEUEING_TRANSACTION:
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
                                      CHANNEL_NAME: group_replication_applier
                                        GROUP_NAME: f195537d-19ac-11eb-b29f-5254002eb6d6
                                       SOURCE_UUID: f195537d-19ac-11eb-b29f-5254002eb6d6
                                         THREAD_ID: NULL
                                     SERVICE_STATE: ON
                         COUNT_RECEIVED_HEARTBEATS: 0
                          LAST_HEARTBEAT_TIMESTAMP: 0000-00-00 00:00:00.000000
                          RECEIVED_TRANSACTION_SET: 476c0276-be03-11eb-bd34-525400e802e2:1-31:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                                 LAST_ERROR_NUMBER: 0
                                LAST_ERROR_MESSAGE:
                              LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
                           LAST_QUEUED_TRANSACTION: f195537d-19ac-11eb-b29f-5254002eb6d6:18
 LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_QUEUED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
     LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP: 2021-05-27 17:04:03.407281
       LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP: 2021-05-27 17:04:03.407317
                              QUEUEING_TRANSACTION:
    QUEUEING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
   QUEUEING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
        QUEUEING_TRANSACTION_START_QUEUE_TIMESTAMP: 0000-00-00 00:00:00.000000

Execute show replica status to view the status:

*************************** 1. row ***************************
             Replica_IO_State: Waiting for master to send event
                  Source_Host: 172.16.16.10
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 10
...
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...
                  Source_UUID: 5499a6cb-91cb-11eb-966f-525400e802e2
...
           Source_Retry_Count: 3
...
           Retrieved_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:21-31:1000016-1000017
            Executed_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:1-31:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: mgr-a
...               

First remember the source in the output above_ Host and source_ UUID and other information. The following simulates the scenario of automatically switching the replication source after the replication source server goes down.

4. Simulate the fault and confirm that it can be switched automatically

On the current replication source server, execute kill – 9 to kill the mysqld process, and then you can see the following logs from the server:

#Try reconnecting the old replication source server three times at an interval of 10 seconds
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'mgr-a': error connecting to master '[email protected]:3306' - retry-time: 10 retries: 3 message: Can't connect to MySQL server on '172.16.16.10:3306' (111), Error_code: MY-002003

#Retry failed, stop the copy I / O thread
[Note] [MY-010563] [Repl] Slave I/O thread for channel 'mgr-a' killed while connecting to master
[Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

#Start the replication I / O thread again and connect to the new replication source server
[System] [MY-010562] [Repl] Slave I/O thread for channel 'mgr-a': connected to master '[email protected]:3306',replication started in log 'FIRST' at position 8598
#Inform UUID of switching
[Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually. The old UUID was ec2fcbeb-976c-11eb-a652-525400e2078a.

Run show replicate status again to confirm that the replication source has been switched:

*************************** 1. row ***************************
             Replica_IO_State: Waiting for master to send event
                  Source_Host: 172.16.16.11
                  Source_User: repl
                  Source_Port: 3306
                Connect_Retry: 10
...
           Replica_IO_Running: Yes
          Replica_SQL_Running: Yes
...
                  Source_UUID: ec2fcbeb-976c-11eb-a652-525400e2078a
...
           Source_Retry_Count: 3
...
           Retrieved_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:21-32:1000016-1000017
            Executed_Gtid_Set: 476c0276-be03-11eb-bd34-525400e802e2:1-32:1000015-1000017,
f195537d-19ac-11eb-b29f-5254002eb6d6:1-18
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: mgr-a
...

Because the weights of the three replication sources are set to be the same, the original replication source server will not be switched back after downtime and recovery. If the weight of the old replication source server is set high, it will switch back to the original source again after it is restored:

#The handover is initiated directly without any attempt to reconnect
[Note] [MY-011026] [Repl] Slave I/O thread killed while reading event for channel 'mgr-a'.
[Note] [MY-010570] [Repl] Slave I/O thread exiting for channel 'mgr-a', read up to log 'FIRST', position 8871
[Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[System] [MY-010562] [Repl] Slave I/O thread for channel 'mgr-a': connected to master '[email protected]:3306',replication started in log 'FIRST' at position 8871
[Warning] [MY-010549] [Repl] The master's UUID has changed, although this should not happen unless you have changed it manually.  The old UUID was 5499a6cb-91cb-11eb-966f-525400e802e2. --  Switch back to the original master again

This is very convenient to achieve automatic failover.

Now, we use Mgr + enhanced semi synchronous replication + automatic failover to build a set of two location multi center database architecture scheme under the financial level application scenario. It is recommended to choose greatsql with higher reliability and stability, and you can use Mgr with more confidence (greatsql, create a better Mgr Ecology).

We will continue to introduce other architecture solutions based on Mgr later.

Enjoy GreatSQL 🙂

Article recommendation:

GreatSQL MGR FAQ
https://mp.weixin.qq.com/s/J6…

Wan Da #12, after the Mgr cluster hangs up, how can we choose the master automatically without manual intervention
https://mp.weixin.qq.com/s/07…

“2021 data technology Carnival · on line”: evolution and practice of MySQL high availability architecture
https://mp.weixin.qq.com/s/u7…

Packet capture analysis of the slowness of an SQL statement
https://mp.weixin.qq.com/s/AY…

Wan Da #15, what are the conditions that may cause the Mgr service to fail to start
https://mp.weixin.qq.com/s/in…

Technology sharing | why Mgr consistency mode does not recommend after
https://mp.weixin.qq.com/s/rN…

About greatsql

Greatsql is a MySQL branch maintained by Wanli database. It focuses on improving the reliability and performance of Mgr and supports the parallel query feature of InnoDB. It is a branch version of MySQL suitable for financial applications.

Gitee:
https://gitee.com/GreatSQL/Gr…

GitHub:
https://github.com/GreatSQL/G…

Bilibili:
https://space.bilibili.com/13…

Wechat & QQ group:
You can search and add greatsql community assistant wechat friends, send verification information “add group” to join greatsql / Mgr communication wechat group

QQ group: 533341697
Wechat assistant: wanlidbc

This article is composed of blog one article multi posting platformOpenWriterelease!