Automatic failover of replication architecture from node in MySQL 8.0.23

Time:2022-5-11

I have been in contact with Mgr for some time. With the arrival of MySQL 8.0.23, the high availability architecture based on MySQL group replication (Mgr) provides a new architecture idea.

How can the slave in the disaster recovery room better support the Mgr in the host room?

How many nodes can Mgr break?

This time, I will talk about some thoughts and functions of Mgr with you on the above two questions.

1、 Fault tolerance of the number of MySQL group reference members

I’m sure you will be familiar with the above table. I often ask in the interview: “how bad are the MGRS with four nodes at most?”, Most people answered: “at most one is bad, and two are bad, so the brain can’t work.”

Let’s take a look at the Mgr processing method. Is this the answer?

1) We have a 4-node Mgr

Bury a question: this figure looks like a single mode, but the arrow is not one-way. Is it wrong?

2) At this time, the second-04 suddenly goes down. What will the Mgr cluster look like?

The cluster status will change to:

  • Each node will exchange its own information at a fixed time.
  • When the second-04 node information is not received, other members will wait for 5 seconds.
  • During this period, the second-04 certainly did not send a message, so the healthy members thought that the second-04 was in a suspicious state and marked the unreachable state.
  • Then, the healthy members are selected according to the parameter: group_ replication_ member_ expel_ Timeout, continue to wait (at this time, second-04 is still in unreachable state).
  • When the group is exceeded_ replication_ member_ expel_ After the timeout, the healthy members will expel the second-04 node from the cluster.

So here’s the point. Knock on the blackboard

In second-04, when not expelled:

At this time, the cluster is (4 nodes – 3 health – 1 bad). If one node continues to be bad during this period, the cluster will become (4 nodes – 2 health – 2 bad). The cluster does not meet the majority principle, and each node cannot be written (unless manual intervention is made to forcibly specify the cluster member list).

In second-04, after being expelled:

At this time, the cluster is (3-node-3-healthy-0 bad), and the 4-node cluster degenerates into a 3-node healthy cluster. At this time, the cluster can continue to break one node and become (3-node-2-healthy-1 bad)

Therefore, whether one or two 4-node clusters can be broken depends on which stage of the cluster processing process.

PS:

Let’s talk about the problem buried just now: this figure looks like a single mode, but the arrow is not one-way. Is it wrong?

First, in the single mode, the second node cannot be written by default, but it is only because the super read only of the second node is turned on.

If the second node is super read only = 0, the second node can write normally and synchronize other nodes (primary and other second). The transmission is still based on Paxos protocol.

Run a train: the second node reversely synchronizes other nodes. It will not go through the conflict detection stage (the theoretical efficiency is higher than the multi write mode). There is no verification. If you are interested, you can study it.

2、 Asynchronous connection failover

MySQL 8.0.22 has introduced asynchronous replication connection failover. Many friends have sent a document to introduce it. Here I only briefly describe it:

1) One master and one slave in the same machine room, and one slave node in the remote machine room

2) Master failure changes slave-01 to master, and slave-02 cannot connect to the original master

3) If the “asynchronous connection failover configuration” is configured for the slave-02, the slave-02 will automatically try to establish a replication relationship with the original slave-01 (new master) according to the pre-defined configuration after identifying the failure of the original master:

This function is very good. The reference to third-party tools (such as MHA’s repair of master-slave relationship) can be replaced by MySQL’s native function.

But after the test, I have some doubts:

1. Does “asynchronous” replication fail over not support semi synchronous architecture? Can’t you ensure that the data is not lost, or can’t you completely replace MHA?
A: in fact, it supports enhanced semi synchronization.

2. To pre configure the master list for failover, do you still need to maintain the node of machine room B when the architecture of machine room a changes?
A: Yes.

3. If machine room a is an Mgr, the Mgr node (Master) is abnormal, but the service is not turned off and can be accessed. Isn’t machine room B always connected?
A: Yes

Then, MySQL 8.0.23 was released, which brought the enhancement of this function:

Slave can support Mgr clusters and can dynamically identify Mgr members to establish master slave relationships

Finally, let’s run a lap:

1) First, we have a 3-node Mgr cluster, version 8.0.22 (asynchronous connection failover works on the IO thread of slave, so slave is version 8.0.23)


+----------------------------+-------------+--------------+-------------+---------------------+
| now(6)           | member_host | member_state | member_role | VIEW_ID       |
+----------------------------+-------------+--------------+-------------+---------------------+
| 2021-01-22 13:41:27.902251 | mysql-01  | ONLINE    | SECONDARY | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-02  | ONLINE    | PRIMARY   | 16112906030396799:9 |
| 2021-01-22 13:41:27.902251 | mysql-03  | ONLINE    | SECONDARY  | 16112906030396799:9 |
+----------------------------+-------------+--------------+-------------+---------------------+

2) Then, on the independent slave node, we specify the “failover list of master connections” on the slave

SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1', 'mysql-02', 3306, '', 80, 60);

Briefly explain the following parameters:
Ch1: Chanel name
Groupreplication: mandatory write dead parameters. Currently, Mgr clusters are supported
Aaaaaa AAAA AAAA AAAA AAAA aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Mysql-02: one of Mgr members
80: the priority of the primary node (0-100). If multiple primary nodes have the same priority, randomly select the node to act as the master.
60: the priority of the second node (0-100) is basically prepared for the single mode

3) Specify replication channel information for slave


CHANGE REPLICATION SOURCE TO SOURCE_USER='rpl_user', SOURCE_PASSWORD='123456', SOURCE_HOST='mysql-02',SOURCE_PORT=3306,SOURCE_RETRY_COUNT=2,SOURCE_CONNECTION_AUTO_FAILOVER=1,SOURCE_AUTO_POSITION=1 For CHANNEL 'ch1';

4) Start slave and view the “transferable list of connections”

If IO thread is not enabled, Mgr members will not be recognized automatically. And copy users

rpl_ User needs to check performance at Mgr node_ Schema has select permission


start slave;
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST   | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME             |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1     | mysql-01 | 3306 |          |   60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1 |
| ch1     | mysql-02 | 3306 |          |   80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1 |
| ch1     | mysql-03 | 3306 |          |   60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

5) Then we will mysql-02 stop group_ Replication (not shutting down the service),

The slave list automatically eliminates mysql-02 and reconnects with other nodes — mysql-02 (primary):


stop group_replication;

-- Slave:
SELECT * FROM performance_schema.replication_asynchronous_connection_failover;
+--------------+----------+------+-------------------+--------+--------------------------------------+
| CHANNEL_NAME | HOST   | PORT | NETWORK_NAMESPACE | WEIGHT | MANAGED_NAME             |
+--------------+----------+------+-------------------+--------+--------------------------------------+
| ch1     | mysql-01 | 3306 |          |   80 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1 |
| ch1     | mysql-03 | 3306 |          |   60 | aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1 |
+--------------+----------+------+-------------------+--------+--------------------------------------+

show slave status\G
*************************** 1. row ***************************
        Slave_IO_State: Waiting for master to send event
         Master_Host: mysql-01
         Master_User: rpl_user
         Master_Port: 3306
        Connect_Retry: 60
       Master_Log_File: mybinlog.000003
     Read_Master_Log_Pos: 4904
        Relay_Log_File: mysql-01-relay-bin-ch1.000065
        Relay_Log_Pos: 439
    Relay_Master_Log_File: mybinlog.000003
       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      ...

At this point, the configuration is complete. The list can be automatically maintained by slave when Mgr nodes increase or decrease. No more use cases.

PS:

If you want to manually switch the slave’s established master node (primary) to connect to other nodes (second), you only need to delete the “transferable list of replication connections”, readjust the second priority and add it back.

--Delete configuration
SELECT asynchronous_connection_failover_delete_managed('ch1', 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaa1');


--Add again and adjust the priority of the second to be higher than that of the primary
SELECT asynchronous_connection_failover_add_managed('ch1', 'GroupReplication', 'aaaaaaaaaaaa-aaaa-aaaa-aaaaaaaaaaa1', 'mysql-03', 3306, '', 60, 80);

Reference connection:

https://mysqlhighavailability.com/automatic-asynchronous-replication-connection-failover/

https://my.oschina.net/u/4591256/blog/4813037

https://dev.mysql.com/doc/refman/8.0/en/replication-functions-source-list.html

This is the end of this article about automatic failover of replication architecture from nodes in MySQL 8.0.23. For more information about automatic failover of MySQL, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!

Recommended Today

Implementation of redis key space notification

catalogue Introduction realization Use in business summary Introduction Recently, a regular activity is being developed, and the activity is multi session. After that, you need to push the information to the client at the beginning of the activity and once at the end of the activity. The simple design scheme is to cache the configuration […]