Explain MySQL semi synchronization in detail

Time:2021-9-17

preface

Years later, when working on Tencent’s second side, the first question asked after writing the algorithm is, what is MySQL’s semi synchronization? I was confused at that time. I thought I was asking the two-stage submission of MySQL? The results confirmed that it was not a two-stage submission, and then the interviewer saw that I didn’t even know what I was asking, so he skipped this question and talked about the next question directly. So this time, I will summarize the knowledge content of this part. There are a lot of text content, which may be a little boring, but it is still interesting for those who are interested in this aspect.

Master slave replication of MySQL

In large-scale projects, we usually use the replication function of Mysql to create MySQL master-slave clusters. Data synchronization can be achieved by configuring one or more standby databases for the server. The function of replication is not only conducive to building high-performance applications, but also the basis of high availability, scalability, disaster recovery, backup and data warehouse.

To put it mildly, the master-slave replication of MySQL realizes the separation of reading and writing. Compared with the single point database, it improves the performance of the business system and optimizes the user experience. In addition, the high availability of the database is realized through master-slave replication. When the master node MySQL hangs, the slave database can be used to top it.

Replication methods supported by MySQL

MySQL supports three replication methods:

  • Statement based replication (also known as logical replication) mainly refers to the repeated execution of SQL statements executed on the primary database on the secondary database. MySQL uses this kind of replication by default, which is more efficient. However, there are some problems. If functions such as UUID () and rand () are used in SQL, the data copied to the slave library will be biased.
  • Row based replication refers to copying the updated data to the slave database instead of executing side statements. It is only supported from MySQL 5.1.
  • Mixed replication: statement replication is adopted by default. When it is found that the statement cannot accurately copy data (for example, the statement contains functions such as UUID (), Rand ()), row based replication is adopted.

Principle of master-slave replication

The replication principle of MySQL can be roughly divided into these three steps

  1. Change the data on the master database and record it in the binary log.
  2. Copy logs from the primary library to your own relay log.
  3. The standby database reads the events in the relay log and puts them back on the standby database data.

The main process is as follows:

Here are the three steps of replication:

Step 1: record binary logs on the master database. First, the master database should enable the binlog logging function and authorize slave to access the slave database. One thing to note here is that the order in the binlog log is recorded according to the order of transaction submission rather than the execution order of each statement.

Step 2: copy binlog from the library to its local relaylog. First, the slave library will start a working thread, called I / O thread. The I / O thread will establish a common client connection with the main library, and then start a special binlog dump thread on the main library. This dump thread will read the events in the binlog. After catching up with the main database, it will sleep until the main database notifies of a new update statement. In this way, binlog data is transferred to the relaylog on the slave library through the I / O thread on the slave library and the binlog dump thread on the master library.

Step 3: start an SQL thread from the database, read events from the relaylog and execute them in the standby database, so as to update the standby database data.

==This replication architecture decouples the acquisition and playback events, and the running I / O thread can work independently of the SQL thread. However, this architecture also limits the replication process. The most important point is that queries running concurrently on the primary database can only be serialized in the secondary database, because there is only one SQL thread to replay events in the relay log==

When it comes to the problem of serial execution of master-slave replication, I think of a problem I encountered in my work before. There is a business scenario. We have an operation to initialize a batch of data, which is obtained from the interface of an external system, and then I obtain data from the interface of an external system through multiple threads in the thread pool in parallel, After each thread obtains the data, it is directly inserted into the database. Then, after all the data is stored in the database, execute batch query, query the data just inserted into the database and put it into elasticsearch. As a result, the data put into the ES is always incomplete. Later, it can’t be studied for a long time. Finally, the problem is solved by letting the query go to the main database. At that time, I didn’t know that the problem was caused by the serialization of MySQL master-slave replication.

MySQL master-slave replication mode

MySQL master-slave replication actually supports asynchronous replication, semi synchronous replication, gtid replication and other replication modes.

Asynchronous mode

The default replication mode of MySQL is asynchronous mode, which mainly refers to the I / O thread on the master server of MySQL. When the data is written to Binlong, it will directly return to the client. The data is updated successfully, regardless of whether the data is transmitted to the slave server or written to the relaylog. In this mode, copying data is actually risky. Once the data is only written to the binlog of the master database, it will cause data loss before it is synchronized to the slave database.

However, this mode is also the most efficient, because the function of changing data is only completed in the main database. Copying data from the database will not affect the write data operation of the main database.

As I said above, although this asynchronous replication mode is efficient, it has a great risk of data loss, so there is the semi synchronous replication mode to be introduced later.

Semi synchronous mode

Since version 5.5, MySQL supports the semi synchronous master-slave replication mode in the form of a plug-in. What is semi synchronous master-slave replication mode? Here is a comparison to illustrate:

  • Asynchronous replication mode: as we have described above, in the asynchronous replication mode, after the master database executes the transactions submitted by the client, as long as the execution logic is written to the binlog, it will immediately return to the client. It does not care whether the slave database is successfully executed. In this way, there will be a hidden problem, that is, when the binlog executed by the master database has not been synchronized to the slave database, the master database hangs, At this time, the slave database will be forcibly promoted to the primary database, which may cause data loss.
  • Synchronous replication mode: after the master database executes the transaction submitted by the client, it needs to wait until all slave databases also execute the transaction before returning to the client for successful execution. Because you have to wait until all slave libraries are executed, the execution process will be blocked and wait for the return results, so there will be a serious impact on performance.
  • Semi synchronous replication mode: semi synchronous replication mode, which can be said to be a replication mode between asynchronous and synchronous. After executing the transactions submitted by the client, the main database needs to wait for at least one slave database to receive the binlog and write the data to the relay log before returning the success result to the client. Semi synchronous replication mode improves the availability of data compared with asynchronous mode, but it also produces a certain performance delay. It requires at least one round-trip time of TCP / IP connection.

In the semi synchronous replication mode, you can clearly know that after a transaction is committed successfully, the transaction will exist in at least two places, one is the master database and the other is one of the slave databases. The main principle is that when the dump thread of the master notifies the slave library, an ACK mechanism is added, that is, it will confirm whether the slave library receives the transaction flag code. The dump thread of the master not only sends binlog to the slave library, but also receives the ack of slave. When an exception occurs and there is no ack transaction in slave, it will be automatically degraded to asynchronous replication until the exception is repaired, and then it will be automatically changed to semi synchronous replication

The process of MySQL semi synchronous replication is as follows:

Hidden dangers of semi synchronous replication
The semi synchronous replication mode also has some data risks. When a transaction waits for an ACK from the slave database after the master database is committed, if the Master goes down, there will be two problems at this time.

  • The transaction has not been sent to the slave: if the transaction has not been sent to the slave, the client will resubmit the transaction after receiving the failure result. Because the resubmitted transaction is executed on the new master, it will be executed successfully. Later, if the previous master is restored, it will join the cluster as a slave. At this time, the previous transaction will be executed twice, The first time this machine was executed as a master, and the second time it was synchronized from the main library as a slave.
  • The transaction has been synchronized to the slave: because the transaction has been synchronized to the slave, when the client commits the transaction again after receiving the failure result, the transaction will be executed twice on the current slave machine.

In order to solve the above hidden dangers, MySQL has added a new semi synchronous mode since version 5.7. The implementation process of the new semi synchronous method is to move the “storage commit” step to the back of “write slave dump”. This ensures that the master database transaction is committed only after the slave transaction ack. MySQL version 5.7.2 adds a parameter to configure: RPL_ semi_ sync_ master_ wait_ Point, this parameter has two configurable values:

  • AFTER_ Sync: the parameter value is after_ During sync, the new semi synchronous replication method is adopted.
  • AFTER_ Commit: represents the semi synchronous replication mode in the old way.

MySQL starts from version 5.7.2. The default semi synchronous replication method is after_ The sync mode is, but the scheme is not omnipotent because after_ In the sync mode, transactions in the master database are submitted only after the transactions are synchronized to the slave. If the master hangs while the master database waits for the slave to synchronize successfully, the master transaction submission fails, and the client also receives the result of transaction execution failure. However, the contents of binlog have been written to the relay log on the slave, and there will be more slave data, However, with more data, the general problem is not serious. More data is better than less data. Mysql, when it can’t solve the problem of distributed data consistency, can ensure that it doesn’t lose data. More data is better than losing data.

Here are several parameters of semi synchronous replication mode:


mysql> show variables like '%Rpl%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
+-------------------------------------------+------------+
--Semi synchronous replication mode switch
rpl_semi_sync_master_enabled
--Semi synchronous replication, timeout, in milliseconds. When it exceeds this time, it will automatically switch to asynchronous replication mode 
rpl_semi_sync_master_timeout
--Introduced in MySQL 5.7.3, this variable sets how many slave responses the master needs to wait before returning to the client. The default value is 1.
rpl_semi_sync_master_wait_for_slave_count
--This value indicates whether the number of slaves in the current cluster can still meet the currently configured semi synchronous replication mode. The default value is on. When the semi synchronous replication mode is not met, all slaves will switch to asynchronous replication, and this value will also change to off
rpl_semi_sync_master_wait_no_slave
--It represents the method of submitting transactions for semi synchronous replication. After 5.7.2, it defaults to after_ SYNC
rpl_semi_sync_master_wait_point

Gtid mode

MySQL has launched the gtid replication mode since version 5.6. Gtid is the abbreviation of global transaction identifier. Gtid is composed of UUID + transactionid. UUID is the unique identifier of a single MySQL instance. A server will be automatically generated when the MySQL instance is started for the first time_ UUID, and it is written to the auto.cnf (MySQL / data / auto. CNF) file in the data directory by default. Transactionid is the number of transactions executed on the mysql, which increases as the number of transactions increases. This ensures that the gtid is globally unique in a group of replications.

In this way, you can clearly see from the gtid which instance the current transaction is committed from and how many transactions are committed.

Let’s take a look at the specific form of gtid:


mysql> show master status;
+-----------+----------+--------------+------------------+-------------------------------------------+
| File      | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
+-----------+----------+--------------+------------------+-------------------------------------------+
| on.000003 |      187 |              |                  | 76147e28-8086-4f8c-9f98-1cf33d92978d:1-322|
+-----------+----------+--------------+------------------+-------------------------------------------+
1 row in set (0.00 sec)

How gtid works

Due to the uniqueness of gtids in a group of master-slave replication clusters, it is ensured that the transactions of each gtid are executed only once on one mysql. So how to implement this mechanism? What is the principle of gtid?

When connecting to the master server from the server, the gtid (executed_gtid_set: i.e. the executed transaction code) and the gtid (retrieved_gtid_set: i.e. the transaction number of the master database received from the slave database) are transmitted to the master server. The master server will send the missing gtid and the corresponding transaction ID from the slave server to the slave server to complete the data. When the primary server goes down, it will find the conf server with the most successful data synchronization and directly promote it to the primary server. If a mandatory server is not the slave server with the most successful synchronization, it will first go to the most successful server through the change command to complete the gtid, and then promote the mandatory machine to the master.

The main data synchronization mechanism can be divided into these steps:

  • ==When the master updates the data, the production gtid is recorded in the binlog before the transaction==
  • ==The I / O thread on the slave side writes the changed binlog to the relay log==
  • ==The SQL thread obtains the gtid from the relay log, and then compares whether the binlog on the slave side has records==
  • ==If there is a record, it indicates that the transaction of the gtid has been executed, and the slave will ignore the gtid==
  • ==If there is no record, slave will execute the gtid transaction from the relay log and record it to the binlog==
  • ==During parsing, judge whether there is a primary key. If there is no primary key, use the secondary index. If there is no secondary index, scan the whole table==

The initial structure is shown in the figure below

As can be seen from the above figure, after the master hangs up, slave-1 completes the master transaction, and slave-2 delays a little, so it does not complete the master transaction. At this time, slave-1 is promoted. After slave-2 connects to the new master (slave-1), slave-2 transmits the latest gtid to the new master, and then slave-1 sends the transaction to slave-2 from the next gtid of this gtid. This mode of self seeking replication location reduces the possibility of transaction loss and failure recovery time.

Advantages and disadvantages of gtid

Through the above analysis, we can conclude that the advantages of gtid are:

  • ==Each transaction corresponds to an execution ID, and a gtid will only be executed once on a server==
  • ==Gtid is used to replace the traditional copy method. The biggest difference between gtid copy and ordinary copy mode is that there is no need to specify the binary file name and location==
  • ==Reduce manual intervention and service failure time. When the host hangs up, upgrade a standby machine from many standby machines to the host through software==

The disadvantages of gtid are also obvious:

  • ==First, non transactional storage engines are not supported==
  • ==Copy of the create table… Select statement is not supported (the main database directly reports an error); (principle: two SQL statements will be generated, one is DDL create table SQL and the other is insert into insert data. Since DDL will lead to automatic submission, this SQL requires at least two gtids, but in gtid mode, only one gtid can be generated for this SQL.)==
  • ==One SQL is not allowed to update one transaction engine table and non transaction engine table at the same time==
  • ==In a MySQL replication group, all gtids are required to be turned on or off==
  • ==Restart is required to start gtid (except mysql5.7)==
  • ==After gtid is enabled, the original traditional replication method is no longer used (unlike semi synchronous replication, which can be degraded to asynchronous replication after semi synchronous replication fails)==
  • ==The create temporary table and drop temporary table statements are not supported==
  • ==SQL is not supported_ slave_ skip_ counter;==

In fact, there are a lot of contents in this part of gtid. If you want to study it in depth, you can take a look at this article. Finally, some prerequisites for opening gtid:

  • MySQL version 5.6, add in my.cnf file:
gtid_ Mode = on (required) # enable gtid function
log_ Bin = log bin = MySQL bin (required) # enable binlog binary log function
Log slave updates = 1 (required) # or 1 can be written as on
Enforce gtid consistency = 1 (required) # or write 1 as on
  • MySQL 5.7 or later, add the following to my.cnf file:
gtid_ Mode = on (required)
Enforce gtid consistency = 1 (required)
log_ Bin = MySQL bin (optional) # high availability switch. It's best to enable this function
Log slave updates = 1 (optional) # high availability switching. It is best to turn on this function

The above is the detailed explanation of MySQL semi synchronization. For more information about MySQL semi synchronization, please pay attention to other relevant articles of developeppaer!

Recommended Today

Supervisor

Supervisor [note] Supervisor – H view supervisor command help Supervisorctl – H view supervisorctl command help Supervisorctl help view the action command of supervisorctl Supervisorctl help any action to view the use of this action 1. Introduction Supervisor is a process control system. Generally speaking, it can monitor your process. If the process exits abnormally, […]