Tencent side 2: what is the semi synchronization of MySQL?



Years later, when working on Tencent’s second side, the first question asked after writing the algorithm is, what is MySQL’s semi synchronization? I was confused at that time. I thought I was asking the two-stage submission of MySQL? The results confirmed that it was not a two-stage submission, and then the interviewer saw that I didn’t even know what I was asking, so he skipped this question and talked about the next question directly. So this time, I will summarize the knowledge content of this part. There are a lot of text content, which may be a little boring, but it is still interesting for those who are interested in this aspect.

Master slave replication of MySQL

In large-scale projects, we usually use the replication function of Mysql to create MySQL master-slave clusters. Data synchronization can be achieved by configuring one or more standby databases for the server. The function of replication is not only conducive to building high-performance applications, but also the basis of high availability, scalability, disaster recovery, backup and data warehouse.

To put it mildly, the master-slave replication of MySQL realizes the separation of reading and writing. Compared with the single point database, it improves the performance of the business system and optimizes the user experience. In addition, the high availability of the database is realized through master-slave replication. When the master node MySQL hangs, the slave database can be used to top it.

Replication methods supported by MySQL

MySQL supports three replication methods:

  • Statement based replication (also known as logical replication) mainly refers to the repeated execution of SQL statements executed on the primary database on the secondary database. MySQL uses this kind of replication by default, which is more efficient. However, there are some problems. If functions such as UUID () and rand () are used in SQL, the data copied to the slave library will be biased.
  • Row based replication refers to copying the updated data to the slave database instead of executing side statements. From mysql5 1 is supported.
  • Mixed replication: statement replication is adopted by default. When it is found that the statement cannot accurately copy data (for example, the statement contains functions such as UUID (), Rand ()), row based replication is adopted.

Principle of master-slave replication

The replication principle of MySQL can be roughly divided into these three steps

  1. Change the data on the master database and record it in the binary log.
  2. Copy logs from the primary library to your own relay log.
  3. The standby database reads the events in the relay log and puts them back on the standby database data.

The main process is as follows:

Tencent side 2: what is the semi synchronization of MySQL?

Here are the three steps of replication:

Step 1:Binary logs are recorded on the master database. First, the master database should enable the binlog logging function and authorize slave to access the slave database. One thing to note here is that the order in the binlog log is recorded according to the order of transaction submission rather than the execution order of each statement.

Step 2:Copy binlog from the library to its local relaylog. First, the slave library will start a working thread, called I / O thread. The I / O thread will establish a common client connection with the main library, and then start a special binlog dump thread on the main library. This dump thread will read the events in the binlog. After catching up with the main database, it will sleep until the main database notifies of a new update statement.

In this way, binlog data is transferred to the relaylog on the slave library through the I / O thread on the slave library and the binlog dump thread on the master library.

Step 3:Start an SQL thread from the database, read events from the relaylog and execute them in the standby database, so as to update the standby database data.

This replication architecture decouples the acquisition and playback events, and the running I / O thread can work independently of the SQL thread. However, this architecture also limits the replication process. The most important point is that queries running concurrently on the primary database can only be serialized in the secondary database, because there is only one SQL thread to replay events in the relay log.

When it comes to the problem of serial execution of master-slave replication, I think of a problem I encountered in my work before. There is a business scenario. We have an operation to initialize a batch of data, which is obtained from the interface of an external system, and then I obtain data from the interface of an external system through multiple threads in the thread pool in parallel, After each thread obtains the data, it is directly inserted into the database. Then, after all the data is stored in the database, execute batch query, query the data just inserted into the database and put it into elasticsearch. As a result, the data put into the ES is always incomplete. Later, it can’t be studied for a long time. Finally, the problem is solved by letting the query go to the main database. At that time, I didn’t know that the problem was caused by the serialization of MySQL master-slave replication.

MySQL master-slave replication mode

MySQL supports master-slave replication,Asynchronous replicationSemi synchronous replicationGtid replicationAnd other replication modes.

Asynchronous mode

The default replication mode of MySQL is asynchronous mode, which mainly refers to the I / O thread on the master server of MySQL. When the data is written to Binlong, it will directly return to the client. The data is updated successfully, regardless of whether the data is transmitted to the slave server or written to the relaylog. In this mode, copying data is actually risky. Once the data is only written to the binlog of the master database and has not been synchronized to the slave database, it will cause data loss.

However, this mode is also the most efficient, because the function of changing data is only completed in the main database. Copying data from the database will not affect the write data operation of the main database.

Tencent side 2: what is the semi synchronization of MySQL?

As I said above, although this asynchronous replication mode is efficient, it has a great risk of data loss, so there is the semi synchronous replication mode to be introduced later.

Semi synchronous mode

MySQL from5.5The version starts to support the semi synchronous master-slave replication mode in the form of a plug-in. What is semi synchronous master-slave replication mode?

Here is a comparison to illustrate:

  • Asynchronous replication mode: we have already introduced the asynchronous replication mode. After the master database executes the transactions submitted by the client, it will immediately return to the client as long as the execution logic is written to the binlog. It does not care whether the slave database is successfully executed. There will be a hidden danger that the master database hangs when the binlog executed by the master database has not been synchronized to the slave database, At this time, the slave database will be forcibly promoted to the primary database, which may cause data loss.
  • Synchronous replication mode: after the master database executes the transaction submitted by the client, it needs to wait until all slave databases also execute the transaction before returning to the client for successful execution. Because you have to wait until all slave libraries are executed, the execution process will be blocked and wait for the return results, so there will be a serious impact on performance.
  • Semi synchronous replication mode: semi synchronous replication mode is a replication mode between asynchronous and synchronous. After executing the transactions submitted by the client, the master database will wait for at least one slave database to receive the binlog and write the data to the relay log before returning the success result to the client. Semi synchronous replication mode improves the availability of data compared with asynchronous mode, but it also produces a certain performance delay. It requires at least one round-trip time of TCP / IP connection.

In the semi synchronous replication mode, you can clearly know that after a transaction is committed successfully, the transaction will exist in at least two places, one is the master database and the other is one of the slave databases. The main principle is that when the dump thread of the master notifies the slave library, an ACK mechanism is added, that is, it will confirm whether the slave library receives the transaction flag code. The dump thread of the master not only sends binlog to the slave library, but also receives the ack of slave. When an exception occurs and there is no ack transaction in slave, it will be automatically degraded to asynchronous replication until the exception is repaired, and then it will be automatically changed to semi synchronous replication

The process of MySQL semi synchronous replication is as follows:

Tencent side 2: what is the semi synchronization of MySQL?

Hidden dangers of semi synchronous replication

The semi synchronous replication mode also has some data risks. When a transaction waits for an ACK from the slave database after the master database is committed, if the Master goes down, there will be two problems at this time.

  • The transaction has not been sent to slave: if the transaction has not been sent to the slave, the client will resubmit the transaction after receiving the failure result. Because the resubmitted transaction is executed on the new master, it will be executed successfully. Later, if the previous master is restored, it will join the cluster as a slave. At this time, the previous transaction will be executed twice, The first time this machine was executed as a master, and the second time it was synchronized from the main library as a slave.
  • The transaction has been synchronized to slave: because the transaction has been synchronized to the slave, when the client submits the transaction again after receiving the failure result, the transaction will be executed twice on the current slave machine.

In order to solve the above hidden dangers, MySQL has added a new semi synchronous mode since version 5.7. The implementation process of the new semi synchronous mode is to“Storage Commit”This step moves to“Write Slave dump”Back. This guaranteesThe master database transaction can only be committed after the slave transaction ack。 MySQL version 5.7.2 adds a parameter to configure: RPL_ semi_ sync_ master_ wait_ Point, this parameter has two configurable values:

  • AFTER_SYNC: the parameter value is after_ During sync, the new semi synchronous replication method is adopted.
  • AFTER_COMMIT: represents the semi synchronous replication mode of the old mode.
Tencent side 2: what is the semi synchronization of MySQL?

MySQL starts from version 5.7.2. The default semi synchronous replication method is after_ The sync mode is, but the scheme is not omnipotent because after_ In the sync mode, transactions in the master database are submitted only after the transactions are synchronized to the slave. If the master hangs while the master database waits for the slave to synchronize successfully, the master transaction submission fails, and the client also receives the result of transaction execution failure. However, the contents of binlog have been written to the relay log on the slave, and there will be more slave data, However, with more data, the general problem is not serious. More data is better than less data. Mysql, when it can’t solve the problem of distributed data consistency, can ensure that it doesn’t lose data. More data is better than losing data.

Here are several parameters of semi synchronous replication mode:

mysql> show variables like '%Rpl%';
| Variable_name                             | Value      |
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
--Semi synchronous replication mode switch
--Semi synchronous replication, timeout, in milliseconds. When it exceeds this time, it will automatically switch to asynchronous replication mode 
--Introduced in MySQL 5.7.3, this variable sets how many slave responses the master needs to wait before returning to the client. The default value is 1.
--This value indicates whether the number of slaves in the current cluster can still meet the currently configured semi synchronous replication mode. The default value is on. When the semi synchronous replication mode is not met, all slaves will switch to asynchronous replication, and this value will also change to off
--It represents the method of submitting transactions for semi synchronous replication. After 5.7.2, it defaults to after_ SYNC

Gtid mode

MySQL has launched the gtid replication mode since version 5.6. Gtid is the abbreviation of global transaction identifier. Gtid is composed of UUID + transactionid. UUID is the unique identifier of a single MySQL instance. A server will be automatically generated when the MySQL instance is started for the first time_ UUID, and it is written to auto. In the data directory by default CNF (MySQL / data / auto. CNF) file. Transactionid is the number of transactions executed on the mysql, which increases as the number of transactions increases. This guaranteesGtid is globally unique in a set of replications

In this way, you can clearly see from the gtid which instance the current transaction is committed from and how many transactions are committed.

Let’s take a look at the specific form of gtid:

mysql> show master status;
| File      | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set                         |
| on.000003 |      187 |              |                  | 76147e28-8086-4f8c-9f98-1cf33d92978d:1-322|
1 row in set (0.00 sec)




How gtid works

Due to the uniqueness of gtids in a group of master-slave replication clusters, it is ensured that the transactions of each gtid are executed only once on one mysql.

So how to implement this mechanism? What is the principle of gtid?

When connecting from the server to the master server, send the gtid you have executed(Executed_ Gtid_ Set: the transaction code that has been executed)And get the gtid(Retrieved_ Gtid_ Set: that is, the transaction number of the main library has been received from the library)All to the master server. The master server will send the missing gtid and the corresponding transaction ID from the slave server to the slave server to complete the data. When the primary server goes down, it will find the conf server with the most successful data synchronization and directly promote it to the primary server. If a mandatory server is not the slave server with the most successful synchronization, it will first go to the most successful server through the change command to complete the gtid, and then promote the mandatory machine to the master.

The main data synchronization mechanism can be divided into these steps:

  • When the master updates the data, the production gtid is recorded in the binlog before the transaction.
  • The I / O thread on the slave side writes the changed binlog to the relay log.
  • The SQL thread obtains the gtid from the relay log, and then compares whether the binlog on the slave side has records.
  • If there is a record, it indicates that the transaction of the gtid has been executed, and the slave will ignore the gtid.
  • If there is no record, slave will execute the gtid transaction from the relay log and record it to the binlog.
  • During parsing, judge whether there is a primary key. If there is no primary key, use the secondary index. If there is no secondary index, scan the whole table.

The initial structure is shown in the figure below

Tencent side 2: what is the semi synchronization of MySQL?

When the Master goes down, it will evolve into the following figure.

Tencent side 2: what is the semi synchronization of MySQL?

As can be seen from the above figure, after the master hangs up, slave-1 completes the master transaction, and slave-2 delays a little, so it does not complete the master transaction. At this time, slave-1 is promoted. After slave-2 connects to the new master (slave-1), slave-2 transmits the latest gtid to the new master, and then slave-1 sends the transaction to slave-2 from the next gtid of this gtid. This mode of self seeking replication location reduces the possibility of transaction loss and failure recovery time.

Advantages and disadvantages of gtid

Through the above analysis, we can conclude that the advantages of gtid are:

  • Each transaction corresponds to an execution ID, and a gtid will only be executed once on a server;
  • Gtid is used to replace the traditional copy method. The biggest difference between gtid copy and ordinary copy mode is that there is no need to specify the binary file name and location;
  • Reduce manual intervention and service failure time. When the host hangs up, upgrade a standby machine from many standby machines to the host through software;

The disadvantages of gtid are also obvious:

  • First, non transactional storage engines are not supported;
  • Create table is not supported Copy the select statement (the main database directly reports an error); (principle: two SQL statements will be generated, one is DDL create table SQL and the other is insert into insert data. Since DDL will lead to automatic submission, this SQL requires at least two gtids, but in gtid mode, only one gtid can be generated for this SQL.)
  • One SQL is not allowed to update one transaction engine table and non transaction engine table at the same time;
  • In a MySQL replication group, all gtids are required to be turned on or off.
  • Restart is required to start gtid (except mysql5.7);
  • After gtid is enabled, the original traditional replication method is no longer used (unlike semi synchronous replication, which can be degraded to asynchronous replication after semi synchronous replication fails);
  • The create temporary table and drop temporary table statements are not supported;
  • SQL is not supported_ slave_ skip_ counter;

In fact, there are a lot of contents in this part of gtid. If you want to study it in depth, you can take a look at this article.

Finally, some prerequisites for opening gtid:

  • MySQL version 5.6, in my Add to CNF file:
gtid_ Mode = on (required) # enable gtid function
log_ Bin = log bin = MySQL bin (required) # enable binlog binary log function
Log slave updates = 1 (required) # or 1 can be written as on
Enforce gtid consistency = 1 (required) # or write 1 as on
  • MySQL 5.7 or later, in my Add to CNF file:
gtid_ Mode = on (required)
Enforce gtid consistency = 1 (required)
log_ Bin = MySQL bin (optional) # high availability switch. It's best to enable this function
Log slave updates = 1 (optional) # high availability switching. It is best to turn on this function