Comprehensive analysis of MySQL master-slave replication mechanism

Time:2021-10-16

As a relational database, MySQL provides a built-in data replication mechanism, which makes it possible to realize advanced features such as high availability architecture based on its replication mechanism, so that MySQL can be suitable for the production environment without the help of additional plug-ins or other tools. This is one of the conditions for large-scale practical application of MySQL.

The replication mechanism based on MySQL can not only realize the high availability of the database, but also realize advanced features such as performance expansion, remote disaster recovery and hot and cold separation.

  • High availability: by configuring a certain replication mechanism, MySQL realizes cross host data replication, so as to obtain a certain high availability. If you need to obtain higher availability, you only need to configure multiple replicas or cascade replication.
  • Performance expansion: since the replication mechanism provides multiple data backups, in scenarios where the requirements for read-write consistency are not high, you can configure one or more replicas to distribute read requests to replica nodes, so as to improve the overall read-write performance.
  • Remote disaster recovery: you can easily obtain a certain remote disaster recovery capability by deploying the replica node to a remote computer room. In practice, factors that may affect the overall performance, such as network delay, need to be considered.
  • Transaction separation: by configuring the replication mechanism and sending low-frequency and large amount of transactions to the replica node for execution, these transactions can avoid competing with high-frequency transactions for computing resources, so as to avoid the overall performance problem.

In order to obtain the above capabilities, you need to understand the basic MySQL replication mechanism and select the appropriate configuration in combination with the actual application scenario.

Master slave replication mechanism

MySQL implements master-slave replication based on binlog. The slave node tracks and obtains the latest updates in the binlog of the master node and replays them in itself, so as to replicate the data of the master node.

The following figure is a schematic diagram of MySQL master-slave replication process. Three threads are involved in the whole process, and their responsibilities are:

  • Master node binlog dump thread: this thread is created after the slave node connects to the master node and is responsible for sending newly written data in binlog to the slave node. When reading binlog, the dump thread will first obtain the lock of binlog, release it immediately after reading, and then send the read data to the slave node.
  • Slave node i / O thread: the slave node i / O thread is responsible for sending a data synchronization request to the master node, receiving the data sent by the master node and writing it to the relay log.
  • SQL thread from node: this thread reads data updates from the relay log and replays them.

Asynchronous replication

By default, MySQL master-slave replication is asynchronous replication. Under this mechanism, the master node will respond to the client’s request immediately after completing the local log writing, and the data replication process of the slave node will be executed asynchronously.

Obviously, under this mechanism, the replication process will not affect the response of the primary node to the client request. Therefore, compared with a single node, it will not cause significant loss of overall performance.

However, under this mechanism, if the data is submitted by the master node but not synchronized to the slave node, the master node goes down. At this time, if the master-slave switch occurs and new data is written, the data may be lost or inconsistent.

Semi synchronous replication

Since version 5.6, MySQL supports semi synchronous replication. Compared with asynchronous replication, this mechanism has the following differences:

After receiving the request from the client, the master node must complete the log writing of the node and wait for at least one response from the slave node to complete data synchronization (or timeout) before responding to the request.

The slave node will respond to the master node only after writing the relay log and completing the disk flushing.

When the slave node responds to the timeout, the master node will degenerate the synchronization mechanism into asynchronous replication. After at least one slave node recovers and completes data catch-up, the master node will restore the synchronization mechanism to semi synchronous replication.

It can be seen that compared with asynchronous replication, semi synchronous replication improves the availability of data to a certain extent. When it does not degenerate to asynchronous replication, if the master node goes down, the data has been replicated to at least one slave node.

At the same time, when responding to the client, the slave node needs to complete the response. Compared with asynchronous replication, it takes more time for network interaction on the master-slave node and time for the slave node to write files and brush disks. Therefore, the overall response performance of the cluster to the client must be reduced.

Master slave copy format

Since MySQL’s replication mechanism is based on binlog, the format of binlog determines the format of master-slave replication. Binlog has two types: line based and statement based. Therefore, there are also two corresponding formats for replication.

Statement-Based Replication(SBR)

For the statement based replication mechanism, binlog only records the executed statements. This method has the following advantages:

  • Since version 3.23, it has been a mature technology with long-term verification
  • Less data is written to log files, which means less file writing and network transmission consumption, so master-slave replication can be completed faster and performance can be improved.
  • The log file records the statements executed on all databases and can be used for auditing and other purposes

It has the following disadvantages:

  • User defined functions (UDFs) and functions with uncertain execution results cannot be copied
  • When updating data, more row locks are required than row based replication
  • For complex statements such as insert before update, the slave node needs to be replayed completely, while the copy based on row format only needs to execute the final result

Row-Based Replication(RBR)

Under the row based replication mechanism, the corresponding binlog is also row based. At this time, each data update is converted to the changes of all affected rows when it is written to the binlog.

This replication method has the following advantages:

  • All data changes can be copied safely without being affected by UDF and special functions.
  • Most DBMS adopt this replication method, and the cost of knowledge migration is low.
  • When updating data, fewer row locks are required, resulting in higher performance.

It has the following disadvantages:

  • When DML with a large amount of data is involved, the row based log will generate a large amount of log data. The large amount of data means a longer time in log file writing and network transmission, which may lead to significant deterioration of the overall performance and concurrency problems.
  • The executed statements cannot be viewed through the log, and the statements executed from the node cannot be known.

In the actual architecture application, it is necessary to make rational use of the master-slave replication mechanism according to the business characteristics of the system, and select the appropriate master-slave replication format.

The above is the details of the comprehensive analysis of MySQL master-slave replication mechanism. For more information about MySQL master-slave replication mechanism, please pay attention to other relevant articles of developeppaer!