Necessary for Architects: principle and application of MySQL master-slave synchronization


In daily work, MySQL database is essential storage, in which read-write separation is basically standard configuration. Behind this, MySQL needs to turn on master-slave synchronization to form a master-slave or master-slave multi-slave architecture. Mastering the principle of master-slave synchronization and knowing how to apply it in practice are the necessary skills of an architect. The landlord will make a summary in this article. It’s enough to read this one.

1. Master slave synchronization principle

Master slave synchronization architecture diagram (asynchronous synchronization)

This is the most common master-slave synchronization architecture.
Necessary for Architects: principle and application of MySQL master-slave synchronization

Master slave synchronous process (asynchronous synchronization)

  1. The master database writes the data changes to the binlog file
  2. Initiate a dump request from the library I / O thread
  3. The I / O thread of the master library pushes binlog to the slave library
  4. Write the local relay log file from the library I / O thread (the same format as binlog)
  5. Read the relay log from the SQL thread of the library and execute it serially again to get the same data as the main library

What is binlog?

Every time the master database commits a transaction, it will record the data changes into a binary file, which is called binlog. Note: only write operations will be recorded to binlog, and read-only operations will not (such as select and show statements).

Binlog has three formats:

  • Statement format: binlog records the SQL statements actually executed
  • Row format: binlog records the data before and after the change (involving all columns), such as update table_ a set col1=value1, col2=value2 … where col1=condition1 and col2=condition2 …
  • Mixed format: statement format is selected by default, and row format is used only when necessary

Binlog format comparison

  • Statement level: the advantage is that the binlog file is small, but the disadvantage is that the slow SQL of the master database will appear again on the slave database, and some functions that depend on the environment or context may produce inconsistent data
  • Row level: the disadvantage is that the file is large (if a statement involves multiple lines, it will be enlarged by N times). The advantage is that there is no such slow SQL problem and does not depend on the environment or context
  • In order to obtain the change data before and after, canal recommends using the row level

Two ways of master-slave synchronization

  • Asynchronous synchronization: the default mode, which may cause data loss during master-slave switching. Because whether the master database commits or not has nothing to do with the master-slave synchronization process and is not perceived.
  • Semi synchronization: the high availability scheme, supported by the newer MySQL version, requires at least one slave Library (1 by default, and the specific number can be specified) to ack writing to the relay log before the master library commits and returns the result to the client.

Master slave synchronization process (semi synchronization)

  1. When the slave library connects to the master library, it indicates that it supports semi synchronous replication
  2. The master database also needs to support semi synchronous replication. Before committing a transaction, the master database will block and wait for at least one ack written to the relay log from the database until it times out
  3. If the blocking wait times out, the master database will temporarily switch back to the asynchronous synchronous mode. When at least one slave database catches up with the progress of semi synchronization, the master database will switch to the semi synchronous mode again

Semi synchronous applicable scenario

High availability backup: semi synchronous replication can ensure the consistency between the slave library and the master library. When the master library fails, switching to the slave library will not lose data. In order to ensure stability (not to drag down the main database due to slow semi synchronization), generally, it does not undertake business traffic and ACK as soon as possible, but only for synchronous backup.

2. Master slave synchronization application scenario

Common scenario: Online slave library asynchronous synchronization, high availability backup semi synchronization

Necessary for Architects: principle and application of MySQL master-slave synchronization

Big data access requirements with high consistency requirements

Big data fetching may lead to soaring CPU utilization and slower ack in the slave database. You can set the number of acks required for semi synchronization to 1. Under normal circumstances, the high availability backup can quickly ACK, so the master database will commit and return, and it doesn’t matter if the big data fetching replication is slower. In this way, the main database and business will not be affected due to the slow access ack of big data.
Necessary for Architects: principle and application of MySQL master-slave synchronization

Reference: MySQL official document