ClickHouse and Its Friends (9): Real-time MySQL Replication and Implementation

Time: 2021-07-09

This article is reposted from the blog of BohuTANG.


Many people think they have wandered into the wrong show when they see the title, but they have not.

ClickHouse can be attached as a replica (slave) of MySQL to synchronize MySQL data in real time: a full snapshot first, then incremental changes. This feature is arguably the most eye-catching and most-needed one this year. Based on it, we can easily build an enterprise-grade solution that makes the integration of OLTP and OLAP no longer a headache.

Currently it supports MySQL 5.6/5.7/8.0 and is compatible with DELETE/UPDATE statements and most commonly used DDL operations.
The code has been merged into the upstream master branch and is expected to be released as an experimental feature in version 20.8.

After all, this is an integration of two heterogeneous ecosystems, so there is still a lot of work to do. Feedback from community users is welcome to speed up the iteration.

Code acquisition

Get the clickhouse/master code and compile it; for the method, see ClickHouse and Its Friends (1): Compile, Develop, Test.

MySQL Master

We need a MySQL with binlog enabled as the master:

docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency

Create databases and tables, and write data:

mysql> create database ckdb;
mysql> use ckdb;
mysql> create table t1(a int not null primary key, b int);
mysql> insert into t1 values(1,1),(2,2);
mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 1 |    1 |
| 2 |    2 |
+---+------+
2 rows in set (0.00 sec)

ClickHouse Slave

Currently, replication is done at database granularity. Different databases can come from different MySQL masters, so data from multiple MySQL sources can be synchronized into one ClickHouse instance for OLAP analysis.

First, turn on the experimental switch:

clickhouse :) SET allow_experimental_database_materialize_mysql=1;

Create a replication channel:

clickhouse :) CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');
clickhouse :) use ckdb;
clickhouse :) show tables;
┌─name─┐
│ t1   │
└──────┘
clickhouse :) select * from t1;
┌─a─┬─b─┐
│ 1 │ 1 │
└───┴───┘
┌─a─┬─b─┐
│ 2 │ 2 │
└───┴───┘

2 rows in set. Elapsed: 0.017 sec.

Check ClickHouse's synchronization position:
cat ckdatas/metadata/ckdb/.metadata

Version:    1
Binlog File:    mysql-bin.000001
Binlog Position:    913
Data Version:    0
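As a side note, the `.metadata` file above is a simple key/value text format. The following is a hand-written illustrative parser (an assumption for demonstration only, not the parser ClickHouse actually uses) that turns the sample above into a dict:

```python
# Illustrative sketch only: parse the .metadata key/value format shown above.
# Field names are taken from the sample output; the real on-disk reader
# lives inside ClickHouse itself.
def parse_metadata(text: str) -> dict:
    meta = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

sample = """Version:    1
Binlog File:    mysql-bin.000001
Binlog Position:    913
Data Version:    0"""

meta = parse_metadata(sample)
print(meta["Binlog Position"])  # -> 913
```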

Delete

First, perform a delete operation on the MySQL master:

mysql> delete from t1 where a=1;
Query OK, 1 row affected (0.01 sec)

Then view the records on the ClickHouse slave side:

clickhouse :) select * from t1;

SELECT *
FROM t1

┌─a─┬─b─┐
│ 2 │ 2 │
└───┴───┘

1 rows in set. Elapsed: 0.032 sec.

At this time, the Data Version in the metadata has been incremented to 2:

cat ckdatas/metadata/ckdb/.metadata 
Version:    1
Binlog File:    mysql-bin.000001
Binlog Position:    1171
Data Version:    2

Update

MySQL Master:

mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 2 |    2 |
+---+------+
1 row in set (0.00 sec)

mysql> update t1 set b=b+1;

mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 2 |    3 |
+---+------+
1 row in set (0.00 sec)

ClickHouse Slave:

clickhouse :) select * from t1;

SELECT *
FROM t1

┌─a─┬─b─┐
│ 2 │ 3 │
└───┴───┘

1 rows in set. Elapsed: 0.023 sec.

Performance testing

Testing environment

  • MySQL: 8c16g virtual machine, 192.168.0.3, base dataset of 10,188,183 records
  • ClickHouse: 8c16g virtual machine, 192.168.0.4
  • benchyou: 8c8g virtual machine, 192.168.0.5, 256 concurrent writers, https://github.com/xelabs/benchyou

Performance results depend heavily on the hardware environment. These numbers come from virtual machines and are for reference only.

Full synchronization performance

8c16G-vm :) create database sbtest engine=MaterializeMySQL('192.168.0.3:3306', 'sbtest', 'test', '123');

8c16G-vm :) watch lv1;

WATCH lv1

┌─count()─┬───────────────now()─┬─_version─┐
│       0 │ 2020-07-29 06:36:04 │        1 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 1113585 │ 2020-07-29 06:36:05 │        2 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 2227170 │ 2020-07-29 06:36:07 │        3 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 3340755 │ 2020-07-29 06:36:10 │        4 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 4454340 │ 2020-07-29 06:36:13 │        5 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 5567925 │ 2020-07-29 06:36:16 │        6 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 6681510 │ 2020-07-29 06:36:18 │        7 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 7795095 │ 2020-07-29 06:36:22 │        8 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│ 8908680 │ 2020-07-29 06:36:25 │        9 │
└─────────┴─────────────────────┴──────────┘
┌──count()─┬───────────────now()─┬─_version─┐
│ 10022265 │ 2020-07-29 06:36:28 │       10 │
└──────────┴─────────────────────┴──────────┘
┌──count()─┬───────────────now()─┬─_version─┐
│ 10188183 │ 2020-07-29 06:36:28 │       11 │
└──────────┴─────────────────────┴──────────┘
← Progress: 11.00 rows, 220.00 B (0.16 rows/s., 3.17 B/s.)

In this hardware environment, full synchronization runs at about 424,507 rows per second (roughly 420k/s).
Because rows in the full snapshot have no dependencies between them, synchronization could be further parallelized and accelerated.
Full-sync performance directly determines how fast a ClickHouse slave can be rebuilt after it breaks down: if your MySQL has 1 billion rows, it can be rebuilt in about 40 minutes.
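A quick back-of-the-envelope check of that rebuild-time claim, using the rate measured above:

```python
# Sanity check: at the measured full-sync rate of ~424,507 rows/s,
# 1 billion rows take roughly 40 minutes to replay.
rows = 1_000_000_000
rate = 424_507  # rows per second, from the WATCH output above
minutes = rows / rate / 60
print(round(minutes))  # roughly 39 minutes
```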

Incremental performance (real-time synchronization)

Under the current configuration, the single-threaded playback capacity of the ClickHouse slave exceeds what the MySQL master produces under 256 concurrent writers; the test shows they remain in stable real-time sync.

Benchmark load: about 21k transactions/sec (MySQL's TPS does not go higher in this environment):

./bin/benchyou --mysql-host=192.168.0.3 --mysql-user=test --mysql-password=123 --oltp-tables-count=1 --write-threads=256 --read-threads=0

time            thds               tps     wtps    rtps
[13s]        [r:0,w:256,u:0,d:0]  19962    19962   0    

time            thds               tps     wtps    rtps
[14s]        [r:0,w:256,u:0,d:0]  20415    20415   0 

time            thds               tps     wtps    rtps
[15s]        [r:0,w:256,u:0,d:0]  21131    21131   0

time            thds               tps     wtps    rtps
[16s]        [r:0,w:256,u:0,d:0]  21606    21606   0

time            thds               tps     wtps    rtps
[17s]        [r:0,w:256,u:0,d:0]  22505    22505   0

ClickHouse-side single-threaded playback capacity, about 21k transactions/sec, in real-time sync:

┌─count()─┬───────────────now()─┬─_version─┐
│  150732 │ 2020-07-30 05:17:15 │       17 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  155477 │ 2020-07-30 05:17:16 │       18 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  160222 │ 2020-07-30 05:17:16 │       19 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  164967 │ 2020-07-30 05:17:16 │       20 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  169712 │ 2020-07-30 05:17:16 │       21 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  174457 │ 2020-07-30 05:17:16 │       22 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  179202 │ 2020-07-30 05:17:17 │       23 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  183947 │ 2020-07-30 05:17:17 │       24 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  188692 │ 2020-07-30 05:17:17 │       25 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  193437 │ 2020-07-30 05:17:17 │       26 │
└─────────┴─────────────────────┴──────────┘
┌─count()─┬───────────────now()─┬─_version─┐
│  198182 │ 2020-07-30 05:17:17 │       27 │
└─────────┴─────────────────────┴──────────┘

Implementation mechanism

Before discussing the mechanism, you need to understand MySQL binlog events, which mainly include the following types:

  1. MYSQL_QUERY_EVENT        -- DDL
  2. MYSQL_WRITE_ROWS_EVENT   -- insert data
  3. MYSQL_UPDATE_ROWS_EVENT  -- update data
  4. MYSQL_DELETE_ROWS_EVENT  -- delete data
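To make the event types concrete, here is a toy dispatcher (purely illustrative; the handler names are made up and this is not ClickHouse's actual code) showing how a binlog consumer typically routes each event type to a handler:

```python
# Illustrative sketch: a binlog consumer dispatches on the event type.
# Handler names are hypothetical; real consumers decode binary payloads.
def handle_query(ev):  return ("ddl", ev)     # MYSQL_QUERY_EVENT
def handle_write(ev):  return ("insert", ev)  # MYSQL_WRITE_ROWS_EVENT
def handle_update(ev): return ("update", ev)  # MYSQL_UPDATE_ROWS_EVENT
def handle_delete(ev): return ("delete", ev)  # MYSQL_DELETE_ROWS_EVENT

DISPATCH = {
    "MYSQL_QUERY_EVENT": handle_query,
    "MYSQL_WRITE_ROWS_EVENT": handle_write,
    "MYSQL_UPDATE_ROWS_EVENT": handle_update,
    "MYSQL_DELETE_ROWS_EVENT": handle_delete,
}

def consume(event_type, payload):
    return DISPATCH[event_type](payload)

print(consume("MYSQL_DELETE_ROWS_EVENT", {"a": 1}))  # ('delete', {'a': 1})
```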

When a transaction commits, MySQL converts the executed SQL into the corresponding binlog events and persists them to the binlog file.

Binlog is an important export channel for MySQL: as long as you implement the MySQL replication protocol, you can stream-consume the binlog events MySQL produces. See the Replication Protocol documentation.

For historical reasons the protocol is complicated and quirky, but that is not the focus of this article.

For ClickHouse to consume MySQL binlog, there are three main difficulties:

  • DDL compatibility
  • DELETE/UPDATE support
  • Query filtering

DDL

DDL compatibility took a lot of code to implement.

First, let's see what a MySQL table looks like after it is replicated to ClickHouse.

MySQL master:

mysql> show create table t1\G;
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

ClickHouse slave:

ATTACH TABLE t1
(
    `a` Int32,
    `b` Nullable(Int32),
    `_sign` Int8,
    `_version` UInt64
)
ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192

We can see that:

  • Two hidden fields are added by default: _sign (-1 = delete, 1 = write) and _version (data version)
  • The engine is converted to ReplacingMergeTree, with _version as the version column
  • The original primary key field a serves as both the sort key and the partition key

This only covers table creation; there are many other DDL cases to handle, such as adding columns and indexes. If you are interested, see the code under Parsers/MySQL.

Update and delete

When we execute in MySQL master:

mysql> delete from t1 where a=1;
mysql> update t1 set b=b+1;

The _sign and _version values of ClickHouse's t1 data:

clickhouse :) select a,b,_sign, _version from t1;

SELECT 
    a,
    b,
    _sign,
    _version
FROM t1

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │     1 │        1 │
│ 2 │ 2 │     1 │        1 │
└───┴───┴───────┴──────────┘
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │    -1 │        2 │
└───┴───┴───────┴──────────┘
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 2 │ 3 │     1 │        3 │
└───┴───┴───────┴──────────┘

From the returned result we can see it is composed of three parts.

Part 1 is generated by insert into t1 values(1,1),(2,2):

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │     1 │        1 │
│ 2 │ 2 │     1 │        1 │
└───┴───┴───────┴──────────┘

Part 2 is generated by delete from t1 where a=1:

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │    -1 │        2 │
└───┴───┴───────┴──────────┘
Note: _sign = -1 indicates deletion.

Part 3 is generated by update t1 set b=b+1:

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 2 │ 3 │     1 │        3 │
└───┴───┴───────┴──────────┘

Query with FINAL:

clickhouse :) select a,b,_sign,_version from t1 final;

SELECT 
    a,
    b,
    _sign,
    _version
FROM t1
FINAL

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │    -1 │        2 │
└───┴───┴───────┴──────────┘
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 2 │ 3 │     1 │        3 │
└───┴───┴───────┴──────────┘

2 rows in set. Elapsed: 0.016 sec.

You can see that ReplacingMergeTree has deduplicated the records according to _version and the ORDER BY key.

Query

MySQL master:

mysql> select * from t1;
+---+------+
| a | b    |
+---+------+
| 2 |    3 |
+---+------+
1 row in set (0.00 sec)

ClickHouse slave:

clickhouse :) select * from t1;

SELECT *
FROM t1

┌─a─┬─b─┐
│ 2 │ 3 │
└───┴───┘

clickhouse :) select *,_sign,_version from t1;

SELECT 
    *,
    _sign,
    _version
FROM t1

┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │    -1 │        2 │
│ 2 │ 3 │     1 │        3 │
└───┴───┴───────┴──────────┘
Note: there is still a deletion record with _sign = -1.

MaterializeMySQL is implemented as a database engine, so when reading it filters based on the _sign status: rows with _sign = -1 are treated as deleted and filtered out.
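The read-time semantics above can be modeled in a few lines. This is a simplified illustration (an assumption for teaching, not ClickHouse internals): keep the row with the highest _version per primary key, then drop rows whose _sign is -1:

```python
# Simplified model of a FINAL read over the ReplacingMergeTree table above:
# deduplicate by primary key on max _version, then filter _sign == -1.
rows = [  # (a, b, _sign, _version), matching the tables shown earlier
    (1, 1,  1, 1),
    (2, 2,  1, 1),
    (1, 1, -1, 2),
    (2, 3,  1, 3),
]

latest = {}
for a, b, sign, version in rows:
    if a not in latest or version > latest[a][3]:
        latest[a] = (a, b, sign, version)

visible = [(a, b) for a, b, sign, _ in latest.values() if sign != -1]
print(sorted(visible))  # [(2, 3)] -- matches the MySQL master's final state
```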

Parallel playback

Why does MySQL need parallel playback?

Suppose the MySQL master has 1024 concurrent sessions writing and updating at the same time, generating a flood of binlog events in an instant, while the MySQL slave has only one thread replaying events one after another. It can never keep up, so MySQL implemented parallel playback.

Then, can a MySQL slave completely (or nearly) reproduce the master's concurrent behavior at the time?

To parallelize, the first thing to solve is the dependency problem: we need the master to mark which events can run in parallel and which have a sequential relationship, because only the master sees the original scene.

MySQL does this by adding two fields to the binlog:

  • last_committed: events with the same last_committed value can be replayed in parallel
  • sequence_number: the commit order, describing the sequence of dependencies

last_committed=3   sequence_number=4   -- event1
last_committed=4   sequence_number=5   -- event2
last_committed=4   sequence_number=6   -- event3
last_committed=5   sequence_number=7   -- event4

Event 2 and event 3 can be replayed in parallel, while event 4 must wait for the preceding events to complete before it can be replayed.
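The grouping rule can be sketched as follows. This is a deliberate simplification (real MySQL dependency tracking is more permissive): consecutive events sharing a last_committed value form one parallel group:

```python
# Simplified sketch of the group-commit rule: events sharing the same
# last_committed form a group that can replay in parallel; each new
# last_committed value starts a group that waits for the previous one.
events = [  # (name, last_committed, sequence_number), from the example above
    ("event1", 3, 4),
    ("event2", 4, 5),
    ("event3", 4, 6),
    ("event4", 5, 7),
]

groups = []
for name, last_committed, _ in events:
    if groups and groups[-1][0] == last_committed:
        groups[-1][1].append(name)
    else:
        groups.append((last_committed, [name]))

for lc, names in groups:
    print(lc, names)
# 3 ['event1']
# 4 ['event2', 'event3']   <- these two can run in parallel
# 5 ['event4']
```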
The above is only the general principle. MySQL currently offers three parallel modes to choose from:

  1. Database-based parallelism
  2. Group-commit-based parallelism
  3. Writeset-based parallelism (parallel when primary keys do not conflict)

To maximize the slave's playback speed, the whole mechanism is extremely complex.

Back to the ClickHouse slave: even with single-threaded playback, latency is no longer the main problem. This is determined by the two systems' mechanisms:
A MySQL slave replaying binlog must convert each event back into SQL and re-execute it as the master did; this logical replication is the main reason for its low performance.
ClickHouse, on playback, converts binlog events directly into its underlying Block structure and writes them straight to the storage engine, which is close to physical replication. You can think of it as replaying binlog events directly onto InnoDB pages.

Read latest

Although ClickHouse slave playback is very fast and close to real time, how do we make sure a read on the ClickHouse slave always sees the latest data?

It is actually quite simple: with the help of MySQL's binlog GTID feature, each read first fetches the executed GTID set from the master, then waits until those GTIDs have been replayed locally.
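The waiting logic can be illustrated as below. This is a hand-written sketch of the idea, not ClickHouse's API; the function and GTID strings are hypothetical:

```python
# Illustrative read-latest sketch: block until every GTID executed on the
# master is covered by the slave's replayed set, then serve the read.
import time

def wait_for_gtid(master_executed: set, get_replayed, timeout=5.0, poll=0.01):
    """Return True once the replayed GTID set covers the master's set."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if master_executed <= get_replayed():
            return True
        time.sleep(poll)
    return False

# Toy usage: the slave has already replayed everything, so this returns fast.
replayed = {"uuid:1", "uuid:2", "uuid:3"}
print(wait_for_gtid({"uuid:1", "uuid:2"}, lambda: replayed))  # True
```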

Data consistency

For scenarios with high consistency requirements, how can we verify that the MySQL master's data and the ClickHouse slave's data are consistent?

The preliminary idea is to provide a function compatible with MySQL's checksum algorithm, so that we only need to compare the checksum values on both sides.
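To show the shape of such a comparison, here is a rough sketch using Python's zlib.crc32 as a stand-in. Note the hedge: MySQL's CHECKSUM TABLE has its own algorithm, so a real compatible implementation would have to reproduce it bit-for-bit; this only demonstrates the compare-two-sides idea:

```python
# Rough sketch only: row-stream checksum comparison between two sides.
# zlib.crc32 is a stand-in, NOT MySQL's CHECKSUM TABLE algorithm.
import zlib

def table_checksum(rows) -> int:
    crc = 0
    for row in rows:
        # Both sides must serialize rows identically for this to be valid.
        crc = zlib.crc32(repr(row).encode(), crc)
    return crc

mysql_side = [(2, 3)]
clickhouse_side = [(2, 3)]
print(table_checksum(mysql_side) == table_checksum(clickhouse_side))  # True
```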

Summary

Real-time replication and synchronization of MySQL data into ClickHouse was on the upstream 2020 roadmap. It is architecturally challenging, and for a long time no one took on the task. The challenges come mainly from two aspects:

  • Being familiar with the MySQL replication channel and protocol
  • Being very familiar with ClickHouse's overall mechanisms

In this way, expressway #10851 was built between two somewhat distant hills. It was jointly constructed by two road builders, zhang1024 (the ClickHouse side) and BohuTANG (the MySQL replication side), and has been merged into the upstream branch.

As for synchronizing MySQL data, current solutions basically install a binlog consumption tool in the middle: the tool parses events, converts them into ClickHouse SQL statements, and writes them to the ClickHouse server. The link is long and the performance loss is large.

Expressway #10851 instead implements the binlog consumption scheme inside ClickHouse itself: events are parsed into ClickHouse's internal Block structure and written directly back to the underlying storage engine. This is close to the most efficient way to achieve real-time synchronization with MySQL, bringing analysis closer to real time.

Replication works at database granularity and supports multiple sources. If a replication channel breaks, we only need to drop the database on the ClickHouse side and rebuild it, which is fast and convenient. OLTP + OLAP becomes that simple!

If you want to be rich, build roads first!

Special column:

Welcome to follow my WeChat official account [MySQL Database Technology].
