Data synchronization tool: Debezium

Time: 2022-01-13

Brief introduction

Debezium is an open-source distributed platform for change data capture. It can capture every insert, update and delete performed against a database. Debezium depends on Kafka, so ZooKeeper, Kafka and Kafka Connect need to be installed before installing Debezium.

Differences from Canal and Maxwell
1. Canal and Maxwell require the databases to be synchronized to be registered in my.cnf; Debezium needs no such declaration. When a sync connector is registered, Debezium captures the binlog of whichever database is specified, and only needs the proper permissions to be in place;
2. Canal manages its metadata itself, Maxwell keeps its metadata in an external MySQL, and Debezium keeps its metadata in Kafka;
3. In Canal, a SQL statement that affects multiple rows is collected from the binlog as a collection and wrapped into a single JSON message; all fields are strings, so type information has to be carried along. In Maxwell, a SQL statement affecting n rows produces n binlog JSON messages; values keep their original data types and no extra type information is attached, so it is relatively lightweight. In Debezium, each affected row also corresponds to one binlog JSON message, but it carries the data types plus additional schema information; values keep their original types, so the messages are comparatively heavy;
4. Canal and Maxwell both synchronize the binlog at the database level, so binlog from different tables arrives mixed together; Debezium synchronizes at the table level, with one topic per table;
5. Flink CDC ships with Debezium integrated, but in practice the schema parser is often rewritten in the Maxwell style.

Kafka Connect

Kafka Connect is a tool for scalable and reliable data streaming between Apache Kafka and other systems; connectors make it easy to import or export large amounts of data.
Kafka Connect currently supports two modes, standalone and distributed. Standalone is mainly used for introductory testing, so here we set up the distributed mode.
Official website address: https://kafka.apache.org/documentation.html#connect
      
Distributed mode automatically balances the work, allows dynamic scale-out and scale-in, and provides fault tolerance for active tasks as well as for configuration and committed offset data. It is very similar to standalone mode; the biggest difference lies in the class that is started and in the configuration parameters, which determine how the Kafka Connect processes distribute work and where they store offsets. In distributed mode, Kafka Connect stores offsets, configuration and task status in Kafka topics. It is recommended to create these topics manually with an explicit number of partitions, although they can also be created automatically via configuration parameters.
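If the internal topics are created by hand, something like the following can be used (a sketch only: the topic names must match the *.storage.topic settings below, the partition and replication counts are illustrative, and Connect expects these topics to be log-compacted):

kafka-topics.sh --zookeeper hadoop01:2181 --create --topic connect-configs --partitions 1 --replication-factor 3 --config cleanup.policy=compact
kafka-topics.sh --zookeeper hadoop01:2181 --create --topic connect-offsets --partitions 25 --replication-factor 3 --config cleanup.policy=compact
kafka-topics.sh --zookeeper hadoop01:2181 --create --topic connect-status --partitions 5 --replication-factor 3 --config cleanup.policy=compact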

Parameter configuration:
group.id: the unique name of the Connect cluster, connect-cluster by default; it must not be duplicated and is used to form the Connect worker group.
config.storage.topic: the topic used to store Kafka connector and task configuration.
offset.storage.topic: the topic used to store offsets.
status.storage.topic: the topic used to store statuses.
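These keys live in the distributed worker properties file; a quick way to check what the workers will actually use (the values below are the stock defaults shipped with Kafka and match the topics seen later in the test):

grep -E '^(group\.id|config\.storage\.topic|offset\.storage\.topic|status\.storage\.topic)' config/connect-distributed.properties
#group.id=connect-cluster
#config.storage.topic=connect-configs
#offset.storage.topic=connect-offsets
#status.storage.topic=connect-status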

Configuration

cd /opt/softwares/kafka_2.11-0.11.0.0
vim config/connect-distributed.properties
-----------------------------------------------------------
bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
-----------------------------------------------------------

Installing the Debezium plug-in

cd /opt/softwares/kafka_2.11-0.11.0.0
mkdir plugins

tar -zxvf debezium-connector-mysql-1.2.0.Final-plugin.tar.gz -C /opt/softwares/kafka_2.11-0.11.0.0/plugins

vi config/connect-distributed-mysql.properties 
-----------------------------------------------------------
plugin.path=/opt/softwares/kafka_2.11-0.11.0.0/plugins
-----------------------------------------------------------
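The tarball unpacks into its own sub-directory, and plugin.path must point at the parent plugins directory rather than at individual jars. A quick sanity check (the listing below is illustrative):

ls /opt/softwares/kafka_2.11-0.11.0.0/plugins
#debezium-connector-mysql
ls /opt/softwares/kafka_2.11-0.11.0.0/plugins/debezium-connector-mysql | grep mysql
#debezium-connector-mysql-1.2.0.Final.jar ...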

Distribute

cd /opt/softwares/kafka_2.11-0.11.0.0
xsync plugins
xsync config/connect-distributed-mysql.properties 

Auxiliary scripts

  • Start connector script
vi  bin/kafkaConnectorStart
-----------------------------------------------------------
#!/bin/bash

curr_dir=`pwd`

KAFKA_HOME=/opt/softwares/kafka_2.11-0.11.0.0
ssh hadoop01 "$KAFKA_HOME/bin/connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties"
ssh hadoop02 "$KAFKA_HOME/bin/connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties"
ssh hadoop03 "$KAFKA_HOME/bin/connect-distributed.sh -daemon $KAFKA_HOME/config/connect-distributed.properties"

xcall jps

cd $curr_dir
-----------------------------------------------------------
  • Stop connector script
vi  bin/kafkaConnectorStop
-----------------------------------------------------------
#!/bin/bash

curr_dir=`pwd`

KAFKA_HOME=/opt/softwares/kafka_2.11-0.11.0.0

ssh hadoop01 "ps -ef | grep ConnectDistributed | grep -v grep | awk -F ' ' '{print \$2}' | xargs kill -s 9"
ssh hadoop02 "ps -ef | grep ConnectDistributed | grep -v grep | awk -F ' ' '{print \$2}' | xargs kill -s 9"
ssh hadoop03 "ps -ef | grep ConnectDistributed | grep -v grep | awk -F ' ' '{print \$2}' | xargs kill -s 9"

xcall jps

cd $curr_dir
-----------------------------------------------------------
  • Kafka management script
vi  bin/kafka
-----------------------------------------------------------
#!/bin/bash

old=`pwd`
new=$KAFKA_HOME
cd $new
zks='hadoop01:2181,hadoop02:2181,hadoop03:2181'
brks='hadoop01:9092,hadoop02:9092,hadoop03:9092'

case $1 in
    list)
        echo "kafka-topics.sh --zookeeper $zks --list"
        kafka-topics.sh --zookeeper $zks --list
        ;;
    create)
        echo "kafka-topics.sh --zookeeper $zks --create --topic $2 --config message.timestamp.type=LogAppendTime --replication-factor $3 --partitions $4"
        kafka-topics.sh --zookeeper $zks --create --topic $2 --config message.timestamp.type=LogAppendTime --replication-factor $3 --partitions $4
        ;;
    delete)
        echo "kafka-topics.sh --zookeeper $zks --delete --topic $2"
        kafka-topics.sh --zookeeper $zks --delete --topic $2
        ;;
    produce)
        echo "kafka-console-producer.sh --broker-list $brks --topic $2"
        kafka-console-producer.sh --broker-list $brks --topic $2
        ;;
    consume)
        echo "kafka-console-consumer.sh --zookeeper $zks --from-beginning --topic $2"
        kafka-console-consumer.sh --zookeeper $zks --from-beginning --topic $2
        ;;
    group_consume)
        echo "kafka-console-consumer.sh --zookeeper $zks --from-beginning --topic $2 --consumer.config config/consumer.properties"
        kafka-console-consumer.sh --zookeeper $zks --from-beginning --topic $2 --consumer.config config/consumer.properties
        ;;
    delete_group)
        echo "kafka-consumer-groups.sh --zookeeper $zks --delete --group $2"
        kafka-consumer-groups.sh --zookeeper $zks --delete --group $2
        ;;
    delete_topic_from_group)
        echo "kafka-consumer-groups.sh --zookeeper $zks --delete --group $2 --topic $3"
        kafka-consumer-groups.sh --zookeeper $zks --delete --group $2 --topic $3
        ;;
    desc)
        echo "kafka-topics.sh --zookeeper $zks --describe --topic $2"
        kafka-topics.sh --zookeeper $zks --describe --topic $2
        echo "largest: kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $brks --topic $2 --time -1"
        kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $brks --topic $2 --time -1
        echo "earliest: kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $brks --topic $2 --time -2"
        kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $brks --topic $2 --time -2
        ;;
    produce_test)
        echo "kafka-producer-perf-test.sh --topic $2 --throughput -1 --record-size 10 --num-records $3 --producer-props bootstrap.servers=$brks"
        kafka-producer-perf-test.sh --topic $2 --throughput -1 --record-size 10 --num-records $3 --producer-props bootstrap.servers=$brks
        ;;
    *)
       echo "kafka (list | create | delete | produce | produce_test | consume | group_consume | delete_group | delete_topic_from_group  | desc | *)"
esac

cd $old

exit 0
-----------------------------------------------------------
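A few example invocations of the helper above (the topic name and the counts are just placeholders):

kafka create demo_topic 2 3      #topic demo_topic, replication-factor 2, 3 partitions
kafka desc demo_topic
kafka produce demo_topic
kafka consume demo_topic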
  • Kafka startup script
vi bin/kafkaStart
-----------------------------------------------------------
#!/bin/bash

curr_dir=`pwd`

KAFKA_HOME=/opt/softwares/kafka_2.11-0.11.0.0
ssh hadoop01 "$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /dev/null 2>&1 &"
ssh hadoop02 "$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /dev/null 2>&1 &"
ssh hadoop03 "$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties > /dev/null 2>&1 &"

xcall jps

cd $curr_dir
-----------------------------------------------------------
  • Kafka shutdown script
vi bin/kafkaStop
-----------------------------------------------------------
#!/bin/bash

curr_dir=`pwd`

KAFKA_HOME=/opt/softwares/kafka_2.11-0.11.0.0

ssh hadoop01 "$KAFKA_HOME/bin/kafka-server-stop.sh stop > /dev/null 2>&1 &"
ssh hadoop02 "$KAFKA_HOME/bin/kafka-server-stop.sh stop > /dev/null 2>&1 &"
ssh hadoop03 "$KAFKA_HOME/bin/kafka-server-stop.sh stop > /dev/null 2>&1 &"

xcall jps

cd $curr_dir
-----------------------------------------------------------

MySQL binlog configuration

vi /etc/my.cnf
-----------------------------------------------------------

server-id=1
log-bin=mysql-bin
# full: every column is logged; minimal: only the changed columns are logged
binlog_row_image=full
binlog_format=row
binlog-do-db=test
-----------------------------------------------------------

#Restart
sudo service mysqld restart
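After the restart, it is worth a quick check that the binlog settings took effect (assuming root access to MySQL):

mysql -uroot -p -e "show variables like 'log_bin'; show variables like 'binlog_format'; show variables like 'binlog_row_image';"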

#Authorization
CREATE USER 'debezium'@'%' IDENTIFIED BY 'debezium';
GRANT ALL PRIVILEGES ON  *.* to "debezium"@"%" IDENTIFIED BY "debezium";
FLUSH PRIVILEGES;
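Granting ALL PRIVILEGES works but is broader than needed; the Debezium MySQL connector documentation lists a narrower set of privileges that is sufficient (a sketch, verify against the reference for your version):

mysql -uroot -p -e "GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'debezium'@'%'; FLUSH PRIVILEGES;"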

Start test

zkStart

kafkaStart

kafkaConnectorStart
  • New topics
kafka list | grep connect
connect-configs
connect-offsets
connect-status
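Before registering anything, the standard Connect REST API can confirm that the workers picked up the Debezium plug-in:

curl -s http://hadoop01:8083/connector-plugins
#the list should include "io.debezium.connector.mysql.MySqlConnector"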

Register the sync connector

#Request
curl -H "Content-Type: application/json" -X POST -d  '{
      "name" : "inventory-connector",
      "config" : {
          "connector.class" : "io.debezium.connector.mysql.MySqlConnector",
         "database.hostname" : "hadoop01",
          "database.port" : "3306",
          "database.user" : "debezium",
          "database.password" : "debezium",
          "database.server.id" : "1",
          "database.server.name" : "debezium",
          "database.whitelist" : "debezium_test",
          "database.history.kafka.bootstrap.servers":"hadoop01:9092,hadoop02:9092,hadoop03:9092",
          "database.history.kafka.topic":"debezium_binlog",
          "include.schema.change":"true"
      }
  }' http://hadoop01:8083/connectors
  
Note:
1. database.server.name is the logical name assigned to this database source, and database.whitelist is the name of the database to be synchronized (unlike Canal and Maxwell, there is no need to register the databases to be synchronized in my.cnf; whichever database is specified here is the one whose binlog gets captured);
2. Detailed configuration reference: https://debezium.io/documentation/reference/1.2/connectors/mysql.html
  
#Response
{"name":"inventory-connector","config":{"connector.class":"io.debezium.connector.mysql.MySqlConnector","database.hostname":"hadoop01","database.port":"3306","database.user":"debezium","database.password":"debezium","database.server.id":"1","database.server.name":"debezium","database.whitelist":"debezium_test","database.history.kafka.bootstrap.servers":"hadoop01:9092,hadoop02:9092,hadoop03:9092","database.history.kafka.topic":"debezium_binlog","include.schema.change":"true","name":"inventory-connector"},"tasks":[{"connector":"inventory-connector","task":0}],"type":"source"}

#Check whether the synchronization configuration is effective
curl -XGET  http://hadoop01:8083/connectors
["inventory-connector"]

#View synchronization configuration information
curl -XGET  http://hadoop01:8083/connectors/inventory-connector

#View the running status of synchronization configuration
curl -XGET  http://hadoop01:8083/connectors/inventory-connector/status
#Response
{"name":"inventory-connector","connector":{"state":"RUNNING","worker_id":"192.168.152.103:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"192.168.152.103:8083"}],"type":"source"}

#View new topics
kafka list
debezium  << database.server.name
debezium_binlog << database.history.kafka.topic

Demo binlog

use debezium_test;

create table student(id int(3) primary key auto_increment,name varchar(50),score int(3));

insert into student(name,score) values('a1',98);

  • Find the latest binlog file (under /var/lib/mysql/ by default)
mysql> show binary logs;
+------------------+------------+
| Log_name         | File_size  |
+------------------+------------+
| mysql-bin.000001 |        732 |
| mysql-bin.000002 |        177 |
| mysql-bin.000003 |   80873886 |
| mysql-bin.000004 | 1080773872 |
| mysql-bin.000005 |   96386423 |
| mysql-bin.000006 |    3051102 |
| mysql-bin.000007 |  214750185 |
| mysql-bin.000008 |        969 |
| mysql-bin.000009 |        177 |
| mysql-bin.000010 | 1102045188 |
| mysql-bin.000011 | 1116717194 |
| mysql-bin.000012 |  793675797 |
| mysql-bin.000013 |       3041 |
| mysql-bin.000014 |        177 |
| mysql-bin.000015 |        177 |
| mysql-bin.000016 |        177 |
| mysql-bin.000017 |        177 |
| mysql-bin.000018 |        401 |
| mysql-bin.000019 |        177 |
| mysql-bin.000020 |        177 |
| mysql-bin.000021 |        177 |
| mysql-bin.000022 |        177 |
| mysql-bin.000023 |        177 |
| mysql-bin.000024 |       2079 |
| mysql-bin.000025 |        177 |
| mysql-bin.000026 |        399 | <<<<
+------------------+------------+
  • Find the binlog file currently being written
mysql> show master status\G;
*************************** 1. row ***************************
             File: mysql-bin.000026  <<<
         Position: 686
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)

ERROR:
No query specified
  • Find the binlog events in mysql-bin.000026
mysql> show binlog events in 'mysql-bin.000026';
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                                                                                          |
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------------------------+
| mysql-bin.000026 |   4 | Format_desc    |         1 |         123 | Server ver: 5.7.18-log, Binlog ver: 4                                                                         |
| mysql-bin.000026 | 123 | Previous_gtids |         1 |         154 |                                                                                                               |
| mysql-bin.000026 | 154 | Anonymous_Gtid |         1 |         219 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'                                                                          |
| mysql-bin.000026 | 219 | Query          |         1 |         399 | use `debezium_test`; create table student(id int(3) primary key auto_increment,name varchar(50),score int(3)) |
| mysql-bin.000026 | 399 | Anonymous_Gtid |         1 |         464 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'                                                                          |
| mysql-bin.000026 | 464 | Query          |         1 |         545 | BEGIN                                                                                                         |
| mysql-bin.000026 | 545 | Table_map      |         1 |         608 | table_id: 881 (debezium_test.student)                                                                         |
| mysql-bin.000026 | 608 | Write_rows     |         1 |         655 | table_id: 881 flags: STMT_END_F  << write data
| mysql-bin.000026 | 655 | Xid            |         1 |         686 | COMMIT /* xid=649 */                                                                                          |
+------------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------------------------+
  • View binlog
sudo mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000026 | tail -n 20

SET TIMESTAMP=1622268842/*!*/;
BEGIN
/*!*/;
# at 545
#210529 14:14:02 server id 1  end_log_pos 608 CRC32 0xd1a01157  Table_map: `debezium_test`.`student` mapped to number 881
# at 608
#210529 14:14:02 server id 1  end_log_pos 655 CRC32 0xa82e02e2  Write_rows: table id 881 flags: STMT_END_F
### INSERT INTO `debezium_test`.`student`   <<<<<
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='a1' /* VARSTRING(50) meta=50 nullable=1 is_null=0 */
###   @3=98 /* INT meta=0 nullable=1 is_null=0 */
# at 655   <<<<<
#210529 14:14:02 server id 1  end_log_pos 686 CRC32 0x7423ae77  Xid = 649
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
  • Modify record
update student set score=50 where id=1;
  • Full-image binlog (all columns are logged)
sudo mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000026 | tail -n 20

#210529 14:26:18 server id 1  end_log_pos 895 CRC32 0xf1883ab9  Table_map: `debezium_test`.`student` mapped to number 881
# at 895
#210529 14:26:18 server id 1  end_log_pos 955 CRC32 0x91434f23  Update_rows: table id 881 flags: STMT_END_F
### UPDATE `debezium_test`.`student`  <<<<<
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='a1' /* VARSTRING(50) meta=50 nullable=1 is_null=0 */
###   @3=98 /* INT meta=0 nullable=1 is_null=0 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='a1' /* VARSTRING(50) meta=50 nullable=1 is_null=0 */
###   @3=50 /* INT meta=0 nullable=1 is_null=0 */
# at 955 <<<<<
#210529 14:26:18 server id 1  end_log_pos 986 CRC32 0xaf5c23b5  Xid = 653
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
  • Minimal-image binlog (only the changed columns are logged; note that Debezium itself expects binlog_row_image=full, minimal is shown here only for comparison)
set binlog_row_image='minimal';

update student set score=55 where id=1;

sudo mysqlbinlog --no-defaults -v -v --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000026 | tail -n 20

SET TIMESTAMP=1622269694/*!*/;
BEGIN
/*!*/;
# at 1132
#210529 14:28:14 server id 1  end_log_pos 1195 CRC32 0x70a769fd     Table_map: `debezium_test`.`student` mapped to number 881
# at 1195
#210529 14:28:14 server id 1  end_log_pos 1241 CRC32 0x79a7963b     Update_rows: table id 881 flags: STMT_END_F
### UPDATE `debezium_test`.`student`  <<<<<
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
### SET
###   @3=55 /* INT meta=0 nullable=1 is_null=0 */
# at 1241  <<<<<
#210529 14:28:14 server id 1  end_log_pos 1272 CRC32 0x9d7fa34a     Xid = 655
COMMIT/*!*/;
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

New topic

kafka list
debezium.debezium_test.student  << <database.server.name>.<database>.<table>

Data in Kafka

kafka consume debezium.debezium_test.student
Each message carries both the schema and the payload (the data); this is the extra schema information mentioned earlier that makes Debezium messages comparatively heavy.
{
    "schema":{
        "type":"struct",
        "fields":[
            {
                "type":"struct",
                "fields":[
                    {
                        "type":"int32",
                        "optional":false,
                        "field":"id"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "field":"name"
                    },
                    {
                        "type":"int32",
                        "optional":true,
                        "field":"score"
                    }
                ],
                "optional":true,
                "name":"debezium.debezium_test.student.Value",
                "field":"before"
            },
            {
                "type":"struct",
                "fields":[
                    {
                        "type":"int32",
                        "optional":false,
                        "field":"id"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "field":"name"
                    },
                    {
                        "type":"int32",
                        "optional":true,
                        "field":"score"
                    }
                ],
                "optional":true,
                "name":"debezium.debezium_test.student.Value",
                "field":"after"
            },
            {
                "type":"struct",
                "fields":[
                    {
                        "type":"string",
                        "optional":false,
                        "field":"version"
                    },
                    {
                        "type":"string",
                        "optional":false,
                        "field":"connector"
                    },
                    {
                        "type":"string",
                        "optional":false,
                        "field":"name"
                    },
                    {
                        "type":"int64",
                        "optional":false,
                        "field":"ts_ms"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "name":"io.debezium.data.Enum",
                        "version":1,
                        "parameters":{
                            "allowed":"true,last,false"
                        },
                        "default":"false",
                        "field":"snapshot"
                    },
                    {
                        "type":"string",
                        "optional":false,
                        "field":"db"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "field":"table"
                    },
                    {
                        "type":"int64",
                        "optional":false,
                        "field":"server_id"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "field":"gtid"
                    },
                    {
                        "type":"string",
                        "optional":false,
                        "field":"file"
                    },
                    {
                        "type":"int64",
                        "optional":false,
                        "field":"pos"
                    },
                    {
                        "type":"int32",
                        "optional":false,
                        "field":"row"
                    },
                    {
                        "type":"int64",
                        "optional":true,
                        "field":"thread"
                    },
                    {
                        "type":"string",
                        "optional":true,
                        "field":"query"
                    }
                ],
                "optional":false,
                "name":"io.debezium.connector.mysql.Source",
                "field":"source"
            },
            {
                "type":"string",
                "optional":false,
                "field":"op"
            },
            {
                "type":"int64",
                "optional":true,
                "field":"ts_ms"
            },
            {
                "type":"struct",
                "fields":[
                    {
                        "type":"string",
                        "optional":false,
                        "field":"id"
                    },
                    {
                        "type":"int64",
                        "optional":false,
                        "field":"total_order"
                    },
                    {
                        "type":"int64",
                        "optional":false,
                        "field":"data_collection_order"
                    }
                ],
                "optional":true,
                "field":"transaction"
            }
        ],
        "optional":false,
        "name":"debezium.debezium_test.student.Envelope"
    },
    "payload":{
        "before":null,
        "after":{
            "id":1,
            "name":"a1",
            "score":98
        },
        "source":{
            "version":"1.2.0.Final",
            "connector":"mysql",
            "name":"debezium",
            "ts_ms":1622268842000,
            "snapshot":"false",
            "db":"debezium_test",
            "table":"student",
            "server_id":1,
            "gtid":null,
            "file":"mysql-bin.000026",
            "pos":608,
            "row":0,
            "thread":9,
            "query":null
        },
        "op":"c",
        "ts_ms":1622268842853,
        "transaction":null
    }
}
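Downstream consumers usually only need the payload part; op is "c" for insert, "u" for update, "d" for delete and "r" for snapshot reads, while before and after hold the row images. A quick way to peek at just that part of the stream (assuming jq is installed):

kafka consume debezium.debezium_test.student | jq '.payload | {op, before, after}'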

Delete the sync connector

curl -X DELETE http://hadoop01:8083/connectors/inventory-connector
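Afterwards the connector list should be empty again:

curl -XGET http://hadoop01:8083/connectors
#[]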
