Real-time incremental data synchronization based on canal and Flink (1)

Time: 2021-01-22

Canal is an open-source project from Alibaba, developed in pure Java. It parses the database's incremental log (binlog) to provide incremental data subscription and consumption, and currently mainly supports MySQL (MariaDB is also supported).

Preparation

Common binlog commands

#Check whether binlog is enabled
show variables like 'log_bin';
#View the binlog format
show global variables like 'binlog_format';
#View detailed log configuration information
show global variables like '%log%';
#MySQL data storage directory
show variables like '%dir%';
#View the directory of binlog
show global variables like "%log_bin%";
#View the binlog files used by the current server and their sizes
show binary logs;
#Check the name and position of the latest binlog file
show master status;

Configure MySQL binlog

For a self-built MySQL, you first need to enable binlog writing and set the binlog format to ROW mode. The my.cnf configuration is as follows:

[mysqld]
log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not duplicate canal's slaveId
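
The my.cnf changes take effect only after MySQL is restarted. After restarting, the binlog settings can be confirmed with the commands listed earlier, for example:

#Confirm binlog is enabled (expected value: ON)
show variables like 'log_bin';
#Confirm the binlog format (expected value: ROW)
show global variables like 'binlog_format';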

Grant permissions

Create a MySQL account for canal with MySQL slave (replication) privileges. If such an account already exists, you can grant the privileges directly:

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;
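
If you want to confirm that the account is set up correctly, you can list its privileges (an optional sanity check, not required by canal):

-- optional check: list the privileges granted to the canal account
SHOW GRANTS FOR 'canal'@'%';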

Deploy canal

Install canal

tar -xzvf canal.deployer-1.1.4.tar.gz -C /opt/modules/canal/
  • Directory structure
drwxr-xr-x 2 root root 4096 Mar  5 14:19 bin
drwxr-xr-x 5 root root 4096 Mar  5 13:54 conf
drwxr-xr-x 2 root root 4096 Mar  5 13:04 lib
drwxrwxrwx 4 root root 4096 Mar  5 14:19 logs

Configuration modification

  • Modify conf/example/instance.properties as follows:
## mysql serverId
canal.instance.mysql.slaveId = 1234
#Position info, you need to change it to your own database information
canal.instance.master.address = kms-1.apache.com:3306 
#User name / password, you need to change it to your own database information
canal.instance.dbUsername = canal  
canal.instance.dbPassword = canal
#MQ config, Kafka topic name
canal.mq.topic=test
  • Modify conf/canal.properties as follows:
#Configure zookeeper address
canal.zkServers =kms-2:2181,kms-3:2181,kms-4:2181
#Options: tcp (default), kafka, rocketmq
canal.serverMode = kafka
#Configure Kafka address
canal.mq.servers = kms-2:9092,kms-3:9092,kms-4:9092

Start canal

sh bin/startup.sh

Close canal

sh bin/stop.sh

Deploy canal admin (optional)

Canal admin is designed to provide canal with operations-oriented capabilities such as centralized configuration management and node operation and maintenance, along with a relatively friendly Web UI so that more users can operate it quickly and safely.

Requirements

Canal admin depends on:

  • MySQL is used to store configuration and node data
  • canal version >= 1.1.4 (canal admin relies on the dynamic operation and maintenance management interface provided by the canal server)

Install canal admin

tar -xzvf canal.admin-1.1.4.tar.gz -C /opt/modules/canal-admin/
  • Directory structure
drwxrwxr-x 2 kms kms 4096 Mar  6 11:25 bin
drwxrwxr-x 3 kms kms 4096 Mar  6 11:25 conf
drwxrwxr-x 2 kms kms 4096 Mar  6 11:25 lib
drwxrwxr-x 2 kms kms 4096 Sep  2  2019 logs
  • Configuration modification
vi conf/application.yml
server:
  port: 8089
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: kms-1:3306
  database: canal_manager
  username: canal
  password: canal
  driver-class-name: com.mysql.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

canal:
  adminUser: admin
  adminPasswd: admin
  • Initialize the metadata database
mysql -uroot -p
#Import initialization SQL
#Note: (1) the initialization SQL script creates the canal_manager database by default; it is recommended to run it with root or another account with super privileges
#    (2) canal_manager.sql is located in the conf directory by default
mysql> source /opt/modules/canal-admin/conf/canal_manager.sql
  • Start canal admin
sh bin/startup.sh
  • Access the Web UI

The Web UI is available at http://kms-1:8089; the default credentials are admin / 123456.

  • Canal server configuration

Use canal_local.properties to override the configuration in canal.properties: simply put the following settings in the canal_local.properties file.

# register ip
canal.register.ip =
# canal admin config
canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
canal.admin.register.auto = true
canal.admin.register.cluster =
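
Note that canal.admin.passwd is a hash rather than the plain-text password; the default value above corresponds to the password admin. Assuming the hash follows the old MySQL PASSWORD() scheme (double SHA-1, without the leading *), it can be reproduced in MySQL like this:

-- assumption: the hash is the old MySQL PASSWORD()-style double SHA-1, without the leading '*'
SELECT UPPER(SHA1(UNHEX(SHA1('admin'))));
-- expected result: 4ACFE3202A5FF5CF467898FC58AAB1D615029441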
  • Start the canal server
sh bin/startup.sh  local

Note: start the canal server first, then start canal admin, and finally log in to canal admin to add the server and instance.

Test with the Kafka console consumer

bin/kafka-console-consumer.sh --bootstrap-server kms-2:9092,kms-3:9092,kms-4:9092  --topic test --from-beginning 

Now, whenever data in a MySQL table changes, the row-format change log is written to Kafka as a JSON message. The specific format is shown below (SQL statements that could produce these example messages are sketched after them):

  • Insert operation
{
    "data":[
        {
            "id":"338",
            "City": Chengdu,
            "Province": Sichuan Province
        }
    ],
    "database":"qfbap_ods",
    "es":1583394964000,
    "id":2,
    "isDdl":false,
    "mysqlType":{
        "id":"int(11)",
        "city":"varchar(256)",
        "province":"varchar(256)"
    },
    "old":null,
    "pkNames":[
        "id"
    ],
    "sql":"",
    "sqlType":{
        "id":4,
        "city":12,
        "province":12
    },
    "table":"code_city",
    "ts":1583394964361,
    "type":"INSERT"
}
  • Update operation
{
    "data":[
        {
            "id":"338",
            "City": Mianyang City,
            "Province": Sichuan Province
        }
    ],
    "database":"qfbap_ods",
    "es":1583395177000,
    "id":3,
    "isDdl":false,
    "mysqlType":{
        "id":"int(11)",
        "city":"varchar(256)",
        "province":"varchar(256)"
    },
    "old":[
        {
            "City": Chengdu
        }
    ],
    "pkNames":[
        "id"
    ],
    "sql":"",
    "sqlType":{
        "id":4,
        "city":12,
        "province":12
    },
    "table":"code_city",
    "ts":1583395177408,
    "type":"UPDATE"
}
  • Delete operation
{
    "data":[
        {
            "id":"338",
            "City": Mianyang City,
            "Province": Sichuan Province
        }
    ],
    "database":"qfbap_ods",
    "es":1583395333000,
    "id":4,
    "isDdl":false,
    "mysqlType":{
        "id":"int(11)",
        "city":"varchar(256)",
        "province":"varchar(256)"
    },
    "old":null,
    "pkNames":[
        "id"
    ],
    "sql":"",
    "sqlType":{
        "id":4,
        "city":12,
        "province":12
    },
    "table":"code_city",
    "ts":1583395333208,
    "type":"DELETE"
}
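
For reference, statements like the following (hypothetical examples, assuming a qfbap_ods.code_city table with id, city and province columns, as implied by the messages above) would produce the three kinds of messages:

-- hypothetical statements against qfbap_ods.code_city matching the INSERT / UPDATE / DELETE messages above
INSERT INTO qfbap_ods.code_city (id, city, province) VALUES (338, 'Chengdu', 'Sichuan Province');
UPDATE qfbap_ods.code_city SET city = 'Mianyang City' WHERE id = 338;
DELETE FROM qfbap_ods.code_city WHERE id = 338;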

Explanation of JSON log format

  • data: the latest row data, as a JSON array. For an insert it is the newly inserted data, for an update the data after the update, and for a delete the deleted data
  • database: database name
  • es: event time, a 13-digit (millisecond) timestamp
  • id: sequence number of the event operation, e.g. 1, 2, 3
  • isDdl: whether the operation is a DDL statement
  • mysqlType: MySQL field types
  • old: old data; for an update, the previous values of the changed columns
  • pkNames: primary key name(s)
  • sql: SQL statement
  • sqlType: field types after canal's type conversion; for example, unsigned int is converted to Long and unsigned long to BigDecimal
  • table: table name
  • ts: log time
  • type: operation type, such as DELETE, UPDATE, INSERT

Summary

This article first introduces the MySQL binlog configuration and the setup of canal, then describes how to deliver canal data to Kafka, and finally explains the JSON data produced by canal in detail. It is the first part of the series on real-time incremental data synchronization based on canal and Flink; the next part introduces how to implement the real-time incremental synchronization with Flink.


Follow the official account "Big Data Technology and Data Warehouse" and reply "information" to receive a big data resource package.
