Flink best practice: using Canal to synchronize MySQL data to TiDB

Time: 2021-7-22

Introduction: This article describes how to import MySQL data into Kafka in the form of binlog + Canal, and then consume it with Flink.

1、 Background


In order to quickly verify the functionality of the whole pipeline, all components are deployed in stand-alone mode. If you do not have enough physical resources, you can run all of the components in this article in a single virtual machine with 4 GB of memory and 1 CPU.

If you need to deploy in a production environment, it is recommended to replace each component with a highly available cluster deployment.

We create a single-node ZooKeeper environment, which is shared by Flink, Kafka, Canal, and the other components.

For all components that need a JRE, such as Flink, Kafka, Canal, and ZooKeeper, each component uses its own independent JRE, since upgrading a shared JRE might affect other applications.

This article is divided into two parts: the first seven sections introduce how to build the basic environment, and the last section describes how data flows through each component.

(Figure: overall architecture of the data flow)

Data flows through the following components:

  • The MySQL data source generates binlog.
  • Canal reads the binlog, generates canal-json messages, and pushes them to the specified Kafka topic.
  • Flink consumes the data in the Kafka topic through the flink-sql-connector-kafka API.
  • Flink writes the data to TiDB through flink-connector-jdbc.
  • The TiDB + Flink architecture supports the development and running of many kinds of applications.

At present, the main features include:

  • Unified batch and stream processing.
  • Sophisticated state management.
  • Event-time support.
  • Exactly-once state consistency guarantees.
  • Flink can run on a variety of resource management frameworks, including YARN, Mesos, and Kubernetes; it also supports independent deployment on bare-metal clusters. TiDB can be deployed on AWS, Kubernetes, GCP, and GKE, and it also supports independent deployment on bare-metal clusters via TiUP.

Common applications of TiDB + Flink include:

Event-driven applications:

  • Anti-fraud.
  • Anomaly detection.
  • Rule-based alerting.
  • Business process monitoring.

Data analysis applications:

  • Network quality monitoring.
  • Product update and experiment evaluation analysis.
  • Ad-hoc analysis of live data.
  • Large-scale graph analysis.

Data pipeline applications:

  • Real-time search index building in e-commerce.
  • Continuous ETL in e-commerce.

2、 Environment introduction

2.1 operating system environment

[root@r21 topology]# cat /etc/redhat-release
CentOS Stream release 8

2.2 software environment
(Figure: software version table)

2.3 machine distribution
(Figure: machine distribution table)

3、 Deploying the TiDB cluster

Compared with traditional stand-alone databases, TiDB has the following advantages:

  • A purely distributed architecture with good scalability, supporting flexible scale-out and scale-in.
  • SQL support: TiDB exposes the MySQL network protocol and is compatible with most MySQL syntax, so it can directly replace MySQL in most scenarios.
  • High availability by default: when a small number of replicas fail, the database can automatically repair data and fail over, transparently to the business.
  • Support for ACID transactions, which is friendly to scenarios with strong consistency requirements, such as bank transfers.
  • A rich tool-chain ecosystem covering data migration, synchronization, backup, and other scenarios.

In its kernel design, the TiDB distributed database splits the overall architecture into several modules that communicate with each other to form a complete TiDB system. The architecture is as follows:

(Figure: TiDB architecture)

In this article we only do the simplest functional test, so we deploy a single-node, single-replica TiDB cluster, which involves the following three modules:

  • tidb-server: the SQL layer. It exposes the MySQL protocol connection endpoint, accepts client connections, performs SQL parsing and optimization, and finally generates a distributed execution plan (see the connection example after this list).
  • pd-server (Placement Driver): the metadata management module of the whole TiDB cluster. It stores the real-time data distribution of each TiKV node and the overall topology of the cluster, provides the TiDB Dashboard UI, and allocates transaction IDs for distributed transactions.
  • tikv-server: responsible for storing data. Externally, TiKV is a distributed key-value storage engine that provides transactions.
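Because tidb-server speaks the MySQL wire protocol, any ordinary MySQL client can connect to it. A minimal sketch, using the tidb-server address and port configured later in this section:

## connect to tidb-server with a plain MySQL client
mysql -h 192.168.12.21 -P 14000 -u root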

3.1 TiUP deployment template file

# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/opt/tidb-c1/"
  data_dir: "/opt/tidb-c1/data/"
# # Monitored variables are applied to all the machines.
#monitored:
#  node_exporter_port: 19100
#  blackbox_exporter_port: 39115
#  deploy_dir: "/opt/tidb-c3/monitored"
#  data_dir: "/opt/tidb-c3/data/monitored"
#  log_dir: "/opt/tidb-c3/log/monitored"
# # Server configs are used to specify the runtime configuration of TiDB components.
# # All configuration items can be found in TiDB docs:
# # - TiDB: https://pingcap.com/docs/stable/reference/configuration/tidb-server/configuration-file/
# # - TiKV: https://pingcap.com/docs/stable/reference/configuration/tikv-server/configuration-file/
# # - PD: https://pingcap.com/docs/stable/reference/configuration/pd-server/configuration-file/
# # All configuration items use points to represent the hierarchy, e.g:
# #   readpool.storage.use-unified-pool
# #
# # You can overwrite this configuration via the instance-level `config` field.
server_configs:
  tidb:
    log.slow-threshold: 300
    binlog.enable: false
    binlog.ignore-error: false
    tikv-client.copr-cache.enable: true
  tikv:
    server.grpc-concurrency: 4
    raftstore.apply-pool-size: 2
    raftstore.store-pool-size: 2
    rocksdb.max-sub-compactions: 1
    storage.block-cache.capacity: "16GB"
    readpool.unified.max-thread-count: 12
    readpool.storage.use-unified-pool: false
    readpool.coprocessor.use-unified-pool: true
    raftdb.rate-bytes-per-sec: 0
  pd:
    schedule.leader-schedule-limit: 4
    schedule.region-schedule-limit: 2048
    schedule.replica-schedule-limit: 64
pd_servers:
  - host: 192.168.12.21
    ssh_port: 22
    name: "pd-2"
    client_port: 12379
    peer_port: 12380
    deploy_dir: "/opt/tidb-c1/pd-12379"
    data_dir: "/opt/tidb-c1/data/pd-12379"
    log_dir: "/opt/tidb-c1/log/pd-12379"
    numa_node: "0"
    # # The following configs are used to overwrite the `server_configs.pd` values.
    config:
      schedule.max-merge-region-size: 20
      schedule.max-merge-region-keys: 200000
tidb_servers:
  - host: 192.168.12.21
    ssh_port: 22
    port: 14000
    status_port: 12080
    deploy_dir: "/opt/tidb-c1/tidb-14000"
    log_dir: "/opt/tidb-c1/log/tidb-14000"
    numa_node: "0"
    # # The following configs are used to overwrite the `server_configs.tidb` values.
    config:
      log.slow-query-file: tidb-slow-overwrited.log
      tikv-client.copr-cache.enable: true
tikv_servers:
  - host: 192.168.12.21
    ssh_port: 22
    port: 12160
    status_port: 12180
    deploy_dir: "/opt/tidb-c1/tikv-12160"
    data_dir: "/opt/tidb-c1/data/tikv-12160"
    log_dir: "/opt/tidb-c1/log/tikv-12160"
    numa_node: "0"
    # # The following configs are used to overwrite the `server_configs.tikv` values.
    config:
      server.grpc-concurrency: 4
      #server.labels: { zone: "zone1", dc: "dc1", host: "host1" }
#monitoring_servers:
#  - host: 192.168.12.21
#    ssh_port: 22
#    port: 19090
#    deploy_dir: "/opt/tidb-c1/prometheus-19090"
#    data_dir: "/opt/tidb-c1/data/prometheus-19090"
#    log_dir: "/opt/tidb-c1/log/prometheus-19090"
#grafana_servers:
#  - host: 192.168.12.21
#    port: 13000
#    deploy_dir: "/opt/tidb-c1/grafana-13000"
#alertmanager_servers:
#  - host: 192.168.12.21
#    ssh_port: 22
#    web_port: 19093
#    cluster_port: 19094
#    deploy_dir: "/opt/tidb-c1/alertmanager-19093"
#    data_dir: "/opt/tidb-c1/data/alertmanager-19093"
#    log_dir: "/opt/tidb-c1/log/alertmanager-19093"

3.2 TiDB cluster environment
This article does not focus on deploying the TiDB cluster. As a quick experimental environment, only a single-replica TiDB cluster is deployed on one machine, and the monitoring components are not deployed.
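A minimal sketch of deploying and starting the cluster with TiUP, assuming the template from section 3.1 is saved as topology.yaml:

## deploy the cluster described by the template, then start it
tiup cluster deploy tidb-c1-v409 v4.0.9 ./topology.yaml --user root -p
tiup cluster start tidb-c1-v409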

[root@r21 topology]# tiup cluster display tidb-c1-v409
Starting component `cluster`: /root/.tiup/components/cluster/v1.3.2/tiup-cluster display tidb-c1-v409
Cluster type:       tidb
Cluster name:       tidb-c1-v409
Cluster version:    v4.0.9
SSH type:           builtin
Dashboard URL:      http://192.168.12.21:12379/dashboard
ID                   Role  Host           Ports        OS/Arch       Status   Data Dir                      Deploy Dir
--                   ----  ----           -----        -------       ------   --------                      ----------
192.168.12.21:12379  pd    192.168.12.21  12379/12380  linux/x86_64  Up|L|UI  /opt/tidb-c1/data/pd-12379    /opt/tidb-c1/pd-12379
192.168.12.21:14000  tidb  192.168.12.21  14000/12080  linux/x86_64  Up       -                             /opt/tidb-c1/tidb-14000
192.168.12.21:12160  tikv  192.168.12.21  12160/12180  linux/x86_64  Up       /opt/tidb-c1/data/tikv-12160  /opt/tidb-c1/tikv-12160
Total nodes: 4

Create a table for testing in TiDB:

mysql> show create table t1;
+-------+-------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                  |
+-------+-------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

4、 Deploy the ZooKeeper environment

In this experiment, the ZooKeeper environment is configured separately to provide services for the Kafka and Flink environments.

As this is a demonstration, only a stand-alone environment is deployed.

4.1 unzip the ZooKeeper package

[root@r24 soft]# tar vxzf apache-zookeeper-3.6.2-bin.tar.gz
[root@r24 soft]# mv apache-zookeeper-3.6.2-bin /opt/zookeeper

4.2 deploy the JRE for ZooKeeper

[root@r24 soft]# tar vxzf jre1.8.0_281.tar.gz
[root@r24 soft]# mv jre1.8.0_281 /opt/zookeeper/jre

Modify the /opt/zookeeper/bin/zkEnv.sh file to add the JAVA_HOME environment variable:

## add the following env var at the head of zkEnv.sh
JAVA_HOME=/opt/zookeeper/jre

4.3 create the ZooKeeper configuration file

[root@r24 conf]# cat zoo.cfg | grep -v "#"
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/data
clientPort=2181
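The dataDir referenced above must exist before ZooKeeper starts; create it first (a trivial sketch):

## create the data directory referenced by zoo.cfg
mkdir -p /opt/zookeeper/data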

4.4 start ZooKeeper

[root@r24 bin]# /opt/zookeeper/bin/zkServer.sh start

4.5 check the status of ZooKeeper

## check zk status
[root@r24 bin]# ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: standalone
## check OS port status
[root@r24 bin]# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      942/sshd
tcp6       0      0 :::2181                 :::*                    LISTEN      15062/java
tcp6       0      0 :::8080                 :::*                    LISTEN      15062/java
tcp6       0      0 :::22                   :::*                    LISTEN      942/sshd
tcp6       0      0 :::44505                :::*                    LISTEN      15062/java
## use zkCli tool to check zk connection
[root@r24 bin]# ./zkCli.sh -server 192.168.12.24:2181

4.6 suggestions on ZooKeeper
A few personal suggestions about ZooKeeper:

For a production ZooKeeper cluster, network monitoring must be enabled. In particular, pay attention to the network bandwidth in the system metrics.
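A lightweight way to watch these metrics is ZooKeeper's mntr four-letter command. A sketch, assuming nc is installed; note that ZooKeeper 3.6 requires four-letter commands to be whitelisted in zoo.cfg first:

## in zoo.cfg: allow the mntr/stat/ruok commands (restricted by default in 3.6)
4lw.commands.whitelist=mntr,stat,ruok
## then query the server for its runtime metrics
echo mntr | nc 192.168.12.24 2181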

5、 Deploy Kafka

Kafka is a distributed stream processing platform, mainly used for two kinds of applications:

  • Building real-time streaming data pipelines that reliably move data between systems or applications (equivalent to a message queue).
  • Building real-time streaming applications that transform or react to streams of data (that is, stream processing, transforming data internally between Kafka topics).
(Figure: Kafka overview)

Kafka has four core APIs:

  • The Producer API allows an application to publish a stream of records to one or more Kafka topics.
  • The Consumer API allows an application to subscribe to one or more topics and process the stream of records published to them.
  • The Streams API allows an application to act as a stream processor, consuming input streams from one or more topics and producing output streams to one or more topics, effectively transforming the input streams into output streams.
  • The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.

In this experiment, we only do functional verification, so we build a stand-alone Kafka environment.

5.1 download and unzip Kafka

[root@r22 soft]# tar vxzf kafka_2.13-2.7.0.tgz
[root@r22 soft]# mv kafka_2.13-2.7.0 /opt/kafka

5.2 deploy the JRE for Kafka

[root@r22 soft]# tar vxzf jre1.8.0_281.tar.gz
[root@r22 soft]# mv jre1.8.0_281 /opt/kafka/jre

Modify Kafka's JRE environment variable:

[root@r22 bin]# vim /opt/kafka/bin/kafka-run-class.sh
## add the following line at the head of kafka-run-class.sh
JAVA_HOME=/opt/kafka/jre

5.3 modify the Kafka configuration file
Modify the Kafka configuration file /opt/kafka/config/server.properties:

## change the following variables in /opt/kafka/config/server.properties
broker.id=0
listeners=PLAINTEXT://192.168.12.22:9092
log.dirs=/opt/kafka/logs
zookeeper.connect=192.168.12.24:2181

5.4 start Kafka

[root@r22 bin]# /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
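As a quick smoke test (a sketch; the smoke-test topic name is an assumption and is not used elsewhere in this article), the console tools shipped with Kafka can verify that the broker accepts and serves messages:

## type a few lines interactively, then exit with Ctrl-C
/opt/kafka/bin/kafka-console-producer.sh --bootstrap-server 192.168.12.22:9092 --topic smoke-test
## read the lines back from the beginning
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.12.22:9092 --topic smoke-test --from-beginning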

5.5 view the version information of Kafka
Kafka does not offer an obvious --version option, but the version can be read from the names of the library jars:

[root@r22 ~]# ll /opt/kafka/libs/ | grep kafka
-rw-r--r-- 1 root root  4929521 Dec 16 09:02 kafka_2.13-2.7.0.jar
-rw-r--r-- 1 root root      821 Dec 16 09:03 kafka_2.13-2.7.0.jar.asc
-rw-r--r-- 1 root root    41793 Dec 16 09:02 kafka_2.13-2.7.0-javadoc.jar
-rw-r--r-- 1 root root      821 Dec 16 09:03 kafka_2.13-2.7.0-javadoc.jar.asc
-rw-r--r-- 1 root root   892036 Dec 16 09:02 kafka_2.13-2.7.0-sources.jar
-rw-r--r-- 1 root root      821 Dec 16 09:03 kafka_2.13-2.7.0-sources.jar.asc
... ...

In the jar name kafka_2.13-2.7.0, 2.13 is the Scala version and 2.7.0 is the Kafka version.

6、 Deploying Flink

Apache Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams. Flink is designed to run in all common cluster environments and to perform computation at in-memory speed and at any scale.

(Figure: Flink architecture)

In this experiment, only functional testing is done, so only a stand-alone Flink environment is deployed.

6.1 download and unzip Flink

[root@r23 soft]# tar vxzf flink-1.12.1-bin-scala_2.11.tgz
[root@r23 soft]# mv flink-1.12.1 /opt/flink

6.2 deploying Flink’s JRE

[root@r23 soft]# tar vxzf jre1.8.0_281.tar.gz
[root@r23 soft]# mv jre1.8.0_281 /opt/flink/jre

6.3 add the libraries required by Flink
To consume Kafka data, Flink needs the flink-sql-connector-kafka package.

To connect Flink to MySQL/TiDB, the flink-connector-jdbc package is required.

[root@r23 soft]# mv flink-sql-connector-kafka_2.12-1.12.0.jar /opt/flink/lib/
[root@r23 soft]# mv flink-connector-jdbc_2.12-1.12.0.jar /opt/flink/lib/

6.4 modifying the Flink configuration file

## add or modify the following lines in /opt/flink/conf/flink-conf.yaml
jobmanager.rpc.address: 192.168.12.23
env.java.home: /opt/flink/jre

6.5 launch Flink

[root@r23 ~]# /opt/flink/bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host r23.
Starting taskexecutor daemon on host r23.
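The Flink SQL CLI used in the following sections ships with the distribution and can now be started in embedded mode:

## launch the Flink SQL client
/opt/flink/bin/sql-client.sh embedded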

6.6 view the Flink GUI
(Figure: Flink web UI)

7、 Deploy MySQL

7.1 decompress the MySQL package

[root@r25 soft]# tar vxf mysql-8.0.23-linux-glibc2.12-x86_64.tar.xz
[root@r25 soft]# mv mysql-8.0.23-linux-glibc2.12-x86_64 /opt/mysql/

7.2 create MySQL service file

[root@r25 ~]# touch /opt/mysql/support-files/mysqld.service
[root@r25 support-files]# cat mysqld.service
[Unit]
Description=MySQL 8.0 database server
After=syslog.target
After=network.target
[Service]
Type=simple
User=mysql
Group=mysql
#ExecStartPre=/usr/libexec/mysql-check-socket
#ExecStartPre=/usr/libexec/mysql-prepare-db-dir %n
# Note: we set --basedir to prevent probes that might trigger SELinux alarms,
# per bug #547485
ExecStart=/opt/mysql/bin/mysqld_safe
#ExecStartPost=/opt/mysql/bin/mysql-check-upgrade
#ExecStopPost=/opt/mysql/bin/mysql-wait-stop
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300
# Place temp files in a secure directory, not /tmp
PrivateTmp=true
Restart=on-failure
RestartPreventExitStatus=1
# Sets open_files_limit
LimitNOFILE = 10000
# Set environment variable MYSQLD_PARENT_PID. This is required for the SQL restart command.
Environment=MYSQLD_PARENT_PID=1
[Install]
WantedBy=multi-user.target
## copy mysqld.service to /usr/lib/systemd/system/
[root@r25 support-files]# cp mysqld.service  /usr/lib/systemd/system/

7.3 create my.cnf file

[root@r25 opt]# cat /etc/my.cnf
[mysqld]
port=3306
basedir=/opt/mysql
datadir=/opt/mysql/data
socket=/opt/mysql/data/mysql.socket
max_connections = 100
default-storage-engine = InnoDB
character-set-server=utf8
log-error = /opt/mysql/log/error.log
slow_query_log = 1
long-query-time = 30
slow_query_log_file = /opt/mysql/log/slow.log
min_examined_row_limit = 1000
log-slow-slave-statements
log-queries-not-using-indexes
#skip-grant-tables
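Canal parses row-based binlog events. MySQL 8.0 already defaults to log_bin=ON, binlog_format=ROW, and server_id=1, so the my.cnf above works as-is; still, it does no harm to make these settings explicit (a hedged addition, not part of the original my.cnf):

## explicit binlog settings required by canal (MySQL 8.0 defaults already match)
server-id = 1
log-bin = binlog
binlog_format = ROW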

7.4 initialize and start MySQL

[root@r25 ~]# /opt/mysql/bin/mysqld --initialize --user=mysql --console
[root@r25 ~]# chown -R mysql:mysql /opt/mysql
[root@r25 ~]# systemctl start mysqld
## check the mysql temp password in /opt/mysql/log/error.log
2021-02-24T02:45:47.316406Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: I?nDjijxa3>-

7.5 create a new MySQL user for Canal to connect with

## change the mysql temp password first
mysql> alter user 'root'@'localhost' identified by 'mysql';
Query OK, 0 rows affected (0.00 sec)
## create a management user 'root'@'%'
mysql> create user 'root'@'%' identified by 'mysql';
Query OK, 0 rows affected (0.01 sec)
mysql> grant all privileges on *.* to 'root'@'%';
Query OK, 0 rows affected (0.00 sec)
## create a canal replication user 'canal'@'%'
mysql> create user 'canal'@'%' identified by 'canal';
Query OK, 0 rows affected (0.01 sec)
mysql> grant select, replication slave, replication client on *.* to 'canal'@'%';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

7.6 create a table for testing in MySQL

mysql> show create table test.t2;
+-------+----------------------------------------------------------------------------------+
| Table | Create Table                                                                     |
+-------+----------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `id` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

8、 Deploy Canal

Canal is mainly used to provide incremental data subscription and consumption based on parsing MySQL's incremental log (binlog).

In the early days, because Alibaba ran dual data centers in Hangzhou and the United States, there was a business requirement for cross-data-center synchronization, implemented mainly with business triggers to capture incremental changes.

Since 2010, the business gradually switched to parsing database logs to obtain incremental changes for synchronization, which gave rise to a large number of database incremental subscription and consumption services.

(Figure: how Canal works)

Services based on log-based incremental subscription and consumption include:

  • Database mirroring.
  • Real-time database backup.
  • Index building and real-time maintenance (split heterogeneous indexes, inverted indexes, etc.).
  • Business cache refresh.
  • Incremental data processing with business logic.

Canal currently supports source MySQL versions 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x.

8.1 unzip the Canal package

[root@r26 soft]# mkdir /opt/canal && tar vxzf canal.deployer-1.1.4.tar.gz -C /opt/canal

8.2 deploy Canal's JRE

[root@r26 soft]# tar vxzf jre1.8.0_281.tar.gz
[root@r26 soft]# mv jre1.8.0_281 /opt/canal/jre
## configure the JRE: add the following line at the head of /opt/canal/bin/startup.sh
JAVA=/opt/canal/jre/bin/java

8.3 modify the Canal configuration files
Modify the /opt/canal/conf/canal.properties configuration file:

## modify the following configuration
canal.zkServers = 192.168.12.24:2181
canal.serverMode = kafka
## an `example` folder must exist under /opt/canal/conf to hold the destination's configuration
canal.destinations = example
canal.mq.servers = 192.168.12.22:9092

Modify the /opt/canal/conf/example/instance.properties configuration file:

## modify the following configuration
canal.instance.master.address=192.168.12.25:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
## regex filter for the database tables to capture
canal.instance.filter.regex=.*\\..*
canal.mq.topic=canal-kafka
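The filter regex above captures every table in every database. As an illustration only (not the configuration used in this article), the regex can be narrowed, for example to the test database:

## capture only the tables in the `test` database (illustrative)
canal.instance.filter.regex=test\\..*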

9、 Configure data flow

9.1 MySQL binlog → Canal → Kafka
9.1.1 view the MySQL binlog information
Check the MySQL binlog information to make sure the binlog is working:

mysql> show master status;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000001 |     2888 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
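Canal requires row-based binlog events. MySQL 8.0 uses ROW format by default, but it is worth confirming before going further (a quick check, not shown in the original article):

mysql> show variables like 'binlog_format';  -- expect Value = ROW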

9.1.2 create a topic in Kafka
Create a topic named canal-kafka in Kafka. The topic name must match canal.mq.topic=canal-kafka in the Canal configuration file /opt/canal/conf/example/instance.properties:

[root@r22 kafka]# /opt/kafka/bin/kafka-topics.sh --create \
> --zookeeper 192.168.12.24:2181 \
> --config max.message.bytes=12800000 \
> --config flush.messages=1 \
> --replication-factor 1 \
> --partitions 1 \
> --topic canal-kafka
Created topic canal-kafka.
[2021-02-24 01:51:55,050] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(canal-kafka-0) (kafka.server.ReplicaFetcherManager)
[2021-02-24 01:51:55,052] INFO [Log partition=canal-kafka-0, dir=/opt/kafka/logs] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2021-02-24 01:51:55,053] INFO Created log for partition canal-kafka-0 in /opt/kafka/logs/canal-kafka-0 with properties {compression.type -> producer, message.downconversion.enable -> true, min.insync.replicas -> 1, segment.jitter.ms -> 0, cleanup.policy -> [delete], flush.ms -> 9223372036854775807, segment.bytes -> 1073741824, retention.ms -> 604800000, flush.messages -> 1, message.format.version -> 2.7-IV2, file.delete.delay.ms -> 60000, max.compaction.lag.ms -> 9223372036854775807, max.message.bytes -> 12800000, min.compaction.lag.ms -> 0, message.timestamp.type -> CreateTime, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> false, retention.bytes -> -1, delete.retention.ms -> 86400000, segment.ms -> 604800000, message.timestamp.difference.max.ms -> 9223372036854775807, segment.index.bytes -> 10485760}. (kafka.log.LogManager)
[2021-02-24 01:51:55,053] INFO [Partition canal-kafka-0 broker=0] No checkpointed highwatermark is found for partition canal-kafka-0 (kafka.cluster.Partition)
[2021-02-24 01:51:55,053] INFO [Partition canal-kafka-0 broker=0] Log loaded for partition canal-kafka-0 with initial high watermark 0 (kafka.cluster.Partition)

To view all topics in Kafka:

[root@r22 kafka]# /opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.12.24:2181
__consumer_offsets
canal-kafka
ticdc-test

To view the information of the canal-kafka topic in Kafka:

[root@r22 ~]# /opt/kafka/bin/kafka-topics.sh --describe --zookeeper 192.168.12.24:2181  --topic canal-kafka
Topic: canal-kafka      PartitionCount: 1       ReplicationFactor: 1    Configs: max.message.bytes=12800000,flush.messages=1
        Topic: canal-kafka      Partition: 0    Leader: 0       Replicas: 0     Isr: 0

9.1.3 start Canal
Before starting Canal, check the port status on the Canal node:

## check MySQL 3306 port
## canal.instance.master.address=192.168.12.25:3306
[root@r26 bin]# telnet 192.168.12.25 3306
## check Kafka 9092 port
## canal.mq.servers = 192.168.12.22:9092
[root@r26 bin]# telnet 192.168.12.22 9092
## check zookeeper 2181 port
## canal.zkServers = 192.168.12.24:2181
[root@r26 bin]# telnet 192.168.12.24 2181

Start Canal:

[root@r26 bin]# /opt/canal/bin/startup.sh
cd to /opt/canal/bin for workaround relative path
LOG CONFIGURATION : /opt/canal/bin/../conf/logback.xml
canal conf : /opt/canal/bin/../conf/canal.properties
CLASSPATH :/opt/canal/bin/../conf:/opt/canal/bin/../lib/zookeeper-3.4.5.jar:/opt/canal/bin/../lib/zkclient-0.10.jar:/opt/canal/bin/../lib/spring-tx-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-orm-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-jdbc-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-expression-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-core-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-context-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-beans-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/spring-aop-3.2.18.RELEASE.jar:/opt/canal/bin/../lib/snappy-java-1.1.7.1.jar:/opt/canal/bin/../lib/snakeyaml-1.19.jar:/opt/canal/bin/../lib/slf4j-api-1.7.12.jar:/opt/canal/bin/../lib/simpleclient_pushgateway-0.4.0.jar:/opt/canal/bin/../lib/simpleclient_httpserver-0.4.0.jar:/opt/canal/bin/../lib/simpleclient_hotspot-0.4.0.jar:/opt/canal/bin/../lib/simpleclient_common-0.4.0.jar:/opt/canal/bin/../lib/simpleclient-0.4.0.jar:/opt/canal/bin/../lib/scala-reflect-2.11.12.jar:/opt/canal/bin/../lib/scala-logging_2.11-3.8.0.jar:/opt/canal/bin/../lib/scala-library-2.11.12.jar:/opt/canal/bin/../lib/rocketmq-srvutil-4.5.2.jar:/opt/canal/bin/../lib/rocketmq-remoting-4.5.2.jar:/opt/canal/bin/../lib/rocketmq-logging-4.5.2.jar:/opt/canal/bin/../lib/rocketmq-common-4.5.2.jar:/opt/canal/bin/../lib/rocketmq-client-4.5.2.jar:/opt/canal/bin/../lib/rocketmq-acl-4.5.2.jar:/opt/canal/bin/../lib/protobuf-java-3.6.1.jar:/opt/canal/bin/../lib/oro-2.0.8.jar:/opt/canal/bin/../lib/netty-tcnative-boringssl-static-1.1.33.Fork26.jar:/opt/canal/bin/../lib/netty-all-4.1.6.Final.jar:/opt/canal/bin/../lib/netty-3.2.2.Final.jar:/opt/canal/bin/../lib/mysql-connector-java-5.1.47.jar:/opt/canal/bin/../lib/metrics-core-2.2.0.jar:/opt/canal/bin/../lib/lz4-java-1.4.1.jar:/opt/canal/bin/../lib/logback-core-1.1.3.jar:/opt/canal/bin/../lib/logback-classic-1.1.3.jar:/opt/canal/bin/../lib/kafka-clients-1.1.1.jar:/opt/canal/bin/../lib/kafka_2.11-1.1.1.jar:/opt/canal/bin/../lib/jsr305-3.0.2.jar:/opt/canal/bin/../lib/jopt-simple-5.0.4.jar:/opt/canal/bin/../lib/jctools-core-2.1.2.jar:/opt/canal/bin/../lib/jcl-over-slf4j-1.7.12.jar:/opt/canal/bin/../lib/javax.annotation-api-1.3.2.jar:/opt/canal/bin/../lib/jackson-databind-2.9.6.jar:/opt/canal/bin/../lib/jackson-core-2.9.6.jar:/opt/canal/bin/../lib/jackson-annotations-2.9.0.jar:/opt/canal/bin/../lib/ibatis-sqlmap-2.3.4.726.jar:/opt/canal/bin/../lib/httpcore-4.4.3.jar:/opt/canal/bin/../lib/httpclient-4.5.1.jar:/opt/canal/bin/../lib/h2-1.4.196.jar:/opt/canal/bin/../lib/guava-18.0.jar:/opt/canal/bin/../lib/fastsql-2.0.0_preview_973.jar:/opt/canal/bin/../lib/fastjson-1.2.58.jar:/opt/canal/bin/../lib/druid-1.1.9.jar:/opt/canal/bin/../lib/disruptor-3.4.2.jar:/opt/canal/bin/../lib/commons-logging-1.1.3.jar:/opt/canal/bin/../lib/commons-lang3-3.4.jar:/opt/canal/bin/../lib/commons-lang-2.6.jar:/opt/canal/bin/../lib/commons-io-2.4.jar:/opt/canal/bin/../lib/commons-compress-1.9.jar:/opt/canal/bin/../lib/commons-codec-1.9.jar:/opt/canal/bin/../lib/commons-cli-1.2.jar:/opt/canal/bin/../lib/commons-beanutils-1.8.2.jar:/opt/canal/bin/../lib/canal.store-1.1.4.jar:/opt/canal/bin/../lib/canal.sink-1.1.4.jar:/opt/canal/bin/../lib/canal.server-1.1.4.jar:/opt/canal/bin/../lib/canal.protocol-1.1.4.jar:/opt/canal/bin/../lib/canal.prometheus-1.1.4.jar:/opt/canal/bin/../lib/canal.parse.driver-1.1.4.jar:/opt/canal/bin/../lib/canal.parse.dbsync-1.1.4.jar:/opt/canal/bin/../lib/canal.parse-1.1.4.jar:/opt/canal/bin/../lib/canal.meta-1.1.4.jar:/opt/canal/bin/../lib/canal.instance.spring-1.1.4.jar:/opt/canal/bin/../lib/can
al.instance.manager-1.1.4.jar:/opt/canal/bin/../lib/canal.instance.core-1.1.4.jar:/opt/canal/bin/../lib/canal.filter-1.1.4.jar:/opt/canal/bin/../lib/canal.deployer-1.1.4.jar:/opt/canal/bin/../lib/canal.common-1.1.4.jar:/opt/canal/bin/../lib/aviator-2.2.1.jar:/opt/canal/bin/../lib/aopalliance-1.0.jar:
cd to /opt/canal/bin for continue

9.1.4 view the Canal log
View /opt/canal/logs/example/example.log:

2021-02-24 01:41:40.293 [destination = example , address = /192.168.12.25:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2021-02-24 01:41:40.293 [destination = example , address = /192.168.12.25:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
2021-02-24 01:41:40.542 [destination = example , address = /192.168.12.25:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=binlog.000001,position=4,serverId=1,gtid=<null>,timestamp=1614134832000] cost : 244ms , the next step is binlog dump

9.1.5 view consumer information in Kafka
Insert a test row into MySQL:

mysql> insert into t2 values(1);
Query OK, 1 row affected (0.00 sec)

Check the consumer output; it contains the test data just inserted:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 192.168.12.22:9092 --topic canal-kafka --from-beginning
{"data":null,"database":"test","es":1614151725000,"id":2,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"create database test","sqlType":null,"table":"","ts":1614151725890,"type":"QUERY"}
{"data":null,"database":"test","es":1614151746000,"id":3,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table t2(id int)","sqlType":null,"table":"t2","ts":1614151746141,"type":"CREATE"}
{"data":[{"id":"1"}],"database":"test","es":1614151941000,"id":4,"isDdl":false,"mysqlType":{"id":"int"},"old":null,"pkNames":null,"sql":"","sqlType":{"id":4},"table":"t2","ts":1614151941235,"type":"INSERT"}

9.2 Kafka → Flink pathway
Create the t2 table in Flink with the connector type kafka:

## create a test table t2 in Flink
Flink SQL> create table t2(id int)
> WITH (
>  'connector' = 'kafka',
>  'topic' = 'canal-kafka',
>  'properties.bootstrap.servers' = '192.168.12.22:9092',
>  'properties.group.id' = 'canal-kafka-consumer-group',
>  'format' = 'canal-json',
>  'scan.startup.mode' = 'latest-offset'
> );
Flink SQL> select * from t2;

Insert a test row in MySQL:

mysql> insert into test.t2 values(2);
Query OK, 1 row affected (0.00 sec)

The data can be seen from Flink in real time:

Flink SQL> select * from t2;
 Refresh: 1 s                                                                                                             Page: Last of 1                                                                                                     Updated: 02:49:27.366
                        id
                         2

9.3 Flink → TiDB pathway
9.3.1 create a table for testing in the downstream TiDB

[root@r25 soft]# mysql -uroot -P14000 -hr21
mysql> create table t3 (id int);
Query OK, 0 rows affected (0.31 sec)

9.3.2 create a test table in Flink

Flink SQL> CREATE TABLE t3 (
>     id int
> ) with (
>     'connector' = 'jdbc',
>     'url' = 'jdbc:mysql://192.168.12.21:14000/test',
>     'table-name' = 't3',
>     'username' = 'root',
>     'password' = 'mysql'
> );
Flink SQL> insert into t3 values(3);
[INFO] Submitting SQL update statement to the cluster...
[INFO] Table update statement has been successfully submitted to the cluster:
Job ID: a0827487030db177ee7e5c8575ef714e

9.3.3 view the inserted data in the downstream TiDB

mysql> select * from test.t3;
+------+
| id   |
+------+
|    3 |
+------+
1 row in set (0.00 sec)
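At this point each segment of the pipeline has been verified individually. To wire them together end to end, a continuous streaming insert from the Kafka-backed table t2 into TiDB can be submitted. A sketch only: the sink table name t3_sink is hypothetical, and because canal-json is a changelog source, the Flink 1.12 JDBC sink must declare a primary key (NOT ENFORCED) so that the job is planned as an upsert write:

-- a JDBC sink with a declared primary key, so changelog input is accepted
Flink SQL> CREATE TABLE t3_sink (
>     id int,
>     PRIMARY KEY (id) NOT ENFORCED
> ) with (
>     'connector' = 'jdbc',
>     'url' = 'jdbc:mysql://192.168.12.21:14000/test',
>     'table-name' = 't3',
>     'username' = 'root',
>     'password' = 'mysql'
> );
-- continuously sync rows arriving from the canal-kafka topic into TiDB
Flink SQL> insert into t3_sink select id from t2;

Once this job is running, every row inserted into MySQL test.t2 flows through binlog → Canal → Kafka → Flink and lands in TiDB test.t3.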

This article is original content from Alibaba Cloud and may not be reproduced without permission.