Abstract
The project needs to load data from a MySQL database into Kafka and then apply ETL-style filtering with Kafka Streams. Kafka Connect, using the JDBC source connector, imports the MySQL data into a Kafka topic, and the records in that topic are then filtered and deduplicated.
Content
1、 Kafka installation
- Kafka Connect ships with the Kafka distribution (it has been included since version 0.9). First, check whether your installation supports Connect (intuitively: the bin directory contains connect-* scripts and the config directory contains connect-* properties files);
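As a quick check, run the following from the Kafka installation directory (a sketch; the expected file names below are those of a stock Kafka 1.0.1 distribution):
ls bin | grep connect      # expect connect-standalone.sh and connect-distributed.sh
ls config | grep connect   # expect connect-standalone.properties and connect-distributed.properties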
- We use kafka_2.11-1.0.1, where 2.11 is the Scala version and 1.0.1 is the Kafka version;
2、 Download the Kafka Connect JDBC plugin
Go to https://www.confluent.io/hub/…, select the version matching your setup, and download it;
Extract the archive; among the resulting directories you will find etc (sample configs) and lib (the connector jars). Copy the jar files from the plugin's lib directory into Kafka's libs directory:
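A sketch of the copy, assuming the plugin was unpacked next to a Kafka installation at /opt/kafka_2.11-1.0.1 (adjust both paths to your layout):
cp confluentinc-kafka-connect-jdbc-*/lib/*.jar /opt/kafka_2.11-1.0.1/libs/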
3、 Copy the MySQL JDBC driver jar to the libs directory of Kafka
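For example (the driver version and paths are assumptions; use the mysql-connector-java jar you actually have):
cp mysql-connector-java-5.1.46.jar /opt/kafka_2.11-1.0.1/libs/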
4、 Create the connect-mysql-source.properties configuration file
Copy the sample source properties file from the Kafka Connect JDBC plugin's etc directory into Kafka's config directory and rename it connect-mysql-source.properties:
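A sketch, assuming the plugin ships the SQLite quickstart file under etc (file names may differ across plugin versions):
cp confluentinc-kafka-connect-jdbc-*/etc/source-quickstart-sqlite.properties /opt/kafka_2.11-1.0.1/config/connect-mysql-source.properties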
Modify the configuration to match your local data source:
# A simple example that copies all tables from a SQLite database. The first few settings are
# required for all connectors: a name, the connector class to run, and the maximum number of
# tasks to create:
name=test-source-mysql-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=10
# The remaining configs are specific to the JDBC source connector. In this example, we connect to a
# SQLite database stored in the file test.db, use an auto-incrementing column called 'id' to
# detect new rows as they are added, and output to topics prefixed with 'test-sqlite-jdbc-', e.g.
# a table called 'users' will be written to the topic 'test-sqlite-jdbc-users'.
#connection.url=jdbc:mysql://192.168.101.3:3306/databasename?user=xxx&password=xxx
connection.url=jdbc:mysql://127.0.0.1:3306/us_app?user=root&password=root
table.whitelist=ocm_blacklist_number
# bulk mode re-imports the whole table on every poll; incrementing and timestamp modes are also available
mode=bulk
#timestamp.column.name=time
#incrementing.column.name=id
topic.prefix=connect-mysql-
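Before starting the connector, it can help to confirm that the configured credentials and table are reachable (a sketch using the values above; adjust host, user, and database to yours):
mysql -h 127.0.0.1 -P 3306 -u root -p us_app -e 'SELECT COUNT(*) FROM ocm_blacklist_number;'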
For a description of the configuration options, see: https://www.jianshu.com/p/9b1…
5、 Modify config/connect-standalone.properties in the Kafka directory
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
6、 Start Kafka Connect
bin/connect-standalone.sh config/connect-standalone.properties config/connect-mysql-source.properties
Note: connect-standalone.sh runs Connect in single-node (standalone) mode. There is also a distributed cluster mode; to use it, modify connect-distributed.properties and start with connect-distributed.sh, as sketched below.
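A sketch of the distributed start; in this mode connector configs are submitted over Connect's REST API (default port 8083) rather than passed on the command line (mysql-source.json is a hypothetical file holding the connector config as JSON):
bin/connect-distributed.sh config/connect-distributed.properties
curl -X POST -H "Content-Type: application/json" --data @mysql-source.json http://localhost:8083/connectors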
7、 Consume the Kafka topic and check whether the import succeeded
Start a consumer and read the topic connect-mysql-ocm_blacklist_number from the beginning. If you can see output, your connector configuration works.
./kafka-console-consumer.sh --zookeeper 127.0.0.1:2181 --topic connect-mysql-ocm_blacklist_number --from-beginning
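The --zookeeper flag drives the old consumer, which is deprecated in Kafka 1.0; an equivalent command using the new consumer (assuming the broker listens on 127.0.0.1:9092):
./kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --topic connect-mysql-ocm_blacklist_number --from-beginning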
References:
https://blog.csdn.net/u014686…
Kafka Streams reference:
https://www.infoq.cn/article/…