Real-time synchronization of million-scale commodity data, with query results in seconds

Time: 2020-09-24

Weiwei technology team (Kevin)
https://juejin.im/post/5e6989…

A while ago, the boss assigned me a new task: build a merchant-commodity search system that gives users fast, accurate search. When a user enters a query, the system should search across two dimensions, merchant name and commodity name, rank the results by relevance, assemble the data according to which merchant each commodity belongs to, and expose the result as an API for the business systems to call.

The requirement sounds simple, but the reality is quite complex! We faced the following problems:

① The merchant database and the commodity database sit on different servers, and the data volume reaches the millions. How do we synchronize data across databases?

② Merchant and commodity data have a parent-child relationship; get it wrong and KFC's spicy chicken leg burger ends up listed under McDonald's, which would be embarrassing!

③ Merchant data is constantly updated: prices are revised, stock changes, items go on and off the shelves. The search service must not return piles of stale data; if a customer finds a commodity only to discover it is already off the shelf after clicking through, they will certainly complain. How do we keep the search data synchronized in real time with inserts, deletes, and updates in the source databases?

With these three questions in mind, we started the overall architecture design of the search service.

System architecture design ideas

To design a suitable system architecture, we analyzed the current situation.

First, merchant data and commodity data are stored in two independent MySQL 8 databases. To associate merchant data with commodity data, we need to replicate the required tables from both databases into our search system's database in real time.

Second, once the merchant and commodity data have been ETL'd into the search system database, they need to be combined in real time into a merchant-with-commodities structure and stored in ES as parent-child documents.

Finally, inserts, deletes, and updates in the merchant and commodity databases need to be synchronized to ES in real time; in other words, the data in ES must support real-time create, update, and delete.

We therefore designed the pipeline around two canal deployments. The first implements the data ETL: it extracts the required tables and fields from the merchant and commodity databases into the search service database. The second reads the binlog of the search service's MySQL database and streams it to a Kafka message queue in real time; a canal.adapter then consumes Kafka, performs the multi-table association and parent-child document mapping, and stores the processed data in Elasticsearch.

The specific system architecture design is shown in the following figure.

[Figure: architecture of the merchant-commodity search system]
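
In outline, the data flow described above is:

    merchant MySQL 8 ──┐
                       ├──> canal.deployer #1 ──> Kafka ──> canal.adapter #1 (rdb) ──> search MySQL 8
    commodity MySQL 8 ─┘

    search MySQL 8 ──> canal.deployer #2 ──> Kafka ──> canal.adapter #2 (es) ──> Elasticsearch 6 ──> search API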

Hands-on implementation

1. Environment and software description

  • Operating system: CentOS 7
  • Canal: canal.deployer-1.1.4, canal.adapter-1.1.4
  • Kafka: kafka_2.12-2.3.0
  • Elasticsearch: elasticsearch-6.3.2
  • Kibana: kibana-6.3.2

2. Using canal to ETL data into MySQL 8

In this step we use canal to extract the required tables from the two independent MySQL 8 databases into the search service's MySQL database.

2.1 Installing canal.deployer
  • (1) Decompress canal.deployer-1.1.4.tar.gz
  • (2) Configure canal.deployer

Enter the canaldeployer/conf directory and edit the canal.properties file; the main items to configure are the serverMode, the MQ settings, and the destinations.

First, we change the server mode to kafka, which adds buffering capacity and improves the stability of the system:

[Figure: serverMode setting in canal.properties]
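
A minimal sketch of the corresponding canal.properties entry (canal 1.1.4; the default mode is tcp):

    # canaldeployer/conf/canal.properties
    # options: tcp, kafka, RocketMQ -- we switch to kafka
    canal.serverMode = kafka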

Next, configure the Kafka MQ information (Kafka itself must be installed separately):

[Figure: Kafka MQ settings]
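
A hedged sketch of the MQ block, assuming a single local Kafka broker (adjust the address to your cluster):

    # canaldeployer/conf/canal.properties
    canal.mq.servers = 127.0.0.1:9092
    canal.mq.retries = 0
    canal.mq.batchSize = 16384
    canal.mq.lingerMs = 100
    canal.mq.flatMessage = true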

Finally, configure the destinations that need to be instantiated. Three instances are configured here, which means canal.deployer will start all three and synchronize each MySQL binlog into its Kafka topic, as shown below:

[Figure: destinations instance configuration]
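
For example, with three instances, whose names here are illustrative apart from xxsearch, which appears in the next step:

    # canaldeployer/conf/canal.properties
    # one instance per source; each name needs a matching conf/<name>/ directory
    canal.destinations = merchantdb,commoditydb,xxsearch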

(3) Configure the canal.deployer instance

Enter the canaldeployer/conf/example directory, where you will find an instance.properties file. This is the example shipped with canal, and we can use its configuration as a reference.

  • ① Copy the entire example directory and rename it after one of the destinations configured in the previous step, for example xxsearch;
  • ② Enter the xxsearch directory and edit the instance.properties file, which mainly configures the source database connection, the tables and fields to capture, and the Kafka topic name. With this in place, canal.deployer converts the source database's binlog into JSON and streams it to the Kafka topic in real time, as shown below (and sketched after the figures):

[Figure: canal.deployer instance source database settings]

[Figure: canal.deployer instance Kafka topic setting]
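
A hedged sketch of such an instance.properties; the address, credentials, table filter, and topic are placeholders:

    # canaldeployer/conf/xxsearch/instance.properties
    # source MySQL whose binlog this instance reads
    canal.instance.master.address=192.168.0.10:3306
    canal.instance.dbUsername=canal
    canal.instance.dbPassword=canal
    canal.instance.connectionCharset=UTF-8
    # capture only the tables the search service needs
    canal.instance.filter.regex=merchant_db\\.merchant,merchant_db\\.shop
    # Kafka topic that receives the binlog as JSON
    canal.mq.topic=xxsearch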

  • ③ Enter the canaldeployer/bin directory and execute ./startup.sh to start canal.deployer and its instances.

With that, the canal.deployer setup is complete.

2.2 Installing canal.adapter

We need to use canal.adapter to clean and transform the binlog JSON data in the Kafka topics and store it in MySQL 8. Since the driver canal ships with does not support MySQL 8, we first need to make some adjustments.

  • (1) Add the MySQL 8 connection driver

Decompress canal.adapter-1.1.4.tar.gz, enter the canaladapter/lib directory, remove mysql-connector-java-5.1.40.jar, and add mysql-connector-java-8.0.18.jar.

  • (2) Configure the canal adapter to output data to MySQL 8.

Enter the canaladapter/conf directory and edit the application.yml file, which mainly configures the Kafka consumer, the source database information, and the search system database information, as shown below:

[Figure: application.yml for ETL into MySQL 8]
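
A trimmed, hedged sketch of application.yml for this stage; URLs, credentials, and names are placeholders:

    # canaladapter/conf/application.yml
    canal.conf:
      mode: kafka                          # consume binlog JSON from Kafka
      mqServers: 127.0.0.1:9092
      flatMessage: true
      srcDataSources:
        defaultDS:                         # source database (merchant or commodity)
          url: jdbc:mysql://192.168.0.10:3306/merchant_db?useUnicode=true
          username: canal
          password: canal
      canalAdapters:
      - instance: xxsearch                 # Kafka topic name
        groups:
        - groupId: g1
          outerAdapters:
          - name: rdb                      # write into the search service MySQL 8
            key: searchdb
            properties:
              jdbc.driverClassName: com.mysql.cj.jdbc.Driver   # the MySQL 8 driver added above
              jdbc.url: jdbc:mysql://127.0.0.1:3306/search_db?useUnicode=true
              jdbc.username: search
              jdbc.password: search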

Next, go to the canaladapter/conf/rdb directory and, taking the official mytest_user.yml as a template, configure the Kafka topic name, source database name, source table name, target database name, and target table name. It is recommended to use one YML file per table.

[Figure: ETL table structure mapping configuration]
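
One mapping file per table, modeled on the official mytest_user.yml; all database and table names here are placeholders:

    # canaladapter/conf/rdb/merchant.yml
    dataSourceKey: defaultDS          # key from srcDataSources in application.yml
    destination: xxsearch             # Kafka topic name
    groupId: g1
    outerAdapterKey: searchdb         # key of the rdb adapter in application.yml
    concurrent: true
    dbMapping:
      database: merchant_db           # source database
      table: merchant                 # source table
      targetTable: search_db.merchant # target database.table
      targetPk:
        id: id                        # primary key mapping
      mapAll: true                    # copy all columns 1:1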

  • (3) Start canal.adapter

Enter the canaladapter/bin directory and execute ./startup.sh to start canal.adapter, then watch the logs/adapter/adapter.log file. Manually insert a new record in one of the source databases and check whether two log entries, one INFO and one DEBUG, are printed; if they are, the configuration works.

[Figure: canal.adapter log output]

At this point, the ETL phase is complete: data can now be synchronized in real time from the two independent MySQL 8 databases into the search service's MySQL database.

3. Implementing multi-table association and parent-child document mapping

(1) Configure the second canal's canal.adapter

Enter the canaladapter/conf directory and edit the application.yml file, which mainly configures the Kafka consumer, the search system database, and the ES connection information, as shown below:

[Figure: canal.adapter MQ and MySQL settings]

[Figure: canal.adapter ES connection settings]
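
A hedged sketch of the second adapter's application.yml; it consumes the topic fed by the second canal.deployer and connects to ES over the transport port (all names and addresses are placeholders):

    # canaladapter/conf/application.yml (second canal.adapter)
    canal.conf:
      mode: kafka
      mqServers: 127.0.0.1:9092
      flatMessage: true
      srcDataSources:
        defaultDS:                       # now the search service database
          url: jdbc:mysql://127.0.0.1:3306/search_db?useUnicode=true
          username: search
          password: search
      canalAdapters:
      - instance: searchdb               # topic carrying the search DB binlog
        groups:
        - groupId: g1
          outerAdapters:
          - name: es
            hosts: 127.0.0.1:9300        # ES transport address
            properties:
              mode: transport
              cluster.name: elasticsearch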

(2) Configure the multi-table association

Enter the canaladapter/conf/es directory and edit mytest_user.yml to configure the multi-table association:

[Figure: multi-table association configuration]
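
The heart of the association is the sql field of esMapping; a hedged sketch with placeholder table and field names:

    # canaladapter/conf/es/commodity_merchant.yml (adapted from mytest_user.yml)
    dataSourceKey: defaultDS
    destination: searchdb
    groupId: g1
    esMapping:
      _index: shop_commodity
      _type: _doc
      _id: _id
      upsert: true
      # left join only; the leftmost table must be the main table
      sql: "select c.id as _id, c.name as commodity_name, c.price,
            m.name as merchant_name
            from commodity c
            left join merchant m on c.merchant_id = m.id"
      commitBatch: 3000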

Note that the SQL supports fairly free combinations of multi-table joins, but with some restrictions:

  • (a) The main table cannot be a subquery statement
  • (b) Only left outer joins are allowed, i.e. the leftmost table must be the main table
  • (c) If an associated (slave) table is a subquery, it cannot contain more than one table
  • (d) The main SQL cannot contain a WHERE condition (a subquery of a slave table may contain one, but this is not recommended, as it may cause inconsistent data synchronization, for example when a field referenced in the WHERE condition is modified)
  • (e) Join conditions only allow '=' between primary and foreign keys; other constant conditions are not allowed, e.g. on a.role_id = b.id and b.statues = 1 is invalid
  • (f) One side of a join condition must appear in the main query's select list, e.g. for on a.role_id = b.id, either a.role_id or b.id must appear in the main SELECT statement
  • (g) The Elasticsearch mapping attributes correspond one-to-one with the values selected in the SQL (select * is not supported). For example, in select a.id as _id, a.name, a.email as _email from user a, name is mapped to the name field of the ES mapping and _email is mapped to the mapping's _email field; the alias (if any) is used as the final mapping field. The _id here can be used for the mapping's "_id: _id" setting
(3) Configure parent-child documents

Take the official biz_order.yml as an example: vim biz_order.yml and configure the parent-child document mapping:

[Figure: parent-child document mapping configuration]
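
Following the shape of the official biz_order.yml, the child (commodity) side might look like this; the join field, index, and column names are placeholders:

    # canaladapter/conf/es/commodity.yml (child documents)
    dataSourceKey: defaultDS
    destination: searchdb
    groupId: g1
    esMapping:
      _index: shop_commodity
      _type: _doc
      _id: _id
      relations:
        relation_type:            # join field name defined in the ES mapping
          name: commodity         # this document is the 'commodity' child
          parent: merchant_id     # column holding the parent merchant id
      sql: "select concat('c_', c.id) as _id, c.merchant_id,
            c.name as commodity_name, c.price
            from commodity c"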

(4) Create the index and the parent-child document mapping in Elasticsearch 6

Open the Kibana page, click Dev Tools, and execute the following command to create the index and its parent-child document mapping:

[Figure: creating the index and parent-child document mapping]
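
In ES 6 a parent-child relation is expressed with a join field. A minimal sketch, using the placeholder names assumed above:

    PUT /shop_commodity
    {
      "mappings": {
        "_doc": {
          "properties": {
            "merchant_name":  { "type": "text" },
            "commodity_name": { "type": "text" },
            "price":          { "type": "double" },
            "relation_type": {
              "type": "join",
              "relations": { "merchant": "commodity" }
            }
          }
        }
      }
    }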

The installation of ES 6 and Kibana needs no special configuration here, so it is not described in detail.

(5) Start the canal adapter

Enter the canaladapter/bin directory and execute ./startup.sh to start canal.adapter, then watch the logs/adapter/adapter.log file. Manually insert a new record in the search system database and check whether the corresponding log entry is printed; if it is, the configuration works.

[Figure: adapter log after a successful sync]

4. Operation results

Now we can run DSL queries in Kibana. We added a "KFC" store in the merchant system in advance, then added "tomatoes" and "fresh tomatoes" in the commodity system and linked both commodities to "KFC". Querying for "KFC" or "tomato" returns the following results (ES default fields omitted):

[Figure: results of a query through the DSL]
As the figure shows, we can find merchants by merchant name and commodities by commodity name. Because canal synchronizes inserts, deletes, and updates in real time, the data in ES stays consistent with the merchant and commodity systems, and the returned structure combines each merchant with its commodities, meeting the business requirements.
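
A query of the kind used above might look like this, with the placeholder index and field names assumed earlier:

    GET /shop_commodity/_search
    {
      "query": {
        "multi_match": {
          "query": "KFC",
          "fields": ["merchant_name", "commodity_name"]
        }
      }
    }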

5. Summary

At this point, the basic framework of the merchant-commodity search system, built on canal, Kafka, MySQL 8, and Elasticsearch 6, is complete. We use canal.deployer to read the binlogs of the merchant and commodity systems' MySQL databases in real time and send them to Kafka; canal.adapter then consumes Kafka, applies multi-table association and parent-child document mapping to the binlog JSON, and finally stores the result in ES 6 for the upper-level search service to call.

The search service finally went online successfully, providing real-time synchronization for millions of the company's merchant and commodity records and second-level search results, meeting the business requirements. The boss said everyone on the R&D team gets an extra chicken leg! Just thinking about it makes me a little excited. Hey hey~~
