Weiwei technology team (Kevin)
A while ago, the boss assigned us a new task: build a merchant and commodity search system that gives users fast, accurate search. When a user enters a query, the system should search along two dimensions, merchant name and commodity name, rank the results by relevance, assemble the data according to the association between each merchant and its commodities, and expose the result through an API for the business systems to call.
The requirement sounds simple; the reality is quite complex! We faced the following problems:
① The merchant database and the commodity database live on different servers, and the data volume reaches the millions. How do we synchronize data across databases?
② There is an ownership relationship between merchants and commodities. Get it wrong, and KFC's spicy chicken burger ends up listed under McDonald's, which would be embarrassing!
③ The source data is constantly updated: price changes, inventory changes, items going on and off the shelves, and so on. The search service cannot return a pile of stale data; if a customer finds a product, clicks through, and discovers it has already been taken off the shelf, we will hear the complaints. How do we synchronize inserts, updates, and deletes from the source databases to the search data in real time?
With these three questions in mind, we started on the overall architecture of the search service.
System architecture design ideas
In order to design a suitable system architecture, we analyzed the status quo.
First, merchant data and commodity data are stored in two independent MySQL 8 databases. To support the association between merchant and commodity data, the required tables from both databases must be replicated into the search system's own database in real time.
Second, after the merchant and commodity data are ETLed into the search system database, they must be combined in real time into a merchant-with-commodities structure and stored in Elasticsearch as parent-child documents.
Finally, inserts, updates, and deletes in the merchant and commodity databases must be synchronized to Elasticsearch in real time; that is, the data in ES must support real-time create, update, and delete.
We therefore designed the pipeline around two canal deployments. The first implements the data ETL: it extracts the required tables and fields from the merchant and commodity databases into the search service database. The second reads the binlog of the search service's MySQL database and streams it into a Kafka message queue in real time; a canal-adapter then consumes Kafka, performs the multi-table association and parent-child document mapping, and stores the processed data in Elasticsearch.
The specific system architecture design is shown in the following figure.
Architecture design of the merchant and commodity search system
Hands-on implementation
1. Environment and software
- Operating system: CentOS 7
- canal 1.1.4 (canal.deployer and canal.adapter)
- MySQL 8
- Kafka
- Elasticsearch 6 with Kibana
2. Using canal to ETL the data into MySQL 8
In this step we use canal to extract the required tables from the two independent MySQL 8 source databases into the search service's MySQL database.
2.1 Installing canal.deployer
- (1) Unpack canal.deployer-1.1.4.tar.gz
- (2) Configure canal.deployer
Enter the canaldeployer/conf directory and edit the canal.properties file; the main items to configure are the server mode, the MQ settings, and the destinations.
First, we switch the server mode to Kafka, which adds buffering capacity and improves the stability of the pipeline.
Next, configure Kafka's MQ settings (please install Kafka yourself).
Kafka MQ information
Finally, configure the destinations to instantiate. Three instances are configured here, which means canal-deployer will start these three instances and synchronize the MySQL binlog into their Kafka topics, as shown below:
Destinations instance configuration
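The screenshots are not reproduced here, but the relevant canal.properties entries would look roughly like the sketch below; the Kafka address and the destination names are illustrative assumptions, not the author's actual values:

```properties
# canal.properties (excerpt) -- host and destination names are placeholders
# switch the server mode from tcp to kafka
canal.serverMode = kafka
# address of the Kafka broker that buffers the binlog stream
canal.mq.servers = 127.0.0.1:9092
# the three instances canal-deployer will start; each needs its own
# conf/<destination>/ directory containing an instance.properties
canal.destinations = xxsearch,shop,goods
```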
(3) Configure the canal.deployer instances
Enter the canaldeployer/conf/example directory; the instance.properties file there is the sample shipped with canal, and we can use its settings as a reference.
- ① Copy the entire example directory and rename the copy to one of the destinations configured in the previous step, e.g. xxsearch;
- ② Enter the xxsearch directory and edit instance.properties, which mainly configures the source database connection, the required tables and fields, and the name of the Kafka topic. With this in place, canal-deployer converts the source database's binlog into JSON and streams it to the Kafka topic in real time, as follows:
canal.deployer instance: source database configuration
canal.deployer instance: Kafka topic configuration
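As a sketch, assuming a source database at 192.168.1.10 with a shop_db.shop table and a topic named xxsearch (all illustrative values, not the author's), the key instance.properties entries are:

```properties
# conf/xxsearch/instance.properties (excerpt) -- all values illustrative
# connection to the source MySQL 8 database
canal.instance.master.address = 192.168.1.10:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = canal
# capture only the tables the search service needs (schema.table regex)
canal.instance.filter.regex = shop_db\\.shop
# Kafka topic that receives the binlog converted to JSON
canal.mq.topic = xxsearch
```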
- ③ Enter the canaldeployer/bin directory and run ./startup.sh to start canal-deployer and its instances.
At this point, the canal-deployer side is set up.
2.2 Installing canal.adapter
We use canal.adapter to clean and transform the binlog JSON in the Kafka topic and store it in MySQL 8. Since the adapter ships with a MySQL 5 connector that does not support MySQL 8, we need a small adjustment first.
- (1) Add the MySQL 8 connection driver
Unpack canal.adapter-1.1.4.tar.gz, enter the canaladapter/lib directory, remove mysql-connector-java-5.1.40.jar, and drop in mysql-connector-java-8.0.18.jar.
- (2) Configure canal.adapter to output data to MySQL 8
Enter the canaladapter/conf directory and edit the application.yml file; the main items are the Kafka consumer settings, the source database information, and the search system database information, as shown below:
ETL to MySQL 8 configuration
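A minimal sketch of this application.yml, assuming Kafka on localhost, a source database shop_db, and a target search database searchdb; all hosts, credentials, and names are placeholders:

```yaml
# canal-adapter application.yml (excerpt) -- all values illustrative
canal.conf:
  mode: kafka
  mqServers: 127.0.0.1:9092
  batchSize: 500
  srcDataSources:
    defaultDS:                       # source database, read during full ETL
      url: jdbc:mysql://192.168.1.10:3306/shop_db?useUnicode=true
      username: canal
      password: canal
  canalAdapters:
    - instance: xxsearch             # Kafka topic name
      groups:
        - groupId: g1
          outerAdapters:
            - name: rdb
              key: mysql1
              properties:
                jdbc.driverClassName: com.mysql.cj.jdbc.Driver
                jdbc.url: jdbc:mysql://127.0.0.1:3306/searchdb?useUnicode=true
                jdbc.username: search
                jdbc.password: search
```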
Next, go to the canaladapter/conf/rdb directory and, using the official mytest_user.yml as a template, configure the Kafka topic name, source database name, source table name, target database name, and target table name. It is recommended to use one yml file per table.
ETL table structure mapping configuration
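One such per-table mapping file, sketched with illustrative table and key names:

```yaml
# conf/rdb/shop.yml -- one yml file per table; all names illustrative
dataSourceKey: defaultDS        # key defined under srcDataSources
destination: xxsearch           # Kafka topic name
groupId: g1
outerAdapterKey: mysql1         # key of the rdb outer adapter
concurrent: true
dbMapping:
  database: shop_db             # source database
  table: shop                   # source table
  targetTable: searchdb.shop    # target database.table
  targetPk:
    id: id                      # primary key mapping
  mapAll: true                  # copy all columns with identical names
```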
- (3) Start canal.adapter
Enter the canaladapter/bin directory and run ./startup.sh to start canal-adapter, then watch the logs/adapter/adapter.log file. Manually insert a record into one of the source databases and check whether the following two log lines appear, one INFO and one DEBUG; if they do, the configuration works.
At this point the ETL stage is complete: data is synchronized in real time from the two independent MySQL 8 databases into the search service's MySQL database.
3. Multi-table association and parent-child document mapping
(1) Configure the canal.adapter of the second canal
Enter the canaladapter/conf directory and edit the application.yml file; the main items are the Kafka consumer settings, the search system database, and the Elasticsearch connection information, as shown below:
canal.adapter MQ and MySQL configuration
canal.adapter ES configuration
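The second adapter's application.yml differs mainly in the outer adapter: instead of rdb it writes to ES. A sketch with placeholder hosts and names:

```yaml
# second canal-adapter application.yml (excerpt) -- all values illustrative
canal.conf:
  mode: kafka
  mqServers: 127.0.0.1:9092
  srcDataSources:
    defaultDS:                      # the search-service MySQL database
      url: jdbc:mysql://127.0.0.1:3306/searchdb?useUnicode=true
      username: search
      password: search
  canalAdapters:
    - instance: xxsearch-es         # Kafka topic for the search DB binlog
      groups:
        - groupId: g1
          outerAdapters:
            - name: es
              hosts: 127.0.0.1:9300 # Elasticsearch transport address
              properties:
                cluster.name: elasticsearch
```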
(2) Configure the multi-table association
Enter the canaladapter/conf/es directory and edit mytest_user.yml to configure the multi-table association:
Multi table Association configuration
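A sketch of such an association file, using hypothetical shop/goods tables; note that the join is a left join with the main table leftmost, and the join fields appear in the select list:

```yaml
# conf/es/shop_goods.yml -- table and field names are illustrative
dataSourceKey: defaultDS
destination: xxsearch-es
groupId: g1
esMapping:
  _index: shop_goods
  _type: _doc
  _id: _id
  sql: "select g.id as _id, g.name as goods_name, g.price,
               g.shop_id, s.name as shop_name
        from goods g left join shop s on g.shop_id = s.id"
  commitBatch: 3000
```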
Note that the SQL supports fairly free combinations of multi-table joins, but with some restrictions:
- (a) The main table cannot be a subquery
- (b) Only LEFT OUTER JOIN is allowed, i.e. the leftmost table must be the main table
- (c) If a joined secondary table is a subquery, that subquery cannot contain more than one table
- (d) The main SQL cannot contain a WHERE clause (a WHERE clause inside a subquery is allowed but not recommended: if a field referenced in that WHERE condition is modified, the synchronized data may become inconsistent)
- (e) Join conditions only allow '=' comparisons between primary and foreign keys; other constant predicates are not allowed, e.g. on a.role_id = b.id and b.status = 1 is invalid
- (f) Every field used in a join condition must appear in the main select list, e.g. for on a.role_id = b.id, either a.role_id or b.id must appear in the main SELECT
- (g) The Elasticsearch mapping attributes correspond one-to-one with the SQL select list (select * is not supported). For example, in select a.id as _id, a.name, a.email as _email from user, name is mapped to the name field of the ES mapping and _email is mapped to the _email field; the alias (if present) is used as the final mapping field. The _id here can be filled into the configuration's _id: _id mapping.
(3) Configure parent-child documents
Taking the official biz_order.yml as an example, vim biz_order.yml and configure the parent-child document mapping:
Configure parent child document mapping
(4) Create the index and the parent-child document mapping in Elasticsearch 6
Open the Kibana page, click Dev Tools, and execute the following command to create the index and the parent-child document mapping:
Establish index and parent-child document mapping
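The command is roughly the following ES 6.x DSL, run in Dev Tools; the index name, field names, and the shop→goods relation are illustrative assumptions:

```json
PUT /shop_goods
{
  "mappings": {
    "_doc": {
      "properties": {
        "shop_name":  { "type": "text" },
        "goods_name": { "type": "text" },
        "price":      { "type": "double" },
        "relation_type": {
          "type": "join",
          "relations": { "shop": "goods" }
        }
      }
    }
  }
}
```

In ES 6 a single join field replaces the old _parent meta-field; child documents must be indexed with their routing set to the parent's id so that parent and child land on the same shard.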
The installation of Elasticsearch 6 and Kibana needs no special configuration here, so it is not described in detail.
(5) Start canal.adapter
Enter the canaladapter/bin directory and run ./startup.sh to start canal-adapter, then watch the logs/adapter/adapter.log file. Manually insert a record into the search system database and check whether the following log is printed; if it is, the configuration works.
Configure adapter log sample correctly
Now we can run DSL queries from Kibana. We added a "KFC" store in the merchant system beforehand, then added "tomatoes" and "fresh tomatoes" in the commodity system and linked those products to "KFC". Querying for "KFC" or "tomato" gives the following results (ES default fields omitted):
Results of a query through DSL
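The query itself is not shown in the figure; it could be as simple as a bool/should match over both name fields (field names here are illustrative assumptions):

```json
GET /shop_goods/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "shop_name":  "KFC" } },
        { "match": { "goods_name": "tomato" } }
      ]
    }
  }
}
```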
As the figure shows, we can query a merchant's products by merchant name, and a product together with its store by product name. Because canal synchronizes inserts, updates, and deletes in real time, the data in ES stays consistent with the merchant and commodity systems, and the document structure bundles each merchant with its commodities, which satisfies the business requirements.
At this point, the skeleton of the merchant and commodity search system built on canal, Kafka, MySQL 8, and Elasticsearch 6 is complete. canal-deployer reads the binlog of the merchant and commodity systems' MySQL databases in real time and sends it to Kafka; canal-adapter consumes Kafka, applies the multi-table association and parent-child document mapping to the binlog JSON, and finally stores it in ES6 for the upper-level search service to call.
The search service finally went online, providing real-time data synchronization for millions of the company's merchant products with second-level search results, meeting the business requirements. The boss said: chicken legs for everyone on the R&D team! Just thinking about it makes me a little excited. Hey hey~~