Product recommendation system based on Flink

Time: 2020-11-27

FlinkCommodityRecommendationSystem

Recs Flink commodity recommendation system

1. Preface

The system is named Recs, inspired by Recommendation System. The logo was made with an online logo website. The author developed the project in order to learn Flink and related big data middleware. For demonstration purposes, a companion web application was built with springboot + Vue. The author's earlier web development experience was in Python + Django + JavaScript; since this project is developed in Java, the author learned the springboot framework and Vue in order to unify the technology stack.

This project draws on the open source learning project ECommerceRecommendSystem, most heavily for the front end, and optimizes on top of the framework that project's author built. The UI was modified, some bugs were fixed, and some new functions were added. Developing this project gave the author a more systematic understanding of big data technologies. Many problems came up during development and were overcome one by one. In the author's experience, the best way to solve a problem is to read the official documentation and make active use of Google. Finally, the relevant technologies were learned while being used, so the author's knowledge of them is still limited and many parts of this project can be optimized. Issues are welcome; let's learn and make progress together.

2. Project introduction

2.1 Recs system architecture

[Figure: Recs system architecture diagram]

The main workflow of the system is as follows:

  • The user logs in to or registers with the system.
  • The user rates a product.
  • The rating data is sent through Kafka to the real-time recommendation task of the recommendation module.
  • The system runs the real-time recommendation task and stores the data in the rating and userProduct tables of HBase. The real-time tasks are real-time top-N and recommendation based on user behavior.
  • Real-time top-N stores its results in the HBase onlineHot table; recommendation based on user behavior stores its results in the HBase onlineRecommend table.
  • The web side queries HBase for the data each module needs and displays the results.

2.2 home page

[Figure: home page screenshot]

There are four modules:

  • Guess what you like: recommendation based on user behavior. When the user rates a product, Flink computes the recommendation result from the user's rating history combined with itemCF.
  • Hot products: historically popular products.
  • High-score products: products with high average scores.
  • Real-time hot products: a Flink sliding time window counts the popular products of the past hour, sliding every 5 minutes.

2.3 commodity details

[Figure: commodity details page screenshot]

  • Display product details
  • People who viewed this product also viewed: recommendations based on itemCF

2.4 login

[Figure: login page screenshot]

3. Module description

3.1 recommendation module

Development environment: IDEA + Maven + git + windows && wsl

Software architecture: flink + hbase + kafka + mysql + redis

Development guidance: The Flink computation tasks are stored in the task package. DataLoader is the data loading task, offline recommender is the offline recommendation task, and onlinerecommender is the real-time recommendation task. Read the code module by module.

3.1.1 guess you like it

Real time recommendation:

  • Query the list of the user's most recently rated products from Redis. The Redis key is "ONLINE_PREFIX_" + userId.
  • Query the user's historically rated products from the HBase table userProduct.
  • For each productId the user has rated, query the list of related products from the HBase table itemCFRecommend.
  • Filter the related products, removing those that already appear in the recently rated list or the historically rated list.
  • Re-rank the remaining candidates according to their similarity to the recently rated products and the user's historical scores.
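
The steps above can be sketched in plain Java, independent of Flink, Redis and HBase. This is only an illustration under stated assumptions: the class name, the method signatures and the in-memory maps standing in for the Redis and HBase lookups are hypothetical, and the scoring formula (a similarity-weighted average of the recent scores) is an assumption, since the text does not give the project's actual formula.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class OnlineRecommendSketch {

    /** Redis key holding a user's most recently rated products, as described in the text. */
    public static String recentKey(String userId) {
        return "ONLINE_PREFIX_" + userId;
    }

    /**
     * @param recentRatings productId -> score of the most recent ratings (stand-in for Redis)
     * @param historyIds    productIds rated in the past (stand-in for HBase table userProduct)
     * @param related       rated productId -> (related productId -> similarity),
     *                      a stand-in for HBase table itemCFRecommend
     */
    public static List<String> recommend(Map<String, Double> recentRatings,
                                         Set<String> historyIds,
                                         Map<String, Map<String, Double>> related,
                                         int topN) {
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> similaritySum = new HashMap<>();
        for (Map.Entry<String, Double> rated : recentRatings.entrySet()) {
            Map<String, Double> candidates =
                    related.getOrDefault(rated.getKey(), Collections.<String, Double>emptyMap());
            for (Map.Entry<String, Double> c : candidates.entrySet()) {
                String candidateId = c.getKey();
                double similarity = c.getValue();
                // Filtering step: skip products the user has already rated.
                if (similarity <= 0
                        || recentRatings.containsKey(candidateId)
                        || historyIds.contains(candidateId)) {
                    continue;
                }
                weightedSum.merge(candidateId, similarity * rated.getValue(), Double::sum);
                similaritySum.merge(candidateId, similarity, Double::sum);
            }
        }
        // Re-ranking step (assumed formula): similarity-weighted average of the recent scores.
        Map<String, Double> score = new HashMap<>();
        for (String id : weightedSum.keySet()) {
            score.put(id, weightedSum.get(id) / similaritySum.get(id));
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort(Comparator.comparingDouble((String id) -> -score.get(id))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
    }
}
```

A candidate that is strongly similar to a highly rated product ends up near the top, while anything the user has already rated never appears in the result.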

3.1.2 hot products

Products are ranked in descending order by the number of ratings they have received from users over all time, and the most popular ones are selected.

  • Flink loads the HBase rating table into memory and counts occurrences grouped by productId.
  • Sort in descending order by number of occurrences.
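
The group-and-count logic can be illustrated with a small plain-Java sketch. It is not the project's Flink job: the class and method names are hypothetical, and an in-memory list of productIds stands in for the rows of the HBase rating table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HistoryHotSketch {

    /** Counts ratings per productId and returns the topN entries, most rated first. */
    public static List<Map.Entry<String, Long>> hotProducts(List<String> ratedProductIds, int topN) {
        // Group by productId and count occurrences.
        Map<String, Long> counts = ratedProductIds.stream()
                .collect(Collectors.groupingBy(id -> id, Collectors.counting()));
        // Sort by count descending; ties are broken by productId so the order is stable.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(topN, entries.size())));
    }
}
```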

3.1.3 high praise products

Products are sorted in descending order by average score.
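
As a minimal plain-Java sketch of this ranking (again not the project's Flink code; the Rating class and the method name are hypothetical):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GoodProductsSketch {

    /** Hypothetical rating record: one score given to one product. */
    public static final class Rating {
        final String productId;
        final double score;

        public Rating(String productId, double score) {
            this.productId = productId;
            this.score = score;
        }
    }

    /** Returns productIds ordered by average score, highest first. */
    public static List<String> rankByAverage(List<Rating> ratings) {
        Map<String, Double> average = ratings.stream()
                .collect(Collectors.groupingBy(r -> r.productId,
                        Collectors.averagingDouble(r -> r.score)));
        List<String> ids = new ArrayList<>(average.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> -average.get(id))
                .thenComparing(Comparator.<String>naturalOrder()));
        return ids;
    }
}
```

A plain average favors products with very few ratings; whether the project applies a minimum rating count is not stated in the text.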

3.1.4 real time hot products

Use Flink's timeWindow to rank the data from the past hour and select the most popular products. The time window slides every five minutes.
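
To make the window semantics concrete, here is a plain-Java model of a one-hour window that slides every five minutes. It is a sketch of the idea, not the project's code: in Flink the equivalent would presumably be something like timeWindow(Time.hours(1), Time.minutes(5)) on a keyed stream, but the actual call is not shown in the text. The class and method names are hypothetical, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SlidingWindowSketch {
    public static final long SIZE_MS = 60 * 60 * 1000L;  // window length: 1 hour
    public static final long SLIDE_MS = 5 * 60 * 1000L;  // slide interval: 5 minutes

    /** Start times of every window [start, start + SIZE_MS) that contains the timestamp. */
    public static List<Long> windowStarts(long timestampMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % SLIDE_MS);
        for (long start = lastStart; start > timestampMs - SIZE_MS; start -= SLIDE_MS) {
            starts.add(start);
        }
        return starts;
    }

    /** Counts ratings per product in each window: windowStart -> (productId -> count). */
    public static Map<Long, Map<String, Integer>> countPerWindow(long[] timestampsMs,
                                                                 String[] productIds) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (int i = 0; i < timestampsMs.length; i++) {
            for (long start : windowStarts(timestampsMs[i])) {
                counts.computeIfAbsent(start, k -> new HashMap<>())
                        .merge(productIds[i], 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The n most frequently rated products of one window, most rated first. */
    public static List<String> topN(Map<String, Integer> windowCounts, int n) {
        List<String> ids = new ArrayList<>(windowCounts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(windowCounts.get(b), windowCounts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }
}
```

Because the window is twelve times longer than the slide, every rating is counted in twelve overlapping windows, and a fresh ranking becomes available every five minutes.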

3.1.5 people who viewed this product also viewed

Item-based recommendation (itemCF)
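
The text does not say which similarity measure the project uses. As one common choice, the sketch below assumes a co-occurrence similarity, sim(A, B) = |U(A) ∩ U(B)| / sqrt(|U(A)| * |U(B)|), where U(X) is the set of users who rated product X. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ItemCfSketch {

    /** Assumed measure: shared raters divided by the geometric mean of both rater counts. */
    public static double similarity(Set<String> usersOfA, Set<String> usersOfB) {
        if (usersOfA.isEmpty() || usersOfB.isEmpty()) {
            return 0.0;
        }
        Set<String> both = new HashSet<>(usersOfA);
        both.retainAll(usersOfB);
        return both.size() / Math.sqrt((double) usersOfA.size() * usersOfB.size());
    }

    /** The topN products most similar to productId, skipping products with no shared rater. */
    public static List<String> similarProducts(String productId,
                                               Map<String, Set<String>> usersByProduct,
                                               int topN) {
        Set<String> target = usersByProduct.get(productId);
        Map<String, Double> scores = new HashMap<>();
        if (target != null) {
            for (Map.Entry<String, Set<String>> other : usersByProduct.entrySet()) {
                if (other.getKey().equals(productId)) {
                    continue;
                }
                double sim = similarity(target, other.getValue());
                if (sim > 0) {
                    scores.put(other.getKey(), sim);
                }
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int bySim = Double.compare(scores.get(b), scores.get(a));
            return bySim != 0 ? bySim : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(topN, ids.size())));
    }
}
```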

3.1.6 data loading module

The Kafka topic rating is consumed and the data is stored in the HBase rating table. To guarantee that each record is unique, the rowKey format is:

userId_productId_timestamp
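
A small helper shows how such a rowKey could be built and split again. The class and method names are hypothetical, and the sketch assumes that userId and productId never contain an underscore.

```java
public class RatingRowKey {

    /** Builds a rowKey in the format userId_productId_timestamp. */
    public static String build(String userId, String productId, long timestamp) {
        return userId + "_" + productId + "_" + timestamp;
    }

    /** Splits a rowKey back into {userId, productId, timestamp}. */
    public static String[] parse(String rowKey) {
        String[] parts = rowKey.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("unexpected rowKey: " + rowKey);
        }
        return parts;
    }
}
```

Including the timestamp keeps repeated ratings of the same product by the same user in separate rows. Since HBase stores rows sorted by rowKey, leading with userId also keeps one user's ratings next to each other.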

3.2 back end (recommended_ backend)

Development environment: IDEA + Maven + git + windows && wsl (Ubuntu 20.04) + Postwoman

Technical architecture: Springboot + hibernate + mysql + hbase

Development guidance: The core module is furestcontroller.

Project architecture:

[Figure: back-end project architecture]

3.3 front end (recommended_ front)

Development environment: VScode + nodejs + windows && wsl

Technical architecture: Vue + typescript + element-ui

4. Development and operation steps

4.1 environment construction

  • mysql
  • hbase
  • flink
  • redis
  • kafka
  • zookeeper

4.2 create data table

  • mysql

There are two tables: product, which stores product details, and user, which stores user information.

The table creation SQL script is in recommendation/src/main/resources/mysql.sql

  • hbase

    • rating
    • userProduct
    • itemCFRecommend
    • goodProducts
    • historyHotProducts
    • onlineRecommend
    • onlineHot

The table creation statements are in recommendation/src/main/resources/hbase.txt

4.3 data warehousing

Product information is stored in the file recommendation/src/main/resources/product.csv. A Flink task loads this data into the MySQL product table created earlier.

  • Start Flink and run recommendation/.../task/DataLoader/DataLoaderTask.java
  • The product information is then stored in MySQL

4.4 start development environment

  • Execute startup script

The startup script starts the previously deployed HBase, Kafka, Flink, Redis, ZooKeeper, and so on with one command.

To make development easier, the author wrote shell scripts that start and stop the environment. They are in the recommendation/main/resources directory and are named startAll.sh and stopAll.sh respectively.

  • Start the springboot backend project
  • Start Vue front end
  • Start real time recommendation task
  • Offline tasks start regularly

Finally, the author is going through the autumn recruitment in 2020. If you think this project is good, please give a star!