The system is named Recs, inspired by "Recommendation System". The logo was made with an online logo website. The author developed the project for learning purposes.
This project draws on the open-source learning project ECommerceRecommendSystem. The front end in particular borrows heavily from it and is an optimization built on the framework its author had set up: the UI was changed, some bugs were fixed, and some new features were added. Developing this project gave the author a more systematic understanding of big-data technologies. Many problems came up during development, and each was overcome in turn. In the author's experience, the best way to solve a problem is to read the official documentation and make active use of Google. Finally, the author is still learning these technologies while using them, and that knowledge is fairly one-sided, so many parts of this project can still be optimized. Issues are welcome; let's learn and improve together.
2. Project introduction
2.1 Recs system architecture
The main workflow of the system is as follows:
- The user logs in to or registers with the system.
- The user rates a product.
- The rating data is sent through Kafka to the real-time recommendation task of the recommendation module.
- The system runs the real-time recommendation task and stores the data in the rating and userProduct tables of HBase. The real-time tasks are real-time top-N and recommendation based on user behavior.
- Real-time top-N stores its results in the online hot table of HBase; recommendation based on user behavior stores its results in the online recommended table of HBase.
- The web side queries HBase for the data each module needs and displays the results.
2.2 Home page
There are four modules:
- Guess what you like: recommendation based on user behavior. When the user rates a product, Flink combines the user's rating history with itemCF to compute the recommendation results.
- Hot products: historically popular products.
- Highly rated products: products with high average ratings.
- Real-time hot products: a Flink sliding time window counts the popular products of the past hour, sliding every 5 minutes.
2.3 Product details
- Displays the product details.
- People who viewed this product also viewed: recommendations based on itemCF.
3. Module description
3.1 Recommendation module
Development environment: IDEA + Maven + Git + Windows with WSL
Software architecture: Flink + HBase + Kafka + MySQL + Redis
Development guidance: The Flink computation tasks are stored in the task package. Dataloader is the data loading task, offline recommender is the offline recommendation task, and onlinerecommender is the real-time recommendation task. Read the code module by module.
3.1.1 Guess what you like
Real-time recommendation:
- Query the list of the user's most recently rated products from Redis. The Redis key is "ONLINE_PREFIX_" + userId.
- Query the user's historically rated product list from the HBase table userProduct.
- For each productId the user has rated, query the list of related products from the HBase table itemCFRecommend.
- Filter the list of related products against the recently rated product list and the historically rated product list.
- Re-rank the recommended products according to the similarity between the recently rated products and the candidate product, together with the user's historical ratings.
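The filter and re-rank steps above can be sketched in plain Java. This is only an illustration, not the project's actual code: in-memory maps stand in for Redis and HBase, the class and method names are hypothetical, and the score (the sum of similarity times rating over the recently rated products) is an assumed, simplified formula.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; plain maps stand in for the Redis and HBase lookups. */
final class OnlineRecommendSketch {

    /** Builds the Redis key that holds a user's most recent ratings. */
    static String redisKey(String userId) {
        return "ONLINE_PREFIX_" + userId;
    }

    /**
     * @param recentRatings productId -> rating for the most recently rated products (Redis)
     * @param historyIds    productIds the user rated in the past (userProduct table)
     * @param related       productId -> (related productId -> similarity) (itemCFRecommend table)
     * @return candidate productIds, highest score first
     */
    static List<String> recommend(Map<String, Double> recentRatings,
                                  Set<String> historyIds,
                                  Map<String, Map<String, Double>> related) {
        // Everything the user has already rated must be filtered out.
        Set<String> seen = new HashSet<>(historyIds);
        seen.addAll(recentRatings.keySet());

        // Assumed score: sum of (similarity * rating) over the recently rated products.
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Double> recent : recentRatings.entrySet()) {
            Map<String, Double> neighbours = related.getOrDefault(recent.getKey(), Map.of());
            for (Map.Entry<String, Double> n : neighbours.entrySet()) {
                if (seen.contains(n.getKey())) {
                    continue; // already rated, so not a useful recommendation
                }
                scores.merge(n.getKey(), n.getValue() * recent.getValue(), Double::sum);
            }
        }

        // Re-rank the surviving candidates by descending score.
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((String id) -> scores.get(id)).reversed());
        return ranked;
    }
}
```

Keeping the ranking in a pure function like this makes it easy to test without a running Redis or HBase instance.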
3.1.2 Hot products
All products that users have ever rated are sorted in descending order by number of ratings, and the most popular ones are selected.
- Flink loads the HBase rating table into memory and counts the occurrences grouped by productId.
- The products are sorted in descending order by number of occurrences.
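The count-and-sort logic can be illustrated with a small in-memory sketch. It uses plain Java collections rather than the Flink API, and the class name, method name, and tie-break on productId are assumptions made for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of "count ratings per product, then sort descending". */
final class HotProductsSketch {

    /** Returns up to n productIds, most frequently rated first (ties broken by productId). */
    static List<String> topN(List<String> ratedProductIds, int n) {
        // Group by productId and count how many ratings each product received.
        Map<String, Long> counts = ratedProductIds.stream()
                .collect(Collectors.groupingBy(id -> id, Collectors.counting()));

        // Sort by count, descending, and keep the first n entries.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```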
3.1.3 Highly rated products
Products are sorted in descending order by their average rating.
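As a rough illustration of ranking by average rating, the sketch below averages the ratings per product and sorts the result in descending order. The input shape (a list of productId/rating pairs) and all names are assumptions for the example, not the project's data model.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of "average rating per product, sorted descending". */
final class TopRatedSketch {

    /** Each entry is one rating event: key = productId, value = rating. */
    static List<String> byAverageRating(List<Map.Entry<String, Double>> ratings) {
        // Average all ratings that belong to the same product.
        Map<String, Double> average = ratings.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.averagingDouble(Map.Entry::getValue)));

        // Highest average first; ties are broken by productId to keep the order stable.
        return average.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```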
3.1.4 Real-time hot products
A timeWindow over the data of the past hour is used to rank and select the most popular products. The time window slides every five minutes.
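To show what a one-hour window sliding every five minutes means for a single rating, the sketch below computes which windows an event falls into. It is plain Java written for illustration (it does not call Flink), and the class and constant names are hypothetical; windows are assumed to be aligned to the epoch with no offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of sliding-window assignment: size 1 hour, slide 5 minutes. */
final class SlidingWindowSketch {
    static final long SIZE_MS = 60 * 60 * 1000L;  // window length: one hour
    static final long SLIDE_MS = 5 * 60 * 1000L;  // a new window starts every five minutes

    /** Returns the start of every window [start, start + SIZE_MS) that contains the event. */
    static List<Long> windowStarts(long eventTimeMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the event.
        long lastStart = eventTimeMs - Math.floorMod(eventTimeMs, SLIDE_MS);
        // Walk backwards while the window still covers the event.
        for (long start = lastStart; start > eventTimeMs - SIZE_MS; start -= SLIDE_MS) {
            starts.add(start);
        }
        return starts;
    }
}
```

With these settings every event belongs to 12 overlapping windows (60 / 5), which is why each rating contributes to several consecutive hot-product rankings.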
3.1.5 People who viewed this product also viewed
Item-based collaborative filtering recommendation (itemCF).
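For readers new to itemCF, one common way to measure how related two products are is the co-occurrence similarity sim(i, j) = |U(i) and U(j) in common| / sqrt(|U(i)| * |U(j)|), where U(x) is the set of users who rated product x. The sketch below implements that formula; whether this project uses exactly this variant is an assumption, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a co-occurrence based item similarity. */
final class ItemCfSketch {

    /**
     * @param usersOfI users who rated product i
     * @param usersOfJ users who rated product j
     * @return a value in [0, 1]; 0 when the products share no users
     */
    static double similarity(Set<String> usersOfI, Set<String> usersOfJ) {
        if (usersOfI.isEmpty() || usersOfJ.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        // Users who rated both products.
        Set<String> both = new HashSet<>(usersOfI);
        both.retainAll(usersOfJ);
        return both.size() / Math.sqrt((double) usersOfI.size() * usersOfJ.size());
    }
}
```

Products with the highest similarity to the one being viewed would then form the "also viewed" list.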
3.1.6 Data loading module
Consumes the Kafka topic rating and stores the data in the HBase table rating. To ensure the uniqueness of the data, the rowKey format is:
3.2 Back end (recommended_backend)
Development environment: IDEA + Maven + Git + Windows with WSL (Ubuntu 20.04) + Postwoman
Technical architecture: Spring Boot + Hibernate + MySQL + HBase
Development guidance: The core module is furestcontroller.
3.3 Front end (recommended_front)
Development environment: VS Code + Node.js + Windows with WSL
Technical architecture: Vue + TypeScript + element-ui
4. Development and operation steps
4.1 Environment setup
4.2 Create the data tables
There are two tables. One is product, used to store product details; the other is user, used to store user information.
Create table SQL script in
Create table statement in
4.3 Data loading
Product information is stored in the file recommendation/src/main/resources/product.csv. A Flink task loads this data into MySQL; the corresponding table was created in the previous step.
- Start Flink and run
- The product information is stored in MySQL
4.4 Start the development environment
- Execute the startup script
The startup script starts the previously deployed HBase, Kafka, Flink, Redis, ZooKeeper, etc. with one command.
To make development easier, the author wrote shell scripts to start and stop the environment. They are in the recommendation/main/resources directory: startAll.sh and stopAll.sh.
- Start the Spring Boot back-end project
- Start the Vue front end
- Start the real-time recommendation task
- Offline tasks start on a schedule
Finally, the author is going through the autumn recruitment season of 2020. If you think this project is good, please give it a star!