Idle Fish (Xianyu) recommendation: Guess You Like.
Chu Rui (Liu Sijia), Jin Yi (Chen Xiaoping), Ming Dong (Ou Mingdong), Zixu (Yang Zixu).
Intelligent recommendation, feature engineering and feature processing.
Explanation of terms:
- Machine learning: algorithms that generate a “model” from data on a computer;
- Data set: a set of records;
- Model: generally, the result learned from data;
- Feature: information extracted from the raw data that is useful for predicting the result; it can be text or data;
- Feature engineering: the process of using domain knowledge and skills to process data so that features work better in machine learning algorithms. It includes feature extraction, feature construction, feature selection, and other modules;
- On-device computing / edge computing: unlike the traditional centralized approach, its computing nodes are closer to the terminal, so services respond faster than with traditional centralized cloud computing.
In recent years, with the rapid development of cloud computing and big data, intelligent recommendation based on machine learning has also developed rapidly. With cloud computing, models can be updated not only daily but even hourly, for more accurate recommendations. To better optimize the recommendation system and enrich its real-time features, large volumes of behavioral data are usually collected and reported to the cloud. As DAU (daily active users) keeps growing, the problems of the cloud-centric computing model become apparent.
The centralized model not only consumes a lot of server resources, but also suffers from the delay caused by processing massive amounts of data (a delay of minutes is unacceptable).
2. Real-time performance of the recommendation system
Why is delay a fatal, unacceptable problem for a recommendation system?
The more real-time the recommendation system is, the faster it updates and the more accurate its recommendations are. Users today are increasingly impatient; if the system does not capture their interest quickly, they are easily lost.
The real-time performance of a recommendation system refers to:
(1) real-time performance of the “model”
(2) real-time performance of the “features”
2.1 Real-time performance of the “model”
Real-time performance of the “model”: by updating continuously, the model can discover the latest popular trends and the latest correlations.
For example, if many people start paying attention to a celebrity, the model will learn from the behavior data of most users that this celebrity has become a popular trend.
This article does not analyze the real-time performance of the model in detail; for now, model training still depends heavily on cloud computing.
Of course, many people are also trying to combine edge computing with cloud computing. I believe results will appear in the near future.
2.2 Real-time performance of the “features”
Real-time performance of the “features”: with the user’s latest behavior data as input features, the model can discover the user’s latest behavior patterns and make relevant predictions and recommendations.
For example, if you browse one mobile phone, the system will show you phones of different brands and prices; if you keep browsing Huawei phones, the system will show you Huawei phones at different prices. The richer your continuous behavior data, the more accurate the recommendations.
PS: the data-processing delay of the centralized computing model mentioned above is what hurts the real-time performance of features.
3. Real-time features of the recommendation system
Why is a delay of even minutes unacceptable for feature updates, while the model only needs to be updated daily?
The real-time performance of the model reflects popular trends. Popular trends generally do not change from moment to moment, so the model only needs to keep up with the trend changes of activities.
The real-time performance of features reflects the real-time behavior of individual users. If the system cannot respond to that behavior promptly, users may be lost.
The real-time performance of features is therefore very important. That said, not all features have real-time requirements; gender and age, for example, do not. We call the features that do require real-time updates real-time features.
The real-time features used by Idle Fish recommendation include:
(1) browsing behavior features, such as exposure, exposure duration, scroll speed, etc.
(2) detail-page behavior features, such as entering the detail page, sending an inquiry, favoriting, commenting, liking, tapping to view the large image, etc.
(3) purchase behavior features: no further recommendation after an order is placed.
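As an illustration of the first group, the sketch below shows one way browsing behavior features could be derived on the device from raw events. It is a minimal example under stated assumptions, not the Idle Fish implementation: the `Exposure` record, the `browsing_features` function, and the shape of the scroll samples are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    """Hypothetical record of one item being visible on screen."""
    item_id: str
    shown_at: float   # seconds
    hidden_at: float  # seconds


def browsing_features(exposures, scroll_samples):
    """Summarize raw browsing events into a few features.

    scroll_samples is assumed to be a time-ordered list of
    (timestamp_seconds, scroll_offset_pixels) pairs.
    """
    # Total time that items were visible.
    exposure_duration = sum(e.hidden_at - e.shown_at for e in exposures)

    # Distance scrolled, counting both directions.
    distance = sum(
        abs(later[1] - earlier[1])
        for earlier, later in zip(scroll_samples, scroll_samples[1:])
    )
    elapsed = (
        scroll_samples[-1][0] - scroll_samples[0][0]
        if len(scroll_samples) > 1
        else 0.0
    )

    return {
        "exposure_count": len(exposures),
        "exposure_duration": exposure_duration,
        # Average scroll speed in pixels per second.
        "scroll_speed": distance / elapsed if elapsed > 0 else 0.0,
    }
```

Summarizing on the device means only the small feature dictionary needs to be sent, rather than every raw exposure and scroll event.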
Browsing behavior features
In a recommendation scenario, even users themselves may not know what they want. By refining the features of users’ browsing behavior, we hope to infer which products they care about.
With edge computing introduced, we can collect user behavior data along more dimensions and make the user model more accurate.
PS: for a long time, browsing behavior features were almost entirely missing from Idle Fish recommendation. The algorithm has been relying on pseudo-exposure data, because the cloud does not have enough servers to process features from exposure data. The volume of exposure data is too large: any user can generate many exposure records just by scrolling casually.
4. Real-time clustering features & real-time intent features
To solve the problem of cloud processing delay, we abstracted the real-time features and settled on two on-device schemes for real-time feature processing:
(1) real-time clustering features
(2) real-time intent features.
4.1 Real-time clustering features
We use edge computing to compute real-time clustering statistics over behavioral data.
We currently aggregate statistics in 60-second time slots (too large a slot is undesirable, while too small a slot means data is reported too frequently and QPS becomes too high);
Within each 60-second slot, we count the exposure count, exposure duration, click PV, and other data for each category (mobile phone, skirt, beauty makeup, etc.).
A strong feature event (a click event) triggers a feature report immediately. Weak feature events (exposure events) are accumulated and reported once 10 of them have been collected.
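The reporting rule above can be sketched roughly as follows. This is a minimal sketch, not the production client: the class name `ClusterAggregator`, the `report` callback (standing in for the upload to the cloud), and the field names are hypothetical. It also assumes two things the text does not specify: each report carries only the counts accumulated since the previous report, and any leftover counts are reported when a new 60-second slot begins.

```python
from collections import defaultdict

SLOT_SECONDS = 60       # slot length stated in the text
WEAK_EVENT_BATCH = 10   # weak events accumulated before reporting


class ClusterAggregator:
    """Per-category statistics over 60-second slots (illustrative only)."""

    def __init__(self, report):
        # report(slot_index, stats) is a hypothetical stand-in for the upload.
        self.report = report
        self.slot = None
        self.pending_weak = 0
        self.stats = defaultdict(
            lambda: {"exposures": 0, "exposure_seconds": 0.0, "clicks": 0}
        )

    def _flush(self):
        # Send whatever has accumulated since the last report, then reset.
        if self.stats:
            self.report(self.slot, {k: dict(v) for k, v in self.stats.items()})
        self.stats.clear()
        self.pending_weak = 0

    def _roll(self, timestamp):
        # Assumption: leftovers from the old slot are reported on slot change.
        slot = int(timestamp // SLOT_SECONDS)
        if self.slot is not None and slot != self.slot:
            self._flush()
        self.slot = slot

    def on_exposure(self, timestamp, category, seconds):
        """Weak feature event: accumulate, report after every 10 events."""
        self._roll(timestamp)
        entry = self.stats[category]
        entry["exposures"] += 1
        entry["exposure_seconds"] += seconds
        self.pending_weak += 1
        if self.pending_weak >= WEAK_EVENT_BATCH:
            self._flush()

    def on_click(self, timestamp, category):
        """Strong feature event: report immediately."""
        self._roll(timestamp)
        self.stats[category]["clicks"] += 1
        self._flush()
```

Batching weak events keeps the reporting rate (and therefore QPS) bounded, while strong events still reach the cloud without waiting for the batch to fill.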
Data statistics two weeks after full release:
4.2 Real-time intent features
In this scheme, an on-device intelligence model interprets the real-time intent behind the behavior data and reports it to the cloud.
PS: we have not completed this scheme yet; we are still experimenting with it.
In this article, we use edge computing to move feature engineering forward onto the device, achieving decentralization, solving the problem that large volumes of raw data cannot be processed in real time in the cloud, and improving the real-time performance of features in the recommendation system.
Introducing edge computing not only improves the real-time performance of the recommendation system, but also enriches user behavior data, making many previously impossible things possible and greatly improving recommendation efficiency.
Practice has shown that device-cloud collaboration will become a future trend.
We now report feature data to the cloud in real time, but recommendations for related products do not appear until the next interface request, so a real-time response problem remains.
Recently, we have been experimenting with redrawing the data of products that have not yet been exposed (this is still in an online A/B test; the results are very good, and it should be fully rolled out soon).
Scheme overview: the on-device model decides whether to redraw; the client then issues an interface request for new recommended product data and replaces the unexposed products.
PS: this article does not cover the redraw scheme in detail; dedicated articles will follow.
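The flow in the overview can be sketched as below. This is only an illustration of the three steps under stated assumptions: `redraw_feed`, `should_redraw` (a stand-in for the on-device model), and `fetch_recommendations` (a stand-in for the interface request) are hypothetical names, and the actual scheme is not described in this article.

```python
def redraw_feed(feed, exposed_count, recent_events,
                should_redraw, fetch_recommendations):
    """Replace the not-yet-exposed tail of the feed if the model asks for it.

    feed: list of product ids in display order.
    exposed_count: how many leading items the user has already seen.
    should_redraw: hypothetical stand-in for the on-device model's decision.
    fetch_recommendations: hypothetical stand-in for the interface request;
        called with the number of products needed.
    """
    # Step 1: the on-device model decides whether a redraw is worthwhile.
    if not should_redraw(recent_events):
        return feed

    # Step 2: the client requests new recommended product data.
    unexposed = len(feed) - exposed_count
    fresh = fetch_recommendations(unexposed)

    # Step 3: keep what the user has already seen, replace the rest.
    return feed[:exposed_count] + fresh
```

Only unexposed products are replaced, so the part of the feed the user has already seen stays unchanged.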
This is original content from the Yunqi Community and may not be reproduced without permission.