An In-depth Analysis of the Core Concepts and Advantages of the Open-Source Recommendation Framework EasyRec

Date: 2022-05-10

Introduction: how to quickly build recommendation models with Machine Learning Platform for AI (PAI)

Author: Cheng Mengli, Machine Learning PAI team

With the popularity of mobile apps, personalized recommendation and advertising have become indispensable to many apps, bringing huge improvements in user experience and app revenue. Deep learning is now applied widely across search, advertising, and recommendation, and has greatly improved results in all kinds of scenarios. For each stage of the recommendation pipeline, the industry has many models, and most of them have open-source implementations. However, these implementations are usually scattered across every corner of GitHub, each with its own data processing and feature construction conventions. Applying one of these models to a new scenario usually requires substantial rework:

  • Input adaptation: the input format and feature construction of an open-source implementation are usually inconsistent with what runs online. Adapting one algorithm typically takes 1-2 weeks, and introducing bugs is almost inevitable when working with unfamiliar code. If you want to try five algorithms, that is five times the adaptation work. With limited engineering resources, do you grit your teeth and give up attempts that might have been effective?
  • Many open-source implementations only perform well on public datasets, and the parameters that are optimal on public datasets are not necessarily suitable for real scenarios, so parameter tuning is itself a large workload. Sometimes poor results come not from a weak method but from poorly chosen parameters. Without a systematic tuning procedure, many algorithms get only a superficial try; and without deep exploration, how can you understand an algorithm deeply? Why can't you find the seemingly simple improvement? Why did a similar direction work for others but not for you? Good results are usually piled up from compute power and countless attempts.
  • The open-source implementation uses TensorFlow 1.4 while the online environment runs TensorFlow 2.3, and the signatures of many functions have changed (one is tempted to blame Google a hundred times for the API stability it promised). Because many open-source implementations have never been verified in real scenarios, their reliability is also questionable: a dropout missing here, a batch norm missing there, and the results fall far short of expectations.
  • After spending great effort tuning a model offline, you find many problems online: training is too slow, memory usage is too high, inference QPS cannot keep up, or the offline gains do not translate into online gains. After hitting so many problems, do you still have the energy for your next idea? Can you keep up the morale to explore new directions?

These problems leave us willing but unable: we work overtime late into the night with no end in sight, and verifying even a simple idea takes herculean effort. As the saying goes, in martial arts only speed is unbeatable, and this holds especially for algorithm engineers in search, advertising, and recommendation: fast iteration lets you verify more ideas, find more problems, and converge on the best features and model structures. If you are slow, then by the time your model is tuned, the business objectives may have changed and the front-end layout may have changed too; the business side may no longer trust you, and you may never get the chance to go online.

At this point the need is clear: we want to verify ideas by writing less code, or even no code at all. In response to these problems and demands, we built EasyRec, an end-to-end recommendation modeling framework dedicated to solving the problems of recommendation modeling, feature construction, parameter tuning, deployment, and more, so that you write less code, do less repetitive and meaningless dirty work (EasyRec takes care of it), and step into fewer pitfalls (EasyRec has stepped into them for you), letting you quickly verify new ideas online and improve the iteration efficiency of your recommendation models.

Advantages


Compared with other modeling frameworks, EasyRec has significant advantages in the following aspects:

  • Multi-platform and multi-data-source training:
  • Supported platforms: MaxCompute (formerly ODPS), DataScience (Kubernetes-based), DLC (Deep Learning Container), Alink, and local;
  • Supported data sources: OSS, HDFS, Hive, MaxCompute tables, Kafka, and DataHub;
  • Users usually only need to define their own model; after passing a local test, it can be trained on any of the distributed platforms;
  • Supports multiple TensorFlow versions (>= 1.12, <= 2.4, and PAI-TF), connecting seamlessly to the user's environment without code migration or changes;
  • Implements mainstream feature engineering, in particular explicit feature crosses, which can significantly improve results;
  • Supports HPO automatic parameter tuning, which significantly reduces tuning workload and has improved model results in multiple scenarios;
  • Implements the mainstream deep models, covering recall, ranking, pre-ranking, re-ranking, multi-task, multi-interest, and more;
  • Supports early stopping, best-checkpoint export, feature importance, feature selection, model distillation, and other advanced functions.

Framework


The EasyRec modeling framework is built on Estimator-based data-parallel training and supports multi-machine, multi-GPU training through a parameter-server architecture. Its main modules are input, feature construction, deep model, loss, and metric, and each module can be customized. EasyRec has deeply optimized around the problems users typically hit when training with TensorFlow, such as workers failing to exit, the evaluator not exiting when num_epochs is used, and inaccurate AUC computation. Combined with PAI-TF (PAI-optimized TensorFlow) and AliGraph, EasyRec also addresses the slow training speed of the Adam optimizer, slow asynchronous training, hash conflicts, and negative sampling over large sample spaces.
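The parameter-server layout described above is the standard tf.estimator pattern: parameter servers hold the (often sharded) variables while each worker reads its own slice of the data and pushes gradients. As a minimal sketch, tf.estimator discovers this layout from the TF_CONFIG environment variable; the host names below are purely illustrative:

```python
import json
import os

# Illustrative cluster spec for parameter-server training:
# 2 parameter servers hold the embedding tables and dense variables;
# 2 workers (plus a chief) each read a data shard and send gradients.
cluster = {
    "cluster": {
        "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "chief": ["chief.example.com:2222"],
    },
    # Each process gets its own task entry; this process is worker 0.
    "task": {"type": "worker", "index": 0},
}

# tf.estimator reads the distributed layout from TF_CONFIG.
os.environ["TF_CONFIG"] = json.dumps(cluster)
```

Each training process runs the same program with a different `task` entry; the framework then wires up variable placement and gradient exchange automatically.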

Model

EasyRec has built in the industry's leading deep learning models, covering the needs of the entire recommendation pipeline, including recall, pre-ranking, ranking, re-ranking, multi-task, cold start, and more.


EasyRec also supports user-defined models. To implement a custom model in EasyRec, you only need to define three parts: the model structure, the loss, and the metrics; data processing and feature engineering directly reuse the capabilities provided by the framework. This significantly reduces users' modeling time and cost and lets them focus on exploring model structures. For common model types such as RankModel and MultiTaskModel, the loss and metric parts can also be reused directly from the parent class.
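Schematically, the division of labor looks like the skeleton below. This is an illustrative sketch, not EasyRec's exact API: the class and method names are invented for exposition, and a plain-Python base class stands in for the framework, which in reality handles input parsing, feature engineering, and TensorFlow graph construction.

```python
# Illustrative skeleton of the "three parts" a custom model defines.
# Class and method names are schematic, not EasyRec's real API.
class RankModelBase:
    """Stands in for a framework base class: it already handles input
    parsing and feature engineering, and supplies default loss/metric
    definitions that ranking models can reuse."""

    def build_predict_graph(self, features):
        raise NotImplementedError  # 1) model structure: always user-defined

    def build_loss_graph(self, logits, labels):
        # 2) loss: e.g. sigmoid cross-entropy for ranking models
        return {"cross_entropy_loss": (logits, labels)}

    def build_metric_graph(self, logits, labels):
        # 3) metric: e.g. AUC for ranking models
        return {"auc": (logits, labels)}


class MyCustomModel(RankModelBase):
    # Only the model structure is defined here; loss and metric are
    # inherited from the RankModel-style parent, as the text describes.
    def build_predict_graph(self, features):
        # Toy "network": sum the already-engineered feature values.
        return sum(features.values())
```

The point of the structure is that a new model touches only the prediction graph, while everything upstream (data, features) and downstream (loss, metrics, export) comes for free.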


Automatic tuning and automatic feature engineering

EasyRec's automatic tuning builds on the automatic tuning capability of PAI AutoML and automates the search over all kinds of parameters. Any parameter defined in EasyRec can be searched; common ones include hash_bucket_size, embedding_dim, learning_rate, dropout, batch_norm, and feature selection. When you cannot settle on some parameter values, start an automatic tuning run to find good settings for you. Parameters found by automatic search are usually better than those set by gut feeling, and sometimes bring unexpected surprises.
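To make the idea concrete, here is a minimal random-search sketch over the kinds of parameters the text mentions. The search space values and the `evaluate` function are invented stand-ins (a real run would train and evaluate a model and return its offline AUC); this is not PAI AutoML's API, just the underlying principle:

```python
import random

# Hypothetical search space over the parameters named in the text.
SEARCH_SPACE = {
    "hash_bucket_size": [10_000, 100_000, 1_000_000],
    "embedding_dim": [8, 16, 32],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.0, 0.1, 0.3],
}

def evaluate(params):
    """Stand-in for one full train + evaluate run returning AUC."""
    return (0.7
            + 0.01 * SEARCH_SPACE["embedding_dim"].index(params["embedding_dim"])
            - 0.05 * params["dropout"])

def random_search(n_trials, seed=0):
    """Sample configurations at random and keep the best one seen."""
    rng = random.Random(seed)
    best_params, best_auc = None, float("-inf")
    for _ in range(n_trials):
        trial = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        auc = evaluate(trial)
        if auc > best_auc:
            best_params, best_auc = trial, auc
    return best_params, best_auc

best, auc = random_search(50)
```

Real tuners (PAI AutoML among them) use smarter strategies than uniform random sampling, but the contract is the same: any declared parameter becomes searchable, and the tuner returns the best configuration found.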

Feature engineering is usually the key to improving recommendation quality, and building higher-order feature crosses usually helps the model. But the space of higher-order combinations is huge, and crossing features blindly leads to feature explosion, slowing down both training and inference. EasyRec therefore introduces automatic feature engineering to automatically discover the higher-order crosses that actually improve the model.
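The combinatorics behind "feature explosion" are easy to see: with only 8 base features there are already 28 pairwise crosses and 56 three-way crosses, and real systems have hundreds of features. The sketch below enumerates candidates and keeps a top-k; the `offline_gain` scorer is a deterministic placeholder for what the real search does (evaluate each candidate cross against the model):

```python
from itertools import combinations

# Hypothetical base features for illustration.
features = ["user_age", "user_gender", "item_category", "item_brand",
            "hour_of_day", "city", "device", "price_bucket"]

# The candidate space grows combinatorially with the cross order.
pairs = list(combinations(features, 2))    # C(8, 2) = 28 candidates
triples = list(combinations(features, 3))  # C(8, 3) = 56 candidates

def offline_gain(cross):
    """Placeholder scorer; the real search would train/evaluate the
    model with this cross added and measure the metric improvement."""
    return len("_x_".join(cross)) % 7  # arbitrary deterministic score

# Keep only the most promising crosses instead of adding all of them.
top5 = sorted(pairs, key=offline_gain, reverse=True)[:5]
```

This is why pruning matters: adding all crosses of all orders would blow up both the embedding tables and the inference cost, while a searched top-k keeps only combinations that pay for themselves.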

Search results (top 5):

[Figure: top-5 feature combinations found by the automatic search]

Model deployment

An EasyRec model can be deployed to the PAI-EAS environment with one click, or served through TF Serving. To improve inference performance, EasyRec applies PAI-Blade's capabilities for placement optimization, op fusion, subgraph deduplication, and more; with these optimizations, QPS increases by more than 30% and RT drops by 50%. FP16 support will be introduced in the future to further improve inference performance and reduce memory consumption. To support very large embeddings, EasyRec splits large models, replaces the relevant ops, and stores the embeddings in Redis or other distributed caches, breaking through the memory limit. Fetching embeddings from Redis is slower than from memory, so high-frequency IDs are cached locally to reduce Redis accesses and speed up embedding lookup.
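The high-frequency-ID caching mentioned above is essentially an LRU cache in front of the distributed store. A minimal sketch, with a plain dict standing in for the Redis-backed embedding table (a real implementation would batch the misses into one `mget`):

```python
from collections import OrderedDict

class EmbeddingCache:
    """LRU cache in front of a slow distributed embedding store."""

    def __init__(self, store, capacity):
        self.store = store          # stand-in for a Redis client
        self.capacity = capacity
        self.cache = OrderedDict()  # id -> vector, in LRU order
        self.store_reads = 0        # how often we had to hit the store

    def lookup(self, ids):
        result, missing = {}, []
        for i in ids:
            if i in self.cache:
                self.cache.move_to_end(i)      # mark as recently used
                result[i] = self.cache[i]
            else:
                missing.append(i)
        if missing:
            self.store_reads += len(missing)
            for i in missing:                  # batched mget in practice
                vec = self.store[i]
                result[i] = vec
                self.cache[i] = vec
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict LRU entry
        return result

store = {i: [float(i)] * 4 for i in range(100)}  # mock embedding table
cache = EmbeddingCache(store, capacity=10)
cache.lookup([1, 2, 3])   # cold: all three come from the store
cache.lookup([1, 2, 3])   # warm: served entirely from the local cache
```

Because ID popularity in recommendation traffic is heavily skewed, even a small local cache absorbs most lookups and keeps the Redis round-trips off the critical path.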


Feature consistency

Feature engineering is a key part of the search, advertising, and recommendation pipeline, and it is usually the cause of inconsistencies between online and offline results. To keep offline and online consistent under fast iteration, the common approach is to run the same code in both places. Offline training data is constructed as follows: first build the user features (both real-time and offline parts), item features, and context features; then join them with the training samples (including labels); and finally generate the training samples fed into EasyRec through the feature engineering jar. The online process: import the user features (offline part) and item features into Redis, Hologres, or another distributed store; the recommendation engine queries the corresponding features by user_id and item_id, calls the same feature engineering library to process them, and then sends them to the EasyRec model for prediction. The real-time features of the online part are usually produced by streaming platforms such as Blink and Alink, while the real-time features used in offline training can be constructed in two ways: offline simulation, or dumping the online features. Each has drawbacks: offline simulation usually has small inconsistencies with the online values due to log loss and similar problems, while with dumped features, adding a new feature means waiting a long time to accumulate enough samples. Our solution is to dump the user behavior sequences online, and then compute the various statistical features offline through the same jar package, such as the 1h / 2h / … / 24h click counts.
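The advantage of dumping raw behavior sequences is that any windowed statistic can be recomputed offline, bit-for-bit, by the same logic that runs online. A minimal sketch of the multi-window click counts mentioned above (the function name and feature naming are illustrative; the real logic lives in the shared jar):

```python
from bisect import bisect_left

def window_click_counts(click_ts, now, windows_hours=(1, 2, 24)):
    """Compute multi-window click counts from a sorted list of click
    timestamps (in seconds) dumped from the online behavior sequence.
    Running the same code online and offline keeps features consistent."""
    counts = {}
    for h in windows_hours:
        start = now - h * 3600
        # number of clicks with timestamp >= start (i.e. within h hours)
        counts[f"clicks_{h}h"] = len(click_ts) - bisect_left(click_ts, start)
    return counts

# Example: the user clicked 30 min, 90 min, and 20 h before "now".
now = 100_000
clicks = sorted([now - 1800, now - 5400, now - 72_000])
feats = window_click_counts(clicks, now)
```

Because the input is the raw sequence rather than precomputed aggregates, adding a new window (say, 6h) needs no waiting period to accumulate samples: the history is already there.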

Online feature engineering demands higher computational efficiency, and the workload is also larger than offline: an offline sample typically pairs one user with m exposed items (plus some randomly sampled negatives for recall models), while an online request pairs one user with n items, where n >> m. If the online side naively expands one request into n samples and computes each independently, it usually cannot keep up. It is easy to see that the user features are being recomputed repeatedly, so optimizing their computation can significantly improve online QPS. Building on the feature generation module used in the Taobao ecosystem, we made deep optimizations covering memory allocation, string parsing, deduplication, and multi-threaded parallel computation, significantly improving efficiency while guaranteeing consistency.
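The core of the user-feature optimization can be shown in a few lines. The feature functions below are toy stand-ins that merely count how often they run; the point is the call pattern, not the features themselves:

```python
calls = {"user": 0, "item": 0}

def user_feature(user):
    calls["user"] += 1            # expensive in reality; counted here
    return {"u_emb": hash(user) % 100}

def item_feature(item):
    calls["item"] += 1
    return {"i_emb": hash(item) % 100}

def score(uf, itf):
    return (uf["u_emb"] * itf["i_emb"]) % 7   # toy scoring function

def score_naive(user, items):
    # Expands the request into n independent samples: the shared
    # user feature is recomputed once per item.
    return [score(user_feature(user), item_feature(it)) for it in items]

def score_optimized(user, items):
    # Compute the shared user feature once, reuse it for all n items.
    uf = user_feature(user)
    return [score(uf, item_feature(it)) for it in items]

items = [f"item_{i}" for i in range(500)]
a = score_naive("u1", items)       # 500 user-feature computations
naive_user_calls = calls["user"]
b = score_optimized("u1", items)   # only 1 more user-feature computation
```

With n in the hundreds or thousands per request, eliminating the n-1 redundant user-feature computations is where most of the online QPS headroom comes from.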


Incremental training and real-time training

Incremental training usually brings significant gains, because the model sees more samples and the embedding part is trained more fully. EasyRec supports restoring from the previous day's checkpoint and continuing training on the new day's data. To adapt quickly to scenarios where the sample distribution shifts rapidly, such as news feeds, holidays, and big promotions, we also provide real-time training: EasyRec constructs real-time samples and features through Blink, calls feature generation to process the features, and then reads the real-time sample stream from Kafka or DataHub for training. Stability matters even more in real-time training, so during training we monitor the positive/negative sample ratio, the feature distributions, and the model's AUC in real time; when the distribution of samples or features drifts beyond a threshold, we raise an alarm and stop updating the model. When saving a checkpoint, EasyRec synchronously records the current training offsets (multiple offsets when multiple workers train together), so that after a failure the system restarts and resumes training from the saved offsets.
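The offset bookkeeping is the part that makes restarts exactly resumable. A minimal sketch of the idea, with JSON standing in for the real checkpoint format (EasyRec stores the offsets alongside the TensorFlow checkpoint; the file layout here is illustrative):

```python
import json
import os
import tempfile

def save_checkpoint(path, model_state, offsets):
    """Persist model state together with the per-partition input
    offsets, so a restart resumes from exactly where training stopped."""
    with open(path, "w") as f:
        json.dump({"model_state": model_state, "offsets": offsets}, f)

def restore_checkpoint(path):
    """Load model state and the stream offsets to resume consuming from."""
    with open(path) as f:
        ckpt = json.load(f)
    return ckpt["model_state"], ckpt["offsets"]

# e.g. workers consuming two Kafka partitions of the sample stream
offsets = {"partition_0": 10_423, "partition_1": 9_871}
path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
save_checkpoint(path, {"global_step": 3200}, offsets)
state, restored_offsets = restore_checkpoint(path)
```

Saving state and offsets atomically in one checkpoint is what guarantees that no sample is silently skipped or double-counted across a failure and restart.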


Effect verification

EasyRec has been verified in more than 20 customer scenarios, including product recommendation, feed advertising, social media, live streaming, and video recommendation. Here are some of the improvements customers have achieved with EasyRec:

  • Ad push for an app: AUC up 1 point, online CTR up 4%, with resource consumption cut in half;
  • A large live-streaming app: AUC up 2% based on the EasyRec MultiTower model;
  • A large social media platform: AUC up 6% and online metrics up 50% based on the EasyRec MultiTower model;
  • A large e-commerce platform: online UV value up 11% and UV-CTR up 4% based on the EasyRec DSSM model;
  • A short-video app: online watch time up 30% based on the EasyRec DBMTL model, with a further 10% gain from adding multimodal features.

Finally, EasyRec is open source on GitHub
(https://github.com/alibaba/Ea…). We welcome fellow travelers to build it together, including enriching the feature constructions for each scenario, contributing more models verified in real scenarios, improving offline training and online inference performance, and so on. In this increasingly involuted industry (one can guess why TensorFlow keeps getting worse; it likely has much to do with involution: the API changes are rather arbitrary, there are over-design and bloat problems, and bugs keep surfacing, and the world has suffered from it for a long time), we hope an open-source effort like this can combine everyone's strength and light our common path. Here we also pay tribute to our predecessor XGBoost, and hope this work can be carried forward and have as far-reaching an impact.

Original link
This article is original content from Alibaba Cloud and may not be reproduced without permission.