How to design algorithm models to drive business growth: 7 steps + 5 keys


Source: growingio

Author: heyunxiao, chief data scientist of growingio

Hello, I’m heyunxiao, chief data scientist of growingio. In our project work on algorithm-model-driven business and intelligent operations, our team has accumulated a lot of experience and formed its own thinking.

Today, I am glad to use this open class to share some theory and practice of modeling algorithms with you.

1. From human decision-making to data decision-making: analysis ability goes to a higher level

With the development of big data and cloud computing, enterprises can obtain more and more data, across increasingly rich dimensions. At the same time, the tools for mining and analyzing data, such as the well-known cloud platforms and big data platforms, are becoming more and more powerful. Industry and academia have also invested heavily in developing and iterating on models and algorithms, so new ones are being developed and improved at a very fast pace.

In this context, when enterprises have enough data, or are able to collect a considerable amount of it, intelligent operations will become a major avenue of exploration for enterprise growth.

Figure 1: enterprise decision-making faces the gap from human decision-making to machine decision-making

Specifically, many enterprises have gained meaningful insight and growth in recent years with the help of BI (business intelligence) tools. However, because BI tools are designed by analysts and then used by operations staff, the analysis they support is generally not very complex, given the human effort involved and the scenarios of use, and may only be low-dimensional, such as one- or two-dimensional analysis.

By contrast, the machine learning and artificial intelligence models we discuss today can process data of a volume and dimensionality, and mine relationships within the data of a complexity, far beyond what a person can grasp.

Examples include the common ensemble models, deep learning, the reinforcement learning models that were so successful at playing Go a few years ago, and the GAN models behind the face-swapping technology that drew so much attention last year. So we face a problem: how do we apply these hard-to-understand but powerful tools to our business systems?

Today’s talk will focus on this issue, walk through the key steps of model-driven projects using cases from our team, and help you avoid detours in actual model operation and project management.

2. From data to business value: disassemble the modeling process one by one

Figure 2: model driven project process

Infer business problems from pain points

Generally, the starting point of a model-driven project is a business scenario or pain point, that is, the problem you want to solve. Our team communicates in detail with the stakeholders who raised the request: what causes the problem in the business scenario, what solutions are available, and what data is required to develop the model. These insights from the business perspective give very good guidance for the later deployment of the project.

For example, growingio is currently building prediction models for customers in different industries. Although business scenarios differ across industries, some common problems can always be found after abstraction. Improving the user conversion rate, for instance, is a key indicator of concern in every industry.

One growingio customer is a home decoration platform. It makes a profit by matching designers with users who have home decoration needs. The platform hopes to use intelligent analysis to identify the users who are willing to find a designer through the platform, so that it can concentrate limited resources on these high-potential users and operate precisely. In short, it means reaching the users most likely to convert with the least investment, and ultimately improving the overall conversion rate.

In practice, most industries share this need to improve the conversion rate.

Data acquisition and standardization

When we have a general plan, the next step is to collect data. Enterprises that have already deployed the growingio customer data platform (CDP) have their data stored in it, and these standardized data can be used directly.

For new customers, in addition to connecting the data, we also need to clean it and standardize the process. This stage is relatively slow.

Data preprocessing

Data preprocessing is the most time-consuming step in the whole modeling process, and it is also key to the success of the project and the accuracy of the model.

Taking the retail industry as an example, suppose a retail customer wants to predict which users will buy in the store, or which brands and categories they will buy in the future.

Generally, the retail data we collect is transaction data, which records users’ purchases one by one. What we need to predict is which users will convert to a purchase in the future, and past consumption data may contain this information. Therefore, we need to convert the transaction data into user features and commodity features that can be fed into our prediction model.
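As a rough illustration of this conversion, the sketch below aggregates a per-purchase log into one feature record per user. The record layout, the feature names and the sample rows are all assumptions made for the example, not growingio’s actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (user_id, purchase_date, brand, amount).
transactions = [
    ("u1", date(2023, 1, 5), "brand_a", 30.0),
    ("u1", date(2023, 2, 20), "brand_b", 50.0),
    ("u2", date(2023, 1, 15), "brand_a", 20.0),
]


def build_user_features(rows, as_of):
    """Aggregate one-row-per-purchase data into one feature dict per user."""
    grouped = defaultdict(list)
    for user_id, day, brand, amount in rows:
        if day < as_of:  # only use history from before the prediction date
            grouped[user_id].append((day, brand, amount))

    features = {}
    for user_id, purchases in grouped.items():
        last_day = max(day for day, _, _ in purchases)
        features[user_id] = {
            "days_since_last_purchase": (as_of - last_day).days,
            "purchase_count": len(purchases),
            "total_spend": sum(amount for _, _, amount in purchases),
            "distinct_brands": len({brand for _, brand, _ in purchases}),
        }
    return features


user_features = build_user_features(transactions, as_of=date(2023, 3, 1))
print(user_features["u1"])
```

Restricting the aggregation to purchases before `as_of` keeps information from the prediction period out of the features.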

Algorithm – model validation – output management

In the scenario of predicting which users may convert, we usually use a binary classification model whose label is 1 (converted) or 0 (not converted).

When the scenario is complex and involves many kinds of goods or items, we can extend this with multi-class classification models. For example, when making purchase recommendations for e-commerce platforms that carry a very large number of products, personalized recommendation can achieve the effect of “a thousand faces for a thousand people”, that is, different content for each user.

After the preliminary model is built based on the above four steps, we need to do a lot of offline tests to verify the model.

After the whole process, we build a portrait of the verified model to better understand the logic behind it. The model portrait can also help us determine the overall marketing strategy.
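To make the 1-or-0 setup concrete, here is a minimal sketch of one possible binary classifier: logistic regression trained with plain gradient descent. The talk does not say which model is used in practice, so the choice of algorithm, the two features and the six training rows are all assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that the user converts (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [past purchase count, visited last week (0/1)].
X = [[0, 0], [1, 0], [0, 1], [2, 1], [3, 1], [4, 1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = converted, 0 = did not convert

weights, bias = train_logistic(X, y)
likely = predict_proba(weights, bias, [4, 1])
unlikely = predict_proba(weights, bias, [0, 0])
print(round(likely, 3), round(unlikely, 3))
```

The output is a probability rather than a hard label, which is what later allows users to be ranked and grouped by conversion intention.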

Activation and online verification

At this point, everyone has a certain understanding of the model, its accuracy has been verified, and it can be launched. After going online, we also run corresponding online tests and solidify the whole process into an automated model product. We also keep the model automatically updated at a pace that suits the business needs.

For growingio’s existing customers, the prediction models are automatically updated every day. Customers thus get continuous output from the model to support the corresponding business scenarios.

3. From model to business growth: analyze the five key points of the project

3.1 clear modeling objectives

Figure 3: modeled business objectives

A successful technology model is often the result of the interaction of business insight, data and algorithm. Business objectives determine what data we need to collect, what algorithms we need to use, what validation we need to do, and what strategies we need to develop. In summary, business objectives are a fundamental driver.

Business scenarios are usually diverse, so we need to fine tune the modeling process according to customer needs.

Colleagues in user operations may need to acquire and retain users, predict churn and give early warning; front-end colleagues need to develop reasonable pricing strategies and promote sales; colleagues in charge of advertising need to evaluate the efficiency of advertising channels and gain insight into the marketing mix, so that they can set the advertising budget and allocation strategy for the next stage.

There are also scenarios related to the supply chain, such as inventory backlog or stock-outs caused by inaccurate order estimates. Here we need more accurate demand forecasting to build a better supply chain and deliver the right quantity of goods to the right place at the right time.

3.2 model and algorithm selection

Figure 4: selecting models and algorithms

The first step in selecting an algorithm is to be clear about why we need one: to extract effective information from the data. An algorithm does not generate information by itself; it can only extract what the data already contains.

So what is effective information? Effectiveness here is measured against the business objectives. Generally, the business requirements and the solution we provide largely determine the type of algorithm we choose. For example, clustering, classification, prediction, personalized recommendation and sequential decision-making scenarios each correspond to their own families of algorithms, such as reinforcement learning for sequential decisions.

It is worth noting that when selecting models and algorithms, we need to strike a balance between accuracy and interpretability. Generally speaking, for a prediction or classification model we pay more attention to accuracy. The ultimate purpose of an attribution model, by contrast, is to explain which factors work and which do not, so the interpretability of the model matters more.

Figure 5: from simple to complex

In the initial stage of modeling, we will give priority to trying some relatively simple models, such as linear models, so as to facilitate our interpretation.

Secondly, we consider the scalability of the model. Because user behavior data is currently growing exponentially, we need to weigh this against the platform’s computing power and how the model results will be put into practice.

In general, there are many kinds of algorithm models to choose from. However, once we take these dimensions into account, the range of choices narrows considerably.

3.3 feature engineering and selection

The purpose of the algorithm is to mine effective information from data, so the data we use to fit the algorithm should contain as much information and predictive power as possible.

Prediction model analysis

For example, for one growingio customer we built a prediction model to predict which users will go to the store to make a purchase. In this model, the dependent variable is whether the user will purchase in the store. However, whether we predict purchases in the next week, month or quarter has to be decided according to the user behavior cycle or the final marketing execution strategy, and the dependent variable adjusted accordingly.

As for the independent variables, we need to consider the factors that can affect users’ consumption behavior. Specifically, we look for information that can predict the outcome we care about, extract predictive variables from it, and put them into our model.

Examples include basic user characteristics such as gender, age, education level and income, and commodity characteristics such as brand and discount category. For snacks, taste is also an important factor affecting users’ purchasing behavior.

Secondly, users’ past behavior and online behavior also contain a lot of information: for example, the RFM (recency, frequency, monetary) dimensions commonly used in marketing, user loyalty, membership, whether the user takes part in discounted purchases, and the user’s past product browsing, favorites, likes and other behaviors. These data can greatly improve the prediction performance of our model.
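The effect of the prediction window on the dependent variable can be shown with a small sketch. The function name, the dates and the window lengths below are hypothetical; the point is only that the same user can be labeled 0 or 1 depending on the window chosen.

```python
from datetime import date, timedelta


def make_label(purchase_dates, as_of, window_days):
    """Return 1 if any purchase falls in [as_of, as_of + window_days), else 0."""
    window_end = as_of + timedelta(days=window_days)
    return int(any(as_of <= d < window_end for d in purchase_dates))


# Hypothetical user who buys once, 19 days after the prediction date.
purchases = [date(2023, 3, 20)]
as_of = date(2023, 3, 1)

label_week = make_label(purchases, as_of, window_days=7)
label_month = make_label(purchases, as_of, window_days=30)
print(label_week, label_month)
```

A window that is shorter than the typical purchase cycle produces very few positive labels, which is one reason the window should follow the user behavior cycle.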

Personalized recommendation

In the feature engineering of personalized recommendation scenarios, we need to guess what products users are more interested in. If we can put the features containing user interests into the model, the performance of the model will be better.

In addition to the basic user and commodity characteristics mentioned for the prediction model, we also need to consider, in light of the scenario, which commodity categories users may be interested in. For example, holidays affect users’ purchase interest; on a content platform, the device or the network status affects users’ browsing preferences. Social networks are also a major factor in users’ purchases. The interests of users’ neighbors, friends and colleagues are highly likely to be reflected in the users themselves, which gives us another angle from which to analyze their purchase interests.

Secondly, in the scenario of personalized recommendation of goods, we often face the problem of cold start. For some new users, we know little about them due to the lack of sufficient behavior data about them. At this time, we can use other information such as region to gain insight into users.

For example, users’ tastes may be related to their region: users in the South may prefer light or sweet food. New customers probably prefer popular products, or products with large discounts.

Of course, there are also some special scenarios. On a general e-commerce platform, if we infer that users are interested in certain goods, we can recommend them directly. A dating and matchmaking website, however, has a two-way problem: a recommendation only counts as successful when both parties have a good impression of each other. Operators also have some special restrictions of their own. Therefore, the characteristics of the industry and platform are also factors we need to consider when selecting algorithms.

In fact, we have not touched on very technical issues in this whole process. We mainly judged which features would help us through business knowledge or the current situation in the field.

After determining the data based on these judgments, we can do some exploratory data analysis (EDA). These analyses can be simple and low-dimensional, such as correlations or simple linear regressions. Such one-dimensional models can help us judge which features are more predictive.

Finally, we need to complete the validation of the model. The validated model will give us more useful information, such as which features are useful and which are not.
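A one-dimensional screening of this kind can be sketched as follows, ranking each candidate feature by the absolute Pearson correlation with the conversion label. The feature names and values are invented for the example, and correlation is only one of several reasonable screening statistics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical candidate features for six users, plus the conversion label.
features = {
    "purchase_count": [0, 1, 2, 3, 4, 5],
    "age": [30, 25, 41, 33, 28, 37],
}
converted = [0, 0, 0, 1, 1, 1]

# Rank features by the strength of their one-dimensional relationship.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], converted)),
    reverse=True,
)
for name in ranking:
    print(name, round(pearson(features[name], converted), 3))
```

A weak one-dimensional correlation does not prove a feature is useless, since it may still matter in combination with others, so this is a first filter rather than a final selection.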

3.4 model validation

Generally speaking, model validation has two purposes: one is to verify the accuracy of the model, and the other is to verify its stability. We build a model on a certain sample, but if we take another sample, can the validity of the model still be guaranteed?

In addition, if the model is valid for the current month’s data, will it still maintain a similar level of accuracy next month? This is the problem we need to solve.

When measuring the accuracy of algorithm model projects, technical indicators and commercial indicators are usually used:

Technical indicators

There are some technical indicators for different types of models. I believe everyone is very familiar with them.

Business indicators

The ultimate goal of modeling is to bring growth to the business, so business indicators are more important than technical indicators to some extent. For example, take a look at CTR (click through rate) or the revenue increment brought by the introduction of the model.

Generally, we train the model in an offline environment, which means that once the model is online we need to check how it runs and performs there, so we also have corresponding testing methods.

A/B testing is a familiar testing method. More complex are cross-validation, in-time testing and out-of-time testing. These are common testing methods for basic models.

Of course, compared with common prediction models, some models are special. Time series, for example, are not composed of independent data points, so there is a problem of persistence (autocorrelation). Another example is reinforcement learning, whose output is a relatively long action sequence that does not exist in our historical data. How to test these models is a very interesting problem, and one that many people in academia and industry have studied.
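As one illustration of out-of-time testing, the sketch below holds out all observations on or after a cutoff date, so the model is evaluated on a later period than the one it was trained on. The row layout and the dates are assumptions for the example.

```python
from datetime import date


def out_of_time_split(rows, cutoff):
    """Split (observation_date, features, label) rows at a cutoff date.

    Rows before the cutoff are used for training; rows on or after it
    form the out-of-time holdout.
    """
    train = [row for row in rows if row[0] < cutoff]
    holdout = [row for row in rows if row[0] >= cutoff]
    return train, holdout


# Hypothetical observations spanning two months.
rows = [
    (date(2023, 1, 10), {"purchase_count": 2}, 1),
    (date(2023, 1, 25), {"purchase_count": 0}, 0),
    (date(2023, 2, 3), {"purchase_count": 1}, 0),
    (date(2023, 2, 18), {"purchase_count": 4}, 1),
]

train_rows, holdout_rows = out_of_time_split(rows, cutoff=date(2023, 2, 1))
print(len(train_rows), len(holdout_rows))
```

An in-time test would instead hold out a random subset of the same period; comparing the two results gives a rough sense of how much accuracy is lost as time passes.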

Figure 6: test model effect with technical indicators

As shown in Figure 6, when we work on a binary or multi-class classification problem, we can run the model over historical data, use its scores to rank and group users, and separate users with a high conversion probability from those with a low one. For example, if the top 10% of users cover 40% of the converted users, the lift is about 4 (400%), because that group contains four times as many converters as a random 10% of users would. Indicators such as the ROC curve and AUC also give good guidance for evaluating model performance.
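The lift and AUC calculations can be sketched in a few lines. The 20 scored users below are made up so that the top 10% capture 40% of the converters, matching the example in the text; they are not data from any real project.

```python
def lift_at(scores, labels, fraction):
    """Lift of the top `fraction` of users ranked by model score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    captured = sum(label for _, label in ranked[:top_n])
    total_positives = sum(labels)
    # (share of converters captured) / (share of users targeted)
    return (captured * len(ranked)) / (total_positives * top_n)


def auc(scores, labels):
    """Probability that a random converter outscores a random non-converter."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for 20 users, already in descending order.
scores = [(20 - i) / 20 for i in range(20)]
# Five converters, at ranks 1, 2, 6, 11 and 16.
labels = [1 if i in (0, 1, 5, 10, 15) else 0 for i in range(20)]

top_decile_lift = lift_at(scores, labels, 0.10)
model_auc = auc(scores, labels)
print(top_decile_lift, model_auc)
```

Here the top two users contain two of the five converters, so 10% of users capture 40% of conversions and the lift is 4.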

Figure 7: test model effect with business indicators

Figure 7 shows the effect of our model from the perspective of business indicators. In the case of the home decoration platform mentioned earlier, we used the model to divide users into a high-intention group and a low-intention group, and showed them pop-up guidance when they were online.

The results showed that, after the pop-up window was added, the conversion rate of the low-intention group was about 0.05%, whereas that of the high-intention group reached 0.67%. This more than tenfold difference in conversion rate is supported by an orderly project process, systematic statistical analysis and advanced model construction.
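A comparison like this is usually accompanied by a significance check. The sketch below applies a standard two-proportion z-test to the two conversion rates. The talk does not report group sizes, so the 100,000 users per group used here are purely hypothetical and only the two rates come from the text.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err


# Hypothetical group sizes; the rates 0.67% and 0.05% are from the text.
n_high = n_low = 100_000
conv_high = 670  # 0.67% of the high-intention group
conv_low = 50    # 0.05% of the low-intention group

ratio = (conv_high / n_high) / (conv_low / n_low)
z = two_proportion_z(conv_high, n_high, conv_low, n_low)
print(round(ratio, 1), round(z, 1))
```

With much smaller groups the same rates could easily fail to reach significance, which is why the sample size matters as much as the ratio itself.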

3.5 model portrait

A model with powerful functions and strong prediction ability also tends to be very complex and difficult to explain. Moreover, modeling projects often require cooperation between several teams, so we need consensus on the model and enough confidence in it before we can safely deliver and use it.

After the output of the model, we will use the portrait to explain the model, which can help us sort out the logic behind the model and complete some model optimization.

Although the data logic of the model itself is very complex, the core logic behind it can be summarized in two points: first, business logic; Second, long-term effect. If we can explore the business logic that determines the model, it will play a very good role in refining our marketing strategies.

In terms of methodology, we can start from three aspects.

Figure 8: variable importance test

The first is the importance of variables. Current algorithm models are very complex, and we usually fit many features. By extracting the importance of each variable and sorting the variables by it, we can understand the logic of the model (as shown in Figure 8).
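One model-agnostic way to extract variable importance is permutation importance: shuffle one feature at a time and measure how much the accuracy drops. This is a common technique rather than necessarily the one behind Figure 8, and the toy model and data below are assumptions for illustration.

```python
import random


def accuracy(model, X, y):
    return sum(model(x) == target for x, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(x):
    # Stand-in for a trained model: uses feature 0, ignores feature 1.
    return int(x[0] > 2)


X = [[i, (i * 7) % 5] for i in range(6)]
y = [toy_model(x) for x in X]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because the toy model never reads feature 1, shuffling it changes nothing and its importance is exactly zero, while shuffling feature 0 degrades accuracy.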

Figure 9: validate selected metrics with business metrics

The second is to consider the relationship between the features we use and the business indicators. If the model output shows that a certain variable or dimension varies greatly among different populations, it may be a good candidate feature, and it can be used to refine user clustering.

The third is the relationship between input and output. For example, when we have a physical examination in a hospital, the report lists many different indicators. After reading the report, the doctor tells us what to pay attention to, or what health problems we have. We may ask what logic lies behind the doctor’s reading of the report and how the conclusion was reached. I believe the doctor’s whole diagnostic process is very complicated, because it is never a simple linear relationship. But when doctors explain their logic to us, they have to express it in the simplest words, because if it is too complicated we may not be able to understand it.

In fact, the way doctors interpret a health report gives us a hint. Although our model is very complex as a whole, we may be able to approximate it locally with an easy-to-explain model such as a linear model or a tree model. If the approximation succeeds, we can use the simple model to explain, at least locally, why the complex model draws its conclusions.
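The idea of local approximation can be sketched with a one-feature example: sample points near the input of interest, query the complex model, and fit a straight line to the answers. The `complex_model` function here is a hypothetical stand-in for a black-box model, and the radius and sample count are arbitrary choices.

```python
import math
import random


def complex_model(x):
    # Hypothetical black-box model: a steep nonlinear response curve.
    return 1.0 / (1.0 + math.exp(-3.0 * (x - 2.0)))


def local_linear_surrogate(model, x0, radius=0.1, n_samples=200, seed=0):
    """Fit y = slope * x + intercept to the model's output near x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.uniform(-radius, radius) for _ in range(n_samples)]
    ys = [model(x) for x in xs]
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = local_linear_surrogate(complex_model, x0=2.0)
print(round(slope, 3), round(slope * 2.0 + intercept, 3))
```

Near x = 2 the fitted slope is close to 0.75, the true local gradient of this curve, so the line gives a readable statement of how the output responds to the input there. The same line would be a poor description far from that point, which is why the explanation is only local.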

The above is the whole process and precautions of modeling.

4. Growingio digital growth solution

In this part, I want to briefly introduce some work growingio is currently doing, specifically, the idea behind the intelligent solution platform.

If we want to automate the model as much as possible, we need to consider at least three aspects: computing platform, data and algorithm.

I believe we are all familiar with the concepts of computing platforms, cloud computing and algorithms. Here I would like to highlight the role of data standardization. When building models, we intuitively think the algorithm consumes the most time, but in fact most of our time is spent on feature extraction.

To sum up, if we can standardize the data and take the standardized data as the starting point, the time required for project modeling will be greatly reduced.

Figure 10: growingio customer data platform (CDP)

As Figure 10 shows, the computing platform, data and algorithms are the infrastructure of the growingio intelligent solution platform. The closer a layer is to the middle of the platform, the closer it is to the algorithm: it is more mathematically complex, but also more highly automated and standardized.

At both ends of the platform, because we need to connect specific businesses and implement application scenarios, there will be great differences between different models. In this case, the business and technical teams need to cooperate with each other. In the specific operation, we can abstract the topics of common concern in different industries and extract some common scenarios to help our customers in different industries.

Based on this, the logic of the growingio growth platform emerges: first, package the hardware, software, data and algorithms effectively; then use the algorithms to extract effective information and business insight from the data and form an actionable business strategy; and use reasonable channels to reach the cloud, so as to improve efficiency, enhance intelligence, and finally bring growth to customers.

In growingio, we have a professional team to extract scenarios of common interest in various industries. Now we have extracted algorithm models for some common scenarios.

Figure 11: 19 model / algorithm scenarios applicable to e-commerce

For the e-commerce industry, where there are many brands and sellers on the platform, we have identified 19 model and algorithm scenarios (as shown in Figure 11). Brand operation evaluates users’ purchase preferences through seller data; user operation mainly focuses on which users are more sensitive to price.

The Pareto principle is worth noting here: 20% of users, the high-value ones, bring in 80% of business revenue. Therefore, we need to pick out these high-value users from a large user base by segmenting users into layers.

We can also help the e-commerce platform predict users’ potential lifetime value through RFM segmentation, and predict the time of a user’s next purchase or the goods they may purchase, so as to design corresponding revenue strategies. In addition, user operation also includes scenarios such as early warning of user churn and potential customer mining.

Commodity operation includes pricing strategy, category optimization, personalized recommendation and other scenarios. Specifically, we can calculate users’ price elasticity by measuring their loyalty to the platform and its products, and formulate the optimal price strategy according to the substitutability of products. By analyzing users’ purchase baskets, we can group similar products, and arrange the placement of commodities in physical stores so that users can easily find what they need.
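A simple way to find the users behind the bulk of revenue is to sort by revenue and accumulate until the target share is reached. The revenue figures below are invented so that two of ten users account for exactly 80% of the total; real distributions will rarely be this tidy.

```python
def users_covering_share(revenue_by_user, share=0.8):
    """Smallest set of top-revenue users whose revenue reaches `share`."""
    total = sum(revenue_by_user.values())
    ranked = sorted(revenue_by_user.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for user_id, revenue in ranked:
        if running >= share * total:
            break
        selected.append(user_id)
        running += revenue
    return selected


# Hypothetical revenue per user: two heavy buyers and eight light ones.
revenue_by_user = {"u1": 500, "u2": 300}
revenue_by_user.update({f"u{i}": 25 for i in range(3, 11)})

high_value = users_covering_share(revenue_by_user)
print(high_value, len(high_value) / len(revenue_by_user))
```

In this toy data, 20% of users produce 80% of revenue, which is the pattern the Pareto principle describes.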
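A very small version of RFM layering can be sketched by comparing each user’s recency, frequency and monetary value with the median and counting how many dimensions look favorable. The median thresholds, the three segment names and the four customers are assumptions for the example, and this sketch only assigns segments; it does not itself predict lifetime value.

```python
from statistics import median


def rfm_segments(customers):
    """customers: {user_id: (recency_days, frequency, monetary)}."""
    r_med = median(r for r, _, _ in customers.values())
    f_med = median(f for _, f, _ in customers.values())
    m_med = median(m for _, _, m in customers.values())
    segments = {}
    for user_id, (r, f, m) in customers.items():
        # Lower recency is better; higher frequency and spend are better.
        score = int(r <= r_med) + int(f >= f_med) + int(m >= m_med)
        segments[user_id] = {3: "high", 2: "medium"}.get(score, "low")
    return segments


# Hypothetical customers: (days since last purchase, orders, total spend).
customers = {
    "a": (5, 10, 900.0),
    "b": (40, 2, 100.0),
    "c": (45, 5, 300.0),
    "d": (90, 1, 50.0),
}

segments = rfm_segments(customers)
print(segments)
```

In practice quantile-based scores with more levels are common, but the principle of layering users on the three dimensions is the same.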
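Price elasticity itself has a standard definition: the percentage change in quantity divided by the percentage change in price. The sketch below uses the midpoint (arc) form of that formula on two hypothetical price and sales observations; how growingio derives elasticity from loyalty measures is not described in the talk and is not reproduced here.

```python
def arc_price_elasticity(p0, q0, p1, q1):
    """Midpoint (arc) price elasticity of demand between two observations."""
    pct_change_quantity = (q1 - q0) / ((q0 + q1) / 2)
    pct_change_price = (p1 - p0) / ((p0 + p1) / 2)
    return pct_change_quantity / pct_change_price


# Hypothetical observation: raising the price from 10 to 12 cuts weekly
# sales from 100 units to 80 units.
elasticity = arc_price_elasticity(p0=10.0, q0=100.0, p1=12.0, q1=80.0)
print(round(elasticity, 3))
```

An absolute value above 1 means demand is elastic, so a price increase of this size would reduce revenue; products with close substitutes usually show higher elasticity.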

The e-commerce platform may also hold various promotional activities. Afterwards, we need to review them and evaluate the long-term and short-term effects they bring. Promotional activities may also involve distributing coupons, and our model can help customers improve the accuracy of personalized coupon distribution and the efficiency of the activity.

In addition to the solutions of e-commerce platforms, the growingio team also extracted models and algorithm scenarios of different industries such as operators and content communities, and summarized 20 models and algorithm scenarios applicable to operators and 17 models and algorithm scenarios applicable to content communities.

Due to time constraints, I will not introduce them in detail. The following is an overview of the model and algorithm scenarios for anyone who is interested.

Figure 12: 20 model / algorithm scenarios applicable to operators

Figure 13: 17 model / algorithm scenarios applicable to content communities