[Practical guide] How do programmers beat Las Vegas slot machines with bandit algorithms?


A/B testing: from event tracking to giving up

Step 1 of giving up on A/B testing:

Whether we take the frequentist or the Bayesian approach, making a decision means going through the whole A/B testing process. In practice, running an A/B test for every decision often carries too high an opportunity cost and too high a labor cost (data scientists are expensive), and a poor variant causes real losses while the test runs. For these and other reasons, "data-driven through A/B testing" ends up as a slogan.

Step 2 of giving up on A/B testing:

Even if a developer is determined to become data-driven through A/B testing, building an in-house A/B testing platform is too expensive, while third-party A/B testing services lack flexible data analysis capabilities.

If an event has no tracking point, the only way to A/B test it is to ship a new SDK release. Until that release reaches sufficient coverage, the test still cannot run. So product iteration driven by A/B testing gets postponed until it is forgotten. A/B testing: given up.

Step 3 of giving up on A/B testing:

Even if a developer uses the Youmeng + analytics SDK, instruments custom events properly, splits users properly, estimates the sample size, collects the data correctly and runs the A/B test correctly, the result may be that there is no difference between the two versions. Sometimes the new version even turns out to be worse (cue the overused Facebook example).

As an operations person, how do you report a negative result to your boss? As the lead of a technical team, how do you decide whether to switch versions? A/B testing: given up. I once asked a senior figure why A/B testing, such a mature and useful method, is not more popular in China. He said: after every redesign or campaign, everyone is waiting to claim credit. Who wants to see the results of the data analysis?

Over many rounds of A/B testing, senior people kept asking: is there an A/B testing algorithm that iterates faster? Is there a less strict form of A/B testing? In operations scenarios, the most frequent questions are: how long will your A/B test take, and can it finish within three days? Can you finish the A/B test before the campaign starts? These questions cut to the heart of the matter. After deeper discussion, it turns out that what these people actually want is a scheme that reduces risk and optimizes faster and automatically.

A cure for A/B testing

Is there a good way to solve these problems? Of course. For the reasons behind steps 1 and 2, the only solution is to instrument events properly so that the main statistical needs are covered first, because A/B testing is built on the analytics module. For step 3, the solution is multi-armed bandits.

Multi-armed bandits

So how does this algorithm, which automatically optimizes toward the best option, actually work? How does it manage to pick the better option faster and automatically?

Zhang San in Las Vegas

Let's tell a story about Zhang San gambling in Las Vegas (after all, statistics originated from gambling). One day, the gambler Zhang San arrived in Las Vegas with his savings. Armed with his high-tech glasses and the bandit algorithms he had recently studied, he wanted to beat the Las Vegas casinos and become a legendary gambler.

From his years of gambling experience, he knows that each slot machine in the casino has a different win rate, but that the win rate of any given machine does not change. Rumor has it that one slot machine in this casino has a win rate above 50%. His strategy is to find the machine with the highest win rate.

So how can Zhang San find the machine with the highest win rate? The simplest strategy is to try every slot machine in the casino, compute each machine's win rate, and then pick the machine with the highest one. This is similar to an A/B test, which splits traffic evenly across the variants.

An obvious drawback of this method is that trial and error is very costly, even though it does find the machine with the highest win rate in the end. If we could notice along the way that some options are probably not the best, we would stop wasting time and effort on the inferior ones. Could we then find the best option faster and spend less money? That raises a question: how do we define which algorithm is better at finding the best option?

The usual yardstick is regret: the winnings the best option would have produced, minus the winnings the bandit algorithm actually collected while it was searching for the best option.
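As a rough sketch, this difference is commonly written in expectation as follows. The symbols are assumptions introduced here for illustration and do not come from the original post: T is the total number of plays, \mu^{*} is the win rate of the best machine, a_t is the machine played in round t, and \mu_{a_t} is that machine's win rate.

```latex
R_T = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t}
```

An algorithm that quickly concentrates its plays on the best machine keeps R_T small, while an even split across all machines makes R_T grow in proportion to T.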

Zhang San's bandit algorithm

As a gambler, Zhang San naturally knows a few bandit algorithms, so which strategies does he plan to use? What he learned from his master was epsilon-greedy and upper confidence bound (UCB).

The epsilon-greedy algorithm chooses an option other than the current best in a fraction epsilon of the rounds, and chooses the current best option in the remaining 1 - epsilon. Epsilon is a proportion that has to be set by hand. For example, with epsilon = 10%, an option other than the current best is chosen 10% of the time, and the current best option is chosen 90% of the time.
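The rule above can be sketched in a few lines of Python. This is only a minimal simulation under stated assumptions: the function name `epsilon_greedy`, the win rates and the seed are hypothetical, each play pays 1 on a win and 0 otherwise, and machines that have never been played are treated as having a win rate of 0.

```python
import random


def epsilon_greedy(win_rates, rounds, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on slot machines with fixed, hypothetical win rates."""
    rng = random.Random(seed)
    n_arms = len(win_rates)
    pulls = [0] * n_arms  # how often each machine was played
    wins = [0] * n_arms   # how often each machine paid out

    for _ in range(rounds):
        # Observed win rate so far; unplayed machines count as 0.
        means = [wins[j] / pulls[j] if pulls[j] else 0.0 for j in range(n_arms)]
        best = means.index(max(means))
        if n_arms > 1 and rng.random() < epsilon:
            # Explore: play a machine other than the current best.
            arm = rng.choice([j for j in range(n_arms) if j != best])
        else:
            # Exploit: play the current best machine.
            arm = best
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < win_rates[arm] else 0
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = epsilon_greedy([0.2, 0.3, 0.6], rounds=10000)
    print("plays per machine:", pulls)
```

Note that the exploration share never shrinks: even after the best machine is obvious, a fraction epsilon of the plays still goes to the other machines.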

However, this method has an obvious problem. Before Zhang San left, his master warned him that it may get stuck in a local optimum and fail to find the global optimum for a long time. In other words, it might not find the slot machine with the highest win rate. The master told him to use it with care.

So Zhang San decided to bet using the UCB algorithm. How does UCB work?

UCB gives each slot machine a score made of two terms. The first term is the machine's average win rate so far. The second is a bonus term that depends on the number of attempts, where t is the number of experiments conducted so far and T_{ij} is the number of times this slot machine has been tried. A coefficient in front of the bonus term adjusts its influence.
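One common way to write such a score is the UCB1 form below. Treat it as an assumption rather than the exact formula of the original post: \bar{x}_{ij} (the average win rate of the machine) and c (the coefficient on the bonus) are names introduced here, the subscript ij simply follows the T_{ij} used in the text, and the \sqrt{2 \ln t / T_{ij}} shape of the bonus is the standard UCB1 choice.

```latex
\mathrm{score}_{ij}(t) = \bar{x}_{ij} + c\,\sqrt{\frac{2 \ln t}{T_{ij}}}
```

The bonus shrinks as a machine is tried more often (T_{ij} grows) and slowly grows for machines that have been left alone (t grows while T_{ij} stays fixed), so neglected machines eventually get another try.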

After each experiment, the score of every slot machine is recalculated, and the machine with the highest score is chosen for the next experiment. Given enough time, UCB finds the best option. Generally speaking, UCB performs better than epsilon-greedy when measured by regret.
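The loop just described might look like the sketch below. It assumes the standard UCB1 bonus sqrt(2 ln t / n), which may differ from the exact formula in the original post. The function name `ucb1`, the coefficient `c` and the win rates are hypothetical.

```python
import math
import random


def ucb1(win_rates, rounds, c=1.0, seed=0):
    """Simulate UCB1 on slot machines with fixed, hypothetical win rates."""
    rng = random.Random(seed)
    n_arms = len(win_rates)
    pulls = [0] * n_arms  # how often each machine was played
    wins = [0] * n_arms   # how often each machine paid out

    for t in range(1, rounds + 1):
        if t <= n_arms:
            # Play every machine once so that each score is defined.
            arm = t - 1
        else:
            # Score = average win rate + c * bonus for rarely tried machines.
            scores = [
                wins[j] / pulls[j] + c * math.sqrt(2 * math.log(t) / pulls[j])
                for j in range(n_arms)
            ]
            arm = scores.index(max(scores))
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < win_rates[arm] else 0
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = ucb1([0.2, 0.3, 0.6], rounds=5000)
    print("plays per machine:", pulls)
```

Unlike epsilon-greedy, there is no fixed exploration share here: plays on weak machines become rarer over time because their bonus can no longer make up for a low average.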

Li Si's bandit algorithm

As it happens, Zhang San has a senior fellow disciple named Li Si. In his early years, Li Si studied the Bayesian arts under Master Bayes. One great advantage of the Bayesian approach is that it can build on what others have already learned, which is the role of the prior distribution.

Li Si watched Zhang San experiment on the slot machines and recorded the win rate of each machine. But Li Si could not wait too long: once Zhang San had found the machine with the highest win rate, Li Si would no longer be able to win money from it. So as soon as he felt he had accumulated enough data, Li Si joined the game himself, using the Bayesian method of Thompson sampling.

Building on Zhang San's attempts, Li Si gave each slot machine a prior in the form of a beta distribution and then began looking for the machine with the highest win rate. In each experiment, he drew a random number from each machine's beta distribution and played the machine with the largest draw. The more data a machine has accumulated, the smaller the variance of its beta distribution, and the closer each draw is to the mean. The less data a machine has, the larger the variance, and the more the draws fluctuate.
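Li Si's procedure can be sketched as follows. This is a minimal simulation under assumptions: the function name `thompson_sampling`, the win rates and the example prior counts are hypothetical, and each machine's win rate is modelled as Beta(alpha, beta), with a win adding 1 to alpha and a loss adding 1 to beta.

```python
import random


def thompson_sampling(win_rates, rounds, prior=None, seed=0):
    """Simulate Thompson sampling with one Beta posterior per slot machine."""
    rng = random.Random(seed)
    n_arms = len(win_rates)
    # Default to a flat Beta(1, 1) prior; a prior built from earlier
    # observations (such as Zhang San's plays) can be passed in instead.
    if prior:
        posterior = [list(p) for p in prior]
    else:
        posterior = [[1.0, 1.0] for _ in range(n_arms)]
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one random win rate per machine and play the largest draw.
        draws = [rng.betavariate(a, b) for a, b in posterior]
        arm = draws.index(max(draws))
        if rng.random() < win_rates[arm]:
            posterior[arm][0] += 1.0  # win: alpha + 1
        else:
            posterior[arm][1] += 1.0  # loss: beta + 1
        pulls[arm] += 1
    return pulls, posterior


if __name__ == "__main__":
    # Hypothetical prior: (1 + wins, 1 + losses) seen on each machine so far.
    prior = [(3.0, 9.0), (4.0, 8.0), (7.0, 5.0)]
    pulls, posterior = thompson_sampling([0.2, 0.3, 0.6], 3000, prior=prior)
    print("plays per machine:", pulls)
```

The prior is where borrowed experience enters: starting from counts that someone else has already observed means fewer plays are spent on machines that already look weak.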

The bandit algorithm of Zhang San's master, Wang Wu

Zhang San's master had actually arrived in Las Vegas earlier. From inside information, he knows that the win rate of each slot machine varies with many factors, such as whether it is a weekend and whether the player is a man or a woman.

Zhang San's and Li Si's algorithms, however, do not take such external factors into account. Bandit algorithms that do are called contextual bandits. Zhang San's master uses the LinUCB algorithm, which combines UCB with ridge regression.
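To make the idea concrete, here is a small pure-Python sketch of the "disjoint" variant of LinUCB, with one ridge regression per slot machine. Everything in it is an illustrative assumption: the class `LinUCBArm`, the function `simulate`, the two-feature context (a constant 1 and a weekend flag) and the win rates. The inverse of the ridge matrix is kept up to date with the Sherman-Morrison formula so that no linear algebra library is needed.

```python
import math
import random


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge regression for one machine: A = ridge * I + sum(x x^T), b = sum(r x)."""

    def __init__(self, dim, ridge=1.0):
        self.a_inv = [[1.0 / ridge if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def score(self, x, alpha):
        theta = mat_vec(self.a_inv, self.b)       # ridge estimate of the weights
        a_inv_x = mat_vec(self.a_inv, x)
        width = math.sqrt(max(dot(x, a_inv_x), 0.0))
        return dot(theta, x) + alpha * width      # predicted win rate + bonus

    def update(self, x, reward):
        # Sherman-Morrison: inverse of (A + x x^T) from the inverse of A.
        a_inv_x = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, a_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def simulate(rounds=4000, alpha=1.0, seed=0):
    """Two machines whose (hypothetical) win rates swap on weekends."""
    rng = random.Random(seed)
    arms = [LinUCBArm(dim=2), LinUCBArm(dim=2)]
    optimal = 0
    for _ in range(rounds):
        weekend = 1.0 if rng.random() < 0.5 else 0.0
        x = [1.0, weekend]                         # context vector
        rates = [0.6 - 0.4 * weekend, 0.2 + 0.4 * weekend]
        scores = [arm.score(x, alpha) for arm in arms]
        chosen = scores.index(max(scores))
        reward = 1.0 if rng.random() < rates[chosen] else 0.0
        arms[chosen].update(x, reward)
        optimal += 1 if rates[chosen] == max(rates) else 0
    return optimal / rounds                        # share of best-machine plays


if __name__ == "__main__":
    print("share of plays on the best machine:", simulate())
```

A context-free algorithm would see two machines with the same average win rate in this setup, whereas the contextual version can learn to switch machines depending on the day.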

If you want to know which of Zhang San, Li Si and Wang Wu finds the legendary slot machine first, read on.

When should bandits be used, and when A/B tests?

[Figure: how a bandit algorithm shifts traffic allocation while searching for the best option. Source: VWO's website]

The core problem a bandit algorithm addresses is how to find the best option faster and with less loss. The figure above shows how a bandit optimizes traffic allocation as it searches for the best option. Bandits aim to find the best option with minimal loss.

Then why do we still need A/B tests?

First, A/B testing is mainly used to guide important business decisions and product version iterations, which may involve many metrics. A bandit can only optimize a single metric. Multiple metrics can of course be combined into one composite metric, but the bandit's optimization target is still a single metric.

Second, A/B testing is mainly used to establish the statistical significance of each version's effect. That may sound abstract. Concretely: you have spent time developing a new version, and you need to be sure whether it is better than the previous version and in what way. Did retention improve, or did usage time increase?

This knowledge can be used in later product iterations, but bandits cannot help you analyze the results and gain such knowledge.

So when should we use the bandits algorithm?

Use bandits when, like Zhang San, you care about only a single metric such as conversion rate or retention rate, and you do not need to interpret or analyze the results. Use them when your campaign lasts only a few days or a single day, and there is no time to wait for an A/B test to reach statistical significance. This is the "faster A/B test" that some senior people and app developers ask for. Bandits are also a good fit when you have metrics that need long-term optimization and that change frequently.

[Figure from VWO's blog]

In short, A/B testing suits changes with a long cycle, where the knowledge gained should generalize. Bandit algorithms suit fast-changing, short-cycle optimization scenarios, where the knowledge gained may not generalize.

Using bandits at Youmeng +

The u-push product of Youmeng + covers a large number of external users, yet most developers' push strategies are simple scheduled broadcasts, with almost no personalized sending strategy (Toutiao being the exception). Even if developers want to optimize sending time and content with existing tools, their own tags and accumulated user behavior data are not sufficient.

For now, domestic competitors do not offer this feature, in part because their data volume falls far short of the data coverage of Youmeng +. However, many developer-service platforms in the United States, such as recombee, airship and leanplum, not only optimize sending time but also offer full-funnel, closed-loop user activation and churn prevention products based on user life cycle and other tags.

Our future work is to build such a user-friendly product, and our starting point is send-time optimization, a feature that leanplum offers. If we can send a message while the user is using the app, or when the user is most willing to receive push messages, the user is more likely to open it once it arrives. The result is a win-win: a better user experience and a higher push click-through rate.

The send-time optimization scheme of Youmeng + is based on Thompson sampling and uses a beta distribution to score each combination of user + app + time period.
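The post gives no implementation details, so the following is only a guess at the general shape of such a scheme, not the actual Youmeng + system. It keeps one Beta(alpha, beta) per (user, app, period) key, counts a click as a success and an ignored push as a failure, and sends in the period with the largest draw. The class `SendTimeSampler`, its method names and the period labels are hypothetical.

```python
import random
from collections import defaultdict


class SendTimeSampler:
    """Thompson sampling over time periods, keyed by (user, app, period)."""

    def __init__(self, periods, seed=0):
        self.periods = list(periods)
        self.rng = random.Random(seed)
        # Flat Beta(1, 1) prior for every key that has no data yet.
        self.stats = defaultdict(lambda: [1.0, 1.0])

    def choose_period(self, user, app):
        # Draw one click-rate sample per period and pick the largest.
        draws = {
            p: self.rng.betavariate(*self.stats[(user, app, p)])
            for p in self.periods
        }
        return max(draws, key=draws.get)

    def record(self, user, app, period, clicked):
        # A click raises alpha, an ignored push raises beta.
        counts = self.stats[(user, app, period)]
        counts[0 if clicked else 1] += 1.0


if __name__ == "__main__":
    sampler = SendTimeSampler(["morning", "noon", "evening"])
    sampler.record("user_1", "app_1", "evening", clicked=True)
    print(sampler.choose_period("user_1", "app_1"))
```

For a user with no click history every period starts from the same flat prior, so the choice is essentially random at first.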

We found that collaborative filtering can increase clicks among users who have no click history in the data, while Thompson sampling is better at determining the optimal sending time for users who do have clicks.

How to combine collaborative filtering and Thompson sampling to improve users' push experience and click-through rate will be a direction for future exploration.

The end of the story

At the end of the story, Zhang San, Li Si and Wang Wu all lost their savings and left Las Vegas, because they did not know the statistical principle of gambler's ruin. The moral is to stay away from gambling: a small bet brings no joy, and a big one does real harm.