What is A/B testing
Product changes should not come from off-the-cuff guesses; they need to be driven by real data, letting user feedback guide us toward better service. As Mafengwo CEO Chen Gang said in an interview, "Some things require intuition, but most can be judged scientifically."
Many readers are already familiar with A/B testing. In short, it is the process of dividing users into groups, serving each group a different version of the product online, and using the real data users feed back to find out which version performs better.
We take the original version as the control group and, following the principle of keeping each iteration as small as possible, run an A/B test on the candidates. Once the metric analysis is complete, the version with the best user-feedback data is rolled out to full traffic.
In many cases, adjusting a single button, image, or sentence can bring very noticeable growth. Here is a case of A/B testing applied at Mafengwo:
As shown in the figure, the e-commerce team in our trading center wanted to optimize a "skiing" search list page. The page before optimization was fairly sparse, but we were not sure whether a richer presentation would strike users as less concise and put them off. So we put the pages from before and after the redesign online for an A/B test. The final data showed that the optimized style improved UV by 15.21% and conversion rate by 11.83%. A/B testing helped us reduce the risk of the iteration.
This example illustrates several characteristics of A/B testing:
- A priori: by splitting traffic and testing on a small slice, we let a small portion of online users try the change first to verify our ideas, then promote it to full traffic based on the data feedback, reducing product loss.
- Parallel: we can run two or more versions at the same time and compare them under consistent conditions. A release decision that previously took a quarter may now take only a week, avoiding long, complicated processes and saving verification time.
- Scientific: when analyzing test results, A/B testing requires us to use statistical indicators to judge whether a result holds up, rather than relying on experience and intuition.
To make our conclusions more accurate, reasonable, and efficient, we implemented a set of algorithmic guarantees, following Google's practice, to allocate traffic scientifically.
A multi-layer traffic-splitting model based on OpenResty
Most A/B testing inside the company used to work as an interface: the business side collected the user data and called the splitting API itself. That doubles the original traffic, intrudes deeply into business code, and supports only simple scenarios, so many business teams ended up building their own splitting systems that were hard to reuse across scenarios.
To solve these problems, our splitting system is built on OpenResty and transmits the splitting information over HTTP or gRPC. The splitting system thus sits upstream of the business, and because OpenResty itself distributes the traffic, no secondary traffic is generated. Business parties only need to provide the differentiated services; their code is not touched.
These characteristics are the main reasons we chose OpenResty for the A/B testing system.
In the design of the system we identified three elements: first, the identified terminal, which carries the device and user information; second, the identified URI; and third, the matching allocation strategy, i.e., how the traffic is to be divided.
First, the device initiates a request, and the AB gateway extracts the device ID, URI, and other information from it; at this point the terminal and URI are determined. The gateway then traverses its policies to match the URI, and the splitting algorithm finds the experiment and version that the request hits. The AB gateway notifies downstream services in one of two ways: for applications running on physical web machines, it adds a header key named abtest containing the experiment and version hit; for microservice applications, it puts the hit information into a cookie and hands it to the microservice gateway for processing.
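The flow above can be sketched in Python. This is a simplified model, not Mafengwo's actual Lua code: the policy table, experiment name, 50/50 split, and request shape are all illustrative, and `hashlib.md5` stands in for the MurmurHash2 split described later.

```python
import hashlib

# Illustrative policy table: URI -> (experiment, traffic-layer ID).
POLICIES = {"/search/ski": ("ski_list_style", "layer_1")}

def hit_version(device_id, experiment, layer_id):
    """Stand-in for the MurmurHash split: two versions, 50/50."""
    key = f"{device_id}:{experiment}:{layer_id}"
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return "A" if h % 100 + 1 <= 50 else "B"

def ab_gateway(request):
    """Annotate a request the way the AB gateway notifies downstream."""
    policy = POLICIES.get(request["uri"])
    if policy is None:
        return request  # no experiment matched: pass through unchanged
    experiment, layer_id = policy
    version = hit_version(request["device_id"], experiment, layer_id)
    hit = f"{experiment}={version}"
    if request.get("is_microservice"):
        request["cookies"]["abtest"] = hit  # microservice gateway reads this
    else:
        request["headers"]["abtest"] = hit  # web apps read the header
    return request
```

A request for an unconfigured URI returns no abtest annotation at all, so downstream services behave exactly as before.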
Stable splitting guarantee: the MurmurHash algorithm
The splitting algorithm uses MurmurHash. The hash factors involved are the device ID, the policy ID, and the traffic-layer ID.
MurmurHash is commonly used for A/B testing in the industry and appears in many open-source projects, such as Redis, Memcached, Cassandra, and HBase. It has two notable characteristics:
- Fast: dozens of times faster than cryptographic hash algorithms
- Strong avalanche effect: similar strings, such as "abc" and "abd", are still distributed evenly on the hash ring, which is the basis for splitting experiments so that they are orthogonal and mutually exclusive
A brief explanation of orthogonality and mutual exclusion:
- Mutual exclusion: two experiments are independent, and a user can enter only one of them. Generally, for experiments on the same traffic layer, such as a mixed-list experiment and a pure-image-list experiment, the same user can see only one of them at a time, so they are mutually exclusive.
- Orthogonality: there is no systematic relationship between the experiments a user enters. For example, users who enter version A of experiment 1 should be evenly distributed across any other experiment, rather than concentrated in one block.
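Orthogonality falls out of including the layer ID among the hash factors: the same population of users is re-shuffled independently on every layer. A quick empirical check of that claim (a sketch with `hashlib.md5` standing in for MurmurHash2; the layer names are invented):

```python
import hashlib

def bucket(device_id, layer_id, buckets=100):
    """Hash device + layer ID into 1..buckets."""
    h = hashlib.md5(f"{device_id}:{layer_id}".encode()).hexdigest()
    return int(h, 16) % buckets + 1

users = [f"device-{i}" for i in range(10_000)]
# Users who landed in the first half of layer_1 ...
group_a = [u for u in users if bucket(u, "layer_1") <= 50]
# ... should split roughly 50/50 again on an independent layer_2,
# rather than all landing in the same block.
in_first_half = sum(1 for u in group_a if bucket(u, "layer_2") <= 50)
ratio = in_first_half / len(group_a)
```

Because the layer ID changes the hash input, `ratio` comes out close to 0.5: membership in one layer's buckets tells you nothing about the next layer's.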
Experiment splitting within a traffic layer
The hash factors for experiments within a traffic layer are the device ID and the traffic-layer ID. When a request flows through a traffic layer, it can hit at most one experiment in that layer; that is, within a single request the same user hits at most one experiment per layer. First the hash factors are hashed with MurmurHash2, which guarantees that a slight change in the input produces a drastically different output. The result is then taken modulo 100 and incremented by 1, yielding a value between 1 and 100.
The schematic diagram is as follows:
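The per-layer split can be sketched as follows. The experiment names and bucket shares are invented for illustration, and `hashlib.md5` stands in for MurmurHash2; the point is that experiments in one layer own disjoint bucket ranges, which is what makes them mutually exclusive.

```python
import hashlib

def layer_bucket(device_id, layer_id, buckets=100):
    """Hash device ID + layer ID into 1..100, as described above."""
    h = hashlib.md5(f"{device_id}:{layer_id}".encode()).hexdigest()
    return int(h, 16) % buckets + 1

# Experiments in one layer claim disjoint bucket ranges, so a request
# hits at most one experiment per layer (mutual exclusion).
LAYER_EXPERIMENTS = [
    ("mixed_list", 1, 30),        # experiment, buckets [1, 30]
    ("pure_image_list", 31, 60),  # buckets [31, 60]
    # buckets 61..100 are unclaimed: those users see no experiment
]

def pick_experiment(device_id, layer_id):
    b = layer_bucket(device_id, layer_id)
    for name, lo, hi in LAYER_EXPERIMENTS:
        if lo <= b <= hi:
            return name
    return None
```

The same device always lands in the same bucket for a given layer, so a user's experiment assignment is stable across requests.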
Version splitting within an experiment
The hash factors here are the device ID, the policy ID, and the traffic-layer ID. Version matching uses the same bucketing strategy. The matching rules are as follows:
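Assuming each version owns a contiguous share of the 100 buckets (a common scheme; the version names and shares below are invented, and `hashlib.md5` again stands in for MurmurHash2), version matching can be sketched as:

```python
import hashlib

def version_bucket(device_id, policy_id, layer_id, buckets=100):
    """Hash all three factors: device ID, policy ID, traffic-layer ID."""
    key = f"{device_id}:{policy_id}:{layer_id}"
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets + 1

def match_version(device_id, policy_id, layer_id, versions):
    """versions: (name, share) pairs whose shares sum to at most 100."""
    b = version_bucket(device_id, policy_id, layer_id)
    upper = 0
    for name, share in versions:
        upper += share
        if b <= upper:
            return name
    return None  # buckets beyond the configured shares hit no version

versions = [("A", 50), ("B", 50)]
```

Because the policy ID is part of the hash input, version assignment within an experiment is independent of how the user was bucketed into the experiment itself.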
Stability guarantee: a multi-level cache strategy
As described above, every incoming request makes the system look up the experiment policy that matches it. Policies are configured in the back office, and we synchronize the configured policies into our policy pool through a message queue.
Our initial plan was to read from Redis on every request. That demands very high Redis stability, and the volume of requests would also put Redis under heavy pressure. We therefore introduced a multi-level caching mechanism to form the policy pool, which has three layers:
The first layer is lrucache, a simple and efficient caching strategy. It lives and dies with the nginx worker process and is exclusive to each worker, which makes it very fast. Because of that exclusivity, however, every worker process holds its own copy of the cache, so it uses more memory.
The second layer is the Lua shared dict which, as the name implies, is shared across workers. Its data survives an nginx reload and is lost only on a restart. One caveat: to keep reads and writes safe it uses a read-write lock, so in some extreme cases there can be performance problems.
The third layer is Redis.
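In Python terms the lookup order can be modeled like this. It is only a sketch: the real system uses lua-resty-lrucache, ngx.shared.DICT, and a Redis client, for which plain dicts stand in here.

```python
class PolicyPool:
    """Three-level policy lookup: worker-local L1, shared L2, then Redis."""

    def __init__(self, redis):
        self.l1 = {}        # stands in for the per-worker lrucache
        self.l2 = {}        # stands in for the cross-worker shared dict
        self.redis = redis  # stands in for the Redis client

    def get(self, key):
        if key in self.l1:               # the vast majority stop here
            return self.l1[key]
        if key in self.l2:               # a small share falls through to L2
            self.l1[key] = self.l2[key]  # backfill the faster level
            return self.l1[key]
        value = self.redis.get(key)      # only a sliver reaches Redis
        if value is not None:
            self.l2[key] = value         # backfill both cache levels
            self.l1[key] = value
        return value
```

Each miss backfills the levels above it, so a policy read from Redis once is served from memory on every subsequent request by that worker.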
Although the whole scheme uses multi-level caching, some risk remains. If the L1 and L2 caches both fail (for example, after an nginx restart), Redis may face the risk of being hit "naked" by too much traffic. We use lua-resty-lock to solve this: when the cache misses, only the requests that acquire the lock go back to the source, which ensures Redis is never placed under that much pressure.
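The effect of lua-resty-lock can be imitated with a mutex and a double-check: on a cold cache, only the caller holding the lock goes back to the source, and everyone else reuses its result. This is a thread-based analogue for illustration, not the actual nginx mechanism.

```python
import threading

class SingleFlightCache:
    """On a miss, only one caller rebuilds; the rest wait and reuse it."""

    def __init__(self, load_from_source):
        self.cache = {}
        self.lock = threading.Lock()
        self.load = load_from_source
        self.source_hits = 0  # how many calls actually reached the source

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value
        with self.lock:
            # Double-check: another caller may have filled the cache
            # while we were waiting for the lock.
            value = self.cache.get(key)
            if value is None:
                self.source_hits += 1
                value = self.load(key)
                self.cache[key] = value
        return value
```

Even if many requests miss simultaneously, the source is consulted once; the double-check inside the lock is what prevents the stampede.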
Online statistics show a first-level cache hit rate above 99%; the second level serves about 0.5% of requests, and only 0.03% fall through to Redis.
- Throughput: currently carries 5% of the whole site's traffic
- Low latency: average online latency is under 2 ms
- Full platform: supports App, H5, WeChat mini-programs (wxApp), and PC, and is cross-language
- Disaster tolerance:
- Automatic degradation: when reading the policy from Redis fails, AB automatically falls back to non-splitting mode, then retries Redis every 30 s (per machine) until the data is read, to avoid hammering Redis with retries
- Manual degradation: when server event logs or system load grow too large, all experiments, or AB splitting itself, can be shut down with one click in the back office
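The automatic-degradation behavior above can be modeled as a small state machine: on a Redis failure, fall back to no-split mode and retry the source at most once every 30 seconds. The class name, `clock` parameter, and injection style are ours, for testability; the retry interval is the one stated above.

```python
import time

class DegradingPolicySource:
    """Serve policies from Redis; on failure, degrade to no-split mode
    and retry the source at most once every retry_interval seconds."""

    def __init__(self, read_redis, retry_interval=30.0, clock=time.monotonic):
        self.read_redis = read_redis
        self.retry_interval = retry_interval
        self.clock = clock
        self.degraded = False
        self.last_attempt = float("-inf")

    def get_policies(self):
        if self.degraded and self.clock() - self.last_attempt < self.retry_interval:
            return None  # still degraded: no-split mode, don't hit Redis
        self.last_attempt = self.clock()
        try:
            policies = self.read_redis()
        except Exception:
            self.degraded = True   # fall back to non-splitting mode
            return None
        self.degraded = False      # Redis recovered: resume splitting
        return policies
```

Returning `None` here means "run no experiments", so a Redis outage degrades the gateway to a plain pass-through rather than an error.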
Response time distribution
We used JMeter as the testing tool, with a concurrency of 100 running for 300 s.
From the response-time results it can be seen that, apart from some outlying requests at the very beginning, the average stays within 1 ms. Analysis showed the initial gap occurred because the multi-level cache held no data yet.
TPS is somewhat lower, which is expected given the hashing involved, but overall acceptable.
Normal A/B publishing is handled mainly by the API gateway. When business requirements are complex, the A/B system can expose more sophisticated publishing capabilities by interacting with the microservices.
Note that A/B testing is not suitable for every product: its conclusions need a lot of data to support them, and sites with larger daily traffic get more accurate results. As a rule of thumb, each version in an A/B test should receive more than 1,000 UVs per day; otherwise the test cycle becomes very long, and it is hard to obtain accurate, convergent results.
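The 1,000-UV guideline can be sanity-checked with the standard two-proportion sample-size approximation n ≈ 16·p·(1−p)/δ² per variant (two-sided α = 5%, power = 80%). This calculation is ours, not from the original text, and the baseline numbers are illustrative.

```python
def uvs_per_variant(baseline_rate, min_detectable_diff):
    """Approximate UVs needed per variant: n ~= 16 * p * (1-p) / d^2
    (two-sided alpha = 0.05, power = 0.80)."""
    p, d = baseline_rate, min_detectable_diff
    return round(16 * p * (1 - p) / d ** 2)

# Detecting a 1-point absolute lift on a 5% conversion rate:
n = uvs_per_variant(0.05, 0.01)  # about 7,600 UVs per variant
```

At 1,000 UVs per variant per day, such a test converges in roughly a week, which is consistent with the floor suggested above; much below that, the cycle stretches into weeks.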
Designing a complete A/B testing platform involves a great deal of detailed work; for reasons of space, this article has focused only on the splitting algorithm. In summary, Mafengwo's A/B traffic-splitting system has achieved results in the following areas:
- It intercepts and distributes traffic directly, abandoning the original interface-based form. It does not intrude into business code, has no obvious impact on performance, and generates no secondary traffic.
- With traffic layering and experiment binding, splitting experiments can be defined more precisely and intuitively. Having the client report the experiment version it hit reduces server-side data storage and enables serial experiment splitting.
- For data transmission, the splitting information is added to the HTTP header, so the business side does not need to care about the implementation language.
Planned improvements:
- Monitoring system.
- User-profile-based and other fine-grained, customized A/B testing.
- Statistical reporting that supports product features such as confidence intervals and characteristic values.
- Evaluating each experiment's influence on the North Star metric using the AARRR model.
There is still much to improve in this system. We will keep exploring, and we look forward to exchanging ideas with you.
Authors: Li Pei, technical expert in R&D on Mafengwo's basic platform; Zhang Lihu, engineer on the Mafengwo Hotel R&D static data team.
(Original content from Mafengwo Technology; reprints must credit the source. Thank you for your cooperation.)