Advanced A/B Testing Series 4: Low experiment penetration? Users not reached? CACE/LATE

Time:2020-6-9

CACE stands for complier average causal effect, also known as the local average treatment effect (LATE). Applying it to observational data requires instrumental variables. Here we only discuss what the CACE framework can teach us about randomized A/B experiments. Have you ever run into low experiment penetration like the following?

  • The entry point of a new feature is buried deep, so most users who enter the experiment never actually use it. How do you measure the benefit of the feature when randomization is only possible at the user level?
  • A user-outreach strategy randomizes users at send time, but messages are lost along the delivery path, so only a small share of users is actually reached. How do you measure the benefit of the outreach?

Background

Of course, if the penetration of your new strategy is very low, the strategy itself may need adjusting. But for strategies designed to improve the experience of a small group of users, or for a first trial run of a strategy, CACE may be a better measure of your experiment than the overall difference between groups A and B (ATE). It may tell you that the strategy brings a significant benefit to a small group of users, in which case all you need to do is keep iterating to expand penetration.

ATE focuses on the overall difference between the experimental group and the control group, which is also the benefit you can expect across all users once the strategy is fully launched. CACE estimates the benefit of the experiment for the users who were actually reached. Note that the benefit for these users cannot be generalized to all users, for two reasons: an experiment affects different users differently, and the strategy's penetration is limited. The lower the penetration rate, the stronger the selection on users, the larger the gap between reached users and users overall, and the harder it is to generalize the estimated CACE to everyone.

CACE framework

Let's first recall how ATE is calculated. T is the treatment, for example a new feature added to the app, and Y is the outcome, for example the time a user spends in the app. The experiment effect is usually estimated with ATE because it is closest to the final benefit across all users once the experiment is fully launched.

\[ATE = E(Y|T=1) - E(Y|T=0)
\]

CACE adds a variable W for experiment penetration, that is, whether the user actually used the new feature; in the formulas below, Z denotes the random group assignment (the T above). CACE estimates the benefit of the experiment for users who were actually reached. If penetration is 100%, then CACE = ATE. As penetration decreases, CACE will in theory grow increasingly larger in magnitude than ATE, because ATE dilutes the benefit of a few users across all users.

Before getting to how CACE is calculated, let's first look at two commonly used but mistaken methods.

  1. Per protocol analysis = experimental group penetrated users - control group
\[E(Y|Z=1, W(Z=1)=1) - E(Y|Z=0)
\]
  2. As treated analysis = experimental group penetrated users - experimental group non-penetrated users
\[E(Y|Z=1, W(Z=1)=1) - E(Y|Z=1, W(Z=1)=0)
\]

Both methods fall into the same pit, selection bias: whether a user is penetrated depends on the user's own behavior or intent, so users self-select. As a result, penetrated users neither represent users overall nor resemble non-penetrated users. If you have to pick the lesser of two evils, per protocol is generally the better one.
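To make the selection bias concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not taken from a real experiment: 20% of users are "reachable", reachable users have a higher baseline outcome, the true effect on penetrated users is 2, and there are no always-takers. With these settings the per protocol and as treated contrasts should land far above the true effect, because they mostly pick up the baseline gap between reachable and unreachable users.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Simulated experiment; all parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)          # random assignment: 1 = experimental group
    reachable = rng.random(n) < 0.2         # users who would use the feature if offered
    w = (z == 1) & reachable                # penetration: only happens in the experimental group
    baseline = 10 + 5 * reachable           # reachable users are heavier users to begin with
    true_effect = 2.0                       # effect of the feature on penetrated users
    y = baseline + true_effect * w + rng.normal(0.0, 1.0, size=n)
    return z, w.astype(int), y

z, w, y = simulate()

# Overall group difference (what the article calls ATE)
group_diff = y[z == 1].mean() - y[z == 0].mean()
# Per protocol: penetrated experimental users vs. the whole control group
per_protocol = y[(z == 1) & (w == 1)].mean() - y[z == 0].mean()
# As treated: penetrated vs. non-penetrated users inside the experimental group
as_treated = y[(z == 1) & (w == 1)].mean() - y[(z == 1) & (w == 0)].mean()

print(f"group difference: {group_diff:.2f}")
print(f"per protocol:     {per_protocol:.2f}")
print(f"as treated:       {as_treated:.2f}")
```

In this setup the as treated contrast is even further from the true effect than the per protocol one, since its comparison group contains no reachable users at all.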

CACE calculation

User types

CACE divides users into four types: complier, never-taker, always-taker, and defier. Put simply, a complier takes the medicine when given it and does not when not given it; a never-taker refuses the medicine no matter what; an always-taker takes the medicine no matter what; and a defier takes the medicine only when not given it. Writing W(z) for whether a user would be penetrated under assignment Z = z, the four types are defined as follows

  • complier: W(0) = 0, W(1) = 1
  • never-taker: W(0) = 0, W(1) = 0
  • always-taker: W(0) = 1, W(1) = 1
  • defier: W(0) = 1, W(1) = 0

Assumptions

1. Independence
This always holds in a randomized A/B experiment. With observational data we would need to find an additional instrumental variable, which is not discussed here.

\[Z_i \perp (Y_i(0),Y_i(1),W_i(0),W_i(1))
\]

2. Exclusion Restriction
This assumption does not necessarily hold even in a randomized A/B experiment, so it has to be judged from the strategy itself. The basic idea is that being assigned to the treatment group has no effect on users by itself; only users who are actually penetrated by the treatment are affected. Assumption 2 guarantees that never-takers and always-takers behave the same in the experimental group and the control group.

\[Y(z,w) = Y(z',w) \,\,\, \text{for all } z, z', w
\]

I have seen papers on how to calculate CACE when assumption 2 does not hold, but I have not run into such a case myself, so I will add that later.

3. Monotonicity/No-Defier
The monotonicity assumption holds in most cases: Z can only push W in the positive direction, so there are no defiers.

\[W_i(1) \geq W_i(0)
\]

The populations corresponding to each combination of W and Z then simplify as follows. In particular, the non-penetrated users in the experimental group are exactly the never-takers, so never-takers can be estimated directly

  • Z = 0, W = 0: complier or never-taker
  • Z = 0, W = 1: always-taker
  • Z = 1, W = 0: never-taker
  • Z = 1, W = 1: complier or always-taker

Calculation

Randomization ensures that the proportions of compliers, always-takers, and never-takers are the same in the control group and the experimental group, so we can directly calculate the proportion of each type in the population, as follows

\[\begin{align}
\pi_a &= P(W(0)=W(1)=1) = E(W|Z=0)\\
\pi_c &= P(W(0)=0,W(1)=1) = E(W|Z=1) - E(W|Z=0)\\
\pi_n &= P(W(0)=W(1)=0) = 1 - E(W|Z=1) \\
\end{align}
\]
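As a small sketch of the three formulas above, the shares can be read straight off the penetration rates of the two groups. The function name `type_shares` and the ten-user toy data are hypothetical, made up only for this illustration; the calculation assumes there are no defiers.

```python
import numpy as np

def type_shares(z, w):
    """Estimate the shares of always-takers, compliers and never-takers.

    z: 0/1 group assignment, w: 0/1 penetration indicator.
    Assumes random assignment and no defiers.
    """
    z = np.asarray(z)
    w = np.asarray(w)
    p_w_control = w[z == 0].mean()   # E(W | Z=0)
    p_w_treated = w[z == 1].mean()   # E(W | Z=1)
    return {
        "always": p_w_control,
        "complier": p_w_treated - p_w_control,
        "never": 1.0 - p_w_treated,
    }

# Toy data: 5 control users (1 penetrated), 5 experimental users (3 penetrated)
z = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
w = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])
shares = type_shares(z, w)
print(shares)
```

The three shares always sum to 1 by construction, which is a quick sanity check on real data.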

Because the non-penetrated users in the experimental group must be never-takers, and the penetrated users in the control group must be always-takers (in some randomized feature experiments there are no always-takers), the outcomes of these users can be observed directly

\[\begin{align}
E(Y|W=1,Z=0) &= E(Y(1)|always)\\
E(Y|W=0,Z=1) &= E(Y(0)|never)\\
\end{align}
\]

Using this as the way in, we can calculate the CACE of compliers. First, decompose the control group and the experimental group by user type as follows

\[\begin{align}
E(Y|Z=0) &= \pi_a * E(Y(1)|always) + \pi_n * E(Y(0)|never) +
\pi_c * E(Y(0)|complier) \\
E(Y|Z=1) &= \pi_a * E(Y(1)|always) + \pi_n * E(Y(0)|never) +
\pi_c * E(Y(1)|complier) \\
\end{align}
\]

Clearly, the difference between groups A and B comes only from compliers: subtracting the two equations leaves only the complier terms. In fact, when there are no always-takers, CACE simply scales up the between-group difference by dividing it by the penetration rate of the experimental group

\[\begin{align}
CACE &= E(Y(1)|complier) - E(Y(0)|complier)\\
&= \frac{E(Y|Z=1)-E(Y|Z=0)}{\pi_c}\\
&= \frac{E(Y|Z=1)-E(Y|Z=0)}{E(W|Z=1) - E(W|Z=0)}
\end{align}
\]
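The last line of the formula translates almost directly into code. The sketch below is one possible way to write it; the function name `cace` and the eight-user toy data are hypothetical. In the toy data half of the experimental group is penetrated and each penetrated user gains 4, so the group difference of 2 is scaled back up by the 50% penetration rate.

```python
import numpy as np

def cace(z, w, y):
    """Ratio estimate of CACE: group difference in Y over group difference in W.

    z: 0/1 group assignment, w: 0/1 penetration, y: outcome.
    Relies on the independence, exclusion restriction and no-defier assumptions.
    """
    z, w, y = (np.asarray(a) for a in (z, w, y))
    diff_y = y[z == 1].mean() - y[z == 0].mean()   # E(Y|Z=1) - E(Y|Z=0)
    diff_w = w[z == 1].mean() - w[z == 0].mean()   # E(W|Z=1) - E(W|Z=0), i.e. pi_c
    if diff_w <= 0:
        raise ValueError("no positive penetration difference between groups")
    return diff_y / diff_w

# Toy data: 4 control users, 4 experimental users of whom 2 are penetrated
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = np.array([0, 0, 0, 0, 1, 1, 0, 0])
y = np.array([10.0, 10.0, 10.0, 10.0, 14.0, 14.0, 10.0, 10.0])
estimate = cace(z, w, y)
print(estimate)
```

The guard on `diff_w` is a design choice for this sketch: when penetration barely differs between groups, the ratio becomes unstable and the estimate is not meaningful.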

As for significance, I prefer to apply CACE only when the original ATE is already significant, to avoid analyzing meaningless fluctuations in the data; CACE is then used only to estimate the absolute benefit for penetrated users. Of course, if you do want the significance of CACE, you can use the bootstrap to get the SE. And because CACE is itself a ratio, there are also more rigorous analytical ways to calculate the SE. For details, see Ref 4.
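A bootstrap SE for CACE can be sketched as follows: resample users with replacement, recompute the ratio on each resample, and take the standard deviation of the results. This is only one simple variant (a plain nonparametric bootstrap over users), and the simulated data, the 30% penetration rate, the true effect of 2, and the function names are all assumptions made for the illustration.

```python
import numpy as np

def cace(z, w, y):
    """Group difference in Y divided by group difference in W."""
    diff_y = y[z == 1].mean() - y[z == 0].mean()
    diff_w = w[z == 1].mean() - w[z == 0].mean()
    return diff_y / diff_w

def bootstrap_se(z, w, y, n_boot=300, seed=0):
    """Standard error of the CACE estimate from resampling users with replacement."""
    rng = np.random.default_rng(seed)
    n = len(z)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample whole users, keeping (z, w, y) together
        draws[b] = cace(z[idx], w[idx], y[idx])
    return draws.std(ddof=1)

# Simulated experiment (illustrative values only, no always-takers)
rng = np.random.default_rng(1)
n = 20_000
z = rng.integers(0, 2, size=n)
reachable = rng.random(n) < 0.3
w = ((z == 1) & reachable).astype(int)
y = 10.0 + 2.0 * w + rng.normal(0.0, 1.0, size=n)

estimate = cace(z, w, y)
se = bootstrap_se(z, w, y)
print(f"CACE = {estimate:.3f}, bootstrap SE = {se:.3f}")
```

With very low penetration, some resamples can have a near-zero denominator, which inflates the SE; that is worth checking before trusting the interval.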

Naturally, you will be asked whether the benefit for this subset of users generalizes to all users. In theory it does not, but it cannot be ruled out flatly either. A simple and intuitive check is to compare \(E(Y(0)|complier)\) with \(E(Y(0)|never)\), and, where always-takers exist, \(E(Y(1)|complier)\) with \(E(Y(1)|always)\), since always-takers are only ever observed with the treatment. The larger the differences, the less likely the result generalizes.
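The untreated baselines that can be recovered from the data are E(Y(0)|never), observed directly among non-penetrated users in the experimental group, and E(Y(0)|complier), which follows from rearranging the control-group decomposition above. The sketch below shows that rearrangement; the function name `untreated_baselines` and the toy data are hypothetical, and the same independence, exclusion restriction, and no-defier assumptions are required.

```python
import numpy as np

def untreated_baselines(z, w, y):
    """Estimate E(Y(0)|never) and E(Y(0)|complier) from experiment data."""
    z, w, y = (np.asarray(a) for a in (z, w, y))
    pi_a = w[z == 0].mean()                 # share of always-takers
    pi_n = 1.0 - w[z == 1].mean()           # share of never-takers
    pi_c = 1.0 - pi_a - pi_n                # share of compliers
    y0_never = y[(z == 1) & (w == 0)].mean()
    # Always-takers are observed as penetrated control users; skip if there are none
    y1_always = y[(z == 0) & (w == 1)].mean() if pi_a > 0 else 0.0
    # Solve E(Y|Z=0) = pi_a*E(Y(1)|always) + pi_n*E(Y(0)|never) + pi_c*E(Y(0)|complier)
    y0_complier = (y[z == 0].mean() - pi_a * y1_always - pi_n * y0_never) / pi_c
    return {"never": y0_never, "complier": y0_complier}

# Toy data: in each group, half the users are compliers with a higher baseline
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = np.array([0, 0, 0, 0, 1, 1, 0, 0])
y = np.array([10.0, 10.0, 14.0, 14.0, 16.0, 16.0, 10.0, 10.0])
baselines = untreated_baselines(z, w, y)
print(baselines)
```

A large gap between the two baselines, as in this toy data, suggests that compliers differ from the rest of the population even without the treatment.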

Interested in more advanced A/B testing topics? See the rest of the series:
Advanced A/B Testing Series 1: user targeting in A/B experiments / individual-level effect differences / a GitHub collection of HTE papers
Advanced A/B Testing Series 2: more sensitive A/B experiments, CUPED!
Advanced A/B Testing Series 3: groups A and B are not random? Observational studies? Propensity Score

Feel free to leave a message or comment ~


Ref

  1. Imbens G. Methods for Estimating Treatment Effects IV: Instrumental Variables and Local Average Treatment Effects. Technical report, Lecture Notes 2, Local Average Treatment Effects, Impact Evaluation Network, Miami; 2010
  2. Complier average causal effect? Exploring what we learn from an RCT with participants who don’t do what they are told. 2017
  3. http://ec2-184-72-107-21.compute-1.amazonaws.com/_assets/files/events/slides_late
  4. Schochet, Peter Z. and Hanley Chiang (2009). Estimation and Identification of the Complier Average Causal Effect Parameter in Education RCTs (NCEE 2009-4040).