Stata: RA: Regression adjustment, IPW: inverse probability weighting, ipwra, aipw

Time:2021-4-22

Link to the original text:http://tecdat.cn/?p=10148


Today’s topic is treatment effects in Stata.

Treatment-effects estimators estimate the causal effect of a treatment on an outcome from observational data.

We will discuss four treatment-effects estimators:

  1. RA: regression adjustment
  2. IPW: inverse probability weighting
  3. IPWRA: inverse probability weighting with regression adjustment
  4. AIPW: augmented inverse probability weighting

As with any regression analysis of observational data, a causal interpretation must rest on a reasonable underlying scientific rationale.

Introduction

We are going to discuss treatments and outcomes.

A treatment could be a new drug, and the outcome blood pressure or cholesterol levels. A treatment could be a surgical procedure, and the outcome patient mobility. A treatment could be a job training program, and the outcome employment or wages. A treatment could even be an advertising campaign designed to increase sales of a product.

Consider whether maternal smoking affects the weight of the baby at birth. Only observational data can be used to answer such questions.

The problem with observational data is that the subjects choose whether to receive the treatment. For example, a mother decides whether to smoke. The subjects are said to have self-selected into the treated and untreated groups.

In an ideal world, we would design an experiment to test cause-and-effect relationships between treatment and outcome. We would randomly assign subjects to the treated or untreated group. Random assignment guarantees that the treatment is independent of the outcome, which greatly simplifies the analysis.

Causal inference requires estimating the unconditional means of the outcomes for each treatment level. Whether the data are observational or experimental, we observe each subject’s outcome only under the treatment that subject actually received. For experimental data, random assignment guarantees that the treatment is independent of the outcome, so the average outcomes within each observed treatment group estimate the unconditional means of interest. For observational data, we model the treatment assignment process. If our model is correct, the treatment assignment is considered as good as random conditional on the covariates in our model.

Let’s consider an example. Figure 1 is a scatter plot of observational data similar to those used by Cattaneo (2010). The treatment variable is the mother’s smoking status during pregnancy, and the outcome is the baby’s birth weight.

[Figure 1: scatter plot of the observational data]

The red dots represent mothers who smoked during pregnancy, and the green dots represent mothers who did not. The mothers chose for themselves whether to smoke, and that complicates the analysis.

We cannot estimate the effect of smoking on birth weight by comparing the mean birth weights of babies whose mothers did and did not smoke. Why not? Take another look at the chart. Older mothers tend to have heavier babies, whether or not they smoked during pregnancy. In these data, older mothers were also more likely to smoke. The mother’s age is therefore related to both treatment status and outcome. So how should we proceed?
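For readers who want to look at the raw comparison this paragraph warns against, the following is a minimal sketch. It assumes the cattaneo2 example dataset that is loaded later in this article and simply tabulates group means; the figures in this article are described as "similar to" those data, so the age pattern in the actual dataset may not match the chart exactly.

```stata
* Load the example dataset (the same one used later in the article)
webuse cattaneo2, clear

* Unadjusted group means of birth weight and mother's age by smoking status.
* The gap in mean bweight is NOT a causal estimate: it ignores that
* mage (and other covariates) may differ between the two groups.
tabstat bweight mage, by(mbsmoke) statistics(mean n)
```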

RA: regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment.

We might ask, “How would the outcomes have changed if the mothers who smoked had chosen not to smoke?” or “How would the outcomes have changed if the mothers who did not smoke had chosen to smoke?” If we knew the answers to these counterfactual questions, the analysis would be easy: we would just subtract the observed outcomes from the counterfactual outcomes.

In the treatment-effects literature, these counterfactual outcomes are called unobserved potential outcomes. We can construct measurements of them, and our data might look like this:

[Figure 2: observed data (solid dots) and unobserved potential outcomes (hollow dots)]

In Figure 2, solid dots show the observed data and hollow dots show the unobserved potential outcomes. The hollow red dots represent the potential outcomes for the smokers had they not smoked. The hollow green dots represent the potential outcomes for the non-smokers had they smoked.

We can estimate the unobserved potential outcomes by fitting separate linear regression models to the observed data (solid dots) in each of the two treatment groups.

[Figure 3: separate regression lines for non-smokers (green) and smokers (red)]

In Figure 3, we provide a regression line (green line) for non-smokers and a separate regression line (red line) for smokers.

Let’s see what these two lines mean:

[Figure 4: observed points and the expected outcomes E(y0) and E(y1) on the two regression lines]

The green dot on the left of Figure 4, labeled “Observed”, is an observation for a mother who did not smoke. The point labeled E(y0) on the green regression line is the expected birth weight of the baby given the mother’s age and the fact that she did not smoke. The point labeled E(y1) on the red regression line is the expected birth weight of the baby for the same mother had she smoked.

The difference between these expectations estimates the covariate-specific treatment effect for those who did not receive the treatment.

Now, let’s look at the other counterfactual.

The red dot on the right of Figure 4, also labeled “Observed”, is an observation for a mother who smoked during pregnancy. The points on the green and red regression lines again represent the expected birth weights (the potential outcomes) of this mother’s baby under the two treatment conditions.

The difference between these expectations estimates the covariate-specific treatment effect for those who received the treatment.

Note that we estimate a treatment effect, conditional on covariate values, for each subject. Moreover, we estimate this effect for every subject, regardless of which treatment was actually received. The average of these effects over all subjects in the data estimates the average treatment effect (ATE).

We can also use Figure 4 to motivate a prediction of the outcome that each subject would obtain at each treatment level, regardless of the treatment received. The averages of these predictions over all subjects in the data estimate the potential-outcome means (POMs) for each treatment level.

The difference between the estimated POMs is the same estimate of the ATE described above.

The ATE on the treated (ATET) is like the ATE, but it uses only the subjects observed in the treatment group. This approach to calculating treatment effects is called regression adjustment (RA).

Let’s open a dataset and try this in Stata.

. webuse cattaneo2.dta, clear

To estimate the POMs in the two treatment groups, we type

. teffects ra (bweight mage) (mbsmoke), pomeans

We specify the outcome model in the first set of parentheses, with the outcome variable followed by its covariates. In this example, the outcome variable is bweight and the only covariate is mage.

We specify the treatment model (here, just the treatment variable) in the second set of parentheses. In this example, we specify only the treatment variable mbsmoke. We will discuss treatment-model covariates in the next section.

The result of typing the command is

Iteration 0:   EE criterion =  7.878e-24
Iteration 1:   EE criterion =  8.468e-26

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3409.435   9.294101   366.84   0.000     3391.219    3427.651
     smoker  |   3132.374   20.61936   151.91   0.000     3091.961    3172.787
------------------------------------------------------------------------------

The output reported that if all mothers smoked, the average birth weight would be 3132 grams, and if no mother smoked, the average birth weight would be 3409 grams.
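To make the mechanics of regression adjustment concrete, here is a rough by-hand sketch of the same point estimates, assuming that mbsmoke is coded 0 for non-smokers and 1 for smokers. The variable names y0hat, y1hat, and te are illustrative and not part of teffects. The means should line up with the POMs that teffects ra reports, but the standard deviations shown by summarize are not valid standard errors; teffects is still needed for inference.

```stata
* Sketch only: regression adjustment by hand (y0hat, y1hat, te are made-up names)
webuse cattaneo2, clear

* Fit the outcome model on non-smokers, then predict for ALL mothers
regress bweight mage if mbsmoke == 0
predict double y0hat, xb

* Fit the outcome model on smokers, then predict for ALL mothers
regress bweight mage if mbsmoke == 1
predict double y1hat, xb

* Covariate-specific effect for each subject
generate double te = y1hat - y0hat

* Means of y0hat and y1hat estimate the POMs; the mean of te estimates the ATE
summarize y0hat y1hat te

* Averaging te over the treated subjects only gives the ATET point estimate
summarize te if mbsmoke == 1
```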

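We can estimate the ATE of smoking on birth weight by subtracting the POMs: 3132.374 - 3409.435 = -277.061. Alternatively, we can rerun teffects ra with the ate option to obtain standard errors and confidence intervals:

. teffects ra (bweight mage) (mbsmoke), ate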

Iteration 0:   EE criterion =  7.878e-24
Iteration 1:   EE criterion =  5.185e-26

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
-------------------------------------------------------------------------------
              |               Robust   
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
ATE           |        
      mbsmoke |
(smoker vs    |        
  nonsmoker)  |  -277.0611   22.62844   -12.24   0.000    -321.4121   -232.7102
--------------+----------------------------------------------------------------
POmean        |        
      mbsmoke |
   nonsmoker  |   3409.435   9.294101   366.84   0.000     3391.219    3427.651
-------------------------------------------------------------------------------

The output reports the same ATE we calculated by hand: -277.061. The ATE is the average of the differences between the birth weights when each mother smokes and the birth weights when no mother smokes.

IPW: inverse probability weighting estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. Some researchers prefer to model the treatment assignment process instead of the outcome.

We know that smokers tend to be older than non-smokers in our data. We also hypothesize that the mother’s age directly affects birth weight. We saw this in Figure 1.

[Figure 1, shown again]

The graph shows that treatment assignment depends on the mother’s age. We would like a way to adjust for this dependence. In particular, we wish we had more green dots at older ages and more red dots at younger ages. If we did, the mean birth weight of each group would change. We don’t know how that would affect the difference in means, but we do know it would give a better estimate of the difference.

To achieve a similar result, we weight smokers in the lower age range and non-smokers in the upper age range more heavily, and we weight smokers in the upper age range and non-smokers in the lower age range less heavily.

We will fit a probit or logit model of the form

Pr(woman smokes) = F(a + b*age)

teffects uses logit by default, but we will specify the probit option.

Once we have fit the model, we can obtain the predicted Pr(woman smokes) for each observation in the data. We call this p_i. Then, in the POM calculation (which is just a mean calculation), we use these probabilities to weight the observations. We weight observations on smokers by 1/p_i, so the weight is large when the probability of being a smoker is small. We weight observations on non-smokers by 1/(1 - p_i), so the weight is large when the probability of being a non-smoker is small.
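The weighting scheme just described can be written compactly. In the sketch below, t_i is an assumed indicator (1 for a smoker, 0 for a non-smoker) and y_i is the observed birth weight; neither symbol appears in the original text. The POMs are written as weighted means with normalized weights, which is one common form of the IPW estimator and is assumed here rather than taken from the article.

```latex
w_i = \frac{t_i}{p_i} + \frac{1 - t_i}{1 - p_i},
\qquad
\widehat{\mathrm{POM}}_1 = \frac{\sum_i t_i \, y_i / p_i}{\sum_i t_i / p_i},
\qquad
\widehat{\mathrm{POM}}_0 = \frac{\sum_i (1 - t_i) \, y_i / (1 - p_i)}{\sum_i (1 - t_i) / (1 - p_i)}
```

The estimated ATE is then the difference between the two weighted means.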

The result is that figure 1 is replaced by the following figure:

[Figure 5: the scatter plot of Figure 1 with marker size proportional to the IPW weight]

In Figure 5, larger circles represent larger weights.

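To estimate the POMs with this IPW estimator, we can type

. teffects ipw (bweight) (mbsmoke mage, probit), pomeans

The first set of parentheses specifies the outcome model, which here is just the outcome variable with no covariates. The second set specifies the treatment model: the treatment variable (mbsmoke), its covariates (here only mage), and the type of model (probit).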

The result is

Iteration 0:   EE criterion =  3.615e-15
Iteration 1:   EE criterion =  4.381e-25

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: probit
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3408.979   9.307838   366.25   0.000     3390.736    3427.222
     smoker  |   3133.479   20.66762   151.61   0.000     3092.971    3173.986
------------------------------------------------------------------------------

Our output reports that if all mothers smoked, the average birth weight would be 3133 grams, and if no mother smoked, the average birth weight would be 3409 grams.
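The IPW point estimates can also be sketched by hand, which may help show that they are nothing more than weighted means. The variable names p_i and w below are illustrative, and the sketch assumes mbsmoke is coded 0/1. It should approximate the POMs from teffects ipw, but the standard errors from mean ignore the fact that the weights were estimated, so they should not be relied on.

```stata
* Sketch only: IPW point estimates by hand (p_i and w are made-up names)
webuse cattaneo2, clear

* Treatment model: probability of smoking as a function of mother's age
probit mbsmoke mage
predict double p_i, pr

* Inverse probability weights: 1/p_i for smokers, 1/(1 - p_i) for non-smokers
generate double w = cond(mbsmoke == 1, 1/p_i, 1/(1 - p_i))

* Weighted mean birth weight within each treatment group
mean bweight [pweight = w], over(mbsmoke)
```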

This time, the ATE is -275.5, and if we type

. teffects ipw (bweight) (mbsmoke mage, probit), ate

(Output omitted)

we learn that the standard error is 22.68 and the 95% confidence interval is [-319.9, -231.0].

IPWRA: IPW with regression adjustment estimator

RA estimators model the outcome to account for the nonrandom treatment assignment. IPW estimators model the treatment to account for the nonrandom treatment assignment. IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment.

IPWRA uses IPW weights to estimate corrected regression coefficients, which are then used to perform regression adjustment.

The covariates in the outcome model and the treatment model need not be the same. They often are not, because the variables that influence a subject’s choice of treatment group are often different from the variables associated with the outcome. The IPWRA estimator has the double-robust property: the effect estimates remain consistent if either the treatment model or the outcome model, but not both, is misspecified.

Let’s consider a situation with more complex outcome and treatment models, still using our birth weight data.

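The outcome model will include

  1. mage: the mother’s age
  2. prenatal1: an indicator for a prenatal visit during the first trimester
  3. mmarried: an indicator of the mother’s marital status
  4. fbaby: an indicator for a first-born baby

The treatment model, as the TME1 block of the output shows, will include

  1. the outcome-model covariates mmarried, mage, and fbaby
  2. the mother’s age squared (mage^2)
  3. medu: years of maternal education

We will also specify the aequations option to report the coefficients of the outcome and treatment models.

. teffects ipwra (bweight mage prenatal1 mmarried fbaby) (mbsmoke mmarried c.mage##c.mage fbaby medu, probit), pomeans aequations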

Iteration 0:   EE criterion =  1.001e-20
Iteration 1:   EE criterion =  1.134e-25

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : IPW regression adjustment
Outcome model  : linear
Treatment model: probit
-------------------------------------------------------------------------------
              |               Robust
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
POmeans       |
      mbsmoke |
   nonsmoker  |   3403.336    9.57126   355.58   0.000     3384.576    3422.095
      smoker  |   3173.369   24.86997   127.60   0.000     3124.624    3222.113
--------------+----------------------------------------------------------------
OME0          |
         mage |   2.893051   2.134788     1.36   0.175    -1.291056    7.077158
    prenatal1 |   67.98549   28.78428     2.36   0.018     11.56933    124.4017
     mmarried |   155.5893   26.46903     5.88   0.000      103.711    207.4677
        fbaby |   -71.9215   20.39317    -3.53   0.000    -111.8914   -31.95162
        _cons |   3194.808   55.04911    58.04   0.000     3086.913    3302.702
--------------+----------------------------------------------------------------
OME1          |
         mage |  -5.068833   5.954425    -0.85   0.395    -16.73929    6.601626
    prenatal1 |   34.76923   43.18534     0.81   0.421    -49.87248    119.4109
     mmarried |   124.0941   40.29775     3.08   0.002     45.11193    203.0762
        fbaby |   39.89692   56.82072     0.70   0.483    -71.46966    151.2635
        _cons |   3175.551   153.8312    20.64   0.000     2874.047    3477.054
--------------+----------------------------------------------------------------
TME1          |
     mmarried |  -.6484821   .0554173   -11.70   0.000     -.757098   -.5398663
         mage |   .1744327   .0363718     4.80   0.000     .1031452    .2457202
              |
c.mage#c.mage |  -.0032559   .0006678    -4.88   0.000    -.0045647   -.0019471
              |
        fbaby |  -.2175962   .0495604    -4.39   0.000    -.3147328   -.1204595
         medu |  -.0863631   .0100148    -8.62   0.000    -.1059917   -.0667345
        _cons |  -1.558255   .4639691    -3.36   0.001    -2.467618   -.6488926
-------------------------------------------------------------------------------

The POmeans section of the output shows the POMs for the two treatment groups. The ATE is now calculated as 3173.369 - 3403.336 = -229.967.

The OME0 and OME1 sections show the RA coefficients for the untreated and treated groups, respectively.

The TME1 section of the output shows the coefficients of the probit treatment model.
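The two-step logic of IPWRA (estimate weights from the treatment model, then run weighted outcome regressions) can be sketched by hand as follows. The covariate lists are read off the OME0, OME1, and TME1 blocks of the output above; the variable names p_i, w, y0hat, and y1hat are illustrative, and mbsmoke is assumed to be coded 0/1. The sketch is meant to approximate the point estimates only; it does not produce valid standard errors.

```stata
* Sketch only: IPWRA point estimates by hand (p_i, w, y0hat, y1hat are made-up names)
webuse cattaneo2, clear

* Step 1: treatment model and inverse probability weights
probit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double p_i, pr
generate double w = cond(mbsmoke == 1, 1/p_i, 1/(1 - p_i))

* Step 2: weighted outcome regressions, one per treatment group,
* each followed by predictions for ALL mothers
regress bweight mage prenatal1 mmarried fbaby [pweight = w] if mbsmoke == 0
predict double y0hat, xb

regress bweight mage prenatal1 mmarried fbaby [pweight = w] if mbsmoke == 1
predict double y1hat, xb

* Step 3: the means of the predictions estimate the two POMs
summarize y0hat y1hat
```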

As in the previous two cases, if we want the ATE with its standard error, we specify the ate option. If we want the ATET, we specify the atet option.

AIPW: augmented IPW estimator

IPWRA estimators model both the outcome and the treatment to account for the nonrandom treatment assignment. So do AIPW estimators.

The AIPW estimator adds a bias-correction term to the IPW estimator. If the treatment model is correctly specified, the bias-correction term is 0 and the estimator reduces to the IPW estimator. If the treatment model is misspecified but the outcome model is correctly specified, the bias-correction term corrects the estimator. The bias-correction term therefore gives the AIPW estimator the same double-robust property as the IPWRA estimator.
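One common textbook way of writing the AIPW estimator of the POMs makes the bias-correction term visible. The notation is an assumption introduced here for illustration: t_i is the treatment indicator, y_i the outcome, \hat{p}_i the predicted probability of treatment, and \hat{m}_1(x_i), \hat{m}_0(x_i) the outcome-model predictions for subject i under treatment and no treatment. Stata's implementation may differ in details, so this is a sketch of the idea and not a statement of what teffects aipw computes internally.

```latex
\hat{\mu}_1 = \frac{1}{N} \sum_{i=1}^{N}
  \left[ \frac{t_i \, y_i}{\hat{p}_i}
       - \frac{t_i - \hat{p}_i}{\hat{p}_i} \, \hat{m}_1(x_i) \right],
\qquad
\hat{\mu}_0 = \frac{1}{N} \sum_{i=1}^{N}
  \left[ \frac{(1 - t_i) \, y_i}{1 - \hat{p}_i}
       + \frac{t_i - \hat{p}_i}{1 - \hat{p}_i} \, \hat{m}_0(x_i) \right],
\qquad
\widehat{\mathrm{ATE}} = \hat{\mu}_1 - \hat{\mu}_0
```

In each bracket the first term is the IPW contribution and the second is the correction. When the treatment model is correct, t_i - \hat{p}_i has mean zero given the covariates, so the correction vanishes on average rather than observation by observation.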

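The syntax and output of the AIPW estimator are almost identical to those of the IPWRA estimator.

. teffects aipw (bweight mage prenatal1 mmarried fbaby) (mbsmoke mmarried c.mage##c.mage fbaby medu, probit), pomeans aequations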

Iteration 0:   EE criterion =  4.632e-21
Iteration 1:   EE criterion =  5.810e-26

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : augmented IPW
Outcome model  : linear by ML
Treatment model: probit
-------------------------------------------------------------------------------
              |               Robust
      bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
POmeans       |
      mbsmoke |
   nonsmoker  |   3403.355   9.568472   355.68   0.000     3384.601    3422.109
      smoker  |   3172.366   24.42456   129.88   0.000     3124.495    3220.237
--------------+----------------------------------------------------------------
OME0          |
         mage |   2.546828   2.084324     1.22   0.222    -1.538373    6.632028
    prenatal1 |   64.40859   27.52699     2.34   0.019     10.45669    118.3605
     mmarried |   160.9513    26.6162     6.05   0.000     108.7845    213.1181
        fbaby |   -71.3286   19.64701    -3.63   0.000     -109.836   -32.82117
        _cons |   3202.746   54.01082    59.30   0.000     3096.886    3308.605
--------------+----------------------------------------------------------------
OME1          |
         mage |  -7.370881    4.21817    -1.75   0.081    -15.63834    .8965804
    prenatal1 |   25.11133   40.37541     0.62   0.534    -54.02302    104.2457
     mmarried |   133.6617   40.86443     3.27   0.001      53.5689    213.7545
        fbaby |   41.43991   39.70712     1.04   0.297    -36.38461    119.2644
        _cons |   3227.169   104.4059    30.91   0.000     3022.537    3431.801
--------------+----------------------------------------------------------------
TME1          |
     mmarried |  -.6484821   .0554173   -11.70   0.000     -.757098   -.5398663
         mage |   .1744327   .0363718     4.80   0.000     .1031452    .2457202
              |
c.mage#c.mage |  -.0032559   .0006678    -4.88   0.000    -.0045647   -.0019471
              |
        fbaby |  -.2175962   .0495604    -4.39   0.000    -.3147328   -.1204595
         medu |  -.0863631   .0100148    -8.62   0.000    -.1059917   -.0667345
        _cons |  -1.558255   .4639691    -3.36   0.001    -2.467618   -.6488926
-------------------------------------------------------------------------------

The ATE is 3172.366 - 3403.355 = -230.989.

Final thoughts

The example above uses a continuous outcome: birth weight. teffects can also be used with binary, count, and nonnegative continuous outcomes.

The estimators also allow multiple treatment categories.
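As a hedged illustration of these two extensions, the sketch below assumes that the cattaneo2 dataset contains a binary low-birth-weight indicator named lbweight and a multi-category smoking variable named msmoke. Both names are assumptions not shown elsewhere in this article, so check them with describe before running the commands.

```stata
* Sketch only: lbweight and msmoke are assumed to exist in cattaneo2
webuse cattaneo2, clear
describe lbweight msmoke

* Binary outcome: fit the outcome model as a probit instead of a linear model
teffects ra (lbweight mage, probit) (mbsmoke)

* Multivalued treatment: each level of msmoke is compared with the base level
teffects ra (bweight mage) (msmoke)
```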


Reference:

[1] Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. _Journal of Econometrics_ 155: 138–154.