R language Stan for Bayesian reasoning analysis


Link to the original text:http://tecdat.cn/?p=6252

Stan of R

Stan can be run from many statistical packages. So far, I’ve been running Stan from R.

Simple linear regression

The first step is to document the Stan model. This contains a file linreg.stan :

 data {
  int N;
  vector[N] x;
  vector[N] y;

model {
  y ~ normal(alpha + beta * x, sigma);

The first part of the file, called data, declares the scalars, vectors, and matrices that will be passed to stan as input.

Next, we can simulate the dataset by running the following r code and using Stan and our files linreg.stan To fit the model:

stan(file = 'linreg. ', data = mydata, iter = 1000,   = 4)

When the Stan model is installed for the first time, there is a delay of a few seconds when the model is compiled into C + +. However, once the model is compiled, it can be applied to a new dataset without repeating the compilation process (performing simulation research has a great advantage).

In the above code, we asked stan to run four independent chains, each with 1000 iterations. After running, we can summarize the output in the following ways:

Inference for Stan model: linreg.
4 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=2000.

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
alpha  -0.10    0.00 0.10  -0.29  -0.16  -0.10  -0.04   0.09  1346    1
beta    0.95    0.00 0.11   0.75   0.88   0.95   1.02   1.17  1467    1
sigma   0.98    0.00 0.07   0.85   0.93   0.98   1.03   1.12  1265    1
lp__  -47.54    0.06 1.24 -50.77 -48.02 -47.24 -46.68 -46.17   503    1

Samples were drawn using NUTS(diag_e) at Mon Jun 08 18:35:58 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

For the regression slope β, our posterior mean is 0.95 (close to the real value 1 used in the simulation data). In order to form a 95% posterior confidence interval, we simply use the percentiles of 2.5% and 97.5% of the sampling posterior, here are 0.75 to 1.17.

You can get a variety of other quantities from the fitted model. One is to plot the posterior distribution of one of the model parameters. To get the regression slope, we can do the following:


β Histogram of posterior distribution

R language Stan for Bayesian reasoning analysis

Now let’s use standard ordinary least squares to fit the linear model

    Min      1Q  Median      3Q     Max 
-1.9073 -0.6835 -0.0875  0.5806  3.2904 

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.10280    0.09755  -1.054    0.295    
x            0.94753    0.10688   8.865  3.5e-14 ***
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9707 on 98 degrees of freedom
Multiple R-squared:  0.4451,    Adjusted R-squared:  0.4394 
F-statistic:  78.6 on 1 and 98 DF,  p-value: 3.497e-14

This gives us an estimate of the slope of 0.95, which is 2 decimal places different from Stan’s posterior mean, and the standard error is 0.11, which is the same as Stan’s posterior SD.

Stan and Bayesian Reasoning

Interested in exploring Stan and using it to perform Bayesian reasoning, this is due to the problems of measurement error and missing data. As described by WinBUGS and the authors, Bayesian method is very natural in solving the problem of different sources of uncertainty, which exceed parameter uncertainty, such as missing data or covariates measured by error. In fact, the popular multiple imputation method for missing data is developed in Bayesian paradigm.

R language Stan for Bayesian reasoning analysis