Original source: Tuoduan Data Tribe official account
In this article, I derive the conditional posterior distributions required for blocked Gibbs sampling in multiple linear regression. I then code the sampler and test it on simulated data.
Suppose we have a sample of n subjects. Bayesian multiple regression assumes that the outcome vector is drawn from a multivariate normal distribution whose mean is the linear predictor. Using an identity matrix (scaled by the error variance) as the covariance encodes the assumption of independent observations.
So far, this is no different from frequentist multivariate normal regression: maximizing the likelihood yields the usual least-squares solution.
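As a quick illustration, the maximum-likelihood (ordinary least squares) solution can be computed directly in R. The data here are simulated with illustrative true values of my own, not those of the original post:

```r
set.seed(1)
n <- 500
X <- cbind(1, rnorm(n), rnorm(n))    # design matrix with an intercept column
beta_true <- c(2, -1, 0.5)           # illustrative true coefficients
y <- as.vector(X %*% beta_true + rnorm(n, sd = 2))

# maximum-likelihood / OLS solution: (X'X)^{-1} X'y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
```

With independent normal errors, this closed-form solution is both the MLE and the least-squares estimate.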
The Bayesian model is completed by specifying prior distributions. In this example, I place priors on the coefficient vector and on the error variance.
Before coding the sampler, we need to derive the full conditional posterior distribution of each parameter for the Gibbs sampler.
Deriving the conditional posterior of the coefficient vector takes a bit more linear algebra.
This is a very clean and intuitive result: the covariance matrix of the conditional posterior mirrors the covariance matrix of the frequentist estimate of the coefficients.
Also note that this conditional posterior is a multivariate distribution. Therefore, in each iteration of the Gibbs sampler, we draw a complete coefficient vector from this posterior in a single block update.
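Putting the two full conditionals together, a minimal sketch of the blocked sampler might look like the following. It assumes a flat prior on the coefficient vector and an Inverse-Gamma(a, b) prior on the variance; under those assumptions, beta given sigma2 and y is multivariate normal centered at the OLS estimate, and sigma2 given beta and y is inverse-gamma. The exact priors and parameterization in the original derivation may differ.

```r
# Blocked Gibbs sampler for y ~ N(X beta, sigma2 * I).
# Assumes a flat prior on beta and an Inverse-Gamma(a, b) prior on sigma2;
# these are illustrative choices, not necessarily the original post's priors.
block_gibbs <- function(y, X, n_iter = 5000, a = 0.01, b = 0.01) {
  n <- nrow(X)
  p <- ncol(X)
  XtX_inv <- solve(t(X) %*% X)                 # precomputed once
  beta_hat <- as.vector(XtX_inv %*% t(X) %*% y)

  sigma2 <- 1                                  # starting value
  draws <- matrix(NA_real_, nrow = n_iter, ncol = p + 1,
                  dimnames = list(NULL, c(paste0("beta", 0:(p - 1)), "sigma2")))

  for (i in seq_len(n_iter)) {
    # Block update of the full coefficient vector:
    # beta | sigma2, y ~ N(beta_hat, sigma2 * (X'X)^{-1})
    L <- t(chol(sigma2 * XtX_inv))             # lower-triangular covariance factor
    beta <- beta_hat + as.vector(L %*% rnorm(p))

    # Variance update: sigma2 | beta, y ~ Inverse-Gamma(a + n/2, b + RSS/2)
    rss <- sum((y - as.vector(X %*% beta))^2)
    sigma2 <- 1 / rgamma(1, shape = a + n / 2, rate = b + rss / 2)

    draws[i, ] <- c(beta, sigma2)
  }
  draws
}
```

Drawing the entire coefficient vector in one multivariate-normal step is what makes this a "blocked" sampler, rather than updating each coefficient one at a time; note also that `XtX_inv` and `beta_hat` depend only on the data, so they are computed once outside the loop.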
Here is the vector of true parameter values used in my simulation.
Running the Gibbs sampler generates estimates of the true coefficients and the variance parameter. I ran 500,000 iterations, with a burn-in of 100,000 iterations and thinning to every 10th draw.
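Assuming the draws are stored one row per iteration, the burn-in and thinning described above amount to simple index arithmetic:

```r
n_iter <- 500000
burn_in <- 100000
thin <- 10

# keep every 10th draw after discarding the burn-in
keep <- seq(burn_in + thin, n_iter, by = thin)
length(keep)  # 40000 retained draws
```

The retained rows (e.g. `draws[keep, ]`) form the posterior sample used for all summaries below.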
The following is a plot of the MCMC chains, with the true values shown as red lines.
# Compute posterior summary statistics
post_sum_stats <- post_dist %>%
  group_by(param) %>%
  summarise(median = median(draw),
            lwr = quantile(draw, .025),
            upr = quantile(draw, .975))

# Join the summary statistics onto the draws
post_dist <- post_dist %>%
  left_join(post_sum_stats, by = 'param')

# Plot the MCMC chains
ggplot(post_dist, aes(x = iter, y = draw)) +
  geom_line() +
  geom_hline(aes(yintercept = true_vals), color = 'red')
Here are the posterior distributions of the parameters after burn-in and thinning:
ggplot(post_dist, aes(x = draw)) +
  geom_histogram(bins = 50) +
  geom_vline(aes(xintercept = true_vals), color = 'red')
It appears we can obtain reasonable posterior estimates of the parameters. To make sure the Bayesian estimator works properly, I repeated this process for 1,000 simulated datasets.
This produces 1,000 posterior means and 1,000 95% credible intervals. On average, the 1,000 posterior means should be centered on the true values, and the true parameter values should fall within the 95% credible intervals about 95% of the time.
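The bias and coverage calculations can be sketched as below. The posterior means and interval endpoints here are illustrative stand-ins; in practice each would come from one run of the Gibbs sampler on a fresh simulated dataset:

```r
set.seed(7)
true_val <- 2       # illustrative true parameter value
n_sims <- 1000

# stand-ins for per-simulation posterior summaries:
# each entry would normally come from one sampler run
post_mean <- rnorm(n_sims, mean = true_val, sd = 0.1)
lwr <- post_mean - 1.96 * 0.1
upr <- post_mean + 1.96 * 0.1

avg_estimate <- mean(post_mean)
pct_bias <- 100 * (avg_estimate - true_val) / true_val
coverage <- mean(lwr <= true_val & true_val <= upr)
```

A well-calibrated estimator should show a percent bias near zero and coverage near the nominal 95%.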
The following is a summary of these assessments.
The "estimated mean" column is the average of the posterior means across all 1,000 simulations. The percent bias is below 5%, and for all parameters the coverage of the 95% credible intervals is about 95%.
We can extend this model in many ways. For example, distributions other than the normal can be used to fit different types of outcomes; with binary outcome data, we could use a Bernoulli likelihood with a logit link.
We then place a prior distribution on the coefficients. This idea extends Bayesian linear regression to Bayesian GLMs.
In the linear case outlined in this article, the covariance matrix could also be modeled more flexibly. Here we assumed the covariance matrix is diagonal with a single common variance, which is the homoskedasticity assumption of multiple linear regression. If the data are clustered (for example, multiple observations per subject), we could model the entire covariance matrix using an inverse-Wishart distribution.