# Implementing a Bayesian linear regression model in Python with PyMC3

Time: 2021-8-27

In this article, we introduce regression modeling within the Bayesian framework and use the PyMC3 MCMC library for inference. We first review the classical frequentist approach to multiple linear regression, then discuss how Bayesian statistics treats linear regression.

# Bayesian linear regression with pymc3

In this section, we follow a classic statistical approach: simulate data with properties we already know, then fit a model to recover those original properties.

# What is generalized linear model?

Before we begin discussing Bayesian linear regression, I want to briefly outline the concept of a generalized linear model (GLM), because we will use GLMs to build our model in PyMC3.

A generalized linear model is a flexible way of extending ordinary linear regression to more general forms of regression, including logistic regression (for classification), Poisson regression (for count data), and linear regression itself.

A GLM allows the dependent variable to have an error distribution other than the normal distribution.
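The shared structure behind these models can be written compactly (this is the standard textbook formulation, not something specific to PyMC3):

```latex
\mathbb{E}[y] = g^{-1}(X\beta)
```

Here \(g\) is the link function: the identity link recovers ordinary linear regression, the logit link gives logistic regression, and the log link gives Poisson regression.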

# Simulating the data and fitting the model with PyMC3

Before we can specify and sample from a Bayesian model with PyMC3, we need to simulate some noisy linear data.
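As a sketch, the simulation step might look like the following. The true slope β₁ = 2 matches the value mentioned later in the article; the intercept, noise variance, and sample size are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def simulate_linear_data(N=100, beta_0=1.0, beta_1=2.0, eps_sigma_sq=0.25, seed=42):
    """Simulate y = beta_0 + beta_1 * x + eps, with eps ~ N(0, eps_sigma_sq).

    beta_1 = 2 follows the article; beta_0, eps_sigma_sq, and N are assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, N)                      # predictor values in [0, 1]
    eps = rng.normal(0.0, np.sqrt(eps_sigma_sq), N)   # Gaussian noise
    return pd.DataFrame({"x": x, "y": beta_0 + beta_1 * x + eps})

df = simulate_linear_data()
```

A scatter plot of `df` (for example with Seaborn) then shows the noisy linear relationship we will try to recover.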

The output is shown in the following figure: noisy linear data simulated with NumPy, pandas, and Seaborn.

Now that we have simulated data, we want to fit a Bayesian linear regression to it. This is where the GLM approach comes in.

We then find the maximum a posteriori (MAP) estimate to initialize the MCMC sampler. Finally, we use the No-U-Turn Sampler (NUTS) for the actual inference, plot the trace of the model, and discard the first 500 samples as "burn-in".

The traceplot is shown in the figure below: a Bayesian GLM linear regression model fitted to the simulated data with PyMC3.

First, we use Seaborn's lmplot method with the fit_reg parameter set to False, so that no frequentist regression line is drawn. Then we plot posterior predictive regression lines for 100 samples. Finally, we plot the original "true" regression line with slope β₁ = 2.

The range of sampled regression lines can be seen in the figure below: