## 1. The idea of the linear regression algorithm

Machine learning algorithms can be divided into supervised learning and unsupervised learning.

What is a supervised learning algorithm?

Supervised learning is the most commonly used machine learning approach: it is the task of inferring a model from a labeled training data set.

Regression is a supervised learning algorithm. From a machine learning perspective, a regression algorithm builds a model of the mapping between attributes (x) and labels (y).

Linear regression is a regression analysis that models the relationship between one or more independent variables and a dependent variable.

Its defining characteristic is that the model is a linear combination of one or more parameters, called regression coefficients.

## 2. Linear regression example

| House area (m²) | Rent (yuan) |
|---|---|
| 10.0 | 800 |
| 15.5 | 1200 |
| 20.2 | 1600 |
| 35.0 | 2500 |
| 48.3 | 3300 |
| 58.9 | 3800 |
| 65.2 | 4500 |

Taking each row of the data above as one sample, we get the following relationship:

```
X (house area) y (rent)
0 10 800
1 15.5 1200
...
5 65.2 4500
```


Given the data above, what would the rent be for a house with an area of 80 square meters?

First, we need to find the mapping between house area and rent, a line of the form y = kx + b.

Then, using the mapping y = f(x), the rent can be predicted.

This example has a single feature. What if there are two features? Then what we are looking for is a plane:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

Expanding to more features, the mapping we are looking for is

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

where the $x_i$ are the feature values. Note that $\theta_0$ is multiplied by $x_0$, with $x_0 = 1$, so that $\theta_0$ acts as the intercept. We therefore get the equation

$$h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i$$

Using a vector to represent the above formula, we finally get

$$h_\theta(x) = \theta^T x$$
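The vector form above can be sketched directly in NumPy. The parameter values and features below are made-up illustrations, not fitted values:

```python
import numpy as np

# Hypothetical parameters: theta_0 is the intercept, theta_1 and theta_2 are feature weights
theta = np.array([200.0, 60.0, 30.0])

# One sample with two features (say, area and number of rooms), with x_0 = 1 prepended
x = np.array([1.0, 35.0, 2.0])

# h_theta(x) = theta^T x
h = theta @ x
print(h)  # 200 + 60*35 + 30*2 = 2360.0
```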

## 3. Error

We now have a model, but there is obviously an error between the predicted value and the true value; we use $\varepsilon$ to denote this error.

For each sample,

$$y^{(i)} = \theta^T x^{(i)} + \varepsilon^{(i)} \tag{1}$$

By the central limit theorem of probability theory, the errors $\varepsilon^{(i)}$ can be assumed to be independent and identically distributed, following a Gaussian distribution with mean 0 and variance $\sigma^2$.

Therefore,

$$p\left(\varepsilon^{(i)}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(\varepsilon^{(i)}\right)^2}{2\sigma^2}\right) \tag{2}$$

Substituting equation (1) into equation (2):

$$p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$$

The likelihood function is then

$$L(\theta) = \prod_{i=1}^{m} p\left(y^{(i)} \mid x^{(i)}; \theta\right) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y^{(i)} - \theta^T x^{(i)}\right)^2}{2\sigma^2}\right)$$

For ease of solution, take the logarithm:

$$\log L(\theta) = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

The first term is a constant, so $\log L(\theta)$ is maximized exactly when

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left(y^{(i)} - \theta^T x^{(i)}\right)^2$$

is minimized. $J(\theta)$ is our loss function.

Transforming further into matrix form:

$$J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)$$

Then find its partial derivative with respect to $\theta$:

$$\frac{\partial J(\theta)}{\partial \theta} = X^T X \theta - X^T y$$

Setting the partial derivative to 0, we can finally solve for

$$\theta = \left(X^T X\right)^{-1} X^T y$$

This is the least squares method, one of the ways to minimize the linear regression loss function.
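The closed-form solution $\theta = (X^T X)^{-1} X^T y$ can be checked directly on the rent data from section 2; a minimal NumPy sketch:

```python
import numpy as np

area = np.array([10, 15.5, 20.2, 35.0, 48.3, 58.9, 65.2])
rent = np.array([800, 1200, 1600, 2500, 3300, 3800, 4500])

# Design matrix X with a column of ones (x_0 = 1) for the intercept
X = np.column_stack([np.ones_like(area), area])

# Normal equation: theta = (X^T X)^(-1) X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ rent
print(theta)  # [intercept, slope]; the slope is about 63.67, matching sklearn's coef_ in the next section
```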

A code example for the house area / rent relationship above:

```python
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression

# House area data
x_list = [10, 15.5, 20.2, 35.0, 48.3, 58.9, 65.2]
# Corresponding rent data
y_list = [800, 1200, 1600, 2500, 3300, 3800, 4500]

x = np.array(x_list).reshape(-1, 1)
y = np.array(y_list).reshape(-1, 1)

model = LinearRegression()
model.fit(x, y)
y_plot = model.predict(x)
print(model.coef_)

plt.figure(figsize=(5, 5), dpi=80, facecolor='w')
plt.scatter(x, y, color='red', linewidths=2)
plt.plot(x, y_plot, color='blue')
plt.xticks(list(range(5, 70, 5)))
plt.grid(alpha=0.4)
plt.show()
```

Result (the fitted slope):

`[[63.66780288]]`
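With the fitted line, the earlier question, the rent of an 80 m² house, can be answered. A sketch using NumPy's least-squares polynomial fit instead of sklearn:

```python
import numpy as np

area = np.array([10, 15.5, 20.2, 35.0, 48.3, 58.9, 65.2])
rent = np.array([800, 1200, 1600, 2500, 3300, 3800, 4500])

# Fit y = slope * x + intercept by least squares
slope, intercept = np.polyfit(area, rent, 1)

# Predict the rent of an 80 m^2 house
pred = slope * 80 + intercept
print(round(pred))  # roughly 5300 yuan
```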

## 4. Example: Boston house price forecast

Load the Boston housing data set from `sklearn.datasets`, build a house price prediction model with standard linear regression, and draw scatter and line plots of the predicted and true house prices.

Code example

```python
# coding:utf-8
# note: load_boston was removed in scikit-learn 1.2; an older version is needed to run this as-is
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from matplotlib import pyplot as plt
from matplotlib import font_manager

font = font_manager.FontProperties(fname="/usr/share/fonts/wps-office/msyhbd.ttf")


def my_predic_fun():
    """
    Forecast Boston house prices using linear regression.
    :return: predicted prices, true prices
    """
    lb = load_boston()
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)
    x_std = StandardScaler()
    y_std = StandardScaler()
    x_train = x_std.fit_transform(x_train)
    x_test = x_std.transform(x_test)
    y_train = y_std.fit_transform(y_train.reshape(-1, 1))
    y_test = y_std.transform(y_test.reshape(-1, 1))
    model = LinearRegression()
    model.fit(x_train, y_train)
    y_predict = y_std.inverse_transform(model.predict(x_test))
    return y_predict, y_std.inverse_transform(y_test)


def draw_fun(y_predict, y_test):
    """
    Draw scatter and line plots of predicted and true house prices.
    :param y_predict: predicted prices
    :param y_test: true prices
    :return:
    """
    x = range(1, len(y_predict) + 1)
    plt.figure(figsize=(20, 8), dpi=80)
    plt.scatter(x, y_test, label="true value", color='blue')
    plt.scatter(x, y_predict, label="predicted value", color='red')
    plt.plot(x, y_test)
    plt.plot(x, y_predict)
    plt.legend(prop=font, loc='best')
    plt.xticks(list(x))
    plt.yticks(list(range(0, 60, 5)))
    plt.grid(alpha=0.4)
    plt.show()


if __name__ == '__main__':
    y_predict, y_test = my_predic_fun()
    draw_fun(y_predict, y_test)
```
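Beyond the visual comparison in the plot, the prediction error can also be quantified. A small sketch computing the mean squared error; the arrays below are made-up stand-ins for `y_predict` and `y_test`:

```python
import numpy as np

# Hypothetical predicted and true prices (stand-ins for y_predict and y_test above)
y_predict = np.array([24.0, 21.5, 30.2, 18.9])
y_test = np.array([22.8, 20.1, 31.0, 17.5])

# Mean squared error: the average of the squared residuals
mse = np.mean((y_predict - y_test) ** 2)
print(mse)  # 1.5
```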

Result: a figure comparing the predicted values with the true values.

Reference: https://blog.csdn.net/guoyunf…