- Linear regression is a classical statistical model. It is used to predict a continuous numerical variable (the dependent variable) from known variables (the independent variables), and is commonly applied to stock price forecasting, revenue forecasting, advertising effect prediction, and sales performance forecasting.
-
Univariate linear regression:
-
Basic concepts:
-
Univariate linear regression analyzes the linear relationship between a single independent variable x and a dependent variable y. The value of an economic index is often affected by many factors; if only one of them is the main factor and plays a decisive role, univariate linear regression can be used for prediction and analysis. The data set can be written as {(x1, y1), (x2, y2), …, (xn, yn)}, where xi is the i-th value of the independent variable x, yi is the i-th value of the dependent variable y, and n is the sample size of the data set. Once the model is built, the value of the dependent variable y can be predicted from a new value of the independent variable x. In standard notation, the mathematical formula of the model can be expressed as: y = β0 + β1·x + ε, where β0 is the intercept, β1 is the slope, and ε is the error term.
-
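As a rough illustration of how such a model is fitted, the sketch below computes the ordinary least-squares slope and intercept in closed form for a line of the form y = b0 + b1·x. It uses the height/weight sample that appears in this article; the names `b0`, `b1`, `x_mean`, and `y_mean` are chosen here for illustration and are not part of the original text.

```python
import numpy as np

# Height/weight sample used in this article (column 0: height, column 1: weight)
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
x, y = data[:, 0], data[:, 1]

# Closed-form least-squares estimates for the line y = b0 + b1 * x
x_mean, y_mean = x.mean(), y.mean()
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
b0 = y_mean - b1 * x_mean                                            # intercept

print(f"slope b1 = {b1:.4f}, intercept b0 = {b0:.4f}")
print(f"predicted weight at height 170: {b0 + b1 * 170:.1f}")
```

This fits all ten rows at once, so the numbers can differ slightly from a model trained on a random 80% subset of the data.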
Implementation in Python:
- Import the packages and related libraries we need
-
# Import the sklearn library and use its linear regression module
from sklearn import datasets, linear_model
# Import train_test_split to divide the data set into a training set and a test set
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Example data: 10 rows and 2 columns. The first column is height and the second column is weight.
# Common practice when splitting raw data: use 80% of it as training data to train the model
# and the other 20% as test data, so that the model can be evaluated directly on the test data
# and improved before it enters a real environment.
data = np.array([[152,51],[156,53],[160,54],[164,55],
[168,57],[172,60],[176,62],[180,65],
[184,69],[188,72]])
# X and y store the features and the labels respectively. reshape is needed because
# data[:, 0] is a one-dimensional array, while the model expects the features as a 2-D matrix.
X,y = data[:,0].reshape(-1,1),data[:,1]
# Split the data into a training set and a test set
# train_size=0.8 means that 80% of the data is randomly selected as training data
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.8)
#Linear regression algorithm model
regr = linear_model.LinearRegression()
#Fitting data, training model
regr.fit(X_train,y_train)
# score returns the coefficient of determination R^2
regr.score(X_train,y_train)
- Coefficient of determination: R^2 = 1 - u / v
- u = sum of squares of (actual value of y - predicted value of y), i.e. the residual sum of squares
- v = sum of squares of (actual value of y - mean of the actual values of y), i.e. the total sum of squares
- Output: R^2 = 0.963944147932503 (the exact value changes from run to run, because the training set is chosen at random)
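The definition above can be checked by hand. The following sketch computes u and v with NumPy and compares the result with what `LinearRegression.score` returns. It fits the model on all ten rows rather than on a random split, which is an assumption made here so that the result is the same on every run; the variable names `model` and `r2_manual` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Height/weight sample used in this article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
X, y = data[:, 0].reshape(-1, 1), data[:, 1]

# Fit on all ten rows so the result does not depend on a random split
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

u = np.sum((y - y_pred) ** 2)    # residual sum of squares
v = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2_manual = 1 - u / v

# Both values should agree up to floating-point rounding
print(r2_manual, model.score(X, y))
```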
-
font = {'family': "SimHei", 'size': 20}
plt.rc('font', **font)
# Training data
plt.scatter(X_train, y_train, color='r')
# Draw the fitted line
plt.plot(X_train, regr.predict(X_train), color='b')
# Test data
plt.scatter(X_test, y_test, color='black')
plt.xlabel('height')
plt.ylabel('body weight')
plt.show()
Let’s make a simple prediction. What’s the weight of a person with a height of 170?
-
np.round(regr.predict([[170]]),1)
The output is array([59.8]), so according to our model a person with a height of 170 is predicted to weigh about 59.8 kg.
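Because `train_test_split` shuffles the rows at random, the score and the prediction above change slightly on every run. The sketch below shows one way to make the experiment repeatable and to measure the model on the held-out 20%, as the 80/20 practice described earlier suggests. The choice of `random_state=0` and of mean absolute error as the test metric are assumptions made for this example, not part of the original walkthrough.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Height/weight sample used in this article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
X, y = data[:, 0].reshape(-1, 1), data[:, 1]

# random_state fixes the shuffle, so the same rows land in each set on every run
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Mean absolute error on the two held-out rows, in the same unit as weight
test_mae = mean_absolute_error(y_test, model.predict(X_test))
pred_170 = model.predict([[170]])[0]

print(f"test MAE: {test_mae:.2f}")
print(f"predicted weight at height 170: {pred_170:.1f}")
```

With only two test rows the error estimate is very noisy; on a data set this small, cross-validation would usually give a steadier picture.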