Python machine learning (I) – linear regression

Time: 2022-1-12
  • The linear regression model is a classical statistical model. Its typical use is to predict a continuous numerical variable (the dependent variable) from known variables (the independent variables); for example, linear regression can be applied to stock price forecasting, revenue forecasting, advertising effect prediction, and sales performance forecasting.
  • Univariate linear regression:

  • Basic concepts:

  • Univariate (simple) linear regression analyzes the linear relationship between a single independent variable x and a dependent variable y. The value of an economic indicator is often affected by many factors; if only one of them is the main factor and plays a decisive role, univariate linear regression can be used for prediction and analysis. The data set can be expressed as $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_i$ is the ith value of the independent variable x, $y_i$ is the ith value of the dependent variable y, and n is the sample size. Once the model is built, the value of y can be predicted from new values of x. The model can be expressed mathematically as:

    $$y = a + bx + \varepsilon$$
    where a is the intercept, b is the slope (regression coefficient), and $\varepsilon$ is the random error term.
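For the model y = a + bx + ε, ordinary least squares picks a and b to minimize the sum of squared errors, and for a single independent variable this has a closed-form solution: b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − b·x̄. The sketch below computes these estimates by hand with NumPy on the height/weight data used later in this article and cross-checks them against `np.polyfit`. The helper name `fit_simple_ols` is hypothetical, chosen only for illustration.

```python
import numpy as np

# Height (cm) and weight (kg) data, the same 10 samples used later in the article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
x = data[:, 0].astype(float)
y = data[:, 1].astype(float)


def fit_simple_ols(x, y):
    """Hypothetical helper: closed-form least-squares estimates for y = a + b*x."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel)
    b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point of means (x_mean, y_mean)
    a = y_mean - b * x_mean
    return a, b


a, b = fit_simple_ols(x, y)
print(f"intercept a = {a:.4f}, slope b = {b:.4f}")

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept]
b_np, a_np = np.polyfit(x, y, 1)
print(np.isclose(a, a_np) and np.isclose(b, b_np))
```

Because a = ȳ − b·x̄, the fitted line always passes through the point of means; for this data x̄ = 170 and ȳ = 59.8.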

    Implementation in Python:

  • Import the packages and related libraries we need
#Import sklearn's linear_model module, which provides linear regression
from sklearn import linear_model
#Import train_test_split to divide the data set into a training set and a test set
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

#Example data: 10 rows and 2 columns; the first column is height (cm) and the second is weight (kg).
#Usual practice: split the raw data so that 80% is used as training data to train the model and
#the remaining 20% is used as test data to judge its performance, so the model can be improved
#before it is used in a real environment.

data = np.array([[152,51],[156,53],[160,54],[164,55],
                 [168,57],[172,60],[176,62],[180,65],
                 [184,69],[188,72]])

#X holds the features and y the labels. reshape(-1, 1) turns the 1-D array data[:, 0]
#into a 2-D array of shape (n_samples, 1), which is the input shape the model expects
X,y = data[:,0].reshape(-1,1),data[:,1]
#Split into a training set and a test set:
#train_size=0.8 randomly draws 80% of the samples as training data (the split, and
#therefore the results below, change from run to run unless random_state is set)
X_train,X_test,y_train,y_test = train_test_split(X,y,train_size=0.8)

#Create the linear regression model
regr = linear_model.LinearRegression()
#Fit the model to the training data
regr.fit(X_train,y_train)
#score returns the coefficient of determination R² of the model on the given data
regr.score(X_train,y_train)
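After `fit`, scikit-learn's `LinearRegression` stores the estimated slope b in `coef_` (one entry per feature) and the intercept a in `intercept_`, so the fitted model can be read back as weight = a + b · height. A minimal sketch, assuming a fixed `random_state=42` (an arbitrary value not used in the article) so the split is reproducible; the numbers it prints will therefore differ from the article's run.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Same height (cm) / weight (kg) data as in the article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
X, y = data[:, 0].reshape(-1, 1), data[:, 1]

# random_state=42 is an assumption (not in the article) that makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# coef_ holds one slope per feature (only height here); intercept_ is a scalar
b = regr.coef_[0]
a = regr.intercept_
print(f"weight = {a:.3f} + {b:.3f} * height")

# With a single feature, predict() computes exactly a + b * x
manual = a + b * X_test[:, 0]
print(np.allclose(regr.predict(X_test), manual))
```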
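  • score returns the coefficient of determination $R^2 = 1 - u/v$, where:
  • $u = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the residual sum of squares (actual values of y minus the predicted values $\hat{y}_i$);
  • $v = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares (actual values of y minus their mean $\bar{y}$).
  • Output: $R^2 = 0.963944147932503$ (the exact value depends on the random train/test split).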
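To check that `score` really computes 1 − u/v, the sketch below evaluates u and v by hand and compares the result with `regr.score` and `sklearn.metrics.r2_score`. For simplicity and determinism it fits on all 10 samples rather than a random 80% split, so its value is not the article's 0.9639.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import r2_score

# Same height (cm) / weight (kg) data as in the article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
X, y = data[:, 0].reshape(-1, 1), data[:, 1]

# Fit on all samples (a simplification for this check only)
regr = linear_model.LinearRegression().fit(X, y)
y_pred = regr.predict(X)

u = np.sum((y - y_pred) ** 2)      # residual sum of squares
v = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2_manual = 1 - u / v

print(f"manual R^2 = {r2_manual:.6f}")
print(np.isclose(r2_manual, regr.score(X, y)), np.isclose(r2_manual, r2_score(y, y_pred)))
```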
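  • Plot the training data, the fitted line, and the test data:
    #The original used the SimHei font for Chinese axis labels; with English labels only the size is needed
    font = {'size': 20}
    plt.rc('font', **font)
    #Training data (red)
    plt.scatter(X_train, y_train, color='r')
    #Fitted regression line (blue)
    plt.plot(X_train, regr.predict(X_train), color='b')
    #Test data (black)
    plt.scatter(X_test, y_test, color='black')
    plt.xlabel('Height (cm)')
    plt.ylabel('Weight (kg)')
    plt.show()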

    (Figure: training data in red, test data in black, and the fitted regression line in blue, with height on the x-axis and weight on the y-axis.)

    Let’s make a simple prediction: what is the predicted weight of a person who is 170 cm tall?

  • np.round(regr.predict([[170]]), 1)
    Output: array([59.8]). According to our model, a person who is 170 cm tall is predicted to weigh 59.8 kg.
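The article scores the model only on the training data, but the 80/20 split exists so the model can be judged on data it has not seen. A sketch, again assuming `random_state=42` (not from the article), that reports R² on the held-out test set and predicts several heights at once; the heights 150, 170, and 190 cm are illustrative values. With only two test samples the test R² is very noisy, so treat it as a sanity check rather than a reliable score.

```python
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Same height (cm) / weight (kg) data as in the article
data = np.array([[152, 51], [156, 53], [160, 54], [164, 55],
                 [168, 57], [172, 60], [176, 62], [180, 65],
                 [184, 69], [188, 72]])
X, y = data[:, 0].reshape(-1, 1), data[:, 1]

# random_state=42 is an illustrative assumption so that the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)
regr = linear_model.LinearRegression().fit(X_train, y_train)

# R^2 on the 2 held-out samples the model never saw during fitting
test_r2 = regr.score(X_test, y_test)
print(f"test R^2 = {test_r2:.4f}")

# predict() expects a 2-D input: one row per person, one column (height in cm)
heights = np.array([150, 170, 190]).reshape(-1, 1)
predicted = np.round(regr.predict(heights), 1)
for h, w in zip(heights[:, 0], predicted):
    print(f"{h} cm -> {w} kg")
```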