III Machine learning algorithms – linear regression (2)

Time: 2021-12-31

1. Gradient descent method

The loss function written above can be solved with the least square method, whose closed-form (normal equation) solution is θ = (XᵀX)⁻¹Xᵀy.

In addition to the least square method, gradient descent can also be used.
First give θ a random value, then move in the direction of the negative gradient, so that after each iteration the new θ makes J(θ) smaller than before.

θ = θ − α · ∂J(θ)/∂θ
Here α is the learning rate, or step size, which controls how fast the iteration converges.

Take the function y = (x − 0.1)² / 2 as an example and use gradient descent to find the value of x at which y reaches its minimum.
Code example:

# coding:utf-8
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import font_manager


font = font_manager.FontProperties(fname="/usr/share/fonts/wps-office/msyhbd.ttf", size=25)

class OneGard(object):
    def __init__(self,fx,hx):
        """
        :param fx: the original function f(x)
        :param hx: the derivative of f(x)
        """
        self.fx = fx
        self.hx = hx
        self.x = None
        self.GD_X = []
        self.GD_Y = []
        self.iter_num = 0
        self.f_change = None
        self.f_current = None

    def gard_fun(self,x, alpha=0.5):
        """
        gradient descent
        :param x: initial (random) value of x
        :param alpha: learning rate
        :return:
        """
        self.x = x
        self.f_change = self.fx(self.x)
        self.f_current = self.f_change
        self.GD_X.append(x)
        self.GD_Y.append(self.f_current)
        while self.f_change > 1e-10 and self.iter_num < 100:
            self.iter_num += 1
            self.x = self.x - alpha * self.hx(self.x)
            tmp = self.fx(self.x)
            self.f_change = np.abs(self.f_current - tmp)
            self.f_current = tmp
            self.GD_X.append(self.x)
            self.GD_Y.append(self.f_current)


def f(x):
    """
    y = (x - 0.1)²/2
    :param x:
    :return:
    """
    return (x - 0.1) ** 2 /2

def h(x):
    """
    y = (x - 0.1) ²/ Derivative of 2
    :param x:
    :return:
    """
    return (x - 0.1)



gard = OneGard(f, h)
gard.gard_fun(x=4,alpha=0.5)

Print ("final X: {:. 2F}, Y: {:. 2F}". Format (gard. X, gard. F_current))
Print ("number of iterations {}". Format (gard. Iter_num))
Print ("value of iteration process X: \ n {}". Format (gard. Gd_x))

#Drawing
X = np.arange(-4, 4.5, 0.05)
Y = np.array(list(map(lambda t: f(t), X)))

plt.figure(figsize=(20,10), facecolor='w')
plt.plot(X, Y, 'r-', linewidth=2)
plt.plot(gard.GD_X, gard.GD_Y, 'bo--', linewidth=2)

plt.show()

The result is

final x: 0.10, y: 0.00
number of iterations: 19
x values during iteration:
[4, 2.05, 1.075, 0.5874999999999999, 0.34374999999999994, 0.221875, 0.1609375, 0.13046875000000002, 0.11523437500000001, 0.10761718750000002, 0.10380859375000001, 0.10190429687500001, 0.1009521484375, 0.10047607421875, 0.10023803710937501, 0.10011901855468751, 0.10005950927734375, 0.10002975463867188, 0.10001487731933595, 0.10000743865966798]

The image is as follows:
[figure: the curve y = (x − 0.1)²/2 in red, with the gradient descent iterates shown as blue dots converging to x = 0.1]
There are three commonly used variants of gradient descent in machine learning:
batch gradient descent (BGD), stochastic gradient descent (SGD) and mini-batch gradient descent (MBGD).
Reference: Three forms of gradient descent: BGD, SGD and MBGD
(That article is clearly written. When I studied this before, my head was a muddle; after reading it, everything suddenly clicked.)
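The only difference between the three variants is how many samples are used to compute each gradient step. The NumPy sketch below is my own illustrative example (not from the referenced article); it applies all three to the same linear regression loss and only changes batch_size.

import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-3, 3, 100)]         # design matrix with a bias column
y = X @ np.array([0.5, 2.0]) + rng.normal(0, 0.3, 100)   # true parameters: 0.5 and 2.0

def gradient(theta, Xb, yb):
    # gradient of the squared-error loss on the (mini-)batch (Xb, yb)
    return Xb.T @ (Xb @ theta - yb) / len(yb)

def descend(batch_size, alpha=0.1, epochs=50):
    theta = np.zeros(2)
    for _ in range(epochs):
        order = rng.permutation(len(y))                  # shuffle the samples every epoch
        for start in range(0, len(y), batch_size):
            batch = order[start:start + batch_size]
            theta -= alpha * gradient(theta, X[batch], y[batch])
    return theta

print("BGD :", descend(batch_size=len(y)))   # the whole data set per step
print("SGD :", descend(batch_size=1))        # one sample per step
print("MBGD:", descend(batch_size=16))       # a small batch per step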

2. Polynomial regression

Linear regression only needs to be linear in θ; with respect to the samples themselves it can be nonlinear.
For example:
y = θ₀ + θ₁x + θ₂x²
Then let x₁ = x and x₂ = x², and we get
y = θ₀ + θ₁x₁ + θ₂x₂
This turns into our familiar linear regression.

Polynomial expansion maps points from a low-dimensional space into a higher-dimensional space.
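As a concrete sketch (my own illustrative example, assuming scikit-learn is available), PolynomialFeatures can generate the expanded columns x₁ = x and x₂ = x², after which an ordinary linear regression is fitted on them:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, (200, 1))
y = 1.0 + 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.2, 200)

poly = PolynomialFeatures(degree=2, include_bias=False)   # generates the columns x and x²
x_poly = poly.fit_transform(x)                            # i.e. x1 = x, x2 = x²

model = LinearRegression().fit(x_poly, y)
print(model.intercept_, model.coef_)                      # roughly 1.0 and [2.0, 0.5]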

3. Other linear regression

3.1 ridge regression

Linear regression with L2 regularization is usually called ridge regression. The difference from standard linear regression is that an L2 regularization term is added to the loss function:
J(θ) = ½ Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ θⱼ²

λ is a constant coefficient, the regularization coefficient. It is a hyperparameter and needs to be tuned.
If λ is too small, the model loses its ability to suppress overfitting; if λ is too large, the penalty is too strong and the model underfits.
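A minimal sketch of tuning λ (my own illustrative example; in scikit-learn the regularization coefficient is called alpha, and RidgeCV selects it by cross-validation):

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
# RidgeCV tries each candidate value and keeps the one with the best cross-validation score
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
print("selected regularization strength:", model.alpha_)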

3.2 lasso regression

The linear regression model that uses L1 regularization is called lasso regression. The difference from ridge regression is that the added term is an L1 regularization term:
J(θ) = ½ Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ Σⱼ |θⱼ|

3.3 elastic network

The elastic net uses both L1 regularization and L2 regularization at the same time:
J(θ) = ½ Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)² + λ (ρ Σⱼ |θⱼ| + (1 − ρ) Σⱼ θⱼ²)

3.4 comparison of ridge regression and lasso regression

1) Both can be used to solve the overfitting problem of standard linear regression.
2) Lasso can be used for feature selection, but ridge regression cannot, because lasso can shrink the coefficients of unimportant variables to exactly 0 while ridge regression cannot (see the sketch below).
Reference: Systematic interpretation of linear regression, Lasso regression, ridge regression and elastic network
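A small sketch of point 2 (my own illustrative example on synthetic data, assuming scikit-learn): when only a few features are informative, lasso drives the remaining coefficients to exactly 0, ridge only shrinks them, and the elastic net sits in between.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 10 features, but only 3 of them actually influence the target
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

print("coefficients set exactly to 0:")
print("  ridge      :", np.sum(ridge.coef_ == 0))   # typically 0
print("  lasso      :", np.sum(lasso.coef_ == 0))   # several exact zeros
print("  elastic net:", np.sum(enet.coef_ == 0))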

4. Code example

Ridge regression forecasting Boston house prices

from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from matplotlib import pyplot as plt
from matplotlib import font_manager

font = font_manager.FontProperties(fname="/usr/share/fonts/wps-office/msyhbd.ttf", size=25)

def radge_fun():
   """
   Ridge regression forecasting Boston house prices
   :return:
   """
   lb = load_boston()

   x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)

   x_std = StandardScaler()
   y_std = StandardScaler()

   x_train = x_std.fit_transform(x_train)
   x_test = x_std.transform(x_test)
   y_train = y_std.fit_transform(y_train.reshape(-1,1))
   y_test = y_std.transform(y_test.reshape(-1,1))

   model = Ridge(alpha=1.0)

   model.fit(x_train, y_train)

   y_predict = y_std.inverse_transform(model.predict(x_test))
   return y_predict, y_std.inverse_transform(y_test)


def draw_fun(y_predict, y_test):
   """
   Draw scatter and line charts of house price forecast and real value
   :param y_predict:
   :param y_test:
   :return:
   """
   x = range(1,len(y_predict)+1)
   plt.figure(figsize=(25, 10), dpi=80)
   plt.scatter(x, y_test, label="true value", color='blue')
   plt.scatter(x, y_predict, label="predicted value", color='red')
   plt.plot(x,y_test)
   plt.plot(x,y_predict)

   x_tick = list(x)
   y_tick = list(range(0,60,5))

   plt.legend(prop=font, loc='best')
   plt.xticks(list(x), x_tick)
   plt.yticks(y_tick)
   plt.grid(alpha=0.8)
   plt.show()


if __name__ == '__main__':
   y_predict, y_test = radge_fun()
   draw_fun(y_predict, y_test)

Result:
[figure: scatter and line chart comparing the predicted and true house prices on the test set]