# III Machine learning algorithms – linear regression (2)

Time: 2021-12-31

The previous post solved the loss function with the least squares method. Besides least squares, gradient descent can also be used.
We first give θ a random initial value and then move in the direction of the negative gradient, so that the θ obtained in each iteration makes J(θ) smaller than before:

$$\theta \leftarrow \theta - \alpha \frac{\partial J(\theta)}{\partial \theta}$$

Here α is the learning rate, or step size, which affects the speed of the iteration.
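To get a feel for how the learning rate affects the speed of iteration, here is a minimal sketch. The loss J(θ) = θ²/2, the helper `gradient_descent`, and the chosen α values are illustrative assumptions, not part of the original post.

```python
def gradient_descent(grad, theta0, alpha, tol=1e-6, max_iter=1000):
    """Repeat theta <- theta - alpha * grad(theta).

    Stops when the step is smaller than tol, when theta blows up,
    or after max_iter iterations. Returns (theta, iterations used).
    """
    theta = theta0
    for n_iter in range(1, max_iter + 1):
        step = alpha * grad(theta)
        theta = theta - step
        if abs(step) < tol or abs(theta) > 1e6:
            break
    return theta, n_iter


# Illustrative loss J(theta) = theta**2 / 2 (an assumption for this sketch);
# its gradient is theta and its minimum is at theta = 0.
def grad_J(theta):
    return theta


results = {}
for alpha in (0.01, 0.1, 0.5, 2.5):
    theta, n_iter = gradient_descent(grad_J, theta0=4.0, alpha=alpha)
    results[alpha] = (theta, n_iter)
    print("alpha = {:<4}: theta = {:.3g} after {} iterations".format(alpha, theta, n_iter))
```

With these settings the smallest α is still creeping toward 0 when the iteration budget runs out, α = 0.5 needs far fewer steps than α = 0.1, and α = 2.5 overshoots on every step and diverges.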

Taking the function y = (x − 0.1)²/2 as an example, we use gradient descent to find the value of x at which y reaches its minimum.
Code example:

``````python
# coding: utf-8
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import font_manager

# Font for plot labels; the path is specific to the author's machine
font = font_manager.FontProperties(fname="/usr/share/fonts/wps-office/msyhbd.ttf", size=25)

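class OneGard(object):
    """Gradient descent on a function of one variable."""

    def __init__(self, fx, hx):
        """
        :param fx: original function
        :param hx: derivative of the original function
        """
        self.fx = fx
        self.hx = hx
        self.x = None
        self.GD_X = []  # x value at every iteration
        self.GD_Y = []  # function value at every iteration
        self.iter_num = 0
        self.f_change = None  # change of the function value between iterations
        self.f_current = None

    def gard_fun(self, x, alpha=0.5):
        """
        :param x: initial x value
        :param alpha: learning rate
        :return: None, the results are stored on the instance
        """
        self.x = x
        # There is no previous value yet, so f(x) itself is used as the initial change
        self.f_change = self.fx(self.x)
        self.f_current = self.f_change
        self.GD_X.append(x)
        self.GD_Y.append(self.f_current)
        # Stop when the function value barely changes or after 100 iterations
        while self.f_change > 1e-10 and self.iter_num < 100:
            self.iter_num += 1
            # Move against the gradient
            self.x = self.x - alpha * self.hx(self.x)
            tmp = self.fx(self.x)
            self.f_change = np.abs(self.f_current - tmp)
            self.f_current = tmp
            self.GD_X.append(self.x)
            self.GD_Y.append(self.f_current)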

def f(x):
    """
    y = (x - 0.1)²/2
    :param x:
    :return:
    """
    return (x - 0.1) ** 2 / 2

def h(x):
    """
    Derivative of y = (x - 0.1)²/2
    :param x:
    :return:
    """
    return x - 0.1

gard = OneGard(f, h)
gard.gard_fun(x=4, alpha=0.5)

print("Final X: {:.2f}, Y: {:.2f}".format(gard.x, gard.f_current))
print("Number of iterations {}".format(gard.iter_num))
print("Value of X in iterative process:\n{}".format(gard.GD_X))

# Drawing
X = np.arange(-4, 4.5, 0.05)
Y = f(X)  # f works element-wise on NumPy arrays

plt.figure(figsize=(20, 10), facecolor='w')
plt.plot(X, Y, 'r-', linewidth=2)  # the curve
plt.plot(gard.GD_X, gard.GD_Y, 'bo--', linewidth=2)  # the gradient descent path

plt.show()
``````

The result is:

``````
Final X: 0.10, Y: 0.00
Number of iterations 19
Value of X in iterative process:
[4, 2.05, 1.075, 0.5874999999999999, 0.34374999999999994, 0.221875, 0.1609375, 0.13046875000000002, 0.11523437500000001, 0.10761718750000002, 0.10380859375000001, 0.10190429687500001, 0.1009521484375, 0.10047607421875, 0.10023803710937501, 0.10011901855468751, 0.10005950927734375, 0.10002975463867188, 0.10001487731933595, 0.10000743865966798]
``````

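The resulting plot shows the iterates (blue dots) moving along the curve (red line) toward the minimum.

There are three commonly used forms of gradient descent in machine learning: batch gradient descent (BGD), stochastic gradient descent (SGD), and mini-batch gradient descent (MBGD).
Reference: "Three forms of gradient descent method: BGD, SGD and MBGD"
(It is clearly written. When I studied this before I was thoroughly confused, and everything fell into place after reading this article.)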
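As a rough illustration, the three forms differ only in how many samples are used for each parameter update. The sketch below is not taken from the referenced article: the function `minibatch_gd`, the synthetic data, and the hyperparameters are all assumptions chosen for the example.

```python
import numpy as np


def minibatch_gd(X, y, batch_size, alpha=0.1, epochs=300, seed=0):
    """Gradient descent for linear regression with a mean squared error loss.

    batch_size = number of samples -> batch gradient descent (BGD)
    batch_size = 1                 -> stochastic gradient descent (SGD)
    anything in between            -> mini-batch gradient descent (MBGD)
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    for _ in range(epochs):
        order = rng.permutation(n_samples)  # reshuffle the samples every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            error = X[idx] @ theta - y[idx]
            grad = X[idx].T @ error / len(idx)  # gradient averaged over the batch
            theta -= alpha * grad
    return theta


# Synthetic data (assumed for illustration): y = 2 + 3x + noise
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
X = np.column_stack([np.ones_like(x), x])  # first column is the intercept term
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=200)

theta_bgd = minibatch_gd(X, y, batch_size=len(y))
theta_sgd = minibatch_gd(X, y, batch_size=1)
theta_mbgd = minibatch_gd(X, y, batch_size=16)

print("BGD :", theta_bgd)
print("SGD :", theta_sgd)
print("MBGD:", theta_mbgd)
```

All three should land near the true parameters (2, 3). BGD makes one smooth update per pass over the data, SGD makes many noisy updates, and MBGD sits between the two.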

## 2. Polynomial regression

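Linear regression is linear in the parameters θ; with respect to the samples themselves, the relationship can be nonlinear.
For example, take a model such as

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$$

Then we can set x1 = x and x2 = x², which gives

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$

This turns it into our familiar linear regression.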

Polynomial expansion maps points from a low-dimensional space into a high-dimensional space.
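The idea can be sketched with NumPy alone: expand x into the columns (1, x, x²) and solve an ordinary least squares problem on the expanded features. The data-generating coefficients (1, 2, 0.5) and the noise level are assumptions made up for this example.

```python
import numpy as np

# Synthetic nonlinear data (assumed): y = 1 + 2x + 0.5x^2 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(scale=0.1, size=100)

# Polynomial expansion: one input x becomes the features (1, x1, x2)
# with x1 = x and x2 = x^2
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])

# The model is linear in theta, so ordinary least squares applies directly
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print("estimated theta:", np.round(theta, 3))
```

In scikit-learn the same expansion is available as `sklearn.preprocessing.PolynomialFeatures`, which can be combined with `LinearRegression`.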

## 3.1 ridge regression

Linear regression with L2 regularization is usually called ridge regression. It differs from standard linear regression in that it adds an L2 regularization term, of the form $\lambda \sum_j \theta_j^2$, to the loss function. λ is a constant regularization coefficient; it is a hyperparameter and needs to be tuned.
If λ is too small, the model loses its ability to counter overfitting; if it is too large, the penalty is too strong and the model underfits.
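The effect of λ can be seen in a small sketch that uses the closed-form ridge solution θ = (XᵀX + λI)⁻¹Xᵀy. The helper `ridge_fit`, the synthetic data, and the λ values are assumptions for illustration; the intercept is left out to keep the example short.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||y - X theta||^2 + lam * ||theta||^2."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data (assumed): 5 features, no intercept
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_theta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])
y = X @ true_theta + rng.normal(scale=0.5, size=50)

norms = []
for lam in (0.0, 1.0, 100.0, 10000.0):
    theta = ridge_fit(X, y, lam)
    norms.append(np.linalg.norm(theta))
    print("lambda = {:>8}: theta = {}".format(lam, np.round(theta, 3)))
```

With λ = 0 this is plain least squares. As λ grows the coefficients shrink toward 0, and with a very large λ they are pushed so close to 0 that the model underfits.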

## 3.2 lasso regression

The linear regression model using L1 regularization is called lasso regression. It differs from ridge regression in that the added regularization term is an L1 term, of the form $\lambda \sum_j |\theta_j|$.

## 3.3 elastic net

Elastic net uses both L1 regularization and L2 regularization.

## 3.4 comparison of ridge regression and lasso regression

1) Both can be used to address the overfitting problem of standard linear regression.
2) Lasso can be used for feature selection, but ridge regression cannot: lasso can drive the coefficients of unimportant variables to exactly 0, while ridge regression only shrinks them.
Reference: "Systematic interpretation of linear regression, Lasso regression, ridge regression and elastic network"
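The difference in point 2 can be checked on synthetic data where only the first two of five features matter. This is a sketch under stated assumptions: the lasso solver below is a plain coordinate descent written for this example (not the referenced article's code), and `lasso_fit`, `ridge_fit`, the data, and λ = 0.2 are all illustrative choices.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink a toward 0 by lam, returning exactly 0 when |a| <= lam."""
    if abs(a) <= lam:
        return 0.0
    return np.sign(a) * (abs(a) - lam)


def lasso_fit(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X theta||^2 + lam * ||theta||_1."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's own contribution added back
            residual = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ residual / n_samples
            theta[j] = soft_threshold(rho, lam) / col_sq[j]
    return theta


def ridge_fit(X, y, lam):
    """Closed form for (1/2n)||y - X theta||^2 + (lam/2)||theta||^2."""
    n_samples, n_features = X.shape
    A = X.T @ X / n_samples + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y / n_samples)


# Synthetic data (assumed): only the first two features are relevant
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)

theta_lasso = lasso_fit(X, y, lam=0.2)
theta_ridge = ridge_fit(X, y, lam=0.2)
print("lasso:", np.round(theta_lasso, 3))
print("ridge:", np.round(theta_ridge, 3))
```

Lasso sets the coefficients of the three irrelevant features to exactly 0, which is what makes it usable for feature selection, while ridge leaves them small but non-zero.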

## 4. Code example

Predicting Boston house prices with ridge regression:

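``````python
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example requires an older scikit-learn version.
from sklearn.datasets import load_boston
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

from matplotlib import pyplot as plt
from matplotlib import font_manager

# Font for plot labels; the path is specific to the author's machine
font = font_manager.FontProperties(fname="/usr/share/fonts/wps-office/msyhbd.ttf", size=25)

# NOTE: the original `def` line and the line that loads the dataset are missing
# from the source; the name `ridge_fun` and the `lb = load_boston()` call are assumed.
def ridge_fun():
    """
    Ridge regression forecasting Boston house prices
    :return: predicted prices and true prices for the test set
    """
    lb = load_boston()
    x_train, x_test, y_train, y_test = train_test_split(lb.data, lb.target, test_size=0.2)

    # Standardize features and target; the scalers are fitted on the training set only
    x_std = StandardScaler()
    y_std = StandardScaler()

    x_train = x_std.fit_transform(x_train)
    x_test = x_std.transform(x_test)
    y_train = y_std.fit_transform(y_train.reshape(-1, 1))
    y_test = y_std.transform(y_test.reshape(-1, 1))

    model = Ridge(alpha=1.0)  # alpha is the regularization strength

    model.fit(x_train, y_train)

    # Map predictions and targets back to the original price scale
    y_predict = y_std.inverse_transform(model.predict(x_test))
    return y_predict, y_std.inverse_transform(y_test)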

def draw_fun(y_predict, y_test):
    """
    Draw scatter and line charts of house price forecast and real value
    :param y_predict: predicted prices
    :param y_test: true prices
    :return:
    """
    x = range(1, len(y_predict) + 1)
    plt.figure(figsize=(25, 10), dpi=80)
    plt.scatter(x, y_test, label="true value", color='blue')
    plt.scatter(x, y_predict, label='predicted value', color='red')
    plt.plot(x, y_test)
    plt.plot(x, y_predict)

    x_tick = list(x)
    y_tick = list(range(0, 60, 5))

    plt.legend(prop=font, loc='best')
    plt.xticks(list(x), x_tick)
    plt.yticks(y_tick)
    plt.grid(alpha=0.8)
    plt.show()

if __name__ == '__main__': 