Time: 2020-11-6

# Normal equation method

Gradient descent is not the only way to minimize the cost function. It requires choosing a suitable learning rate, takes many iterations, and only converges to an approximation of the optimal solution. The normal equation method needs no iteration and no learning rate, and it yields the optimal solution directly. Its limitation is that the number of samples must be greater than the number of data features.
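As a sketch of the method (using NumPy on a hypothetical toy dataset), the normal equation θ = (X^T X)^-1 X^T y can be computed directly, with no learning rate and no iteration:

```python
import numpy as np

# Hypothetical toy data: y = 3 + 2*x plus a little noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=20)
y = 3 + 2 * x + 0.01 * rng.normal(size=20)

# Design matrix with x0 = 1 in the first column
X = np.column_stack([np.ones_like(x), x])

# Normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # approximately [3, 2]
```

Note that X^T X must be invertible, which is why the sample count has to exceed the feature count.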

## Feature scaling method

1. Data normalization: when the orders of magnitude of the sample's features differ greatly, the data can be normalized (rescaled to the range 0 to 1, or -1 to 1)
2. Mean standardization: subtract each feature's mean and divide by its standard deviation, so each feature is centered at 0

## Cross validation

Divide the data into n parts, numbered 0 to n-1. In the first round, parts 0 to n-2 form the training set and part n-1 is the test set; in the next round a different part is held out as the test set, and so on, until every part has served as the test set once.
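This rotation can be sketched with scikit-learn's KFold (the 10-sample array below is a hypothetical placeholder):

```python
import numpy as np
from sklearn.model_selection import KFold

# 10 hypothetical samples, split into n = 5 parts
X = np.arange(10).reshape(-1, 1)

kf = KFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each round holds out a different part as the test set
    print(fold, train_idx, test_idx)
```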

### There are m training samples and each sample has n data features

### Let y be the sample output value

### Let the fitting function be: h(x) = θ0 * x0 + θ1 * x1 + ... + θn * xn (where x0 = 1)

### The cost function is the mean square error: J(θ) = (1/(2m)) * Σ(h(x^(i)) - y^(i))^2

### Proof process

1. Simplification: write the cost function in matrix form, J(θ) = (1/(2m)) * (Xθ - y)^T * (Xθ - y)
2. Derivation: expand it into four items, J(θ) = (1/(2m)) * (θ^T X^T Xθ - θ^T X^T y - y^T Xθ + y^T y), then take the derivative with respect to θ
3. Item 1: the derivative of θ^T X^T Xθ is 2 X^T Xθ
4. The second item: the derivative of -θ^T X^T y is -X^T y
5. The third item: the derivative of -y^T Xθ is -X^T y
6. Fourth item: the derivative of y^T y is 0
7. Result: setting the derivative to zero gives X^T Xθ = X^T y, so θ = (X^T X)^-1 X^T y

# Overfitting and regularization

Methods to avoid overfitting:

1. Reduce the number of features
2. Increase the amount of data
3. Regularization
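As a sketch of method 3, regularization penalizes large coefficients; on nearly collinear features (the toy data below is hypothetical), ridge regression shrinks the unstable ordinary least-squares coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical toy data with two nearly collinear features,
# the situation where plain least squares becomes unstable
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
X = np.hstack([x, x + 1e-3 * rng.normal(size=(50, 1))])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=50)

lin = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The penalty term shrinks the coefficients compared with plain least squares
print(lin.coef_)
print(ridge.coef_)
```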

### Ridge regression

##### Matrix formula: θ = (X^T X + λI)^-1 X^T y

```python
import numpy as np
import matplotlib.pyplot as plt
from numpy import genfromtxt
from sklearn import linear_model
data = genfromtxt(r'E:/project/python/data/csv/longley.csv', delimiter=',')
x_data = data[1:, 2:]
y_data = data[1:, 1]
alphas_to_test = np.linspace(0.001, 1)
model = linear_model.RidgeCV(alphas=alphas_to_test, store_cv_values=True)
model.fit(x_data, y_data)
#Best ridge coefficient (alpha) chosen by cross-validation
print(model.alpha_)
#Drawing
plt.plot(alphas_to_test, model.cv_values_.mean(axis=0))
plt.plot(model.alpha_, min(model.cv_values_.mean(axis=0)), 'ro')
plt.show()
#Print results
0.40875510204081633
```

### Lasso regression

```python
from numpy import genfromtxt
from sklearn import linear_model
data = genfromtxt(r'E:/project/python/data/csv/longley.csv', delimiter=',')
x_data = data[1:, 2:]
y_data = data[1:, 1]
model = linear_model.LassoCV()
model.fit(x_data, y_data)
#Best alpha chosen by cross-validation
print(model.alpha_)
#Fitted model coefficients
print(model.coef_)
#Print results
14.134043936116361
[0.10093575 0.00586331 0.00599214 0.         0.         0.        ]
```

### Elastic net: combining ridge regression and lasso

##### Modified regularization term: λ * (ρ * Σ|θj| + ((1 - ρ)/2) * Σθj^2), a weighted mix of the L1 and L2 penalties

```python
from numpy import genfromtxt
from sklearn import linear_model
data = genfromtxt(r'E:/project/python/data/csv/longley.csv', delimiter=',')
x_data = data[1:, 2:]
y_data = data[1:, 1]
model = linear_model.ElasticNetCV()
model.fit(x_data, y_data)
#Best alpha chosen by cross-validation
print(model.alpha_)
#Fitted model coefficients
print(model.coef_)
#Print results
30.31094405430269
[0.1006612  0.00589596 0.00593021 0.         0.         0.        ]
```

# Polynomial regression

Whether it is simple linear regression or multiple linear regression, the model can only fit data that lies roughly on a straight line or a single plane, while polynomial regression can fit more complex function shapes.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

data = np.genfromtxt(r'E:/project/python/data/csv/job.csv', delimiter=',')
x_data = data[1:, 1, np.newaxis]
y_data = data[1:, 2, np.newaxis]

#Define the polynomial feature transform
poly_reg = PolynomialFeatures(degree=5)
#Feature processing
x_poly = poly_reg.fit_transform(x_data)
lin_reg = LinearRegression()
lin_reg.fit(x_poly, y_data)
#Drawing
plt.plot(x_data, y_data, 'b.')
plt.plot(x_data, lin_reg.predict(poly_reg.transform(x_data)), c='r')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()
```

Understanding of degree: degree is the highest power of x in the generated polynomial features. When the relationship between x and y is complex, a first-degree (linear) function can never fit it well no matter what θ is chosen, so we build a more flexible higher-order function by raising the powers of x, and the model then determines the most suitable θ for those terms.
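A quick sketch of what PolynomialFeatures actually generates (with include_bias left at its default of True, so the x0 = 1 column is included):

```python
from sklearn.preprocessing import PolynomialFeatures

# For a single sample x = 2, degree=3 produces [1, x, x^2, x^3]
features = PolynomialFeatures(degree=3).fit_transform([[2]])
print(features)  # [[1. 2. 4. 8.]]
```

The linear model then fits one θ per generated column, which is how a "linear" regression ends up drawing a curve.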