Machine Learning (6): Notes on Andrew Ng's Course

  • We have already covered linear regression and logistic regression, but when applied in practice they can suffer from overfitting. Next, let's look at how to solve the overfitting problem

What is overfitting


  • We again use the house-price prediction example to see overfitting in linear regression

(1) If we fit the data with a straight line, the result is clearly poor: house prices level off as area increases, which a straight line cannot capture. This situation, where the hypothesis function fails to fit even the training data well, is called underfitting (high bias)
(2) A quadratic function fits the data just about right
(3) A fourth-order polynomial appears to fit the training data extremely well, passing through every point. But in straining to fit the training data at all costs, the hypothesis uses too many terms and becomes distorted, so it cannot generalize well to new data (generalization: the ability of a hypothesis to apply to new, unseen examples). This is called overfitting
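The three cases above can be sketched numerically. In this Python sketch (the area and price numbers are made up for illustration), fitting polynomials of degree 1, 2, and 4 to the same flattening data shows the training error shrinking as the degree grows, even though the high-degree fit generalizes worst:

```python
import numpy as np

# Hypothetical house data (all numbers invented): prices level off
# as area grows, so the trend is not a straight line.
areas = np.array([0.5, 0.7, 0.9, 1.1, 1.3, 1.5])   # area, rescaled units
prices = np.array([150.0, 200.0, 240.0, 265.0, 280.0, 288.0])

sse = {}
for degree in (1, 2, 4):  # underfit, about right, overfit
    coeffs = np.polyfit(areas, prices, degree)          # least-squares fit
    residuals = prices - np.polyval(coeffs, degree and areas)
    sse[degree] = float(residuals @ residuals)          # training error
    print(f"degree {degree}: training SSE = {sse[degree]:.3f}")
```

Note that the training error alone cannot tell a good fit from an overfit one: the degree-4 polynomial has the lowest training error precisely because it bends to follow the training points.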

  • Next, let's look at overfitting in logistic regression, using a binary-classification example


  • Later, we will discuss how to debug and diagnose a failing learning algorithm, and what tools can identify overfitting and underfitting. Next, let's see how to solve the overfitting problem

Methods for solving overfitting

  • Overfitting usually arises when there are many feature variables and relatively little data. One remedy is therefore to reduce the number of variables, i.e. to discard some of them; the model selection algorithms discussed later can keep or discard variables automatically. However, discarding variables also discards some of the information in the problem
  • Regularization is therefore often the preferred approach, because it keeps all the feature variables

The regularized cost function

  • Let's first build intuition for regularization through an example. Take house-price prediction: to curb overfitting, we can penalize the parameters θ3 and θ4 so that they become as small as possible, by adding to the cost function the squares of θ3 and θ4 each multiplied by a large number (1000 is just an arbitrary large value). To keep the cost function small, the optimization then drives θ3 and θ4 toward 0; once θ3 and θ4 are close to 0, their terms are negligible and the hypothesis is approximately a quadratic function
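A minimal sketch of this idea (all numbers are made up, and the toy data is constructed to lie exactly on a quadratic): minimizing the cost with an extra 1000·θ3² + 1000·θ4² penalty, solved in closed form below, drives θ3 and θ4 to essentially 0 and recovers the quadratic:

```python
import numpy as np

# Toy data lying exactly on the quadratic y = 1 + 2x + 3x^2 (invented).
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
y = 1.0 + 2.0 * x + 3.0 * x ** 2
m = len(y)

# Design matrix for a quartic hypothesis: columns are x^0 .. x^4.
X = np.vander(x, 5, increasing=True)

# Minimizing J = (1/2m) * sum((X @ theta - y)^2) + 1000*theta3^2 + 1000*theta4^2
# in closed form: the penalty adds 2*1000*m to the (3,3) and (4,4)
# diagonal entries of the normal-equation matrix.
D = np.diag([0.0, 0.0, 0.0, 1.0, 1.0])
theta = np.linalg.solve(X.T @ X + 2.0 * 1000.0 * m * D, X.T @ y)
print(np.round(theta, 6))  # theta[3] and theta[4] come out as ~0
```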


  • This is the idea behind regularization: by giving some parameters smaller values, we obtain a simpler hypothesis that is less prone to overfitting. In a linear regression with hundreds of features, however, we do not know which parameters are least relevant to the result, so we cannot hand-pick which ones to shrink. Instead, we penalize all the parameters except θ0, adding a regularization term that shrinks every parameter
  • Two notes. First, it is not obvious why shrinking all the parameters avoids overfitting the way shrinking only the high-order parameters did in the house-price example; it is an empirical fact that we accept. Second, leaving θ0 unpenalized is merely a convention: including it makes very little difference to the result
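Written out, the regularized cost for linear regression adds (λ/2m)·Σθj² over j = 1..n to the usual squared-error term, skipping θ0. A sketch, assuming this standard form (the toy matrix and numbers below are invented):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = (1/2m) * sum((X@theta - y)^2) + (lam/2m) * sum(theta[1:]^2).

    theta[0] (the intercept) is conventionally not penalized.
    """
    m = len(y)
    errors = X @ theta - y
    fit_term = (errors @ errors) / (2 * m)
    reg_term = lam * (theta[1:] @ theta[1:]) / (2 * m)
    return fit_term + reg_term

# Toy example (hypothetical numbers): an intercept column plus two features.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0],
              [1.0, 6.0, 7.0]])
y = np.array([7.0, 11.0, 15.0])
theta = np.array([1.0, 1.0, 1.0])

print(regularized_cost(theta, X, y, lam=0.0))   # pure fit error
print(regularized_cost(theta, X, y, lam=10.0))  # fit error + penalty
```

With λ = 0 the regularization term vanishes and the function reduces to the ordinary squared-error cost.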

Regularization parameter

  • The cost function with a regularization term trades off two objectives

(1) We want the hypothesis function to fit the training data well
(2) We want the parameter values to be as small as possible

The regularization parameter λ controls the balance between the two

  • If the regularization parameter λ is very large, all the parameters are penalized toward 0 and the model underfits; if λ is too small, the parameters are barely affected and the model can still overfit
  • We should therefore choose λ appropriately; later we will discuss methods for selecting the regularization parameter automatically
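This trade-off can be seen by sweeping λ. The sketch below (synthetic data from a made-up line y ≈ 1 + 2x) solves the regularized normal equation for several values of λ: λ = 0 recovers the unregularized fit, while a huge λ shrinks the slope toward 0, i.e. underfitting:

```python
import numpy as np

def ridge_theta(X, y, lam):
    """Solve the regularized normal equation
    (X^T X + lam * L) theta = X^T y,
    where L is the identity with entry (0,0) zeroed so that the
    intercept theta_0 is not penalized.
    """
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Synthetic data from the (made-up) line y = 1 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature
y = 1.0 + 2.0 * x + 0.1 * rng.standard_normal(20)

thetas = {lam: ridge_theta(X, y, lam) for lam in (0.0, 1.0, 100.0)}
for lam, th in thetas.items():
    print(f"lambda = {lam:6.1f}: intercept = {th[0]:.3f}, slope = {th[1]:.3f}")
```

As λ grows, the fitted slope shrinks toward 0 and the hypothesis flattens out, which is exactly the underfitting behavior described above.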

Regularization of logistic regression