Zero foundation “machine learning” self study notes | Note7: logical regression

Zero foundation


Write in front

This series is my personal notes when I taught myself [machine learning]. There may be many mistakes in the learning process. I hope all readers will give me some advice. This series is based onTeacher Wu Enda’s [“machine learning” course]As the key link, supplemented byHuang Haiguang’s [personal notes of Stanford University 2014 machine learning course (v5.51)], relevant mathematical knowledge will be interspersed in the middle.

logistic regression

7.1 classification issues

Logistic regression is a machine learning method used to solve the binary classification (0 or 1) problem, which is used to estimate the possibility of something.

Add dependent variable(dependent variable)The two classes that may belong to are called negative classes(negative class)And forward class(positive class), then the dependent variableWhere 0 represents a negative class and 1 represents a positive class.

Zero foundation


If we want to use linear regression algorithm to solve a classification problem, it is assumed that the output value of the function may be much greater than 1 or much less than 0. However, the property of logistic regression algorithm is: itsThe output value is always between 0 and 1.

Logistic regression algorithm is a classification algorithm, we use it as a classification algorithm. Sometimes you may be confused by the “regression” in the name of this algorithm, but the logical regression algorithm is actually a classification algorithm, which is applicable to the case where the value of label y is discrete, such as 1 0 1.

7.2 hypothesis representation

Looking back at the classification of breast cancer mentioned at the beginning, we can use linear regression to find out a straight line suitable for data.

Zero foundation


According to the linear regression model, we can only predict continuous values. However, for the classification problem, we need to output 0 or 1. We can predict:

WhenWhen forecasting

WhenWhen forecasting

For the data shown in the figure above, such a linear model seems to be able to complete the classification task well. If we observe a very large malignant tumor and add it to our training set as an example, we will get a new straight line.

Zero foundation


At this time, it is not appropriate to use 0.5 as the threshold to predict whether the tumor is benign or malignant. It can be seen that the linear regression model is not suitable to solve such a problem because its predicted value can exceed the range of [0,1].

We introduce a new model, logistic regression, whose output variables always range between 0 and 1.
The assumptions of the logistic regression model are:

Of which:

Representative eigenvector

Representative logic function(logistic function)Is a commonly used logic function forSShape function(Sigmoid function), the formula is:

The image of this function is:

[picture upload failed… (image-2e9344-1615636504740)]. PNG)

Together, we get the assumptions of the logistic regression model:

The function of is to calculate the estimated probability of output variable = 1 for a given input variable according to the selected parameters, i.e

For example, if for a given, calculated from the determined parameters, then there is a 70% chanceIs a positive class, corresponding toThe probability of being a negative class is 1-0.7 = 0.3.

7.3 judgment boundary

Zero foundation


In logistic regression, we predict that:

WhenWhen forecasting

WhenWhen forecasting

According to the drawing aboveSShape function image, we know when




also, i.e.:
When forecasting
When forecasting

Now suppose we have a model:

Zero foundation


And parametersIs a vector [- 3 1]. Then when, i.eWhen, the model will predict
We can draw straight lines, this line is the dividing line of our model, separating the area predicted as 1 from the area predicted as 0.

Zero foundation


If our data show such distribution, what kind of model can be suitable?

Zero foundation


Because you need a curve to separateAreas andIn the region of, we need quadratic characteristics:Is [- 1 0 1 1], then the decision boundary we get is exactly a circle with a circle point at the origin and a radius of 1.

We can use very complex models to adapt to the judgment boundary of very complex shapes.

7.4 cost function

Zero foundation


For linear regression models, the cost function we define is the sum of squares of all model errors. Theoretically, we can also use this definition for logistic regression models, but the problem is that when weWhen we bring it into the cost function defined in this way, the cost function we get will be a nonconvex function(non-convexfunction)。

Zero foundation


This means that our cost function has many local minima, which will affect the gradient descent algorithm to find the global minimum.

The cost function of linear regression is:
We redefine the cost function of logistic regression as:, where:

Zero foundation


AndThe relationship between is shown in the figure below:

Zero foundation


Built like thisFunction is characterized by: when the actualAndWhen it is also 1, the error is 0, whenbutWhen it is not 1, the error increases withBecome smaller and larger; When actualAndIt is also 0, and the time price is 0, whenbutWhen it is not 0, the error increases withIt gets bigger and bigger.
To be builtSimplified as follows:

Bring in the cost function to obtain:


This is the cost function of logistic regression:

Zero foundation

Screenshot 2021-03-13 160155

This formula can be combined into:

That is, the cost function of logistic regression:

The method of minimizing the cost function is to useGradient descent method(gradient descent)。

Zero foundation

Screenshot 2021-03-13 161844

The update process can be written as:

So, if you haveThere are two characteristics, that is:

Zero foundation


, parameter vectorinclude Until, then you need to use this formula:

To update all at the same timeValue of.

For linear regression hypothesis function:

Now the logical function assumes the following functions:

Therefore, even if the rules for updating parameters look basically the same, the gradient descent of logic function is actually two completely different things from that of linear regression because the definition of hypothesis has changed.

7.5 multi category classification: one to many

Let’s first look at the example of drug diagnosis. If a patient comes to your clinic because of nasal congestion, he may not be ill. Use itThis category represents; Or if you have a cold, use itTo represent; Or get the fluTo represent.

For binary classification problems and multi class classification problems, the data are as follows:

Zero foundation


Binary classification can use logistic regression, and the straight line can divide the data set into positive and negative categories.

Now we have a training set. For example, the above figure shows that there are three categories, which are represented by triangles, represented by a box, indicated by a fork。 What we need to do next is to use a training set and divide it into three binary classification problems.

Let’s start with category 1 represented by triangles. In fact, we can create a new “pseudo” training set. Type 2 and type 3 are set as negative classes, and type 1 is set as positive classes. We create a new training set, as shown in the figure below, and we need to fit an appropriate classifier.

To achieve this transformation, we mark one of the multiple classes as a positive class()Then mark all other classes as negative classes, and the model is recorded as。 Next, similarly, we select another class to mark as a forward class()Then mark other classes as negative classes and record this model as, and so on.

Zero foundation

Screenshot 2021-03-13 155046

Finally, we get a series of models, abbreviated as:Of which:

Finally, when we need to make a prediction, we run all the classifiers, and then select the output variable with the highest possibility for each input variable.

In short, we have finished what we need to do. Now we need to train the logistic regression classifier:, whereFor every possibleFinally, in order to make a prediction, we give a new inputValue, use this as a prediction. All we have to do is input in our three classifiersThen let’s choose onemaximal, i.e

whetherWhat is the value, we all have the highest probability value, we predictThat’s the value. This is the multi class classification problem and the one to many method. Through this small method, the logistic regression classifier can be used in the multi class classification problem.

Previous period:

Zero foundation “machine learning” self study notes | Note1: introduction to machine learning

Zero basis “machine learning” self study notes | Note2: univariate linear regression

Zero foundation “machine learning” self study notes | note3: review of linear algebra

Zero basis “machine learning” self-study notes | note5: multivariable linear regression

Zero basis “machine learning” self-study notes | note6: normal equation and its derivation (detailed derivation process is attached)