[machine learning] Why is logistic regression called “regression”, and where does sigmoid come from


By logm

This article was originally published at https://segmentfault.com/u/logm/articles and may not be reproduced.


1. Logistic regression model

The logistic regression model can be written as follows:

$$P(Y=1|x) = \frac{1}{1+e^{-wx}}$$

$$P(Y=0|x) = \frac{e^{-wx}}{1+e^{-wx}}$$
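
As a minimal sketch of the two formulas above, the following Python snippet computes both probabilities for one input. The function name `predict_proba` and the values of `w` and `x` are hypothetical, chosen only for illustration; the branch on the sign of the score is an added precaution against overflow in `math.exp`, not part of the model itself.

```python
import math

def predict_proba(w, x):
    """Return (P(Y=0|x), P(Y=1|x)) for weights w and features x."""
    z = sum(wi * xi for wi, xi in zip(w, x))  # linear score wx
    if z >= 0:
        p1 = 1.0 / (1.0 + math.exp(-z))       # 1 / (1 + e^{-wx})
    else:
        e = math.exp(z)                       # same value, rewritten to avoid overflow
        p1 = e / (1.0 + e)
    p0 = 1.0 - p1                             # equals e^{-wx} / (1 + e^{-wx})
    return p0, p1

w = [0.5, -1.0]  # hypothetical weights
x = [2.0, 1.0]   # hypothetical input, chosen so that wx = 0
p0, p1 = predict_proba(w, x)
print(p0, p1)
```

Because the score $wx$ is 0 for this input, the two classes come out equally likely, and the two probabilities always sum to 1.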

2. Odds in logistic regression

The odds of an event that occurs with probability $p$ are defined as:

$$odds = \frac{p}{1-p}$$

Therefore, the log odds or logit function is:

$$logit(p) = \log \frac{p}{1-p}$$

Substituting the logistic regression model, the odds become $\frac{P(Y=1|x)}{P(Y=0|x)} = e^{wx}$, so the log odds are:

$$\log \frac{P(Y=1|x)}{1-P(Y=1|x)} = w \cdot x$$

It follows that the log odds $logit(P)$ are linear in $x$, which is why the model is called “regression”.
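
The linearity can also be checked numerically. The sketch below assumes a scalar weight `w` and a handful of sample inputs `xs`, both made up for illustration, and confirms that applying the logit to the model's output recovers $wx$ up to floating-point error.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

w = 0.8                           # hypothetical scalar weight
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical inputs

# Log odds of the model's predicted probability for each input
log_odds = [logit(sigmoid(w * x)) for x in xs]

# Difference between the log odds and the linear score w * x
diffs = [lo - w * x for lo, x in zip(log_odds, xs)]
print(diffs)
```

The differences are all zero up to rounding, so the log odds form a straight line in $x$ with slope $w$.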

3. Where does the sigmoid function come from

The binary classification problem that logistic regression solves amounts to modeling:

$$P(Y=1|x) = f(wx)$$

The distribution of $Y|x$ is a Bernoulli distribution and the distribution of $wx$ is taken to be a normal distribution, so the function $f(x)$ needs to map a normally distributed quantity, which can be any real number, to the parameter of a Bernoulli distribution, which lies in $(0, 1)$. What kind of function has such a property? Mathematicians found $f(x) = sigmoid(x)$.

4. A more accurate explanation

In fact, the explanations above of “why logistic regression is called regression” and “why logistic regression uses sigmoid” are not fully accurate. They are simplified so that readers can follow them easily. For most students, understanding the explanations above is enough.

To understand logistic regression accurately, readers first need to know the generalized linear model, which requires a solid mathematical foundation.

This part involves too much material to cover here, so I will only mention it briefly. Readers will need to read the book PRML for themselves.

Linear regression is a kind of “generalized linear model”, and logistic regression is also a kind of “generalized linear model”. The link function of linear regression is the identity function, so it does not need an activation function; the link function of logistic regression is the logit function, whose inverse is the sigmoid function, so the activation function of logistic regression is the sigmoid function.
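
As a brief sketch of how the sigmoid arises in this view (the symbol $\eta$ for the natural parameter is introduced here and does not appear earlier in the article), the Bernoulli distribution with parameter $p$ can be written in exponential-family form:

$$P(y|p) = p^{y}(1-p)^{1-y} = \exp\left(y \log \frac{p}{1-p} + \log(1-p)\right), \quad y \in \{0, 1\}$$

The coefficient of $y$ is the natural parameter $\eta = \log \frac{p}{1-p}$, which is exactly the log odds. Assuming, as a generalized linear model does, that the natural parameter is linear in the input, $\eta = wx$, and solving for $p$ gives:

$$p = \frac{1}{1+e^{-\eta}} = \frac{1}{1+e^{-wx}}$$

This is the same expression as $P(Y=1|x)$ in section 1.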