Here is an interesting question: LR is clearly used for classification, so why is it called logistic regression? In the previous chapter we went through the mathematical theory, the modeling basis and the loss-function optimization of the LR algorithm. With a little thought, that material answers the question.
In this chapter, we look at LR from two angles: evaluating the model and interpreting its parameters.
First, model evaluation
There are many metrics for judging the quality of a model. For a classic binary classifier like LR, the confusion matrix is a natural choice. It sorts the test samples into four counts:
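A: true negatives (TN), negative samples correctly predicted as negative
B: false negatives (FN), positive samples wrongly predicted as negative
C: true positives (TP), positive samples correctly predicted as positive
D: false positives (FP), negative samples wrongly predicted as positive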
Comparing the model's predictions with the true labels of the test set gives the four counts A, B, C and D, from which we compute:
Accuracy: (number of correct predictions) / (all samples) = (A + C) / (A + B + C + D)
Sensitivity (positive coverage, also called recall): (positives correctly predicted) / (all positives) = C / (B + C)
Specificity (negative coverage): (negatives correctly predicted) / (all negatives) = A / (A + D)
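The three formulas can be checked on a toy example. This is a minimal sketch that assumes labels are encoded as 0/1 with 1 meaning positive; the function names `confusion_counts` and `metrics` are hypothetical helpers, and the letters A to D follow the list above.

```python
def confusion_counts(y_true, y_pred):
    """Return (A, B, C, D) = (TN, FN, TP, FP). Assumes 0/1 labels, 1 = positive."""
    a = b = c = d = 0
    for t, p in zip(y_true, y_pred):
        if t == 0 and p == 0:
            a += 1  # true negative
        elif t == 1 and p == 0:
            b += 1  # false negative
        elif t == 1 and p == 1:
            c += 1  # true positive
        else:
            d += 1  # false positive (t == 0, p == 1)
    return a, b, c, d


def metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from the confusion counts."""
    a, b, c, d = confusion_counts(y_true, y_pred)
    accuracy = (a + c) / (a + b + c + d)
    sensitivity = c / (b + c)  # correctly predicted positives / all positives
    specificity = a / (a + d)  # correctly predicted negatives / all negatives
    return accuracy, sensitivity, specificity


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # toy predictions, not real model output
print(metrics(y_true, y_pred))  # (0.625, 0.75, 0.5)
```

Here A = 2, B = 1, C = 3 and D = 2, so the three metrics disagree, which is why it helps to look at all of them rather than accuracy alone.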
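The higher the accuracy, sensitivity and specificity, the better the model.
These are not the only possible metrics, and they do not suit every model or dataset. For example, when one class heavily outnumbers the other, accuracy can look high even for a model that always predicts the majority class, so the choice of metric should take the data into account. In most cases, however, these three metrics give a good measure of model quality.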
The ROC curve is a more intuitive and comprehensive evaluation method.
Before introducing the ROC curve, let me explain how the model actually makes a classification decision. From the LR model built in the previous chapter, the output is a number between 0 and 1: the predicted probability that the sample belongs to the positive class. In most cases we choose 0.5 as the classification threshold, which is easy to understand. If you were placing a bet and, for whatever reason, the banker's probability of winning were above 0.5, you would also bet on the banker, right?
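A minimal sketch of the decision rule just described, assuming the linear score z = β0 + β1·x1 + ... from the previous chapter has already been computed. The names `predict_proba` and `classify` are hypothetical, and treating a probability of exactly 0.5 as positive is an assumed tie-breaking choice.

```python
import math


def predict_proba(z):
    """Logistic (sigmoid) function: maps a linear score z to a probability in (0, 1)."""
    # Two algebraically equal branches avoid math.exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(z, threshold=0.5):
    """Predict the positive class (1) when the probability reaches the threshold."""
    return 1 if predict_proba(z) >= threshold else 0


for z in (-2.0, 0.0, 1.5):
    print(z, round(predict_proba(z), 3), classify(z))
# -2.0 0.119 0
# 0.0 0.5 1
# 1.5 0.818 1
```

With the default threshold of 0.5, the rule is equivalent to predicting positive whenever z >= 0. Changing `threshold` moves that cut point, which is exactly what the ROC curve explores.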
Now for the ROC curve itself. Its horizontal axis is 1 - specificity and its vertical axis is sensitivity. But with the single fixed threshold described above, specificity and sensitivity are each just one number, so they would give a single point, not a curve.
So to draw the ROC curve, we try many different thresholds, compute sensitivity and 1 - specificity at each one, and connect the resulting points in the coordinate plane. The line through them is the ROC curve, as shown in the figure below:
The picture is from “Python data analysis and mining” by Liu Shunxiang
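The threshold sweep can be sketched in a few lines. This version assumes 0/1 labels with both classes present and uses every distinct model output as a threshold. That is one common choice and an assumption here, since the text does not fix the thresholds. `roc_points` and the toy data are hypothetical. In practice, scikit-learn's `sklearn.metrics.roc_curve` performs this computation.

```python
def roc_points(y_true, scores):
    """Return ROC points (1 - specificity, sensitivity), one per threshold.

    Every distinct score is used as a threshold; a sample is predicted
    positive when its score is >= the threshold. Labels must be 0/1 and
    both classes must be present.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))  # (1 - specificity, sensitivity)
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy model outputs, not real data
for fpr, tpr in roc_points(y_true, scores):
    print(f"{fpr:.2f} {tpr:.2f}")
```

The points run from (0, 0), where nothing is predicted positive, to (1, 1), where everything is. Plotting them in order traces the ROC curve.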
Next, compute the area under the ROC curve (AUC). As a rule of thumb, an AUC of 0.8 or above means the model is acceptable.
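Rather than integrating the curve numerically, this sketch uses a well-known equivalent: the AUC equals the fraction of (positive, negative) sample pairs in which the positive sample gets the higher score, with ties counted as half. This rank-based formulation is swapped in here for brevity. `auc_pairwise` is a hypothetical name, and labels are assumed to be 0/1 with both classes present. For real data, scikit-learn's `sklearn.metrics.roc_auc_score` computes the AUC directly.

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 when the positive sample scores higher and 0.5 on a tie.
    This equals the area under the ROC curve drawn with straight segments
    between threshold points. O(P * N), fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy model outputs, not real data
print(round(auc_pairwise(y_true, scores), 3))  # 0.889
```

Eight of the nine positive/negative pairs are ranked correctly, giving 8/9. By the rule of thumb above, this toy model would pass.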
There is another curve for evaluating models that I find more concise and convenient: the KS curve.
The ROC description above does not say how to choose the thresholds when drawing the curve, so the resulting points may be unevenly spaced. The KS curve handles this as follows:
1. Sort the model outputs.
2. Split the samples into ten groups of equal size, and use the ten group boundaries as thresholds for computing the three metrics above.
3. Draw the KS curve. The horizontal axis shows the ten cut points, and the vertical axis carries two curves: 1 - specificity and sensitivity. As shown in the figure below:
As shown in the figure, the maximum vertical gap between the two curves is the KS value. As a rule of thumb, a KS value above 0.4 means the model is acceptable.
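The three steps above can be sketched as follows, assuming 0/1 labels with both classes present. `ks_by_deciles` is a hypothetical name. Taking the top k tenths of the sorted samples corresponds to placing the threshold at the k-th cut point. When tied scores straddle a cut, this sketch simply splits them by input order.

```python
def ks_by_deciles(y_true, scores, bins=10):
    """KS value via equal-count groups.

    Sort samples by score (highest first), cut them into `bins` groups of
    roughly equal size, and at each cut point compute sensitivity (TPR) and
    1 - specificity (FPR) over the samples above the cut. KS is the largest
    gap between the two curves. Labels must be 0/1, both classes present.
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    n = len(pairs)
    pos = sum(t for _, t in pairs)
    neg = n - pos
    curve = []
    for k in range(1, bins + 1):
        top = pairs[: k * n // bins]  # the top k/bins fraction of samples by score
        tp = sum(t for _, t in top)
        tpr = tp / pos                # sensitivity
        fpr = (len(top) - tp) / neg   # 1 - specificity
        curve.append((k, tpr, fpr))
    ks = max(abs(tpr - fpr) for _, tpr, fpr in curve)
    return ks, curve


y_true = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # toy data
ks, curve = ks_by_deciles(y_true, scores)
print(round(ks, 2))  # 0.6
```

The `curve` list holds the two plotted curves, one (cut point, sensitivity, 1 - specificity) triple per group. In this toy example the largest gap appears at the fifth cut point.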
Analysis of the model
I remember a teacher mentioning that a Turing Award winner once said that any analysis or prediction that cannot be explained is unreliable. Whether or not that is right, LR fortunately has very strong interpretability.
At a high level, LR is built on linear regression: it models the log-odds of the positive class as a linear function of the features, so it rests on solid mathematical theory. The effect of each feature in linear regression is easy to read, and the same holds for LR, only in a different form.
As shown in my handwritten derivation above, the odds (sometimes rendered as "occurrence ratio", and called "advantage" in some textbooks) are the ratio of the probability that an event occurs to the probability that it does not: odds = p / (1 - p). This was covered in the previous chapter, so I will not derive it again.
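For reference, here is the standard derivation written out. It uses β_0 for the intercept and β_i for the coefficient of feature x_i, which is assumed notation. The last line compares the odds before and after x_i increases by 1 with every other feature held fixed.

```latex
\begin{aligned}
p &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}} \\
\ln\frac{p}{1-p} &= \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n \\
\text{odds}(x_i) &= \frac{p}{1-p} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n} \\
\frac{\text{odds}(x_i + 1)}{\text{odds}(x_i)}
  &= \frac{e^{\beta_0 + \cdots + \beta_i (x_i + 1) + \cdots + \beta_n x_n}}
          {e^{\beta_0 + \cdots + \beta_i x_i + \cdots + \beta_n x_n}}
   = e^{\beta_i}
\end{aligned}
```

All terms except β_i cancel in the ratio, so the multiplier e^{β_i} does not depend on the starting value of x_i or on the other features.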
It follows that when feature xi increases by 1 and all other features stay fixed, the odds become exp(βi) times their original value.
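A quick numerical check of this claim, using made-up coefficients. The values of `beta0` and `betas` are hypothetical and do not come from a fitted model, and the helper names are illustrative only.

```python
import math


def predict_proba(x, beta0, betas):
    """LR probability for feature vector x with intercept beta0 and coefficients betas."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds = p / (1 - p): probability of occurring over probability of not occurring."""
    return p / (1.0 - p)


beta0, betas = -1.0, [0.7, -0.3]  # hypothetical coefficients, not from a fitted model
x = [2.0, 1.0]
x_plus = [3.0, 1.0]               # x1 increased by 1, x2 held fixed

ratio = odds(predict_proba(x_plus, beta0, betas)) / odds(predict_proba(x, beta0, betas))
print(round(ratio, 4), round(math.exp(betas[0]), 4))  # 2.0138 2.0138
```

The two numbers agree up to floating-point rounding. Read in plain terms, a coefficient of 0.7 means each one-unit increase in x1 roughly doubles the odds of the positive class.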
Application scenario analysis
To sum up, thanks to its strong interpretability and solid mathematical logic, LR suits settings where the relationships between features and the target are strong and the data quality is good. A typical example is customer risk classification in banking: bank data is usually of good quality and the feature relationships are clear, so LR can give the bank clear, reliable and valuable analysis.