Definition of ROC
For example, in a data set of 10000 people, 100 people have cancer. Your task is to predict who has it. Suppose you predicted that 200 people had cancer. The results break down as follows:
- TN, true negative: 9760 people have no cancer, and you predicted no cancer
- TP, true positive: 60 people have cancer, and you predicted cancer
- FN, false negative: 40 people have cancer, but you did not predict it
- FP, false positive: 140 people have no cancer, but you predicted cancer
From these counts:
- True Positive Rate(TPR): 60/(60+40)=0.6
- False Positive Rate(FPR): 140/(9760+140)=0.0141
- accuracy: (9760+60)/10000=0.982
- precision: 60/(60+140)=0.3
- recall: 60/100=0.6
| Predicted \ Actual | Positive (100) | Negative (9900) |
|---|---|---|
| Positive (200) | True positive, TP = 60 | False positive, FP = 140 |
| Negative (9800) | False negative, FN = 40 | True negative, TN = 9760 |
- FPR: among all samples that are actually negative, the fraction wrongly judged to be positive. FPR = FP / (FP + TN)
- TPR: among all samples that are actually positive, the fraction correctly judged to be positive. TPR = TP / (TP + FN)
- precision = TP / (TP + FP)
- recall = TPR = TP / (TP + FN)
- accuracy = ( TP + TN ) / ALL
- F-measure = 2 / (1/precision + 1 /recall)
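The formulas above can be checked against the cancer example with a few lines of Python. This is only a minimal sketch: the function name `classification_metrics` is made up for illustration, and it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the metrics listed above from the four confusion-matrix counts.

    Assumption: every denominator is non-zero (no empty class, no empty prediction).
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as TPR
    return {
        "tpr": recall,
        "fpr": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / total,
        "f_measure": 2 / (1 / precision + 1 / recall),
    }


# Counts from the example: TP = 60, FP = 140, FN = 40, TN = 9760
metrics = classification_metrics(tp=60, fp=140, fn=40, tn=9760)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the F-measure comes out at 0.4, well below the accuracy of 0.982, which is one way to see how little accuracy says on imbalanced data.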
How these two indicators (TPR and FPR) are read depends on the field. In medical diagnosis, for example, the main task is to find as many sick patients as possible, which is the first indicator, TPR: the higher the better. Healthy samples misdiagnosed as sick are counted by the second indicator, FPR: the lower the better. It is easy to see that the two indicators constrain each other. A doctor who is sensitive to symptoms and diagnoses illness on minor signs will have a high TPR, but the FPR rises as well. In the most extreme case, the doctor judges every sample to be sick, so both TPR and FPR equal 1.
In the cancer detection example above, where positive and negative examples are extremely imbalanced:
- Accuracy is of little significance. Because there are so few positive examples, the accuracy is very high (0.982) even though the detection of cancer is not very good.
- Recall is relative to the real situation: there are 100 cancer patients in total and you correctly detected 60 of them, so the recall is 60%.
- Precision is relative to the model's predictions: you predicted 200 cancer patients, of which 60 were correct, so the precision is 30%.
- We would like both recall and precision to be high, but it is generally difficult for both to reach their best values at the same time, so a trade-off is needed.
Sometimes we see the concepts of sensitivity and specificity:
- sensitivity = recall = True Positive Rate
- specificity = 1- False Positive Rate
In other words, higher sensitivity means a higher true positive rate, and higher specificity means a lower false positive rate. To weigh sensitivity against specificity, the ROC (receiver operating characteristic) curve and the AUC (area under the ROC curve) can be used to compare the strengths and weaknesses of two classifiers.
Plotting and calculating the ROC curve
Basic concepts of the ROC curve:
- The abscissa (x-axis) is FPR and the ordinate (y-axis) is TPR.
- Each point on the plot is the (FPR, TPR) pair calculated at one cutoff point (threshold). The threshold is applied to the model's output for each sample, a predicted probability that can also be called a score.
Taking the doctor's diagnosis as an example:
- The point in the upper left corner (TPR = 1, FPR = 0) is perfect classification: the doctor has excellent skills and every diagnosis is correct.
- At point A (TPR > FPR), doctor A's judgment is generally correct.
- A point on the diagonal (TPR = FPR) means the doctor's results are the same as randomly declaring a fixed proportion of patients ill. From the lower left corner to the upper right corner, that proportion runs from 0% to 100%. When the proportion is 50%, the doctor is half right and half wrong, which is the point in the middle of the plot (TPR = FPR = 0.5).
- At point C (TPR < FPR) in the lower half plane, if the doctor says you are ill, you are probably not. Doctor C's advice should be taken in reverse; this is a real quack.
In the figure above, each threshold gives one point. To measure the doctor's skill with an evaluation index that does not depend on the threshold, we traverse all thresholds, which yields the ROC curve. The plot is the same as before. Suppose we have the diagnostic statistics of one doctor, with a straight line marking the threshold. Sweeping that line across all thresholds traces out the ROC curve on the ROC plane.
The value of each point on the ROC curve can be calculated with a small set of simulated data:
- First, take a set of true classes, such as 0, 1, 0, 1, that is, 2 positive examples and 2 negative examples, and a set of scores (probabilities) predicted by the model, such as 0.2, 0.3, 0.5, 0.8.
- The FPR and TPR are then calculated using each sample's score in turn as the threshold.
- First, the cutoff point is 0.2, that is, the threshold is set to 0.2: a sample is predicted positive when its probability is greater than or equal to 0.2. All four samples are therefore predicted positive. At this point TP = 2, FN = 0, FP = 2, TN = 0; FPR = FP / (FP + TN) = 2 / (2 + 0) = 1, TPR = TP / (TP + FN) = 2 / (2 + 0) = 1
- Next, the cutoff point is 0.3, that is, the threshold is set to 0.3: a sample is predicted positive when its probability is greater than or equal to 0.3, so sample 1 is predicted negative and samples 2-4 are predicted positive. At this point TP = 2, FN = 0, FP = 1, TN = 1; FPR = FP / (FP + TN) = 1 / (1 + 1) = 0.5, TPR = TP / (TP + FN) = 2 / (2 + 0) = 1
- With the cutoff point at 0.5, TP = 1, FN = 1, FP = 1, TN = 1, so FPR = 0.5, TPR = 0.5.
- With the cutoff point at 0.8, TP = 1, FN = 1, FP = 0, TN = 2, so FPR = 0, TPR = 0.5.
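The threshold sweep worked through above can be reproduced in plain Python. This is a sketch under the same convention as the text (a sample is predicted positive when its score is greater than or equal to the threshold); the helper name `roc_points` is hypothetical.

```python
def roc_points(labels, scores):
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold."""
    positives = sum(labels)              # number of actual positives
    negatives = len(labels) - positives  # number of actual negatives
    points = []
    for threshold in sorted(set(scores)):
        # Predicted positive when score >= threshold, as in the worked example
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
        points.append((threshold, fp / negatives, tp / positives))
    return points


labels = [0, 1, 0, 1]
scores = [0.2, 0.3, 0.5, 0.8]
points = roc_points(labels, scores)
for threshold, fpr, tpr in points:
    print(f"threshold={threshold}: FPR={fpr}, TPR={tpr}")
```

A threshold above every score, which would give the point (0, 0), is left out here because the text only uses the four observed scores as cutoffs.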
In Python you can use sklearn, and in R you can use the ROCR package or the pROC package. Here, the ROCR package is used to check the calculation above:
    library(ROCR)
    y <- c(0, 1, 0, 1)
    p <- c(0.2, 0.3, 0.5, 0.8)
    pred <- prediction(p, y)
    perf <- performance(pred, "tpr", "fpr")
    > perf
    An object of class "performance"
    Slot "x.name":
    [1] "False positive rate"
    Slot "y.name":
    [1] "True positive rate"
    Slot "alpha.name":
    [1] "Cutoff"
    Slot "x.values":
    [[1]]
    [1] 0.0 0.0 0.5 0.5 1.0
    Slot "y.values":
    [[1]]
    [1] 0.0 0.5 0.5 1.0 1.0
    Slot "alpha.values":
    [[1]]
    [1] Inf 0.8 0.5 0.3 0.2
x.values corresponds to FPR, y.values corresponds to TPR, and alpha.values corresponds to the cutoff applied to the prediction score. The results are completely consistent with the calculation above. Next, draw a simple ROC plot:
    library(pROC)
    modelroc <- roc(y, p)
    plot(modelroc, print.auc = TRUE, auc.polygon = TRUE)
Definition of AUC
The AUC value is the area under the ROC curve. As the ROC curve shows, the larger the AUC value, the better the classifier.
- AUC = 1: a perfect classifier. With this prediction model, there is at least one threshold that gives a perfect prediction. In most prediction problems, no perfect classifier exists.
- 0.5 < AUC < 1: better than random guessing. The classifier (model) has predictive value if the threshold is set properly.
- AUC = 0.5: the same as random guessing (for example, tossing a coin). The model has no predictive value.
- AUC < 0.5: worse than random guessing. However, if its predictions are always inverted, the result is better than random guessing.
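Since the AUC is an area, it can be computed directly from the ROC points with the trapezoid rule. The sketch below uses the five (FPR, TPR) points of the four-sample example from the previous section; the function name `trapezoid_auc` is hypothetical, and the points are assumed to be sorted by increasing FPR.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the points (fpr[i], tpr[i]).

    Assumption: the points are ordered from (0, 0) to (1, 1) by increasing FPR.
    """
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]             # step along the x-axis
        mean_height = (tpr[i] + tpr[i - 1]) / 2  # average of the two y-values
        area += width * mean_height
    return area


# ROC points of the four-sample example, from (0, 0) to (1, 1)
fpr = [0.0, 0.0, 0.5, 0.5, 1.0]
tpr = [0.0, 0.5, 0.5, 1.0, 1.0]
auc = trapezoid_auc(fpr, tpr)
print(auc)
```

Vertical segments of the curve have zero width and contribute nothing, so only the two horizontal steps add area here.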
How to calculate AUC
- Method 1: calculate the area under the ROC curve directly, following the definition given earlier
- Method 2: calculate the probability that the positive example score is greater than the negative example score:
Take the same example: the true classes are 0, 1, 0, 1, so there are 2 positive examples (their number is denoted M) and 2 negative examples (their number is denoted N). The scores predicted by the model are 0.2, 0.3, 0.5, 0.8.
The positive examples score 0.3 and 0.8 and the negative examples score 0.2 and 0.5, so there are 2 * 2 = 4 positive-negative pairs in total: [0.3, 0.2], [0.8, 0.2], [0.3, 0.5], [0.8, 0.5].
In three of these pairs ([0.3, 0.2], [0.8, 0.2], [0.8, 0.5]) the score of the positive example is greater than that of the negative example. Read [0.3, 0.2] as: 0.3 is the positive example, 0.2 is the negative example, and 0.3 > 0.2. The AUC is therefore 3 / 4 = 0.75
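The pair counting above translates almost literally into code. This is a minimal sketch; the function name `pairwise_auc` is hypothetical, and counting a tied pair as half a win is an added assumption (the example has no ties, so it does not affect the result).

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumption: a tie counts as half a win
    return wins / (len(pos) * len(neg))


labels = [0, 1, 0, 1]
scores = [0.2, 0.3, 0.5, 0.8]
auc_pairs = pairwise_auc(labels, scores)
print(auc_pairs)
```

The nested loop visits every positive-negative pair, so the cost grows with the product of the two class sizes.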
- Method 3: when there are many samples, the complexity of the algorithm above, O(M * N), is too high. A simpler algorithm has been proposed: sort the scores from smallest to largest, so that the smallest score has rank 1 and the largest has rank M + N; sum the ranks of the positive examples and subtract M * (M + 1) / 2, where M is the number of positive examples; finally, divide by M * N, where N is the number of negative examples.
Take the example above: the scores (0.2, 0.3, 0.5, 0.8) receive the ranks (1, 2, 3, 4), the ranks of the positive examples are 2 and 4, and M and N are both 2, so AUC = (2 + 4 - 2 * (2 + 1) / 2) / (2 * 2) = 0.75
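The rank-based calculation can be sketched as follows. The function name `rank_auc` is hypothetical, and the sketch assumes all scores are distinct; with tied scores, the tied samples would need averaged ranks, which this version does not implement.

```python
def rank_auc(labels, scores):
    """AUC from the rank sum of the positive examples.

    Assumption: no two samples share the same score.
    """
    # Rank 1 goes to the smallest score, rank M + N to the largest
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank

    m = sum(labels)        # number of positive examples (M)
    n = len(labels) - m    # number of negative examples (N)
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - m * (m + 1) / 2) / (m * n)


labels = [0, 1, 0, 1]
scores = [0.2, 0.3, 0.5, 0.8]
auc_rank = rank_auc(labels, scores)
print(auc_rank)
```

The sort dominates the running time, so this version scales with the sample count times its logarithm rather than with M * N.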