Classification of machine learning: accuracy

Time:2021-9-24

Accuracy is a metric for evaluating classification models. Informally, accuracy is the proportion of predictions that our model got right. Formally, accuracy is defined as follows:

Accuracy = \dfrac{\text{Number of correct predictions}}{\text{Total number of predictions}}
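As a minimal sketch of this definition, the helper below counts matching labels and divides by the total. The function name `accuracy`, the list-based inputs, and the toy labels are assumptions made for illustration; they are not part of the original article.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one prediction is required")
    # Count the positions where the predicted label equals the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels: 3 of the 4 predictions are correct.
example_score = accuracy([1, 0, 0, 1], [1, 0, 1, 1])
print(example_score)
```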

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}

Where TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives.

Let's try to calculate the accuracy of the following model, which classifies 100 tumors as malignant (the positive category) or benign (the negative category):

True positives (TP): 1, false positives (FP): 1, false negatives (FN): 8, true negatives (TN): 90

Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN} = \dfrac{1 + 90}{1 + 90 + 1 + 8} = 0.91
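The arithmetic above can be checked in a few lines. This is only a sketch: the function name `accuracy_from_counts` is hypothetical, and the four counts are the ones used in the worked formula.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy for binary classification from the four outcome counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    # Correct predictions are the true positives plus the true negatives.
    return (tp + tn) / total


# Counts from the tumor example: 1 TP, 90 TN, 1 FP, 8 FN.
tumor_accuracy = accuracy_from_counts(tp=1, tn=90, fp=1, fn=8)
print(tumor_accuracy)  # 0.91
```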

The accuracy is 0.91, or 91% (91 correct predictions out of 100 samples). This means that our tumor classifier is very good at identifying malignant tumors, right?

In fact, a closer look at the positive and negative categories gives a much better picture of how our model performs.

Of the 100 tumor samples, 91 were benign (90 TN and 1 FP) and 9 were malignant (1 TP and 8 FN).

Of the 91 benign tumors, the model correctly identified 90 as benign. That's good. However, of the 9 malignant tumors, the model correctly identified only 1 as malignant. What a terrible result! 8 of the 9 malignant tumors went undiagnosed!
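To make this per-category breakdown concrete, the sketch below computes the fraction of each true category that the model identified correctly. The variable names are illustrative only; the counts come from the example above.

```python
# Counts from the tumor example.
tp, tn, fp, fn = 1, 90, 1, 8

# Group the samples by their true category.
actual_malignant = tp + fn  # 9 malignant tumors
actual_benign = tn + fp     # 91 benign tumors

# Fraction of each true category that the model got right.
malignant_found = tp / actual_malignant
benign_found = tn / actual_benign

print(f"malignant correctly identified: {tp}/{actual_malignant} = {malignant_found:.2f}")
print(f"benign correctly identified: {tn}/{actual_benign} = {benign_found:.2f}")
```

Under these counts the model catches roughly 11% of the malignant tumors while getting about 99% of the benign ones right, which the single 91% figure hides.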

Although 91% accuracy may look good at first glance, another tumor classifier that always predicts benign would achieve exactly the same accuracy on our samples (91 of 100 predictions correct). In other words, our model is no better than one that has no ability at all to distinguish malignant tumors from benign tumors.
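The comparison with an always-benign classifier can be reproduced with a short sketch. The label encoding (1 for malignant, 0 for benign), the ordering of the samples, and the helper name `accuracy` are assumptions for illustration; only the counts come from the example.

```python
MALIGNANT, BENIGN = 1, 0

# True labels rebuilt from the counts in the text: 9 malignant, 91 benign.
y_true = [MALIGNANT] * 9 + [BENIGN] * 91

# Our model: 1 TP and 8 FN on the malignant tumors,
# then 90 TN and 1 FP on the benign tumors.
y_model = [MALIGNANT] * 1 + [BENIGN] * 8 + [BENIGN] * 90 + [MALIGNANT] * 1

# Baseline that predicts benign for every sample.
y_baseline = [BENIGN] * len(y_true)


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(labels, predictions)) / len(labels)


model_acc = accuracy(y_true, y_model)
baseline_acc = accuracy(y_true, y_baseline)

# Number of malignant tumors the baseline actually detects.
baseline_malignant_found = sum(
    t == MALIGNANT and p == MALIGNANT for t, p in zip(y_true, y_baseline)
)

print(model_acc, baseline_acc, baseline_malignant_found)
```

Both classifiers score 0.91, even though the baseline never flags a single malignant tumor.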

When you work with a class-imbalanced data set (for example, one where the number of positive labels differs significantly from the number of negative labels), accuracy alone does not tell the whole story.

In the next section, we will introduce two metrics that are better suited to evaluating class-imbalanced problems: precision and recall.

This work is released under a CC license; reprints must credit the author and link to this article.
