Mean Average Precision (mAP)
Before introducing mAP, let's review some basic concepts:
TP: true positive, the number of positive samples correctly predicted as positive.
TN: true negative, the number of negative samples correctly predicted as negative.
FP: false positive, the number of negative samples incorrectly predicted as positive.
FN: false negative, the number of positive samples incorrectly predicted as negative.
From these counts we can compute accuracy, precision, recall, and the F1 score.
# Proportion of all samples that are classified correctly
accuracy = (TP + TN) / (TP + TN + FP + FN)
# Of all samples predicted positive, the proportion that are actually positive
precision = TP / (TP + FP)
# Of all actual positive samples, the proportion predicted positive
recall = TP / (TP + FN)
# Harmonic mean of precision and recall, a trade-off between the two
F1 = 2 * precision * recall / (precision + recall)
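As a quick check of the formulas above, here is a minimal sketch that evaluates them for one set of counts. The function name `classification_metrics`, the example counts, and the convention of returning 0 when a denominator is 0 are assumptions made for illustration only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumed convention: a ratio with an empty denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
acc, prec, rec, f1 = classification_metrics(tp=8, tn=5, fp=2, fn=4)
print(acc, prec, rec, f1)
```

With these counts, precision is 8 / 10 and recall is 8 / 12, so the two metrics already disagree about how good the classifier is; F1 sits between them.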
mAP is a common evaluation metric in object detection tasks. What is mAP, and why is it used?
In object detection, we need to decide whether a predicted bounding box is correct. We compute the IoU between the predicted box and the ground-truth box and compare it with a threshold: if IoU > threshold, the prediction counts as correct. Each prediction also carries a confidence score, and only predictions scoring above a confidence threshold are kept. Raising the confidence threshold generally raises precision and lowers recall; lowering it raises recall and lowers precision. Evaluating a model at a single threshold is therefore not enough. How do we capture the trade-off between precision and recall?
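To make the correctness test concrete, here is a minimal sketch of the IoU between two axis-aligned boxes. The name `box_iou` and the `(x_min, y_min, x_max, y_max)` coordinate order are assumptions for this example; the evaluation code later in this post relies on its own `bbox_iou` helper and may use a different layout.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at 0 when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (0, 0, 10, 10)
gt = (5, 0, 15, 10)
iou = box_iou(pred, gt)   # intersection 50, union 150
is_correct = iou > 0.5    # compared against an IoU threshold of 0.5
print(iou, is_correct)
```

Here the boxes share half of their width, yet the IoU is only 1/3, so the prediction would not count as correct at a threshold of 0.5.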
Since one threshold is not enough, we take multiple thresholds and obtain a precision/recall pair at each one. Plotting these pairs gives the precision-recall curve, also known as the PR curve. The area enclosed by the PR curve and the coordinate axes is the AP.
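One common way to obtain these pairs, sketched below under stated assumptions, is to sort the detections by confidence score and accumulate TP and FP counts, so that each prefix of the sorted list corresponds to one threshold. The function name `pr_curve` and the toy data are hypothetical; `is_tp` is assumed to already say whether each detection matched a ground-truth box.

```python
import numpy as np


def pr_curve(scores, is_tp, n_gt):
    """Precision and recall after each detection, in descending score order."""
    order = np.argsort(scores)[::-1]
    is_tp = np.asarray(is_tp, dtype=bool)[order]
    tp = np.cumsum(is_tp)     # true positives among the top-k detections
    fp = np.cumsum(~is_tp)    # false positives among the top-k detections
    precision = tp / (tp + fp)
    recall = tp / n_gt        # n_gt: number of ground-truth boxes
    return precision, recall


# Hypothetical detections for one category with 4 ground-truth boxes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
is_tp = [True, False, True, True, False]
precision, recall = pr_curve(scores, is_tp, n_gt=4)
print(precision)
print(recall)
```

Note that precision is not monotonic along the curve: it drops at the second detection and rises again afterwards, which is why the interpolation discussed below is needed.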
Before VOC2010, only 11 recall values [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0] were used. For each of these 11 points, the maximum precision among all points whose recall is greater than or equal to that value is taken, and the AP is the mean of these 11 precision values.
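The 11-point rule can be sketched as follows. The function name `ap_11_point` and the precision/recall values are made up for illustration; the logic mirrors the description above rather than any particular library.

```python
import numpy as np


def ap_11_point(precision, recall):
    """11-point interpolated AP (the rule used before VOC2010)."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        # Best precision at any recall >= t; 0 if that recall is never reached.
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11
    return ap


# Hypothetical PR points for one category.
precision = [1.0, 0.5, 2 / 3, 0.75, 0.6]
recall = [0.25, 0.25, 0.5, 0.75, 0.75]
ap = ap_11_point(precision, recall)
print(ap)
```

For this data the thresholds 0 to 0.2 contribute a precision of 1.0, the thresholds 0.3 to 0.7 contribute 0.75, and the thresholds 0.8 to 1.0 contribute 0, giving (3 * 1.0 + 5 * 0.75) / 11.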
From VOC2010 onward, every distinct recall value (including 0 and 1) is used: for each one, take the maximum precision among all points whose recall is greater than or equal to it, then compute the area under the resulting PR curve as the AP.
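Written as an explicit loop, the VOC2010-style rule looks roughly like this. It is a sketch for illustration: `ap_all_points` is a hypothetical name, the data is made up, and the region beyond the highest recall reached is treated as having precision 0.

```python
def ap_all_points(precision, recall):
    """Area under the interpolated PR curve, using every distinct recall value."""
    points = sorted(zip(recall, precision))
    ap = 0.0
    prev_recall = 0.0
    for i, (r, _) in enumerate(points):
        if r == prev_recall:
            continue  # no new recall level, so no additional area
        # Interpolated precision: the best precision at recall >= r.
        p_interp = max(p for _, p in points[i:])
        ap += (r - prev_recall) * p_interp
        prev_recall = r
    return ap


# Hypothetical PR points for one category.
precision = [1.0, 0.5, 2 / 3, 0.75, 0.6]
recall = [0.25, 0.25, 0.5, 0.75, 0.75]
ap = ap_all_points(precision, recall)
print(ap)
```

The area is a sum of rectangles: 0.25 * 1.0 + 0.25 * 0.75 + 0.25 * 0.75, and nothing for recall above 0.75.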
Each category has its own PR curve and therefore its own AP. Averaging the APs of all categories gives the mAP.
What is described here is the interpolated AP; there is also a non-interpolated way of computing AP.
The figure below shows the original PR curve (green) and the interpolated PR curve (blue dashed line). Computing the area under the original PR curve directly is difficult (it requires integration), whereas the area under the blue dashed line is simple to compute. Interpolation fills in the rising parts of the PR curve so that the curve is monotonically non-increasing.
The mAP calculation code is as follows:
1. First, compute the TP and FP of each category to get the precision and recall of each category.
import itertools
from collections import defaultdict

import numpy as np

# bbox_iou(a, b) is not defined in this post. It is assumed to be a helper
# that returns the pairwise IoU matrix of shape (len(a), len(b)),
# such as chainercv.utils.bbox_iou.


def calc_detection_voc_prec_rec(
        pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,
        gt_difficults=None, iou_thresh=0.5):
    """
    Evaluation code for the Pascal VOC dataset; computes precision and recall.
    Args:
        pred_bboxes (list): iterable of predicted boxes, one array per image
        pred_labels (list): iterable of predicted labels
        pred_scores (list): iterable of predicted confidence scores
        gt_bboxes (list): iterable of ground-truth boxes
        gt_labels (list): iterable of ground-truth box labels
        gt_difficults (list): iterable of difficulty flags of the ground-truth
            boxes. The default is None, meaning no box is difficult
        iou_thresh (float): a prediction is considered correct if its IoU
            with the matched ground-truth box reaches this threshold
    Returns:
        prec (list): list of arrays. prec[l] is the precision of class l.
            If class l does not exist, it is None
        rec (list): list of arrays. rec[l] is the recall of class l.
            If class l does not exist, it is None
    """
    # Turn all inputs into iterators
    pred_bboxes = iter(pred_bboxes)
    pred_labels = iter(pred_labels)
    pred_scores = iter(pred_scores)
    gt_bboxes = iter(gt_bboxes)
    gt_labels = iter(gt_labels)
    if gt_difficults is None:
        gt_difficults = itertools.repeat(None)
    else:
        gt_difficults = iter(gt_difficults)

    # Number of non-difficult ground-truth boxes per category
    n_pos = defaultdict(int)
    # Confidence scores of the predictions per category
    score = defaultdict(list)
    # Whether each predicted box matches a ground-truth box
    match = defaultdict(list)

    # The six iterables have the same length
    # Each iteration processes one image
    for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \
            zip(pred_bboxes, pred_labels, pred_scores,
                gt_bboxes, gt_labels, gt_difficults):
        if gt_difficult is None:
            gt_difficult = np.zeros(gt_bbox.shape[0], dtype=bool)

        # Process each category separately
        for l in np.unique(np.concatenate((pred_label, gt_label)).astype(int)):
            # Predicted boxes and scores belonging to category l
            pred_mask_l = pred_label == l
            pred_bbox_l = pred_bbox[pred_mask_l]
            pred_score_l = pred_score[pred_mask_l]
            # Sort predicted boxes in descending order of score
            order = pred_score_l.argsort()[::-1]
            pred_bbox_l = pred_bbox_l[order]
            pred_score_l = pred_score_l[order]

            # Ground-truth boxes belonging to category l
            gt_mask_l = gt_label == l
            gt_bbox_l = gt_bbox[gt_mask_l]
            gt_difficult_l = gt_difficult[gt_mask_l]

            # Count the non-difficult ground-truth boxes of this category
            n_pos[l] += np.logical_not(gt_difficult_l).sum()
            score[l].extend(pred_score_l)

            # No predicted box for this category
            if len(pred_bbox_l) == 0:
                continue
            # No ground-truth box: every prediction is unmatched
            if len(gt_bbox_l) == 0:
                match[l].extend((0,) * pred_bbox_l.shape[0])
                continue

            # VOC evaluation treats box coordinates as inclusive integer pixels
            pred_bbox_l[:, 2:] += 1
            gt_bbox_l[:, 2:] += 1

            # IoU between every predicted box and every ground-truth box
            iou = bbox_iou(pred_bbox_l, gt_bbox_l)
            # Index of the ground-truth box with the largest IoU for each prediction
            gt_index = iou.argmax(axis=1)
            # If the largest IoU is below the threshold, no ground-truth box
            # corresponds to the prediction; set the index to -1
            gt_index[iou.max(axis=1) < iou_thresh] = -1
            del iou

            # Whether each ground-truth box has already been matched
            # Note: each ground-truth box can be matched only once
            selec = np.zeros(gt_bbox_l.shape[0], dtype=bool)
            for gt_idx in gt_index:
                if gt_idx >= 0:
                    # Matches to difficult ground-truth boxes are ignored (-1)
                    if gt_difficult_l[gt_idx]:
                        match[l].append(-1)
                    else:
                        # First match is a TP (1), later matches are FPs (0)
                        if not selec[gt_idx]:
                            match[l].append(1)
                        else:
                            match[l].append(0)
                    # Mark the ground-truth box at gt_idx as matched
                    selec[gt_idx] = True
                else:
                    match[l].append(0)

    n_fg_class = max(n_pos.keys()) + 1
    prec = [None] * n_fg_class
    rec = [None] * n_fg_class

    for l in n_pos.keys():
        score_l = np.array(score[l])
        match_l = np.array(match[l], dtype=np.int8)

        # Sort in descending order of confidence score
        order = score_l.argsort()[::-1]
        match_l = match_l[order]

        tp = np.cumsum(match_l == 1)
        fp = np.cumsum(match_l == 0)

        # Where fp + tp is 0, the corresponding element of prec[l] is NaN
        prec[l] = tp / (fp + tp)
        # If n_pos[l] is 0, rec[l] stays None
        if n_pos[l] > 0:
            rec[l] = tp / n_pos[l]

    return prec, rec
2. Calculate the AP of each category from prec and rec.
def calc_detection_voc_ap(prec, rec, use_07_metric=False):
    """
    Args:
        prec: list of arrays
        rec: list of arrays
    Returns:
        ap (ndarray): average precision of each category, shape -> (n_fg_class,)
    """
    n_fg_class = len(prec)
    ap = np.empty(n_fg_class)
    for l in range(n_fg_class):
        if prec[l] is None or rec[l] is None:
            ap[l] = np.nan
            continue

        if use_07_metric:
            # 11 point metric
            ap[l] = 0
            for t in np.arange(0., 1.1, 0.1):
                if np.sum(rec[l] >= t) == 0:
                    p = 0
                else:
                    p = np.max(np.nan_to_num(prec[l])[rec[l] >= t])
                ap[l] += p / 11
        else:
            # Interpolation algorithm
            # Pad precision with 0 at both ends and recall with 0 and 1,
            # so that the final PR curve is non-increasing and spans [0, 1]
            mpre = np.concatenate(([0], np.nan_to_num(prec[l]), [0]))
            mrec = np.concatenate(([0], rec[l], [1]))

            # np.maximum.accumulate computes a running maximum.
            # Applied from back to front, it fills in the rising parts
            # of the PR curve. The following line is equivalent to
            # for i in range(mpre.size - 1, 0, -1):
            #     mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
            mpre = np.maximum.accumulate(mpre[::-1])[::-1]

            # Indices where the recall value changes
            i = np.where(mrec[1:] != mrec[:-1])[0]

            # Area: sum of (change in recall) * precision
            ap[l] = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])

    return ap
3. Average the APs of all categories to get the mAP.
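The post gives no code for this last step. A minimal sketch, assuming the per-category APs are stored in an array where categories absent from the ground truth are marked as NaN (as the AP function above does), is to average with `np.nanmean`. The AP values below are made up.

```python
import numpy as np

# Hypothetical per-category APs; NaN marks a category with no ground truth.
ap = np.array([0.625, np.nan, 0.8])

# np.nanmean ignores NaN entries, so absent categories do not affect the mean.
m_ap = float(np.nanmean(ap))
print(m_ap)
```

Using a plain `np.mean` here would return NaN as soon as one category is missing, which is why the NaN-aware mean is the safer choice under this assumption.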
Mean Intersection over Union (mIoU)
mIoU is the standard evaluation metric for semantic segmentation. It is obtained by averaging the IoU of each category. As shown in the figure below, IoU = overlap / union.
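For segmentation masks, the overlap and union are counted in pixels. The sketch below computes the IoU of each class directly from two small label maps; the function name `class_iou` and the toy masks are assumptions for illustration, and a class that appears in neither mask is reported as NaN.

```python
import numpy as np


def class_iou(gt_mask, pred_mask, cls):
    """IoU of one class between a ground-truth and a predicted label map."""
    gt = gt_mask == cls
    pred = pred_mask == cls
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        return float("nan")  # class absent from both masks
    return np.logical_and(gt, pred).sum() / union


# Hypothetical 2 x 3 label maps with two classes (0 and 1).
gt_mask = np.array([[0, 0, 1],
                    [0, 1, 1]])
pred_mask = np.array([[0, 1, 1],
                      [0, 1, 0]])

ious = [class_iou(gt_mask, pred_mask, c) for c in (0, 1)]
miou = float(np.nanmean(ious))
print(ious, miou)
```

Each class overlaps in 2 pixels and covers 4 pixels in the union, so both IoUs are 0.5 and the mIoU is 0.5.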
The mIoU calculation code is as follows:
- Compute the confusion matrix
import numpy as np


def gen_matrix(gt_mask, pred_mask, class_num):
    """
    Args:
        gt_mask (ndarray): shape -> (height, width), ground-truth segmentation map
        pred_mask (ndarray): shape -> (height, width), predicted segmentation result
        class_num: number of classes without background
    """
    # Keep only pixels whose ground-truth label is a valid class index
    mask = (gt_mask >= 0) & (gt_mask < class_num)
    # np.bincount counts how often each non-negative integer occurs.
    # By default it counts from 0 to the maximum value of the array.
    # Encoding each pixel as class_num * gt + pred gives one bin per (gt, pred) pair.
    count = np.bincount(
        class_num * gt_mask[mask].astype(int) + pred_mask[mask].astype(int),
        minlength=class_num ** 2)
    # Confusion matrix: rows are ground-truth classes, columns are predicted classes
    cf_mtx = count.reshape(class_num, class_num)
    return cf_mtx
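The `np.bincount` trick is easier to follow on a tiny input. The sketch below uses made-up, already flattened labels for two classes and shows how each (ground truth, prediction) pair maps to one bin.

```python
import numpy as np

class_num = 2
# Hypothetical flattened labels for six pixels.
gt = np.array([0, 0, 1, 0, 1, 1])
pred = np.array([0, 1, 1, 0, 1, 0])

# Each pair (gt, pred) becomes one integer in [0, class_num ** 2).
index = class_num * gt + pred
# Count how many pixels fall into each pair; minlength keeps empty bins.
count = np.bincount(index, minlength=class_num ** 2)
# Row = ground-truth class, column = predicted class.
cf_mtx = count.reshape(class_num, class_num)
print(index)
print(cf_mtx)
```

The diagonal of `cf_mtx` holds the correctly labelled pixels of each class, and the off-diagonal entries hold the confusions between classes.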
- Compute the IoU of each category from the confusion matrix, then average them to get the mIoU.
def mean_iou(cf_mtx):
    """
    Args:
        cf_mtx (ndarray): shape -> (class_num, class_num), confusion matrix
    """
    # Per-class IoU: diagonal / (row sum + column sum - diagonal)
    mIou = np.diag(cf_mtx) / (np.sum(cf_mtx, axis=1) +
                              np.sum(cf_mtx, axis=0) -
                              np.diag(cf_mtx))
    # Average IoU of all categories; classes with an empty union (NaN) are ignored
    mIou = np.nanmean(mIou)
    return mIou
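To see the formula at work, here is a small worked example on a hypothetical 3-class confusion matrix in which the third class never appears. The use of `np.errstate` to silence the 0 / 0 warning is an addition for this sketch, not part of the code above.

```python
import numpy as np

# Hypothetical confusion matrix: rows = ground truth, columns = prediction.
# Class 2 appears in neither the ground truth nor the predictions.
cf_mtx = np.array([[2, 1, 0],
                   [1, 2, 0],
                   [0, 0, 0]])

intersection = np.diag(cf_mtx)
union = cf_mtx.sum(axis=1) + cf_mtx.sum(axis=0) - intersection

# 0 / 0 for the absent class yields NaN; suppress the runtime warning.
with np.errstate(divide="ignore", invalid="ignore"):
    iou = intersection / union

miou = float(np.nanmean(iou))
print(iou, miou)
```

Classes 0 and 1 each have an intersection of 2 and a union of 4, and the absent class is skipped by `np.nanmean`, so the mIoU is 0.5.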