IoU-aware Single-stage Object Detector for Accurate Localization


The network structure is as follows:

The detector is built on RetinaNet with an FPN, which produces five pyramid levels, P3 to P7, each responsible for boxes of a different size. Every level is followed by a head with two branches. One branch predicts the classification. The other predicts two things: the regression of the box coordinates, and the IoU between the predicted box and its GT box. This IoU prediction is the main innovation of the paper. Methods such as Faster R-CNN use the IoU only to label anchors for classification, for example treating an anchor whose IoU with a GT box is higher than 0.7 as a positive case. This paper instead predicts the IoU directly.
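The quantity the extra branch learns is the ordinary intersection over union of two boxes. As a minimal sketch, assuming axis-aligned boxes in `(x1, y1, x2, y2)` format (the function name `box_iou` is chosen here for illustration and is not taken from the paper), it could be computed like this:

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    # Degenerate boxes with zero area yield an IoU of 0.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt_box = (0.0, 0.0, 2.0, 2.0)
    other_box = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(gt_box, other_box))
```

The result always lies between 0 and 1, which is what allows it to be treated as a target for a sigmoid output.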

The loss function consists of three parts: a classification loss over both positive and negative cases, using focal loss; a regression loss, using smooth L1; and an IoU prediction loss, using binary cross-entropy, since the IoU target lies between 0 and 1.
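To make the three terms concrete, here is a rough per-sample sketch in plain Python. Several details are assumptions rather than something stated above: the focal loss defaults (`alpha=0.25`, `gamma=2.0`), the `beta` of smooth L1, applying the regression and IoU terms to positive samples only, and normalizing the sum by the number of positives. The dictionary keys are hypothetical names used only for this example.

```python
import math

EPS = 1e-7  # keeps log() away from 0


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one probability p and a label y in {0, 1}."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    return 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta


def bce(p, t):
    """Binary cross-entropy with a soft target t in [0, 1]."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def total_loss(samples):
    """Sum of the three terms, normalized by the number of positives (assumed)."""
    positives = [s for s in samples if s["label"] == 1]
    n_pos = max(1, len(positives))
    cls = sum(focal_loss(s["cls_prob"], s["label"]) for s in samples)
    reg = sum(
        smooth_l1(p, t)
        for s in positives
        for p, t in zip(s["box_pred"], s["box_target"])
    )
    iou = sum(bce(s["iou_pred"], s["iou_target"]) for s in positives)
    return (cls + reg + iou) / n_pos


if __name__ == "__main__":
    samples = [
        {"label": 1, "cls_prob": 0.8, "box_pred": [0.1, 0.2, 0.0, -0.1],
         "box_target": [0.0, 0.0, 0.0, 0.0], "iou_pred": 0.7, "iou_target": 0.75},
        {"label": 0, "cls_prob": 0.1},
    ]
    print(total_loss(samples))
```

Binary cross-entropy is usable for the IoU term because it stays well defined for soft targets anywhere in [0, 1], not only for hard 0/1 labels.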

At inference, the product of the classification score and the predicted IoU is used as the confidence of each predicted box, that is, as the basis for ranking. The parameter α adjusts the relative weight of the two.
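The ranking step can be sketched as follows. The exact form of the weighting is an assumption here: the classification score is raised to the power α and the predicted IoU to the power 1 − α, with α in [0, 1]. The function names and the default `alpha=0.5` are illustrative only.

```python
def detection_score(cls_prob, iou_pred, alpha=0.5):
    """Confidence of one box: cls_prob^alpha * iou_pred^(1 - alpha) (assumed form)."""
    return (cls_prob ** alpha) * (iou_pred ** (1.0 - alpha))


def rank_detections(detections, alpha=0.5):
    """Sort (cls_prob, iou_pred) pairs by combined confidence, highest first."""
    return sorted(
        detections,
        key=lambda d: detection_score(d[0], d[1], alpha),
        reverse=True,
    )


if __name__ == "__main__":
    # Each detection is (classification score, predicted IoU).
    detections = [(0.9, 0.3), (0.7, 0.8)]
    print(rank_detections(detections, alpha=0.5))
    print(rank_detections(detections, alpha=1.0))
```

Under this form, α = 1 ranks boxes by classification score alone, while smaller values of α give more weight to predicted localization quality, so a confidently classified but poorly localized box can drop below a better localized one.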