# SAPD: FSAF upgrade with reasonable loss weighting and pyramid feature selection | ECCV 2020

Time: 2021-7-15

Addressing the optimization problems of anchor-point detection algorithms, this paper proposes SAPD, which assigns different loss weights to anchor points at different locations and jointly trains different feature pyramid levels with learned weights. This eliminates most hand-crafted rules and lets the network's own learned weights drive training.

Source: Xiaofei's Algorithm Engineering Notes official account

Paper: Soft Anchor-Point Object Detection

# Introduction

Anchor-free detection methods fall into anchor-point-based and keypoint-based categories. Compared with keypoint-based methods, anchor-point-based methods have the following advantages: 1) simpler network structure; 2) faster training and inference; 3) better use of the feature pyramid; 4) more flexible feature pyramid level selection. However, their accuracy is generally lower than that of keypoint-based methods. This paper therefore analyzes the factors holding back the accuracy of anchor-point-based methods and proposes SAPD (Soft Anchor-Point Detector).

• Soft-weighted anchor points. When training anchor-point algorithms, the points satisfying a geometric relationship are usually set as positive sample points, all with loss weight 1, which lets poorly localized points attain high classification confidence. In fact, the regression difficulty of different points differs: the closer a point is to the target edge, the lower its loss weight should be, so that the network focuses on learning from high-quality anchor points.
• Soft-selected pyramid levels. In each training round, anchor-point algorithms select a single feature pyramid level per target for training and ignore the other levels, which is somewhat wasteful. Although the responses of the other levels are not as strong as the selected level's, their feature distributions should be similar, so multiple levels can be trained simultaneously with different weights.

# Detection Formulation with Anchor Points

First, the paper reviews the network structure and training method of anchor-point object detection.

### Network architecture

Each level of the feature pyramid has a detection head. A pyramid level is denoted $P_l$, where $l$ is the level index; the level's feature map is $1/s_l$ of the input size $W \times H$, where $s_l = 2^l$ is the stride. Typically $l$ ranges from 3 to 7. The detection head consists of a classification subnet and a regression subnet. Each subnet starts with five $3 \times 3$ convolution layers and then predicts, for each position, $K$ classification confidences and four offset values, the offsets being the distances from the current position to the target boundaries.
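The stride/resolution relationship above can be sketched with a few lines of plain Python (the 800×800 input size is just an illustrative assumption, not from the paper):

```python
# Sketch of the pyramid geometry described above: for level l, the stride is
# s_l = 2**l, and the feature map is 1/s_l of the input resolution.

def level_geometry(input_w, input_h, l):
    """Return (stride, feature-map width, feature-map height) for pyramid level l."""
    stride = 2 ** l
    return stride, input_w // stride, input_h // stride

# Levels P3..P7 on an assumed 800x800 input.
for l in range(3, 8):
    print("P%d:" % l, level_geometry(800, 800, l))
```

So P3 has stride 8 with a 100×100 map, while P7 has stride 128 with a 6×6 map, which is why large targets are assigned to higher levels.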

### Supervision targets

For a target $B = (c, x, y, w, h)$, the central region is defined as $B_v = (c, x, y, \epsilon w, \epsilon h)$, where $\epsilon$ is the shrink factor. When target $B$ is assigned to pyramid level $P_l$ and anchor point $p_{lij}$ falls inside $B_v$, $p_{lij}$ is taken as a positive sample point: its classification target is $c$, and its regression target is the normalized distance vector $\mathbf{d} = (d^l, d^t, d^r, d^b)$, the distances from the current position to the four boundaries of the target.

$z$ is the normalization factor. For negative sample points, the classification target is the background ($c = 0$) and the localization target is null, so it does not need to be learned.
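A minimal sketch of this assignment, assuming a corner-coded box `(x1, y1, x2, y2)` and taking the normalization factor $z$ to be the level stride (an assumption; the paper only states that $z$ normalizes the distances):

```python
def is_positive(px, py, box, eps):
    """True when point (px, py) lies inside the central region B_v
    obtained by shrinking the box by the factor eps."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * eps, (y2 - y1) * eps
    return abs(px - cx) <= w / 2 and abs(py - cy) <= h / 2

def regression_target(px, py, box, z):
    """Normalized distances (d_l, d_t, d_r, d_b) from (px, py) to the box edges;
    z is the normalization factor (assumed here to be the level stride)."""
    x1, y1, x2, y2 = box
    return ((px - x1) / z, (py - y1) / z, (x2 - px) / z, (y2 - py) / z)
```

For a 100×100 box with `eps = 0.2`, only points within the central 20×20 region are positive, and the box center regresses equal distances to all four edges.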

### Loss functions

For each point $p_{lij}$, the network outputs a $K$-dimensional classification vector $\hat{c}_{lij}$ and a 4-dimensional localization regression output $\hat{\mathbf{d}}_{lij}$, supervised with focal loss and IoU loss respectively.

The total network loss is the sum of the losses over positive and negative sample points, divided by the number of positive sample points.
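The two per-point losses named above can be sketched as follows. The focal-loss hyperparameters (`alpha=0.25`, `gamma=2.0`) are the common defaults and an assumption here, as is the `-log(IoU)` form of the IoU loss; the paper does not restate them in this summary:

```python
import math

def focal_loss(p, is_pos, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted confidence p in (0, 1)."""
    if is_pos:
        return -alpha * (1 - p) ** gamma * math.log(p)
    return -(1 - alpha) * p ** gamma * math.log(1 - p)

def iou_loss(d_pred, d_gt):
    """-log(IoU) between two boxes encoded as distances (d_l, d_t, d_r, d_b)
    from the same anchor point, as in the regression targets above."""
    inter_w = min(d_pred[0], d_gt[0]) + min(d_pred[2], d_gt[2])
    inter_h = min(d_pred[1], d_gt[1]) + min(d_pred[3], d_gt[3])
    inter = inter_w * inter_h
    area_p = (d_pred[0] + d_pred[2]) * (d_pred[1] + d_pred[3])
    area_g = (d_gt[0] + d_gt[2]) * (d_gt[1] + d_gt[3])
    return -math.log(inter / (area_p + area_g - inter))
```

A perfect regression gives IoU 1 and loss 0; the focal term down-weights easy, well-classified points via the `(1 - p) ** gamma` factor.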

# Soft Anchor-Point Detector

The core of SAPD, shown in Figure 3, consists of soft-weighted anchor points and soft-selected pyramid levels, which adjust anchor-point loss weights and train with multiple feature pyramid levels, respectively.

### Soft-Weighted Anchor Points

##### False attention

Under the traditional training strategy, it is observed that some anchor points output poor localization but very high classification confidence, as shown in Figure 4a, which means the most accurately localized prediction may not survive NMS. The likely reason is that the training strategy treats all anchor points inside the central region $B_v$ equally. In fact, the closer a point is to the target boundary, the harder it is for that point to regress an accurate target location. Therefore, the loss of each anchor point should be weighted according to its location, so that the network focuses on learning from high-quality anchor points rather than being forced to learn from points that are hard to regress.

##### Our solution

To address this problem, the paper proposes soft weighting: the loss $L_{lij}$ of each anchor point is multiplied by a weight $w_{lij}$ determined by the point's position relative to the target boundary. Negative sample points do not participate in position regression, so their weight is simply set to 1.

$f$ is a function reflecting the distance between point $p_{lij}$ and the boundary of target $B$; the paper sets $f$ to a generalized centerness function:

$$f(p_{lij}, B) = \left[\frac{\min(d^l_{lij}, d^r_{lij})\,\min(d^t_{lij}, d^b_{lij})}{\max(d^l_{lij}, d^r_{lij})\,\max(d^t_{lij}, d^b_{lij})}\right]^{\eta}$$

where $\eta$ controls how quickly the weight decays toward the boundary.
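A minimal numeric sketch of this weighting function (`eta = 2.0` is an illustrative choice, not a value stated in this summary):

```python
def soft_weight(d, eta=2.0):
    """Centerness-style soft weight for one anchor point.
    d = (d_l, d_t, d_r, d_b): distances from the point to the four box edges."""
    dl, dt, dr, db = d
    return ((min(dl, dr) * min(dt, db)) /
            (max(dl, dr) * max(dt, db))) ** eta

# A point at the box center gets the full weight ...
print(soft_weight((50, 50, 50, 50)))  # -> 1.0
# ... while a point near the left edge is strongly down-weighted.
print(soft_weight((10, 50, 90, 50)))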

The effect can be seen in Figure 3: after soft weighting, the anchor-point weights form a mountain-shaped distribution that peaks at the target center.

### Soft-Selected Pyramid Levels

##### Feature selection

Anchor-free methods generally select a single feature pyramid level per target in each training round, and the effect of selecting different levels differs greatly. However, visualization shows that the activation regions of different levels are actually similar, as shown in Figure 5, which means features from different levels can predict cooperatively. Based on these findings, the paper proposes two criteria for selecting suitable pyramid levels:

• The selection should be based on feature responses, not hand-crafted rules.
• Multiple levels are allowed to train on each target, and each level should contribute proportionally to the prediction.
##### Our solution

To satisfy the two criteria above, the paper proposes a feature selection network that predicts a per-level weight for each target; the overall process is shown in Figure 6. RoIAlign extracts the features of the target's corresponding region on each level, these are fed into the feature selection network, and a weight vector is output. The effect can be seen in Figure 3: the weight peaks of the pyramid levels are similar in shape but different in height. Note that the feature selection network is used only during training.

The structure of the feature selection network is very simple, as shown in Table 1, and it is trained jointly with the detector. The ground truth is a one-hot vector whose value is assigned by FSAF's minimum-loss selection rule; see the earlier article on FSAF for details. Target $B$ is thus associated with each pyramid level through the weight $w^B_l$. Combined with soft weighting, the anchor-point weight becomes:
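How the two schemes combine can be sketched as follows. Here the per-level weights are produced by a softmax over raw scores, and the final anchor weight is the level weight times the centerness weight; both the softmax normalization and the raw scores are illustrative assumptions about the feature selection network's output:

```python
import math

def level_weights(raw_scores):
    """Softmax over the per-level scores the feature selection network
    predicts for one target, yielding one weight per pyramid level."""
    exps = [math.exp(s) for s in raw_scores]
    total = sum(exps)
    return [e / total for e in exps]

def anchor_weight(level_w, centerness):
    """Final soft weight of one anchor point on one level: the target's
    per-level weight w^B_l times its centerness-based weight f(p, B)."""
    return level_w * centerness

# Hypothetical raw scores for levels P3..P7 of one target.
w_levels = level_weights([0.5, 2.0, 1.0, 0.1, -1.0])
```

The weights sum to 1 across levels, so every level trains on the target but contributes in proportion to its predicted suitability.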

The complete model's loss is the weighted anchor-point loss plus the loss of the feature selection network.

# Experiment

Ablation experiments for each module.

Comparison with SOTA algorithms.

# Conclusion

Addressing the optimization problems of anchor-point detection algorithms, this paper proposes SAPD, which assigns different loss weights to anchor points at different locations and jointly trains different feature pyramid levels with learned weights, thereby removing most hand-crafted rules and training more according to the network's own learned weights.