This method formulates detector training as a maximum likelihood estimation (MLE) process and learns object classification, object localization, and anchor-GT matching end to end. According to the experimental results, the improvement is significant.

Source: Xiaofei’s algorithm Engineering Notes official account

**Paper: FreeAnchor: Learning to Match Anchors for Visual Object Detection**

**Paper address: https://arxiv.org/abs/1909.02466v1**

**Paper code: https://github.com/zhangxiaosong18/FreeAnchor**

# Introduction

Conventional object detection networks match anchors to ground-truth (GT) boxes by IoU, which leads to the following problems:

- For objects whose features are not centered, such as slender objects, spatial alignment cannot guarantee that the anchor covers enough object features, which degrades classification and localization performance.
- When objects are densely packed, IoU is not a suitable matching criterion.

Both problems stem from matching anchors to GT boxes with preset rules that ignore the network's output. The paper therefore proposes a learning-based matching method: matching is formulated as maximum likelihood estimation, and object classification, object localization, and matching are learned end to end, with good results. The main contributions are:

- Detector training is formulated as maximum likelihood estimation, and hand-crafted anchor-GT assignment is replaced by free anchor matching. This removes the IoU constraint and lets each GT select anchors according to the maximum likelihood criterion.
- Maximizing the likelihood drives the network to learn which anchor matches best, while keeping compatibility with the NMS algorithm.

# The Proposed Approach

To learn the matching between anchors and GT boxes, detector training is first recast as maximum likelihood estimation, so that classification and localization are optimized from a likelihood perspective. A detection customized likelihood is then defined, which optimizes the matching by guaranteeing both recall and precision. During training, the detection customized likelihood is converted into a detection customized loss, so that object classification, object localization, and matching are learned jointly end to end.

### Detector Training as Maximum Likelihood Estimation

The loss function of a conventional one-stage detector is given by Formula 1, where $\mathcal{L}(\theta)_{ij}^{cls}=BCE(a_j^{cls},b_i^{cls},\theta)$, $\mathcal{L}(\theta)_{ij}^{loc}=SmoothL1(a_j^{loc},b_i^{loc},\theta)$, $\mathcal{L}(\theta)_{ij}^{bg}=BCE(a_j^{cls},\vec{0},\theta)$, and $\theta$ denotes the parameters learned by the network. $C_{ij}$ indicates whether anchor $a_j$ matches GT $b_i$: it is 1 only if their IoU exceeds the threshold. When an anchor overlaps multiple GT boxes, the GT with the largest IoU is selected. The positive and negative anchor sets are $A_{+}=\{a_j \mid \sum_i C_{ij}=1\} \subseteq A$ and $A_{-}=\{a_j \mid \sum_i C_{ij}=0\} \subseteq A$.
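The IoU-based assignment described above can be sketched in a few lines. This is a minimal illustration, not the code of any particular detector: the helper names `iou` and `match_by_iou`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions made for the example, and at least one GT box is assumed to exist.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(anchors, gts, threshold=0.5):
    """Return C with C[i][j] = 1 if anchor j is assigned to GT i, else 0."""
    C = [[0] * len(anchors) for _ in gts]
    for j, anchor in enumerate(anchors):
        ious = [iou(anchor, gt) for gt in gts]
        # An anchor that overlaps several GT boxes keeps the one with the largest IoU.
        best_i = max(range(len(gts)), key=lambda i: ious[i])
        if ious[best_i] > threshold:
            C[best_i][j] = 1
    return C


anchors = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(1, 1, 11, 11), (20, 20, 31, 31)]
C = match_by_iou(anchors, gts)

# A_+ holds anchors whose column sums to 1, A_- holds those whose column sums to 0.
positives = [j for j in range(len(anchors)) if sum(row[j] for row in C) == 1]
negatives = [j for j in range(len(anchors)) if sum(row[j] for row in C) == 0]
```

Note that the matrix depends only on box geometry; nothing the network predicts can change it, which is the limitation the rest of the section addresses.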

From the perspective of maximum likelihood estimation (MLE), the loss function $\mathcal{L}(\theta)$ is converted into the likelihood of Formula 2, where $\mathcal{P}(\theta)_{ij}^{cls}$ and $\mathcal{P}(\theta)_{ij}^{bg}$ are the classification confidences and $\mathcal{P}(\theta)_{ij}^{loc}$ is the localization confidence. Minimizing $\mathcal{L}(\theta)$ is equivalent to maximizing the likelihood $\mathcal{P}(\theta)$.
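The link between a loss and a likelihood can be checked numerically. The sketch below assumes the conversion is the usual $\mathcal{P} = e^{-\mathcal{L}}$ and uses a scalar binary cross-entropy; the function name `bce` and the value 0.8 are illustrative only.

```python
import math


def bce(p, y):
    """Binary cross-entropy for a predicted probability p and a 0/1 target y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


p = 0.8  # predicted probability of the object class

# Exponentiating the negative loss recovers a probability:
likelihood_pos = math.exp(-bce(p, 1))  # target is the object class, gives p
likelihood_bg = math.exp(-bce(p, 0))   # target is background, gives 1 - p
```

Because the exponential is monotonic, a lower loss always means a higher likelihood, which is why minimizing one maximizes the other.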

Although Formula 2 strictly optimizes anchor classification and localization from the MLE perspective, it does not learn the matching matrix $C_{ij}$. Current detectors sidestep this by assigning matches with an IoU criterion and do not optimize the matching between GT boxes and anchors.

### Detection Customized Likelihood

To optimize the matching between GT boxes and anchors, the paper introduces a detection customized likelihood into the CNN-based detection framework. It accounts for both precision and recall while remaining compatible with NMS.

First, for each GT $b_i$, the anchors with the highest IoU are collected into a candidate set $A_i$. The network then learns the best match while maximizing the detection customized likelihood.
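Building the candidate set amounts to a top-n selection by IoU. A minimal sketch, assuming the IoU values between one GT box and every anchor are already available; `build_bag` and the sample values are hypothetical names and numbers chosen for illustration.

```python
def build_bag(ious_for_gt, n):
    """Indices of the n anchors with the highest IoU to one GT box."""
    order = sorted(range(len(ious_for_gt)),
                   key=lambda j: ious_for_gt[j],
                   reverse=True)
    return order[:n]


# IoU between a single GT box and five anchors.
ious_for_gt = [0.10, 0.72, 0.55, 0.05, 0.64]
bag = build_bag(ious_for_gt, n=3)  # anchors 1, 4, and 2, in that order
```

Unlike a hard IoU threshold, this only shortlists anchors; which one in the bag ends up matched is left to training.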

To optimize recall, each GT must be matched by at least one anchor, as in Formula 3: within each GT's candidate set, the anchor with the best combined classification and localization confidence is selected.
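The recall term can be sketched as follows, assuming it takes the form given in the FreeAnchor paper: a product over GT boxes of the largest $\mathcal{P}(\theta)_{ij}^{cls}\mathcal{P}(\theta)_{ij}^{loc}$ within each candidate set. The function name and the confidence values are made up for the example.

```python
def recall_likelihood(bags):
    """bags[i] lists (p_cls, p_loc) pairs for the anchors in GT i's candidate set."""
    result = 1.0
    for bag in bags:
        # Each GT is represented by its best anchor, so one good anchor is enough.
        result *= max(p_cls * p_loc for p_cls, p_loc in bag)
    return result


bags = [
    [(0.9, 0.8), (0.6, 0.9)],  # best product is 0.9 * 0.8
    [(0.5, 0.5), (0.7, 0.5)],  # best product is 0.7 * 0.5
]
value = recall_likelihood(bags)
```

The product is high only when every GT has at least one anchor that is both well classified and well localized, which is the recall requirement stated above.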

To optimize precision, the detector must classify poorly localized anchors as background. The objective is given by Formula 4, which means that the best-matched anchors should, as far as possible, not be classified as background. $P\{a_j \in A_{-}\}=1-\max_i P\{a_j \to b_i\}$ is the probability that $a_j$ matches no GT, and $P\{a_j \to b_i\}$ is the probability that anchor $a_j$ correctly predicts GT $b_i$. For NMS compatibility, $P\{a_j \to b_i\}$ must satisfy the following properties:

- $P\{a_j \to b_i\}$ is a monotonically increasing function of the IoU
- When the IoU between the anchor and the GT is below the threshold, $P\{a_j \to b_i\}$ is close to 0
- For each GT, exactly one anchor satisfies $P\{a_j \to b_i\}=1$

These properties of $P\{a_j \to b_i\}$ can be satisfied by a saturated linear function, that is, $P\{a_j \to b_i\}=\text{Saturated linear}(IoU_{ij}^{loc}, t, \max_j(IoU_{ij}^{loc}))$.
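A saturated linear function is 0 up to a lower bound, 1 from an upper bound onward, and linear in between. The sketch below assumes that standard definition; the IoU values and the threshold of 0.6 are illustrative.

```python
def saturated_linear(x, t1, t2):
    """0 for x <= t1, 1 for x >= t2, linear interpolation in between."""
    if x <= t1:
        return 0.0
    if x >= t2:
        return 1.0
    return (x - t1) / (t2 - t1)


# IoU between one GT box and the boxes predicted from four anchors.
ious = [0.3, 0.6, 0.8, 0.9]
t = 0.6  # background IoU threshold
probs = [saturated_linear(x, t, max(ious)) for x in ious]
```

Using the largest IoU as the upper bound is what makes exactly one anchor per GT reach probability 1, while everything at or below `t` is pushed to 0.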

With the definitions above, the detection customized likelihood is defined as Formula 5. It combines recall and precision and is compatible with NMS. Maximizing it maximizes recall and precision together, so GT boxes and anchors are matched freely.
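The precision part of the likelihood can be sketched as below. It assumes the form used in the FreeAnchor paper, a product over anchors of $1 - P\{a_j \in A_{-}\}(1 - \mathcal{P}(\theta)_{j}^{bg})$; the function name, argument layout, and numbers are assumptions for illustration. Multiplying this value by the recall term would give the combined likelihood.

```python
def precision_likelihood(match_probs, p_bg):
    """match_probs[i][j] = P{a_j -> b_i}; p_bg[j] = background confidence of anchor j."""
    result = 1.0
    for j in range(len(p_bg)):
        # Probability that anchor j matches no GT at all.
        p_unmatched = 1.0 - max(row[j] for row in match_probs)
        # An unmatched anchor is penalized unless it is confidently background.
        result *= 1.0 - p_unmatched * (1.0 - p_bg[j])
    return result


match_probs = [[1.0, 0.0]]  # one GT: anchor 0 matches it, anchor 1 does not
p_bg = [0.1, 0.9]           # anchor 1 is confidently classified as background
value = precision_likelihood(match_probs, p_bg)
```

Here the matched anchor contributes a factor of 1 regardless of its background score, and the unmatched anchor contributes its background confidence.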

### Anchor Matching Mechanism

To learn the matching effectively, the detection customized likelihood of Formula 5 is converted into the detection customized loss of Formula 6, in which the $\max$ function selects the most suitable anchor for each GT. During training, one anchor is selected from the candidate set $A_i$ to update the network parameters $\theta$.

At the start of training, because of random initialization, every anchor's confidence is very low and cannot indicate anchor quality. The Mean-max function is therefore used to select anchors.

When training is insufficient, Mean-max behaves like the mean function, so almost all anchors in the candidate set are used for training. As training proceeds, Mean-max approaches the max function, so the best anchor is selected for training.
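This behavior can be seen with a small sketch. It assumes the Mean-max definition from the FreeAnchor paper, $\text{Mean-max}(X)=\frac{\sum_j x_j/(1-x_j)}{\sum_j 1/(1-x_j)}$, and requires every value to be strictly below 1; the sample confidences are invented.

```python
def mean_max(xs):
    """Weighted mean in which each value x gets weight 1 / (1 - x)."""
    weights = [1.0 / (1.0 - x) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


# Early in training all confidences are small, so the weights are nearly equal
# and the result stays close to the plain mean.
early = mean_max([0.01, 0.02, 0.03])

# Later one anchor becomes confident, its weight dominates,
# and the result moves close to the max.
late = mean_max([0.05, 0.10, 0.99])
```

The weight $1/(1-x)$ grows without bound as $x$ approaches 1, which is what lets a single confident anchor take over the bag.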

Replacing the $\max$ function of Formula 6 with Mean-max, applying focal loss to the second term, and weighting the two terms by $w_1$ and $w_2$ respectively gives the final detection customized loss of Formula 7, where $X_i=\{\mathcal{P}(\theta)_{ij}^{cls}\mathcal{P}(\theta)_{ij}^{loc} \mid a_j \in A_i\}$ is the likelihood set of the candidate set $A_i$, $w_1=\frac{\alpha}{||B||}$, $w_2=\frac{1-\alpha}{n||B||}$, and $FL\_(p)=-p^{\gamma}\log(1-p)$.
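Putting the pieces together, the loss can be sketched as a weighted sum of a positive term over GT bags and a focal term over anchors. This assumes the form given in the FreeAnchor paper, $-w_1\sum_i \log(\text{Mean-max}(X_i)) + w_2\sum_j FL\_(P\{a_j \in A_{-}\}(1-\mathcal{P}(\theta)_{j}^{bg}))$; the function names, argument layout, and sample values are illustrative, and this is not the reference implementation.

```python
import math


def mean_max(xs):
    """Weighted mean in which each value x gets weight 1 / (1 - x)."""
    weights = [1.0 / (1.0 - x) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def focal_neg(p, gamma):
    """FL_(p) = -p^gamma * log(1 - p)."""
    return -(p ** gamma) * math.log(1.0 - p)


def detection_customized_loss(bags, neg_probs, n, alpha=0.5, gamma=2.0):
    """bags[i] is X_i; neg_probs[j] is P{a_j in A_-} * (1 - P_j^bg) for anchor j."""
    num_gt = len(bags)
    w1 = alpha / num_gt
    w2 = (1.0 - alpha) / (n * num_gt)
    positive = -w1 * sum(math.log(mean_max(x)) for x in bags)
    negative = w2 * sum(focal_neg(p, gamma) for p in neg_probs)
    return positive + negative


neg_probs = [0.10, 0.05, 0.30]
loss = detection_customized_loss([[0.2, 0.7], [0.6, 0.3]], neg_probs, n=2)
# Raising the best anchor's likelihood in each bag lowers the loss.
loss_better = detection_customized_loss([[0.2, 0.9], [0.9, 0.3]], neg_probs, n=2)
```

The defaults `alpha=0.5` and `gamma=2.0` follow the best values reported in the parameter study.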

With the detection customized loss, the detector is trained as described in Algorithm 1.

# Experiments

The FreeAnchor implementation used in the experiments is based on RetinaNet; the only change is replacing the loss function with the detection customized loss proposed in the paper.

### Learning-to-match

### Compatibility with NMS

### Parameter Setting

The experimental results are as follows:

- Anchor bag size $n$: among $\{40, 50, 60, 100\}$, 50 works best.
- Background IoU threshold $t$, the threshold of $P\{a_j \to b_i\}$: among $\{0.5, 0.6, 0.7\}$, 0.6 works best.
- Focal loss parameters: among the combinations of $\alpha \in \{0.25, 0.5, 0.75\}$ and $\gamma \in \{1.5, 2.0, 2.5\}$, $\alpha=0.5$ and $\gamma=2.0$ work best.
- The weight in Formula 1 that balances the classification and localization losses: 0.75 works best.

### Detection Performance

# Conclusion

This method formulates detector training as a maximum likelihood estimation (MLE) process and learns object classification, object localization, and anchor-GT matching end to end. According to the experimental results, the improvement is significant.
