ExtremeNet detects the four extreme points of an object (top-most, left-most, bottom-most and right-most) and groups them geometrically to produce a detection. Its performance is comparable to that of other established detection algorithms. The approach is distinctive, but it relies on several post-processing steps, so there is considerable room for improvement. If you are interested, see the error analysis in the experiments section of the paper.

Source: Xiaofei’s algorithm Engineering Notes official account

**Bottom-up Object Detection by Grouping Extreme and Center Points**

**Paper address:** https://arxiv.org/abs/1901.08043

**Paper code:** https://github.com/xingyizhou/ExtremeNet

## Introduction

In object detection, the common approach represents an object as a rectangular bounding box, which usually includes a lot of background that hinders detection. The paper therefore proposes ExtremeNet, which localizes an object by detecting its four extreme points, as shown in Figure 1. The algorithm builds on the idea of CornerNet: five heatmaps predict the four extreme points and the center region of the object. Extreme points from the different heatmaps are combined, and a combination is accepted or rejected according to the value of the center heatmap at the combination's geometric center. In addition, the extreme points detected by ExtremeNet can be passed to the DEXTR network to predict a segmentation mask for the object.

## ExtremeNet for Object detection

ExtremeNet uses an Hourglass network to detect class-specific keypoints, and it follows CornerNet's training procedure, loss function and offset prediction. Offset prediction is class-agnostic, and no offset is predicted for the center point. The backbone outputs $5 \times C$ heatmaps and $4 \times 2$ offset maps, where $C$ is the number of categories. The overall structure and outputs are shown in Figure 3. Once the extreme points have been extracted, they are grouped according to their geometric relationship.
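As a quick check on the output sizes above, the channel counts can be written out as follows. This is only an illustrative sketch: the function name `output_channels` is hypothetical, and the value of 80 categories is an assumed example rather than a figure taken from this article.

```python
def output_channels(num_classes):
    """Channel counts of the two output groups described above."""
    # One heatmap per class for each of: top, left, bottom, right, center.
    heatmap_channels = 5 * num_classes
    # Class-agnostic (x, y) offsets for the four extreme points; none for the center.
    offset_channels = 4 * 2
    return heatmap_channels, offset_channels


# Assumed example: 80 categories.
print(output_channels(80))  # prints (400, 8)
```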

### Center Grouping

Because the extreme points lie on different sides of the object, grouping them is complicated. The paper argues that grouping with embedding vectors, as CornerNet does, lacks global information, so it proposes center grouping to combine the extreme points instead.

The center grouping procedure is shown in Algorithm 1. First, peaks are extracted from each of the four extreme-point heatmaps. A peak must satisfy two conditions: 1) its value is greater than the threshold $\tau_p$, and 2) it is a local maximum within the $3 \times 3$ window formed with its eight neighbours. This extraction step is called ExtractPeak. Next, every combination of peaks ($t$, $b$, $r$, $l$) from the four heatmaps is enumerated. For each combination that satisfies the geometric relationship, the geometric center $c = (\frac{l_x + r_x}{2}, \frac{t_y + b_y}{2})$ is computed. If the center heatmap value at that point satisfies $\hat{Y}^{(c)}_{c_x, c_y} \ge \tau_c$, the combination is accepted as a valid detection.
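The grouping procedure can be sketched in plain Python on single-class heatmaps stored as nested lists. This is a minimal sketch, not the authors' implementation: the names `extract_peaks` and `center_group`, the default thresholds of 0.1, and scoring a box by the mean of its five heatmap values are all assumptions made for illustration.

```python
from itertools import product


def extract_peaks(heatmap, tau_p):
    """Return (x, y, score) for cells above tau_p that are 3x3 local maxima."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = heatmap[y][x]
            if value <= tau_p:
                continue
            window = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # The window includes the cell itself; ties are kept in this sketch.
            if value >= max(window):
                peaks.append((x, y, value))
    return peaks


def center_group(top, left, bottom, right, center, tau_p=0.1, tau_c=0.1):
    """Group extreme-point peaks into boxes (x1, y1, x2, y2, score)."""
    tops, lefts, bottoms, rights = (
        extract_peaks(h, tau_p) for h in (top, left, bottom, right)
    )
    boxes = []
    for t, l, b, r in product(tops, lefts, bottoms, rights):
        (tx, ty, ts), (lx, ly, ls), (bx, by, bs), (rx, ry, rs) = t, l, b, r
        # Geometric relationship (image y grows downward): left and right
        # lie between top and bottom, top and bottom between left and right.
        if not (ty <= ly <= by and ty <= ry <= by):
            continue
        if not (lx <= tx <= rx and lx <= bx <= rx):
            continue
        cx, cy = (lx + rx) // 2, (ty + by) // 2
        cs = center[cy][cx]
        if cs >= tau_c:
            # Assumption: score the box by the mean of the five heatmap values.
            boxes.append((lx, ty, rx, by, (ts + ls + bs + rs + cs) / 5))
    return boxes


def blank(size):
    return [[0.0] * size for _ in range(size)]


# Toy example: one object whose extreme points sit on a 7x7 grid.
top, left, bottom, right, center = (blank(7) for _ in range(5))
top[1][3], left[3][1], bottom[5][3], right[3][5] = 0.9, 0.8, 0.7, 0.6
center[3][3] = 0.5
boxes = center_group(top, left, bottom, right, center)
print([box[:4] for box in boxes])  # prints [(1, 1, 5, 5)]
```

The enumeration is brute force over all peak combinations, which is why the peak threshold matters: it keeps the number of candidates per heatmap small.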

### Ghost box suppression

Center grouping may produce a high-confidence false positive when three objects of the same size are equally spaced. In that case there are two candidate boxes centered on the middle object: the correct prediction, and a wrong one that combines extreme points of the neighbouring objects. The paper calls the second kind of prediction a ghost box. To handle this, a soft-NMS-style post-processing step is added: if the sum of the scores of all boxes contained in a predicted box exceeds three times the box's own score, that score is divided by two, and NMS is then applied.
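The rescoring rule can be sketched as follows. This is a hedged illustration rather than the paper's code: the names `contains` and `suppress_ghost_boxes` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2, score)` tuples, and a box is assumed not to count itself among the boxes it contains.

```python
def contains(outer, inner):
    """True if box `inner` lies entirely inside box `outer`."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def suppress_ghost_boxes(boxes, ratio=3.0, penalty=2.0):
    """Halve the score of boxes whose contained boxes outweigh them."""
    rescored = []
    for i, box in enumerate(boxes):
        # Assumption: the box itself is excluded from the sum.
        inner_sum = sum(
            other[4]
            for j, other in enumerate(boxes)
            if j != i and contains(box, other)
        )
        score = box[4]
        if inner_sum > ratio * score:
            score /= penalty
        rescored.append(box[:4] + (score,))
    return rescored


# Three equally spaced objects plus a ghost box spanning all of them.
detections = [
    (0, 0, 2, 2, 0.9),
    (4, 0, 6, 2, 0.9),
    (8, 0, 10, 2, 0.9),
    (0, 0, 10, 2, 0.8),  # ghost box
]
rescored = suppress_ghost_boxes(detections)
print(rescored[3])  # prints (0, 0, 10, 2, 0.4)
```

The real boxes keep their scores because none of them contains another box, so a standard NMS pass afterwards sees the ghost box at a reduced score.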

### Edge aggregation

Extreme points are not always unique. If an object has a horizontal or vertical edge, every point along that edge is an extreme point, and the network's response at each of these points is weak, which may cause the extreme point to be missed.

The paper addresses this with edge aggregation. For each local maximum of the left and right heatmaps, scores are aggregated in the vertical direction; for each local maximum of the top and bottom heatmaps, scores are aggregated in the horizontal direction. Monotonically decreasing scores are aggregated along the corresponding direction until a local minimum is reached. Let $m$ be a local maximum and $N^{(m)}_i = \hat{Y}_{m_x+i, m_y}$ the points along the horizontal direction, and let $i_0 < 0$ and $0 < i_1$ be the nearest local minima on either side, i.e. $N^{(m)}_{i_0-1} > N^{(m)}_{i_0}$ and $N^{(m)}_{i_1} < N^{(m)}_{i_1+1}$. Edge aggregation then updates the peak score to $\tilde{Y}_m = \hat{Y}_m + \lambda_{aggr}\sum^{i_1}_{i=i_0} N^{(m)}_i$, where $\lambda_{aggr}$ is the aggregation weight, set to 0.1. The overall effect is shown in Figure 4.
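The update formula can be sketched on a single row or column of scores. This is a minimal sketch under stated assumptions: the names `aggregate_1d` and `edge_aggregate` are hypothetical, heatmaps are nested lists, and the walk simply stops at the border of the heatmap if no local minimum is found first.

```python
def aggregate_1d(line, m, lam=0.1):
    """Edge-aggregated score for the local maximum at index m of `line`."""
    # Walk left until the next score would be larger (a local minimum).
    i0 = m
    while i0 - 1 >= 0 and line[i0 - 1] <= line[i0]:
        i0 -= 1
    # Walk right in the same way.
    i1 = m
    while i1 + 1 < len(line) and line[i1 + 1] <= line[i1]:
        i1 += 1
    # The sum runs from i0 to i1 inclusive, so it includes the peak itself.
    return line[m] + lam * sum(line[i0:i1 + 1])


def edge_aggregate(heatmap, x, y, direction, lam=0.1):
    """Aggregate along a row ("horizontal") or a column ("vertical")."""
    if direction == "horizontal":  # top and bottom heatmaps
        return aggregate_1d(heatmap[y], x, lam)
    column = [row[x] for row in heatmap]  # left and right heatmaps
    return aggregate_1d(column, y, lam)


# Scores along one row; the peak at index 3 has local minima at 1 and 5.
row = [0.5, 0.1, 0.2, 0.4, 0.3, 0.2, 0.6]
print(round(aggregate_1d(row, 3), 4))  # prints 0.52
```

In the example the aggregated span is `[0.1, 0.2, 0.4, 0.3, 0.2]`, which sums to 1.2, so the peak score rises from 0.4 to 0.4 + 0.1 × 1.2 = 0.52.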

### Extreme Instance Segmentation

Extreme points carry more information about the object than a bounding box; after all, they provide twice as many annotation values (8 vs. 4). Based on the four extreme points and the bounding box, the paper proposes a simple way to obtain a mask for the object. First, each extreme point is extended in both directions along its bounding-box edge into a segment one quarter of that edge's length, centered on the extreme point. If the segment would go past the bounding box, it is truncated. The end points of the four segments are then connected in order to form an octagon, as shown in Figure 1. Finally, Deep Extreme Cut (DEXTR) is used to refine the mask. DEXTR is a network that converts extreme-point information into a segmentation; here the predicted extreme points are fed directly into the pretrained DEXTR network.
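The octagon construction can be sketched as follows. This is an illustrative sketch only: the names `clip` and `octagon` are hypothetical, points are `(x, y)` tuples in image coordinates with y growing downward, and vertices are returned clockwise starting from the top edge.

```python
def clip(value, low, high):
    return max(low, min(high, value))


def octagon(t, l, b, r):
    """Octagon vertices from the top, left, bottom and right extreme points."""
    (tx, ty), (lx, ly), (bx, by), (rx, ry) = t, l, b, r
    # Each segment is 1/4 of the edge length, centered on the extreme
    # point, so it reaches 1/8 of the edge length to either side.
    dx = (rx - lx) / 8
    dy = (by - ty) / 8
    return [
        (clip(tx - dx, lx, rx), ty), (clip(tx + dx, lx, rx), ty),  # top
        (rx, clip(ry - dy, ty, by)), (rx, clip(ry + dy, ty, by)),  # right
        (clip(bx + dx, lx, rx), by), (clip(bx - dx, lx, rx), by),  # bottom
        (lx, clip(ly + dy, ty, by)), (lx, clip(ly - dy, ty, by)),  # left
    ]


# Extreme points of an 8x8 box, each at the middle of its edge.
vertices = octagon(t=(4, 0), l=(0, 4), b=(4, 8), r=(8, 4))
print(vertices[:2])  # prints [(3.0, 0), (5.0, 0)]
```

When an extreme point sits near a corner, `clip` truncates its segment at the bounding box, so two neighbouring vertices may coincide and the shape degenerates to fewer than eight distinct corners.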

## Experiments

The paper also carries out an error analysis of ExtremeNet: when the outputs of its modules are replaced with ground truth, the result reaches 86.0 AP.

Comparison with other state-of-the-art methods.

Instance segmentation results.

## Conclusion

ExtremeNet detects the four extreme points of an object and groups them geometrically to produce a detection. Its performance is comparable to that of other established detection algorithms. The approach is distinctive, but it relies on several post-processing steps, so there is considerable room for improvement. If you are interested, see the error analysis in the experiments section of the paper.
