SEPC: using 3-D convolution to extract scale-invariant features from FPN, a drop-in accuracy booster | CVPR 2020

Time:2021-9-17

This paper proposes PConv (pyramid convolution), which captures the relationships between scales by applying 3-D convolution across the feature pyramid, regularized with a dedicated integrated BN (iBN). It further proposes SEPC, which uses deformable convolution to adapt to the irregular correspondence between real feature levels and keep the scales balanced. PConv and SEPC noticeably improve state-of-the-art detectors while adding little computation.

Source: Xiaofei’s algorithm Engineering Notes official account

Paper: Scale-Equalizing Pyramid Convolution for Object Detection

Introduction


  Feature pyramids are an important tool for handling variation in object scale, but there is a large semantic gap between feature maps at different levels. To close this gap, many studies focus on strengthening feature fusion, yet most of them simply resize the feature maps and add them together, ignoring the intrinsic properties of the feature pyramid. Inspired by scale-space theory (multi-scale feature extraction), this paper proposes PConv (pyramid convolution), which applies 3-D convolution across neighboring feature maps to capture the interaction between scales. Because features change considerably from level to level and the point-to-point correspondence between levels is irregular, the paper further proposes SEPC (scale-equalizing pyramid convolution), which applies deformable convolution to the high-level features so that it adapts to the actual scale change and keeps the levels scale-balanced.
  The main contributions of the paper are as follows:

  • A lightweight pyramid convolution, PConv, is proposed; it applies 3-D convolution to the feature pyramid to capture the correlation between scales.
  • A scale-equalizing pyramid convolution, SEPC, is proposed to reduce the discrepancy between the feature pyramid and a Gaussian pyramid (the paper proves that PConv is scale-invariant on a Gaussian pyramid).
  • The module improves the performance of state-of-the-art single-stage object detectors while hardly affecting inference speed.

Pyramid convolution


PConv (pyramid convolution) is in effect a 3-D convolution that spans both the scale and the spatial dimensions. As shown in Figure 4a, PConv can be expressed as $n$ different 2-D convolutions.

However, feature maps at different pyramid levels have different sizes. To accommodate this, PConv uses a different stride for each kernel, depending on which level it processes. The paper takes $n = 3$ as an example: the stride of the first kernel is 2 and the stride of the last kernel is 0.5.

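PConv can be expressed as Formula 1. Written out with $l$ as the pyramid level index, it reads:

$$y^l = w_1 *_{s0.5} x^{l+1} + w_0 * x^l + w_{-1} *_{s2} x^{l-1}$$

Here $w_1$, $w_0$ and $w_{-1}$ are three independent 2-D convolution kernels, $x$ is the input feature map, and $*_{s2}$ denotes convolution with stride 2.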

The stride-0.5 convolution is implemented as an ordinary stride-1 convolution combined with 2x bilinear upsampling, which gives Formula 2:

$$y^l = \mathrm{Upsample}(w_1 * x^{l+1}) + w_0 * x^l + w_{-1} *_{s2} x^{l-1}$$

PConv also uses zero padding. The bottom and top pyramid levels have only one neighboring level, so only two of the three terms in Formula 2 are needed there. The computational cost of PConv is about 1.5 times that of the ordinary convolution it replaces on the original FPN.
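
To make Formula 2 concrete, here is a minimal 1-D sketch in plain Python. It is not the paper's implementation: it assumes a pyramid whose level lengths halve each time, uses linear interpolation as a stand-in for bilinear upsampling, and applies the stride-1 convolution before upsampling. The helper names `conv1d`, `upsample2x` and `pconv` are hypothetical.

```python
def conv1d(x, w, stride=1):
    """Zero-padded 1-D cross-correlation with an odd-length kernel."""
    pad = len(w) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(w[k] * padded[i + k] for k in range(len(w)))
        for i in range(0, len(x), stride)
    ]


def upsample2x(x):
    """Double the length by linear interpolation (1-D stand-in for bilinear)."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v  # repeat the last sample at the edge
        out.extend([v, 0.5 * (v + nxt)])
    return out


def pconv(levels, w_up, w_same, w_down):
    """One PConv layer with n = 3; levels[0] is the finest (bottom) level."""
    out = []
    for l, x in enumerate(levels):
        y = conv1d(x, w_same)  # middle term: same level, stride 1
        if l + 1 < len(levels):
            # coarser neighbour: stride-1 convolution, then 2x upsampling
            up = upsample2x(conv1d(levels[l + 1], w_up))
            y = [a + b for a, b in zip(y, up)]
        if l > 0:
            # finer neighbour: stride-2 convolution lands on this level's size
            down = conv1d(levels[l - 1], w_down, stride=2)
            y = [a + b for a, b in zip(y, down)]
        out.append(y)
    return out


# Toy pyramid of constant features and identity kernels (illustrative values).
pyramid = [[1.0] * 8, [1.0] * 4, [1.0] * 2]
identity = [0.0, 1.0, 0.0]
result = pconv(pyramid, identity, identity, identity)
print([len(y) for y in result])  # each level keeps its own length: [8, 4, 2]
```

With identity kernels, the middle level sums three terms and the bottom and top levels only two, which mirrors the boundary handling described above.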

Pipeline

As shown in Figure 5a, RetinaNet can be regarded as PConv with $n = 1$. Its four convolutional head layers are replaced with PConv layers with $n = 3$; stacking PConv layers progressively strengthens the correlation between scales without much additional computation. To reduce computation further, the classification and localization branches can share four PConv layers, with each branch then adding one extra ordinary convolution layer, as shown in Figure 5b. This design requires even less computation than the original RetinaNet; see Appendix 1 of the paper for the detailed calculation.
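
The "about 1.5 times" figure quoted earlier for a single PConv layer can be checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: a 5-level pyramid whose area shrinks 4 times per level, identical channel counts and kernel sizes on every level, the stride-0.5 kernel evaluated at the coarser resolution before upsampling, and the cost of upsampling and addition ignored. The function names are hypothetical.

```python
from fractions import Fraction


def plain_layer_cost(num_levels):
    """One ordinary conv applied to every level; the finest level costs 1."""
    return sum(Fraction(1, 4 ** l) for l in range(num_levels))


def pconv_layer_cost(num_levels):
    """One PConv layer with n = 3 under the assumptions listed above."""
    cost = Fraction(0)
    for l in range(num_levels):
        area = Fraction(1, 4 ** l)
        cost += area          # same-level kernel
        if l > 0:
            cost += area      # stride-2 kernel: output has this level's size
        if l + 1 < num_levels:
            cost += area / 4  # kernel run on the coarser level, then upsampled
    return cost


ratio = pconv_layer_cost(5) / plain_layer_cost(5)
print(ratio, float(ratio))  # 511/341, just under 1.5
```

The two extra kernels each cost roughly a quarter of the same-level kernel when summed over the pyramid, which is why three kernels per level end up near 1.5 times the cost rather than 3 times.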

Integrated batch normalization (BN) in the head

PConv uses a shared BN layer whose statistics are computed over all feature maps in the feature pyramid rather than over a single feature map. Because the statistics are collected from every level of the pyramid, they fluctuate less, so the BN layer trains well (its statistics are stable) even with a small batch size.
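
The idea of pooling statistics over the pyramid can be sketched for a single channel as follows. This is an illustrative simplification, not the paper's code: a real BN layer works per channel over the batch and spatial dimensions and keeps running averages, and the name `integrated_bn` and its default arguments are assumptions.

```python
from statistics import fmean, pvariance


def integrated_bn(levels, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize every level with one mean and variance pooled over all levels."""
    pooled = [v for level in levels for v in level]
    mu = fmean(pooled)
    var = pvariance(pooled, mu)
    scale = gamma / (var + eps) ** 0.5
    # The same mu, var, gamma and beta are applied to every pyramid level.
    return [[scale * (v - mu) + beta for v in level] for level in levels]


# Toy single-channel pyramid with three levels (illustrative values).
features = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0], [9.0]]
normalized = integrated_bn(features)
print([len(level) for level in normalized])  # shape is preserved: [4, 2, 1]
```

Note that the top level here holds a single value, so per-level statistics would be degenerate for it, whereas the pooled statistics remain usable.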

Scale-equalizing pyramid convolution


  PConv uses a fixed kernel size at every level. On a Gaussian pyramid (where the blur is not severe and the Gaussian kernel matches the scaling ratio of the feature maps), PConv extracts scale-invariant features; see Appendix 3 of the paper for the proof.
  In practice, however, the stacked convolutions and nonlinearities make the blur in a feature pyramid much more severe than in a Gaussian pyramid (the effective scaling of the features may not be proportional to the feature map size), so it is hard to extract scale-invariant features with a fixed kernel size. The paper therefore proposes SEPC (scale-equalizing pyramid convolution), which uses deformable convolution on the high-level features, that is, every level except the bottom one, and predicts a separate offset for each. This lets it adapt to the blur at each level, keep the feature maps scale-balanced, and extract scale-invariant features.
  SEPC has the following advantages:

  • The flexibility of deformable convolution copes with the large blur between levels of the feature pyramid.
  • It eliminates the discrepancy between the feature pyramid and a Gaussian pyramid (the paper proves that PConv extracts scale-invariant features from a Gaussian pyramid).
  • Convolution on the high-level features costs about 4 times less than on the low-level features (the area is 4 times smaller), so adding deformable convolution only to the high-level features brings little additional computation.

SEPC comes in two versions: SEPC-full applies SEPC to both the combined head and the extra head in Figure 5b, while SEPC-lite applies it only to the extra head.
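
For readers unfamiliar with deformable convolution, the following 1-D toy shows the mechanism SEPC relies on: each kernel tap samples the input at a shifted, possibly fractional, position. It is a sketch under assumptions, not the paper's implementation. In a real deformable convolution the offsets are predicted by a learned layer, whereas here they are passed in by hand, and the names `sample_linear` and `deform_conv1d` are hypothetical.

```python
import math


def sample_linear(x, pos):
    """Linearly interpolate x at a fractional position; outside x counts as zero."""
    lo = math.floor(pos)
    frac = pos - lo

    def at(i):
        return x[i] if 0 <= i < len(x) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def deform_conv1d(x, w, offsets):
    """offsets[i][k] shifts tap k of the kernel at output position i."""
    half = len(w) // 2
    return [
        sum(
            w[k] * sample_linear(x, i + k - half + offsets[i][k])
            for k in range(len(w))
        )
        for i in range(len(x))
    ]


signal = [1.0, 2.0, 4.0, 8.0]
kernel = [1.0, 1.0, 1.0]
no_shift = [[0.0] * 3 for _ in signal]
half_shift = [[0.5] * 3 for _ in signal]

print(deform_conv1d(signal, kernel, no_shift))    # same as a zero-padded conv
print(deform_conv1d(signal, kernel, half_shift))  # taps moved half a sample right
```

With all offsets at zero the result equals an ordinary zero-padded convolution, which is why the bottom level can keep a regular kernel while only the higher levels pay for offset prediction.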

Experiments


Single-stage object detectors

Effect of each component

Comparison of different BN implementations in the head

The output of a BN layer is $y = \gamma \frac{x - \mu}{\sigma} + \beta$, where $\gamma$ and $\beta$ are learnable parameters and $\mu$ and $\sigma$ are the computed statistics. Figure 7 compares three BN implementations; integrated BN (iBN) is the shared BN proposed in the paper, in which all parameters and statistics are shared.

Comparison with other feature fusion modules

Comparison with state-of-the-art object detectors

Extension to two-stage object detectors

CONCLUSION


This paper proposes PConv (pyramid convolution), which captures the relationships between scales by applying 3-D convolution across the feature pyramid, regularized with a dedicated integrated BN (iBN). It further proposes SEPC, which uses deformable convolution to adapt to the irregular correspondence between real feature levels and keep the scales balanced. PConv and SEPC noticeably improve state-of-the-art detectors while adding little computation.


