Roundup! Selected CVPR 2020 Papers on Object Detection


This article is a set of study notes.

Contributors: Wang bokings, Sophia

The recently concluded CVPR 2020 featured many notable contributions to object detection. In this article, we introduce some particularly impressive papers.

1. A Hierarchical Graph Network for 3D Object Detection on Point Clouds

HGNet consists of three main components:

  • A U-shaped network based on GConv
  • A proposal generator
  • A ProRe (proposal reasoning) module – uses a fully connected graph to reason over the proposals


A shape-attentive GConv (SA-GConv) is proposed to capture local shape features. It does this by modeling the relative geometric positions of points to describe the shape of the object.
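To make the idea of describing shape through relative geometric positions concrete, here is a minimal sketch of a generic graph-convolution step built on neighbour offsets. It is not the paper's exact SA-GConv: the k-nearest-neighbour grouping, the linear weights, the ReLU, and the max-pooling aggregation are all assumptions chosen for illustration, and `local_shape_descriptor` is a hypothetical name.

```python
import math

def knn(points, center_idx, k):
    """Indices of the k nearest neighbours of points[center_idx], excluding itself."""
    c = points[center_idx]
    others = [i for i in range(len(points)) if i != center_idx]
    others.sort(key=lambda i: math.dist(points[i], c))
    return others[:k]

def relative_position_features(points, center_idx, k):
    """Edge features: each neighbour's position minus the centre position."""
    c = points[center_idx]
    return [tuple(p - q for p, q in zip(points[i], c)) for i in knn(points, center_idx, k)]

def local_shape_descriptor(points, center_idx, k, weights):
    """Project every relative offset with a hypothetical weight matrix (one row per
    output channel), apply ReLU, then max-pool over the neighbours."""
    edges = relative_position_features(points, center_idx, k)
    out = []
    for row in weights:
        responses = [max(0.0, sum(w * e for w, e in zip(row, edge))) for edge in edges]
        out.append(max(responses))
    return out
```

Because only offsets enter the computation, translating the whole point cloud leaves the descriptor unchanged, which is one reason relative positions are a natural way to encode local shape.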


The U-shaped network based on SA-GConv captures multi-level features. These features are mapped to the same feature space by a voting module and used to generate proposals. Next, the GConv-based proposal reasoning module uses the proposals to predict bounding boxes.
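The voting step can be pictured as follows: each seed point predicts an offset towards the centre of the object it belongs to, and nearby votes are grouped into proposals. The sketch below assumes the offsets are already given (in the real network they are learned) and uses a simple greedy radius clustering of our own choosing; it is not the paper's exact procedure.

```python
import math

def cast_votes(seeds, offsets):
    """Each seed votes for an object centre: vote = seed + predicted offset."""
    return [tuple(s + o for s, o in zip(seed, off)) for seed, off in zip(seeds, offsets)]

def cluster_votes(votes, radius):
    """Greedy grouping: a vote joins the first cluster whose first vote lies within
    radius, otherwise it starts a new cluster. Returns each cluster's mean as a
    proposal centre."""
    clusters = []
    for v in votes:
        for members in clusters:
            if math.dist(members[0], v) <= radius:
                members.append(v)
                break
        else:
            clusters.append([v])
    return [tuple(sum(c) / len(members) for c in zip(*members)) for members in clusters]
```

Seeds scattered over the same object land close together after voting, so clustering in vote space yields one proposal per object even when the raw points are spread out.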

Here are some performance results obtained on the SUN RGB-D V1 dataset.


2. HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection

In this paper, the authors propose a hybrid voxel network (HVNet), a one-stage network for point-cloud-based 3D object detection in autonomous driving.


The voxel feature encoding (VFE) method used in this paper consists of three steps:

  • Voxelization – assigns the point cloud to a 2D voxel grid
  • Voxel feature extraction – computes grid-dependent point-wise features, which are fed to a PointNet-style feature encoder
  • Projection – aggregates the point-wise features into voxel-level features and projects them back to their original grid, forming a pseudo-image feature map
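The three VFE steps can be sketched in plain Python. This is a toy version under stated assumptions: voxels are 2D bird's-eye-view cells, the hypothetical `point_features` function (height plus offset from the cell centre) stands in for the learned PointNet-style encoder, and max pooling is assumed as the point-to-voxel aggregation.

```python
def voxelize(points, voxel_size, grid_w, grid_h):
    """Step 1: assign each (x, y, z) point to a 2D grid cell using x and y only.
    Points outside the grid are dropped."""
    cells = {}
    for p in points:
        ix, iy = int(p[0] // voxel_size), int(p[1] // voxel_size)
        if 0 <= ix < grid_w and 0 <= iy < grid_h:
            cells.setdefault((ix, iy), []).append(p)
    return cells

def point_features(p, cell, voxel_size):
    """Step 2 (hypothetical stand-in for a learned encoder): a grid-dependent
    point-wise feature made of height z and the offset from the cell centre."""
    cx = (cell[0] + 0.5) * voxel_size
    cy = (cell[1] + 0.5) * voxel_size
    return [p[2], p[0] - cx, p[1] - cy]

def pseudo_image(points, voxel_size, grid_w, grid_h, channels=3):
    """Step 3: max-pool point features per voxel and scatter them back onto the
    grid, giving an H x W x C pseudo-image (empty cells stay zero)."""
    image = [[[0.0] * channels for _ in range(grid_w)] for _ in range(grid_h)]
    for cell, pts in voxelize(points, voxel_size, grid_w, grid_h).items():
        feats = [point_features(p, cell, voxel_size) for p in pts]
        image[cell[1]][cell[0]] = [max(col) for col in zip(*feats)]
    return image
```

The resulting pseudo-image can then be processed by an ordinary 2D convolutional backbone, which is the main appeal of this family of encoders.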


Voxel size is very important in VFE. Smaller voxels capture finer geometric features and are better at localizing objects, but inference takes longer. Coarser voxels give faster inference because they produce smaller feature maps, but at the cost of poorer detection performance.
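The cost side of this trade-off is simple arithmetic: for a square bird's-eye-view grid, halving the voxel size quadruples the number of cells in the feature map. The 80 m × 80 m area below is an arbitrary assumption for illustration, not a setting from the paper.

```python
def bev_grid_cells(range_m, voxel_size_m):
    """Number of cells in a square bird's-eye-view grid covering range_m x range_m."""
    side = round(range_m / voxel_size_m)
    return side * side

print(bev_grid_cells(80.0, 0.1))  # 640000
print(bev_grid_cells(80.0, 0.2))  # 160000
print(bev_grid_cells(80.0, 0.4))  # 40000
```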

HVNet is proposed to make use of fine-grained voxel features. It consists of three steps:

  • Multi-scale voxelization – creates a set of feature voxel scales and assigns the points to voxels at each of these scales
  • Hybrid voxel feature extraction – computes voxel-dependent features for each scale and feeds them into the attentive voxel feature encoder (AVFE). The features from each voxel scale are concatenated point by point
  • Dynamic feature projection – projects the features back to the pseudo image by creating a set of multi-scale project voxels
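A rough picture of the data flow through these three steps is sketched below. Everything here is a simplified assumption: the per-scale feature is just the point's offset from its voxel centre (the real network learns it with AVFE), and the projection step max-pools point features into a grid at a separately chosen output scale. The function names are hypothetical.

```python
def multi_scale_indices(point, scales):
    """Multi-scale voxelization: the (x, y) voxel index of one point at every scale."""
    return {s: (int(point[0] // s), int(point[1] // s)) for s in scales}

def hybrid_point_feature(point, scales):
    """Hybrid feature extraction stand-in: a per-scale feature (offset from that
    scale's voxel centre), concatenated point by point across all scales."""
    feat = []
    for s, (ix, iy) in multi_scale_indices(point, scales).items():
        feat += [point[0] - (ix + 0.5) * s, point[1] - (iy + 0.5) * s]
    return feat

def project(points, feats, project_scale):
    """Projection stand-in: scatter point features into cells of a chosen output
    scale, max-pooling features that land in the same cell."""
    grid = {}
    for p, f in zip(points, feats):
        cell = (int(p[0] // project_scale), int(p[1] // project_scale))
        grid[cell] = [max(a, b) for a, b in zip(grid[cell], f)] if cell in grid else list(f)
    return grid
```

The point of separating the feature scales from the projection scale is that fine-grained voxel information can be kept in the point features without forcing the output pseudo-image to be equally fine, which keeps the backbone cheap.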


Here are the results obtained on the KITTI dataset.


3. Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud

In this paper, the authors propose Point-GNN, a graph neural network that detects objects in LiDAR point clouds. The network predicts the category and shape of the object that each vertex in the graph belongs to. Point-GNN uses an auto-registration mechanism to reduce translation variance, and it can detect multiple objects in a single shot.

The proposed method consists of three parts:

  • Graph construction: the point cloud is voxel-downsampled and used to construct the graph
  • A graph neural network with T iterations
  • Bounding box merging and scoring
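The first two parts can be sketched with a toy example. The graph is built by keeping one point per occupied voxel and connecting vertices that lie within a fixed radius of each other; the "GNN" here is a deliberately trivial stand-in (each vertex takes the max of its own and its neighbours' scalar states) that only shows how T iterations let information travel up to T hops. The real network uses learned MLPs and relative coordinates; those details are not reproduced here.

```python
import math

def voxel_downsample(points, voxel_size):
    """Keep one representative point (the first one seen) per occupied voxel."""
    kept = {}
    for p in points:
        key = tuple(int(c // voxel_size) for c in p)
        kept.setdefault(key, p)
    return list(kept.values())

def build_graph(vertices, radius):
    """Directed edges between every pair of vertices closer than radius."""
    edges = []
    for i, a in enumerate(vertices):
        for j, b in enumerate(vertices):
            if i != j and math.dist(a, b) < radius:
                edges.append((i, j))
    return edges

def run_gnn(states, edges, iterations):
    """T rounds of message passing with a hypothetical max aggregation."""
    for _ in range(iterations):
        new = list(states)
        for i, j in edges:
            new[i] = max(new[i], states[j])
        states = new
    return states
```

Downsampling first keeps the number of vertices, and hence the number of edges, manageable on dense LiDAR scans.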


The following results were obtained on the KITTI dataset:



4. Camouflaged Object Detection

This paper tackles the challenge of detecting objects that are embedded in their surroundings, known as camouflaged object detection (COD). The authors also introduce a new dataset named COD10K. It contains 10,000 images covering many camouflaged objects in natural scenes, spanning 78 object categories. The images are annotated with category labels, bounding boxes, and instance-level and matting-level labels.



The authors develop a COD framework called the Search Identification Network (SINet). The code is available here:

The network has two main modules:

  • Search module (SM), used to search for camouflaged objects
  • Identification module (IM), used to detect the objects precisely


The following results are obtained on various datasets:


5. Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

In this paper, the authors propose a few-shot object detection network. Its goal is to detect unseen objects given only a few annotated examples.

Their method includes an attention RPN, a multi-relation detector, and a contrastive training strategy. It exploits the similarity between the few-shot support set and the query set to detect novel objects while reducing false detections. The authors also contribute a new dataset containing 1,000 categories whose objects have high-quality annotations.

The network architecture is a weight-shared framework with multiple branches: one branch is for the query set and the rest are for the support set. The query branch of the weight-shared framework is a Faster R-CNN network.


The paper introduces an attention RPN and a detector with multiple relation modules to accurately match the support set against potential boxes in the query image.
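One way to see how a support example can steer proposal generation is a support-conditioned attention map: pool the support feature map into a single channel vector and correlate it with every location of the query feature map. The sketch below assumes global average pooling and a channel-wise (depth-wise) product summed into a per-location score; feature maps are H × W × C nested lists, and all names are hypothetical rather than taken from the paper's code.

```python
def average_pool(feature_map):
    """Global average pool an H x W x C map (nested lists) to a length-C vector."""
    cells = [vec for row in feature_map for vec in row]
    return [sum(ch) / len(cells) for ch in zip(*cells)]

def depthwise_correlation(query_map, support_vector):
    """Multiply every query location channel-wise by the pooled support vector.
    With a 1x1 kernel, depth-wise cross-correlation reduces to this product."""
    return [[[q * s for q, s in zip(vec, support_vector)] for vec in row] for row in query_map]

def attention_scores(query_map, support_map):
    """Per-location score: sum over channels of the correlated features.
    High scores mark query regions whose features resemble the support object."""
    kernel = average_pool(support_map)
    corr = depthwise_correlation(query_map, kernel)
    return [[sum(vec) for vec in row] for row in corr]
```

Feeding such a support-aware map to the RPN, instead of the raw query features, biases proposals towards regions that look like the support category and away from background.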


Here are some of the results obtained on the ImageNet dataset.


Here are some results on several other datasets.


6. D2Det: Towards High-Quality Object Detection and Instance Segmentation

The authors of this paper propose D2Det, a method for both precise localization and accurate classification. They introduce dense local regression, which predicts multiple dense box offsets for an object proposal. This enables precise localization.
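A simplified decoding step shows why many dense offsets help. In this sketch, which is an assumption rather than the paper's exact formulation, the proposal is split into k × k sub-regions, each sub-region centre predicts its distances (left, top, right, bottom) to the object's edges, and the k × k resulting boxes are averaged. Offsets are left unnormalized for readability, whereas a real model would typically normalize them by the proposal size.

```python
def grid_points(proposal, k):
    """Centres of the k x k sub-regions of a proposal (x1, y1, x2, y2), row by row."""
    x1, y1, x2, y2 = proposal
    w, h = (x2 - x1) / k, (y2 - y1) / k
    return [(x1 + (i + 0.5) * w, y1 + (j + 0.5) * h) for j in range(k) for i in range(k)]

def decode_dense_offsets(proposal, offsets, k):
    """Turn each grid point's (left, top, right, bottom) distances into a box,
    then average all k * k boxes into the final prediction."""
    boxes = [(x - l, y - t, x + r, y + b)
             for (x, y), (l, t, r, b) in zip(grid_points(proposal, k), offsets)]
    return tuple(sum(c) / len(boxes) for c in zip(*boxes))
```

Averaging over many local predictions dampens the error of any single one: an error of 0.4 in one of four offsets shifts the final edge by only 0.1.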

The authors also introduce a discriminative RoI pooling scheme for accurate classification. The pooling scheme samples several sub-regions of a proposal and performs adaptive weighting to obtain discriminative features.
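Adaptive weighting of sampled sub-regions can be illustrated as a weighted average whose weights come from per-sample scores. Using a softmax over those scores is our assumption for this sketch; the paper's exact weighting function and the network that predicts the scores are not reproduced, and the logits are simply passed in.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_roi_pool(sample_features, sample_logits):
    """Combine the features of several sampled sub-regions with adaptive weights,
    so that more informative samples dominate the pooled feature."""
    weights = softmax(sample_logits)
    return [sum(w * f[c] for w, f in zip(weights, sample_features))
            for c in range(len(sample_features[0]))]
```

With equal scores this reduces to ordinary average pooling; as the scores diverge, the pooled feature approaches that of the highest-scoring sub-region.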

The code is located at:

This method is based on the standard Faster R-CNN framework. The traditional box-offset regression of Faster R-CNN is replaced by the proposed dense local regression, and discriminative RoI pooling is used to enhance classification.


In this two-stage method, the first stage uses a region proposal network (RPN), and the second stage uses separate classification and regression branches. The classification branch is based on discriminative RoI pooling, while the dense local regression branch aims to localize the object accurately.


The following results were obtained on the MS COCO dataset:


Editor: Notes by sophia|kings
Reported by Computer Vision Alliance (official account: CVLianMeng)