Now available for download! From R-CNN to SSD, this may be the most complete overview of object detection algorithms


Introduction: From simple image classification to 3D pose estimation, computer vision never lacks interesting problems and challenges. With the naked eye, we can spot the cat and the dog in a pet photo and recognize the stars and the moon in Van Gogh's The Starry Night. How to give machines this ability to "see" through algorithms is what we will talk about next.

This article first introduces the concept of object detection, then a simplified version of the problem (localization plus classification) and its limitations. Finally, it works step by step through the common models and methods of object detection, such as Faster R-CNN and SSD. Along the way it touches on many detailed concepts and knowledge points; the ebook contains the detailed technical explanations.



Highlights from the ebook

1. Common models and methods of object detection


1.1 R-CNN

Researchers have done a lot of work on generating candidate regions, and one of the best-known approaches is selective search. It is not described in detail here; interested readers can consult the selective search paper. All you need to know is that it is a way to select regions of interest (ROIs) from an image. Once ROIs are available, the final detection results can be obtained by classifying them and merging the results. Based on this idea, the R-CNN method consists of the following steps:

  • Select potential target candidate boxes (ROIs)
  • Train a good feature extractor
  • Train the final classifier
  • Train a regression model for each class to correct the deviation of each ROI from the position and size of the ground-truth rectangular box
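
The last step above can be made concrete with a small sketch. It assumes the box parameterization commonly used for R-CNN-style regression (center offsets scaled by the ROI size, log-scaled width and height); the function names are hypothetical and chosen only for illustration.

```python
import math

def to_center_form(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2.0, y1 + h / 2.0, w, h

def regression_targets(roi, gt):
    """Targets the regressor learns: how far the ROI is from the true box."""
    px, py, pw, ph = to_center_form(roi)
    gx, gy, gw, gh = to_center_form(gt)
    return (
        (gx - px) / pw,        # horizontal shift, relative to ROI width
        (gy - py) / ph,        # vertical shift, relative to ROI height
        math.log(gw / pw),     # log of the width ratio
        math.log(gh / ph),     # log of the height ratio
    )

def apply_deltas(roi, deltas):
    """Invert regression_targets: move and resize the ROI by the deltas."""
    px, py, pw, ph = to_center_form(roi)
    tx, ty, tw, th = deltas
    cx, cy = px + tx * pw, py + ty * ph
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

roi = (10.0, 10.0, 50.0, 50.0)    # candidate box from the proposal step
gt = (20.0, 20.0, 60.0, 100.0)    # ground-truth box
deltas = regression_targets(roi, gt)
refined = apply_deltas(roi, deltas)
```

Applying the predicted deltas to the ROI recovers the ground-truth box, which is exactly the fine-tuning the regression model is trained to perform.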

1.2 Fast R-CNN

R-CNN has three main problems, so let's consider whether each has a better solution. The first is speed: extracting CNN features for 2,000 ROIs one by one takes a lot of time. Can we do better, for example by sharing the convolutional layers so that all 2,000 ROIs are processed in one pass? The second is that the CNN features are not updated when the SVMs and regressors are trained. The third is that R-CNN's pipeline is relatively complex. Can the training process be made end-to-end? Next, we introduce Fast R-CNN [2], proposed by Girshick in 2015, which neatly solves these main problems of R-CNN.
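
To illustrate the idea of sharing convolutional features, here is a minimal pure-Python sketch of RoI max pooling, the operation Fast R-CNN uses to cut a fixed-size feature out of a single shared feature map for each ROI. The tiny 4 x 4 "feature map", the 2 x 2 output size, and the function name are assumptions made for illustration; a real network works on multi-channel tensors.

```python
def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the region roi = (x1, y1, x2, y2), given in inclusive
    feature-map coordinates, into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_size):
        # Rows covered by bin i: floor at the start, ceil at the end,
        # so every bin is non-empty and the bins cover the whole ROI.
        r0 = y1 + (i * h) // out_size
        r1 = y1 + -(-((i + 1) * h) // out_size)
        row = []
        for j in range(out_size):
            c0 = x1 + (j * w) // out_size
            c1 = x1 + -(-((j + 1) * w) // out_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# The feature map is computed once per image and shared by all ROIs.
shared_features = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
rois = [(0, 0, 3, 3), (1, 1, 3, 3)]
pooled = [roi_max_pool(shared_features, roi) for roi in rois]
# pooled == [[[6, 8], [14, 16]], [[11, 12], [15, 16]]]
```

Whatever the size of the ROI, the output has the same shape, so the expensive convolution runs once per image instead of once per ROI.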

1.3 Faster R-CNN

Faster R-CNN [3] is a classic object detection method that appears frequently in real-world projects and competitions. In essence, Faster R-CNN adds a small network on top of Fast R-CNN that generates region proposals directly, replacing external methods such as selective search for obtaining ROIs. This small network is called the Region Proposal Network (RPN). Training the RPN is the key part of Faster R-CNN's training process; the rest is basically the same as in Fast R-CNN.
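
The RPN makes its predictions relative to a fixed set of reference boxes (anchors) placed at every position of the feature map. The sketch below generates such anchors in pure Python. The stride of 16, the three scales, and the three aspect ratios are the values usually quoted for Faster R-CNN and are assumptions here, as is the function name.

```python
import math

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in input-image coordinates."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell, mapped back to the image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # ratio = height / width; the area stays scale ** 2.
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(feat_h=2, feat_w=3)
# 2 * 3 positions, each with 3 scales * 3 ratios = 9 anchors
assert len(anchors) == 54
```

For each anchor, the RPN then predicts whether it contains an object and how to adjust it, which is what replaces the external proposal step.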

Next, let's take a look at the training process of Faster R-CNN:

  • Use a model pre-trained on ImageNet to train an RPN.
  • Use a model pre-trained on ImageNet and the region proposals generated in the first step to train a Fast R-CNN network, obtaining the object categories and the fine-tuned positions of the rectangular boxes.
  • Use the network from the second step to initialize the RPN, fix the shared convolutional layers, and adjust only the parameters of the RPN layers.
  • Keep the shared convolutional layers fixed and train only the FC layers of Fast R-CNN.


1.4 YOLO

In the R-CNN family of algorithms, a large number of proposals must be obtained first, but they overlap heavily, which causes a lot of repeated work. YOLO [5] abandons the proposal-based approach: it divides the input image into an S × S grid, makes predictions in each grid cell, and finally merges the results.

Next, let's take a look at the key steps of YOLO:
YOLO places requirements on the size of the network's input image. First, the image is resized to the specified size (448 × 448), and then it is divided into S × S cells.
The following predictions are made in each cell: whether the cell contains an object, the position of the rectangular box containing the object, and the scores of the C categories for the cell.
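
The grid assignment described above can be sketched as follows. The values S = 7, B = 2 boxes per cell, and C = 20 classes are the ones usually quoted for the original YOLO and are assumptions here; the function names are hypothetical.

```python
def responsible_cell(box, image_size=448, S=7):
    """Find the grid cell containing the center of box = (x1, y1, x2, y2)
    and encode the box relative to that cell."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    cell = image_size / S                      # cell side length in pixels
    col = min(int(cx // cell), S - 1)          # clamp centers on the border
    row = min(int(cy // cell), S - 1)
    encoded = (
        cx / cell - col,                       # x offset inside the cell
        cy / cell - row,                       # y offset inside the cell
        (x2 - x1) / image_size,                # width relative to the image
        (y2 - y1) / image_size,                # height relative to the image
    )
    return row, col, encoded

def output_size(S=7, B=2, C=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) and C scores."""
    return S * S * (B * 5 + C)

row, col, encoded = responsible_cell((100, 150, 300, 350))
# The box center (200, 250) falls in row 3, column 3 of the 7 x 7 grid.
assert (row, col) == (3, 3)
assert output_size() == 1470
```

Because every object is handled by exactly one cell, the network produces all predictions in a single forward pass instead of evaluating thousands of overlapping proposals.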

1.5 SSD

SSD [4] borrows the grid idea from YOLO and the anchor mechanism from Faster R-CNN, so it can predict quickly while still locating targets relatively accurately. Some features of SSD are introduced next:

  • Multi-scale feature layers are used for detection. In the RPN of Faster R-CNN, anchors are generated on the last feature layer of the backbone network. In SSD, anchors are generated not only on that last feature layer but also on several additional high-level feature layers.
  • All anchors generated on SSD's feature layers are labeled as positive or negative samples, and the classification scores and bounding-box locations are then learned directly.
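
The split into positive and negative samples mentioned above is typically based on the overlap between each anchor and the ground-truth boxes. The sketch below is a simplified version that assumes an IoU threshold of 0.5; the full SSD matching strategy additionally guarantees each ground-truth box at least one anchor, which is omitted here. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Return, per anchor, the index of the matched ground-truth box,
    or -1 if the anchor is a negative (background) sample."""
    labels = []
    for anchor in anchors:
        best_idx, best_iou = -1, 0.0
        for idx, gt in enumerate(gt_boxes):
            overlap = iou(anchor, gt)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        labels.append(best_idx if best_iou >= pos_thresh else -1)
    return labels

anchors = [(0, 0, 10, 10), (0, 0, 4, 4), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 8)]
labels = label_anchors(anchors, gt_boxes)
# IoUs are 0.8, 0.2 and 0.0, so only the first anchor is positive.
assert labels == [0, -1, -1]
```

Positive anchors contribute to both the classification and the box-location loss, while negative anchors contribute only to classification.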

2. Industrial application practice of object detection

We also explain how object detection technology is applied and how to combine the technology with industry to deliver the greatest value, which is what we care about most.

With the economy stable, domestic manufacturing enterprises are speeding up their transformation and upgrading. As a technology company with ideals and a sense of mission, we hope to help traditional enterprises achieve this transformation and upgrading through technical means.

In the photovoltaic industry, quality inspection has long faced problems such as the high level of expertise required, difficult recruitment, and insufficient manpower. Germany, which has a high level of industrial automation, has introduced EL quality inspection technology for modules, but it handles only typical defects and can only assist human inspectors, not replace them. In China, photovoltaic enterprises have been experimenting with AI-based recognition technology for nearly 10 years, but automatic quality inspection of polycrystalline cells and modules is still far from production-ready.

This article focuses on the EL quality inspection capability for monocrystalline and polycrystalline modules launched by Alibaba. It is already running on production lines, with accuracy stable above 95%. AI-based detection has a clear advantage in reducing cost and improving efficiency in industrial vision. Alibaba Cloud will cooperate with more enterprises in the future to write a new chapter in intelligent manufacturing.

Author: xinxuerui


This is original content from the Yunqi Community and may not be reproduced without permission.