PP-YOLO Surpasses YOLOv4: Object Detection Advances


By Jacob Solawetz
Translated by Flin
Source: towardsdatascience

PP-YOLO's evaluation metrics show better performance than YOLOv4, the state-of-the-art object detection model. Yet the Baidu authors write:

This paper is not intended to introduce a novel object detector. It is more like a recipe that tells you how to build a better detector step by step.

Let’s have a look.

The development history of YOLO

YOLO was originally authored by Joseph Redmon for object detection. Object detection is a computer vision technique that localizes and labels objects by drawing a bounding box around each one and assigning a class label to that box. Unlike the large NLP Transformers, YOLO is designed to be small, providing real-time inference speed for deployment on device.

YOLO9000 was the second YOLO object detector ("YOLOv2") published by Joseph Redmon. It improved the detector and emphasized the detector's ability to generalize to any object in the world.

YOLOv3 further improved the detection network and began to mainstream the object detection pipeline. We published tutorials on how to train YOLOv3 in PyTorch and how to train YOLOv3 in Keras, and compared YOLOv3's performance with EfficientDet (another state-of-the-art detector).

Then Joseph Redmon stepped away from object detection research for ethical reasons.

Of course, the open-source community picked up the baton and has continued to advance YOLO technology.

YOLOv4 was recently published by AlexeyAB in his YOLO Darknet repository. YOLOv4 is primarily a collection of known computer vision techniques, combined and validated through the research process. See here to learn more about YOLOv4.

Reading the YOLOv4 paper feels much like reading the PP-YOLO paper, as we will see below. We also provide some good training tutorials on how to train YOLOv4 in Darknet.

Then, just a few months ago, YOLOv5 was released. YOLOv5 took the Darknet (C-based) training environment and ported the network to PyTorch. The improved training techniques further boost the model's performance and yield a very easy-to-use, out-of-the-box object detection model. Since then, we have been encouraging developers using Roboflow to turn their attention to YOLOv5, via our YOLOv5 training tutorial, for building their custom object detectors.

Enter PP-YOLO.

What does PP stand for?

PP is short for PaddlePaddle, a deep learning framework written by Baidu.

If you're not familiar with PaddlePaddle, we're in the same boat. PaddlePaddle is written primarily in Python and looks similar to PyTorch and TensorFlow. A deep dive into the PaddlePaddle framework would be interesting, but it is beyond the scope of this article.

PP-YOLO's contributions

The PP-YOLO paper reads much like the YOLOv4 paper, because it is a collection of techniques already known in computer vision. The novel contribution is demonstrating that this ensemble of techniques improves performance, and providing an ablation study of how much each step helps the model.

Before we delve into PP-YOLO's contributions, let's review the architecture of a YOLO detector.

Anatomy of a YOLO detector

A YOLO detector has three main parts.

YOLO Backbone: The YOLO backbone is a convolutional neural network that forms image features at different granularities from the raw pixels. The backbone is typically pretrained on a classification dataset (usually ImageNet).

YOLO Neck: The YOLO neck (PP-YOLO uses an FPN) combines and mixes the ConvNet layer representations before they are passed to the prediction head.

YOLO Head: This is the part of the network that makes the bounding box and class predictions. It is guided by the three YOLO loss functions for class, box, and objectness.
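The three parts above can be sketched as a schematic forward pass. This is a minimal illustration of the data flow, not PP-YOLO's actual code; the component names and callable interfaces are assumptions:

```python
def yolo_forward(image, backbone, neck, head):
    """Schematic YOLO forward pass (component names are illustrative).
    backbone: image -> multi-scale feature maps (e.g. strides 8/16/32)
    neck:     fuses those maps top-down, FPN-style
    head:     per-scale (box, objectness, class) predictions"""
    features = backbone(image)         # coarse-to-fine feature pyramid
    fused = neck(features)             # mixed representations
    return [head(f) for f in fused]    # one prediction tensor per scale
```

Any real implementation swaps in actual network modules for the three callables; the point is only that the three stages are cleanly separable, which is what lets PP-YOLO replace them one at a time.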

Now let's take a closer look at PP-YOLO's contributions.

Replacing the backbone

The first PP-YOLO technique is replacing the YOLOv3 Darknet53 backbone with a ResNet50-vd-dcn ConvNet backbone. ResNet is a more popular backbone, more frameworks are optimized for executing it, and it has fewer parameters than Darknet53. Seeing a mAP improvement from this backbone swap is a big win for PP-YOLO.

EMA of model parameters

PP-YOLO tracks an exponential moving average of the network parameters, maintaining a shadow copy of the model weights for use at prediction time. This has been shown to improve inference accuracy.

Larger batch size

PP-YOLO increases the batch size from 64 to 192. Of course, this is hard to implement if you have GPU memory constraints.

DropBlock regularization

PP-YOLO implements DropBlock regularization in the FPN neck (in the past, this has usually been done in the backbone). At a given step, DropBlock randomly removes contiguous blocks of training features, so the model learns not to rely on a few key features for detection.
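A minimal single-channel sketch of the DropBlock idea, simplified from the DropBlock paper (the seed-probability formula and rescaling here are approximations, not PP-YOLO's exact implementation):

```python
import numpy as np

def dropblock(feat, block_size=3, drop_prob=0.1, rng=None):
    """Zero out contiguous block_size x block_size regions of an (H, W)
    feature map, then rescale the survivors to keep the expected sum."""
    rng = rng or np.random.default_rng()
    h, w = feat.shape
    # seed probability chosen so the expected dropped area is ~drop_prob
    gamma = drop_prob / (block_size ** 2)
    mask = np.ones((h, w))
    for y, x in zip(*np.nonzero(rng.random((h, w)) < gamma)):
        # zero a block centered on each sampled seed position
        y1, y2 = max(0, y - block_size // 2), min(h, y + block_size // 2 + 1)
        x1, x2 = max(0, x - block_size // 2), min(w, x + block_size // 2 + 1)
        mask[y1:y2, x1:x2] = 0.0
    kept = mask.mean()
    return feat * mask / max(kept, 1e-6)
```

Unlike ordinary dropout, whole spatial regions vanish at once, which matters for feature maps where neighboring activations are strongly correlated.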

IoU loss

The YOLO box-regression loss does not translate well to the mAP metric, which makes heavy use of Intersection over Union (IoU) in its calculation. It is therefore useful to edit the training loss with the final evaluation metric in mind. This edit also appears in YOLOv4.
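A basic IoU loss can be sketched with NumPy. This is the simplest 1 − IoU form for boxes in [x1, y1, x2, y2] layout; the exact loss variant PP-YOLO uses may differ in detail:

```python
import numpy as np

def iou_loss(pred, target, eps=1e-9):
    """1 - IoU for boxes in [x1, y1, x2, y2] form, so the regression
    objective directly matches the overlap measure used by mAP."""
    # intersection rectangle
    x1 = np.maximum(pred[..., 0], target[..., 0])
    y1 = np.maximum(pred[..., 1], target[..., 1])
    x2 = np.minimum(pred[..., 2], target[..., 2])
    y2 = np.minimum(pred[..., 3], target[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    # union = area_pred + area_target - intersection
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter
    return 1.0 - inter / (union + eps)
```

Perfectly overlapping boxes give a loss near 0, disjoint boxes give 1, so the gradient pushes predictions toward exactly what the benchmark rewards.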

IoU Aware

The PP-YOLO network adds a prediction branch that predicts the model's estimated IoU with a given object. Taking this IoU awareness into account when deciding whether to predict an object improves performance.
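At inference, the predicted IoU is fused into the detection score. A minimal sketch of such a fusion, following the form used in the IoU-aware literature (the alpha weighting value here is an illustrative assumption, not PP-YOLO's exact setting):

```python
def iou_aware_score(cls_score, iou_pred, alpha=0.5):
    """Fuse classification confidence with the predicted IoU before NMS.
    alpha balances the two terms; alpha=0 ignores the IoU branch."""
    return (cls_score ** (1.0 - alpha)) * (iou_pred ** alpha)
```

A box the classifier likes but that the IoU branch thinks is poorly localized gets demoted, so better-localized boxes survive NMS.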

Grid sensitivity

The old YOLO models do not do a good job of predicting box centers near the boundaries of a grid cell. Defining the box coordinates slightly differently avoids this problem. This technique also appears in YOLOv4.
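The fix can be shown in one function. This sketch follows the grid-sensitive decoding popularized alongside YOLOv4 (scaling the sigmoid by a factor slightly above 1); the alpha value is illustrative, not PP-YOLO's published constant:

```python
import math

def decode_center(t, c, alpha=1.05):
    """Grid-sensitive decoding of a predicted box center along one axis:
    t is the raw network output, c the grid-cell index. With alpha = 1
    (plain YOLO), sigmoid(t) never reaches 0 or 1, so centers can never
    sit exactly on a cell border; alpha > 1 stretches the range so they can."""
    sig = 1.0 / (1.0 + math.exp(-t))
    return c + alpha * sig - (alpha - 1.0) / 2.0
```

With a plain sigmoid the network would need infinite logits to place a center on a cell edge; the small stretch removes that pathology at negligible cost.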

Matrix non-maximum suppression

Non-maximum suppression (NMS) is a technique for removing redundant candidate proposals. Matrix NMS sorts through these candidate predictions in parallel, which speeds up the computation.
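A simplified linear-decay sketch of the idea in NumPy. Instead of sequentially deleting boxes, every box's score is decayed at once based on its overlap with higher-scored boxes; this follows the general Matrix NMS formulation and is not PP-YOLO's exact implementation:

```python
import numpy as np

def iou_matrix(boxes):
    """Pairwise IoU for boxes in [x1, y1, x2, y2] form."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def matrix_nms(boxes, scores):
    """Decay all scores in one matrix operation instead of a greedy loop."""
    order = np.argsort(-scores)              # highest score first
    boxes, scores = boxes[order], scores[order]
    ious = np.triu(iou_matrix(boxes), k=1)   # overlap with higher-scored boxes only
    ious_cmax = ious.max(axis=0)             # each suppressor's own worst overlap
    # linear decay: a large overlap with a better box shrinks the score,
    # compensated by how suppressed the suppressor itself already is
    decay = (1 - ious) / (1 - ious_cmax[:, None] + 1e-12)
    decay = np.minimum(decay, 1.0).min(axis=0)
    return scores * decay                    # decayed scores, descending order
```

The whole computation is matrix-shaped with no data-dependent loop, which is what makes it fast on a GPU compared with classic greedy NMS.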


CoordConv

CoordConv was motivated by a weakness of ConvNets: their difficulty mapping between (x, y) coordinates and one-hot pixel space. The CoordConv solution gives the convolution access to its own input coordinates. CoordConv interventions are marked with yellow diamonds in the architecture diagram. For more details, see the CoordConv paper.
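The mechanism is just two extra input channels. A minimal NumPy sketch of the coordinate-channel construction (the normalization to [-1, 1] follows the CoordConv paper; the channel-first layout is an assumption):

```python
import numpy as np

def add_coord_channels(feat):
    """Concatenate normalized x and y coordinate channels onto a
    (C, H, W) feature map, so the following convolution can see
    where in the image each activation sits."""
    c, h, w = feat.shape
    xs = np.tile(np.linspace(-1.0, 1.0, w), (h, 1))           # varies along width
    ys = np.tile(np.linspace(-1.0, 1.0, h)[:, None], (1, w))  # varies along height
    return np.concatenate([feat, xs[None], ys[None]], axis=0)
```

A regular convolution applied to the augmented map can then condition on absolute position, which a translation-invariant kernel alone cannot do.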


SPP

Spatial Pyramid Pooling (SPP) is an extra block after the backbone that mixes and pools spatial features. It is also implemented in YOLOv4 and YOLOv5.
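The YOLO-style SPP block concatenates the input with stride-1 max-pooled copies at several kernel sizes. A single-channel NumPy sketch (the kernel sizes 5/9/13 are the ones commonly used in YOLO variants; real implementations pool every channel of a tensor):

```python
import numpy as np

def max_pool_same(x, k):
    """'Same'-padded max pooling with stride 1 on an (H, W) map."""
    pad = k // 2
    padded = np.pad(x, pad, constant_values=-np.inf)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def spp(x, kernels=(5, 9, 13)):
    """SPP block: stack the input with max-pooled copies at several
    receptive-field sizes (channel-wise concat for a 1-channel map)."""
    return np.stack([x] + [max_pool_same(x, k) for k in kernels])
```

Because each output position mixes context at several scales, the head behind the SPP block sees both fine local detail and near-global context at once.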

Better pretrained backbone

The PP-YOLO authors distill a larger ResNet model down to serve as the backbone. A better pretrained model also improves downstream transfer learning.

Is PP-YOLO state of the art?

PP-YOLO outperforms the YOLOv4 results published on April 23, 2020.

To be fair, the authors point out that this may be the wrong question. The authors' intent seems to be not merely to "introduce a novel detector" but to show the process of carefully tuning an object detector to maximize performance. Quoting the paper's introduction:

This paper focuses on how to stack some effective techniques that barely affect efficiency to achieve better performance... This paper is not intended to introduce a novel object detector. It is more like a recipe that tells you how to build a better detector step by step. We found some techniques that are effective for the YOLOv3 detector, which can save developers' trial-and-error time. The final PP-YOLO model improves mAP on COCO from 43.5% to 45.2% at a speed faster than YOLOv4.

The PP-YOLO contributions described above take the YOLOv3 model from 38.9 mAP to 44.6 mAP on the COCO object detection task and increase inference FPS from 58 to 73. These metrics, shown in the paper, beat the currently published results for YOLOv4 and EfficientDet.

When benchmarking PP-YOLO against YOLOv5, it appears YOLOv5 still has the fastest accuracy-versus-inference-speed trade-off (AP vs. FPS) on a V100. However, a YOLOv5 paper has yet to be published. Furthermore, it has been shown that training the YOLOv4 architecture in the YOLOv5 Ultralytics repository outperforms YOLOv5, which suggests, transitively, that YOLOv4 trained with the YOLOv5 contributions would outperform the PP-YOLO results published here. These results have yet to be officially released, but can be traced to discussions on GitHub.

It is worth noting that many of the techniques used in YOLOv4, such as architecture search and data augmentation, were not used in PP-YOLO. This means there is still room for the state of the art in object detection to advance as more of these techniques are combined and integrated.

There is no doubt that these are exciting times for applied computer vision.

Should I switch from YOLOv4 or YOLOv5 to PP-YOLO?

The PP-YOLO model shows the promise of state-of-the-art object detection, but the improvements over other object detectors are incremental, and it is written in a new framework. At this stage, the best practice is to develop your own empirical results by training PP-YOLO on your own dataset.

In the meantime, I recommend checking out the following YOLO tutorials to bring your own object detector to life:

Original link: https://towardsdatascience.com/pp-yolo-surpasses-yolov4-object-detection-advances-1efc2692aa62
