Abstract: The classification and treatment of domestic waste is currently a focus for society as a whole. Classifying and detecting domestic waste concisely and efficiently matters greatly for waste transportation and treatment, and the application of AI to garbage classification has attracted wide attention.
Nowadays AI is almost a synonym for intelligence: it appears in every field, and scenarios such as garbage sorting and supervision cannot do without the empowerment of "AI+".
However, garbage is often severely deformed, which makes it a rather special kind of object. Current vision-based technology can raise garbage-classification alarms, for example judging whether garbage has been sorted. Whether it can perform visual detection and classification directly, and do so effectively, still requires more data and experiments to judge. With these questions in mind, the Haihua garbage sorting challenge shows how competitors use technology to change the world.
The Haihua garbage classification challenge provides a single-category garbage dataset and a multi-category garbage dataset. The single-category set contains 80,000 images of single-category domestic garbage, each containing exactly one garbage instance. The multi-category set contains 4,998 images, of which 2,998 are used as training data; the A and B test lists each contain 1,000 test images, and each multi-category image contains up to 20 garbage instances. We introduce the two datasets separately below.
1、 Multi-category garbage
Figure 1 Distribution of multiple types of garbage data
As shown in Figure 1, the multi-category data covers 204 garbage categories, but these categories are highly imbalanced: some have very few samples or none at all.
Figure 2 Multi-category garbage data visualization
The two images in Figure 2 come from the training set. The garbage targets are mainly concentrated in the central area of each image and overlap heavily. In addition, the same object often reappears in another image at a different angle.
From the observations and statistics of Figure 1 and Figure 2, we can draw several conclusions:
(1) Since an object often appears in multiple images, overfitting these targets is very effective, which is why AP can be trained above 90 in this competition. You can therefore consider backbones with more parameters, such as ResNeXt101-64x4d + DCN.
(2) The images are taken from above, so horizontal and vertical flipping are both very effective.
(3) Although the categories are very imbalanced, targets recur so often that once a target has been trained on, the same target can be detected again essentially 100% of the time. Class imbalance mainly hurts objects with little data, so only those objects need to be augmented, mainly ink cartridges, snails, plum kernels, shellfish, and so on.
(4) Since overlap is high, we can use mixup and similar methods to artificially create highly overlapping targets for training.
Table 1 Data statistics
In addition to the macro statistics at the image level, we analyzed the targets in the dataset in detail. Table 1 shows statistics on target size and aspect ratio. First, by COCO's definition an object whose side length exceeds 96 pixels counts as a large object, and 75% of the objects here are large, which means that tricks aimed at small objects are basically useless. Second, extreme aspect ratios rarely appear, which gives us useful guidance for tuning the anchors.
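As a minimal sketch of how such statistics can be gathered, the snippet below buckets boxes by the COCO area convention (small &lt; 32², medium &lt; 96², large otherwise) and collects aspect ratios. The annotation list and its `bbox` field layout are illustrative assumptions, not the competition's exact format.

```python
# Hypothetical annotations; real data would come from a COCO-format JSON.
annotations = [
    {"bbox": [10, 20, 150, 120]},  # [x, y, w, h]
    {"bbox": [5, 5, 40, 30]},
    {"bbox": [0, 0, 300, 90]},
]

def size_bucket(w, h):
    """COCO convention: area < 32^2 small, < 96^2 medium, else large."""
    area = w * h
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

stats = {"small": 0, "medium": 0, "large": 0}
ratios = []
for ann in annotations:
    _, _, w, h = ann["bbox"]
    stats[size_bucket(w, h)] += 1
    ratios.append(max(w / h, h / w))  # aspect ratio normalized to >= 1

print(stats)
print(max(ratios))
```

Running this over the full training annotations is how one would verify claims like "75% of objects are large" before deciding which augmentation tricks to drop.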
2、 Single-category garbage
The single-category set contains 80,000 images, each with one target. As shown in the two pictures on the left of Figure 3, single-category targets are larger. There are two ways to use this set: one is to augment the categories with few samples, the other is to use it to obtain a better pre-trained model.
Figure 3 data comparison
When augmenting the data, we found that targets of the same nominal category differ between the two sets. The single-category "crayfish" images really show crayfish, but the multi-category "crayfish" label actually points at milk cartons, and "diode" points at plastic pipes. This shows that augmenting with the single-category set is not feasible, because the data are not homologous. We tried this scheme, and the accuracy did not change.
For pre-training, since the targets are large, we stitched images into 4×4 grids, which reduced the data volume, increased the number of targets per image, and achieved certain gains. However, when combined with other augmentation methods it brought no further improvement, so we abandoned this scheme.
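The 4×4 stitching described above can be sketched as follows, assuming equally sized tiles; box coordinates would additionally need to be offset by each tile's top-left corner (not shown):

```python
import numpy as np

def stitch_grid(images, rows=4, cols=4):
    """Stitch rows*cols equally sized images into one mosaic image."""
    h, w = images[0].shape[:2]
    canvas = np.zeros((rows * h, cols * w, 3), dtype=images[0].dtype)
    for idx, img in enumerate(images[:rows * cols]):
        r, c = divmod(idx, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return canvas

# Toy usage: 16 solid-color tiles become one 400x400 mosaic.
tiles = [np.full((100, 100, 3), i, dtype=np.uint8) for i in range(16)]
mosaic = stitch_grid(tiles)
print(mosaic.shape)  # (400, 400, 3)
```

This cuts the number of training images by 16× while multiplying the targets per image, which is exactly the trade-off described in the text.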
3、 Model scheme
1. Baseline
Figure 4 Baseline scheme
We chose Cascade R-CNN as implemented in mmdetection as the baseline, with ResNeXt101-64x4d + DCN as the backbone. Because the competition uses COCO's AP50:95 metric, Cascade R-CNN achieves very good results by regressing with progressively higher IoU thresholds across its stages. In addition, larger backbones tend to achieve better results on this dataset.
2. Parameter adjustment
At the start of the competition, we split the training data into 2,500 training images and 498 local validation images, and tuned parameters on that split. Because targets overlap heavily, setting the soft-NMS score threshold to 0.001, max_per_img = 300, and adding flip test works better, improving results by about 0.02 over not using them. Limited by GPU memory, we randomly crop a (0.8w, 0.8h) region from each image, then randomly sample the short edge within a range while capping the long edge at 1800 for multi-scale training. At test time the image is moderately enlarged, with the short edge set to 1200, and the accuracy reaches 88.3%. Combined with OHEM, training accuracy is about 88.6%, and feeding the 498 local validation images back into training improves it by another 0.5% to 89.2%.
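The inference settings above can be written down as an mmdetection-style config fragment. This is a sketch in the v1.x-era config style; the exact key names vary across mmdetection versions:

```python
# Test-time settings described in the text: low score threshold,
# soft-NMS, and a large per-image detection cap for crowded scenes.
test_cfg = dict(
    rcnn=dict(
        score_thr=0.001,                                   # keep low-confidence boxes
        nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.001),
        max_per_img=300,                                   # many overlapping targets
    )
)

# Training-time crop and resize notes (values from the text; the short-edge
# sampling range is not given in the original).
random_crop_fraction = (0.8, 0.8)   # crop a (0.8w, 0.8h) region
train_long_edge_cap = 1800
test_short_edge = 1200
```

The unusually low `score_thr` and large `max_per_img` both follow from the high target overlap: aggressive suppression would delete true positives.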
For the under-represented categories, we oversampled their annotations; for example, the shellfish labels were duplicated many times in the training set.
As shown in Figure 5, we adjusted the anchor ratios from [0.5, 1.0, 2.0] to [2/3, 1.0, 1.5]. In addition, to improve the detection of large objects, we adjusted the FPN level-assignment scale from 56 to 70, which increases the number of targets assigned to each FPN level, and then changed the anchor scale from 8 to 12 to cover these large objects.
Figure 5 Anchor modification
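The anchor adjustment can be sketched as the before/after fragments below (mmdetection-style keys, illustrative only). With stride s, an anchor of scale k covers roughly k·s pixels per side, so raising the scale shifts every FPN level toward larger targets:

```python
# Before: mmdetection defaults; after: the tuned values from the text.
default_anchor = dict(scales=[8], ratios=[0.5, 1.0, 2.0])
tuned_anchor = dict(
    scales=[12],                   # larger base scale for large objects
    ratios=[2.0 / 3.0, 1.0, 1.5],  # milder ratios, since extreme shapes are rare
)

# Typical FPN strides; the per-level anchor side length is stride * scale.
strides = [4, 8, 16, 32, 64]
base_sizes = [s * tuned_anchor['scales'][0] for s in strides]
print(base_sizes)  # [48, 96, 192, 384, 768]
```

Since 75% of the objects exceed 96 pixels per side, pushing the anchor sizes upward like this matches the dataset's size distribution.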
As shown in Figure 6, after this tuning the distribution of target counts across FPN levels is closer to a normal distribution, which we believe helps detection. Looking at the number of convolutions in each ResNet stage, the stages feeding the middle FPN levels have the most parameters and should therefore handle the most targets, while the stages feeding the outer FPN levels have fewer parameters and should not be assigned too many.
Figure 6 Distribution of target counts over FPN levels
For image augmentation, we added online mixup and trained for 24 epochs, which improves accuracy to 91.2%–91.3%; with only 12 epochs there was no improvement. Our mixup is simple: the two images are fused at a 0.5 ratio, so there is no need to re-weight the loss.
Figure 7 Effect of mixup
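The detection-style mixup described above can be sketched as follows. This is an assumed minimal implementation, not the competition code: blend two images at a fixed 0.5 ratio and keep the union of their boxes, which is why no loss re-weighting is needed.

```python
import numpy as np

def detection_mixup(img1, boxes1, img2, boxes2, alpha=0.5):
    """Blend two images at ratio alpha and concatenate their boxes.
    With alpha = 0.5 both label sets contribute equally, so the
    detection loss can be used unchanged."""
    h = max(img1.shape[0], img2.shape[0])
    w = max(img1.shape[1], img2.shape[1])
    canvas = np.zeros((h, w, 3), dtype=np.float32)
    canvas[:img1.shape[0], :img1.shape[1]] += alpha * img1
    canvas[:img2.shape[0], :img2.shape[1]] += (1 - alpha) * img2
    mixed_boxes = np.concatenate([boxes1, boxes2], axis=0)
    return canvas.astype(np.uint8), mixed_boxes

# Toy usage with two solid-color images.
a = np.full((100, 100, 3), 200, dtype=np.uint8)
b = np.full((100, 100, 3), 100, dtype=np.uint8)
img, boxes = detection_mixup(a, np.array([[0, 0, 10, 10]]),
                             b, np.array([[5, 5, 20, 20]]))
print(img[0, 0, 0], boxes.shape)  # 150 (2, 4)
```

Each mixed sample thus contains the overlapping targets of two real images, mimicking the heavy occlusion seen in the dataset.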
3. Model fusion
During earlier testing we assumed the 1080 Ti and 2080 were similar in speed. Each test on a 1080 Ti took about 40 minutes, so we selected only about three models, which was a disadvantage. On the B list we found the 2080 much faster than the 1080 Ti: a single model plus flip test took only 25 minutes, so using more models might have improved the score further. We fused Cascade R-CNN with a ResNeXt101-32x4d + GCB + DCN backbone, Cascade R-CNN with ResNeXt101-64x4d + DCN, and a Guided Anchoring Cascade R-CNN with ResNeXt101-64x4d + DCN. Among fusion methods the differences were small; we used the method from the paper "Weighted Boxes Fusion: ensembling boxes for object detection models", with the fusion threshold set to 0.8.
Figure 8 Effect of WBF
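To make the fusion step concrete, here is a minimal single-class sketch of the WBF idea (in practice one would use the reference implementation from the paper's authors; this simplified version only illustrates the confidence-weighted averaging, and the greedy clustering details are assumptions):

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def weighted_boxes_fusion(boxes, scores, iou_thr=0.8):
    """Greedily cluster boxes whose IoU with a cluster's fused box
    exceeds iou_thr, then replace each cluster by the confidence-
    weighted average of its members."""
    order = np.argsort(scores)[::-1]
    clusters = []  # each cluster: list of (box, score)
    for i in order:
        placed = False
        for cl in clusters:
            fused = np.average([b for b, _ in cl], axis=0,
                               weights=[s for _, s in cl])
            if iou(fused, boxes[i]) > iou_thr:
                cl.append((boxes[i], scores[i]))
                placed = True
                break
        if not placed:
            clusters.append([(boxes[i], scores[i])])
    fused_boxes, fused_scores = [], []
    for cl in clusters:
        ws = np.array([s for _, s in cl])
        fused_boxes.append(np.average([b for b, _ in cl], axis=0, weights=ws))
        fused_scores.append(ws.mean())
    return np.array(fused_boxes), np.array(fused_scores)

boxes = np.array([[0, 0, 10, 10], [0.5, 0.5, 10.5, 10.5], [50, 50, 60, 60]],
                 dtype=float)
scores = np.array([0.9, 0.8, 0.7])
fb, fs = weighted_boxes_fusion(boxes, scores)
```

Unlike NMS, which discards all but the top box in a cluster, WBF lets every model's prediction contribute to the fused coordinates, which is why it suits model ensembling.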
4. Parameter effect
Table 2 Parameter settings
Figure 9 A-list accuracy change
4、 Deployment and use of the NAIE platform
1. Platform understanding
In my understanding, the NAIE platform consists of three parts: a local debugging area, a cloud storage area, and a cloud training area. Once you understand these three parts, you can get started quickly.
The local debugging area is based on VS Code and is attached to a server without a GPU. It can be operated from the command line like an ordinary Linux server for preliminary environment setup and debugging.
The cloud storage area mainly stores large data files and pre-trained models. Large files such as pre-trained models cannot be transferred directly from the local debugging area to the model training area.
The model training area uses the GPU to run training and copies the trained model weights to cloud storage; only models saved to the cloud can be downloaded.
2. Model deployment
Here we take deploying mmdetection as an example.
- 1) Code upload
Code can be uploaded by right-clicking "NAIE upload". Uploads are limited to 100 MB, so it is recommended to delete the pre-trained model and other irrelevant files and keep only the core code.
- 2) Environment deployment
Environment deployment requires writing a requirements.txt that lists the required Python libraries and their version numbers.
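A minimal requirements.txt for an mmdetection-based project might look like this (package versions are illustrative, not the ones actually used):

```text
torch==1.1.0
torchvision==0.3.0
mmcv==0.4.3
numpy
opencv-python
pycocotools
```

Pinning versions matters here because the platform rebuilds the environment from this file on every training run.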
- 3) Model operation
The platform does not support running .sh files, so you need to write a .py file, such as model.py, that issues the command lines via os.system().
In addition, model.py should call the MoXing package to store the trained model in the cloud.
In the model training area, select model.py and the required GPU specification to start training.
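Putting the pieces together, a launcher script along these lines could serve as model.py. The training command, paths, and bucket URL are hypothetical, and `mox.file.copy_parallel` is used here on the assumption that it is the MoXing call for copying a directory to cloud storage:

```python
# Sketch of a model.py launcher for the NAIE training area.
import os

TRAIN_CMD = "python tools/train.py configs/cascade_rcnn_x101.py"  # illustrative
OUTPUT_DIR = "work_dirs/cascade_rcnn_x101"                        # illustrative
CLOUD_DIR = "s3://my-bucket/models"                               # hypothetical bucket

def main():
    # Run the shell command the platform cannot launch from a .sh file.
    ret = os.system(TRAIN_CMD)
    if ret != 0:
        raise RuntimeError("training failed")
    # Copy trained checkpoints to cloud storage so they can be downloaded.
    import moxing as mox  # available on the platform, not locally
    mox.file.copy_parallel(OUTPUT_DIR, CLOUD_DIR)

# In real use, end the file with: main()
```

Keeping the MoXing import inside main() lets the same file be opened and edited in the GPU-less local debugging area without errors.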
- 4) Additional supplement
Large files cannot be uploaded directly through "NAIE upload". Instead, write a program in the local debugging area, such as debug.py, that downloads the file with wget and sends it to cloud storage via the MoXing package; model.py then uses MoXing to transfer it to the training server.
Several contestants finished the competition and won awards. Although the ranking was not especially high, they accumulated a lot of experience along the way. They said their results were inseparable from the computing power of Huawei's NAIE training platform, which provides V100 and P100 GPUs for training free of charge and was a great help for research and competition. Modifying code and training on the platform is convenient, and the questions they encountered while getting familiar with it were answered or resolved promptly. We hope this sharing provides some reference and helps others avoid the same pitfalls.
References
[1] Cai Z., Vasconcelos N. Cascade R-CNN: Delving into High Quality Object Detection. 2017.
[2] Zhang H., Cisse M., Dauphin Y. N., et al. mixup: Beyond Empirical Risk Minimization. 2017.
[3] Solovyev R., Wang W. Weighted Boxes Fusion: ensembling boxes for object detection models. arXiv, 2019.
[4] Wang P., Sun X., Diao W., Fu K. FMSSD: Feature-Merged Single-Shot Detection for Multiscale Objects in Large-Scale Remote Sensing Imagery. IEEE Transactions on Geoscience and Remote Sensing, 2019.
[5] Zhang S., Chi C., Yao Y., et al. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection. 2019.
[6] Pang J., Chen K., Shi J., et al. Libra R-CNN: Towards Balanced Learning for Object Detection. 2019.
[7] Deng L., Yang M., Li T., et al. RFBNet: Deep Multimodal Networks with Residual Fusion Blocks for RGB-D Semantic Segmentation. 2019.
[8] Ren S., He K., Girshick R., et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 39(6).
[9] Lin T. Y., Dollár P., Girshick R., et al. Feature Pyramid Networks for Object Detection. 2016.
[10] Dai J., Qi H., Xiong Y., et al. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 764–773.
[11] Zhu X., Hu H., Lin S., Dai J. Deformable ConvNets v2: More Deformable, Better Results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9308–9316.
[12] Huang Z., Wang X., Huang L., et al. CCNet: Criss-Cross Attention for Semantic Segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 603–612.
[13] Wang J., Chen K., Yang S., et al. Region Proposal by Guided Anchoring. 2019.