Recently, the results of the 2019 Google Open Images object detection challenge were officially announced. Competing for the first time, the Polar Chain Technology AI team won a gold medal after joining with only two months remaining.
Last year, after MS COCO stopped hosting its bounding-box object detection competition, Google launched its first object detection challenge, involving more than 400 computer-vision and machine-learning researchers. This year, as one of the ICCV workshops, Google launched the second challenge based on the Open Images V5 dataset; the test set is exactly the same as in the first edition.
For two consecutive years, the Google Open Images Object Detection Track has taken over from the COCO object detection competition as the "gold standard" in the field of computer vision, attracting a large number of teams competing for the lowest error rate on the Open Images dataset; this year it drew more than 560 teams. Meanwhile, breakthroughs in deep learning have driven great progress on image recognition tasks, in some cases even surpassing human accuracy.
As deep network solutions become deeper and more complex, they are often limited by the amount of available training data. With this in mind, Google publicly released the Open Images dataset to stimulate progress in image analysis and understanding. Open Images follows the tradition of PASCAL VOC, ImageNet, and COCO, and has reached an unprecedented scale.
As a hot branch of computer vision (CV), object detection has a wide range of applications, from mature technologies such as license-plate recognition and pedestrian detection to the recently emerging field of autonomous driving. As demand grows, so do the requirements for recognition accuracy. At the Open Images workshop at ECCV 2018, the Google team explained how the Open Images object detection challenge differs from similar competitions: a larger volume of data, more label categories, uneven data distribution, label-dependency (hierarchy) information, and incomplete annotations. Compared with COCO, this dataset is far more diverse and poses a greater challenge to state-of-the-art instance detection methods. Based on this dataset, Google invited computer vision researchers around the world to participate and take a major step toward more sophisticated models for object detection. It is also the largest and most richly annotated public dataset of its kind to date.
As one of Polar Chain's flagship platforms, Jinmu delivers a rich and accurate experience to users through recognition across multiple dimensions, including objects and scenes, in which object detection naturally plays an indispensable role. In our continuous exploration of cutting-edge technologies and algorithms, we also used this competition to strengthen the team's object detection capability.
To address the uneven data distribution in this competition, we oversampled the labels with fewer examples.

For the detection framework: two-stage object detectors currently hold a clear accuracy advantage over one-stage ones, and Cascade R-CNN is the popular choice in detection competitions. However, cascading also slows inference, which is unsuitable for real-world deployment. Considering the actual application scenarios of our platform, we chose the faster and more classic Faster R-CNN.

Next is the backbone. Competition entries currently tend toward ever deeper and more complex networks, such as ResNeXt-101 (32x48d) and SENet; the former achieved 84.5% top-1 accuracy on ImageNet for Facebook. But these models share a common trait: they are very large, which also increases training and inference time. In addition, to achieve higher accuracy, participants often train models with many different framework-backbone combinations, usually six or more, and fuse them at the end; the impact on overall efficiency and performance is easy to imagine. A competition should ultimately improve how an algorithm or technology lands in practice. With that in mind, we chose only two relatively balanced backbones: ResNeXt-101 (64x4d) and ResNet-152.

In the test phase, we used multi-scale testing and fused the results of these internal runs. Finally, we used Soft-NMS to fuse the outputs of the two models and obtain the final results.
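The fusion step mentioned above relies on Soft-NMS, which, unlike hard non-maximum suppression, decays the scores of overlapping boxes instead of discarding them outright. The following is a minimal sketch of the Gaussian variant of Soft-NMS; the function names, parameters, and thresholds here are illustrative assumptions, not the team's actual pipeline.

```python
import numpy as np

def _iou(a, b):
    # Boxes given as [x1, y1, x2, y2]; returns intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: repeatedly pick the highest-scoring box,
    then decay the scores of the remaining boxes by exp(-iou^2 / sigma)
    rather than deleting them. Boxes whose score falls below
    score_thresh are dropped. Returns (kept indices, updated scores)."""
    scores = np.asarray(scores, dtype=float).copy()
    keep = []
    idxs = list(range(len(scores)))
    while idxs:
        best = max(idxs, key=lambda i: scores[i])
        keep.append(best)
        idxs.remove(best)
        for i in idxs:
            iou = _iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)
        idxs = [i for i in idxs if scores[i] >= score_thresh]
    return keep, scores
```

When fusing the outputs of two detectors, the box lists and score lists from both models are simply concatenated before being passed in; duplicated detections of the same object then suppress each other's scores, while distinct objects are left untouched.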
In the end, the team scored 0.62163 on the public leaderboard and 0.58259 on the private leaderboard, winning a gold medal.
As the builder of a global video-networking business operating system, Polar Chain Technology uses AI to unlock the information within video, linking the five Internet modes of information, services, shopping, social networking, and gaming, and realizing multiplied value for the new Internet economy and its customers on the basis of video. Competing in the Google AI Open Images Object Detection Track was both an optimization of the object detection algorithms in Polar Chain's Jinmu system and a way to better support the services and enablement of visual networking. Going forward, Polar Chain will continue to pursue technological breakthroughs in frontier fields and promote the sustained, rapid development of the artificial intelligence ecosystem.