Ali Mei's guide: Looking back on 2019, Gaode Map, the first national travel platform with more than 100 million daily active users, saw user numbers for both its consumer-facing (to C) and business-facing (to B) services reach new highs. Behind this are thousands of Gaode engineers whose day-and-night work supports and drives the rapid growth of the business.
In 2019, artificial intelligence technology was fully deployed in Gaode Map and played an increasingly important role in business scenarios such as vision, search, navigation and positioning. Technology areas such as client and mobile, automotive technology, service architecture, data research and development, and quality also deepened, converged and became more intelligent, achieving innovation and breakthroughs and providing users with more accurate and efficient map services and interaction experiences.
The “Gaode Technology 2019” e-book is now available for free: an annual review in 6 chapters that reveals technology with “warmth”. This article, “Technical evolution and practice of ground marking recognition in high-precision maps”, is one of the articles included in the e-book. It describes how ground marking recognition for high-precision maps has evolved. These techniques served the needs of the high-precision map production line in different periods and provide a basic technical guarantee for building high-precision maps.
1. Ground marking recognition
Ground marking recognition means recognizing the various marking elements on the road surface in map data, such as ground arrows, ground text, time markings, ground numbers, deceleration strips, vehicle distance confirmation lines, deceleration humps, pedestrian crossings, and stop and yield lines. The automatic recognition results are delivered to the map production line as production data and eventually become maps that serve automatic driving, in-vehicle navigation and mobile navigation.
High-precision maps generally require each map element to be accurate to at least the centimeter level. This demand for higher position accuracy is the biggest difference from recognition for ordinary maps, so recognizing ground markings completely and accurately is the direction of our continuing efforts.
Ground marking recognition faces two difficulties: first, ground markings vary widely in type and size; second, they are easily worn or covered and their clarity is uneven, which makes high-precision recognition very challenging.
1) Many kinds of ground markings: In real scenes, ground markings differ in content, color, shape, size and so on.
- Color: yellow, red, white, etc.;
- Shape: arrows, various characters and numbers, single bars, multiple bars, area shapes, mound shapes, etc.;
- Size: the national standard defines a standard arrow length of 9 m, but there are also ground marking elements of 1 m to 2 m or even less than 1 m. Speed bumps and pedestrian crossings differ even more in size, so the pixel count and the aspect ratio in the image vary greatly.
2) Heavy wear and occlusion: Ground elements are worn down by vehicles and pedestrians year after year, and frequent traffic jams make it more likely that they are covered. As a result, the quality of the point cloud data from lidar and of the visible-light images from cameras is uneven, which makes ground marking recognition very challenging.
The common problems are as follows; an example is shown in Figure 2:
- Wear of ground markings: markings are incomplete or badly blurred because of wear, discoloration and peeling paint;
- Acquisition environment problems: occlusion (construction, vehicles), differences in the laser reflectivity of materials caused by environmental changes, and poor visible-light imaging (rainy days, backlight, etc.).
2. Early recognition approaches
What we need to do is extract the ground markings from the data. The most intuitive approach is to use traditional methods such as threshold segmentation, skeleton extraction and connected domain analysis. First the skeleton region is cropped, and then the high-reflectivity region is extracted.
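To make the traditional pipeline concrete, here is a minimal sketch of threshold segmentation followed by connected domain analysis on a small reflectivity grid. It is an illustration only, not the production code: the function names, the threshold of 128 and the minimum region size are all assumptions, and skeleton extraction is left out.

```python
from collections import deque


def threshold(image, t):
    """Binarize a 2D reflectivity grid: 1 where value >= t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in image]


def connected_components(binary):
    """Collect 4-connected foreground regions as lists of (row, col) pixels."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def candidate_boxes(image, t=128, min_pixels=3):
    """Return (x1, y1, x2, y2) boxes of bright regions, dropping tiny noise blobs."""
    boxes = []
    for pixels in connected_components(threshold(image, t)):
        if len(pixels) < min_pixels:
            continue  # assumed noise filter; the cutoff is arbitrary
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


image = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 210, 10, 10, 10],
    [10, 220, 205, 10, 10, 10],
    [10, 10, 10, 10, 190, 10],  # isolated bright pixel, treated as noise
    [10, 10, 10, 10, 10, 10],
]
print(candidate_boxes(image))  # [(1, 1, 2, 2)]
```

A fixed global threshold is exactly what makes this kind of method fragile on worn or low-reflectivity markings, since faded paint can fall below the threshold and disappear.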
We also tried algorithms such as GrabCut to extract ground markings. GrabCut clusters the foreground and the background separately to obtain K groups of similar pixels, then models each with a Gaussian mixture model (GMM) to decide whether a pixel belongs to a ground marking or to the background. After the suspected ground marking regions are extracted, a machine learning model (SVM, etc.) classifies them into fine-grained categories to obtain a better recognition result.
As can be seen from the above figure, these methods extract ground markings well in some scenes, but they perform poorly on worn or blurred markings, on foregrounds that resemble the background, on complex backgrounds and so on. They easily miss markings, their position accuracy is not high, and they are not robust.
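The GMM decision used inside GrabCut can be illustrated with a small sketch: a pixel is assigned to whichever mixture, foreground or background, gives it the higher likelihood. This covers only that one step. The real algorithm also re-estimates the mixtures iteratively and solves a graph cut, and the mixture parameters below are made-up values for a single intensity channel.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(x, components):
    """Likelihood of x under a GMM given as (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def is_foreground(x, fg_gmm, bg_gmm):
    """Assign the pixel to the mixture that explains its intensity better."""
    return mixture_likelihood(x, fg_gmm) > mixture_likelihood(x, bg_gmm)


# Hypothetical mixtures: bright paint vs. dark asphalt (K = 2 components each).
FG_GMM = [(0.6, 220.0, 100.0), (0.4, 180.0, 225.0)]
BG_GMM = [(0.7, 40.0, 400.0), (0.3, 90.0, 400.0)]

print(is_foreground(200.0, FG_GMM, BG_GMM))  # True
print(is_foreground(60.0, FG_GMM, BG_GMM))   # False
```

When wear pushes the paint intensity toward the asphalt intensity, the two mixtures overlap and the decision becomes unreliable, which matches the weaknesses described above.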
3. The deep learning era
In 2012, AlexNet, proposed by Hinton's team, won that year's ImageNet image recognition competition. Compared with traditional methods, CNNs have obvious advantages in the image domain. In recent years, detection and recognition techniques based on deep learning have also developed rapidly.
The deep learning era is driven by data and hardware. By combining manual annotation with automatic generation, we have accumulated millions of samples, and the data for various scenes keeps growing. Together with algorithm exploration and innovation, this has brought better and better technical and business results.
At present, detection and recognition methods fall mainly into two-stage approaches (such as the R-CNN series) and one-stage approaches (SSD, YOLO, etc.). Two-stage networks have better overall performance and more accurate localization, and they are competitive on small-object detection.
One-stage methods have the advantage of fast processing speed. High-precision maps need both high recognition performance and sufficiently high position accuracy, so we chose the more accurate two-stage direction.
1) R-FCN detection
By combining position-sensitive score maps with position-sensitive pooling, the R-FCN algorithm achieves high performance and localization accuracy in object detection and recognition. We chose R-FCN to detect and recognize ground markings.
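The idea of position-sensitive pooling can be sketched as follows: the ROI is split into k x k bins, and each bin is averaged from its own dedicated score map, so the ROI scores high only when every part of the object sits where its map expects it. This is a simplified single-class illustration with made-up score maps; the function names are hypothetical, and a real R-FCN computes the maps with convolutional layers.

```python
import math


def ps_roi_pool(score_maps, roi, k):
    """Position-sensitive ROI pooling for ONE class.

    score_maps: k*k maps (lists of rows); map i*k+j serves bin (row i, col j).
    roi: (x1, y1, x2, y2) in feature-map cells, half-open, inside the maps.
    """
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / k
    bin_h = (y2 - y1) / k
    bin_scores = []
    for i in range(k):
        for j in range(k):
            y_start = int(math.floor(y1 + i * bin_h))
            y_end = max(y_start + 1, int(math.ceil(y1 + (i + 1) * bin_h)))
            x_start = int(math.floor(x1 + j * bin_w))
            x_end = max(x_start + 1, int(math.ceil(x1 + (j + 1) * bin_w)))
            channel = score_maps[i * k + j]  # each bin reads only its own map
            values = [channel[y][x] for y in range(y_start, y_end)
                      for x in range(x_start, x_end)]
            bin_scores.append(sum(values) / len(values))
    return sum(bin_scores) / len(bin_scores)  # vote by averaging the bins


def quadrant_map(index, size=4):
    """Toy score map that fires only in quadrant `index` (0=TL, 1=TR, 2=BL, 3=BR)."""
    half = size // 2
    row0, col0 = (index // 2) * half, (index % 2) * half
    return [[1.0 if row0 <= y < row0 + half and col0 <= x < col0 + half else 0.0
             for x in range(size)] for y in range(size)]


maps = [quadrant_map(m) for m in range(4)]
print(ps_roi_pool(maps, (0, 0, 4, 4), 2))  # 1.0: ROI aligned with the object
print(ps_roi_pool(maps, (0, 0, 2, 2), 2))  # 0.25: ROI covers only the top-left part
```

The second call shows why the score depends on where the ROI sits relative to the object, not only on whether it overlaps it.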
R-FCN is a deep learning method that learns from a large number of real-scene samples, so it generalizes much better. Automatic recognition handles different scenes better, and the recall of ground markings has improved greatly. The algorithm diagram is as follows:
The following are some examples of ground marking detection and recognition:
The introduction of deep learning greatly improved the automatic recognition of ground markings for high-precision maps, especially recall. The drawback of R-FCN is that its final detection position is chosen according to the category score of the ground marking, and the position with the highest score is not always the one that best matches the actual position. In terms of position prediction accuracy, therefore, R-FCN is not perfect.
2) Cascade detector
With the development of deep learning and the industry's growing requirements for detection and recognition accuracy, more and more high-precision detection and recognition algorithms have been proposed, such as IoU-Net.
We promptly adopted more advanced recognition algorithms to obtain higher position accuracy and meet the needs of the production line. On top of cascade detection, we used deformable convolution, which adapts the receptive field, to improve recognition accuracy.
Unlike traditional algorithms, which regress the ROI once to obtain the final position, this algorithm uses a cascade to correct the deviation between the predicted position and the actual position step by step. After each cascade regression stage the result is closer to the ground truth, which helps recognition accuracy and suits the strict requirement of high-precision maps for object position accuracy. In the end, both recall and position accuracy improve.
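The step-by-step correction can be sketched as repeated box regression: each stage predicts deltas for the current box, the deltas are applied, and the refined box is passed to the next stage. The delta parameterization below is the common R-CNN one. `toy_stage` is a hypothetical stand-in for a trained regression head that simply moves part of the way toward a known target, used only to make the example runnable.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def apply_deltas(box, deltas):
    """Apply (dx, dy, dw, dh): shift the center, rescale width and height."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = deltas
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w + dx * w, y1 + 0.5 * h + dy * h
    w, h = w * math.exp(dw), h * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


def toy_stage(target, gain):
    """Hypothetical regressor: predicts a fraction `gain` of the ideal deltas."""
    def regress(box):
        w, h = box[2] - box[0], box[3] - box[1]
        tw, th = target[2] - target[0], target[3] - target[1]
        dx = ((target[0] + target[2]) - (box[0] + box[2])) / 2.0 / w
        dy = ((target[1] + target[3]) - (box[1] + box[3])) / 2.0 / h
        return (gain * dx, gain * dy, gain * math.log(tw / w), gain * math.log(th / h))
    return regress


def cascade_refine(box, stages):
    """Run the stages in sequence; return the box after every stage."""
    history = [box]
    for regress in stages:
        box = apply_deltas(box, regress(box))
        history.append(box)
    return history


truth = (4.0, 4.0, 16.0, 16.0)
stages = [toy_stage(truth, 0.5) for _ in range(3)]
for b in cascade_refine((0.0, 0.0, 10.0, 10.0), stages):
    print(round(iou(b, truth), 3))
```

Each printed IoU is higher than the previous one, which mirrors the claim that every cascade stage brings the result closer to the ground truth.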
The following are some examples of algorithm recognition results:
Introducing the cascade detection and recognition model improved the recognition accuracy of automatic recognition on the high-precision production line, but our pursuit of better position accuracy never ends, which led to the following scheme.
3) Cascade detection + local regression
Suppose we perform a local position regression inside the ground marking region: the network can then focus on the finer details of that region and produce a position closer to the true boundary. In practice, when recognizing ground markings, we take the parts that are prone to accuracy problems and refine their positions separately to obtain more precise positions.
The following are some examples of algorithm recognition results:
The detection + regression scheme achieves better position accuracy and brings us one step closer to the “real world”. Its disadvantage is that the pipeline is long and is neither simple nor elegant.
4) Corner-based detection
In object detection based on corner regression, a single convolutional neural network predicts two groups of heat maps that represent the positions of the corners of different objects. That is, each object's bounding box is detected as a pair of key points (the top-left corner and the bottom-right corner), and the network also predicts an embedding vector for each detected corner. The corners determine the position of the object, and the embedding vectors are used to group the pair of corners that belong to the same object.
This method simplifies the network output. By detecting an object as a pair of key points, it removes the need for the large number of anchors on the feature layers of existing detector designs; such anchors overlap heavily and cause an imbalance between positive and negative samples. To produce tighter bounding boxes, the network also predicts offsets that fine-tune the corner positions. The accurate bounding box is obtained from the predicted heat maps, embedding vectors and offsets.
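How heat map peaks, embeddings and offsets combine into boxes can be sketched as below. The corner records, the scalar embeddings, the stride of 4 and the distance threshold are all illustrative assumptions; a real network outputs dense maps from which such peaks are first extracted.

```python
def refine(corner, stride):
    """Map a heat-map cell plus its predicted sub-cell offset back to image pixels."""
    return ((corner["x"] + corner["ox"]) * stride, (corner["y"] + corner["oy"]) * stride)


def group_corners(top_lefts, bottom_rights, stride=4, max_embed_dist=0.5):
    """Pair each top-left corner with the bottom-right corner whose embedding is closest."""
    boxes = []
    for tl in top_lefts:
        best = None
        for br in bottom_rights:
            dist = abs(tl["embed"] - br["embed"])
            if dist > max_embed_dist:
                continue  # embeddings too far apart: different objects
            x1, y1 = refine(tl, stride)
            x2, y2 = refine(br, stride)
            if x2 <= x1 or y2 <= y1:
                continue  # bottom-right must lie below and to the right
            if best is None or dist < best[0]:
                best = (dist, (x1, y1, x2, y2))
        if best is not None:
            boxes.append(best[1])
    return boxes


top_lefts = [
    {"x": 2, "y": 3, "ox": 0.25, "oy": 0.5, "embed": 0.10},
    {"x": 20, "y": 3, "ox": 0.0, "oy": 0.0, "embed": 2.00},
]
bottom_rights = [
    {"x": 10, "y": 8, "ox": 0.75, "oy": 0.0, "embed": 0.15},
    {"x": 30, "y": 9, "ox": 0.5, "oy": 0.5, "embed": 2.10},
]
print(group_corners(top_lefts, bottom_rights))
# [(9.0, 14.0, 43.0, 32.0), (80.0, 12.0, 122.0, 38.0)]
```

No anchor boxes appear anywhere in this decoding step; the box geometry comes entirely from the two corner positions.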
In detection tasks, feature maps of the same size are needed for position regression, classification and so on, so the algorithm quantizes and downsamples, which inevitably loses precision. The biggest impact of this drawback is that the position returned by detection is not robust enough, and in some cases it is offset to a greater or lesser degree.
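A small worked example shows where the precision goes. With an assumed downsampling stride of 16, a boundary at pixel 203 is stored in feature cell 12; mapping the cell back gives pixel 192, an error of 11 pixels unless the sub-cell remainder is also kept. The stride and coordinates are illustrative values only.

```python
def quantize(x, stride):
    """Image coordinate -> feature-map cell index (the fractional part is dropped)."""
    return int(x // stride)


def dequantize(cell, stride, offset=0.0):
    """Feature-map cell (plus an optional sub-cell offset) -> image coordinate."""
    return (cell + offset) * stride


x = 203
stride = 16
cell = quantize(x, stride)
print(cell)                               # 12
print(dequantize(cell, stride))           # 192.0, i.e. 11 px away from 203
print(dequantize(cell, stride, 0.6875))   # 203.0 once the remainder 11/16 is kept
```

The worst-case error from this step alone is stride - 1 pixels per coordinate.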
5) Cascade detection + segmentation refinement
As semantic segmentation matures, deep-learning-based semantic segmentation can classify the input image at the pixel level with ever higher accuracy; in other words, the contours of elements in the image are becoming finer and finer.
We use a ResNet-based segmentation model and combine techniques such as adaptive receptive fields, multi-scale fusion, coarse-to-fine fusion and a region-of-interest attention mechanism to achieve pixel-level segmentation of ground markings.
To obtain instance-level information about each ground marking, we still use detection to determine its approximate location. The difference is that the final, accurate location is obtained from the semantic segmentation of the ground marking within the corresponding region.
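A minimal sketch of this refinement step: given the coarse detection box and a binary segmentation mask, the final box is the tight bounding box of the mask pixels that fall inside the detection. The function name, the inclusive pixel convention and the fallback to the detection box are assumptions made for illustration.

```python
def refine_box_with_mask(box, mask):
    """Shrink a detection box to the segmentation pixels it contains.

    box: (x1, y1, x2, y2) in inclusive pixel coordinates.
    mask: 2D list of 0/1 values, 1 meaning "ground marking".
    """
    x1, y1, x2, y2 = box
    xs, ys = [], []
    for y in range(max(y1, 0), min(y2, len(mask) - 1) + 1):
        for x in range(max(x1, 0), min(x2, len(mask[0]) - 1) + 1):
            if mask[y][x]:
                xs.append(x)
                ys.append(y)
    if not xs:
        return box  # no segmentation support: keep the detection as-is
    return (min(xs), min(ys), max(xs), max(ys))


mask = [[0] * 8 for _ in range(6)]
for row in (2, 3):
    for col in (3, 4, 5):
        mask[row][col] = 1
mask[5][7] = 1  # stray pixel outside the detection box, ignored

print(refine_box_with_mask((1, 1, 6, 4), mask))  # (3, 2, 5, 3)
```

Restricting the search to the detection box is what ties the pixel-level result to a specific marking instance.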
The following are some examples of detection combined with refinement:
Introducing semantic segmentation improves the position accuracy of ground marking recognition, solves the problem that positions obtained from detection alone are not robust, and takes the automation of ground marking recognition for high-precision maps to a new level.
However, this method is somewhat cumbersome. Both the detection and the segmentation tasks consume a lot of GPU resources: a single image needs several GPU operations plus subsequent CPU post-processing and fusion before the final result is obtained. Optimizing these steps would simplify the process and save considerable computing resources.
6) PANet-based detection
With this in mind, we adopted a detection and recognition algorithm based on PANet. In traditional instance segmentation models, information does not propagate sufficiently between layers. PANet addresses this well and fully fuses coarse and fine features. It has not only top-down feature fusion but also bottom-up feature fusion, so the high-level features incorporate the strong localization features of the lower layers, which solves the loss of information from shallow features.
In addition, adaptive feature pooling is used to fuse different feature layers and extract ROI features for prediction, and an additional foreground/background classification branch is added to the mask prediction to make the predicted mask more accurate. Together, these measures greatly benefit the position accuracy of object detection. At the same time, the segmentation and detection tasks reinforce each other and achieve better results when combined.
The following are examples of recognition results. The algorithm tolerates some worn and blurred ground markings, and its position accuracy is greatly improved. (In the figure, the outer frame of each ground marking is the approximate position obtained by detection, and the inner frame is the position obtained by pixel-level segmentation, which is the final position of the ground marking.)
The above schemes need to project the point cloud into 2D space, with normalization and quantization along the way, which inevitably loses some information. The most obvious effect is that objects are easily missed where the reflectivity of the point cloud is low. Extracting directly from the original 3D point cloud would avoid these problems.
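The information loss can be seen in a minimal sketch of such a projection: points are binned into a 2D grid, the height is dropped, and reflectivity is normalized to 8 bits. The 0.1 m cell size, the [0, 1] reflectivity range and the "keep the maximum" rule are assumptions for illustration, not the actual production settings.

```python
def rasterize(points, cell_size, width, height):
    """Project (x, y, z, reflectivity) points onto a top-down 8-bit grid.

    Coordinates are in metres; reflectivity is assumed to lie in [0, 1].
    """
    grid = [[0] * width for _ in range(height)]
    for x, y, _z, refl in points:  # the height z is discarded here
        col = int(x // cell_size)
        row = int(y // cell_size)
        if 0 <= row < height and 0 <= col < width:
            value = int(round(max(0.0, min(1.0, refl)) * 255))  # 8-bit quantization
            grid[row][col] = max(grid[row][col], value)
    return grid


points = [
    (0.12, 0.25, 0.00, 0.8),    # bright paint
    (0.17, 0.25, 0.02, 0.2),    # 5 cm away, falls into the same cell
    (0.32, 0.05, 0.00, 0.001),  # very low reflectivity
]
grid = rasterize(points, cell_size=0.1, width=4, height=4)
print(grid[2][1])  # 204: the two nearby points collapse into one value
print(grid[0][3])  # 0: the weak return is indistinguishable from an empty cell
```

Both effects, merged neighbors and vanished weak returns, are examples of detail that exists in the raw 3D points but not in the projected image.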
7) Object detection based on 3D point clouds
With this in mind, we explored 3D object detection directly on the original point cloud. 3D point cloud recognition is an important part of many real-world applications, such as autonomous navigation, reconstruction and VR/AR. Compared with image-based detection, lidar provides reliable depth information that can be used to locate objects accurately and characterize their shapes.
We explored a variety of 3D point cloud recognition approaches, such as bird's-eye view and voxel-based methods. Because PointRCNN performs well for object detection on raw 3D point clouds, we use a PointRCNN-based method to extract ground markings. The detection framework has two stages: the first stage segments the point cloud of the whole scene into foreground points and background points, and generates a small number of high-quality 3D proposals directly from the point cloud in a bottom-up manner.
In the second stage, the candidate regions are refined in canonical coordinates to obtain the final detection results. The points of each proposal are pooled and transformed into canonical coordinates to better learn local spatial features, which are then combined with the global semantic features from the first stage for box refinement and confidence prediction.
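The transformation into canonical coordinates can be sketched as a translation to the proposal center followed by a rotation that aligns the x axis with the proposal heading. The sketch assumes z is the vertical axis and the heading is a rotation about it, which may differ from the sensor convention actually used; the function name is hypothetical.

```python
import math


def to_canonical(points, center, heading):
    """Express points in a proposal's local frame.

    points: iterable of (x, y, z); center: (cx, cy, cz) of the proposal;
    heading: proposal yaw in radians about the (assumed) vertical z axis.
    """
    cx, cy, cz = center
    c, s = math.cos(heading), math.sin(heading)
    out = []
    for x, y, z in points:
        dx, dy, dz = x - cx, y - cy, z - cz      # move the origin to the center
        out.append((c * dx + s * dy,             # rotate by -heading about z
                    -s * dx + c * dy,
                    dz))
    return out


# A point 2 m ahead of a proposal that faces the +y direction
local = to_canonical([(10.0, 22.0, 1.5)], center=(10.0, 20.0, 1.0), heading=math.pi / 2)
print([round(v, 6) for v in local[0]])
```

After the transform, the point lies 2 m along the local x axis regardless of where the proposal sits in the scene or which way it faces, so the second stage sees every proposal in the same local frame.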
4. Effects and benefits
The support of big data makes our algorithms more robust and better at recognition. By combining various algorithmic strategies and multiple data sources (point clouds, visible light, etc.), we keep improving the accuracy of ground marking recognition. Position accuracy within 5 cm of the ground truth exceeds 99%, and recall exceeds 99.99%. All indicators have improved steadily.
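One way to compute indicators of this kind is sketched below. The article does not spell out the exact definitions, so the ones here are assumptions: recall is the share of ground-truth elements that were recognized, and position accuracy is the share of recognized elements whose position error is at most 5 cm. The sample numbers are invented.

```python
def evaluate(errors_m, num_ground_truth, tolerance_m=0.05):
    """Return (position_accuracy, recall).

    errors_m: position error in metres for each ground-truth element that was
              recognized (one entry per recalled element).
    num_ground_truth: total number of ground-truth elements.
    """
    recall = len(errors_m) / num_ground_truth
    within = sum(1 for e in errors_m if e <= tolerance_m)
    accuracy = within / len(errors_m) if errors_m else 0.0
    return accuracy, recall


# 5 ground-truth markings, 4 recognized, 3 of them within 5 cm
print(evaluate([0.01, 0.03, 0.06, 0.02], num_ground_truth=5))  # (0.75, 0.8)
```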
The above scheme has officially gone online and has processed a large amount of data; its precision and recall meet the requirements of production. Meanwhile, the efficiency of the manual production line has also improved. Here are some renderings:
5. Closing remarks
High-precision maps are called the “eyes” of the automatic driving system, and their biggest difference from ordinary maps is who uses them. Ordinary navigation maps are used by humans, for navigation and search, while high-precision maps are used by computers, for high-precision positioning, assisted environment perception, and planning and decision-making. Therefore, high-precision maps need not only a high recall rate for map elements but also very high position accuracy.
Recognizing the elements of high-precision maps places higher demands on technology. Throughout the development of the high-precision map industry, map making has gradually moved from purely manual to semi-automatic and even fully automatic production. During this period, recognition technology has been continuously developed and improved: from manually constructed features to automatically learned features, from 2D recognition to 3D and higher-dimensional recognition, and from single-source recognition to multi-source fusion.
At present, high-precision maps are still mostly produced manually, and the quality and efficiency of manual work are always in tension. In contrast, automatic machine recognition is more efficient, costs less, and delivers quality no lower than manual work. Applying automatic recognition will accelerate the construction of high-precision maps and promote the development of the high-precision map industry. High-precision ground marking recognition has been applied in Gaode's high-precision maps, where it effectively improves the efficiency and quality of data production and provides solid technical support for building high-precision maps.
Author of this article: Yu Yao
This article is from Alibaba cloud partner “alitech”. If you need to reprint it, please contact the original author.