1、 Background and current situation
In recent years, domestic road infrastructure and related facilities have been changing rapidly. Users' daily travel demand is strong, which places higher requirements on the quality and freshness of the data in the electronic map products they use. In the traditional map data production process, data is collected in the field by acquisition equipment and then processed manually, and the problems of slow data updates and high processing costs have become increasingly prominent.
Drawing on visual AI and big data technology, Gaode Map is leading the transformation of the map data industry. Image AI technology allows various data elements to be identified and extracted directly from the collected data, providing a solid technical foundation for an operating model in which machines replace manual work.
Through high-frequency, high-density collection of real-world data and its image AI capability, Gaode Map can automatically detect, identify, and determine the content and position of the various traffic signs and markings in its massive library of collected images. By comparing the results with historical information, it can quickly find what has changed in the real world. At the same time, drawing on its strong and professional data fusion capability, it integrates 100% of the changed information, so as to build a highly up-to-date national base map.
To sum up, through in-depth technical cooperation across algorithms and map engineering, as well as business integration with data acquisition and data production, a fully automated production line for base map data has been built, with image recognition, location service, differential filtering, and data fusion as its core technologies. This establishes an efficient, high-quality data production channel from the real world to the map application terminal.
2、 Feasibility and key points of the automated production line
In terms of algorithms, image object classification and detection has a history of several decades, over which a series of classic algorithms has emerged. In recent years, with the rapid development of image recognition technology, especially deep learning, and the growth of GPU computing power, classification and detection have improved greatly.
In terms of the big data needed for automation, Gaode Map has focused on map data production for more than ten years and has accumulated rich, accurate data covering the whole country. In addition, the large amount of information it collects every day forms a natural sample pool for algorithm training. At the same time, a set of professional, standardized map production specifications has laid a solid theoretical foundation for data fusion.
Therefore, given the accumulated algorithm capability, data, and processes, building an automated production line is highly feasible. It focuses on the following four parts:
Image recognition: the goal is to extract the real-world information relevant to map data from the input images. This means detecting and identifying the traffic signs and markings in the pictures, subdividing them by type, and reading the numbers and characters on them so that the content can be expressed as text. In addition, because the input images are continuous, a single sign or road marking can be observed in several images, so the same information across multiple images is merged and the most suitable image is selected as the main image for display.
Location service: based on low-precision GPS and the collected images, the location service calculates its own precise position and that of scene objects and maps them onto the map data. It includes core capabilities such as image-based road understanding, sign location analysis, and acquisition track matching. According to the trajectory characteristics and road connectivity, a probability model is established for matching each positioning fix (position, angle, speed) against candidate roads, and the trajectory is associated with the map data. By understanding the scene across multiple pictures, the position of each picture relative to the intersection is derived, and the action position of the object (where it takes effect on the road) is then determined by combining the shape of the road data on the map.
Image difference and semantic filtering: the purpose is to check the newly collected data for consistency with the data in the existing parent database, automatically differencing and filtering out information that is the same and leaving only the changed information. The two differ as follows: image difference detects whether newly collected images at the same location have changed compared with historical ones, comparing at the level of the track and the image itself; semantic filtering looks at the content after image recognition from the data perspective, comparing it with the parent database data to see whether anything has changed in terms of map semantics.
Location-based data fusion: the results of image recognition are combined with the action road provided by the location service. Using an abstract intersection model, data fusion is performed on the road or intersection, that is, map data is added or updated.
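The last two parts, semantic filtering and data fusion, can be sketched as a compare-then-write step. This is only a minimal illustration under stated assumptions, not the production design: the `SignRecord` type, the dictionary standing in for the parent database, and the function names `semantic_filter` and `fuse` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignRecord:
    """Hypothetical record for one recognized sign attached to a road."""
    road_id: str     # action road (or intersection) from the location service
    sign_type: str   # fine-grained type from image recognition
    content: str     # recognized text, for example a speed value


def semantic_filter(new_records, parent_db):
    """Keep only records whose content differs from the parent database.

    parent_db is a plain dict mapping (road_id, sign_type) -> content.
    """
    changed = []
    for rec in new_records:
        key = (rec.road_id, rec.sign_type)
        if parent_db.get(key) != rec.content:
            changed.append(rec)
    return changed


def fuse(changed, parent_db):
    """Add or update map data in place and return the actions taken."""
    actions = []
    for rec in changed:
        key = (rec.road_id, rec.sign_type)
        actions.append(("update" if key in parent_db else "add", key))
        parent_db[key] = rec.content
    return actions


parent_db = {("r1", "speed_limit"): "60", ("r2", "speed_limit"): "80"}
observed = [
    SignRecord("r1", "speed_limit", "60"),  # unchanged, filtered out
    SignRecord("r2", "speed_limit", "70"),  # changed, becomes an update
    SignRecord("r3", "no_parking", ""),     # not in the database, becomes an add
]
changed = semantic_filter(observed, parent_db)
actions = fuse(changed, parent_db)
print(actions)
```

The sketch only covers adding and updating, as the text describes; detecting that a sign has been removed would need a separate comparison from the database side and is not shown.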
3、 Key technical capabilities
1. Image recognition
Image recognition faces three major challenges. First, scenes and object types are highly varied. There are many kinds of objects to be detected, such as traffic signs, ground guide lines, and electronic eyes. Directional information signs, for example, come in both normal and special forms.
National standards define hundreds of commonly used traffic sign types, and different regions also have signs with local characteristics, so customized detection and recognition must be supported. Common signs come in many shapes, such as triangles, circles, squares, diamonds, and octagons, and in a wide range of colors, such as yellow, red, blue, green, black, and white. In addition, slogans and billboards in natural scenes that resemble traffic signs need to be excluded to reduce their impact on recognition accuracy.
Second, image quality varies greatly in natural scenes, and many images are of low quality. There are also extreme outdoor conditions such as occlusion, backlight, rain, and snow. These problems must be considered and handled in the detection process.
Finally, the objects to be detected differ greatly in size, from large ones such as square sign boards (hundreds of pixels) to small ones such as electronic eyes and traffic lights (a little over ten pixels). Small objects are harder to discriminate, which places high demands on the detection algorithm.
To sum up, for the algorithm itself, traffic sign detection is a multi-class object detection task. The mainstream approach is an end-to-end scheme based on deep learning, which completes detection and fine classification in a single network. Commonly used datasets are PASCAL VOC (20 classes) and COCO (80 classes).
According to the actual needs of the business, the whole scheme is divided into two parts: target detection and fine classification. In the target detection stage, all traffic signs in the image are detected with Fast R-CNN. This stage demands high recall and execution speed, while the precision requirement can be relaxed. In the fine classification stage, the candidate boxes obtained in the detection stage are finely classified and noise is filtered out, to ensure both high recall and high precision.
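The split between a permissive detection stage and a strict classification stage can be illustrated with a small sketch. It assumes the detector has already produced scored candidate boxes; the `Candidate` type, the `two_stage` function, both thresholds, and the toy classifier are hypothetical and stand in for the real networks.

```python
from typing import Callable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    score: float                    # detector confidence in [0, 1]


def two_stage(
    candidates: List[Candidate],
    classify: Callable[[Candidate], Tuple[str, float]],
    det_threshold: float = 0.2,  # assumed value: loose, favors recall
    cls_threshold: float = 0.8,  # assumed value: strict, removes noise
):
    """Keep loosely detected boxes, then let the classifier filter noise."""
    results = []
    for cand in candidates:
        if cand.score < det_threshold:
            continue  # stage 1: drop only very weak detections
        label, conf = classify(cand)
        if label != "background" and conf >= cls_threshold:
            results.append((cand.box, label))  # stage 2: confident sign types
    return results


def toy_classifier(cand):
    """Stand-in for the fine classification network (lookup by box)."""
    table = {
        (10, 10, 60, 60): ("speed_limit_60", 0.95),
        (200, 40, 230, 70): ("background", 0.90),  # billboard-like noise
        (300, 80, 312, 92): ("traffic_light", 0.85),
    }
    return table.get(cand.box, ("background", 1.0))


candidates = [
    Candidate((10, 10, 60, 60), 0.90),
    Candidate((200, 40, 230, 70), 0.35),
    Candidate((300, 80, 312, 92), 0.25),   # small object, weak detector score
    Candidate((400, 400, 405, 405), 0.05),
]
signs = two_stage(candidates, toy_classifier)
print(signs)
```

With a loose first threshold, the small traffic light with a weak detector score survives to the second stage, while the billboard-like candidate is rejected there as background.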
2. Location service
Trajectory drift has always been a major challenge to the accuracy of matching positions to the map. On the one hand, parallel roads and elevated-road scenes, especially main and auxiliary roads that are only 1-2 lanes apart, require high positioning accuracy. Conventional GPS positioning accuracy is 5-10 m, which makes it difficult to reach an 80% recognition rate for main and auxiliary roads. On the other hand, the base map data itself also suffers from GPS accuracy problems.
Beyond the basic theory, such as rules, hidden Markov model learning and inference, and the Viterbi algorithm, the key to successful trajectory matching is resisting positioning drift in a reasonable way. By studying and summarizing trajectory shapes, we can find their patterns, establish a probability model that fits their characteristics, express the matching process accurately, and strike a reasonable balance between matching accuracy and resistance to drift. In addition, the connectivity of long tracks and the image recognition of lane counts or road position relationships are used to solve some of the parallel-road scenes.
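To show how a hidden Markov model with Viterbi decoding resists drift, here is a deliberately simplified sketch. It reduces the problem to one dimension (the lateral offset of each GPS fix) and two parallel candidate roads. The road offsets, the noise level `SIGMA`, and the stay and switch probabilities are assumed values for illustration, not the probability model described above.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for the observations."""
    # best[s]: best log-probability of any path that ends in state s
    best = {s: log_start[s] + log_emit(s, observations[0]) for s in states}
    back = []  # one dict per step: state -> best predecessor
    for obs in observations[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit(s, obs)
            pointers[s] = p
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


ROADS = {"main": 0.0, "aux": 8.0}  # assumed lateral offsets in meters
SIGMA = 7.0                        # assumed GPS noise, within the 5-10 m range


def log_emit(road, y):
    """Gaussian log-likelihood of a fix at offset y (constant term dropped)."""
    return -((y - ROADS[road]) ** 2) / (2 * SIGMA ** 2)


STAY, SWITCH = math.log(0.95), math.log(0.05)  # assumed transition model
log_trans = {a: {b: STAY if a == b else SWITCH for b in ROADS} for a in ROADS}
log_start = {r: math.log(0.5) for r in ROADS}

fixes = [1.0, 2.0, 6.0, 1.5, 0.5]  # the third fix has drifted toward "aux"
nearest = [min(ROADS, key=lambda r: abs(y - ROADS[r])) for y in fixes]
matched = viterbi(fixes, list(ROADS), log_start, log_trans, log_emit)
print(nearest)
print(matched)
```

Nearest-road matching follows the drifted fix onto the auxiliary road, whereas the decoded path stays on the main road because one noisy fix does not outweigh the cost of switching roads twice. A real matcher would work on candidate road segments and also use heading, speed, and road connectivity, as the text describes.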
At present, determining the action road and the action position depends on image recognition, as do recognizing intersection positions and understanding the map data scene. For example, the position of a sign relative to a road or intersection is difficult to determine from recognition alone; it also requires understanding the characteristics of the road network data. A person can make this kind of judgment at a glance, but it is complex and hard for a machine to describe with rules. Therefore, by analyzing scenes such as going straight along a road section, going straight through an intersection, and turning, the action road is determined by comparison against the model of the map road section or intersection, and the action position is calculated according to the different attributes.
3. Image difference and semantic filtering
Image difference mainly faces the problem of data alignment: multiple acquisitions at the same location are affected by the accuracy of GPS itself and by road-judgment deviations caused by satellite signal occlusion. In addition, in semantic recognition, environmental factors in natural scenes, such as occlusion, blur, shadow, rain and snow, and changes of viewpoint, affect how well subsequent algorithms can analyze deep semantic information (such as type and content). Together, these two factors make it harder to compare multiple images and their semantics for consistency.
Here the algorithm must greatly improve the accuracy of recognition and consistency judgment, so that wrong matches do not affect data updates. Image difference is divided into two parts: data alignment and local matching. Data alignment answers whether two images were taken at the same position and from the same viewpoint, judging the positional relationship of the two images through coarse screening by GPS track followed by image matching. Local matching answers whether two objects are of the same type; for objects with text content, it must also check the consistency of layout and text. Therefore, in addition to common point-feature matching techniques, a deep-learning-based image matching network is also used. For the text content, OCR is used to read and analyze it, and the final judgment is whether the content collected on the two occasions is fully consistent.
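The coarse GPS screening and the final text comparison can be sketched as follows. The image matching and OCR steps themselves are not shown; the sketch assumes each image carries a position and heading and each recognized object carries a type and OCR text. All field names and thresholds here are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def heading_diff(h1, h2):
    """Smallest absolute difference between two headings in degrees."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def coarse_aligned(a, b, max_dist_m=30.0, max_heading_deg=30.0):
    """Coarse screen: are two shots close enough in position and viewpoint?"""
    close = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_dist_m
    same_view = heading_diff(a["heading"], b["heading"]) <= max_heading_deg
    return close and same_view


def same_content(obj_a, obj_b):
    """Objects are consistent if type and whitespace-normalized text agree."""
    def norm(text):
        return "".join(text.split()).lower()
    return obj_a["type"] == obj_b["type"] and norm(obj_a["text"]) == norm(obj_b["text"])


old_shot = {"lat": 39.9087, "lon": 116.3975, "heading": 92.0}
new_shot = {"lat": 39.9088, "lon": 116.3976, "heading": 88.0}
opposite = {"lat": 39.9088, "lon": 116.3976, "heading": 272.0}

print(coarse_aligned(old_shot, new_shot))   # candidates for image matching
print(coarse_aligned(old_shot, opposite))   # same place, opposite direction
```

Only pairs that pass the coarse screen would go on to the more expensive image matching, and only matched objects would have their recognized content compared with `same_content`.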
4. Location based data fusion
Because the real world is complex, map production has accumulated extensive experience and formed a large body of standardized data production specifications. These are intangible assets that make it possible to abstract the real world reasonably and express it accurately. Although real road networks take many forms, they can be classified abstractly through models, and a relatively general map data model can be established for different scenes. A large number of map data processing tools and methods can then be built on this model, which allows automatic data fusion to be applied widely.
In essence, the automated production of SD base map data at Gaode introduces image AI and data fusion technology into the base map data production process and combines them with many years of specifications and experience in digital map production. The result is a data-oriented automated production line that frees production from manual work and continuously provides efficient, high-quality map data. It addresses the high specialization, high labor cost, and low operating efficiency of map suppliers' production lines, and meets users' need for up-to-date electronic map data when traveling.