Last week, Ren Xiaofeng, chief scientist of Alibaba Gaode Map, held a technical exchange on the development of computer vision technology and its application in maps and travel, at the “Cloud Dialogue” online live broadcast for Alibaba’s senior students. The live interaction was lively, especially in the Q&A session, where students raised questions on visual applications, AR navigation, positioning technology, 5G, career development, and other topics, and Ren Xiaofeng gave excellent answers. We have sorted out the Q&A content to share with you.
Dr. Ren Xiaofeng is currently the chief scientist and a researcher at Alibaba Gaode Map, where he is mainly responsible for the application and innovation of vision technology in maps and travel. Before joining Alibaba, he worked at Amazon from 2013 to 2017, where he was a senior chief scientist and the algorithm director of Amazon Go. He graduated from Zhejiang University and received his Ph.D. from the University of California, Berkeley. He has been a visiting professor in the Department of Computer Science at the University of Washington, has served as a chair for CVPR / ICCV / AAAI, and has been an associate editor of IEEE PAMI.
Development and application of vision technology
Question: what are the applications of computer vision in the construction of high-precision maps?
Ren Xiaofeng: Visual algorithms are a core technology for high-precision map construction. They are mainly used for data alignment and accuracy assurance, recognition and automatic generation of map data, visual positioning, and high-precision map updates.
Question: do you think the current level of basic research and hardware can guarantee the rapid development of visual technology? Will the development of visual technology encounter a bottleneck that is difficult to break through in the near future?
Ren Xiaofeng: After several years of rapid progress in deep learning across the various fields of vision, deep learning and basic vision technology are now, to a certain extent, facing bottlenecks. In other words, they are not developing as fast as they did at the beginning. There are many problems still to be solved, and new technologies may need to be created. For applications, I think the level of basic technology and hardware is generally sufficient. What matters more is how to use the technology well and break through specific technical bottlenecks in a targeted way.
Question: single object tracking (SOT), which tracks a single target given a template and is category independent / cross domain, has made great progress in the past two years and has the potential to solve the problem of fast tracking. Does it have application prospects in the map business, such as visual positioning (tracking landmarks in VO) or AR navigation (short-term tracking)? If so, what requirements (robustness, speed, etc.) need to be met?
Ren Xiaofeng: Tracking is a basic vision technology that is applied in many scenarios. For navigation and travel, it can play a key role in AR navigation and positioning: it reduces the computational cost of recognition (detection) and increases robustness and smoothness. However, in many practical applications, the way tracking is used and what is required of it differ from the academic setting of single object tracking.
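As a rough illustration of how tracking can reduce the cost of detection, the Python sketch below runs an expensive detector only every few frames and fills the frames in between with a cheap prediction. Everything here is an assumption made for illustration: the function name detect_then_track, the detect callback, and the constant-velocity prediction, which stands in for a real visual tracker. It is not a description of Gaode's AR navigation pipeline.

```python
def detect_then_track(frames, detect, detect_every=5):
    """Call the (expensive) detector only every `detect_every` frames.

    `detect(frame)` is a hypothetical callback returning the (x, y) center
    of the target. Between detections, the position is extrapolated with a
    constant-velocity model, a simple stand-in for a real visual tracker.
    Returns the per-frame positions and the number of detector calls.
    """
    positions = []
    detector_calls = 0
    last_det = None          # (frame index, x, y) of the previous detection
    vx = vy = 0.0            # estimated velocity in pixels per frame
    x = y = 0.0
    for i, frame in enumerate(frames):
        if i % detect_every == 0:
            x, y = detect(frame)
            detector_calls += 1
            if last_det is not None:
                j, px, py = last_det
                # Re-estimate velocity from the last two detections.
                vx = (x - px) / (i - j)
                vy = (y - py) / (i - j)
            last_det = (i, x, y)
        else:
            # Cheap prediction instead of a full detection pass.
            x, y = x + vx, y + vy
        positions.append((x, y))
    return positions, detector_calls


# Toy usage: the "frame" is just a time index and the target moves linearly.
positions, calls = detect_then_track(
    frames=list(range(10)),
    detect=lambda t: (2.0 * t, 1.0 * t),
    detect_every=5,
)
```

In this toy run the detector is called on 2 of the 10 frames. A production system would also need to handle lost targets and drift between detections, which this sketch does not model.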
Question: can visual features be combined with semantics to bring a better experience to map navigation and travel services?
Ren Xiaofeng: Vision can provide not only high-precision positioning but also a semantic understanding of the scene, which can definitely bring a better navigation and travel experience. However, the specific product experience and technical implementation still need further exploration and accumulation.
Question: what direction will computer vision take next? What does its future look like?
Ren Xiaofeng: Computer vision is a general means of perception that carries a large amount of information. It can be used for a wide variety of perception tasks and can observe from a long distance, so its application prospects are very broad and promising. As for the difficulties of the next step, besides the need for progress and breakthroughs in basic technology, there are also: how to find application scenarios where vision can play a key role, how to design the overall solution around the actual problem, how to better handle limited computing resources, and how to combine vision with other sensors and prior knowledge.
Question: is AR navigation real-time image computing? Can the device’s computing capacity keep up?
Ren Xiaofeng: AR navigation is real-time image computing, and it can deliver navigation and driving assistance functions even with low computing power. We also try our best to precompute some elements of the environment to work alongside the real-time computation.
Question: what does AR navigation ultimately use to present content? A display or a HUD?
Ren Xiaofeng: AR navigation comes in a variety of product forms: the central control screen, HUD, rearview mirror, and instrument panel are all display modes that are in use or could be used.
Question: here is a non-technical question. Will AR navigation attract too much of the driver’s attention and cause him or her to ignore the traffic on both sides of the vehicle?
Ren Xiaofeng: This is a good product design question, and it is one that we have been refining and trying to balance. A well-designed AR navigation product takes care not to draw too much of the driver’s attention.
Question: does safe driving assistance include fatigue driving detection?
Ren Xiaofeng: At present, Gaode’s AR navigation uses only a single outward-facing monocular camera and does not support fatigue driving detection. Monitoring the vehicle interior, including fatigue detection, is an important application of vision technology in safety-assisted driving.
Question: what are the mainstream technologies for indoor positioning? Is the prospect of indoor navigation based on acoustic signal good?
Ren Xiaofeng: Indoor positioning has a variety of sensor-based technologies, including WiFi, Bluetooth, RFID, ultra-wideband (UWB), and acoustic signals. I think that, where sensors need to be deployed, the development of indoor positioning depends largely not on the technology or positioning accuracy but on whether there is a good application. WiFi positioning is popular because indoor networks need WiFi anyway. The iPhone 11 is equipped with a UWB chip for transferring files at close range.
Question: what causes such large errors in GPS positioning? Is it the multipath effect?
Ren Xiaofeng: GPS positioning is inaccurate for many reasons, and mainly in “urban canyon” scenes (among high-rise buildings). The multipath effect is one of the most important factors: reflections from the environment (especially highly reflective materials such as glass) lead to inaccurate GPS position calculation. There are other reasons as well, such as fewer observable satellites because buildings or viaducts block the sky, and atmospheric interference (especially charged ions and water vapor).
Question: how does Gaode solve the problem of GPS drift?
Ren Xiaofeng: This is a complicated problem. Working from mobile phone sensors, we have done a lot of optimization for real driving and walking scenarios, including GPS confidence analysis and combining GPS with the IMU and the road network. Visual positioning is a new direction that we are developing to address inaccurate positioning.
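To make the idea of combining GPS confidence with IMU data more concrete, here is a minimal one-dimensional Python sketch. It is only an illustration under stated assumptions, not Gaode's actual method: the names fuse_position and track, the confidence value in [0, 1], and the simple weighted blend are all hypothetical, and the road network is not modeled at all.

```python
def fuse_position(gps_pos, gps_conf, dr_pos):
    """Blend a GPS fix with a dead-reckoned position.

    gps_conf is a hypothetical confidence score in [0, 1]:
    0 ignores the GPS fix, 1 trusts it completely.
    """
    w = min(max(gps_conf, 0.0), 1.0)   # clamp to [0, 1]
    return w * gps_pos + (1.0 - w) * dr_pos


def track(start, steps):
    """Follow a 1D position over time.

    `steps` is an iterable of (imu_displacement, gps_pos, gps_conf).
    Each step first dead-reckons from the IMU-derived displacement,
    then corrects the result with the GPS fix, weighted by confidence.
    """
    pos = start
    path = []
    for displacement, gps_pos, gps_conf in steps:
        dr_pos = pos + displacement            # dead reckoning
        pos = fuse_position(gps_pos, gps_conf, dr_pos)
        path.append(pos)
    return path


# Toy usage: the second GPS fix has drifted badly (50.0) but carries zero
# confidence, so the estimate follows the IMU displacement instead.
path = track(0.0, [(1.0, 1.0, 0.5), (1.0, 50.0, 0.0), (1.0, 3.0, 1.0)])
```

A real system would more likely use a Kalman-style filter with proper noise models. The weighted blend is chosen here only because it shows the role of the confidence term in a few lines.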
Basic map technology
Question: what layers does Gaode Map have at present? Is it a semantic high-precision map?
Ren Xiaofeng: Gaode Map has many forms of map data, from the standard map (as seen in the Gaode app), to lane-level maps, to high-precision maps. Different levels of precision serve different applications. Many of these maps contain semantic information, but the content and precision of that information differ.
Question: what’s the difference between a depth camera and a normal camera?
Ren Xiaofeng: An ordinary camera captures a 2D RGB image with no 3D information. In addition to RGB color, a depth camera also obtains depth (distance) information at each pixel, usually through active sensing (time of flight, structured light, etc.). Many mainstream mobile phones are now equipped with depth cameras.
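The difference can be sketched in a few lines of Python. The example below assumes an idealized time-of-flight sensor and a pinhole camera model. The function names and the intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions, not values from any particular device.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def tof_depth(round_trip_seconds):
    """Distance from a time-of-flight measurement.

    The emitted light travels to the surface and back, so the one-way
    distance is half of the round-trip path length.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def backproject(u, v, depth, fx, fy, cx, cy):
    """Turn a pixel (u, v) with a depth value into a 3D point.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy) in pixels, and treats `depth` as the Z coordinate
    along the optical axis. A plain RGB pixel has no depth value, so
    this step is not possible with an ordinary camera alone.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Toy usage: a 20-nanosecond round trip corresponds to roughly 3 meters.
d = tof_depth(20e-9)
point = backproject(420, 240, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Structured-light cameras recover depth differently (by triangulating a projected pattern), but the back-projection step from a pixel plus depth to a 3D point is the same.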
Question: how does Gaode map collect road information? Will the map be updated in real time if the road changes?
Ren Xiaofeng: Gaode Map has many sources of road information, relying mainly on low-cost in-vehicle video data. Road information changes all the time; we continuously collect the latest information, produce updated map data, and bring it online promptly.
Question: what are the difficulties in mapping indoor three-dimensional space (such as multi-storey commercial buildings)?
Ren Xiaofeng: The most difficult part of indoor 3D mapping is data collection. 3D reconstruction methods need images from multiple angles, and the accuracy of mobile modeling methods based on depth cameras may not meet the requirements.
New career development
Question: from the academic research field of vision and image to the development of corporate commercial computer vision application technology, what knowledge needs to be supplemented?
Ren Xiaofeng: I think the main thing is not supplementing specific knowledge but cultivating one’s abilities in several areas: (1) the ability to analyze and solve practical problems; (2) hands-on ability; (3) the ability to learn quickly and expand one’s knowledge.
Question: how to make a career plan in the field of computer vision?
Ren Xiaofeng: It is not essentially different from career planning in other industries and technical directions. We should take our own strengths, weaknesses, and interests into account, find a suitable direction for our work, gradually improve the depth, breadth, and height of our technical skills along with our overall ability, and build a career step by step through practical results.
Question: is it necessary to have deep learning skills to work in the field of vision?
Ren Xiaofeng: Computer vision now uses a great deal of deep learning, so I think deep learning knowledge and skills are necessary. There are some geometry-related sub-areas, such as 3D reconstruction and SLAM / VIO, where deep learning is not yet widely applied, but (1) more deep learning applications are expected there in the future; and (2) to broaden our technical range and perspective, we also need to understand deep learning to a certain extent.
Industry hot spots and others
Question: will 5G technology be used in automatic driving?
Ren Xiaofeng: 5G technology will be used in many ways in automatic driving, but for L4 / L5 automatic driving, I don’t think 5G can fundamentally solve the problem of safety (and comfort).
Question: how should positioning and tracking be coordinated between the device and the cloud?
Ren Xiaofeng: Generally speaking, tasks with high real-time requirements that are closely tied to sensors are completed on the device; tasks that are closely tied to maps and require a large amount of reference data are completed in the cloud.
Question: Google Maps has a street view module, which uses a lot of image recognition technology. How is the street view map assembled? And what is the development trend of street view?
Ren Xiaofeng: The street view in Google Maps comes mainly from Google’s own street view collection vehicles, which carry high-quality cameras and integrated inertial navigation sensors. Building the street view map is mainly a process of stitching images together. Street view maps are interesting, but they have not fundamentally changed the navigation and travel experience. Google’s latest AR pedestrian navigation (which is different from Gaode’s in-car AR navigation) is a new application based on street view maps.
Question: how can wearable devices (such as glasses, smart assistants, etc.) be better implemented and commercialized in terms of visual technology?
Ren Xiaofeng: Hardware (AR display, computing power) and user experience are the main obstacles to wearable devices being truly launched and popularized. As a pioneering product, Google Glass was too limited by its hardware. At present, AR glasses are mainly used in enterprise scenarios. I personally think that the application prospects of wearable devices as personal assistants (including navigation, information display, etc.) are very good, but the hardware may not yet be mature.