Reading guide: outline of this talk:
- Perception Introduction
- Sensor Setup & Sensor Fusion
- Perception Onboard System
- Perception Technical Challenges
01 Perception Introduction
The perception system takes data from various sensors, together with information from the high-precision map, as input. After a series of computation and processing steps, it accurately perceives the environment around the autonomous vehicle and provides rich information to downstream modules: the position, shape, category and speed of obstacles, as well as a semantic understanding of special scenes such as construction areas, traffic lights and traffic signs.
The perception system includes many aspects and subsystems:
- Sensors: sensor installation, field of view, detection range, data throughput, calibration accuracy and time synchronization. Because many sensors are used, the time synchronization solution plays a very important role here.
- Object detection and classification: to keep the vehicle safe, detection must reach close to 100% recall together with very high precision. This involves deep learning, including object detection on 3D point clouds and 2D images, and multi-sensor fusion.
- Multi-object tracking: use information from multiple frames to compute and predict the trajectories of obstacles.
- Scene understanding: traffic lights, road signs, construction areas, and special categories such as school buses and police cars.
- Distributed machine learning training infrastructure and the related evaluation system.
- Data: a large amount of annotated data, including 3D point cloud data and 2D image data.
At present, sensors fall into three main categories:
- Lidar.
- Camera.
- Millimeter-wave radar.
This picture corresponds to the output of perception object detection. The system detects the obstacles around the vehicle, including vehicles, pedestrians and bicycles, and, combined with the high-precision map, also outputs information about the surrounding background.
In the picture, green marks cars, orange marks motorcycles, yellow marks pedestrians, and gray marks background such as vegetation.
By combining information from multiple frames, the system accurately outputs the speed and direction of moving pedestrians and vehicles.
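As a rough illustration of how speed and direction can be derived from multi-frame information, the sketch below takes the positions of one tracked obstacle over several frames and applies a simple finite difference. This is a minimal sketch and not the method used in the talk: the function name `estimate_velocity`, the track format and the sample values are assumptions made for illustration, and a production tracker would typically use a filter (for example a Kalman filter) rather than two endpoints.

```python
import math


def estimate_velocity(track):
    """Estimate speed (m/s) and heading (degrees) for one tracked obstacle.

    `track` is a list of (timestamp_s, x_m, y_m) tuples, oldest first.
    Hypothetical helper: it uses only the first and last observation.
    """
    if len(track) < 2:
        raise ValueError("need at least two observations")
    t0, x0, y0 = track[0]
    t1, x1, y1 = track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    vx = (x1 - x0) / dt
    vy = (y1 - y0) / dt
    speed = math.hypot(vx, vy)
    # Heading measured counter-clockwise from the +x axis, in [0, 360).
    heading_deg = math.degrees(math.atan2(vy, vx)) % 360.0
    return speed, heading_deg


# Three frames, 100 ms apart (made-up positions).
track = [(0.0, 0.0, 0.0), (0.1, 0.3, 0.4), (0.2, 0.6, 0.8)]
speed, heading = estimate_velocity(track)
print(f"speed = {speed:.2f} m/s, heading = {heading:.1f} deg")
```

Using more than two frames mainly helps to average out per-frame detection noise, which is why tracking is built on multi-frame information rather than on a single pair of detections.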
02 Sensor Setup & Sensor Fusion
The above is a general introduction to the perception system, from input to output. Next, we introduce the sensor installation scheme of Pony.ai's third-generation vehicles and our solution for sensor fusion.
At present, our sensor installation covers 360 degrees with a sensing distance of 200 meters. In terms of how the different sensors are mounted: three lidars are used, one on the roof and one on each side, with a sensing range of 100 meters. Four wide-angle cameras cover the full 360-degree camera field of view. For the far field, a forward millimeter-wave radar and a telephoto camera extend the sensing range to 200 meters. This sensor configuration allows our self-driving vehicles to drive autonomously in residential, commercial and industrial areas.
The third-generation sensor configuration was launched at the World Artificial Intelligence Conference in September 2018.
The front-facing cameras consist of two wide-angle cameras and one telephoto camera, so that signal lights can be seen from farther away and the status of traffic lights can be read within 200 meters.
The above introduced the overall sensor installation scheme. The following mainly introduces our solution for multi-sensor fusion.
The first problem to solve is calibrating the data from different sensors into the same coordinate system. This includes camera intrinsic calibration, lidar-to-camera extrinsic calibration, and radar-to-GPS extrinsic calibration.
An important prerequisite for sensor fusion is very high calibration accuracy. Whether fusion happens at the result level or at the raw-data level, accurate calibration is a necessary foundation.
As this picture shows, when we project the 3D lidar point cloud onto the image, the points line up closely with the image content, which indicates that the calibration accuracy is very high.
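To make the projection step concrete, here is a minimal sketch of how a single lidar point can be mapped to pixel coordinates once the extrinsic parameters (rotation and translation from lidar to camera) and the intrinsic parameters (focal lengths and principal point) are known. It assumes an ideal pinhole camera without lens distortion. The function name `project_point`, the axis conventions and all numeric values are assumptions for illustration, not values from the talk.

```python
def project_point(point_lidar, rotation, translation, fx, fy, cx, cy):
    """Project one 3D lidar point to pixel coordinates (pinhole, no distortion).

    `rotation` (3x3 nested lists) and `translation` (3 values) are the
    extrinsics mapping lidar coordinates into the camera frame:
        p_cam = R * p_lidar + t
    `fx`, `fy`, `cx`, `cy` are the camera intrinsics in pixels.
    Returns (u, v), or None if the point lies behind the camera.
    """
    x, y, z = (
        sum(rotation[i][j] * point_lidar[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        return None
    return fx * x / z + cx, fy * y / z + cy


# Assumed conventions: lidar frame is x forward, y left, z up;
# camera frame is x right, y down, z forward.
R = [[0.0, -1.0, 0.0],
     [0.0, 0.0, -1.0],
     [1.0, 0.0, 0.0]]
t = [0.0, 0.0, 0.0]

pixel = project_point([10.0, 1.0, 0.5], R, t, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
behind = project_point([-5.0, 0.0, 0.0], R, t, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
print(pixel, behind)
```

With this model, a small error in the rotation shifts every projected point by an amount that grows with distance, so overlaying the projected point cloud on the image is a quick visual check of calibration quality.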
The whole calibration process is now almost fully automated. The calibration schemes for the different sensors are as follows:
The first is camera intrinsic calibration. The intrinsic calibration platform can calibrate each camera within two to three minutes.
This figure shows the camera-to-lidar extrinsic calibration. The lidar rotates through 360 degrees, and each rotation takes 100 milliseconds, whereas a camera exposes almost instantaneously, so time synchronization is required. We solve this by having the lidar trigger the camera exposure. For example, we have four cameras, and triggering them from the lidar keeps them all synchronized.
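One way to picture lidar-triggered exposure is sketched below: if the lidar spins at a constant rate and one revolution takes 100 ms, each camera can be exposed at the moment the lidar beam sweeps past that camera's viewing direction. This is a sketch under stated assumptions: the constant spin rate, the camera names, their azimuth angles and the helper `trigger_offset` are all hypothetical, and the talk does not describe the actual trigger logic.

```python
ROTATION_PERIOD_S = 0.1  # one lidar revolution takes 100 ms (from the text)


def trigger_offset(camera_azimuth_deg, period_s=ROTATION_PERIOD_S):
    """Time after the start of a revolution at which the lidar beam points
    along the camera's optical axis, assuming a constant spin rate and a
    revolution that starts at azimuth 0."""
    return (camera_azimuth_deg % 360.0) / 360.0 * period_s


# Hypothetical mounting directions for four cameras, in degrees along
# the lidar's direction of rotation.
camera_azimuths = {"front": 0.0, "right": 90.0, "rear": 180.0, "left": 270.0}
offsets = {name: trigger_offset(az) for name, az in camera_azimuths.items()}

for name, offset in offsets.items():
    print(f"{name}: expose {offset * 1000:.0f} ms after revolution start")
```

Triggering this way means the image from each camera is taken at nearly the same time as the lidar points that fall inside its field of view, which keeps moving objects aligned between the two sensors.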
3D and 2D data complement each other, and fusing them well yields more accurate perception output.
03 Perception Onboard
The above briefly introduced the sensor setup and the method of sensor fusion. Next, we introduce the architecture of the onboard perception system and its solution.
This is the architecture of the onboard perception system. First, the lidar, camera and radar data are time-synchronized, with all time errors kept within 50 ms. Using the synchronized sensor data, frame-wise detection and classification are computed. Finally, multi-object tracking is performed using multi-frame information, and the results are output. This involves technical details of sensor fusion and deep learning, which we will not go into here.
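The time synchronization step can be illustrated with a nearest-timestamp match: for each lidar sweep, pick the camera and radar messages closest in time and discard the frame if either is more than 50 ms away. This is a minimal sketch only. The 50 ms bound comes from the text, while the function names `match_nearest` and `build_frame` and the timestamps are assumptions for illustration.

```python
MAX_TIME_ERROR_S = 0.050  # 50 ms bound mentioned in the text


def match_nearest(reference_stamp, other_stamps, tolerance_s=MAX_TIME_ERROR_S):
    """Return the timestamp in `other_stamps` closest to `reference_stamp`,
    or None if even the closest one is more than `tolerance_s` away."""
    if not other_stamps:
        return None
    best = min(other_stamps, key=lambda s: abs(s - reference_stamp))
    return best if abs(best - reference_stamp) <= tolerance_s else None


def build_frame(lidar_stamp, camera_stamps, radar_stamps):
    """Group one lidar sweep with its nearest camera and radar messages.
    Returns None when any sensor has no message within the tolerance."""
    camera = match_nearest(lidar_stamp, camera_stamps)
    radar = match_nearest(lidar_stamp, radar_stamps)
    if camera is None or radar is None:
        return None
    return {"lidar": lidar_stamp, "camera": camera, "radar": radar}


# Made-up timestamps in seconds.
frame = build_frame(10.00, camera_stamps=[9.93, 10.02, 10.12], radar_stamps=[9.98, 10.06])
dropped = build_frame(10.00, camera_stamps=[10.20], radar_stamps=[10.00])
print(frame, dropped)
```

Only frames that pass this check would go on to frame-wise detection and classification, so downstream fusion never mixes measurements taken too far apart in time.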
The solution for the whole perception system should ensure these five points:
- The first is safety: detection recall must be almost 100%.
- Precision requirements are very high. If precision falls below a certain threshold, false positives occur and the ride under autonomous driving becomes very uncomfortable.
- Output as much information helpful to driving as possible, including road signs, traffic lights and other scene understanding information.
- Run efficiently, processing a large amount of sensor data in near real time.
- Scalability is also very important. Deep learning depends on a large amount of data, and the generalization ability of the model is critical: it lets our models and new algorithms adapt to more cities and more countries.
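The tension between the recall and precision requirements in the list above can be shown with a small worked example. The sketch below scores a handful of candidate detections at different confidence thresholds. The data, the threshold values and the helper `precision_recall` are made up for illustration, and for simplicity it only counts real obstacles that produced at least one candidate.

```python
def precision_recall(scored_candidates, threshold):
    """Compute (precision, recall) at a confidence threshold.

    `scored_candidates` is a list of (confidence, is_real_obstacle) pairs.
    A candidate is reported when its confidence is >= threshold.
    """
    tp = sum(1 for s, real in scored_candidates if s >= threshold and real)
    fp = sum(1 for s, real in scored_candidates if s >= threshold and not real)
    fn = sum(1 for s, real in scored_candidates if s < threshold and real)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Made-up candidates: three real obstacles and two false alarms.
candidates = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.20, False)]

for threshold in (0.3, 0.5, 0.9):
    p, r = precision_recall(candidates, threshold)
    print(f"threshold={threshold:.1f}  precision={p:.2f}  recall={r:.2f}")
```

Lowering the threshold recovers the low-confidence real obstacle (higher recall) but lets a false alarm through (lower precision), while raising it does the opposite. For driving, a missed obstacle is a safety problem and a false alarm is a comfort problem, which is why both numbers have to be pushed up together.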
04 Perception Technical Challenges
Here are some challenging scenarios:
Part I: the balance between precision and recall;
Part II: long-tail scenes.
This is a busy intersection during the evening rush hour, with a large number of pedestrians and motorcycles crossing.
The corresponding raw data can be seen in the 3D point cloud.
Rainy days. Special or severe weather conditions are very difficult for an autonomous driving system to handle.
For example, lidar beams can hit water spray. The white points in the picture are the water spray that has been filtered out. If water spray cannot be accurately identified and filtered, it causes trouble for the autonomous vehicle. Here we can see the processing results of our current system: by combining lidar and camera data, it achieves a high recognition rate for water spray.
Long-tail problems
These are two types of sprinkler truck we encountered during road tests. On the left is one with an upward-pointing spray gun, and on the right is one that sprays to both sides. Human drivers can easily overtake a sprinkler truck, but for the perception system it takes a lot of work to process and identify such scenes and vehicles, so that the autonomous vehicle ultimately gives a more comfortable ride when it encounters similar scenes.
Detection of small objects
For unexpected events, such as stray cats and dogs suddenly appearing during road tests, we expect the perception system to recall small objects accurately.
Traffic lights are even more challenging. New scenes keep coming up, because traffic lights vary widely across cities and countries.
Another problem is backlighting, or the sudden change in camera exposure when the vehicle comes out from under a bridge. We handle this by dynamically adjusting the camera exposure.
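As an illustration of what dynamically adjusting exposure can look like, the sketch below rescales the exposure time after each frame so that the mean image brightness moves toward a target value. This is only a sketch under stated assumptions, not the approach described in the talk: the function `next_exposure`, the target brightness and the exposure limits are all hypothetical, and it assumes brightness scales roughly linearly with exposure time.

```python
def next_exposure(current_exposure_ms, mean_brightness,
                  target_brightness=118.0, min_ms=0.05, max_ms=30.0):
    """Propose the exposure time for the next frame.

    `mean_brightness` is the average pixel value (0 to 255) of the last frame.
    The result is clamped to [min_ms, max_ms]. All defaults are hypothetical.
    """
    if mean_brightness <= 0:
        # Completely black frame: open up as far as allowed.
        return max_ms
    proposed = current_exposure_ms * target_brightness / mean_brightness
    return max(min_ms, min(max_ms, proposed))


# Leaving a dark underpass into sunlight: the frame is overexposed,
# so the exposure time is cut.
bright_exit = next_exposure(10.0, mean_brightness=236.0)
# Entering the underpass: the frame is very dark, so exposure hits the upper limit.
dark_entry = next_exposure(10.0, mean_brightness=10.0)
print(bright_exit, dark_entry)
```

In practice the brightness statistic could be computed only over the image region that matters, for example around traffic lights, so that a bright sky does not dominate the adjustment.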
This is another traffic light scene, with a countdown timer on the traffic light. We need to recognize the countdown so that the self-driving vehicle can give a better ride experience when it meets a yellow light.
Waterproofing the camera on rainy days is also necessary for dealing with extreme weather conditions.
Some traffic lights show a progress bar, which also needs to be recognized: if the green light is about to turn yellow, the vehicle should slow down.
That’s all for today’s sharing. Thank you.
This article was first published on the WeChat official account “datafuntalk”.