CVPR 2019 is a top academic conference in computer vision. This year it attracted 5,160 submissions from around the world and accepted 1,294 papers, both record highs. Papers, projects, and exhibitors related to autonomous driving also featured prominently, becoming the “new favorite” of the conference.
The Trajectory Prediction Challenge is part of the CVPR 2019 Workshop on Autonomous Driving: Beyond Single Frame Perception, organized by the Robotics and Autonomous Driving Laboratory of Baidu Research Institute. The workshop focuses on multi-frame perception, prediction, and planning for autonomous driving, and aims to bring together researchers and engineers from academia and industry to discuss applications of computer vision in autonomous driving. Meituan's unmanned delivery and vision team won first place in this challenge.
In this challenge, teams must predict each obstacle's trajectory over the next 3 seconds from its trajectory over the past 3 seconds. There are four obstacle types: pedestrians, bicycles, large motor vehicles, and small motor vehicles. Each trajectory is represented by points sampled along it at 2 Hz. Meituan's method took first place with a score of 1.3425. We also presented the ideas behind our algorithm and model at the workshop.
Competition overview
The competition data comes from real road data collected in Beijing, covering complex traffic-light and road conditions. The annotations used in the competition are derived from camera and radar data and cover both motor vehicles and non-motorized road users, including various vehicles, pedestrians, and bicycles.
Training data: each road data file contains one minute of obstacle data sampled at 2 Hz. Each annotation line contains the obstacle's ID, category, position, size, and orientation.
Test data: each road data file contains 3 seconds of obstacle data sampled at 2 Hz. The goal is to predict each obstacle's position over the next 3 seconds.
Average displacement error (ADE): the mean Euclidean distance between each predicted position and the corresponding ground-truth position.
Final displacement error (FDE): the mean Euclidean distance between the predicted end point and the ground-truth end point.
Because the dataset contains trajectories of different obstacle types, evaluation uses a weighted sum of these metrics over the categories.
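The metrics above can be illustrated with a minimal sketch. The category names and weights used here are placeholders chosen for illustration only; they are not the weights used by the competition, and the sample points are made up.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all points."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last point."""
    return math.dist(pred[-1], truth[-1])

def weighted_sum(per_category_error, weights):
    """Combine per-category errors using per-category weights."""
    return sum(weights[c] * per_category_error[c] for c in weights)

# Made-up example trajectory (three future points), not competition data.
truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
pred = [(1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
print(ade(pred, truth))  # 1.0
print(fde(pred, truth))  # 2.0

# Placeholder weights and per-category ADE values (assumptions).
weights = {"vehicle": 0.5, "pedestrian": 0.3, "bicycle": 0.2}
errors = {"vehicle": 2.0, "pedestrian": 1.0, "bicycle": 1.0}
wsade = weighted_sum(errors, weights)
```

The same `weighted_sum` helper applies to FDE by passing per-category FDE values instead.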
The prediction problem in this competition does not rely on maps, traffic signals, or similar information, so it is a prediction problem on unstructured data. Current methods for this kind of problem fall into two categories according to whether they model interaction: 1. independent prediction; 2. dependent prediction.
Independent prediction gives an obstacle's future trajectory based only on its own historical trajectory. Dependent prediction takes into account the interaction among all obstacles in the current and historical frames to predict the future behavior of all obstacles.
Dependent prediction, which considers interaction information, is receiving more and more attention in academia. However, our survey found that most work studies interaction within a single category, for example interaction among vehicles on a highway, or interaction among pedestrians on a sidewalk. Few methods predict interactions across all obstacle categories.
Here are two methods for pedestrian interaction prediction:
Method 1. Social GAN: encode the input of each obstacle separately, extract interaction information through a shared pooling module, and then predict each trajectory separately.
Method 2. StarNet: use a star-shaped LSTM network, in which a hub network extracts the interaction information of all obstacles and passes it to each host network, which then independently predicts the trajectory of its obstacle.
After receiving the task, we first analyzed the training data. Because the ultimate goal is to predict obstacle positions, the size information in the annotations matters little as long as we predict per category.
Next, we analyzed whether to use the orientation information. Statistics showed that the orientation in the ground-truth annotations is very inaccurate. As the figure below shows, most annotated orientations deviate substantially from the trajectory direction, so we decided not to use orientation for prediction.
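A check of this kind might look like the following sketch. It assumes headings are given in radians measured from the x-axis, which the source does not specify, and the `min_step` threshold for skipping near-stationary steps is a hypothetical parameter.

```python
import math

def track_heading(p_prev, p_next):
    """Direction of motion between two consecutive positions, in radians."""
    return math.atan2(p_next[1] - p_prev[1], p_next[0] - p_prev[0])

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def heading_errors(positions, headings, min_step=0.1):
    """Compare each annotated heading with the direction of travel.

    Steps shorter than min_step are skipped, because the direction of
    travel is ill-defined for a nearly stationary obstacle.
    """
    errors = []
    for i in range(len(positions) - 1):
        if math.dist(positions[i], positions[i + 1]) < min_step:
            continue
        travel = track_heading(positions[i], positions[i + 1])
        errors.append(angle_diff(headings[i], travel))
    return errors

# Made-up example: the first annotated heading is off by 90 degrees.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
headings = [math.pi / 2, 0.0, 0.0]
errs = heading_errors(positions, headings)
```

Collecting these errors over all tracks and plotting their distribution would show how far annotated orientation tends to be from the actual direction of travel.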
Then we analyzed data completeness. During training, each obstacle needs 12 consecutive frames of data, so that 6 frames can be used to predict the following 6 frames, mirroring the test setting. In real data collection, however, completeness cannot be guaranteed, and frames may be missing at the beginning, at the end, or in the middle of a track. We therefore generated some training data from the positional relationship between neighboring frames to fill the gaps.
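One simple way to fill interior gaps is linear interpolation between the nearest known frames. The sketch below assumes that approach; the source does not state which scheme was actually used, and gaps at the start or end of a track are left untouched here. The helper names are hypothetical.

```python
def fill_gaps(frames):
    """Fill missing frame indices by linear interpolation.

    frames maps an integer frame index to an (x, y) position. Only gaps
    between the first and last known frame are filled.
    """
    known = sorted(frames)
    filled = dict(frames)
    for a, b in zip(known, known[1:]):
        (xa, ya), (xb, yb) = frames[a], frames[b]
        for k in range(a + 1, b):
            t = (k - a) / (b - a)
            filled[k] = (xa + t * (xb - xa), ya + t * (yb - ya))
    return filled

def make_samples(track, history=6, future=6):
    """Slice a gap-free list of positions into (history, future) pairs."""
    n = history + future
    return [(track[i:i + history], track[i + history:i + n])
            for i in range(len(track) - n + 1)]

# Made-up example: frame 1 is missing and gets interpolated.
filled = fill_gaps({0: (0.0, 0.0), 2: (2.0, 4.0), 3: (3.0, 6.0)})

# A 13-frame track yields two 6+6 training samples.
track = [(float(i), 0.0) for i in range(13)]
samples = make_samples(track)
```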
Finally, we augmented the data. Because our method does not consider interaction between obstacles and trains only on each obstacle's own information, we augmented each obstacle trajectory with rotation, reversal, and noise.
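The three augmentations can be sketched as follows. Rotation about the origin and Gaussian position noise are assumptions made for illustration; the source does not give the rotation center, the noise distribution, or its magnitude.

```python
import math
import random

def rotate(track, angle):
    """Rotate every point of a trajectory about the origin by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in track]

def reverse(track):
    """Play the trajectory backwards in time."""
    return track[::-1]

def add_noise(track, sigma, rng):
    """Perturb each coordinate with zero-mean Gaussian noise (assumed)."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in track]

# Made-up example trajectory.
rng = random.Random(0)
track = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
rotated = rotate(track, math.pi / 2)
reversed_track = reverse(track)
noisy = add_noise(track, 0.05, rng)
```

Because each obstacle is handled independently, these transforms can be applied per trajectory without having to keep other obstacles in the scene consistent.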
Because this task requires predicting trajectories for all categories, a single-category trajectory prediction model is not suitable. And if all objects are placed in a single interaction model, the interaction features between different kinds of obstacles cannot be extracted correctly. We tried several such methods, and the results confirmed this.
Therefore, in the competition we used a multi-category independent prediction method. The network structure is shown in the figure below. The method builds an LSTM encoder-decoder model for each category and adds a noise module between the encoder and the decoder. The noise module generates Gaussian noise of a fixed dimension, which is concatenated with the LSTM state output by the encoder to form the initial LSTM state of the decoder. The noise module mainly serves to perturb the data over multiple rounds of training. At inference time, feeding different noise inputs produces multiple different trajectories.
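The concatenation step alone can be illustrated without a deep-learning framework. In the sketch below, the encoder state is a made-up list of numbers standing in for a real LSTM state, and `decoder_initial_state` and the dimensions are hypothetical; it shows only how one encoded history yields several distinct decoder starting states, not the LSTM encoder or decoder themselves.

```python
import random

def decoder_initial_state(encoder_state, noise_dim, rng):
    """Concatenate the encoder state with fixed-dimension Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(encoder_state) + noise

rng = random.Random(0)
encoder_state = [0.1, -0.4, 0.7]  # stand-in for a real encoder output

# Drawing fresh noise each time gives a different decoder starting state,
# and therefore a different candidate trajectory once decoded.
candidates = [decoder_initial_state(encoder_state, 2, rng) for _ in range(3)]
```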
Finally, one trajectory must be chosen from the different outputs. We adopt a simple rule: select the predicted trajectory whose direction is closest to the direction of the historical trajectory as the final output.
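A direction-based selection rule of this kind could be sketched as below. How direction is measured is an assumption: here it is the angle from the first to the last point of the history, and from the last observed point to the last predicted point for each candidate.

```python
import math

def heading(track):
    """Overall direction from the first to the last point, in radians."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    return math.atan2(y1 - y0, x1 - x0)

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def select_trajectory(history, candidates):
    """Pick the candidate whose direction best matches the history."""
    h = heading(history)
    return min(
        candidates,
        key=lambda c: angle_diff(heading([history[-1]] + c), h),
    )

# Made-up example: the obstacle has been moving along the x-axis.
history = [(0.0, 0.0), (1.0, 0.0)]
turning = [(2.0, 1.0), (3.0, 2.0)]
straight = [(2.0, 0.0), (3.0, 0.1)]
best = select_trajectory(history, [turning, straight])
```

In this example the nearly straight candidate is chosen, since its direction differs from the history by only a few degrees while the other turns by 45 degrees.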
We used only the official data for training. We first augmented the data with the methods described above and then built the network for training. The loss is the weighted sum of ADE (WSADE), optimized with Adam. Our final submission achieved a test WSADE of 1.3425.
|Method|WSADE|
|---|---|
|StarNet (interaction-based approach)|1.8626|
|TrafficPredict (ApolloScape baseline method)|8.5881|
In this competition, we used a multi-category independent prediction method. By augmenting the data, adding Gaussian noise, and finally selecting the best trajectory with hand-designed rules, we achieved good results in the Trajectory Prediction Challenge. However, we believe that an interaction-based method, if used well, can outperform independent prediction, for example by modeling both intra-category and inter-category interaction. We have also noticed that some graph neural network methods are being applied to trajectory prediction. In the future, we will try more such methods in real projects to solve practical prediction problems.
- Zhu Y, Qian D, Ren D, Xia H. StarNet: Pedestrian trajectory prediction using deep neural network in star topology[C]//Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2019.
- Gupta A, Johnson J, Fei-Fei L, et al. Social GAN: Socially acceptable trajectories with generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018: 2255-2264.
- ApolloScape. Trajectory dataset for urban traffic. 2018. http://apolloscape.auto/traje…
About the authors
- Li Xin, algorithm expert in the trajectory prediction team of the PNC group, Meituan unmanned delivery and vision department.
- Yan Liang, algorithm engineer in the trajectory prediction team of the PNC group, Meituan unmanned delivery and vision department.
- Deheng, head of the trajectory prediction team of the PNC group, Meituan unmanned delivery and vision department.
- Dong Chun, head of the PNC group, Meituan unmanned delivery and vision department.