Jingdong City spatiotemporal data engine JUST debuts at the China Database Technology Conference

Time:2021-4-7

Affected by the epidemic, the 11th China Database Technology Conference (DTCC 2020) was postponed from May to August, and then to December. Nevertheless, enthusiasm for database technology in China did not diminish. From December 21 to December 23, 2020, the Beijing International Conference Center was packed, and the major vendors all took part. In the NoSQL technology track, Dr. Li Ruiyuan of Jingdong Intelligent City Research Institute gave a talk titled "Architecture design and application practice of JUST, the Jingdong urban spatiotemporal data engine", which received extensive attention.

About Dr. Li Ruiyuan: Dr. Li Ruiyuan is head of the spatiotemporal data group of Jingdong City, a researcher at Jingdong Smart City Research Institute, and a data scientist in the Jingdong smart city business division. He is responsible for the architecture design of the spatiotemporal data platform, research on combining spatiotemporal indexing with distributed systems, the development of spatiotemporal data products, and the application of spatiotemporal data mining in urban scenarios. Before joining JD, he worked in the urban computing group of Microsoft Research Asia for 4 years. His research interests include spatiotemporal data management and mining, distributed computing, and urban computing. He has published more than 20 papers in high-level journals and international conferences at home and abroad, including KDD, Artificial Intelligence, ICDE, AAAI, TKDE, WWW, UbiComp, and the Journal of Software, and has applied for more than 20 patents. He is a member of the China Computer Federation (CCF), a member of the CCF Database Committee, and a member of IEEE, and has served as a reviewer for many top conferences and journals at home and abroad.

About JUST: spatiotemporal data contains rich information and can be applied to a wide range of urban applications. However, it is difficult to store, manage and analyze efficiently because of its high update frequency, large volume and complex structure. A single machine clearly cannot cope with massive data, while distributed processing frameworks such as Spark and Hadoop struggle to process spatiotemporal data efficiently because they lack effective spatiotemporal indexes and spatiotemporal analysis algorithms. The JD City spatiotemporal data engine JUST combines advanced data modeling, data storage, distributed indexing and analysis techniques with a variety of preset spatiotemporal mining algorithms, helping people manage massive spatiotemporal data conveniently and efficiently. Work based on JUST has won the ACM SIGSPATIAL ten-year impact award twice in a row, produced more than 20 papers at top international venues, and led to more than 30 patent applications. JUST has supported a number of smart city projects and played an important role in COVID-19 epidemic prevention; within JD, it has been validated on Double 11 order volumes in the hundreds of millions and on massive logistics trajectory data.

The content of this speech has been confirmed by the speaker.

Hello, everyone! Thank you very much for attending this talk, and thanks to the conference organizers for the invitation. My name is Li Ruiyuan, from Jingdong Intelligent City Research Institute. You can search my name directly on Baidu; the first link should be me. Today I bring you a niche but universal database: JUST, the spatiotemporal data engine of Jingdong City. It is niche because you may not have heard of spatiotemporal databases. Let me do a quick survey. Do you know which vendors are building spatiotemporal databases? (Only one or two people in the audience raised their hands.) In fact, some GIS vendors are building spatiotemporal databases, and so are some Internet companies. It is universal because it really can solve many problems around us and is closely related to everyone's life. Strictly speaking, JUST cannot yet be called a spatiotemporal database, so we call it a spatiotemporal data engine.

My report will start from the following four aspects.

With the development of 5G and IoT technology, massive amounts of spatiotemporal data are being produced. In short, spatiotemporal data is data with temporal and spatial attributes. It can be roughly divided into three categories. The first is the map vector data we see when we open a map on a mobile phone; the second is satellite remote sensing imagery; the third is urban sensing data, including GPS data from vehicles, signaling data between mobile phones and base stations, and our check-in data on social media. When you entered the conference center and scanned the QR code with Beijing Health Kit, that was also a kind of check-in data. Spatiotemporal data can be applied to many urban applications. Due to time constraints, I will give just one example. In epidemic prevention and control, if a new confirmed case is found in a certain place, we can use the Beijing Health Kit check-in data to find everyone who visited that place during the relevant period, focus the investigation on those people, and take protective measures to prevent the epidemic from spreading.

Spatiotemporal data has the following four characteristics:

First, the volume of spatiotemporal data is very large. We all say that this is the era of big data, and in the real world, 80% of the data is related to geographical location. This requires our spatiotemporal data engine to have strong scalability.

Second, the data structure of spatiotemporal data is very complex. This shows in two ways. 1) There are many types of spatiotemporal data. The Beijing International Conference Center exists as a point on the map; roads exist as lines; a residential area exists as a polygon. There are also time series, such as air quality stations that produce a reading every hour, and even networks: in an Internet of vehicles, for example, an edge can be formed whenever two vehicles are very close to each other. 2) Spatiotemporal data is high-dimensional, with at least three dimensions: time, longitude and latitude. This requires our spatiotemporal data engine to support all kinds of spatiotemporal data.

Third, the query patterns of spatiotemporal data are unusual. Many queries in relational databases filter on attribute values, whereas spatiotemporal queries are usually spatial range queries, for example finding the cars within 1 km of the Beijing International Conference Center in the past hour, or nearest neighbor queries, for example finding the taxi nearest to me. This requires our spatiotemporal data engine to have specialized index structures.

Fourth, the update frequency of spatiotemporal data is very high. For example, a GPS device may generate a new reading every two seconds, and mobile signaling also generates data continuously. This requires our spatiotemporal data engine to ingest data in real time and to support data updates.

Now there are many spatiotemporal data platforms.

The first kind extends existing relational databases, such as PostGIS, Oracle Spatial and MySQL Spatial, which support the management and querying of spatiotemporal data. But these platforms were originally designed to run on a single machine. When the data volume is large, for example more than 1 TB, they often struggle to work at all, so they face a scalability problem.

To handle massive data, many distributed platforms have emerged, such as Hadoop, Spark and HBase. However, these platforms have no spatiotemporal index. Without one, a kNN query such as finding the taxis nearest to me forces the system to scan all the records, compute the distance between each record and me, sort by distance, and return the k closest records. These platforms therefore face a serious efficiency problem.
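
The full-scan kNN described above can be sketched in a few lines. This is only an illustration of the unindexed approach, not code from any of the platforms mentioned; the function names and the taxi records are hypothetical.

```python
import heapq
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def knn_full_scan(records, lat, lon, k):
    """Unindexed kNN: compute the distance to every record, keep the k smallest."""
    return heapq.nsmallest(
        k, records, key=lambda rec: haversine_km(lat, lon, rec[1], rec[2])
    )


# Hypothetical (id, lat, lon) records.
taxis = [
    ("taxi-a", 39.990, 116.390),
    ("taxi-b", 39.900, 116.400),
    ("taxi-c", 40.100, 116.600),
    ("taxi-d", 39.992, 116.391),
]
nearest = knn_full_scan(taxis, 39.991, 116.390, k=2)
```

Every query touches every record, so the cost grows linearly with the table size; a spatial index exists precisely to avoid that scan.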

So the natural question is: can we build spatiotemporal indexes on these distributed platforms? There are three main representative approaches. SpatialHadoop builds spatiotemporal indexes on Hadoop and can manage massive spatiotemporal data. However, as we all know, even a single Hadoop job triggers multiple rounds of disk reads and writes, which makes it inefficient. To address this weakness of Hadoop, Spark tries to cache as much data as possible in memory, and GeoSpark builds spatiotemporal indexes on Spark. But in practice memory is a precious resource. A project may have only three machines and still require you to manage massive spatiotemporal data, so GeoSpark also faces a scalability problem. The third approach is the key-value-based distributed spatiotemporal data platform, which stores data on disk and indexes it through an indexing component such as GeoMesa. Our system JUST belongs to this category. However, plain GeoMesa + NoSQL is hard to use: developers need a deep understanding of its development manual before they can manage spatiotemporal data. GeoMesa + NoSQL also lacks some spatiotemporal analysis functions, so developers have to write many of them from scratch. This kind of system therefore faces an ease-of-use problem.

The last kind of system deals with the visualization of spatiotemporal data. There are now many front-end components, such as Leaflet and Mapbox, that can display spatiotemporal data on a map, and many back-end components, such as GeoServer, that can publish GIS services. But visualizing spatiotemporal data means showing not only the data itself but also its deeper meaning, which requires linkage with the underlying spatiotemporal analysis. For example, with more than 2000 vehicles, we first need to ingest the trajectory data of each vehicle in real time, and then use map matching to determine which road each vehicle is on. When there is a lot of data on a given road, we may need to aggregate it automatically; otherwise the front end will easily freeze. Existing GIS visualization platforms cannot link up with the underlying algorithms, so they face an analysis and rendering problem.

To solve the above problems, we propose JUST, the Jingdong City spatiotemporal data engine. JUST is an acronym of its English name, JD Urban Spatio-Temporal data engine. JUST provides an integrated solution for spatiotemporal data storage, management, mining, analysis and service provision. The figure shows the basic framework of JUST. At the bottom is just-db, which can be understood as a database: it models, stores and indexes spatiotemporal data to support queries efficiently. Many spatiotemporal data mining algorithms are encapsulated in just-dm. just-ts analyzes and visualizes time series data, and just-gis analyzes and visualizes GIS data. The task management module on the right manages JUST's tasks, including real-time tasks and scheduled tasks. To provide better external services, we also have the just service module. On the far right is the deployment, operations and monitoring module, which keeps the system running stably. Corresponding to the four shortcomings of existing spatiotemporal data platforms mentioned above, the JUST platform offers strong scalability, high efficiency, good usability, and fast analysis and rendering. Next, I will explain in detail how we achieved this.

As mentioned above, the structure of spatiotemporal data is very complex and there are many kinds of data. If we built a separate table for each kind and used an independent storage and analysis method for it, the design cost would be high and maintenance would be difficult. We therefore divide spatiotemporal data into point data and network data, according to whether the data items are related to one another. Point data and network data are each further divided into three types according to whether their temporal and spatial attributes are static or dynamic. This gives 2 × 3 = 6 data types, and any spatiotemporal data can be modeled with one of them. For each data type we design the most suitable index structure and encapsulate a complete set of analysis and mining algorithms, which greatly reduces the cost of managing spatiotemporal data. This reflects JUST's strong extensibility across spatiotemporal data types.

On the other hand, JUST natively builds on a variety of distributed frameworks. Here are several important ones: we use HBase as the underlying storage engine, Spark as the execution engine, and GeoMesa as the indexing tool.

This is the technical framework diagram of just-db. As the diagram shows, we also use many other distributed components and integrate them into a coherent whole (note: modules marked with question marks are planned modules). By building on distributed components, JUST can support massive spatiotemporal data with strong scalability.

We also designed a new storage mode for spatiotemporal data. Take trajectory data as an example. As mentioned earlier, we use HBase as the underlying storage, and HBase is a key-value database. The traditional way to store trajectories in a key-value database is to store each GPS point as one record, so that a trajectory is made up of many records. Laid out in the table, the trajectory runs vertically, so we call this vertical storage. Vertical storage has several disadvantages. First, each GPS point is stored as a record, so the number of records equals the number of GPS points, which results in a very large number of entries. Second, the GPS points of a trajectory are scattered, which makes the trajectory hard to compress; this leads to a large storage footprint and slow queries. Third, a single record does not carry the complete information of the trajectory, which hinders later trajectory queries and analysis. For example, to query similar trajectories we need to consider the whole trajectory, not just one GPS point. To solve these three problems, we propose a new trajectory storage scheme, which we call horizontal storage. As shown in the table on the right, we first segment the trajectory into many sub-trajectories, and then store all the GPS points of each sub-trajectory in a single cell of the key-value database. The advantages are as follows. First, the number of records is no longer tied to the number of GPS points but equals the number of sub-trajectories, which greatly reduces the number of entries. Second, the GPS points of a sub-trajectory are stored together, which makes it easy to compress the data and reduce storage space. Third, each record contains complete trajectory information, which makes trajectories convenient to analyze and query.

Note that in a key-value database, the index refers to the design of the key, rather than to a structure like the B+ tree in a relational database. We trade space for time: each logical table is backed by multiple physical tables. The values in each physical table are the same, but the keys differ, so that different types of queries can be supported efficiently. In addition, a key depends only on its own record and not on any other record. That is, when a trajectory is inserted, we do not need to modify existing records, so trajectory inserts and updates are supported efficiently.
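
The difference between the two layouts can be sketched with plain dictionaries standing in for the key-value store. This is a minimal illustration, not the just-db schema: the key format, the fixed-size segmentation rule (`max_points`), and the use of JSON plus zlib are all assumptions made for the example.

```python
import json
import zlib


def vertical_rows(traj_id, points):
    """Vertical layout: one key-value row per GPS point."""
    return {
        f"{traj_id}#{t}": json.dumps([t, lat, lon]).encode()
        for t, lat, lon in points
    }


def horizontal_rows(traj_id, points, max_points=4):
    """Horizontal layout: one row per sub-trajectory.

    All GPS points of a sub-trajectory are serialized and compressed
    together into a single value.
    """
    rows = {}
    for start in range(0, len(points), max_points):
        segment = points[start:start + max_points]
        # The key depends only on this record (trajectory id + first timestamp),
        # so inserting a new sub-trajectory never rewrites existing rows.
        key = f"{traj_id}#{segment[0][0]}"
        rows[key] = zlib.compress(json.dumps(segment).encode())
    return rows


# Hypothetical track: (timestamp, lat, lon) for ten GPS fixes.
points = [(t, 39.9 + t * 1e-4, 116.4 + t * 1e-4) for t in range(10)]
v = vertical_rows("car42", points)
h = horizontal_rows("car42", points)
```

With ten points and four points per segment, the vertical layout produces ten rows and the horizontal layout three; a real system would segment on criteria such as time gaps or stops rather than a fixed count.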

Note that there is a column called signature, which we call the trajectory signature. Usually the minimum bounding rectangle (MBR) of a trajectory is used to represent its position. But this can cause problems. As shown in the figure, the trajectory actually passes through only a small part of its MBR; in other words, the MBR does not represent the trajectory's location precisely. To solve this problem, we further divide the MBR into n × n grid cells, corresponding to a binary sequence of n × n bits. When at least one GPS point of the trajectory falls into a cell, the corresponding bit is set to 1; otherwise it is set to 0. In this way we can represent the position of the trajectory more accurately, and more accurate location information also gives us more room for query optimization.
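
The signature construction just described can be sketched as follows. The function name, the row-major bit order and the handling of a degenerate (zero-width) bounding box are assumptions for illustration; the source only specifies the n × n grid and the 0/1 rule.

```python
def trajectory_signature(points, n):
    """Return (mbr, bits) for a list of (lat, lon) points.

    bits holds n*n flags in row-major order, starting from the
    lower-left corner of the minimum bounding rectangle.
    """
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    min_lat, max_lat = min(lats), max(lats)
    min_lon, max_lon = min(lons), max(lons)
    # Fall back to a span of 1.0 so a degenerate MBR does not divide by zero.
    lat_span = (max_lat - min_lat) or 1.0
    lon_span = (max_lon - min_lon) or 1.0

    bits = [0] * (n * n)
    for lat, lon in points:
        # Points on the upper edge are clamped into the last row/column.
        row = min(int((lat - min_lat) / lat_span * n), n - 1)
        col = min(int((lon - min_lon) / lon_span * n), n - 1)
        bits[row * n + col] = 1
    return (min_lat, min_lon, max_lat, max_lon), bits


# A diagonal track: its MBR covers four cells, but it only touches two.
mbr, bits = trajectory_signature(
    [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0), (4.0, 4.0)], n=2
)
```

A range query that only overlaps a cell whose bit is 0 can skip this trajectory without decoding its points, even though the query rectangle intersects the MBR.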

All in all, the new storage mode designed for trajectories takes less space, better supports all kinds of spatiotemporal queries, and is more efficient.

In addition, we designed new spatiotemporal index structures. As mentioned earlier, we use a NoSQL key-value database. A key-value database can only index on a one-dimensional key, but our spatiotemporal data has at least three dimensions: longitude, latitude and time. The three-dimensional information therefore has to be transformed into one dimension. GeoMesa does this with a space-filling curve, which is essentially GeoHash. Suppose we have a latitude of 40.78 and a longitude of -73.97. Latitude ranges from -90 to 90. We adopt a strategy similar to binary search: when the value falls in the left half of the search space we emit a 0, and when it falls in the right half we emit a 1, stopping when we reach a given depth. In this way 40.78 is transformed into the binary sequence 101. We perform the same operation on the longitude -73.97 and get another binary sequence, 010. We then interleave the two binary sequences, which turns two-dimensional information into one dimension. What if there is a time dimension? Because time extends indefinitely, we first divide it into intervals, each of which we call a time bucket, for example one day. Within each bucket we use the same coding strategy. For example, for 10 o'clock we repeatedly bisect the range from 0 to 24 hours and get the sequence 011. Finally, the binary codes of time, latitude and longitude are interleaved to obtain a one-dimensional binary code. If we want to find the GPS points that passed through a 1 km by 1 km area between 1 o'clock and 2 o'clock, we obtain a key range and then scan that range in the key-value database. We noticed that this key range is very large and covers most of the data. The reason is that the temporal scale differs from the spatial scale. In this example, the ratio of a one-hour span to one year (assuming the time bucket is one year long) is much larger than the ratio of a 1 km by 1 km area to the Earth's surface area, which makes spatial filtering ineffective!
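
The bisection coding and interleaving walked through above can be reproduced with a short sketch. The function names are hypothetical, and the interleaving order (time first, then longitude, then latitude) is an assumption; the source does not say which dimension contributes the first bit.

```python
def bisect_bits(value, low, high, depth):
    """Encode value as a bit string by repeatedly halving [low, high).

    Emit "1" when the value falls in the upper half, "0" otherwise.
    """
    bits = []
    for _ in range(depth):
        mid = (low + high) / 2
        if value >= mid:
            bits.append("1")
            low = mid
        else:
            bits.append("0")
            high = mid
    return "".join(bits)


def interleave(*codes):
    """Interleave equal-length bit strings: first bits of each, then second bits, ..."""
    return "".join("".join(group) for group in zip(*codes))


lat_code = bisect_bits(40.78, -90, 90, 3)      # latitude
lon_code = bisect_bits(-73.97, -180, 180, 3)   # longitude
time_code = bisect_bits(10, 0, 24, 3)          # hour of day within a one-day bucket

xy_key = interleave(lon_code, lat_code)              # 2D key
xyt_key = interleave(time_code, lon_code, lat_code)  # 3D key
```

With a depth of 3 this yields the same codes as the worked example: 101 for the latitude, 010 for the longitude and 011 for 10 o'clock. Production systems use far more bits per dimension.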

To solve this problem, we propose new indexing methods. Instead of coding time and space together, we first divide time into buckets and code space separately within each bucket. At query time, we first find the time buckets that the query touches, and then generate a spatial code range within each of those buckets. With this change, we sped up spatiotemporal queries from more than 30 seconds to less than 5 seconds.
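
A minimal sketch of this bucket-then-space key layout is shown below. It assumes a one-day bucket, a hypothetical key format of `<bucket>|<space code>`, and that the spatial code ranges have already been computed by some space-filling-curve routine; none of these details come from just-db itself.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(days=1)  # assumed bucket length
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_of(ts):
    """Index of the time bucket that contains the timestamp."""
    return (ts - EPOCH) // BUCKET


def row_key(ts, space_code):
    """Key = zero-padded bucket index, then the spatial code."""
    return f"{bucket_of(ts):08d}|{space_code}"


def scan_ranges(t_start, t_end, space_ranges):
    """One (start_key, end_key) scan per touched bucket and per spatial range."""
    ranges = []
    for b in range(bucket_of(t_start), bucket_of(t_end) + 1):
        for lo, hi in space_ranges:
            ranges.append((f"{b:08d}|{lo}", f"{b:08d}|{hi}"))
    return ranges


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Query: 01:00 to 02:00 on 2020-12-21, one hypothetical spatial code range.
one_hour = scan_ranges(
    utc(2020, 12, 21, 1), utc(2020, 12, 21, 2), [("011000", "011011")]
)
# A query that crosses midnight touches two buckets.
overnight = scan_ranges(
    utc(2020, 12, 21, 23), utc(2020, 12, 22, 1), [("011000", "011011")]
)
```

Because the bucket is a key prefix, a short time window selects only a few buckets, and the spatial code then narrows the scan inside each one; rows returned by the scan still need an exact time and location check.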

As mentioned earlier, we use Spark as our execution engine. In the traditional way of using Spark, the client sends a request to the server, and the server then requests resources from the YARN cluster. On receiving the request, the YARN cluster goes through a series of steps: contacting the ResourceManager, starting the ApplicationMaster, applying for containers, starting executors, and distributing tasks. Only then is a SparkContext created to handle the user's query. We noticed that requesting resources is very time-consuming. To solve this problem, JUST creates two SparkContexts on the YARN cluster in advance and manages them with Spark JobServer. When a user sends a request, we directly pick one of the SparkContexts to process it. We use two SparkContexts to keep the system highly available: when one of them crashes, the other can still respond to user requests in time. In short, keeping SparkContexts resident and online removes the resource request time, further speeds up queries, and improves the stability of the system.
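
The idea of keeping pre-created contexts online and failing over between them can be illustrated without Spark at all. The sketch below uses plain Python stand-ins; `WarmContext` and `ContextPool` are hypothetical names and are not Spark or Spark JobServer APIs.

```python
class WarmContext:
    """Stand-in for a long-lived execution context created ahead of time."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def run(self, query):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} ran: {query}"


class ContextPool:
    """Route each request to the first context that is still alive."""

    def __init__(self, contexts):
        self.contexts = list(contexts)

    def submit(self, query):
        for ctx in self.contexts:
            if ctx.alive:
                return ctx.run(query)
        raise RuntimeError("no live context available")


# Two contexts are created once, at start-up, rather than per request.
pool = ContextPool([WarmContext("ctx-1"), WarmContext("ctx-2")])
first = pool.submit("SELECT 1")

pool.contexts[0].alive = False  # simulate a crash of the first context
second = pool.submit("SELECT 1")
```

The request path contains no resource negotiation, which is where the latency saving comes from; the second context exists only so that a crash does not interrupt service.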

We ran many experiments on real trajectory data sets. The two figures at the top compare storage performance, with the vertical trajectory storage method as the baseline. As the figures show, for 136 GB of raw trajectory data we used only 30 GB of storage space. Compared with vertical storage, our space utilization improved by 85%, and storage and indexing efficiency improved by more than 7 times. The two figures at the bottom compare trajectory spatial range queries and kNN queries. The baselines here are two very advanced Spark-based trajectory management systems from the industry. Because Spark tries to keep data in memory, in our experimental environment (5 machines) they are far less efficient than our method. Note that when the trajectory data exceeds 100 GB, the baseline systems simply crash, while our method still handles the data well. This shows that our system has high efficiency and strong scalability! Our paper was accepted by ICDE 2020, a top international database conference. You can search for it and read it if you are interested.

So far we have mainly described what JUST does for scalability and efficiency. To make JUST easier to use, we encapsulate many out-of-the-box spatiotemporal data mining algorithms, including trajectory data mining algorithms, road network data mining algorithms, and others. We are still adding more algorithms. All of these algorithms expose many parameters, so developers can set different parameters for different business requirements.

Another effort toward ease of use is a convenient interface and interaction mode. Different developers may use different programming languages. For example, colleagues doing back-end service development may use Java, and colleagues doing data analysis may use Python. But the common language of all these colleagues is SQL. JUST provides a SQL interface, which lowers everyone's learning cost. In addition, SQL gives us a unified input and output format, which lets developers combine our functions freely. For example, whether to filter trajectories first and then segment them, or to segment first and then filter, is up to the developer, which makes the system more flexible. Our goal is that every operation, including data query, data definition, data processing and data analysis, can be carried out with a simple SQL statement. We have implemented a complete SQL optimizer, which supports pushdown of filters, projections and other predicates, constant folding, and query rewriting. To let developers use JUST the way they use MySQL, we also implemented a JDBC-compliant database driver. This greatly improves the usability of our system. The paper on our system was also accepted by ICDE 2020, this year's top database conference.

We found that when JUST is offered as an external service, most back-end developers simply forward front-end requests to just-db. Yet they still have to deploy a separate back-end service and take care of authentication, rate limiting, caching, logging and other functions, which means a heavy workload, frequent mistakes and high maintenance costs. To reduce this workload, we use the just service module to provide configurable API services. Users only need to configure a few parameters in the system to expose an API to the outside world, with truly zero code. The just service module also provides unified authentication, rate limiting, automatic caching, log monitoring and other functions, so back-end engineers do not have to build them independently. This further improves the usability of the JUST system.

Managing spatiotemporal data is inseparable from analyzing it visually. Traditional GIS engines provide map visualization, but they mainly target spatial data that is largely static and only occasionally dynamic. For example, road network and POI data are generally updated quarterly or every six months, and the volume is modest, typically hundreds of gigabytes. Now that we have entered the era of 5G and IoT, terabytes of data are generated every minute. Traditional GIS engines struggle to ingest this data efficiently, and their visualization falls short as well. As shown in the figure, when there are more than 5000 data points, front-end rendering with Leaflet lags noticeably, and the browser may even crash. In addition, existing GIS engines can hardly interact with the underlying analysis algorithms, yet our goal in visualization is not simply to show what the data is, but to show the meaning behind it.

To this end, we provide the just-gis module, which mainly targets spatiotemporal data that is primarily dynamic and secondarily static. Through the underlying just-db module, it can ingest massive spatiotemporal data quickly and in real time; through the just-dm module, it can interact with analysis algorithms. We also implement coding and compression strategies for distributed spatiotemporal data, which speed up rendering. The middle figure shows 160,000 urban components displayed with our just-gis engine; the engine can zoom in, zoom out and pan the map smoothly. The figure on the right is a real case. We ingest the GPS trajectory data of vehicles in the city in real time and run map matching on it to analyze the speed and traffic flow of each road. At the same time, we cluster GPS points in real time to reduce the rendering load on the front end.
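
One simple way to reduce the number of markers sent to the front end is to aggregate points by grid cell, as sketched below. Grid aggregation is used here only as an illustration; the source does not say which clustering method just-gis applies, and the function name and cell size are assumptions.

```python
import math
from collections import defaultdict


def grid_clusters(points, cell_deg):
    """Aggregate (lat, lon) points into grid cells of cell_deg degrees.

    Returns one (centroid_lat, centroid_lon, count) tuple per non-empty
    cell, largest clusters first.
    """
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        count = len(members)
        clusters.append((
            sum(p[0] for p in members) / count,
            sum(p[1] for p in members) / count,
            count,
        ))
    return sorted(clusters, key=lambda c: c[2], reverse=True)


# Hypothetical GPS fixes: three close together, one farther away.
gps = [
    (39.901, 116.401),
    (39.902, 116.402),
    (39.903, 116.403),
    (39.951, 116.451),
]
clusters = grid_clusters(gps, cell_deg=0.01)
```

The front end then draws one marker per cluster with its count instead of one marker per point; the cell size would normally be tied to the current zoom level.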

This is the basic functional framework of just-gis. The basic service and application service layers in the middle make up just-gis-server, which encapsulates various GIS analysis functions and can publish GIS services. just studio and just-gis GL JS are the front-end display module and the SDK package, respectively.

This is the just studio user interface. It lets users visually edit every element on the map, such as the color and thickness of a road, and customize their own map styles. In the future, just studio will become the visual front door to all of JUST's capabilities. Users will not even need to write SQL statements: through just studio they will be able to call the underlying capabilities directly and build upper-level solutions like the ones you see here.

Next, we introduce several real-world deployments based on JUST.

Affected by the epidemic, the China Database Technology Conference was postponed from May to August, and then to December. Thanks to the unremitting efforts of our government departments, the epidemic in our country has been basically brought under control, and we can now gather here. But the situation abroad is still not optimistic. Until a vaccine is available, early detection remains the most effective way to prevent and control the epidemic. Traditionally, finding close contacts relies on people's memory and offline manual investigation. This is slow and error-prone, and it misses many close contacts. Every day of delay in finding a potentially infected person means more people infected. We need new methods to quickly find the people who have been in close contact with confirmed cases. Human trajectory information reflects contact between people, so by analyzing trajectory data we can quickly find potentially susceptible people. On the right is our system framework. Our system is deployed in multiple departments and ingests a variety of spatiotemporal data in real time. Using our preset spatiotemporal processing algorithms, we built effective spatiotemporal indexes to efficiently support querying and analyzing the people involved. In the first 20 days of the COVID-19 outbreak, our system helped Beijing find more than 500 high-risk close contacts. It helped Suqian find a quarter of the city's confirmed COVID-19 patients, and helped 18 provinces and cities, including Guangzhou, Nanjing and Chengdu, analyze the situation of high-risk groups. It can identify susceptible groups from massive trajectory data within seconds. The related paper has been published on arXiv; you can download and read it if you are interested.

The second example is JUST's contribution to the supervision of hazardous chemicals. As you may remember, on June 13 this year, a serious hazardous chemicals explosion occurred in Wenling, Zhejiang Province, killing more than 20 people and injuring more than 170. The supervision of hazardous chemicals concerns the safety of people's lives and property, and has received great attention from government departments. At present, JUST mainly answers two questions in hazardous chemicals supervision: 1) Do hazardous chemical vehicles follow their reported routes? If a hazardous chemical vehicle drives into a residential area, it poses a serious safety threat. 2) Are there any illegal chemical plants? For example, some residents may clear out their own houses to store chemical products, which we call "black chemical" sites. In addition, some non-compliant chemical plants may secretly resume production after being shut down by the government. Supervising hazardous chemicals is very difficult. Taking Nantong as an example, the supervision involves six links and seven elements, and requires the cooperation of nine different government departments and the linkage of 12 software systems. Nantong has more than 2,000 hazardous chemicals enterprises, more than 3,000 hazardous chemical vehicles and more than 1,000 ships, but only a little more than 20 front-line supervisors, so relying on manual inspection alone is very difficult. We deployed JUST in Nantong to solve these two problems. Our system ingests the trajectory data of hazardous chemical vehicles in real time.
For the first problem, when a hazardous chemical vehicle deviates from its reported route, our system raises an alarm in real time. At the same time, we quickly compute the area the vehicle can reach within the next 15 minutes under current traffic conditions and, combined with road network information, recommend to the traffic police where to set up roadblocks to intercept the vehicle and prevent it from entering residential areas and threatening public safety. For the second problem, we analyze the GPS trajectories of hazardous chemical vehicles and find the places where they frequently stay. Combined with POI distribution information, if there is no hazardous chemicals plant, or only a closed one, around a stay location, then there is likely to be an illegal chemical plant there. We recommend these locations to the regulatory authorities, who inspect them on site. In addition, from the driving paths and stay locations of hazardous chemical vehicles we can identify high-risk areas, advise residents to avoid them, and remind government departments to pay special attention to them.
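
The two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the deployed implementation: the route is assumed to be a polyline of (lat, lon) vertices, a track is assumed to be a time-ordered list of (t, lat, lon) tuples, and the tolerance, stay radius, minimum stay time and the list of licensed sites are all hypothetical parameters. The stay detection follows the common anchor-based stay-point heuristic; reachable-area analysis and roadblock recommendation are not covered.

```python
import math

M_PER_DEG = 111_320.0  # approximate metres per degree of latitude


def to_xy(lat, lon, lat0):
    """Project (lat, lon) to local metres; equirectangular, adequate at city scale."""
    return lon * M_PER_DEG * math.cos(math.radians(lat0)), lat * M_PER_DEG


def dist_m(a, b):
    """Approximate distance in metres between two (lat, lon) points."""
    ax, ay = to_xy(a[0], a[1], a[0])
    bx, by = to_xy(b[0], b[1], a[0])
    return math.hypot(bx - ax, by - ay)


def dist_to_route_m(p, route):
    """Shortest distance from point p to the polyline route, in metres."""
    px, py = to_xy(p[0], p[1], p[0])
    best = math.inf
    for a, b in zip(route, route[1:]):
        ax, ay = to_xy(a[0], a[1], p[0])
        bx, by = to_xy(b[0], b[1], p[0])
        dx, dy = bx - ax, by - ay
        seg2 = dx * dx + dy * dy
        # Clamp the projection of p onto the segment to the segment's ends.
        u = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
        best = min(best, math.hypot(px - (ax + u * dx), py - (ay + u * dy)))
    return best


def deviates(p, route, tolerance_m=100.0):
    """True if the vehicle position p is farther than tolerance_m from the reported route."""
    return dist_to_route_m(p, route) > tolerance_m


def stay_points(track, max_dist_m=200.0, min_stay_s=1200):
    """Find places where a vehicle lingered.

    track: time-ordered list of (t, lat, lon).
    Returns a list of (mean_lat, mean_lon, t_arrive, t_leave).
    """
    result, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while points stay within max_dist_m of the anchor point i.
        while j < n and dist_m(track[i][1:], track[j][1:]) <= max_dist_m:
            j += 1
        if track[j - 1][0] - track[i][0] >= min_stay_s:
            pts = track[i:j]
            result.append((sum(p[1] for p in pts) / len(pts),
                           sum(p[2] for p in pts) / len(pts),
                           track[i][0], track[j - 1][0]))
            i = j
        else:
            i += 1
    return result


def unexplained_stays(stays, licensed_sites, radius_m=500.0):
    """Stay points with no licensed site (a hypothetical POI list of (lat, lon)) nearby."""
    return [s for s in stays
            if all(dist_m(s[:2], site) > radius_m for site in licensed_sites)]
```

In this sketch, a stay that cannot be explained by any licensed site within radius_m is only a candidate for on-site inspection, which matches the workflow described above where the regulator makes the final check.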

This is the whole-process hazardous chemicals management platform we deployed in Nantong. The video on the right shows a real case of a "black chemical" enterprise found through our system. Our system greatly reduces the on-the-ground investigation workload of staff in the government commissions, offices and bureaus, and improves their productivity. During a trial operation of several months, our system quickly detected 410 abnormal driving behaviors of hazardous chemical vehicles and accurately identified 64 illegal small chemical companies.

The last case uses JUST to recover the road network inside residential communities. At present, all map providers have complete road network data for main roads, because that information can easily be collected by survey cars. Their road data for internal areas, however, is often incomplete: some areas do not allow outside vehicles to enter, so the data has to be collected manually, which is very time-consuming and labor-intensive. Yet the road network inside communities is very important for express delivery and food delivery, because it helps the system schedule couriers and delivery riders better. Jingdong has tens of thousands of couriers whose footprints cover streets all over China, and every courier carries a handheld PDA that generates a GPS point every two seconds. As Lu Xun said, there were originally no roads in the world; when many people walk the same way, a road is made. So we use the couriers' GPS trajectories to recover the fine-grained road network inside communities, determine whether each road is drivable or walkable only, and then infer the travel time of each road.
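
The intuition that a path walked by many couriers is probably a road can be illustrated with a simple counting sketch. This is only a toy version of the idea, not the recovery method itself (the talk attributes the actual repair to trained deep learning models). It assumes points are hypothetical (courier_id, lat, lon) tuples, and the cell size and courier threshold are illustrative.

```python
import math
from collections import defaultdict


def candidate_road_cells(points, cell_deg=0.0001, min_couriers=3):
    """Return grid cells crossed by at least min_couriers distinct couriers.

    points: iterable of (courier_id, lat, lon).
    A cell of 0.0001 degrees is roughly 10 m on a side. Counting distinct
    couriers instead of raw points keeps one courier waiting at a doorway
    from looking like a road.
    """
    visitors = defaultdict(set)   # cell -> set of courier ids seen in it
    for courier_id, lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        visitors[cell].add(courier_id)
    return {cell for cell, ids in visitors.items() if len(ids) >= min_couriers}
```

Connected runs of such cells that are missing from the coarse road map would be candidates for new internal roads; turning them into road geometry, access type and travel time takes considerably more work than this sketch shows.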

This is the basic framework of our system. The JUST platform ingests the trajectory data of a large number of couriers together with coarse-grained road network data, then denoises, segments and map-matches the trajectories, and provides efficient spatiotemporal queries to the models above it. We have trained several deep learning models that repair the road network well. The related work has been accepted by AAAI, a top international artificial intelligence conference. If you are interested, you are welcome to take a look.
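
The first two preprocessing steps named above, denoising and segmentation, can be sketched as follows. This is a common textbook approach under stated assumptions, not the platform's actual code: a track is assumed to be a time-ordered list of (t, lat, lon) tuples, noise is filtered with a hypothetical speed threshold, and a track is split wherever the time gap exceeds a hypothetical limit. Map matching is omitted because it needs road network data.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in metres between two (t, lat, lon) points."""
    p1, p2 = math.radians(a[1]), math.radians(b[1])
    dphi = p2 - p1
    dlmb = math.radians(b[2] - a[2])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def denoise(track, max_speed_mps=40.0):
    """Drop points that imply an implausible speed from the last kept point."""
    kept = []
    for pt in track:
        if kept:
            dt = pt[0] - kept[-1][0]
            # Skip duplicates in time and jumps faster than max_speed_mps.
            if dt <= 0 or haversine_m(kept[-1], pt) / dt > max_speed_mps:
                continue
        kept.append(pt)
    return kept


def segment(track, max_gap_s=300):
    """Split a track into sub-trajectories wherever the time gap exceeds max_gap_s."""
    segments, current = [], []
    for pt in track:
        if current and pt[0] - current[-1][0] > max_gap_s:
            segments.append(current)
            current = []
        current.append(pt)
    if current:
        segments.append(current)
    return segments
```

One limitation of this simple filter is that it trusts the first point of the track; if that point is itself noise, later valid points may be rejected, so a production pipeline would need a more robust rule.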

Our system has also been successfully deployed in more smart city projects, including the Xiong'an New Area data platform, the Jiangsu Yuanbo smart Yuanbo project, the Nantong Xueliang project, the Guanghan National Agricultural Industrial Park project, the Zhanyi Jindun project, and the Nantong city governance modernization project. As you can see, our Zhanyi Jindun project has been reported by CCTV-1. In the future, JUST will be deployed in more projects to help the country solve problems and create value for society.

Finally, I will introduce the academic achievements built on JUST.

We have always required ourselves to do work that both reaches for the sky and stands firmly on the ground: in addition to solving practical problems, we also actively distill our experience into theory. In less than three years, more than ten JUST technologies have been accepted by top international conferences or journals. Our team has also made history: it is the first team in the world to win the ten-year impact award in the international spatiotemporal data field for two consecutive years. So far, we have provided technical services to the public security bureaus of 25 provinces and received their thank-you letters. Based on JUST, we have filed more than 30 invention patents and obtained authoritative certification from the Ministry of Public Security. So far, we have also obtained six software copyrights.

Our goal is to be the best spatiotemporal data management and analysis platform in the world. You can follow our official account, or visit my personal homepage through the QR code. The slides can be downloaded from the official account. That concludes my report. Thank you again!
