SequoiaDB application case: big data judicial query platform


1. Preface

When the public security organs, procuratorial organs and law enforcement organs need to inquire about the bank deposits of enterprises, institutions and organizations, or to consult the accounting vouchers, account books, statements and other records related to a case, the banks shall actively cooperate. For such an inquiry, the people’s court shall issue a formal official letter to the bank, and the bank president (director) shall appoint a specific business department to provide the relevant information and materials and assign dedicated staff to handle the request. The inquirer may transcribe, copy or photograph the required materials, but may not borrow the originals. The people’s court shall keep the information provided by the bank confidential.

When the investigation organs of the People’s Procuratorate handle job-related crimes, especially corruption and bribery cases, inquiring with commercial banks about a suspect’s accounts and collecting evidence from them is an important way to obtain clues and evidence. Cooperating is also a legal obligation of commercial banks. In practice, however, when the investigation organs go to a commercial bank to make an inquiry, the process is slow because historical data is difficult to query.

Banks archive historical data older than three to five years to offline storage such as tape libraries or optical disc libraries. When an investigation organ submits a judicial inquiry request, bank staff first have to export the offline data from the library and restore it as online data before it can be queried. This export is very time-consuming and labor-intensive, which slows down the judicial inquiry.

2. Challenges
Because a judicial inquiry needs to view the complete transaction flow of a company’s or individual’s bank account, the bank has to make all of its historical data available for query. To meet such requests, banks must assign system staff to export offline data. Banks therefore need an effective solution that frees their staff from this heavy data export work.

The data involved in judicial inquiries is historical data that banks have stored for decades, and it spans many business systems, such as the core system, the credit card system and the online banking system. The data therefore has the following characteristics: large volume, numerous business systems, and repeated replacement of old systems by new ones. Given these characteristics, the solution needs to address the following requirements:

Bringing offline data online: the key point of the whole solution is to eliminate the offline data export step in judicial inquiry, and the most effective way to do so is to bring the offline data online. Once the data is online, a judicial inquiry only has to query the online data and provide the result to the relevant investigating department. Because judicial inquiries are far less frequent than core transaction queries, mainframes, midrange machines or even minicomputers need not be used as the hardware platform for the online data.

Unified data management of all business systems: because a judicial inquiry involves many business systems, the data currently has to be queried on each business system platform separately, which is very labor-intensive. Bringing the offline data online requires managing the data of every business system in a unified way, so that a later judicial inquiry can reach all relevant data on a single platform.

Data integration of new and old systems: the bank’s systems have been upgraded many times over their history, which has led to large differences in data storage design between old and new systems. To provide efficient and convenient judicial inquiry, integrating the data of the old and new systems is also essential.

Efficient data query: query efficiency must be guaranteed once the offline data is online. Only when the data is both online and fast to query can judicial inquiry truly escape its current inefficiency.

3. Solutions
From bottom to top, the judicial inquiry platform is divided into a data acquisition layer, a data storage and processing layer, and a data application layer. The data storage and processing layer is the core of the platform and is built mainly on the SequoiaDB distributed database and the Spark in-memory analysis framework. With this architecture, the offline data needed for judicial inquiry can be brought online and queried in real time.

3.1 Data acquisition layer
The main function of the data acquisition layer is to supply the data storage and processing layer with the business system data that judicial inquiry needs. The ODS data retrieval platform collects the historical data prepared by the old and new core systems, the old and new credit card systems and the online banking system, converts it to a unified format, and then delivers it to the data storage and processing platform over the network via SFTP, FTP or CD.
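
The format-unification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's actual code: the source field names (`ACCT_NO`, `TXN_DT`, `accountId`, and so on), the unified layout and the date formats are all hypothetical, chosen only to show how records from an old and a new system could be mapped to one common layout before transfer.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical field mappings: source field name -> unified field name.
FIELD_MAPS = {
    "old_core": {"ACCT_NO": "account_id", "TXN_DT": "txn_date", "AMT": "amount"},
    "new_core": {"accountId": "account_id", "transDate": "txn_date", "amount": "amount"},
}

# Hypothetical date formats used by each source system.
DATE_FORMATS = {"old_core": "%Y%m%d", "new_core": "%Y-%m-%d"}


def unify_record(system: str, record: dict) -> dict:
    """Map one source record to the unified layout and tag its origin."""
    mapping = FIELD_MAPS[system]
    unified = {target: record[source] for source, target in mapping.items()}
    # Normalize the date to ISO format (YYYY-MM-DD).
    parsed = datetime.strptime(str(unified["txn_date"]), DATE_FORMATS[system])
    unified["txn_date"] = parsed.date().isoformat()
    # Normalize the amount to a string with exactly two decimal places.
    unified["amount"] = f"{Decimal(str(unified['amount'])):.2f}"
    unified["source_system"] = system
    return unified
```

Keeping a `source_system` tag on every record is one way to preserve provenance after the old and new systems have been merged into a single layout.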

3.2 Data storage and processing layer
The main task of the data storage and processing layer is the unified storage and processing of judicial query data. The data passed from the data acquisition layer falls into two kinds: stock data and incremental data. For both kinds, the storage and processing layer built on SequoiaDB + Spark handles data planning, loading and processing.

Stock data storage: stock data refers to the data that already exists as of a given cutoff point in time. It is loaded mainly as the initialization data of the various business systems. Because the volume of historical data held for judicial inquiry is large, the stock data is planned by system category, data category (transaction flow and non-flow) and data volume before it is loaded. SequoiaDB domains handle the planning by system category. For example, domain1 is used for the old and new core systems, and domain2 for the old and new credit card systems. SequoiaDB’s horizontal partitioning mechanism and time-series model store the data in an orderly and efficient way according to data category and data volume. For example, the time-series model can be used to store data by customer transaction date. After data planning is complete, the operator uses the SequoiaDB import tool to import the system data into the SequoiaDB database.
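
The planning rules described above can be expressed as a small lookup, sketched here in plain Python. The domain assignment follows the example in the text. The collection naming scheme and the monthly partition key are assumptions made for illustration, and no SequoiaDB API is called.

```python
from datetime import date
from typing import Optional

# Domain assignment by system category, following the example in the text.
SYSTEM_DOMAINS = {
    "old_core": "domain1",
    "new_core": "domain1",
    "old_credit_card": "domain2",
    "new_credit_card": "domain2",
}


def plan_storage(system: str, is_flow: bool, txn_date: Optional[date] = None) -> dict:
    """Decide where a record belongs: domain, collection and partition key.

    Collection names and the YYYYMM partition key are hypothetical.
    """
    domain = SYSTEM_DOMAINS[system]
    if is_flow:
        if txn_date is None:
            raise ValueError("transaction-flow data needs a transaction date")
        # Flow data is partitioned by transaction month (an assumed granularity).
        return {
            "domain": domain,
            "collection": f"{system}_flow",
            "partition": txn_date.strftime("%Y%m"),
        }
    # Non-flow data (for example customer or account master data) is not
    # partitioned by date in this sketch.
    return {"domain": domain, "collection": f"{system}_static", "partition": None}
```

For example, `plan_storage("new_core", True, date(2015, 7, 9))` places the record in `domain1`, in a flow collection, under the partition key `201507`.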

Data model denormalization: because old systems were replaced by new ones and the old systems carry their historical designs, the table structures of the old and new versions of the same system differ greatly, and the old systems were not designed with queries over large volumes of historical data in mind. The main difficulty in querying historical data is multi-table joins at scale. To unify old and new system data and make queries fast, the storage and processing layer has to process the stock data according to the needs of judicial inquiry. This processing uses the Spark analysis framework to transform the data stored in SequoiaDB according to a unified plan for the old and new system structures, for example by flattening all data into transaction-flow and non-flow tables.
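
The idea of joining once at load time so that later queries read a single flat table can be sketched as follows. In the platform this work is done by Spark over data held in SequoiaDB. The plain-Python function below is only a stand-in for that job, and the table layouts and field names are hypothetical.

```python
def flatten_flow(transactions, accounts, customers):
    """Join each transaction with its account and customer once, at load time.

    The result is a list of wide records that can be queried without joins.
    """
    accounts_by_id = {a["account_id"]: a for a in accounts}
    customers_by_id = {c["customer_id"]: c for c in customers}
    flat = []
    for txn in transactions:
        # Missing lookups fall back to empty dicts so the row is still kept.
        account = accounts_by_id.get(txn["account_id"], {})
        customer = customers_by_id.get(account.get("customer_id"), {})
        flat.append({
            "account_id": txn["account_id"],
            "txn_date": txn["txn_date"],
            "amount": txn["amount"],
            "customer_id": account.get("customer_id"),
            "customer_name": customer.get("name"),
            "branch": account.get("branch"),
        })
    return flat
```

The trade-off is the usual one for denormalization: the flat table repeats account and customer attributes on every row, in exchange for queries that touch one table only.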

Incremental data synchronization: incremental data refers to the data that changes every day after the stock data cutoff, such as the customers added in the new core system each day and the daily transaction flow. The data stored in SequoiaDB should be kept consistent with the T-2 data of the online transaction systems (such as the new core system and the new credit card system).
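
A daily synchronization job has to work out which business dates are still missing. The sketch below reads "T-2" as "two days before the current date", which is an interpretation rather than something the text spells out. The function names and the record layout are hypothetical.

```python
from datetime import date, timedelta


def sync_window(last_synced: date, today: date, lag_days: int = 2) -> list:
    """Return the business dates that still need loading, oldest first.

    The target is `today - lag_days`; dates up to `last_synced` are
    assumed to be in the store already.
    """
    target = today - timedelta(days=lag_days)
    missing = (target - last_synced).days
    return [last_synced + timedelta(days=i) for i in range(1, missing + 1)]


def select_increment(records, dates):
    """Keep only the records whose ISO-formatted txn_date is in the window."""
    wanted = {d.isoformat() for d in dates}
    return [r for r in records if r["txn_date"] in wanted]
```

If the job did not run for a few days, the window simply contains several dates, so the next run catches up without special handling.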

3.3 Data application layer
Once the offline data is online, its use is not limited to judicial queries (that is, queries by public security, procuratorial and judicial organs). It can also serve customized historical data queries and administrator queries. Because judicial inquiries are infrequent, the online data sits idle most of the time. As long as judicial inquiry is not affected, the online data should be put to further use, for example for historical data queries by bank branches and queries by bank administrators. The data application layer uses the SequoiaDB API, SequoiaDB SQL and Spark SQL to fetch data from the data storage and processing layer and displays the results on a web front-end page.
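
A typical judicial query asks for all transactions of one account within a date range. The sketch below shows the shape of such a query against a flat transaction-flow table. It uses Python's built-in `sqlite3` module as a stand-in SQL engine so that it runs anywhere. It does not use SequoiaDB SQL or Spark SQL, and the table name, columns and sample rows are hypothetical.

```python
import sqlite3


def demo_connection():
    """Build an in-memory table with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flow ("
        "account_id TEXT, txn_date TEXT, amount TEXT, source_system TEXT)"
    )
    # An index on (account_id, txn_date) matches the query's access pattern.
    conn.execute("CREATE INDEX idx_flow_account_date ON flow (account_id, txn_date)")
    conn.executemany(
        "INSERT INTO flow VALUES (?, ?, ?, ?)",
        [
            ("6222001", "2016-01-02", "1200.00", "new_core"),
            ("6222001", "2009-03-15", "100.50", "old_core"),
            ("6222999", "2016-01-02", "5.00", "new_core"),
        ],
    )
    return conn


def query_flow(conn, account_id, start, end):
    """Return one account's transactions in a date range, oldest first."""
    sql = (
        "SELECT txn_date, amount, source_system FROM flow "
        "WHERE account_id = ? AND txn_date BETWEEN ? AND ? "
        "ORDER BY txn_date"
    )
    return conn.execute(sql, (account_id, start, end)).fetchall()
```

Because old-system and new-system rows live in the same table, a single parameterized query covers the account's whole history.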

4. Project results
Low-cost online storage of offline data: SequoiaDB adopts a distributed architecture and needs only ordinary x86 PC servers to store massive data efficiently. Because judicial inquiry data is offline, massive, scattered across systems and queried infrequently, it is this low-cost online storage architecture that makes it feasible to bring the offline data online.

Unified platform management of business system data: judicial inquiry involves multiple business systems, so planning, storing and managing their data together is very important. SequoiaDB’s domain feature and its management of metadata provide unified storage and management of data from multiple systems.

Real-time query of historical data: the judicial query data is stored in the SequoiaDB distributed database, so historical data can be queried in real time. With SequoiaDB’s distributed storage and multiple index mechanisms, the result of a judicial query request is returned within seconds.
