TDSQL's original technology is not technology for its own sake. It is innovation driven by business needs and aimed at solving business problems.
Tencent's billing service system is a world-leading financial cloud billing service system. It spans the SaaS, PaaS, and IaaS levels. At the SaaS level, it includes Mi Master, Cloud Store, TDSQL, and other systems.
TDSQL hosts nearly 28 billion accounts, and Mi Master relies on TDSQL for financial transactions. The daily flow of recharges from Tencent and its partners exceeds 15 billion, and the daily transaction volume is more than 10 billion. The financial data in the TDSQL database is used for settlement, reconciliation, audit, risk-control data analysis, user profiling, and other businesses. Examples include reconciliation of Honor of Kings game coupons, and audit and risk control of consumption and recharge changes in user accounts.
Businesses such as reconciliation and audit draw on two kinds of data source. The first is log data from different systems (relational databases or NoSQL systems), called the flow log. Such a system produces nearly 100 GB of flow-log data per day, and the trend shows the increment growing rapidly. The second is data that TDSQL splits into tables by time. After a period of time, the time-split data must be reconciled against the daily flow log.
Reconciliation mainly guards against two kinds of exception:
- The system has a bug, or does not behave as expected when a failure occurs. This can lead to a successful delivery without a deduction, or a successful deduction without a delivery. One example is recharge and delivery in the Tencent Video VIP management system.
- Hacker or insider risk. For example, someone illegally bypasses the business system to recharge their own account, or commits other fraud.
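The first exception can be illustrated with a minimal sketch that compares deduction records against delivery records by order ID. The record shapes (`Deduction`, `Delivery`) and the `order_id` field are hypothetical, chosen only for illustration; real flow logs differ per application.

```python
from collections import namedtuple

# Hypothetical record shapes; real flow-log formats vary by application.
Deduction = namedtuple("Deduction", ["order_id", "amount"])
Delivery = namedtuple("Delivery", ["order_id", "item"])


def reconcile(deductions, deliveries):
    """Return the order IDs whose deduction and delivery records disagree."""
    deducted = {d.order_id for d in deductions}
    delivered = {d.order_id for d in deliveries}
    return {
        # Goods were delivered but no money was deducted.
        "delivered_not_deducted": sorted(delivered - deducted),
        # Money was deducted but no goods were delivered.
        "deducted_not_delivered": sorted(deducted - delivered),
    }
```

An empty list under both keys means the two sides agree for the period checked.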
There are many kinds of reconciliation business, and different applications use different flow-log formats. For the accounts it hosts, TDSQL must regularly check data consistency across thousands of multi-level businesses and accounts.
From a technical point of view, there are four problems:
- Complex application development: using business logs requires the business system to generate log information continuously, then spend computing resources parsing the different log formats and storing the results in the analysis system. This adds development burden and wastes resources.
- Data logic separation: TDSQL splits tables by time, so settlement can only be run over predetermined time periods and cannot be computed flexibly. Splitting tables by time period physically breaks the logical continuity of the data over time, so computing over an arbitrary time period requires naming the specific sub-tables involved.
- Loss of real-time characteristics: the two problems above imply that the data must be imported into a separate analysis system before it can be computed. Exporting and importing the data also consumes resources and time, which makes it hard for the analysis system to compute in real time.
- Complex data management: logs and similar information are historical data and must be kept for a long time. Tencent generates, stores, parses, and manages more than 15 billion flow logs in different formats every day, which has become a huge challenge.
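The data logic separation problem can be made concrete with a small sketch. It assumes, purely for illustration, that tables are split by month and named `flow_YYYYMM`; the source does not give the actual granularity or naming scheme.

```python
from datetime import date


def tables_for_range(start, end, prefix="flow_"):
    """List the monthly sub-tables that a query over [start, end] must touch.

    The monthly split and the `flow_YYYYMM` naming are assumptions.
    """
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names
```

A query covering 20 November 2019 to 5 January 2020 would have to name three sub-tables and merge their results, even though the data is logically one continuous series.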
Most database systems today retain only the current value of data and discard historical data because of storage cost and other reasons. Yet data is an important asset, and both current data and data that existed in the past have value. Storing, analyzing, mining, and reusing historical data is therefore a real need for Internet companies and other enterprises. This is especially true of financial historical data, which is security-sensitive and must be computed over many times. As a result, Tencent's billing business has a growing demand for managing data with temporal attributes.
For these reasons, Tencent developed T-TDSQL, a temporal database built on the TDSQL relational database. The system manages large volumes of full temporal data and current data in a unified way, and solves the four problems described above.
The solution to these business pain points rests on in-depth analysis of the characteristics of the database and of the business scenarios.
The TDSQL team believes that data is valuable, and that historical data is valuable too. This is the core value of T-TDSQL, the TDSQL temporal database. With this in mind, we gave TDSQL a new understanding of data.
TDSQL holds that:
Data has a state attribute, which identifies where the data is along its life cycle. The life cycle of data is divided into three stages, and each stage corresponds to a different state:
- Current state: the latest version of a data item, that is, the data in its current stage.
- Historical state: a state the data item held in the past. Its value is an old value, not the current value. A data item can have multiple historical states, which reflect how its state has changed over time. Data in the historical state can only be read; it cannot be modified or deleted.
- Transitional state: neither the latest version nor a historical version of the data item. The data is in the process of moving from the current state to the historical state. Data in the transitional state is called half-decay data.
These three states cover the whole life cycle of a data item and are collectively called the full state, or full-state data. Under MVCC, all three states exist. Without MVCC, data exists only in the historical state and the current state.
- Current state: under either MVCC or lock-based concurrency control, the new value of the data after the transaction commits is in the current state.
- Historical state: under MVCC, data generated by transactions older than the smallest transaction in the current active transaction list is in the historical state. Under lock-based concurrency control, once a transaction commits, the value the data held before the commit becomes historical; that is, the old value of the data item is in the historical state.
- Transitional state: under MVCC, a version may still be in use by active transactions that are not the latest transaction to touch the data item. Because the latest transaction has modified the data item, the new value is already in the current state, and the version being read is already old relative to it. The version being read therefore sits between the current state and the historical state, which is why it is called transitional.
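The three states under MVCC can be sketched as a small classification function. This is a simplified model, not TDSQL's implementation: it assumes each version carries the commit timestamp of the transaction that created it (`begin_ts`) and of the transaction that superseded it (`end_ts`), and that each active transaction reads from a snapshot timestamp. All three names are hypothetical.

```python
def classify_version(begin_ts, end_ts, active_snapshots):
    """Classify one version of a data item as current, transitional, or historical.

    begin_ts: commit timestamp of the transaction that created the version.
    end_ts: commit timestamp of the transaction that superseded it, or None.
    active_snapshots: snapshot timestamps of transactions still running.
    """
    if end_ts is None:
        # Nothing has superseded this version, so it is the latest one.
        return "current"
    # A superseded version is still in use if some active snapshot falls
    # inside its visibility window [begin_ts, end_ts).
    if any(begin_ts <= s < end_ts for s in active_snapshots):
        return "transitional"
    return "historical"
```

Once the last active transaction that can see a superseded version finishes, the same version moves from transitional to historical without any change to its value.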
Data also has dual temporal attributes: the valid time attribute and the transaction time attribute.
The valid time attribute is the time that applies to the real-world object the data describes. For example, Kate attended middle school from September 1, 2000 to July 30, 2003, and university from September 1, 2003 to July 30, 2007. These dates are valid times.
The transaction time attribute records when a given state of the data came into being in the database, that is, when the database system performed which operation on it. The database system encapsulates an operation as a transaction, and a transaction is atomic. We therefore use the transaction identifier to mark the transaction time attribute of a piece of data.
Formally, the valid time attribute and the transaction time attribute appear in the data model as ordinary user-defined fields, but they are declared with specific keywords so that the database engine can perform constraint checking and assign their values.
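A minimal sketch of a record carrying both temporal attributes, using the Kate example. The field names (`valid_from`, `valid_to`, `tx_from`, `tx_to`) and the use of integer transaction IDs are assumptions for illustration; they are not T-TDSQL's keywords or schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class EnrollmentVersion:
    person: str
    school: str
    valid_from: date  # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: int  # transaction time: ID of the transaction that wrote it
    tx_to: Optional[int] = None  # ID of the transaction that superseded it


def school_on(versions, person, day, as_of_tx):
    """Return the school `person` attended on `day`, as known at `as_of_tx`."""
    for v in versions:
        # Transaction-time filter: was this version visible at as_of_tx?
        known = v.tx_from <= as_of_tx and (v.tx_to is None or as_of_tx < v.tx_to)
        # Valid-time filter: does the fact cover the requested day?
        if v.person == person and known and v.valid_from <= day <= v.valid_to:
            return v.school
    return None
```

The two filters are independent: valid time answers "when was this true in the world", while transaction time answers "what did the database know at that point".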
The TDSQL team set out to build a database system that solves the problems above. The features that the new system should provide are as follows:
Accordingly, the TDSQL-based temporal database T-TDSQL has the following characteristics, which cover four major areas: dual temporal data applications, data security, data analysis, and simplified application development: