Chen Peixin participated in the research and development of Guosen Securities' basic platform (DevOps, microservice governance, serverless).
Guosen Securities is a large nationwide full-service securities company, with 57 branches and 185 business departments across 118 cities and regions. According to data released by the Securities Association of China, Guosen Securities has ranked among the top of the industry in recent years in core indicators such as total assets, net assets, net capital, operating income, and net profit.
Guosen Securities first began evaluating TiDB in June 2020, and it took about half a year from technology pre-research to the first business launch. The first online business was the Golden Sun bill, and TiDB was then applied successively in the data middle platform, service observability, and other systems. From deploying TiDB only in the Dongguan main data center, to multi-data-center deployment in September 2021, to the pilot of domestic Hygon x86 servers, Guosen Securities has accumulated rich practical experience in exploring and applying open-source NewSQL databases. At present, Guosen Securities runs seven TiDB clusters with 109 nodes; the largest table holds 10 billion rows, supporting custody, brokerage, proprietary trading, and other businesses.
From 0 to 1: Guosen Golden Sun introduces TiDB
Guosen Golden Sun provides services related to securities trading, wealth management, and market information. The main function of a securities app is trading, and when trading, users care most about returns and about when to buy and sell stocks. Guosen Golden Sun currently has about 10 million users. Guosen operates a big data platform that stores all users' historical transaction data: bill and stock data exceed 10 billion rows, stock clearing data is about 1 billion rows, and bill and stock clearing data together grow by about 2 billion rows per year.
The figure below shows the overall service architecture of Guosen Golden Sun. The Golden Sun app connects directly to the Golden Sun back end, which is built on a gRPC microservice framework developed in-house at Guosen. Behind it, as in most brokerages, sit various trading and clearing systems. All data is eventually pushed to the data middle platform, which runs batch jobs daily or weekly; after a batch run, the relevant data is pushed to the front-end database, and the app queries it through the query service to obtain bill and stock clearing data.
Figure: Guosen Golden Sun data service architecture
The bill system has gone through three versions, starting from scratch. Version 1.0 supported only one-year bill queries, using a single SQL Server database with a single table. How was it implemented? Every day the data middle platform synchronized data into a temporary table and then converted it; the converted data was appended to the formal table, which stored one year of data. With a single database and single table, an intermediate program compressed a whole year of each user's data into one column: a JSON string holding 365 days of data. What happens when a new day's data arrives? A scheduled background task pushes the new day's data into this JSON, shifts the 365-day window forward, drops the earliest day's data, and writes the result back to the formal table. As you can see, this was a very primitive implementation. Why JSON? Because we have more than 10 million users; storing one row per user per day would overwhelm a single database and table. So at the beginning we used a trick: merging some 365 rows into one. The whole table then held a little over 10 million rows, and query efficiency was acceptable.
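The rolling one-year window described above can be sketched roughly as follows. This is a toy model of the scheduled task; the field names and JSON layout are assumptions, since the original ran against the SQL Server table directly:

```python
import json
from collections import OrderedDict

WINDOW_DAYS = 365  # one year of bill data kept per user row

def roll_daily_bill(bill_json: str, day: str, bill: dict) -> str:
    """Append the new day's bill to the per-user JSON column and
    evict the oldest day once the one-year window is full."""
    days = json.loads(bill_json, object_pairs_hook=OrderedDict)
    days[day] = bill
    while len(days) > WINDOW_DAYS:
        days.popitem(last=False)  # drop the earliest day
    return json.dumps(days)
```

The key point is that the whole year stays inside one row, so the table size is bounded by the user count rather than by users x days.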
Figure: bill 1.0 implementation with a single database and single table
The problems with this approach: on the business side, users wanted to query data over a longer period, such as five years, which a single table could hardly support. Technically, data queries and subsequent updates were under great pressure and hard to scale. Sometimes data updates went wrong, and users saw inaccurate data when querying the next day. To address these business and technical difficulties, in bill version 2.0 Guosen used Sharding-JDBC to split databases and tables. When introducing this technology we had also heard about TiDB, but the securities business demands high stability, and at the time we were worried about TiDB's stability, so we chose Sharding-JDBC. After the split, five-year bill queries were supported, using 16 MySQL instances with a total of 512 tables. How did the data middle platform synchronize with the sharded tables? As before, the data middle platform first wrote data into a temporary table; a conversion service, configured with the sharding rules, took data from the temporary table and wrote it into the formal tables. The data middle platform has an ETL tool, but it does not support extension, so it could not write directly to the formal tables.
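A Sharding-JDBC-style rule for 16 databases x 32 tables (512 tables in total) can be sketched as a simple hash route. Guosen's actual sharding key and naming are not stated, so the use of user_id and these table names are assumptions:

```python
# Minimal sketch of a hash-based sharding rule in the spirit of
# Sharding-JDBC: 16 MySQL databases x 32 tables each = 512 tables.
# user_id as the sharding key and the db/table names are made up.
DB_COUNT = 16
TABLES_PER_DB = 32

def route(user_id: int) -> tuple[str, str]:
    """Map a user to its (database, table) pair."""
    slot = user_id % (DB_COUNT * TABLES_PER_DB)   # 0..511
    db = f"bill_db_{slot // TABLES_PER_DB:02d}"
    table = f"bill_{slot % TABLES_PER_DB:03d}"
    return db, table
```

Every reader and writer must apply the same rule, which is exactly why later schema changes and re-sharding (described below) were so painful.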
Figure: bill 2.0 implementation with sharded databases and tables
After running for about two years, we found new problems. Although sharding met the business needs, it imposed serious constraints on scalability. First, adding fields was difficult: with 512 shard tables, supporting a new business meant the DBA painfully added the new field to every shard. Second, capacity expansion was extremely troublesome: if the initial data estimate was off, the sharding rules had to change, doubling the 512 tables to more than 1,000, and the DBA's migration work was complex and error-prone. Third, synchronization still required an intermediate table, so data synchronization remained just as slow and constrained the system's go-live window. Fourth, the periodic creation and cleanup of shard tables was cumbersome: some daily tables had to be deleted every day (for example, the table from five years ago) and the next day's table created, all driven by timers in the application. Fifth, on the operations side, multiple databases had to be maintained.
The sharding scheme imposed an extra burden on development and operations. So we introduced TiDB into Guosen, using a NewSQL database to support five-year bill queries and eliminate the problems caused by sharding; the bill thus entered its 3.0 era. Under this architecture, the data middle platform pushes data directly into the formal table every day, and the query service queries TiDB directly, with a cache added on top of the query service. After introducing TiDB, daily synchronization and loading became about 70% faster than before. At present, TiDB is deployed in Guosen's data centers in three locations, and a pilot of domestic Hygon x86 servers has recently started in the Dongguan data center.
Figure: bill 3.0 implementation with the TiDB distributed database
Next, some lessons from a year of using TiDB. From the development perspective, the first lesson concerns deleting large amounts of data. At the beginning, lacking experience, we followed the old routine: to delete the data of a specified day, we ran a single DELETE ... WHERE day = '...'. It was a Saturday, and operations alerts showed TiDB machines going down one by one. Investigation found that the DELETE involved too much data in one statement. We then raised the transaction size limit to 10 GB and expanded the TiDB servers' memory to 64 GB, a system-level fix; on the application side, we also switched to deleting in batches. For large deletions, consider using a range-partitioned table and simply TRUNCATE or DROP the partition.
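Batched deletion keeps each transaction small and bounded. A minimal sketch, with sqlite3 standing in for TiDB and a made-up table schema:

```python
import sqlite3

# Instead of one huge DELETE, remove rows in small chunks so each
# transaction stays bounded. sqlite3 stands in for TiDB here; the
# table and column names are made up for illustration.
def delete_day_in_batches(conn, day, batch_size=500):
    deleted = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM bill WHERE day = ? LIMIT ?", (day, batch_size))]
        if not ids:
            return deleted
        conn.execute(
            f"DELETE FROM bill WHERE id IN ({','.join('?' * len(ids))})", ids)
        conn.commit()  # one small transaction per batch
        deleted += len(ids)
```

With a range partition keyed on the day, the whole loop collapses to a metadata-only `ALTER TABLE ... TRUNCATE PARTITION`, which is why the text recommends partitioning for this pattern.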
The second lesson: for new TiDB business, try to use AUTO_RANDOM as the primary key. For continuous, high-volume insert scenarios, it avoids write hot spots in most cases. Also, for multi-data-center synchronization, TiDB tables need a primary key or unique index; without one, the synchronization program will OOM. And once a table already holds a large amount of data, adding a primary key afterwards is quite troublesome.
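The idea behind AUTO_RANDOM is that a few random high bits scatter otherwise-sequential ids, so consecutive inserts no longer pile onto one region. A toy model of that layout (5 shard bits in a 64-bit signed id mirrors TiDB's default, but this is an illustration, not TiDB's actual allocator):

```python
import random

# Toy model of the AUTO_RANDOM idea: random high ("shard") bits
# scatter sequential ids across the key space. Not TiDB's real
# allocator; 5 shard bits matches TiDB's default setting.
SHARD_BITS = 5

def auto_random_id(seq: int) -> int:
    """Combine a random shard prefix with a sequential suffix."""
    shard = random.getrandbits(SHARD_BITS)
    # shift past the sign bit (bit 63) so the id stays positive
    return (shard << (64 - 1 - SHARD_BITS)) | seq
```

Sequential values of `seq` now start with different high bits, so the writes spread across regions instead of all landing on the current maximum key.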
Implementing a three-site high-availability disaster recovery architecture
At the beginning, TiDB was deployed only in Guosen's Dongguan main data center as a pilot. Operations later required TiDB to be deployed for disaster recovery, and the applications had to achieve high availability and multi-active operation across three sites. Previously, each data center's applications accessed their own local TiDB. Disaster recovery drills are conducted every quarter to verify that, if the entire Dongguan main data center fails, the Shanghai remote site and the Futian same-city site remain available.
PingCAP's engineers initially proposed three plans. The first is the simplest and most straightforward: deploy a separate TiDB cluster in each of the three data centers, read and write in the Dongguan data center, and use TiCDC or Binlog to synchronize the data to the disaster recovery clusters in the other two. Its advantage is simplicity: during a drill, if the Dongguan main data center goes down, applications in the other two data centers need essentially no intervention and remain usable. Its drawback is the large number of replicas, nine in total, and potentially higher synchronization latency.
The second plan is the classic "two sites, three centers". It places high demands on the network, and if the Dongguan data center fails, the Futian data center must be recovered manually. The third is "two centers in the same city": Dongguan and Futian form one cluster and synchronize data to the Shanghai disaster recovery cluster, making two clusters in total. However, recovery under this plan during a drill is also more complex.
Figure: comparison of multi-data-center plans
After comparing the three plans, Guosen finally adopted the simplest Binlog synchronization plan, with one TiDB cluster deployed in each data center. This choice also follows from the business characteristics: Guosen's workload is mostly queries, and multiple data centers never write at the same time, so the simplest approach was adopted. During the multi-data-center rollout we did some migration and import work. At first, TiDB was deployed only in the Dongguan data center, and because we were unfamiliar with TiDB, some business tables had no primary key or unique index. After building a new TiDB cluster in the Futian data center, we found that synchronizing the two clusters made the synchronizer OOM immediately, precisely because of the missing primary key or unique index. By then the largest table had more than 6 billion rows, and adding a primary key or unique index to it directly was simply impossible.
How did we solve it? First, use Dumpling to export the table to CSV files, which are named after the table. After the export completes, create a new table in the original database with a primary key or unique index added, rename the exported CSV files to match the new table's name, then import the data into the new table through Lightning, and finally swap the names: rename the old table to a backup name and the new table to the formal name. This minimizes the impact on the business; during the export and import, users are essentially unaware.
Figure: handling tables without a primary key
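The export/re-create/rename sequence can be written down as an ordered plan, which makes the ordering constraints explicit. The Dumpling and Lightning flags shown here are illustrative assumptions (consult the tools' documentation for exact options); the database and table names are made up:

```python
# Ordered plan for migrating a table that lacks a primary key.
# Flags and paths are illustrative assumptions, not verified CLI usage.
def migration_plan(db: str, old: str, new: str) -> list[str]:
    return [
        # 1. export the old table to CSV (file names carry the table name)
        f"dumpling --filetype csv -T {db}.{old} -o /data/export",
        # 2. create the new table with a primary key / unique index added
        f"CREATE TABLE {db}.{new} (... , PRIMARY KEY (id))",
        # 3. rename the CSV files so the importer targets the new table
        f"rename {db}.{old}.*.csv to {db}.{new}.*.csv",
        # 4. bulk import into the new table
        "tidb-lightning --config tidb-lightning.toml",
        # 5. atomically swap old and new, keeping the old data as backup
        f"RENAME TABLE {db}.{old} TO {db}.{old}_bak, {db}.{new} TO {db}.{old}",
    ]
```

The final multi-table RENAME is what keeps the cutover near-instant for the business: queries hit the old table until the swap, then the new indexed table, with no copy in between.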
Exploring service observability
Finally, let's talk about Golden Sun's exploration of service observability. After an application moves to a microservice architecture, it has many deployed nodes, and the whole call chain becomes very complex, which makes locating a problem difficult. Following the industry's popular concept of "service observability", we built a problem-locating tool to assist development. This "service observability application" mainly covers three parts: logs, metrics, and the tracing chain. We enhanced the logging part, loading the systems' request and response logs into TiDB through an ETL tool, and then built visualization on top.
Figure: Golden Sun service observability
Visualization is a demand developers kept raising. Collected logs are usually viewed in the Kibana of an ELK stack, as plain text, which is very unintuitive. Our optimization imports the request and response logs of each microservice into TiDB and presents them in the "service observability" application. If a customer reports a problem, entering the customer's phone number shows intuitively what the customer did over a given period, so the problem can be located quickly. We also visualize request content and responses, which is more convenient to read. There were some problems loading into TiDB, because this workload differs from the bill: bills are basically inserted at night while users query during the day, whereas log visualization keeps inserting heavily throughout trading hours (from 9 a.m. to after 3 p.m.) with few queries, which spike only when problems occur. Since this is an operations-facing system, it was not given good disks. After going live, the whole TiDB cluster became very sluggish. At first we thought the insert or query programs were at fault and did a lot of optimization, which did not help. Finally, after upgrading the disks, performance improved immediately. Our lesson: if you adopt TiDB, be sure to choose good disks to guarantee processing efficiency.
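The insert-heavy side of this log pipeline benefits from batching: buffer request/response logs and flush them in chunks rather than one INSERT per log line. A minimal sketch, with sqlite3 standing in for TiDB and a made-up schema:

```python
import sqlite3

# Buffer request/response logs and flush them in batches.
# sqlite3 stands in for TiDB; the svc_log schema and field
# names are made up for illustration.
def flush_logs(conn, buffer, batch_size=500):
    """Write buffered (phone, service, req, resp, ts) rows in chunks."""
    inserted = 0
    for i in range(0, len(buffer), batch_size):
        chunk = buffer[i:i + batch_size]
        conn.executemany(
            "INSERT INTO svc_log (phone, service, req, resp, ts) "
            "VALUES (?, ?, ?, ?, ?)", chunk)
        conn.commit()
        inserted += len(chunk)
    return inserted
```

An index on the phone column then makes the "enter the customer's phone number" lookup cheap even while inserts continue during trading hours.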