A friend received a request from the big data platform: a table of more than 2 billion rows had to be loaded into MySQL quickly so the business could display it the next day.
2、 Implementation analysis
We have relatively little operational experience with a single MySQL table of 2 billion rows. For single-table InnoDB inserts, if memory is larger than the data, throughput can be held at 100,000-150,000 rows per second. Most of the time, however, the incoming data exceeds memory, so here we run a test with XeLabs TokuDB.
3、 Introduction to xelabs tokudb
Optimizations relative to the official TokuDB:
- Built-in jemalloc memory allocation;
- More built-in TokuDB performance metrics;
- Support for XtraBackup backups;
- The zstd compression algorithm;
- Support for the tokudb_group_commit binlog feature.
4、 Test table
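The original table definition was not preserved in this copy. A minimal sketch of a comparable test table (the table and column names here are hypothetical) might look like:

```sql
-- Hypothetical test table; the original DDL was not preserved.
-- ROW_FORMAT selects TokuDB's per-table compression; tokudb_zstd
-- is the zstd format added by the XeLabs build.
CREATE TABLE user_summary (
  user_id     BIGINT NOT NULL,
  account_id  BIGINT NOT NULL,
  balance     DECIMAL(16,2) NOT NULL DEFAULT 0,
  created_at  DATETIME NOT NULL,
  UNIQUE KEY uk_user (user_id)
) ENGINE=TokuDB
  DEFAULT CHARSET=utf8
  ROW_FORMAT=tokudb_zstd;
```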
Tokudb core configuration:
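The configuration itself is missing from this copy; the fragment below is an illustrative sketch. The variable names follow the Percona/XeLabs TokuDB documentation, but the values are assumptions, not the author's settings:

```ini
# Hypothetical my.cnf fragment; values are illustrative only.
[mysqld]
# TokuDB cache, typically sized to a large share of RAM
tokudb_cache_size         = 4G
# Bypass the OS page cache for TokuDB data files
tokudb_directio           = 1
# Default per-table compression (zstd is XeLabs-specific)
tokudb_row_format         = tokudb_zstd
# Memory reserved for the bulk loader
tokudb_loader_memory_size = 1G
```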
Write data using load data:
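The original statement was not preserved; a sketch of such a load, assuming a tab-separated input file (path and column list are hypothetical):

```sql
-- Illustrative only. Loading into an empty TokuDB table
-- allows the bulk loader path to be used.
LOAD DATA INFILE '/data/user_summary.txt'
INTO TABLE user_summary
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(user_id, account_id, balance, created_at);
```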
Calculate the write speed per second:
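The arithmetic behind the per-second figure, using the roughly 58-minute total reported for the 2 billion-row load:

```sql
-- 2 billion rows in roughly 58 minutes:
SELECT ROUND(2000000000 / (58 * 60)) AS rows_per_second;
-- ≈ 574713, consistent with the ~570,000/s observed
```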
The source file is 8.5 GB, while the data written to TokuDB occupies only 3.5 GB: compression to less than half the original size. The full 2 billion-row load completed in a little over 58 minutes, which meets the business requirement. For comparison, on machines with good disk IO (SSDs, or high-speed cloud disks) and memory comparable to the data size, loading the same volume into InnoDB (with an auto-increment column added) takes a little over three hours. In practice, for the same data, InnoDB takes roughly 3-4 times as long as TokuDB. File sizes for the same 2 billion rows also differ: the InnoDB file is about 5 times the size of the TokuDB file.
Using TokuDB in a cloud environment with 8 cores, 8 GB of memory, and a 500 GB high-speed cloud disk, about 570,000 writes per second was easily achieved across multiple tests.
Several other scenarios were also tested for reference. If TokuDB uses an auto-increment primary key and the key column carries no values in the input, the write speed inside MySQL drops noticeably. By comparison, the same 200 million rows were written with an auto-increment primary key.
For the same data, when the auto-increment primary key has no value in the input, TokuDB's bulk loader cannot be used; the load degrades to the equivalent of row-by-row inserts, which is why it is so slow.
A prerequisite of the TokuDB bulk loader is that the target table is empty. It still works with an auto-increment column, provided the column carries values in the input. In actual use, if the auto-increment column already has values, consider removing the AUTO_INCREMENT attribute and replacing the key with a unique index; this drops some of the auto-increment handling logic and lets TokuDB run faster. Note also that during a bulk load, compression is weaker: the loader trades compression ratio for faster writes.
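The suggestion above can be sketched as follows (the table and column names are hypothetical):

```sql
-- Remove the AUTO_INCREMENT attribute before a bulk load and
-- enforce uniqueness with an ordinary unique index instead,
-- keeping the bulk loader path usable.
ALTER TABLE t_load
  MODIFY id BIGINT NOT NULL,   -- was: BIGINT AUTO_INCREMENT
  DROP PRIMARY KEY,
  ADD UNIQUE KEY uk_id (id);
```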
About the TokuDB bulk loader:
5、 Test environment description
Tests were run in a CentOS 7 environment.
Author: Wu Bingxi