Abstract: Discussions of MySQL performance optimization usually focus on tuning SQL and indexes to improve query performance, since most products and websites mainly face high-concurrency read workloads. But how do you optimize for the scenario of massive data writes? This article introduces optimization techniques for write-heavy scenarios.
In general, MySQL write performance is limited mainly by the database configuration, the operating system, and disk I/O. The main optimization methods include the following:
1. Adjust database parameters
(1) innodb_flush_log_at_trx_commit
This parameter controls how the database flushes the transaction log at commit. The default value is 1. The optional values are:
- 0: the log buffer is written to the log file and flushed to disk once per second; nothing is done at transaction commit.
- 1: at every transaction commit, the log buffer is written to the log file and flushed to disk.
- 2: at every commit, the log buffer is written to the log file, but the file is not flushed to disk; the log file is flushed once per second.
Some will ask: is it unsafe to change this to a value other than 1? The safety comparison is as follows:
The MySQL manual recommends setting this parameter to 1 to guarantee transaction durability and consistency. The factory default is 1, which is also the safest setting.
- When innodb_flush_log_at_trx_commit and sync_binlog are both 1, the configuration is safest: if the mysqld process or the server host crashes, at most one statement or transaction is lost from the binary log. However, this causes frequent I/O operations, so it is also the slowest mode.
- When innodb_flush_log_at_trx_commit is set to 0, a crash of the mysqld process loses all transaction data from the last second.
- When innodb_flush_log_at_trx_commit is set to 2, all transaction data from the last second can be lost only if the operating system crashes or the machine loses power.
For the same table, batch inserts were performed from C# code following the system's business process. The performance comparison:
- (a) under identical conditions, innodb_flush_log_at_trx_commit = 0: inserting 500,000 rows took 25.08 seconds;
- (b) under identical conditions, innodb_flush_log_at_trx_commit = 1: inserting 500,000 rows took 17 minutes 21.91 seconds;
- (c) under identical conditions, innodb_flush_log_at_trx_commit = 2: inserting 500,000 rows took 1 minute 0.35 seconds.
Conclusion: with the value 0, data writing is fastest and write performance improves dramatically, but up to the last second of data may be lost.
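As a sketch of the trade-off above: innodb_flush_log_at_trx_commit is a dynamic variable, so it can be relaxed just for the duration of a bulk load and then restored (requires a privileged account):

```sql
-- Relax redo-log flushing for the duration of a bulk load.
-- 2 = write the log at each commit, flush to disk only once per second;
-- at most ~1 second of transactions is at risk if the OS crashes.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- ... run the bulk insert here ...

-- Restore the safest setting once the load is finished.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
```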
(2) tmp_table_size and max_heap_table_size
These two parameters mainly affect writes to temporary tables and MEMORY engine tables. If they are set too small, writes may even fail with a "table is full" error. Based on the actual business, set them larger than the amount of data that needs to be written.
(3) max_allowed_packet=256M, net_buffer_length=16M, set autocommit=0
Setting these three parameters appropriately during backup and restore can significantly speed both up!
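A minimal sketch of applying these settings around a restore in the mysql client (the dump path is illustrative; max_allowed_packet is given in bytes, and net_buffer_length is typically applied on the dump side via mysqldump's --net-buffer-length option so each multi-row INSERT statement is larger):

```sql
-- Allow large statements/packets during the restore (256M, as above).
SET GLOBAL max_allowed_packet = 268435456;

-- Batch the restored rows instead of committing after every INSERT.
SET autocommit = 0;
SOURCE /path/to/backup.sql;  -- illustrative path
COMMIT;
```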
(4) Tablespace auto-extension (innodb_autoextend_increment)
The AUTOEXTEND attribute on a tablespace lets it grow automatically when it runs out of space; the default increment is only about 10 MB, which is not enough. In massive-write scenarios it is worth increasing this parameter so that each extension allocates as much space as possible at once, avoiding frequent file extension during bulk writes.
(5) innodb_log_file_size, innodb_log_files_in_group, innodb_log_buffer_size
These set the size of the transaction (redo) log files, the number of files in the log group, and the log buffer. The defaults are small: innodb_log_file_size defaults to only a few tens of megabytes, and innodb_log_files_in_group defaults to 2.
In InnoDB, data is normally written first to the cache, then to the transaction log, and only later to the data files. If the log is too small, massive writes will frequently trigger checkpoints that flush logged changes to the disk data files; the constant buffer flushing and log switching degrade bulk-write performance.
Of course, it should not be too large either. If the database goes down abnormally, on restart it must replay the dirty data in the log that was never written to the data files in order to recover. The larger the log, the longer this recovery takes; when recovery time far exceeds what users expect to accept, complaints inevitably follow.
For reference, consider Huawei Cloud's default database settings. On a 2-core 4 GB instance, the defaults appear to be innodb_log_buffer_size = 16M and innodb_log_file_size = 1G, close to MySQL's official recommendation of about 25% of total memory, with innodb_log_files_in_group set to 4.
Even on that modest 2-core 4 GB hardware, thanks to sensible parameter settings it can sustain thousands of read/write requests per second, more than 80,000 requests per minute.
If far more data is written than read, or if it is convenient to change parameters at will, you can raise innodb_log_file_size before importing a large amount of data; it can be set to 25%–100% of innodb_buffer_pool_size.
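Putting the redo-log advice together, a my.cnf sketch (values are illustrative, modeled on the Huawei Cloud defaults quoted above; innodb_log_file_size is not dynamic, so changing it requires a restart):

```ini
[mysqld]
# Redo log: larger files reduce checkpoint pressure during bulk writes,
# but lengthen crash recovery. ~25% of innodb_buffer_pool_size is a
# common starting point.
innodb_log_file_size      = 1G
innodb_log_files_in_group = 4
# Log buffer: batches redo records in memory before each write/flush.
innodb_log_buffer_size    = 16M
```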
(6) innodb_buffer_pool_size
This sets the size of the InnoDB buffer pool, MySQL's main cache. In theory it can be set as high as 80% of the server's total memory. Larger values generally mean better write performance; note that the innodb_log_file_size guidance above is expressed relative to innodb_buffer_pool_size.
(7) innodb_thread_concurrency
As the name suggests, this controls the number of concurrent threads. In theory, more threads mean faster writes, but it should not be set too high; the official recommendation is about twice the number of CPU cores.
(8) Per-session write buffer
This parameter controls the buffer size for a single write in a single session. The default is around 4 KB, and it normally does not need adjusting; but under frequent bulk writes, raising it to 2 MB can noticeably improve write speed.
(9) innodb_buffer_pool_instances
The default value is 1. This sets the number of buffer pool instances — put simply, how many pools InnoDB can read and write concurrently. In massive-write scenarios, increasing this parameter can also bring a significant performance improvement.
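The buffer-pool settings above might look like this in my.cnf for a hypothetical 16 GB dedicated database server with 8 cores (all sizes are illustrative, following the ~80% guideline and the "twice the cores" rule of thumb):

```ini
[mysqld]
# Buffer pool: up to ~80% of RAM on a dedicated database server.
innodb_buffer_pool_size      = 12G
# Split the pool into several instances to reduce contention
# under concurrent reads and writes.
innodb_buffer_pool_instances = 4
# Roughly twice the CPU core count (0 means unlimited).
innodb_thread_concurrency    = 16
```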
(10) Temporarily disable the binary log
The binary log records all insert, update, and delete operations on the database. During a large data load, such as a database restore, you can temporarily turn off binary logging so that data is written only to the data files; once the fast restore is complete, turn it back on.
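For a restore running in a single session, binary logging can be suppressed without touching the server-wide setting by using the session variable sql_log_bin (requires sufficient privileges; a sketch, with an illustrative dump path):

```sql
-- Skip binary logging for statements in this session only.
SET sql_log_bin = 0;
SOURCE /path/to/backup.sql;  -- illustrative path
-- Re-enable binary logging for this session.
SET sql_log_bin = 1;
```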
2. Reduce disk I/O and improve disk read/write efficiency
It includes the following methods:
(1) Database system architecture optimization
a: Master-slave replication
For example, deploy a dual-master topology in which the two servers back each other up, keeping the data safe. Different business systems connect to different database servers; combined with the automatic failover features of Nginx or Keepalived, this achieves load balancing and automatic switchover on failure.
Through this kind of architecture optimization, the concurrent read/write I/O of a distributed business system is spread from one server across several, which also improves the write speed of each individual database.
b: Read/write separation
As with the considerations in (a), this reduces the disk I/O of any single server. Backup operations can also be moved from the primary to a standby server, reducing I/O pressure on the primary and thereby improving write performance.
(2) Hardware optimization
a: With limited resources, use multiple disks when installing and deploying: distribute the application, database files, log files, and so on across different disks to reduce the I/O load on each disk and so improve per-disk write performance.
b: Use SSDs
If resources allow, use SSD storage. SSDs write at high speed and significantly improve all disk I/O operations.
Of course, there are many more hardware and software optimization methods, which will not all be listed here.
This article is shared from Huawei cloud community “MySQL mass write performance optimization”, original author: floating dust.