If you need to regularly clean up the historical data in an ordinary large table.
You can use one or more delete statements with a where condition (where condition is time). If the amount of table data is large, it will cause great pressure on the database. Even if we delete the old data, the underlying data file does not get smaller.
Why didn’t it get smaller?
When data is deleted, MySQL does not immediately reclaim the table space. The storage space occupied by the deleted data and the index bit will be empty, waiting for new data to make up for the vacancy.
Forced recycling: optimize table
Facing this kind of problem, the most effective way is to use partition table. The biggest advantage of partition table is that it can clean up historical data very efficiently.
About partition tables
Partitioning tables is not done at the storage engine level. This is a feature supported by MySQL (starting from 5.1)
Decompose the table index into smaller, more manageable parts.
Logically, there is only one table or index, but physically, the table or index may consist of dozens of physical partitions.
The biggest advantage of partition table is that it can clean up historical data very efficiently.
Each partition is an independent object, which can be processed by itself or as part of a larger object (if the partition table is large, the partition can also be allocated to different disks).
When executing the query, the optimizer will filter the partitions that do not have the data we need according to the partition definition. In this way, the query does not need to scan all the partitions in the whole table, but only find the partitions that contain the data we need.
Check whether the partition function is enabled
mysql> SHOW PLUGINS \G; *************************** 43. row *************************** Name: partition Status: ACTIVE Type: STORAGE ENGINE Library: NULL License: GPL
MySQL currently only supports horizontal partitioning (horizontal partitioning is to allocate different rows to different physical files).
Range partition (range)
Row data is put into partitions based on the values of a given continuous interval.
List partition (list)
Similar to the range partition, it is only oriented to discrete values.
Hash partition (hash)
The partition is distinguished according to the value returned by the user-defined expression.
Partition according to the hash function provided by MySQL database.
You can partition the values of multiple columns（ MySQL 5.5 + starts support).
This is the most common partition type. The most common way to partition is based on the time field (partition based columns are preferably integer).
The column of partition can allow null value. If the column value of partition is null, the first partition will be selected.
CREATE TABLE range_partition_test ( id INT, pdate INT ) PARTITION BY RANGE (pdate) ( PARTITION p1 VALUES LESS THAN ( 201702 ), PARTITION p2 VALUES LESS THAN ( 201703 ), PARTITION p3 VALUES LESS THAN ( 201704 ), PARTITION p4 VALUES LESS THAN (MAXVALUE) );
MAXVALUEIs an infinite value, so the P4 partition is the default partition.
When executing a query, bring the partition field so that you can use the partition clipping function.
mysql> select * from range_partition_test; +------+--------+ | id | pdate | +------+--------+ | 1 | 201701 | | 2 | 201702 | | 3 | 201703 | | 4 | 201704 | | 5 | 201705 | +------+--------+ mysql> explain partitions select * from range_partition_test where pdate between 201702 and 201703; +----+-------------+----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | range_partition_test | p2,p3 | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where | +----+-------------+----------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
Only P2 and P3 partitions are queried.
The list partition is similar to the range partition.
The difference is that list is a set of enumeration value lists, and range is a set of continuous interval values. They are very similar in grammar.
It is recommended that the list partition column be a non null column. Otherwise, insert null value. If there is no null value in the enumeration list, the insert will fail (different from range partition).
CREATE TABLE list_partition_test ( id INT, pdate INT ) PARTITION BY LIST (pdate) ( PARTITION p1 VALUES IN (1,3,5,7,9), PARTITION p2 VALUES IN (2,4,6,8,0) );
The purpose of hash partition is that the data can not be evenly divided into predefined partitions. Ensure that the number of records in each partition is basically the same.
In the actual work, we often encounter this kind of table like membership table. There is no obvious characteristic field that can be partitioned. But the table data is very large. In this case, hash partition can be used.
Based on the given number of partitions, data is allocated to different partitions. Hash partition can only hash integer.
CREATE TABLE hash_partition_test ( id INT, pdate INT ) PARTITION BY HASH(id) PARTITIONS 4;
- The above partition object (ID) can also be an expression, and the result of the expression must be an integer value.
- Hash partitions do not need to specify the partitions clause, and the default number of partitions is 1.
- It is not allowed to write only parts without specifying the number of partitions.
- The underlying implementation of hash partition is actually based on mod function.
Key partition is similar to hash partition. The differences are as follows:
- Key partition allows multiple columns, while hash partition only allows one column.
- If there is a primary key or a unique key, the partition column in the key may not be specified. By default, it is the primary key or the unique key. If there is no primary key or unique key, the column must be specified explicitly.
- The key partition object must be a column, not a column based expression.
- The algorithm of key partition is different from hash partition. For InnoDB engine, MD5 value is used to partition.
You can use non integer data to partition directly. Partitions are obtained by direct comparison of types, and need not be converted to integers. At the same time, multiple column values can be partitioned.
CREATE TABLE listvardou ( id INT, pdate INT ) PARTITION BY LIST COLUMNS(id,pdate) ( PARTITION a VALUES IN ( (1, 201701), (1, 201702), (1, 201703)), PARTITION b VALUES IN ( (2, 201702) ) PARTITION b VALUES IN ( (3, 201703) ) );
- Range partition, list partition, hash partition and key partition objects can only return integer values. If they are not integer values, functions should be used to convert them to integer values.
- If the data table is too large to be put into memory, or only the last part of the table has hot data and other data are historical data, partition table can be selected.
- Partition table data is easier to maintain (partition can be optimized, checked, repaired and deleted in batches independently, big data can be in the form of drop partition, etc.).
- The data of partition table can be distributed on different physical devices, so that multiple hardware devices can be used efficiently.
- Independent partitions can be backed up and restored, which is very suitable for large data set scenarios.
- The main purpose of partition is to manage the high availability of database.
This work adoptsCC agreementReprint must indicate the author and the link of this article