Order by sorting optimization

Time:2021-6-14

In daily business development, order by is indispensable. But to write efficient sorting SQL, we need to spend some energy and time to understand the underlying principle of sorting, so as to find a good strategy to optimize the sorting.

How to sort

Index (index sort, best performance)

Use index fields to sort as much as possible

Filesort (file sort)

2.1 two way sorting

Before MySQL 4.1, we scan the disk twice to get the data
First read the row pointer and order by column from the disk, and sort them. Then scan the ordered list, and read them from the list again according to the value in the list (read from the disk again). To scan the disk twice, I / O is very time-consuming.

2.2 one way sorting

After MySQL 4.1, a better sorting algorithm is added
Read all the columns needed by the query from the disk, sort them in the buffer according to the order by column, and then scan the sorted list for output
It's faster, avoids a second read (from disk) and changes random I / O to sequential I / O
But it uses too much space because it keeps every row in memory
Deficiencies:
In sort_ In buffer, single channel algorithm takes up more space than double channel algorithm
Because the one-way algorithm takes out all the fields, it is possible that the total size of the data taken out exceeds that of sort_ The capacity of the buffer (MySQL allocates a block of memory to each thread for sorting) causes that only sort can be taken each time_ Buffer capacity of the size of the data, sort (create TMP file, multi-channel merge), row out again
sort_ The buffer capacity is too small, so it can be used for multiple I / O operations
This idea saves one I / O operation, but leads to a large number of I / O operations, which is not worth the loss.
The conditions for using one way sorting are as follows:
1. The total size of the field types retrieved by the query statement should be less than   max_ length_ for_ sort_ data
2. The sort field does not contain text and blob types

Optimization strategy

When order by, only the required fields are queried

1. When the total field size of query is less than max_ length_ for_ sort_ Data, and the sort field is not of text blob type, the single way sort algorithm will be used, otherwise the multi way sort algorithm will be used
2. The data of both algorithms may exceed that of sort_ If the buffer capacity is exceeded, TMP files will be created to merge and sort, resulting in multiple I / O. however, the risk of using one-way sorting is greater, so we need to improve sort_ buffer_ size

Try to improve sort_ buffer_ size

No matter which algorithm is used, increasing this parameter will improve efficiency
It should be improved according to the system's own ability, because this parameter is for each process

Try to increase max_ length_ for_ sort_ data

Increasing this parameter will increase the probability of using the improved algorithm.
However, if it is set too high, the total data capacity exceeds sort_ buffer_ The obvious symptoms are high disk I / O activity and low processor utilization

example

data sheet

*************************** ***************************
Table: user
Create Table: CREATE TABLE `user` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
`age` int(10) NOT NULL DEFAULT '0',
`city` varchar(20) NOT NULL,
`addr` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_name_age_city` (`name`,`age`,`city`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Order by can use the leftmost prefix of the index

* select id,name,age,city from user order by name;
* select id,name,age,city from user order by name,age,city;
* explain select id,name,age,city from user order by name desc,age desc,city desc;

Order by sorting optimization

Order by sorting optimization

Order by sorting optimization

If where is defined as a constant with the leftmost prefix of the index, order by can use the index

* select * from user where name = 'zhangsan' order by age,city;
* select * from user where name = 'zhangsan' and age = 20 order by city;
* select * from user where name = 'zhangsan' and age > 20 order by age,city;

Order by sorting optimization

Order by sorting optimization

Order by sorting optimization

Index cannot be used for sorting

select * from user order by name,age,city;// Query * field
select * from user order by addr;// Non index field sorting
select * from user order by name,addr;// Contains non indexed fields
select * from user where age = 20 order by city;// Skipping the name field, violating the leftmost prefix rule
select * from user where name = 'zhangsan' order by city;// Skipping the age field, violating the leftmost prefix rule
select * from user where name = 'zhangsan' order by age,addr;// Contains non indexed fields

Order by sorting optimization

Order by sorting optimization

Order by sorting optimization

Order by sorting optimization

Order by sorting optimization

This work adoptsCC agreementReprint must indicate the author and the link of this article

Aduh