There is a large amount of data and paging query is very slow. What is the optimization scheme?

Time:2021-8-2

Read by little hub:

When the number of pages is large, the query efficiency drops sharply. Is there any way to optimize it? Finish reading this article!


Author: youyou I

Source: cnblogs.com/youi/p/7851007.html

  • preparation
  • General paging query
  • Use subquery optimization
  • Use ID qualification optimization
  • Use temporary table optimization
  • ID description of data table
    • *

When there are tens of thousands of records in the table to be queried from the database, all the results of one-time query will become very slow, especially with the increase of data volume. At this time, pagination query needs to be used. There are also many methods and optimization points for database paging query. Here are some methods I know.

preparation

In order to test some of the optimizations listed below, the following describes an existing table.

  • Table name: order_ history
  • Description: the order history table of a business
  • Main fields: unsigned int ID, tinyint (4) int type
  • Field status: there are 37 fields in the table, excluding large data such as text. The maximum is varchar (500), and the ID field is index and incremental.
  • Data volume: 5709294
  • MySQL version: it is not easy to find a million level test table offline on 5.7.16. If you need to test yourself, you can write shell scripts or insert data for testing. The execution environment of all the following SQL statements has not changed. The following are the basic test results:
select count(*) from orders_history;

Return result: 5709294

The three query times are:

  • 8903 ms
  • 8323 ms
  • 8401 ms

General paging query

General paging queries can be implemented using a simple limit clause. The limit clause is declared as follows:

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset

The limit clause can be used to specify the number of records returned by the select statement. Note the following:

  • The first parameter specifies the offset of the first return record line. Note from0start
  • The second parameter specifies the maximum number of record rows to return
  • If only one parameter is given: it indicates that the maximum number of record rows is returned
  • The second parameter is – 1, which means to retrieve all record lines from an offset to the end of the recordset
  • The offset of the initial record line is 0 (not 1)

The following is an application example:

select * from orders_history where type=8 limit 1000,10;

This statement will be from the table orders_ Query in historyoffset: 100010 pieces of data after the start, that is, 1001 to 1010 pieces of data(1001 <= id <= 1010)。

The records in the data table are sorted by primary key (generally ID) by default. The above results are equivalent to:

select * from orders_history where type=8 order by id limit 10000,10;

The three query times are:

  • 3040 ms
  • 3063 ms
  • 3018 ms

For this query method, the following tests the impact of query records on time:

select * from orders_history where type=8 limit 10000,1;
select * from orders_history where type=8 limit 10000,10;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 10000,1000;
select * from orders_history where type=8 limit 10000,10000;

The three query times are as follows:

  • Query one record: 3072ms, 3092ms, 3002ms
  • Query 10 records: 3081ms, 3077ms, 3032ms
  • Query 100 records: 3118ms 3200ms 3128ms
  • Query 1000 records: 3412ms, 3468ms, 3394ms
  • Query 10000 records: 3749ms 3802ms 3696ms

In addition, I have made more than ten queries. From the perspective of query time, it can be basically determined that when the number of query records is less than 100, there is basically no gap in query time. With the increasing number of query records, more and more time will be spent.

Test for query offset:

select * from orders_history where type=8 limit 100,100;
select * from orders_history where type=8 limit 1000,100;
select * from orders_history where type=8 limit 10000,100;
select * from orders_history where type=8 limit 100000,100;
select * from orders_history where type=8 limit 1000000,100;

The three query times are as follows:

  • Query 100 offset: 25ms 24ms
  • Query 1000 offset: 78ms 76ms 77ms
  • Query 10000 offset: 3092ms 3212ms 3128ms
  • Query 100000 offset: 3878ms 3812ms 3798ms
  • Query 1000000 offset: 14608ms, 14062ms, 14700ms

With the increase of query offset, especially after the query offset is greater than 100000, the query time increases sharply.

This paging query method will start scanning from the first record in the database, so the later it goes, the slower the query speed, and the more data it queries, the slower the total query speed.

Use subquery optimization

This method first locates the ID of the offset position, and then queries it later. This method is applicable to the case where the ID increases.

select * from orders_history where type=8 limit 100000,1;

select id from orders_history where type=8 limit 100000,1;

select * from orders_history where type=8 and
id>=(select id from orders_history where type=8 limit 100000,1)
limit 100;

select * from orders_history where type=8 limit 100000,100;

The query time of the four statements is as follows:

  • Statement 1: 3674ms
  • Clause 2: 1315ms
  • Clause 3: 1327ms
  • Clause 4: 3710ms

For the above query, please note:

  • Compare the first statement with the second statement: the speed of using select id instead of select * is increased by three times
  • Compare the second statement with the third statement: the speed difference is tens of milliseconds
  • Compare the third statement with the fourth statement: thanks to the increase in the speed of select id, the query speed of the third statement is increased by three times

Compared with the original general query method, this method will be several times faster.

Use ID qualification optimization

In this way, it is assumed that the ID of the data table isContinuous incrementIf yes, we can calculate the range of query ID according to the number of pages and records queried. You can use ID between and to query:

select * from orders_history where type=2
and id between 1000000 and 1000100 limit 100;

Query time: 15ms, 12ms, 9ms

This query method can greatly optimize the query speed and can be completed within tens of milliseconds. The limitation is that it can only be used when the ID is clearly known. However, when creating a table, the basic ID field will be added, which brings a lot of convenience to paging query.

There can also be another way to write:

select * from orders_history where id >= 1000001 limit 100;

Of course, you can also use the in method to query. This method is often used to query when multiple tables are associated. You can use the ID set of other table queries to query:

select * from orders_history where id in
(select order_id from trade_2 where goods = 'pen')
limit 100;

Note that some MySQL versions do not support the use of limit in the in clause.

Use temporary table optimization

This method does not belong to query optimization, which is mentioned here.

For the problem of using ID to limit the optimization, the ID needs to be continuously increased. However, in some scenarios, such as when using the history table or when there is a data loss problem, you can consider using the temporarily stored table to record the paging ID and using the paging ID for in query. This can greatly improve the speed of traditional paging query, especially when the amount of data is tens of millions.

ID description of data table

In general, when creating tables in the database, the ID increment field is forced to be added to each table to facilitate query.

If the amount of data such as order library is very large, it is generally divided into databases and tables. At this time, it is not recommended to use the ID of the database as the unique ID, but use the distributed high concurrency unique ID generator to generate it, and use another field in the data table to store the unique ID.

Using the range query to locate the ID (or index) first, and then using the index to locate the data can improve the query speed several times. That is, select id first, and then select *;

I have little talent and learning, and I will inevitably make mistakes. If I find any mistakes and omissions in the article, I hope you will give me advice.


Recommended reading:

Great, springboot + Vue front and rear end separation complete introductory tutorial!

Share a set of springboot development blog system source code and complete development documents! Speed save!

The 100 Java open source projects worth learning on GitHub, covering various technology stacks!

The latest frequently asked enterprise interview questions and answers in 2020