Detailed explanation of total record optimization of count query in paging component

Time:2020-3-29

1 background

When using the paging function of mybatis plus (hereinafter referred to as MBP). We found a paging optimization class of jsqlparsercountoptimize. The official didn’t give a detailed introduction to it. We didn’t find any words to analyze this kind of logic on the Internet, which we dare not use. It’s also convenient for others.

2 Principle

First of all, the principle of paginationinterceptor is not covered here (the implementation principle of mybatis general pagination encapsulation is quite simple, which is the same thing). Finally, the query is basically divided into two SQL: total count records + real pagination records. This class uses optimization to check the count. How to optimize the count query? Here’s a real scenario to help you understand: if there are two tables, user, user address and user account, which record user and user address and user account respectively, a user may have multiple addresses, i.e. 1-to-many relationship; a user can only have one account, i.e. 1-to-1 relationship.

2.1 optimize order by

First look at the following SQL and put it under paging query

select * from user order by age desc, update_time desc

Traditional paging components are often

Check count: 
select count(1) from (select * from user order by age desc, update_time desc)
Record:
select * from user order by age desc, update_time desc limit 0,50

Any problems? The order by can be completely removed when checking the count! In the case of complex query, large table, sorting of non index fields and so on, it’s very slow to query records. Count again! So it’s obvious that we want to optimize count toselect count(1) from (select * from user)

2.1.1 restriction

But not all scenes can be optimized, such as withgroup byQueries

2.1.2 source code

So the MBP source code is implemented as follows. If there is no group by and order by statement, remove the order by

//Add include groupby without removing orderby
if (null == groupBy && CollectionUtils.isNotEmpty(orderBy)) {
        plainSelect.setOrderByElements(null);
        sqlInfo.setOrderBy(false);
}

2.2 optimize the join scenario

In the join operation, there is also the possibility of optimization. See the following SQL

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid

In this case, you can delete the left join direct query user when querying the count page by page, because the relationship between user and user account is 1-to-1, as follows

Check count: 
select count(1) from user u
Record: 
select u.id,ua.account from user u left join user_account ua on u.id=ua.uid limit 0,50

2.2.1 restriction

There are many restrictions on whether count can remove the join direct query of the first table, as follows:

The number of records cannot be enlarged after the table records the join

From the above case, if the number of records after left join is larger than the total number of records in the first direct query table, this optimization cannot be carried out. For example, three users each record two addresses

Select u.id, UA. Address from user u left join user [address UA on u.id = ua.uid (6)
vs
Select count (1) from user U (3)

In this case, if you remove the left join and check the count, you will get less total records.Notice that this could turn into a pitMBP cannot automatically determine whether the paging query will enlarge the records, so the join optimization is turned off by default. If you want to turn on the jsqlparsercountoptimize bean that needs to be declared and customized, set the optimizejoin to true, as follows

@Bean
    public PaginationInterceptor paginationInterceptor() {
        PaginationInterceptor paginationInterceptor = new PaginationInterceptor();
        paginationInterceptor.setCountSqlParser(new JsqlParserCountOptimize(true));
        return paginationInterceptor;
    }

In fact, the source code design here is a little unreasonable, because after opening, you have to carefully review the paging codes of your various left joins. If there is amplification, when you can only construct page objects, set the optimizecountsql to false (the default is true), which is equivalent to turning off all count optimizations in this query. Then not only join, but also order by and other optimizations will not be carried out. It is recommended to obtain the optimizejoin from page (or ThreadLocal?) instead, and change it to the configuration that can be configured for each query level, which is off by default. Only after the developer confirms that the join can be optimized can the active setting of this query level be turned on.

Left join only

If it is an inner join or a right join, the number of records will always be enlarged, so MBP optimization will automatically judge that if there is any non left join in multiple joins, this optimization will not be carried out, such asfrom a left join b .... right join c... left join dAt this time, the optimization will not be carried out directly

On statement has query conditions

such as

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid and ua.account > ?

Where statement contains conditions for joining tables

such as

select u.id,ua.account from user u left join user_account ua on u.id=ua.uid where ua.account > ?

2.2.2 source code

MBP’s join optimization source code is roughly as follows, corresponding to the above optimization and limitations

List joins = plainSelect.getJoins();
//Whether optimizejoin is enabled globally (it is recommended that you can also set it from page by query)
if (optimizeJoin && CollectionUtils.isNotEmpty(joins)) {
    boolean canRemoveJoin = true;
    String whereS = Optional.ofNullable(plainSelect.getWhere()).map(Expression::toString).orElse(StringPool.EMPTY);
    for (Join join : joins) {
            //Left join only
            if (!join.isLeft()) {
                    canRemoveJoin = false;
                    break;
            }
            Table table = (Table) join.getRightItem();
            String str = Optional.ofNullable(table.getAlias()).map(Alias::getName).orElse(table.getName()) + StringPool.DOT;
            String onExpressionS = join.getOnExpression().toString();
            /*If the join contains? (represents that the on statement has query conditions) 
            perhaps 
            Where statement contains conditions for joining tables
            Do not remove the join*/
            if (onExpressionS.contains(StringPool.QUESTION_MARK) || whereS.contains(str)) {
                    canRemoveJoin = false;
                    break;
            }
    }
    if (canRemoveJoin) {
            plainSelect.setJoins(null);
    }
}

2.3 optimize the location of select count (1)

In traditional pagination, select count (1) is often set in the outer layer of the original query SQL, such as

select count(1) from (select * from user)

The real purpose of count is to get the number of records, which does not need the data in the original queryselect *It takes extra time, so it can be optimized as follows

select count(1) from user

2.3.1 restriction

In the same way, there are some scenarios where count location cannot be optimized

The field of select contains parameters

This optimization can not be carried out if the select contains {}, ${} parameters waiting for replacement, because the subsequent stage of placeholder replacement of real values will result in errors due to the decrease of the number of placeholders, such as

select count(1) from (select power(#{aSelectParam},2) from user_account where uid=#{uidParam}) ua
vs
select count(1) from user_account where uid=#{uidParam} ua

MBP official issue Chen 95 registered this issue

Include distinct

Select contains the statement of distinct de duplication, which may increase the number of count records, so this optimization cannot be carried out. such as

select count(1) from (select distinct(uid) from user_address) ua
vs
Select count (1) from user "address UA" the number of records may increase

Include group by

For statements containing group by, because there are always aggregation functions in select, the built-in semantics of count (1) becomes aggregation functions, which cannot be optimized. such as

Select count (1) from (select uid, count (1) from user "address group by uid) UA returns the total number of records in a single row and column
vs
Select count (1) from user "address group by uid"

2.3.2 source code

The related source code in MBP is as follows

//The parameter contained in the field of select is not optimized
for (SelectItem item : plainSelect.getSelectItems()) {
        if (item.toString().contains(StringPool.QUESTION_MARK)) {
                return sqlInfo.setSql(SqlParserUtils.getOriginalCountSql(selectStatement.toString()));
        }
}
//Include distinct, groupby not optimized
if (distinct != null || null != groupBy) {
        return sqlInfo.setSql(SqlParserUtils.getOriginalCountSql(selectStatement.toString()));
}
...
//To optimize SQL, count select item is actually a select count (1) statement
plainSelect.setSelectItems(COUNT_SELECT_ITEM);

3 Summary

In fact, this paper aims at some optimization ideas for the step of checking the number of count records in the general paging component

  • Optimize order by
  • Optimize join statement
  • Optimize the location of select count (1)
  • Pay attention to the restrictions corresponding to the above optimization, otherwise business errors may be caused (especially the join optimization, which is relatively hidden)

In fact, it’s not limited to MBP. You can also try to use the customized page blocker, which has a significant effect on page optimization

“Used to record the evolution of life, the iteration of story. Looking forward to be a platform to help and think for you