A case of SQL optimization: or condition optimization

Time:2021-4-21

This paper is compiled from:Kangaroo cloud technology | SQL optimization case ( two ) : or condition optimization

Data stack is a cloud native station data center PAAS. We have an interesting open source project on GitHub:https://github.com/DTStack/flinkx

Flinkx is a Flink based batch flow unified data synchronization tool, which can collect both static data, such as mysql, HDFS, etc., and real-time data, such as mysql, binlog, Kafka, etc. it is a global, heterogeneous, batch flow integrated data synchronization engine. If you are interested, you are welcome to visit us in GitHub community~

In mysql, if you change the position of or in the SQL statement under the same query conditions, the query results will also be different. In more complex cases, it may lead to poor performance of index selection. In order to avoid the problem of greatly reduced execution efficiency, we can properly consider using union all to separate SQL with more complex query logic.

For common or scenarios, please read the following cases:

Case 1: different columns using or condition query

1. Scenarios to be optimized

SELECT
..
..
  FROM`t1` a
 WHERE a.token= '16149684'
   AND a.store_id= '242950'
   AND(a.registrationId IS NOT NULL
   AND a.registrationId<> '')
    OR a.uid= 308475
   AND a.registrationId IS NOT NULL
   AND a.registrationId<> ''

Implementation plan

+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| id           | select_type           | table           | type           | key               | key_len           | ref           | rows           | Extra                                       |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| 1            | SIMPLE                | a               | range          |idx_registrationid | 99                |               | 100445         | Using index condition; Using where          |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+

It takes 5 ms to return one row of records.

2. Scene analysis

From the query conditions, we can see that token and uid are very good in filtering, but because of the use of or, we need to use the index merge method to get better performance. But in the actual implementation process, MySQL optimizer chooses to use the index on the registration ID by default, which leads to poor performance of SQL.

3. Scene optimization

We rewrite SQL to union all.

SELECT
...
...
FROM`t1` a
WHERE a.token = '16054473'
AND a.store_id = '138343'
AND b.is_refund = 1
AND (a.registrationId IS NOT NULL
AND a.registrationId <> '')
union all
SELECT
...
...
FROM`t1` a
where a.uid = 181579
AND a.registrationId IS NOT NULL
AND a.registrationId <> ''

+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| id           | select_type           | table           | type           | possible_keys                | key           | key_len           | ref                          | rows           | Extra                              |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| 1            | PRIMARY               | a               | ref            | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN     | 63                | const                        | 1              | Using index condition; Using where |
| 1            | PRIMARY               | b               | eq_ref         | PRIMARY                      | PRIMARY       | 4                 | youdian_life_sewsq.a.role_id | 1              | Using where                        |
| 2            | UNION                 | a               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 1              |                                    |
| 2            | UNION                 | b               | const          | PRIMARY                      | PRIMARY       | 4                 | const                        | 0              | unique row not found               |
|              | UNION RESULT          | <union1,2>      | ALL            |                              |               |                   |                              |                | Using temporary                    |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+

A total of 5 lines of records are returned , It takes 5 ms.

By comparing the execution plans before and after optimization, it is obvious that,Split SQL into two subqueries, and then use union to merge the results, which has better stability, security and performance.

Case 2: using or query condition in the same column

1. Scenarios to be optimized

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
  select product_id,
  count(0) count
  from t2 pprod
  inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
  and pprod.is_enable =1
  and ppinfo.is_enable=1
  and pinfo.belong_t0 =1
  and pinfo.end_time >=now()
  and not (
   pinfo.onshelv_time>'2019-06-30 00:00:00'
   or pinfo.end_time>'2018-12-05 00:00:00'
  )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and (pc.count =0 or pc.count isnull ) limit 0,5;

Implementation plan

A case of SQL optimization: or condition optimization

2. Scene analysis

In this example, there is a sub query in the SQL query. The sub query is regarded as the driver table, resulting in auto_ Key, which is tested by SQL splitting. The verification is mainly based on ( pc . count = 0 , or pc . count is null ) Will affect the performance of the entire SQL, need to compare rewriting.

3. Scene optimization

First, we can think alone( pc.count =0 , or pc.count How to optimize? Write a similar SQL first

Select col from test where col =100 or col is null;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec) 

At this time, we see the same column, but corresponding to different values. In this case, we can use case when to convert.

Select col From test where case when col is null then 100 else col =100 end;
+--------+
| col    |
+--------+
|    100 |
|   NULL |
+--------+
2 rows in set (0.00 sec)

Go back to the original SQL and rewrite it.

select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
  select product_id,
  count(0) count
  from t2 pprod
  inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
  and pprod.is_enable =1
  and ppinfo.is_enable=1
  and pinfo.belong_t0 =1
  and pinfo.end_time >=now()
  and not (
   pinfo.onshelv_time>'2019-06-30 00:00:00'
   or pinfo.end_time>'2018-12-05 00:00:00'
  )group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and case when pc.count is null then 0 else pc.count end=0 limit 0,5;

A case of SQL optimization: or condition optimization

It can be seen that the optimized SQL is 30 seconds faster than the original SQL, and the execution efficiency is improved about 50 times.

Case 3: optimizing or conditions of associated SQL

1. Scenarios to be optimized

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL30SECOND)
OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
OR group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

2. Scene analysis

After careful analysis of the above query statements, we find that although the business logic only needs to query the data modified in half a minute, the execution process must associate all the data, which brings unnecessary performance loss.

A case of SQL optimization: or condition optimization

3. Scene optimization

We split the original SQL. The first part of sql-01 is as follows:

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

A case of SQL optimization: or condition optimization

Sql-01 as user_ MSG table as the driver, using GMT_ The modified index filters the latest data.

The second part of sql-02 is as follows:

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

A case of SQL optimization: or condition optimization

Ql-02 takes user as the driver table, MSG user_ The index of ID filters the rows very well.

The third part sql-03 is as follows:

SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)

A case of SQL optimization: or condition optimization

sql – 03 group driven table, GMT_ The modified index filters the latest data.

summary

The common scenarios of MySQL or condition optimization are as follows:

1. The same column can be replaced by in

2. In the case of different columns and complexity, union all can be used for separation

3. Association SQL or condition

We need to combine the actual scene, analysis and optimization.