Examples of eight common SQL errors in MySQL

Time:2021-1-16

preface

MySQL still maintained a strong database popularity growth trend in 2016. More and more customers build their applications on MySQL database, even migrate from Oracle to MySQL. But there are also some customers in the process of using MySQL database, such as slow response time, CPU full and so on.

Alibaba cloud RDS expert service team has helped cloud customers solve many urgent problems. Now some common SQL problems in apsaradb expert diagnosis report are summarized as follows for your reference.

1. Limit statement

Paging query is one of the most commonly used scenarios, but it is also the most prone to problems.

For example, for the following simple statements, the general idea of DBA is to type, name, create_ Add a composite index to the time field. In this way, conditional sorting can effectively use the index, and the performance is improved rapidly.


SELECT * 
FROM operation 
WHERE type = 'SQLStats' 
  AND name = 'SlowLog' 
ORDER BY create_time 
LIMIT 1000, 10;

Well, maybe more than 90% of DBAs solve this problem, so far.

But when the limit clause becomes “limit 1000000,10”, the programmer will still complain: why am I still slow when I only take 10 records?

You know, the database doesn’t know where the 1000000 record starts. Even if there is an index, it needs to be calculated from scratch. When this kind of performance problem occurs, most of the time the programmer is lazy.

In the scenario of front-end data browsing and page turning, or big data batch export, the maximum value of the previous page can be regarded as a parameter as the query condition. SQL is redesigned as follows:


SELECT * 
FROM  operation 
WHERE type = 'SQLStats' 
AND  name = 'SlowLog' 
AND  create_time > '2017-03-16 14:00:00' 
ORDER BY create_time limit 10;

In the new design, the query time is basically fixed and will not change with the growth of data.

2. Implicit transformation

Another common mistake in SQL statements is that the query variable and field definition types do not match. For example, the following statement:


mysql> explain extended SELECT * 
  > FROM my_balance b 
  > WHERE b.bpn = 14000000123 
  >  AND b.isverified IS NULL ;
mysql> show warnings;
| Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field 'bpn'

The field BPN is defined as varchar (20). MySQL’s strategy is to convert the string to a number and then compare it. Function on table field, index invalid.

The above situation may be the parameters automatically filled in by the application framework, rather than the original intention of the programmer. Now there are many complex application frameworks, which are easy to use but also careful that they may dig holes for themselves.

3. Association update, delete

Although MySQL 5.6 introduces materialization features, we need to pay special attention to the optimization of query statements. For updating or deleting, you need to manually rewrite it as join.

For example, in the following UPDATE statement, MySQL actually executes loop / nested subquery, and its execution time can be imagined.


UPDATE operation o 
SET status = 'applying' 
WHERE o.id IN (SELECT id 
    FROM (SELECT o.id, 
        o.status 
      FROM operation o 
      WHERE o.group = 123 
        AND o.status NOT IN ( 'done' ) 
      ORDER BY o.parent, 
         o.id 
      LIMIT 1) t);

Implementation plan:


+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| id | select_type  | table | type | possible_keys | key  | key_len | ref | rows | Extra            |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY   | o  | index |    | PRIMARY | 8  |  | 24 | Using where; Using temporary      |
| 2 | DEPENDENT SUBQUERY |  |  |    |   |   |  |  | Impossible WHERE noticed after reading const tables |
| 3 | DERIVED   | o  | ref | idx_2,idx_5 | idx_5 | 8  | const | 1 | Using where; Using filesort       |
+----+--------------------+-------+-------+---------------+---------+---------+-------+------+-----------------------------------------------------+

After rewriting to join, the selection mode of subquery changes from dependent subquery to developed, and the execution speed is greatly accelerated, from 7 seconds to 2 milliseconds.


UPDATE operation o 
  JOIN (SELECT o.id, 
       o.status 
      FROM operation o 
      WHERE o.group = 123 
       AND o.status NOT IN ( 'done' ) 
      ORDER BY o.parent, 
        o.id 
      LIMIT 1) t
   ON o.id = t.id 
SET status = 'applying' 

The implementation plan is simplified to


+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra            |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+
| 1 | PRIMARY  |  |  |    |  |   |  |  | Impossible WHERE noticed after reading const tables |
| 2 | DERIVED  | o  | ref | idx_2,idx_5 | idx_5 | 8  | const | 1 | Using where; Using filesort       |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-----------------------------------------------------+

4. Mixed sort

MySQL can’t use index for mixed sort. But in some scenarios, there are still opportunities to use special methods to improve performance.


SELECT * 
FROM my_order o 
  INNER JOIN my_appraise a ON a.orderid = o.id 
ORDER BY a.is_reply ASC, 
   a.appraise_time DESC 
LIMIT 0, 20 

The execution plan is displayed as a full table scan:


+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| id | select_type | table | type | possible_keys  | key  | key_len | ref  | rows | Extra 
+----+-------------+-------+--------+-------------+---------+---------+---------------+---------+-+
| 1 | SIMPLE  | a  | ALL | idx_orderid | NULL | NULL | NULL | 1967647 | Using filesort |
| 1 | SIMPLE  | o  | eq_ref | PRIMARY  | PRIMARY | 122  | a.orderid |  1 | NULL   |
+----+-------------+-------+--------+---------+---------+---------+-----------------+---------+-+

Because is_ Reply has only two states: 0 and 1. After rewriting, the execution time is reduced from 1.58 seconds to 2 milliseconds.


SELECT * 
FROM ((SELECT *
   FROM my_order o 
    INNER JOIN my_appraise a 
      ON a.orderid = o.id 
       AND is_reply = 0 
   ORDER BY appraise_time DESC 
   LIMIT 0, 20) 
  UNION ALL 
  (SELECT *
   FROM my_order o 
    INNER JOIN my_appraise a 
      ON a.orderid = o.id 
       AND is_reply = 1 
   ORDER BY appraise_time DESC 
   LIMIT 0, 20)) t 
ORDER BY is_reply ASC, 
   appraisetime DESC 
LIMIT 20;

5. Exists statement

When MySQL treats the exists clause, it still adopts the nested subquery execution mode. Such as the following SQL statement:


SELECT *
FROM my_neighbor n 
  LEFT JOIN my_neighbor_apply sra 
    ON n.id = sra.neighbor_id 
     AND sra.user_id = 'xxx' 
WHERE n.topic_status < 4 
  AND EXISTS(SELECT 1 
     FROM message_info m 
     WHERE n.id = m.neighbor_id 
       AND m.inuser = 'xxx') 
  AND n.topic_type <> 5 

The implementation plan is as follows:


+----+--------------------+-------+------+-----+------------------------------------------+---------+-------+---------+ -----+
| id | select_type  | table | type | possible_keys  | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+
| 1 | PRIMARY   | n  | ALL | | NULL  | NULL | NULL | 1086041 | Using where     |
| 1 | PRIMARY   | sra | ref | | idx_user_id | 123  | const |  1 | Using where   |
| 2 | DEPENDENT SUBQUERY | m  | ref | | idx_message_info | 122  | const |  1 | Using index condition; Using where |
+----+--------------------+-------+------+ -----+------------------------------------------+---------+-------+---------+ -----+

Removing the exists and changing it to join can avoid nested subqueries and reduce the execution time from 1.93 seconds to 1 millisecond.


SELECT *
FROM my_neighbor n 
  INNER JOIN message_info m 
    ON n.id = m.neighbor_id 
     AND m.inuser = 'xxx' 
  LEFT JOIN my_neighbor_apply sra 
    ON n.id = sra.neighbor_id 
     AND sra.user_id = 'xxx' 
WHERE n.topic_status < 4 
  AND n.topic_type <> 5 

New implementation plan:


+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| id | select_type | table | type | possible_keys  | key  | key_len | ref | rows | Extra     |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+
| 1 | SIMPLE  | m  | ref | | idx_message_info | 122  | const | 1 | Using index condition |
| 1 | SIMPLE  | n  | eq_ref | | PRIMARY | 122  | ighbor_id | 1 | Using where  |
| 1 | SIMPLE  | sra | ref | | idx_user_id | 123  | const  | 1 | Using where   |
+----+-------------+-------+--------+ -----+------------------------------------------+---------+ -----+------+ -----+

6. Push under the condition

External query conditions cannot be pushed down to complex views or subqueries

  • Aggregate subqueries;
  • Subquery with limit;
  • Union or union all subquery;
  • Sub query in output field;

As shown in the following statement, we can see from the execution plan that its conditions act after the aggregate subquery:


SELECT * 
FROM (SELECT target, 
    Count(*) 
  FROM operation 
  GROUP BY target) t 
WHERE target = 'rm-xxxx'

+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys | key   | key_len | ref | rows | Extra  |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+
|
1
| PRIMARY  |
 <derived2> 
| ref |
 <auto_key
0
> 
| <auto_key0> |
514
| const |
2
| Using where |
| 2 | DERIVED  | operation | index | idx_4   | idx_4  | 519  | NULL | 20 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+-------+------+-------------+

After confirming that the query condition can be pushed down semantically, it is rewritten as follows:


SELECT target, 
  Count(*) 
FROM operation 
WHERE target = 'rm-xxxx' 
GROUP BY target

The implementation plan becomes:


+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+
| 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |
+----+-------------+-----------+------+---------------+-------+---------+-------+------+--------------------+

Please refer to the article for a detailed explanation of MySQL external conditions that cannot be pushed down http://mysql.taobao.org/monthly/2016/07/08

7. Narrow the scope ahead of time

Start with the initial SQL statement:


SELECT * 
FROM  my_order o 
    LEFT JOIN my_userinfo u 
       ON o.uid = u.uid
    LEFT JOIN my_productinfo p 
       ON o.pid = p.pid 
WHERE ( o.display = 0 ) 
    AND ( o.ostaus = 1 ) 
ORDER BY o.selltime DESC 
LIMIT 0, 15 

The original meaning of the SQL statement is: first make a series of left connections, and then sort the first 15 records. It can also be seen from the execution plan that the number of sorting records in the last step is estimated to be 900000, and the time consumption is 12 seconds.


+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| id | select_type | table | type  | possible_keys | key   | key_len | ref       | rows  | Extra                       |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+
| 1 | SIMPLE   | o   | ALL  | NULL     | NULL  | NULL  | NULL      | 909119 | Using where; Using temporary; Using filesort    |
| 1 | SIMPLE   | u   | eq_ref | PRIMARY    | PRIMARY | 4    | o.uid |   1 | NULL                        |
| 1 | SIMPLE   | p   | ALL  | PRIMARY    | NULL  | NULL  | NULL      |   6 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+---------------+---------+---------+-----------------+--------+----------------------------------------------------+

Since the last where condition and sort are all for the leftmost main table, you can sort my first_ Order sort reduces the amount of data in advance, and then makes left join. After SQL rewriting, the execution time is reduced to about 1 millisecond.


SELECT * 
FROM (
SELECT * 
FROM  my_order o 
WHERE ( o.display = 0 ) 
    AND ( o.ostaus = 1 ) 
ORDER BY o.selltime DESC 
LIMIT 0, 15
) o 
   LEFT JOIN my_userinfo u 
       ON o.uid = u.uid 
   LEFT JOIN my_productinfo p 
       ON o.pid = p.pid 
ORDER BY o.selltime DESC
limit 0, 15

Check the execution plan again: after materializing the subquery (select_ Type = derived). Although the estimated row scan is still 900000, the actual execution time becomes very small after using the index and limit clause.


+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| id | select_type | table   | type  | possible_keys | key   | key_len | ref  | rows  | Extra                       |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+
| 1 | PRIMARY   | <derived2> | ALL  | NULL     | NULL  | NULL  | NULL |   15 | Using temporary; Using filesort          |
| 1 | PRIMARY   | u     | eq_ref | PRIMARY    | PRIMARY | 4    | o.uid |   1 | NULL                        |
| 1 | PRIMARY   | p     | ALL  | PRIMARY    | NULL  | NULL  | NULL |   6 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED   | o     | index | NULL     | idx_1  | 5    | NULL | 909112 | Using where                    |
+----+-------------+------------+--------+---------------+---------+---------+-------+--------+----------------------------------------------------+

8. Intermediate result set push down

Let’s look at the following example that has been preliminarily optimized (the main table in the left join takes precedence over the query criteria)


SELECT  a.*, 
     c.allocated 
FROM   ( 
       SELECT  resourceid 
       FROM   my_distribute d 
          WHERE  isdelete = 0 
          AND   cusmanagercode = '1234567' 
          ORDER BY salecode limit 20) a 
LEFT JOIN 
     ( 
       SELECT  resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
       FROM   my_resources 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

Is there any other problem with this statement? It is not difficult to see that subquery C is a full table aggregate query, which will lead to the performance degradation of the whole statement when the number of tables is particularly large.

In fact, for subquery C, the final result set of left join only cares about the data that can match the main table resourceid. So we can rewrite the statement as follows, the execution time is reduced from 2 seconds to 2 milliseconds.


SELECT  a.*, 
     c.allocated 
FROM   ( 
          SELECT  resourceid 
          FROM   my_distribute d 
          WHERE  isdelete = 0 
          AND   cusmanagercode = '1234567' 
          ORDER BY salecode limit 20) a 
LEFT JOIN 
     ( 
          SELECT  resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
          FROM   my_resources r, 
              ( 
                   SELECT  resourceid 
                   FROM   my_distribute d 
                   WHERE  isdelete = 0 
                   AND   cusmanagercode = '1234567' 
                   ORDER BY salecode limit 20) a 
          WHERE  r.resourcesid = a.resourcesid 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

But subquery a appears many times in our SQL statement. This writing method not only has extra overhead, but also makes the whole statement obviously complicated. Rewrite again using the with statement:


WITH a AS 
( 
     SELECT  resourceid 
     FROM   my_distribute d 
     WHERE  isdelete = 0 
     AND   cusmanagercode = '1234567' 
     ORDER BY salecode limit 20)
SELECT  a.*, 
     c.allocated 
FROM   a 
LEFT JOIN 
     ( 
          SELECT  resourcesid, sum(ifnull(allocation, 0) * 12345) allocated 
          FROM   my_resources r, 
              a 
          WHERE  r.resourcesid = a.resourcesid 
          GROUP BY resourcesid) c 
ON    a.resourceid = c.resourcesid

summary

Database compiler generates execution plan, which determines the actual execution mode of SQL. However, the compiler only serves the best of its ability, and the compilers of all databases are not perfect.

Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of database compiler can we avoid its shortcomings and write high-performance SQL statements.

When programmers design data models and write SQL statements, they should bring in the idea or consciousness of algorithms.

Write complex SQL statements to develop the habit of using with statements. Simple and clear SQL statements can also reduce the burden of the database.

Well, the above is the whole content of this article, I hope the content of this article has a certain reference learning value for your study or work, thank you for your support for developer.