Data Stack SQL optimization case study: implicit conversion

Date: 2021-04-21

Data Stack is a cloud-native, one-stop data platform (PaaS). We maintain an interesting open source project on GitHub and Gitee: FlinkX. FlinkX is a unified data synchronization tool based on Flink that can collect both static and real-time data; it is a heterogeneous, batch-and-stream integrated data synchronization engine. If you like it, please give us a star!

GitHub open source project:https://github.com/DTStack/fl…

Gitee open source project:https://gitee.com/dtstack_dev…

MySQL is one of the most popular relational databases today. With the rapid development of the Internet, MySQL is widely used in production systems in e-commerce, finance, and many other industries.

In day-to-day development and operations, we often run into slow SQL. A poorly performing SQL statement can carry excessive overhead, exhaust the operating system's resources, and even cause session pile-ups that lead to online failures.

In the scenario of SQL tuning, a common problem is implicit type conversion. So what is implicit conversion?

In MySQL, when an operator is used with operands of different types, type conversion occurs to make the operands compatible; this is implicit conversion. Implicit conversion often means a large drop in SQL execution efficiency.
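To make the definition concrete, here are a few standalone expressions (behavior as in stock MySQL) showing how operands are coerced when their types differ:

```sql
-- String vs. number: both operands are converted to floating point, so these match.
SELECT 1 = '1';           -- 1
SELECT '41680' = 41680;   -- 1

-- A string with no numeric prefix converts to 0 and can match unexpected rows.
SELECT 'abc' = 0;         -- 1

-- A string with a numeric prefix converts to that prefix (with a warning).
SELECT '123abc' + 0;      -- 123
```

Convenient as these rules are, when the conversion must be applied to an indexed column rather than to a constant, the index becomes unusable, as the cases below show.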

In the following sections, we will walk through several common scenarios to show what implicit conversion looks like and how to handle it.

1、 Implicit conversion caused by a mismatch between the passed value's type and the field type

One classic scenario, and the most common one, is implicit conversion caused by passing a value whose type does not match the field type. Here is an example:

1) Scenarios to be optimized

SQL and execution plan are as follows:

select * from dt_t1 where emp_no = 41680;

[Screenshot: execution plan for the query above]

The table index is as follows:

key idx_empno (emp_no)


2) Scene analysis

From the execution plan, the type column shows ALL: a full table scan, without using the idx_empno index. Usually this means the type of the value passed in does not match the actual field type. Let's look at the table structure.

mysql 5.7.28-log [employees] 14:48:10> desc employees;

Field      | Type          | Null | Key | Default | Extra
-----------|---------------|------|-----|---------|------
emp_no     | varchar(14)   | NO   | MUL | NULL    |
birth_date | date          | NO   |     | NULL    |
first_name | varchar(14)   | NO   |     | NULL    |
last_name  | varchar(16)   | NO   |     | NULL    |
gender     | enum('M','F') | NO   |     | NULL    |
hire_date  | date          | NO   |     | NULL    |

6 rows in set (0.00 sec)

From the table structure, we can see that the field type is varchar while the value passed in is an integer; the resulting implicit conversion prevents the index from being used.
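Conceptually, when a string column is compared with a number, MySQL converts both operands to floating point, which means converting emp_no on every row. The predicate effectively behaves like a function wrapped around the indexed column (an illustrative sketch, not actual optimizer output):

```sql
-- The original predicate
--   WHERE emp_no = 41680
-- is evaluated roughly as
--   WHERE CAST(emp_no AS DOUBLE) = 41680
-- and wrapping the indexed column in a conversion defeats idx_empno,
-- hence type: ALL in the execution plan.
EXPLAIN SELECT * FROM dt_t1 WHERE emp_no = 41680;
```

Note that the numeric comparison can also match unintended rows such as '41680 ' or '41680abc', so implicit conversion can affect correctness, not just performance.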

3) Scene optimization

The SQL can avoid implicit conversion by simple rewriting, as follows:

select * from dt_t1 where emp_no = '41680';

When the value passed in is a varchar, consistent with the matched field's type, the index can be used normally. The optimization effect is as follows:

[Screenshot: execution plan after the rewrite, using idx_empno]

2、 Implicit conversion caused by inconsistent join column types

Besides constant-matching queries, implicit conversion also occurs when the join columns of a join query have inconsistent types.

1) Scenarios to be optimized

SELECT count(*) FROM t1 AS a
JOIN t2 AS b ON a.id = b.alipay_order_no;

2) Scene analysis

From the execution plan, we can see that the driven table b shows Extra: Range checked for each record (index map: 0x8).

Generally, when we see Range checked for each record (index map: N), implicit conversion is a likely cause. Let's see how the official documentation explains it:

Range checked for each record (index map: N) (JSON property: message)

MySQL found no good index to use, but found that some of indexes might be used after column values from preceding tables are known. For each row combination in the preceding tables, MySQL checks whether it is possible to use a range or index_merge access method to retrieve rows. This is not very fast, but is faster than performing a join with no index at all. The applicability criteria are as described in Section 8.2.1.2, “Range Optimization”, and Section 8.2.1.3, “Index Merge Optimization”, with the exception that all column values for the preceding table are known and considered to be constants.

Indexes are numbered beginning with 1, in the same order as shown by SHOW INDEX for the table. The index map value N is a bitmask value that indicates which indexes are candidates. For example, a value of 0x19 (binary 11001) means that indexes 1, 4, and 5 will be considered.
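Applying that to the plan above: the reported map 0x8 is binary 1000, so only the table's 4th index (in SHOW INDEX order) is considered a candidate. A quick way to see the numbering (which position idx_alipay_order_no_temp actually occupies depends on the real table; this is for illustration):

```sql
-- Indexes are numbered from 1, top to bottom, in this listing;
-- index map 0x8 = binary 1000 -> only index number 4 is a candidate.
SHOW INDEX FROM t2;
```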

View the structure of the table below:

CREATE TABLE t2 (
  id int(11) NOT NULL AUTO_INCREMENT,
  alipay_order_no varchar(45) DEFAULT NULL,
  xxxx
  PRIMARY KEY (id),
  KEY idx_alipay_order_no_temp (alipay_order_no) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2539968 DEFAULT CHARSET=utf8

CREATE TABLE t1 (
  id bigint(20) NOT NULL,
  xxxxxx
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

From the table structures, we can see that the join columns have different types: t1.id is bigint while t2.alipay_order_no is varchar.

When this happens, how can we optimize it?

Let's look more closely at the execution plan. The driving table is a and the driven table is b, with join condition a.id = b.alipay_order_no. For each row of a, the value of a.id (a bigint) is handed to b as a constant, but b.alipay_order_no is a varchar; the mismatched column types force an implicit conversion, and b's index cannot be used. What if we make the value passed from a.id consistent with the type of b.alipay_order_no; could the index be used then?

3) Scene optimization

We can explicitly convert the join column of the driving table so that its type matches the join column of the driven table, which lets the driven table use its index.
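One way to sketch such a rewrite (an illustrative sketch: we cast the driving table's bigint id to a string, so the conversion lands on the driving side and b.alipay_order_no stays bare):

```sql
SELECT COUNT(*)
FROM t1 AS a
JOIN t2 AS b
  -- Cast the driving table's column; the driven table's indexed
  -- column is compared without any conversion.
  ON CAST(a.id AS CHAR) = b.alipay_order_no;
```

The general rule: put any explicit conversion on the driving table's side of the join condition, never on the driven table's indexed column.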

A common variant of this scenario is a character-set mismatch: even when the join columns' data types match, differing character sets also trigger implicit conversion. Here is an example:

1) Scenarios to be optimized

SELECT COUNT(*) FROM t1 o
JOIN t2 og ON o.def8 = og.group_id
WHERE o.def1 = 'DG21424956';
2) Scene analysis

From this execution plan, we can see that the second table, og, shows Using join buffer (Block Nested Loop) with type = ALL.

Generally, Using join buffer (Block Nested Loop) appears when (a) the join column has no index, or (b) an implicit conversion prevents an existing index from being used.

See the following table structure:

create table t1 (
  .....
  group_id varchar(20) NOT NULL,
  PRIMARY KEY (id),
  KEY group_id (group_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

create table t2 (
  .....
  def8 varchar(20) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY idx_tr_def1 (def8)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4


From the table structures, we can see that both join columns are indexed, but the character sets differ: t1 uses utf8 while t2 uses utf8mb4.

3) Scene optimization

The rewriting idea is similar to the example above: we convert the character set of the driving table's join column, as follows:

SELECT COUNT(*) FROM t1 o
LEFT JOIN t2 og ON CONVERT(o.def8 USING utf8) = og.group_id
WHERE o.def1 = 'DG21424956';

After the character sets are made consistent, the join can use the index.
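Per-query CONVERT() rewrites work, but every query that joins these tables needs the same treatment. If the schema can be changed, a more durable fix (a sketch; converting character sets rewrites the table, so test on a copy and schedule it carefully) is to align the character sets themselves:

```sql
-- Bring t1 to the same character set family as t2 so the join columns
-- compare directly; pick the collation that matches t2's actual one.
ALTER TABLE t1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
```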

3、 Implicit conversion caused by inconsistent collations

So, as long as the data types and character sets on both sides of an operator are consistent, can implicit conversion be ruled out?

The answer is no. A character set has another very important attribute: its collation. When the collations are inconsistent, implicit conversion also occurs. Take the following example:

1) Scenarios to be optimized

SELECT *
FROM t1
WHERE uuid in (SELECT uuid FROM t2 WHERE project_create_at!= “0000-00-00 00:00:00”)

The SQL execution plan is as follows:

2) Scene analysis

The table structure of the two tables is as follows:

CREATE TABLE t1 (
  id int(11) NOT NULL AUTO_INCREMENT,
  uuid varchar(128) COLLATE utf8_unicode_ci NOT NULL DEFAULT '' COMMENT 'UUID',
  xxxxxx
  PRIMARY KEY (id),
  UNIQUE KEY uuid_idx (uuid)
) ENGINE=InnoDB AUTO_INCREMENT=2343994 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

CREATE TABLE t2 (
  id int(11) NOT NULL AUTO_INCREMENT,
  uuid varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '' COMMENT 'Project UUID',
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=5408 DEFAULT CHARSET=utf8

From the table structures, we can see that t1, as the driven table, has a unique index on uuid, and the join columns have the same data type; however, the differing collations (utf8_unicode_ci vs. utf8mb4_unicode_ci) mean the index cannot be used in this scenario.
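The collation decides how two strings compare, so columns with identical types can still require coercion before comparison. The effect is easy to see with literals (standalone MySQL expressions):

```sql
SELECT 'abc' = 'ABC';          -- 1: default *_ci collations compare case-insensitively
SELECT 'abc' = BINARY 'ABC';   -- 0: a binary comparison is case-sensitive
```

When the two sides of `=` carry different collations, MySQL must coerce one side before comparing, and coercing the indexed side defeats the index just as a type conversion does.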

3) Scene optimization

We can explicitly specify the collation of the driving table's join column so that it is consistent with the driven table's. The rewrite is as follows:

explain extended
select b.*
from (select uuid COLLATE utf8_unicode_ci as uuid
      from t1
      where project_create_at != "0000-00-00 00:00:00") a, t2 b
where a.uuid = b.uuid;

id | select_type | table              | type   | key                   | key_len | ref    | rows | Extra
 1 | PRIMARY     | <derived2>         | ALL    |                       |         |        |   51 |
 1 | PRIMARY     | b                  | eq_ref | uuid_idx              | 386     | a.uuid |    1 |
 2 | DERIVED     | volunteer_patients | range  | idx-project-create-at | 6       |        |   51 | Using index condition

3 rows returned (4 ms).

As you can see, the rewritten SQL uses the index for the join, achieving the expected effect.

4、 Summary

The main scenarios for implicit conversion are inconsistent field types, inconsistent join column types, inconsistent character sets, and inconsistent collations. When implicit conversion causes SQL performance problems, analyze which scenario applies and choose the corresponding fix.
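When you suspect an implicit conversion, comparing the two columns' definitions side by side is usually the fastest check. One way to do it (a sketch against information_schema; substitute your own schema, table, and column names):

```sql
SELECT table_name, column_name, data_type,
       character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'your_db'          -- placeholder schema name
  AND table_name IN ('t1', 't2')
  AND column_name IN ('id', 'alipay_order_no');
```

If the data types, character sets, or collations differ between the join columns, one of the scenarios above applies.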

In addition, implicit conversion can produce inaccurate result sets, and inconsistent character sets can also cause replication errors between master and replica, so it should be avoided in practice whenever possible.
