Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

Time:2022-9-23


1. Interpret the type

The type of the execution plan represents the access data type, and there are many access types.

1、system
Indicates that this step returns only one row of data. If the execution object of this step is a driving table or a main table, then the driven table or subquery is only accessed once.

2、const

Indicates that this execution step returns at most one row of data. const usually appears in equivalent queries on primary keys or unique indexes, such as queries on the primary key id of table t:

Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

3、eq_ref

The eq_ref type generally means that when a table is associated, the associated column on the associated table is the primary key or unique index. For example, the table jiang is associated with the lock_test table, and the associated columns are the primary key columns of the two tables:

Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

When the above SQL is executed, the jiang table is the driving table, the lock_test is the driven table, the associated column of the driven table is the primary key id, and the type is eq_ref.

Therefore, there is an important feature for the eq_ref type: the table involved in this step is the driven table; the unique index or primary key is used in this step. Apart from system and const, this is the associated type that works best.

4、ref

Contrary to the above, if the type of a step of the execution plan is ref, it means that the associated column of this step is a non-unique index. For example, use the primary key id column of table jiang to associate the num column of the table lock_test, and establish a common index on the num column:

Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

When the above SQL is executed, table jiang is the driving table, lock_test is the driven table, the driven table is a non-unique index, and the type is ref.

Therefore, the characteristics of ref are: the index used to access the data in this step is a non-unique index.

5、Ref_or_null
For example, execute the following statement:
Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan
Indicates that the index is gone (there is an index on the num column), but a null value is also accessed.

6、index_merge

Indicates index merging. Generally, index merging occurs when an OR operation is performed on multiple secondary index columns.
For example, execute the following statement:
mysql> explain select * from lock_test where id=3 or num=4;

Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

id is the primary key, and a common index is built on the num column. When the statement is executed, the or operation is processed through two single-column indexes.

7、unique_subquery
Represents a unique subquery. For example, when the following statement is executed:
value in(select primary_key from single_table where …)
For the in clause, when the subquery in the in clause returns the primary key of a table, the type is displayed as unique subquery.

8、index_subquery
When the following statement is executed:
value in(select key_column from single_table where …)
Similar to the above, it means that for the in clause, when the subquery in the in clause returns the secondary index column (non-primary key column) of a certain table, the type is displayed as index_subquery.

9、range:
Fetch a portion of the data on an indexed column. Commonly used to perform between and operations on indexed columns.

10、index:
The full scan of the index generally occurs when the index is covered, that is, a full scan of the indexed column occurs.

11、all:
Full table scan without indexes.
A special case:

Explain select * from stu limit 1,1;
2. Interpret extra

1、using where:
It generally has two meanings:
Indicates that when accessing through an index, it is necessary to return to the table to access the required data;
Filter conditions occur at the server layer instead of the storage engine layer;
If the execution plan shows that the index is gone, but the rows value is high, and the extra is shown as using where, the execution effect will not be very good. Because the cost of index access is mainly on the return table, a covering index can be used to optimize it.
The filtering condition can also be suppressed by covering the index, and the filtering operation can be performed at the storage engine layer, which is the best.
So, covering index is the most effective way to solve using where.

2、using index condition
Indicates that the filtering will be pushed down to the storage layer for execution to prevent the server layer from filtering too much data
If the using index condition appears in extra, it means that access to table data is optimized.

3、using temporary
Indicates that a temporary table is used during statement execution. Temporary tables may be used in the presence of the following clauses:
order by
group by
distinct
unions, etc.
If the data cannot be directly returned to the user, it needs to be cached, and the data is cached in the user workspace as a temporary table. Note that there may be disk temporary tables, and you need to pay attention to the rows of data that need to be cached.
The temporary table corresponding to the above four operations can be eliminated by using the index.

4、using sort_union(indexs)
For example, when executing the following statement:
Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

There are indexes on the Sname and sphone columns. At this time, the extra item of the execution plan will display using sort_union(i_sname, i_spone), indicating that the indexes are merged. Often accompanied by index_merge.

5、using MRR:
Generally, the process of accessing table data through a secondary index is: first access the secondary index column, find the corresponding secondary index data, and then obtain the corresponding primary key value, and then take the primary key value to access the table and retrieve the row data. . The data retrieved in this way is sorted according to the secondary index.
MRR means: After obtaining the corresponding primary key value through the secondary index, it does not directly access the table but stores it first. After obtaining all the primary key values, sort the primary key values, and then access the table. This can greatly reduce the number of accesses to the table, at least to achieve sequential access to the table.
One of the advantages of MRR is to improve the efficiency of index access to the table, that is, to reduce the cost of returning the table. But there is a big problem: the retrieved data is not sorted according to the secondary index.

6、using join buffer(Block Nested Loop)
BNL mainly occurs when two tables are associated, and there is no index on the associated table.
BNL means this: A is associated with B, and A’s associated column has an index but B’s does not. At this time, 10 rows of data will be taken from table A and placed in the user’s join buffer space, and then the data on B will be associated with the associated column of A in the join buffer. At this time, only one visit to table B is required. , that is, a full table scan occurs in table B.
If 10 rows of data in the join buffer are associated, 10 more rows of data are taken and continue to be associated with table B until all data in table A is associated.
As can be seen from the above, this method will probably increase the efficiency by about 90%.

7、using join buffer(Batched Key Access)
The general situation of BKA is: when the table is associated, there is an index on the driven table, but the number of rows returned by the driving table is too many.
When the above situation occurs, the returned result set of the driving table will be put into the join buffer of the user workspace, and then a record of the result set will be fetched to associate the index associated column of the driven table. After the corresponding primary key column is obtained, the data is not immediately fetched from the driven table through the primary key column, but is first stored in the workspace. After all the data in the result set has been associated, sort all the primary key columns obtained by association in the workspace, and then uniformly access the driven table to get data from it. The advantage of this is that the number of visits is greatly reduced.
It can be seen from the above: BKA uses MRR technology; BKA is suitable for the situation that the number of rows returned by the drive table is large, and the index is used when the drive table is accessed.
This feature can be turned on or off:
Set optimizer_switch=’mrr=on,batched_key_access=on’;

8、using index for group by
Indicates that the group by is completed through the composite index without returning to the table.
For example, compound index (a, b), when executing the statement: select a from tb group by b; will appear using index for group by.

9、using index

Indicates that a covering index scan is implemented; that is, the data to be accessed is in the index and does not need to be returned to the table. In general, reducing unnecessary data access can improve efficiency.

For example, take the data on the num column for the table lock_test, and create a common index on the num column:

Data stack technology sharing: Interpret the type column and extra column of the MySQL execution plan

10、using filesort
Indicates that there is sorting behavior, but not necessarily disk sorting.

11、materialize scan
A full scan of the materialized table, because the materialized table is a temporary table and there is no index on the table.

DataStack is a cloud-native, one-stop data center PaaS. We have an interesting open source project on github and gitee: FlinkX. FlinkX is a unified data synchronization tool based on Flink. It can collect static data and also It can collect real-time changing data, and is a data synchronization engine that integrates global, heterogeneous, and batch streams. If you like it, please give us a star! star! star!

github open source project:https://github.com/DTStack/fl…

gitee open source project:https://gitee.com/dtstack_dev…