MySQL schema Overview – > query execution process – > sql parsing order

Time:2021-12-5

preface

I’ve always wanted to know how an SQL statement is executed and the order in which it is executed, and then check and summarize the data of all parties, so I have the following blog post.

This article will discuss the knowledge from MySQL overall architecture – > query execution process – > statement execution order.

MySQL Architecture Overview

It is best to look at the diagram of the architecture, coupled with the necessary explanatory text.

The following figure is based on the original one in the reference book, and then adds your own understanding to it.

MySQL schema Overview - > query execution process - > sql parsing order

As can be seen from the above figure, the whole architecture is divided into two layers. The upper layer is called ‘SQL layer’ of MySQL, and the lower layer is called ‘storage engine layer’ of various storage engines that provide interfaces to the upper. The functions of other modules and components can be simply understood from the name, which will not be described here.

Query execution process

Let’s move forward. Let me talk about the query execution process according to my own understanding:

connect
  • The client initiates a query request and listens to the ‘connection management module’ of the client to receive the request;
  • Forward the request to ‘connect in / thread module’;
  • Call ‘user module’ to check authorization;
  • After passing the check, the connection in / thread module takes out the idle cached connection thread from the thread connection pool to connect with the client request. If it fails, a new connection request will be created.
handle
  • First query the cache, check whether the query statements match exactly, and then check whether they have permissions. If they are successful, the data will be retrieved and returned directly;
  • If the previous step fails, it will be transferred to the command parser to generate the parsing tree after lexical analysis and syntax analysis;
  • The next step is the preprocessing stage, which processes semantics that cannot be solved by the parser, checks permissions, etc., and generates a new parsing tree;
  • Then transfer it to the corresponding module for processing;
  • If it is a select query, a lot of optimization will be done through the query optimizer to generate an execution plan;
  • After receiving the request, the module checks whether the connected user has access to the target table and target fields through the “access control module”;
  • If yes, call the table management module. First, check whether the table cache exists. If yes, the corresponding table and lock are obtained directly. Otherwise, reopen the table file;
  • According to the meta data of the table, obtain the storage engine type and other information of the table, and call the corresponding storage engine for processing through the interface;
  • When data changes occur in the above process, if the log function is turned on, it will be recorded in the corresponding binary log file.

result

  • After the query request is completed, return the result set to the “connect in / thread module”;
  • The returned can also be the corresponding status ID, such as success or failure;
  • The ‘connect in / thread module’ performs subsequent cleaning and continues to wait for requests or disconnect from the client.

MySQL schema Overview - > query execution process - > sql parsing order

SQL parsing order

Next, let’s take a look at the past and present lives of an SQL statement.

Let’s first look at the example statement:

SELECT DISTINCT
    < select_list >
FROM
    < left_table > < join_type >
JOIN < right_table > ON < join_condition >
WHERE
    < where_condition >
GROUP BY
    < group_by_list >
HAVING
    < having_condition >
ORDER BY
    < order_by_condition >
LIMIT < limit_number >

However, its execution order is as follows:

FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
HAVING <having_condition>
SELECT 
DISTINCT <select_list>
ORDER BY <order_by_condition>
LIMIT <limit_number>

Although I didn’t expect it to be like this, it’s still very natural and harmonious at first glance. Where to get it, constantly filter conditions, choose the same or different, and arrange the order, then I know to take the first few.

In that case, let’s look at the details step by step.

preparation

Create test database

create database testQuery

Create test table

CREATE TABLE table1
(
    uid VARCHAR(10) NOT NULL,
    name VARCHAR(10) NOT NULL,
    PRIMARY KEY(uid)
)ENGINE=INNODB DEFAULT CHARSET=UTF8;

CREATE TABLE table2
(
    oid INT NOT NULL auto_increment,
    uid VARCHAR(10),
    PRIMARY KEY(oid)
)ENGINE=INNODB DEFAULT CHARSET=UTF8;

insert data

INSERT INTO table1(uid,name) VALUES('aaa','mike'),('bbb','jack'),('ccc','mike'),('ddd','mike');
INSERT INTO table2(uid) VALUES('aaa'),('aaa'),('bbb'),('bbb'),('bbb'),('ccc'),(NULL);

The final desired result

SELECT
    a.uid,
    count(b.oid) AS total
FROM
    table1 AS a
LEFT JOIN table2 AS b ON a.uid = b.uid
WHERE
    a. NAME = 'mike'
GROUP BY
    a.uid
HAVING
    count(b.oid) < 2
ORDER BY
    total DESC
LIMIT 1;
Now start the journey of SQL parsing!
  • FROM

When multiple tables are involved, the output of the left table will be used as the input of the right table, and then a virtual table VT1 will be generated.

(1-j1) Cartesian product

Calculate the cross join of two associated tables to generate virtual table vt1-j1.

mysql> select * from table1,table2;
+-----+------+-----+------+
| uid | name | oid | uid  |
+-----+------+-----+------+
| aaa | mike |   1 | aaa  |
| bbb | jack |   1 | aaa  |
| ccc | mike |   1 | aaa  |
| ddd | mike |   1 | aaa  |
| aaa | mike |   2 | aaa  |
| bbb | jack |   2 | aaa  |
| ccc | mike |   2 | aaa  |
| ddd | mike |   2 | aaa  |
| aaa | mike |   3 | bbb  |
| bbb | jack |   3 | bbb  |
| ccc | mike |   3 | bbb  |
| ddd | mike |   3 | bbb  |
| aaa | mike |   4 | bbb  |
| bbb | jack |   4 | bbb  |
| ccc | mike |   4 | bbb  |
| ddd | mike |   4 | bbb  |
| aaa | mike |   5 | bbb  |
| bbb | jack |   5 | bbb  |
| ccc | mike |   5 | bbb  |
| ddd | mike |   5 | bbb  |
| aaa | mike |   6 | ccc  |
| bbb | jack |   6 | ccc  |
| ccc | mike |   6 | ccc  |
| ddd | mike |   6 | ccc  |
| aaa | mike |   7 | NULL |
| bbb | jack |   7 | NULL |
| ccc | mike |   7 | NULL |
| ddd | mike |   7 | NULL |
+-----+------+-----+------+
28 rows in set (0.00 sec)

(1-j2) on filter

Filter based on the virtual table vt1-j1, filter out all columns that meet the on predicate conditions, and generate the virtual table vt1-j2.

Note: because of grammatical constraints, ‘where’ is used instead, from which readers can also feel the subtle relationship between the two.

mysql> SELECT
    -> *
    -> FROM
    -> table1,
    -> table2
    -> WHERE
    -> table1.uid = table2.uid
    -> ;
+-----+------+-----+------+
| uid | name | oid | uid  |
+-----+------+-----+------+
| aaa | mike |   1 | aaa  |
| aaa | mike |   2 | aaa  |
| bbb | jack |   3 | bbb  |
| bbb | jack |   4 | bbb  |
| bbb | jack |   5 | bbb  |
| ccc | mike |   6 | ccc  |
+-----+------+-----+------+
6 rows in set (0.00 sec)

(1-j3) add external column

If an external connection (left, right, full) is used, the columns in the main table (reserved table) that do not meet the on condition will also be added to vt1-j2 as external rows to generate virtual table vt1-j3.

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| aaa | mike |    2 | aaa  |
| bbb | jack |    3 | bbb  |
| bbb | jack |    4 | bbb  |
| bbb | jack |    5 | bbb  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
7 rows in set (0.00 sec)

The following is a vivid explanation of ‘SQL joins’ from the Internet. If it infringes your rights and interests, please tell me to delete it. Thank you.

MySQL schema Overview - > query execution process - > sql parsing order

  • WHERE

Filter the temporary table generated in the VT1 process, and the columns that meet the where clause are inserted into the vt2 table.

Note: aggregation operation cannot be used because of grouping at this time; You cannot use aliases created in select;

Difference from on:

  • If there are external columns, on filters the associated table, and the main table (reserved table) will return all columns;
  • If no external columns are added, the effect is the same.

Application:

  • The main table should be filtered in where;
  • For the associated table, use on if the condition query is followed by the connection, and use where if the condition query is followed by the connection.
mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike';
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| aaa | mike |    2 | aaa  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
4 rows in set (0.00 sec)
  • GROUP BY

This clause will group the tables generated in vt2 according to the columns in group by to generate VT3 tables.

Note: the columns used in the statements of subsequent processing, such as select and having, must be included in group by. For those that do not appear, you can use the aggregate function;

Reason: group by changes the reference to the table and converts it into a new reference method. The number of columns that can perform next-level logical operations on it will be reduced.

My understanding is: according to the grouping field, the records with the same grouping field are merged into one record, because each grouping can only return one record, unless it is filtered out, and the fields not in the grouping field may have multiple values, which cannot be put into one record, Therefore, these multivalued columns must be converted into single values through aggregation functions;

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| aaa | mike |    1 | aaa  |
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
3 rows in set (0.00 sec)
  • HAVING

This clause filters different groups in the VT3 table and only applies to the grouped data. Clauses that meet the having condition are added to the vt4 table.

mysql> SELECT
    -> *
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2;
+-----+------+------+------+
| uid | name | oid  | uid  |
+-----+------+------+------+
| ccc | mike |    6 | ccc  |
| ddd | mike | NULL | NULL |
+-----+------+------+------+
2 rows in set (0.00 sec)
  • SELECT

This clause processes the elements in the select clause to generate the vT5 table.

(5-j1) calculate the expression in the select clause to generate vt5-j1.

(5-J2) DISTINCT

Find the duplicate column in vt5-1 and delete it to generate vt5-j2.

If the distinct clause is specified in the query, a temporary memory table will be created (if there is no memory, it needs to be stored on the hard disk). The table structure of this temporary table is the same as that of the virtual table vT5 generated in the previous step, except that a unique index is added to the column for distinct operation to remove duplicate data.

mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
| ddd |     0 |
+-----+-------+
2 rows in set (0.00 sec)
  • ORDER BY

From the table in vt5-j2, sort the results according to the conditions of order by clause to generate VT6 table.

Note: the only place where aliases in select can be used.

mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT OUTER JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2
    -> ORDER BY
    -> total DESC;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
| ddd |     0 |
+-----+-------+
2 rows in set (0.00 sec)
  • LIMIT

The limit clause selects the specified row data starting from the specified position from the VT6 virtual table obtained in the previous step.

be careful:

  • The positive and negative effects of offset and rows;
  • This can be done when the offset is large and the aging rate is very low;
  • The sub query is used for optimization. In the sub query, the maximum ID is obtained from the index, then arranged in reverse order, and then the n-row result set is obtained;
  • The inner join optimization is adopted. In the join clause, the ID list is also obtained from the index first, and then the final result is obtained by directly associating the query.
mysql> SELECT
    -> a.uid,
    -> count(b.oid) AS total
    -> FROM
    -> table1 AS a
    -> LEFT JOIN table2 AS b ON a.uid = b.uid
    -> WHERE
    -> a. NAME = 'mike'
    -> GROUP BY
    -> a.uid
    -> HAVING
    -> count(b.oid) < 2
    -> ORDER BY
    -> total DESC
    -> LIMIT 1;
+-----+-------+
| uid | total |
+-----+-------+
| ccc |     1 |
+-----+-------+
1 row in set (0.00 sec)

So far, the journey of SQL parsing is over. The figure above summarizes:

MySQL schema Overview - > query execution process - > sql parsing order

Epilogue

Well, the in-depth understanding trip here is almost really over. Although it is not very in-depth, it is just some things pieced together. After referring to some books I have read before, the master’s pen is really different. And in this process, I got a lot of things. The most important thing is to further realize the grandeur of the computer software world~

Transferred from: annsshadow
cnblogs.com/annsshadow/p/5037667.html

MySQL schema Overview - > query execution process - > sql parsing order

Recommended Today

The real problem of Alibaba IOS algorithm can’t hang up this time

More and more IOS developers continue to enter the peak of job hopping in 2020 Three main trends of interview in 2020: IOS bottom layer, algorithm, data structure and audio and video development Occupied the main battlefield. Data structure and algorithm interview, especially figure, has become the main reason for the failure of most first-line […]