As an excellent relational open source database, MySQL database is favored by everyone. In our daily work, we often deal with it. Do you know how MySQL is executed when we write SQL? Let’s talk about it.
select * from T where ID=10;Take this SQL statement as an example:
When we execute the query statement, we will receive the result returned by mysql. How is it executed step by step in MySQL?
First of all, let’s take a look at the MySQL architecture and the organization structure of MySQL services
As shown in the figure above:
MySQL can be roughly divided into server layer and storage engine layer.
The server layer includes connector, query cache, analyzer, optimizer, executor, as well as all built-in functions (such as date, time, mathematics and encryption functions, etc.), and all functions across storage engines are implemented in this layer, such as stored procedures, triggers, views, etc.
The storage engine layer is responsible for data access. The architecture is plug-in and supports InnoDB, MyISAM, memory and other storage engines. After MySQL version 5.5.5, its default storage engine is InnoDB.
All storage engines share a server layer。
Embedded problem: the difference between memory engine and redis
Usually, when connecting to MySQL database from the command line:
mysql -h127.0.0.1 -P6293 -uroot -p
This command goes to the connector to verify the user’s identity.
mysqlIt is a client tool used to establish a connection with the server. After completing the TCP handshake, the connector starts to use the input parameter information for authentication.
- If the user name or password is incorrect, there will be one
Access Denied for userClient interrupt.
- If the verification is passed, the connector will find out the user permissions in the permission table and store them in variables. After that, the permission judgment logic in the connection will depend on the permissions read at this time.
At this point, we should understand why the existing connection of a user will not be changed to the new permission after we have modified the permission to a user. For the new permissions to take effect, you can only create a new connection for the user.
After the connection is completed, if there is no follow-up action, the connection will be idle
show processlistCommand to view all connection status. among
CommandColumns are displayed as
SleepIndicates that there is an idle connection in the system.
If the client is in sleep for a long time, the connector will disconnect it automatically. The disconnection time is based on the configuration
wait_timeoutParameter, the default value is 8 hours.
If the client sends the request again after the connection is disconnected, it will receive an error:
Lost connection to MySQL server during query。 At this point, the request continues to execute and needs to be reconnected.
In mysql, a long connection means that after a successful connection, if the client continues to request, the same connection will be used all the time. The short connection means that the connection is broken every time a few queries are executed, and a new connection is rebuilt in the next query.
The process of establishing a connection is usually complicated, so we should try to use long connection.
However, it does not mean that there will be no problem with long connections. Because the temporary memory used during the execution of MySQL is managed in the connection object, the memory consumption of the connection will be increased every time the request is executed. If these resources are not released, the memory will be full and will be forced to kill by the system. From the perspective of the phenomenon, MySQL is restarted abnormally.
Then the solution to this problem needs to consider the following two points:
1. Regularly disconnect a long connection, or judge that a connection in the program occupies too much memory, it will be closed, and then you can use reconnection.
2. After MySQL 5.7, there is a mysql_ reset_ The function of connection can reinitialize the connection resource, that is to release the memory occupied by the connection. This process will restore the connection to the state it was created without reconnection and permission verification.
After the connection is established, we can execute the select statement, and we will come to the second step of the execution logic: query cache.
This step will not be explained too much, because MySQL 8.0 directly deleted the whole function of query cache.
We often don’t need to use query cache because its hit rate is really low. Updating a table row by row will empty all query caches on the table.
The execution logic of the analyzer is to build a parse tree, check its morphology and syntax, generate a new parse tree, and check whether the fields and tables exist.
First of all, lexical analysis: mainly based on the keyword of Mysql to verify and parse, and identify the table name and column name.
mysql> elect * from t where ID=1; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1
After the syntax analysis: on the basis of lexical analysis, judge whether the SQL statement is legal.
If the statement is incorrect (for example, the field does not exist), you will receive an error message:
If we execute this statement: the field K in table t does not exist, an error will be reported in the parser phase
select * from T where k=1 Unknown column ‘k’ in ‘where c...
After that, a permission check will be performed. That is to say, if the user does not have the permission of table T and the field K page in SQL does not exist, an error is reported as no permission.
After the analyzer, MySQL already knows what we are going to do, and it needs to be processed by the optimizer before starting to execute SQL.
Simply put, the optimizer is to optimize the SQL execution plan.
The optimizer decides which index to use when there are multiple indexes in the table, or determines the join order of each table when there are multiple table associations in a statement.
To put it simply: the optimizer converts the previously generated parse tree into an execution plan and makes the optimal selection.
If you execute the following statement:
mysql> select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;
The ID value of C = 10 can be extracted from the T1 table, and then the ID value is associated with the T2 table to determine whether the value of D in the T2 table is 20;
You can also take the ID value of D = 20 from the T2 table, and then associate the ID value with the T1 table to determine whether the value of C in T1 is equal to 10.
The logical results of the above two execution methods are the same, but the execution efficiency may be different. The role of the optimizer is to decide which scheme to use.
After the optimizer is executed, it enters the actuator phase.
Embedded problem: how does the optimizer select indexes? Will it be the wrong choice?
MySQL knows what we are going to do through the analyzer and how to do it through the optimizer. When it comes to the executor stage, MySQL starts to execute logic.
Simply put, the executor is to check permissions, open tables, and process data.
When executing SQL, it will judge whether the tables in SQL and other tables involved in the operation (such as the trigger operation of updating one table to trigger the update of another table) have relevant permissions on the current connection. If not, the permission error will be returned; if there is, continue to process the data
Note: the permission check is not performed on the executor. In fact, when the query cache is hit, the permission verification is performed when the cache returns the result. And is called before the optimizer
mysql> select * from T where ID=10; ERROR 1142 (42000): SELECT command denied to user 'b'@'localhost' for table 'T'
select * from T where ID=10;In this statement, the ID field in table t has no index, so the execution process is as follows:
1. Call the InnoDB engine interface to get the first row of the table and judge whether the ID value is 10. If not, skip it. If yes, the row information will be stored in the result set.
2. Call the engine interface to continue to get the next line, and repeat the judgment in step 1 until the last line of data.
3. The actuator returns the result set to the client.
For an indexed table, the interface for fetching the first row satisfying the condition is called for the first time in the engine, and then the next row satisfying the condition is fetched in a loop. These interfaces are defined in the engine.
After that, we can see one in the slow query log of the database
rows_examinedField representing the number of lines scanned during the execution of this statement. This value is accumulated each time the executor calls the engine to get data rows.
In some scenarios, when the executor is called once, multiple lines are scanned inside the engine, so the engine scans the number of lines and
rows_examinedIt’s not exactly the same.
When the query cache is turned on, the result set is updated to the query cache.
1. In the executor stage, why should we determine whether the table has the permission to execute queries, rather than in the parser? According to the information in MySQL_ In theory, we can judge whether there is permission in the parser phase.
Sometimes the tables to be operated by SQL statements are not just those in the literal SQL. For example, there is a trigger, which can only be determined at the executor stage. Therefore, the judgment authority before the optimizer is sometimes incomplete.
2. Create a user without select permission and execute
select * from T where k=1;(there is no K field in table t)
select command deniedinstead of:
unknown columnIs it possible to indicate that the read column does not exist until the table is opened?
It does not mean that, for the sake of security, it is definitely best to return unauthorized information
The client connects to the server through the connector, obtains the permission and other information, and then in the effective duration of the connection (Interactive)_ Timeout and wait_ Time out parameter control, version 5.7 will disconnect automatic reconnection) to process client requests.
Determine which SQL is in select / update / delete / insert, and if it is, determine whether the query cache is enabled. (at this time, the select of the judgment is a simple judgment, and other parsing is not done, so the analyzer will judge and parse again)
If the query cache is on, the
If it is hit, the query permission will be judged when the data is returned, and the data will be returned if the permission is passed.
If it fails, it enters the parser.
If it is not turned on, enter the analyzer directly.
The analyzer conducts lexical analysis, syntax analysis, and checks the syntax order of SQL to generate a parse tree. Then the preprocessor further analyzes the parse tree to verify whether it has the corresponding permissions for the table and whether the data table and fields exist. After the verification, the parsing tree is updated and handed to the optimizer for processing.
The optimizer selects the optimal execution plan of SQL and gives it to the executor.
The executor verifies the corresponding permissions again. After the validation, it calls the engine interface to get the data and returns the results to the client. If the query cache is enabled, the cache will be updated. In the case of an update / delete / insert request, the query cache is deleted.
Some easy to miss knowledge:
1. The connector is to query the user’s permission from the permission table and save it in a variable for query cache, analyzer and executor to use when checking permissions.
2. During the execution of SQL, there may be triggers, which can be used to determine whether there are corresponding permissions at runtime
precheckThe operation cannot check the permissions of the tables involved in the runtime, so it is necessary to check the permissions in the executor stage. In addition, because of the precheck step, it is shown that the user does not have permission when an error is reported, rather than that the field K does not exist. This is to avoid exposing the table structure to the user.
3. The stage of lexical analysis is from information_ In the schema, the structure information of the table is obtained.
4. The connection pool can be used to change short connection to long connection
5、mysql_ reset_ Connection is an API provided by MySQL for various languages, not SQL statements.
6、wait_ Timeout is the idle timeout of a non interactive connection_ Timeout is the idle timeout for an interactive connection. Execution time is not included in idle time. Through the client connection is interactive, through the program connection is not interactive