SQL language everywhere。 SQL is not only the exclusive skill of technical personnel, it seems that everyone can write SQL, just as everyone is a product manager. If you do background development, crud is a routine. If you do data warehouse development, writing SQL may occupy most of your working time. When we understand the select syntax, we also need to understand the underlying principle of select execution. Only in this way can we have a deeper understanding of SQL. This article will gradually decompose the execution process of SQL, hoping to help you.
The purpose of this paper is to explain the execution process of SQL query. It does not involve too complex SQL operations. It mainly involves two tablescitizenandcityThe specific data are as follows:
CREATE TABLE citizen ( name CHAR ( 20 ), city_id INT ( 10 ) ); CREATE TABLE city ( city_id INT ( 10 ), city_name CHAR ( 20 ) ); INSERT INTO city VALUES (1, "Shanghai"), (2, "Beijing"), (3) "Hangzhou"); INSERT INTO citizen VALUES ("tom",3), ("jack",2), ("robin",1), ("jasper",3), ("kevin",1), ("rachel",2), ("trump",3), ("lilei",1), ("hanmeiei",1);
Query execution order
The query statements involved in this paper are as follows: join the city table and the city table, and then filter out the city_ name != ” Shanghai “data, and then according to the city_ Name groups the cities whose total number of people in each city is more than 2. The details are as follows:
SELECT city.city_name AS "City", COUNT(*) AS "citizen_cnt" FROM citizen JOIN city ON citizen.city_id = city.city_id WHERE city.city_ name != ' 'shanghai ' GROUP BY city.city_name HAVING COUNT(*) >= 2 ORDER BY city.city_name ASC LIMIT 2
The writing order of the above SQL query statement is as follows:
SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ...
But the execution sequence is not like this. The specific execution sequence is as follows:
- 1. Get data（From, Join)
- 2. Filtering data（Where)
- 3. Grouping（Group by)
- 4. Group filtering（Having)
- 5. Return to the query field（Select)
- 6. Sorting and pagination（Order by & Limit / Offset)
Scream tip: This paper aims to explain the general underlying principle of SQL execution, and does not consider its optimization technology, such as predicate push down, projection push down, and so on.
The underlying principles of execution
In fact, the SQL execution order mentioned above is the so-called underlying principle. When we execute a select statement, each step will produce aVirtual tableThe virtual table is used as input in the next step. Note that these processes are transparent to users.
You can notice that select starts from the step from. In this stage, if multiple tables are used to join, the following steps will be followed:
Get data（From, Join)
- First, the Cartesian product is obtained by cross join, which is equivalent to the virtual table vt1-1;
- Then, the virtual table vt1-1 is used as input and the virtual table vt1-2 is output;
- Add external rows. If we use left join, right link or full join, external rows will be involved, that is, external rows will be added to virtual table vt1-2 to get virtual table vt1-3
After the above steps, we get a final virtual table VT1, on which we use where filter to filter out the data that does not meet the conditions, so as to get the virtual table vt2.
After where filtering, we get vt2. Next, the group by operation is performed to get the middle virtual table VT3.
On the basis of virtual table VT3, we use having to filter out the aggregate data that does not meet the conditions, and get vt4.
Return to query field（Select)
When we have finished the condition filtering, we can filter the fields extracted from the table, that is, enter the select and distinct phases. First, the target fields are extracted in the select phase, and then the duplicate rows are filtered out in the distinct phase to get the middle virtual tables vt5-1 and vt5-2, respectively.
Sorting and pagination（Order by & Limit / Offset)
After we extract the desired field data, we can sort according to the specified field, that is, the order by stage, and get the virtual table VT6. Finally, on the basis of VT6, take out the record of the specified row, that is, the limit stage, and get the final result, which corresponds to the virtual table vt7
Detailed implementation step analysis
Step 1: get data（From, Join)
FROM citizen JOIN city
The first step in this process is to execute the statements in the from clause and then the join clause. The result of these operations is the Cartesian product of two tables.
After the execution of from and join, the required rows will be filtered according to the on condition of join
ON citizen.city_id = city.city_id
Step 2: filtering data（Where)
After you get the row that meets the condition, it is passed to the where clause. This evaluates each row using a conditional expression. If the row does not evaluate to true, it is removed from the collection.
WHERE city.city_ name != ' 'shanghai '
Step 3: grouping（Group by)
The next step is to execute the group by clause, which groups the rows with the same value. After that, all select expressions are evaluated by group, not by row.
GROUP BY city.city_name
Step 4: Group Filtering（Having)
The grouped data is filtered by the predicates contained in the having clause
HAVING COUNT(*) >= 2
Step 5: return to query field（Select)
In this step, the processor will evaluate what the query result will print and whether there are some functions to run on the data, such as distinct, Max, sqrt, date, lower, etc. In this case, the select clause only prints the city name and the count (*) value of its corresponding group, and uses the identifier “city” as the city_ The alias of the name column.
SELECT city.city_name AS "City", COUNT(*) AS "citizen_cnt"
Step 6: sorting and pagination（Order by & Limit / Offset)
The final processing step of the query involves the sorting of the result set and the output size. In our example, we arrange it in ascending alphabetical order and output two data results.
ORDER BY city.city_name ASC LIMIT 2
This paper mainly analyzes the execution order and underlying principle of SQL statements, the basic SQL query will be divided into six steps. Combined with specific examples, this paper gives the detailed results of each step, so as to have a deeper understanding of the underlying principles of its implementation.
Official account “big data technology and multi warehouse”, reply to “information” to receive big data package.