Micro course lesson 16 advanced function explain



Hello, everyone. In the past, we introduced the use of explain. In this issue, we will introduce the explain function in detail.

Explain can be used to analyze the distributed query plan and see the results of the distributed SQL. In addition to this function, we can also do some SQL tuning. Let’s take an example.

Micro course lesson 16 advanced function explain

There are three tables. The configuration of the tables is not very important, but the following three SQL statements. If you look closely at the second level of the three SQL statements, the only difference is the order of ABC. One is table a, table B, table C, one is table C, table a, table B, and the other is table B, table C, table a. If their results are exactly the same, it doesn’t matter which one you choose in MySQL. It’s completely equivalent. But is that true in dble? Let’s take a look at it through explain.

Micro course lesson 16 advanced function explain

This is the result of our explain executing the first SQL. If the result is relatively long, put the long one first. Such a solution set is to issue three SQL statements of three tables. We can see that the first two SQL have no order, and the last one has order by. Take a look at the query result, because it looks abstract, so we draw it into a tree through the reference relationship, and the above is the original sentence.

Micro course lesson 16 advanced function explain

In fact, three of a have join. Through the left coincidence state, table a joins table B first, and then the result joins table C. As can be seen from the figure on the right, we follow the order of table X_ A is distributed to four nodes. After returning from the four nodes by default, it is followed by a series of operations, such as rename. Because table a has subqueries in the original sentence, you need to change the alias name after checking. Then join. On the other side of the join is table B, which is a global single table, and then join. The results of join completion need to be sorted once, and finally join with table C. When table C joins, table C is the main table, and the generated results are then distributed. In this way, we turn our query plan into a tree. Used to analyze whether our query plan is reasonable.

Micro course lesson 16 advanced function explain

We convert all three SQL into trees, and the overall structure of a and B is basically the same. C is different from them. Let’s compare a and B first. What’s the difference between a and B? Table a has to do a Cartesian product the first time it joins. Because the on condition in my original sentence is only what C equals to B, what C equals to a, there is no direct relationship between a and B. After completing the Cartesian product, we need to convert it into an ordered sequence and join it with the ordered sequence on the other side. In my second graph, table a joins with table C first. There is no relationship between a and C, so it does not need to do Cartesian product. It only needs to make an ordered join, and then the join is sorted according to the second level of join relationship, and then the result set can be returned.

Then let’s look at the third picture, which is not so complicated. There is only one join. Why? Because in the third picture, we do B and C first. A table in B C is a global table, so it can be partially optimized. In fact, when joining between B and C, you can first distribute the global table and the other table as a whole, and then join the results after the distribution. One join is done on the MySQL node, not here. If you go to the actual measurement, a performance comparison from left to right is getting better and better.

Let’s take out these three pictures and go back to them. Why does this happen? There are several keywords to note.

The first is my join relationship. When my a table and B table have no join relationship, try not to write them together. Because dble is now doing equivalent optimization. If equivalence optimization is not done well, you will not select the driver table according to your real association relationship, but according to the order of your SQL. Therefore, when two unrelated tables are put together and put in front of each other, they are directly joined. This will cause a calculation of Cartesian product, and the performance will be poor.

The second is ordered join, which we introduced. In fact, it is to avoid Cartesian product in middleware. If there is sorting, it will be executed in mysql, and then I will not need to do Cartesian product in middleware.

The third is affinity. In fact, multi table join is set according to the order of SQL writing. Through such an example, our explain can not only look at a simple query plan, but also build such a tree to see whether our query plan is reasonable. So as to achieve the effect of optimizing SQL.

OK, let’s introduce it here

In order to facilitate reading, some spoken words are optimized without affecting learning. The manuscript and the video will be consistent as far as possible.