Heavy release | detailed explanation of “SQL intelligent diagnosis” function of new generation cloud native data warehouse analyticdb

Time:2021-10-19

Introduction: analyticdb for MySQL provides users with efficient, real-time, feature rich and intelligent “SQL intelligent diagnosis” and “SQL intelligent tuning” functions, provides users with ideas, directions and specific methods for SQL performance tuning, reduces users’ use costs and improves users’ efficiency in using ADB.

SQL is a simple and easy-to-use business logic expression language, but with the increase of the amount of scanned data and query complexity, the query performance will become slower and slower. To tune SQL, you often need to pay attention to the following parts:

  • Need to understand the engine architecture: users often need to have a certain understanding of the architecture characteristics of the SQL Engine in order to perfectly combine with the data distribution characteristics and business scenario characteristics of the business for data modeling, so as to design a table structure in line with the architecture characteristics of the SQL Engine.
  • There are great differences in SQL characteristics: the SQL of ad hoc queries often changes greatly, including the number of tables participating in the join, join conditions, the number of fields grouped and aggregated, and filter conditions.
  • There are great differences in data characteristics: the data distribution characteristics of users will change with the change of business characteristics. If they always follow the original modeling method and SQL statements, they can not guarantee to give full play to the greatest advantages of SQL Engine. The change of data characteristics or business model will lead to the regression of SQL performance.

Based on this, analyticdb for MySQL (a new generation of cloud native real-time data warehouse, syntax compatible with MySQL, hereinafter referred to as ADB) provides users with efficient, real-time, feature rich and intelligent “SQL intelligent diagnosis” and “SQL intelligent tuning” functions, provides users with ideas, directions and specific methods for SQL performance tuning, and reduces users’ use costs, Improve the efficiency of users using ADB.

Next, we will introduce the “SQL intelligent diagnosis” function newly released by ADB through the two steps of “discovering slow query” and “diagnosing slow query”, combined with a scenario case. (PS: “SQL intelligent tuning” will be released in subsequent versions)

1、 Slow query found

To locate a slow query, users first need to find the slow query. ADB’s user console provides a variety of ways to help users, such as “Gantt chart” and “query list”, which can be retrieved in multiple dimensions to help users quickly locate slow queries, and the diagnostic tool ensures that users can conduct full query retrieval and analysis in the last two weeks.

(1) Gantt chart

Users can enter the SQL intelligent diagnosis function through cluster console – diagnosis and Optimization – SQL diagnosis.

Heavy release | detailed explanation of
First, you will see a Gantt chart (also known as swimlane chart, where queries flow through different swimlanes. The swimlanes here are not ADB query queues, but only to distinguish queries executed at different times). The Gantt chart graphically shows the execution sequence of queries on ADB instances. Each color block represents a query, and the left side of the color block is the submission time of the query, The right side of the color block is the end time of the query. The relative length of the color block represents the execution time of a query. The color of the color block has no special meaning, just to distinguish different swimlanes.

Heavy release | detailed explanation of
Through the Gantt chart, users can intuitively see the queries that take a long time to execute within the current time range, and can intuitively see which queries are executed in parallel and the time period of parallel execution, which can help users judge which queries are affected by a bad SQL. The color block density can be used to intuitively judge whether the period with high pressure of ADB instance is related to the high concurrency of some queries.

(2) Query list

The Gantt chart can intuitively reflect the time correlation between queries, but when the time range selected by the user is large, the color blocks in the Gantt chart will be densely distributed and difficult to distinguish, and the indicators on the Gantt chart are relatively limited. At this time, the user can use the query list function in the diagnostic tool. The query list provides more than 10 important indicators of query level, such as database name, user name, client segment IP, time consumption, memory consumption and scanning volume. These information and indicators can help users further judge the source and resource consumption of slow queries.

Heavy release | detailed explanation of

In terms of advanced retrieval capability, the diagnostic tool provides three types of retrieval methods:

1. Fuzzy retrieval and precise retrieval: users can perform fuzzy matching according to the keywords in SQL, and the precise retrieval function helps users accurately retrieve the query when determining the query ID;
2. String type retrieval conditions: the retrieval tool will automatically identify the data name, user name, client IP and resource group used within the time range selected by the user, and provide a drop-down box for the user to select, so as to improve the user’s retrieval efficiency;
3. Retrieval criteria of numeric type: users can freely select the index units to retrieve, such as KB, MB, GB, etc., without manual conversion.

At the same time, users often need to download slow queries when using diagnostic tools. After downloading slow queries, more customized slow query management and analysis can be carried out in tools such as excel. Therefore, we also provide the download function of query list.

2、 Diagnostic slow query

(1) Query the execution process in ADB

Before introducing the ADB execution process, we need to briefly introduce three related basic concepts:

  • Stage

In the execution phase, the query in ADB will first be divided into multiple stages according to whether the shuffle is generated. One stage is the physical entity of a part of the execution plan. The data source of a stage can be data in the underlying storage system or data transmitted in the network. A stage is composed of tasks of the same type distributed on different computing nodes, and multiple tasks will process data in parallel.

  • Task

A task is the execution entity of a stage on an executor node. Multiple tasks of the same type form a stage to process data in parallel within the cluster.

Operator is the smallest data processing unit of ADB. ADB will decide whether to use parallel or serial execution to process data according to the semantics expressed by operators or the dependencies between operators.

Take a typical branch aggregation query as an example to understand the query execution process in ADB. The SQL statement is as follows:

SELECT COUNT(*), SUM(salary)

FROM emplayee

WHERE age>30 ADN age<32

GROUP BY sex

Within ADB, the front-end controller node first receives the SQL statement request, parses and parses the SQL statement, and finally uses the optimizer to generate the final execution plan. The overall execution plan will be divided into sub plans according to the division principle of stage, such as plan0, Plan1 and plan2 in the figure, which are distributed to the corresponding nodes respectively.

Among them, sub plan plan2 will process data in the form of task instances on four computing nodes in parallel. First, scan and filter the data, and then perform local aggregation of the data. After processing, the data will be sent to the downstream computing node, that is, the node of stage1, according to the reproduction of the sex field, and the final aggregation of the data will be performed according to the requirements of sub plan Plan1. Finally, the data is summarized by the node of stage 0 and returned to the client.

Heavy release | detailed explanation of

Like a typical data warehouse, ADB’s execution plan is generally divided into “logical execution plan” and “physical execution plan”:

  • Logical execution plan: understand the query processing flow at the macro level

The logical execution plan shows the query processing logic at a higher level. The rule-based execution plan (RBO) will judge whether the filter conditions can be pushed down, while the cost based execution plan (CBO) will judge the order of multi table Association. Therefore, logical execution plans do not pay attention to the specific processing methods during physical execution, such as whether multiple operators need to be fused during execution to reduce function calls, or automatically generate code granularity. These logical execution plans do not pay attention, which leads to that logical execution plans often contain only stage level execution statistics. However, user tuning often requires statistics accurate to the operator level.

  • Physical execution plan: understand the processing performance of each operator at the micro level

Compared with the logical execution plan, the physical execution plan includes the data processing flow diagram between operators and the execution statistics of operators. You can accurately see the memory occupied by a join operator or aggregation operator and the amount of data before and after filtering by the filter operator. However, not all operators need to be able to correctly understand their meaning. In particular, some physical operators cannot find a connection with the user’s SQL statement, which will also bring great doubts to the positioning problem of users using physical execution plans alone.

ADB’s “SQL intelligent diagnosis” function provides users with a fusion view of logical execution plan and physical execution plan. Using the fusion execution plan, users can understand the query processing flow from the macro level and the processing performance of each operator from the micro level, so as to help users locate the query performance bottleneck more accurately and quickly.

(2) SQL self diagnosis function

Although we provide fused and hierarchical execution plans to help users analyze query performance problems, we find that users will encounter difficulties in using fused execution plans in two types of scenarios:

Primary users of ADB

In order to reduce the learning and migration costs of MySQL users, ADB has achieved that most syntax is compatible with MySQL. However, the background of ADB is not a MySQL kernel, but an independent self-developed distributed data storage and distributed computing system. In the face of ADB’s implementation plan, primary users of ADB often don’t know where the focus of optimization is and can’t start.

Complex SQL in ADB

For complex SQL, it often involves the connection operation of hundreds of tables. The number of stages will reach more than hundreds, and the number of operators will reach thousands. The execution plan chart is very large. Even advanced users of ADB often have no way to start with such a complex execution plan, as shown in the following figure: an execution plan chart of 196 stages:

Heavy release | detailed explanation of

To solve the above problems, we have added SQL self diagnosis capability to the execution plan. The SQL self diagnosis capability will reflect the expert experience in the execution plan in the form of rules. For the first contact of ADB, we can determine the performance bottleneck points in the query execution process according to the diagnosis results, or learn the key operators that need to be paid attention to in the ADB execution plan according to the diagnosis results. For complex execution plans, SQL self diagnosis can help users quickly locate the location that needs to be tuned in the execution plan, and provides tuning related methods and documents to make users more targeted in the tuning process.

SQL self diagnosis capability displays diagnosis results and optimization suggestions through three layers: query level diagnosis results, stage level diagnosis results and operator level diagnosis results.

We take an online complex SQL as an example to introduce the example of using execution plan and SQL self diagnosis tool to locate performance problems. First, we search a query with large memory consumption through the slow query retrieval tool. Click “diagnosis” to open the diagnosis page of the query and switch to the “execution plan” tab. We will see that the query level diagnosis result has determined that the current query data is a query with large memory consumption, as shown in 1 in the following figure:
Heavy release | detailed explanation of

At this time, we need to locate the reason for the large memory effect. We click Sort by memory, and we can see that on the right, the flashback sort will be carried out according to the memory percentage consumed by stage. It can be very obvious that stage [1] occupies more than 87% of the current query memory. We click stage [1], and the diagnostic tool will automatically focus on the position of stage [1] in the execution plan tree, Click stage [1], we can see the execution statistics of stage [1]. At the same time, we can see the position of 5, which prompts us that there is an operator in stage 1 that occupies a large amount of memory, but there is no detailed information. Therefore, next, we need to go to the interior of stage [1] to see which operator in stage [1] occupies a large amount of memory.

Click “view stage execution plan” to enter the stage [1]. First, we still sort according to the memory. You can see that the operator join [317] occupies more than 99% of the memory of the whole stage. Click this operator, and the operator execution plan tree will automatically locate the current operator. At this time, we can see the details of the operator diagnosis results and the information will prompt us, When building the left join of the hash table user, it takes up a large amount of memory. The diagnosis results also provide a link to the official tuning document. According to the tuning method given in the document, we can reduce the memory occupation of the operator.

Heavy release | detailed explanation of

In the above example, SQL performance problems are located through “query level diagnosis results” and “operator level diagnosis results”. Let’s take another example of “stage level diagnosis results”.

As shown in the figure below, we can see that stage [10] has the largest time-consuming proportion after sorting according to the time-consuming. Click stage [10] in the execution plan diagram to see two types of diagnostic results in the diagnostic results column, one is “stage diagnosis” and the other is “operator diagnosis”. The stage diagnosis tells us that the output data of the current stage is tilted, And tell us which fields are tilted (data skew is a problem that seriously affects the performance in the distributed system. The skew of stage output data will not only lead to a long tail in the time of the current stage processing data, but also lead to a long tail in the downstream data processing). At the same time, you can see an operator diagnosis result, suggesting that there is a skew in the table scanning, Then we can preliminarily judge that the output data skew of the current stage is caused by scanning a data skew table. Next, we go to stage [0] for positioning and analysis.

Heavy release | detailed explanation of

After entering the stage memory, we can see that the tablescan operator takes the most time according to the time-consuming sorting. At this time, we click the tablescan operator to see the detailed diagnosis result information about the data skew of the table in the diagnosis result. This table has a serious data skew problem due to the inappropriate selection of data distribution fields, At the same time, you can see the relevant official tuning documents. According to the tuning documents, we can adjust to the appropriate distribution fields to reduce the impact of table data skew on query performance.

Heavy release | detailed explanation of

Through the above two examples, we can see that the integration of execution plan and SQL self diagnosis can quickly help us locate the query performance problems, give some tuning suggestions, reduce a lot of unnecessary waste of time and energy, and reduce the threshold for primary users to use ADB. For more diagnosis results of SQL self diagnosis, please refer to the official website document: SQL self diagnosis. At present, 20 + diagnosis rules have been launched, involving query related memory consumption, time-consuming, data skew, disk IO and execution plan. More diagnosis rules will be launched in succession.

3、 Follow up planning

Through the above description and example analysis, we can see that the current diagnosis and tuning tools can help users troubleshoot various SQL performance problems, but we still find and summarize the needs of multiple users when analyzing instance performance problems during actual online problem processing and on duty:

  • What SQL should I tune?

When opening the diagnosis and tuning page, the user is faced with tens of thousands or even tens of millions of SQL running on the instance. Although the SQL to be tuned can be preliminarily selected by sorting through time consumption, memory consumption or scanning volume, in fact, the user lacks a perspective of specific diagnosis results, such as:

  • Which SQL is skewed by data scanning?
  • Which SQL is inefficient index filtering?
  • Which SQL is skewed by stage output?
  • Which SQL partitions are unreasonable?

After tuning the specific diagnostic results of a certain SQL, users need to know which similar SQL needs to be tuned. Therefore, we will provide users with a tool to analyze the specific diagnostic results to solve a specific problem at one time.

  • There is a problem with my SQL. Is it related to the poor way of creating tables?

ADB background is a distributed data storage and execution framework, which relies on the uniform distribution of data to each background node. At the same time, ADB designs different table types for different business scenarios, such as partition table and replication table. Some table fields are stored in aggregation, which will also improve the query performance, However, users often do not know which queries are affected by the poor table creation method. Subsequently, we will associate “data modeling diagnosis results” with “query diagnosis results”. Users can quickly know which SQL is affected by bad table structure through the diagnosis results of data modeling. In turn, they can also know which tables need to be optimized through the diagnosis results of a certain SQL. The two types of diagnosis results can be optimized in linkage, which can solve the query performance problem of the instance from the root.

4、 Summary and Prospect

The “SQL intelligent diagnosis” function has been launched recently. Users can quickly start using it in combination with actual business.

Original link
This article is the original content of Alibaba cloud and cannot be reproduced without permission.