Performance tuning of SQL tuning

Time:2020-3-31

Syntax based optimization and simple query conditions. Syntax based optimization refers to the selection of words and the order of writing in SQL statements without considering any non grammatical factors (such as index, table size and storage).

general rule

In this section, I’ll look at some general rules that you need to pay attention to when writing simple queries.

Optimize query conditions based on weight

The best query statement is to apply a simple comparison operation to the least number of rows. The following two tables, table 1 and table 2, list the typical query condition operators in the order of good to bad and assign the weights.

Table 1. Operator weights in query conditions

Operator weight value

=          10
>          5
>=       5
<          5
<=       5

LIKE 3

<>      0

Table 2. Weights of operands in query criteria

Operand weight
Constant character 10 only
Column name 5 only
Parameter 5 only
Multi operand expression 3
Exact value type 2
Other value type 1
Time data type 1
Character data type 0
NULL 0

According to the weights assigned in Table 1 and table 2, it can be seen that the best query conditions are as follows:

… WHERE smallint_column = 789

This example gets 27 points, which are calculated as follows:

Only the small column on the left gets 5 points
The operand is a small data type (2 points)
10 points for equal sign (=) operator
Text character (789) on the right 10 points
Here’s another example

… WHERE char_column >= varchar_column || ‘ x ’

This type of query weight score is very low, only 13 points

Only the column name on the left is 5 points
0 point for operands of char type
Operators greater than or equal to 5 points
On the left is the multi operand representation, 3 points
0 point for operands of varchar type
The weights in the above table may be different in different types of database systems, so it is meaningless to remember these specific values, just know their sorting. The fewer comparison conditions are used, the higher the score is. Such comparison conditions are usually those with fewer rows or easy to compare.

Transfer rule

The law of transmission is defined as follows:

IF

(A <comparison operator> B) IS TRUE
AND (B <comparison operator> C) IS TRUE

THEN

(A <comparison operator> C) IS TRUE
AND NOT (A <comparison operator> C) IS FALSE

Comparison operators include: =, >.

Through the law of transmission, we can see that we can replace B with C without changing the meaning of the expression.

The following two examples express the same meaning, but the second expression executes faster than the first.

Expression 1:

… WHERE column1 < column2

AND column2 = column3

AND column1 = 5

Expression 2:

… WHERE 5 < column2

AND column2 = column3

AND column1 = 5

Most database management systems do this automatically, but when expressions contain parentheses, they do not automatically adjust. For example, a select statement is as follows:

SELECT * FROM Table1

WHERE column1 = 5 AND

NOT (column3 = 7 OR column1 = column2)

If you convert, you will get the following statement:

SELECT * FROM Table1

WHERE column1 = 5

AND column3 <> 7

AND column2 <> 5

The statement after this change will execute faster than the first one.

  1. Sargability

The ideal SQL expression should be in the following general format:

<column> <comparison operator> <literal>

Earlier, IBM researchers named this query condition “sargable predictions” because Sarg is a combination of search argument.

According to this rule, the left side of the query criteria should be a column name; the right side should be an easy to find value.

Following this rule, all database systems will use the following expressions:

5 = column1

Conversion to:

column1 = 5

However, when the query conditions contain arithmetic expressions, only part of the database system is transformed.

For example:

… WHERE column1 – 3 = -column2

Conversion to:

… WHERE column1 = -column2 + 3

It can also bring query performance optimization.

Tuning for specialized operators

Previously, we talked about the general rules of query conditions. In this section, we will discuss how to use special operators to improve the performance of SQL code.

And (AND)

The database system parses a series of expressions connected by and from left to right, but Oracle is an exception. It parses expressions from right to left. We can use this feature of database system to put the expression with small probability in front, or if two expressions have the same possibility, we can put the relatively uncomplicated expression in front. In this way, if the first expression is false, the database system will no longer have to work hard to parse the second expression. For example, you can convert:

… WHERE column1 = ‘A’ AND column2 = ‘B’

Conversion to:

… WHERE column2 = ‘B’ AND column1 = ‘A’

It is assumed that the probability of column2 =’b ‘is low. If it is an Oracle database, you only need to use the rules in reverse.

Or (OR)

Contrary to the (and) operator, when writing SQL statements with the or operator, you should put the expression with high probability on the left side, because if the first expression is false, the or operator means that the next expression needs to be parsed.

And + or

According to the expansion law of set,

A and (B or C) and (A and b) or (A and C) are equivalent expressions.

Suppose there is a table as shown in Table 3. To execute the expression of an and operator in front of it

SELECT * FROM Table1

WHERE (column1 = 1 AND column2 = ‘A’)

OR (column1 = 1 AND column2 = ‘B’)

Table 3. And + or query

Row# Colmun1 Column2
1 3 A
2 2 B
3 1 C

When the database system searches according to the query language, it performs the following steps:

Index lookup column1 = 1, result set = {row 3}
Index lookup column2 = ‘a’, result set = {row1}
And merge result set, result set = {}
Index lookup column 1 = 1, result set = {row 3}
Index lookup column 2 = ‘B’, result set = {row2}
And merge result set, result set = {}
Or merge result set, result set = {}
Now, according to the expansion rule of the set, convert the above statements:

SELECT * FROM Table1

WHERE column1 = 1

AND (column2 = ‘A’ OR column2 = ‘B’)

When searching in a new order, it performs the following steps:

Index lookup column2 = ‘a’, result set = {row1}
Index lookup column 2 = ‘B’, result set = {row2}
Or merge result set, result set = {}
Index lookup column1 = 1, result set = {row 3}
And merge result set, result set = {}
This shows that the number of searches is less than once. Although some database operating systems will automatically perform such a conversion, for simple queries, such a conversion is still beneficial.

Non (NOT)

Make non (not) representations easier to read. Simple conditions can achieve the purpose of conversion by reversing the comparison operator, for example:

… WHERE NOT (column1 > 5)

Conversion to:

… WHERE column1 <= 5

In the more complex case, according to Morgan’s theorem of sets:

Not (A and b) = (not a) or (not b) and not (a or b) = (not a) and (not b)

According to this theorem, it can be seen that its search can be reduced to at least one time. The following query criteria:

… WHERE NOT (column1 > 5 OR column2 = 7)

Can be converted to:

… WHERE column1 <= 5

AND column2 <> 7

However, when there is an inequality operator < > in the converted expression, the performance will be degraded. After all, in a set with an average distribution of values, the number of unequal values is far greater than the number of equal values. Because of this, some database systems will not compare the difference Index search is performed, but they will perform index search for greater than or less than, so the following queries can be transformed as follows:

… WHERE NOT (column1 = 0)

Conversion to:

… WHERE column <0

OR column > 0

IN

Many people think that there is no difference between the following two query criteria because they return the same result set:

Condition 1:

… WHERE column1 = 5

OR column1 = 6

Condition 2:

… WHERE column1 IN (5, 6)

This idea is not entirely correct. For most database operating systems, in is faster than or. So if you can, you need to replace or with in

When the in operator is a series of dense integer numbers, it is better to find which values do not meet the conditions, rather than which values meet the conditions. Therefore, the following query conditions should be transformed as follows:

… WHERE column1 IN (1, 3, 4, 5)

Conversion to:

… WHERE column1 BETWEEN 1 AND 5
AND column1 <> 2

The same performance improvement can be achieved when a series of discrete values are transformed into arithmetic expressions.

UNION

In SQL, the union of two tables is the set of duplicate values in two tables, that is, the set of duplicate rows in two or more query results returned by the union operator. This is a good way to merge data, but it’s not the best way.

Enquiry 1:

SELECT * FROM Table1

WHERE column1 = 5

UNION

SELECT * FROM Table1

WHERE column2 = 5

Enquiry 2:

SELECT DISTINCT * FROM Table1

WHERE column1 = 5

OR column2 = 5

In the above example, both column1 and column2 have no index. If query 2 always executes faster than query 1, it is recommended to always convert query 1 to query 2. However, in some database systems, this may lead to poor performance, which is caused by two optimization defects.

The first optimization flaw is that many optimizers only optimize one where statement in one select statement, so both select statements of query 1 are executed. First, the optimizer finds all the rows that meet the criteria according to the fact that the query condition column1 = 5 is true, and then finds all the rows that meet the criteria according to the fact that the query condition column2 = 5 is true, that is, two table scans. Therefore, if column1 = 5 has no index, it will take twice as long to query 1 as to query 2. If column1 = 5 has an index, it still needs a second scan, but only in some database systems there is an uncommon optimization defect, but the first one is made up. When some optimizers find that there is an or operator in a query, they do not use index queries, so in this case, and only in this case, union has higher performance than or. This situation is rare, so it is still recommended to use or instead of union when the column to be queried has no index.

The above is the author’s summary of how to improve the performance of SQL. These rules will not necessarily improve the performance of all database systems, but they will not degrade the performance of the database, so mastering and using these rules can help the development of database applications. This article summarizes some basic aspects of SQL performance tuning, including order by, group by, index, etc.

Reference resources:
https://www.oschina.net/quest…

Recommended Today

PHP Basics – String Array Operations

In our daily work, we often need to deal with some strings or arrays. Today, we have time to sort them out String operation <?php //String truncation $str = ‘Hello World!’ Substr ($STR, 0,5); // return ‘hello’ //Chinese string truncation $STR = ‘Hello, Shenzhen’; $result = mb_ Substr ($STR, 0,2); // Hello //First occurrence of […]