[reprint] programmer must: 30 suggestions for writing high quality SQL

Time:2020-11-16

Blog reprinted from30 suggestions for writing high quality SQL

The following article comes from the little boy who picks up the snail

More practical SQL optimization statements, so I want to share and reprint, also convenient for their own reference

The following is the Reprint:

In this paper, combined with the example demo, 30 suggestions on optimizing SQL are elaborated, most of which are summarized in the actual development, hoping to be helpful to you.

1. Try not to use select * in query SQL, but select specific fields.

Counter example:

select * from employee;

For example:

select id,name from employee;

reason:

  • Only the required fields are selected to save resources and reduce network overhead.

  • When a select * query is made, it is likely that the overlay index will not be used, which will result in a back table query.

2. If you know that there is only one query result or only one maximum / minimum record, limit 1 is recommended.

Suppose you have an employee table now, and you want to find a person named Jay.

CREATE TABLE `employee`(

Counter example:

select id,name from employee where name='jay';

For example:

select id,name from employee where name='jay' limit 1;

reason:

  • After adding limit 1, as long as a corresponding record is found, it will not continue to scan down, and the efficiency will be greatly improved.

  • Of course, if the name is the only index, there is no need to add limit 1, because the existence of limit is mainly to prevent full table scanning, so as to improve performance. If a statement can predict whether a full table scan is not needed and whether there is a limit, there is no significant difference in performance.

3. Try to avoid using or to join conditions in the where clause.

Create a new user table with a common index userid. The table structure is as follows:

CREATE TABLE `user`(

Suppose you need to query a user whose userid is 1 or 18 years old. It is easy to have the following SQL:

Counter example:

select * from user where userid=1 or age =18

For example:

//Use union all

reason:

  • Using or may invalidate the index and scan the whole table.

For the case of or + age without index, assume that it has gone through the index of userid, but when it comes to the age query condition, it has to scan the whole table, that is, it needs three steps: full table scan + index scan + merge. If it starts with a full table scan, it’s done once. MySQL has an optimizer. Considering the efficiency and cost, the index may fail when encountering the or condition, which seems reasonable and reasonable.

4. Optimize limit paging.

When we do paging requirements, we usually use limit. However, when the offset is very large, the query efficiency becomes low.

Counter example:

select id,name,age from employee limit 10000,10

For example:

//Scheme 1: return the maximum record (offset) of the last query

reason:

  • When the offset is the largest, the query efficiency will be lower, because MySQL does not skip the offset to directly fetch the following data, but first adds the offset + the number of items to be fetched, and then discards the data in the previous offset section and returns it.

  • If the optimization scheme 1 is used, the maximum query record (offset) of the last time can be returned, which can skip the offset and improve the efficiency.

  • In scheme 2, using order by + index can also improve query efficiency.

  • For scheme 3, it is suggested to discuss with the business. Is it necessary to check the pagination after this. Because the vast majority of users will not turn back too many pages.

5. Optimize your like statements.

In daily development, if you use fuzzy keyword query, it’s easy to think of like, but like may make your index invalid.

Counter example:

select userId,name from user where userId like '%123';

For example:

select userId,name from user where userId like '123%';

reason:

  • Put% before the index as follows:[reprint] programmer must: 30 suggestions for writing high quality SQL

  • If you put% after the keyword, you will still go to the index, as follows:[reprint] programmer must: 30 suggestions for writing high quality SQL

6. Use the where condition to limit the data to be queried and avoid returning redundant rows.

Suppose the business scenario is like this: query whether a user is a member.

I’ve seen the old implementation code like this…

Counter example:

List<Long> userIds = sqlMap.queryList("select userId from user where isVip=1");

For example:

Long userId = sqlMap.queryObject("select userId from user where userId='userId' and isVip='1' ");

reason:

  • If you need any data, you can check what data you need to avoid returning unnecessary data and save money.

7. Try to avoid using MySQL’s built-in functions on index columns.

Business requirements: query the users who have logged in in in the last seven days (assuming logintime is indexed).

Counter example:

select userId,loginTime from loginuser where Date_ADD(loginTime,Interval 7 DAY) >=now();

For example:

explain select userId,loginTime from loginuser where  loginTime >= Date_ADD(NOW(),INTERVAL - 7 DAY);

reason:

  • The built-in function of MySQL is used on the index column, and the index is invalid.[reprint] programmer must: 30 suggestions for writing high quality SQL

  • If there is no built-in function for the index column, the index will still go.[reprint] programmer must: 30 suggestions for writing high quality SQL

8. We should try our best to avoid the expression operation on the field in the where clause, which will cause the system to abandon the use of index and carry out full table scanning.

Counter example:

select * from user where age-1=10;

For example:

select * from user where age =11;

reason:

  • Although the age is indexed directly, it is indexed.[reprint] programmer must: 30 suggestions for writing high quality SQL

9. Inner join, left join and right join are preferred. If it is a left join, the results in the left table should be as small as possible.

  • Inner join inner join. When two tables are joined, only the result sets that match exactly in the two tables are retained.
  • When a join query is made between two tables, all rows in the left table are returned, even if there are no matching records in the right table.
  • When a join query is made between two tables, all rows in the right table are returned, even if there are no matching records in the left table.

On the premise that SQL requirements are met, it is recommended to use inner join first. If you want to use left join, the data results of the left table should be as small as possible, and if possible, put them on the left side for processing.

Counter example:

select * from tab1 t1 left join tab2 t2 on t1.size = t2.size where t1.id>2;

For example:

select * from (select * from tab1 where id >2) t1 left join tab2 t2 on t1.size = t2.size;

reason:

  • If inner join is an equivalent join, the number of rows returned may be less, so the performance will be better.

  • Similarly, if left join is used, the data results of the left table should be as small as possible, and the conditions should be put to the left for processing, which means that the number of rows returned may be relatively small.

10. Try to avoid using the! = or < > operator in the where clause, otherwise the engine will abandon the use of index and scan the whole table.

Counter example:

select age,name from user where age <>18;

For example:

//Consider writing two separate SQL statements

reason:

  • Using! = and < > is likely to invalidate the index.[reprint] programmer must: 30 suggestions for writing high quality SQL

11. When using a federated index, pay attention to the order of index columns, and generally follow the leftmost matching principle.

Table structure: (there is a federated index idxuseriage, with userid before and age after).

CREATE TABLE `user`(

Counter example:

select * from user where age = 10;

[reprint] programmer must: 30 suggestions for writing high quality SQL

For example:

//In line with the left most matching principle

[reprint] programmer must: 30 suggestions for writing high quality SQL

[reprint] programmer must: 30 suggestions for writing high quality SQLreason:

  • When we create a union index, such as (K1, K2, K3), it is equivalent to creating (K1), (K1, K2) and (K1, K2, K3), which is the leftmost matching principle.

  • The federated index does not meet the leftmost principle, and the index will generally fail, but this is also related to the MySQL optimizer.

12. To optimize the query, we should consider building indexes on the columns involved in where and order by to avoid full table scanning.

Counter example:

Select * from user where address: 'Shenzhen' order by age;

[reprint] programmer must: 30 suggestions for writing high quality SQLFor example:

//Add index

[reprint] programmer must: 30 suggestions for writing high quality SQL

13. If you insert too much data, consider batch insertion.

Counter example:

for(User u :list){

For example:

//Insert 500 batches at a time, and do it in batches

reason:

  • Batch insertion performance is good, more time-saving.

A metaphor: if you need to move 10000 bricks to the top of the building, you have an elevator. The elevator can put a proper amount of bricks at a time (up to 500 bricks). You can choose to transport one brick at a time, or 500 bricks at a time. Which time do you think is consumed?

14. When appropriate, use overlay indexes.

Overlay index can make your SQL statement do not need to return to the table, only access the index can get all the data needed, greatly improving the query efficiency.

Counter example:

//Like fuzzy query, no index

For example:

//If Id is the primary key, then it is a normal index, that is, the overlay index appears.

[reprint] programmer must: 30 suggestions for writing high quality SQL

15. Use the distinct keyword carefully.

The distinct keyword is generally used to filter duplicate records to return records that do not duplicate. When it is used to query a field or few fields, it will bring optimization effect to the query. However, when used in many fields, it will greatly reduce the query efficiency.

Counter example:

SELECT DISTINCT * from user;

For example:

select DISTINCT name from user;

reason:

  • The CPU time and occupancy time of the statement with distinct is higher than that of the statement without distinct. When querying many fields, if distinct is used, the database engine will compare the data and filter out the duplicate data. However, the comparison and filtering process will take up system resources and CPU time.

16. Remove redundant and duplicate indexes.

Counter example:

KEY `idx_userId` (`userId`)

For example:

//Delete the userid index because the combined index (a, b) is equivalent to creating (a) and (a, b) indexes

reason:

  • Duplicate indexes need to be maintained, and the optimizer should also consider them one by one when optimizing queries, which will affect performance.

17. If the amount of data is large, optimize your modify / delete statements.

Avoid modifying or deleting too much data at the same time, because it will cause the CPU utilization rate is too high, thus affecting other people’s access to the database.

Counter example:

//Delete 100000 or 1000000 +?

For example:

//Delete in batches, such as 500 each time

reason:

  • If too much data is deleted at one time, there may be a lock wait timeout exceeded error, so it is recommended to operate in batches.

18. Consider using the default value instead of null in the where clause.

Counter example:

select * from user where age is not null;

[reprint] programmer must: 30 suggestions for writing high quality SQLFor example:

//Set 0 as the default value

[reprint] programmer must: 30 suggestions for writing high quality SQLreason:

  • It is not that if is null or is not null is used, the index will not be kept. This is related to the MySQL version and query cost.

If the MySQL optimizer finds that the cost of index walking is higher than that of not indexing, it will definitely give up the index!=,<>,is null,is not nullIt is often considered to make the index invalid. In fact, in general, the query cost is high, and the optimizer automatically abandons the index.

  • If you change the null value to the default value, it is often possible to go to the index. At the same time, the expression will be relatively clear.

19. Do not have more than five table connections.

  • The more tables you have, the more time and overhead it takes to compile.

  • It is more readable to split the join table into smaller execution.

  • If you have to join a lot of tables to get data, that means bad design.

20. Rational use of exist & in

Suppose table a represents the employee table of an enterprise, and table B represents the Department table. To query all employees in all departments, it is easy to have the following SQL:

select * from A where deptId in (select deptId from B);

This is equivalent to:

Query department table B first

select deptId from B

The deptid of the Department is used to query the employee of A

select * from A where A.deptId = B.deptId

It can be abstracted into such a loop:

List<> resultSet;

Obviously, in addition to in, we can also use exists to implement the same query function, as follows:

select * from A where exists (select 1 from B where A.deptId = B.deptId);

Because the understanding of exists query is to execute the main query first, obtain the data, and then put it into the sub query for conditional verification. According to the verification result (true or false), whether the data result of the main query is retained or not is determined.

Then, to write in this way is equivalent to:

Select * from a, and make a loop from table a first

Select * from b where a.deptid = b.deptid, and then loop from table B

It can be abstracted as follows:

List<> resultSet ;

Database is the most difficult link with the program release. Suppose you link twice, and each time you do a million times of data set query, and then you leave, you only do it twice; on the contrary, if you set up a million links and apply for link release repeatedly, the system can’t stand it. The principle of MySQL optimization is that small tables drive large tables, and small datasets drive large datasets, so as to make performance better.

So, we have to choose the one with the smallest outermost cycle, that is,IfB’s**If the amount of data is less than a, in is suitable. If the data amount of B is greater than a, then exist * * is suitable.

21. Try to replace union with union all.

If there are no duplicate records in the retrieval results, union all is recommended to replace Union.

Counter example:

select * from user where userid=1

For example:

select * from user where userid=1

reason:

  • If you use union, regardless of whether the retrieval results are repeated or not, a merge is attempted and then sorted before the final result is output. If it is known that there are no duplicate records in the retrieval results, use union all instead of union, which will improve the efficiency.

22. The index should not be too many, generally less than 5.

  • The more indexes, the better. Although the index improves the efficiency of query, it also reduces the efficiency of insertion and update.

  • When inserting or updating, the index may be rebuilt. Therefore, careful consideration should be given to building an index, depending on the specific situation.

  • The number of indexes in a table should not exceed 5. If there are too many indexes, it is necessary to consider whether some indexes exist.

23. Try to use numeric fields. If the fields only contain numerical information, try not to design them as character type.

Counter example:

`king_ ID ` varchar (20) not null comment 'Guardian ID'

For example:

`king_ ID ` int (11) not null comment 'Guardian ID'

reason:

  • Compared with numeric fields, character type reduces query and join performance and increases storage overhead.

24. The index is not suitable for fields with a large number of duplicate data, such as gender database fields.

Because the SQL optimizer optimizes the query according to the amount of data in the table. If there is a large amount of duplicate data in the index column, the MySQL query optimizer calculates that the cost of not running the index is lower, and it is likely to give up the index.

25. Try to avoid returning too much data to the client.

Suppose the business requirement is that users request to view the live data they have watched in the last year.

Counter example:

//Query all the data at once

For example:

//Paging query

26. When connecting multiple tables in the SQL statement, please use the alias of the table and prefix the alias on each column to make the semantics clearer.

Counter example:

select * from A inner

For example:

select memeber.name,deptment.deptName from A member inner

27. Use varchar / nvarchar instead of char / nchar.

Counter example:

`Deptname ` char (100) default null comment 'department name'

For example:

`Deptname ` varchar (100) default null comment 'department name'

reason:

  • Because the storage space of variable length field is small, the storage space can be saved.

  • Secondly, for queries, it is more efficient to search in a relatively small field.

28. In order to improve the efficiency of the group by statement, unnecessary records can be filtered out before the statement is executed.

Counter example:

select job,avg(salary) from employee  group by job having job ='president' or job = 'managent';

For example:

select job,avg(salary) from employee where job ='president' or job = 'managent' group by job;

29. If the field type is a string, the where must be enclosed in quotation marks, otherwise the index will be invalid.

Counter example:

select * from user where userid =123;

[reprint] programmer must: 30 suggestions for writing high quality SQLFor example:

select * from user where userid ='123';

[reprint] programmer must: 30 suggestions for writing high quality SQLreason:

  • Why is the first statement not indexed without single quotation marks? This is because when there is no single quotation mark, it is a comparison between strings and numbers. Their types do not match. MySQL will do implicit type conversion, convert them to floating-point numbers, and then compare them.

30. Use explain to analyze your SQL plan.

When writing SQL in daily development, try to form a habit. Use explain to analyze the SQL you write, especially index.

explain select * from user where userid =10086 or age =18;

This work adoptsCC agreementThe author and the link to this article must be indicated in the reprint