Optimizing queries on million-row databases


1. To optimize queries, avoid full table scans wherever possible. Start by building indexes on the columns used in where and order by clauses.

2. Try to avoid testing fields for null in the where clause; otherwise the engine may abandon the index and scan the whole table:

select id from t where num is null

It is best not to leave nulls in the database; use not null columns with defaults wherever possible.

Notes, descriptions, comments, and the like can reasonably be null; for other columns it is better to avoid it.

Do not assume that null costs no space. For a fixed-length type such as char(100), the space is allocated when the row is created: whether or not a value is inserted (null included), it occupies the full 100 characters. For a variable-length type such as varchar, a null takes no data space.

You can set a default value of 0 on num to ensure the column contains no nulls, and then query like this:

select id from t where num = 0

3. Try to avoid using the != or <> operators in the where clause; otherwise the engine may abandon the index and scan the whole table.
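One common workaround, sketched here under the assumption that num is indexed, is to split the non-sargable <> predicate into two sargable range predicates:

```sql
-- Equivalent to: select id from t where num <> 10
-- Each branch is a range seek on the num index.
select id from t where num < 10
union all
select id from t where num > 10
```

Whether the optimizer actually benefits depends on the selectivity of the ranges; this is a sketch of the technique, not a guaranteed win.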

4. Try to avoid using or to join conditions in the where clause. If one field has an index and the other does not, the engine will give up the index and scan the whole table. For example:

select id from t where num=10 or Name = 'admin'

You can query as follows:

select id from t where num = 10
union all
select id from t where Name = 'admin'

5. Use in and not in with caution as well; they can also cause a full table scan, for example:

select id from t where num in(1,2,3)

For a continuous range of values, use between instead of in:

select id from t where num between 1 and 3

In many cases, exists is a good substitute for in:

select num from a where num in(select num from b)

Replace with the following statement:

select num from a where exists(select 1 from b where num=a.num)

6. The following query will also cause a full table scan:

select id from t where name like '%abc%'

To improve efficiency, consider full-text search instead.

7. Using a local variable in the where clause also causes a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of access plan until then; it must choose at compile time. At compile time, however, the variable's value is still unknown, so it cannot be used as an input for index selection. The following statement scans the whole table:

select id from t where num = @num

You can force the query to use the index instead:

select id from t with (index(index_name)) where num = @num

8. Try to avoid performing expression operations on fields in the where clause; this causes the engine to abandon the index and scan the whole table. For example:

select id from t where num/2 = 100

It should be replaced by:

select id from t where num = 100*2

9. Try to avoid performing function operations on fields in the where clause; this likewise causes the engine to abandon the index and scan the whole table. For example:

select id from t where substring(name, 1, 3) = 'abc'  -- ids whose name starts with 'abc'
select id from t where datediff(day, createdate, '2005-11-30') = 0  -- ids generated on '2005-11-30'

It should be replaced by:

select id from t where name like 'abc%'
select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'

10. Do not apply functions, arithmetic, or other expressions to the left side of "=" in the where clause; otherwise the system may not be able to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. Keep the field order consistent with the index order as much as possible.
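A sketch of the leading-column rule, assuming a hypothetical composite index on (num, name):

```sql
-- Hypothetical composite index
create index ix_t_num_name on t(num, name)

-- Can seek on the index: the leading column num appears in the condition.
select id from t where num = 10 and name = 'admin'
select id from t where num = 10

-- Cannot seek on the index: the leading column is missing.
select id from t where name = 'admin'
```

The last query can at best scan the index, not seek into it, because the index entries are ordered by num first.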

12. Do not write meaningless queries, such as generating an empty table structure:

select col1,col2 into #t from t where 1=0

This kind of code returns no result set yet still consumes system resources; it should be changed to:

create table #t(…)

13. In update statements, if only one or two fields change, do not update all fields; otherwise frequent calls cause noticeable performance overhead and generate a large amount of log.
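A sketch of the difference, using the hypothetical columns of table t; only name has actually changed:

```sql
-- Wasteful: the application writes back every column, so every
-- column value is logged even though only name changed.
update t set name = 'admin', num = num, createdate = createdate where id = 1

-- Better: touch only the column that changed.
update t set name = 'admin' where id = 1
```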

14. When joining multiple tables with large amounts of data (here even a few hundred rows counts as large), paginate before joining; otherwise logical reads will be very high and performance very poor.

15. select count(*) from table; — a count without any condition causes a full table scan. If it serves no business purpose, it must be eliminated.
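A sketch of paginating before joining, assuming SQL Server 2012+ offset/fetch syntax and a hypothetical orders table:

```sql
-- Page the driving table down to 20 rows first, then join,
-- instead of joining the full tables and paginating the result.
select p.id, p.name, o.total
from (
    select id, name
    from t
    order by id
    offset 0 rows fetch next 20 rows only
) p
join orders o on o.t_id = p.id
```

On MySQL the inner query would use limit, and on Oracle rownum, as noted in the case study below.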

16. More indexes are not always better. An index improves the efficiency of the corresponding select, but it reduces the efficiency of insert and update, because the index may need to be maintained on every insert or update. How to build indexes therefore deserves careful, case-by-case consideration. It is best not to have more than 6 indexes on one table; if there are more, reconsider whether indexes on the infrequently used columns are really necessary.

17. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records. Once a column value changes, the record ordering of the whole table must be adjusted, which consumes considerable resources. If the application needs to update clustered index columns frequently, reconsider whether that clustered index should exist at all.

18. Use numeric fields where possible. If a field contains only numeric information, do not design it as a character type; doing so reduces query and join performance and increases storage cost. The engine compares strings character by character when processing queries and joins, while a numeric comparison takes only one operation.

19. Prefer varchar/nvarchar to char/nchar: variable-length fields use less storage, and for queries, searching within a smaller field is noticeably more efficient.

20. Never use select * from t anywhere; replace "*" with a specific field list, and do not return any fields you do not use.

21. Prefer table variables to temporary tables. If the table variable will contain a large amount of data, note that its indexing is very limited (only a primary key index).
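A sketch of the table-variable pattern in T-SQL; the only index available is the one implied by the primary key:

```sql
-- Table variable: lives in tempdb but avoids some of the
-- recompilation and locking cost of a #temp table.
declare @ids table (id int primary key)

insert into @ids (id)
select id from t where num = 10

select a.id, a.name
from t a
join @ids i on i.id = a.id
```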

22. Avoid frequently creating and dropping temporary tables, to reduce the consumption of system table resources. Temporary tables are not forbidden; used appropriately they can make some routines more efficient, for example when you need to repeatedly reference a dataset from a large or frequently used table. For one-off operations, however, a derived table is usually better.

23. When creating a temporary table, if a large amount of data is inserted at once, use select into instead of create table, to avoid generating a large amount of log and to improve speed; if the amount of data is small, then to ease the load on system tables, create table first and then insert.
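A sketch of both paths in T-SQL (select into can be minimally logged, depending on the database's recovery model):

```sql
-- Large load: create and populate in one minimally logged step.
select id, name
into #big
from t
where num > 0

-- Small load: create the structure first, then insert.
create table #small (id int, name varchar(50))
insert into #small (id, name)
select id, name from t where num = 10
```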

24. If temporary tables are used, explicitly delete them at the end of the stored procedure: truncate table first, then drop table, to avoid long locks on system tables.
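The cleanup sequence, sketched for an assumed temp table #t:

```sql
truncate table #t  -- release the data pages first
drop table #t      -- then drop the object itself
```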

25. Try to avoid cursors; they are inefficient. If a cursor operates on more than 10,000 rows, consider rewriting it.

26. Before reaching for a cursor-based or temporary-table approach, look for a set-based solution to the problem; set-based methods are usually more efficient.

27. Like temporary tables, cursors are not forbidden. For small datasets, a FAST_FORWARD cursor is often superior to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that compute "totals" in the result set are usually faster than doing the same with a cursor. If development time allows, try both the cursor-based and the set-based approach and keep whichever performs better.

28. Set SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
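A sketch of the pattern; the procedure name is hypothetical:

```sql
create procedure usp_update_name  -- hypothetical name
as
begin
    set nocount on   -- suppress per-statement rows-affected messages

    update t set name = 'admin' where id = 1

    set nocount off
end
```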

29. Try to avoid large transactions, to improve the system's concurrency.

30. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the requirement itself is reasonable.

Case study: split large delete or insert statements and submit SQL in batches
If you need to perform a large delete or insert on a live site, be very careful not to bring the whole site down, because both operations lock the table; once the table is locked, no other operation can get in.
Apache runs many child processes or threads, which makes it quite efficient, but the server cannot afford too many child processes, threads, and database connections: they consume server resources heavily, especially memory.
If the table stays locked for a period of time, say 30 seconds, then on a high-traffic site the access processes/threads, database connections, and open files that accumulate during those 30 seconds may not only crash your web service but hang the entire server.
So if you have a big job, split it up. Limiting each batch with LIMIT (MySQL), ROWNUM (Oracle), or TOP (SQL Server) is a good approach. Here is a MySQL example:


   // Delete at most 1000 rows per batch
   while (true) {
       mysql_query("delete from logs where log_date <= '2012-11-01' limit 1000");
       if (mysql_affected_rows() == 0) {
           // Deletion complete: exit the loop
           break;
       }
       // Pause between batches to release the table for other processes/threads
       usleep(50000);
   }

This article is excerpted from the original…

This work is licensed under a CC license; reprints must credit the author and link to this article.

Zhou Jiawei