Delete and update operations usually cost more than inserts, so a good design should minimize the number of update and delete operations issued against the database.
3.1 Update operations
An update brings a series of side effects: the database must write log records (so the operation can be rolled back on error); updating a variable-length field (such as a varchar) changes the physical storage of the data (records are moved); updating an indexed field forces index maintenance; and updating the primary key can reorganize the data. Besides making the update itself slow, these effects produce disk fragmentation that degrades later query performance. There are two strategies for dealing with this: reduce the number of updates by combining the updates of several fields into one statement, or avoid updates altogether. The two strategies apply to different situations, illustrated below.
3.1.1 Reduce the number of updates
The integration library includes a code-cleaning step: the self-code fields of the business data are assigned by joining code tables. Code cleaning is essentially a process of updating the business table through joins to the code tables; it requires joining several code tables and updating several self-code fields. There are two ways to write the update: issue multiple SQL statements, each updating one self-code field, or write all the updates into a single statement. The statement that updates the bank code alone looks like this:
UPDATE TBL_INCOME_TMP A
SET BANKCODESELF = ( SELECT SELFCODE FROM TBL_BANKINFO B WHERE A.BANKCODE = B.BANKCODE )
The statement that updates all the self-code fields in one UPDATE has the following shape:
SET code-1 self-code = ( self-code obtained by joining code table 1 ),
    code-2 self-code = ( self-code obtained by joining code table 2 ),
    ...
    code-n self-code = ( self-code obtained by joining code table n )
With 20 million rows of test data, the results of the two approaches are shown in the table below; the single-statement method is roughly ten times faster.
(Table: elapsed time of the multiple-statement method vs. the single-statement method)
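The same contrast can be sketched in a small, runnable form. The snippet below uses Python's sqlite3 standard library rather than the article's DB2 environment, and the second code table (TBL_AREAINFO, AREACODESELF) is a hypothetical addition for illustration; only the shape of the two approaches carries over, not the performance numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE tbl_income_tmp (bankcode TEXT, areacode TEXT,
                             bankcodeself TEXT, areacodeself TEXT);
CREATE TABLE tbl_bankinfo (bankcode TEXT, selfcode TEXT);
CREATE TABLE tbl_areainfo (areacode TEXT, selfcode TEXT);
INSERT INTO tbl_income_tmp VALUES ('B01', 'A01', NULL, NULL);
INSERT INTO tbl_bankinfo VALUES ('B01', 'SB01');
INSERT INTO tbl_areainfo VALUES ('A01', 'SA01');
""")

# Approach 1: one UPDATE per self-code field -- every statement
# scans the business table again.
cur.execute("""UPDATE tbl_income_tmp SET bankcodeself =
  (SELECT selfcode FROM tbl_bankinfo b
   WHERE tbl_income_tmp.bankcode = b.bankcode)""")
cur.execute("""UPDATE tbl_income_tmp SET areacodeself =
  (SELECT selfcode FROM tbl_areainfo a
   WHERE tbl_income_tmp.areacode = a.areacode)""")

# Approach 2: all self-code fields in a single UPDATE -- one pass
# over the business table, one round of logging per row.
cur.execute("""
UPDATE tbl_income_tmp SET
  bankcodeself = (SELECT selfcode FROM tbl_bankinfo b
                  WHERE tbl_income_tmp.bankcode = b.bankcode),
  areacodeself = (SELECT selfcode FROM tbl_areainfo a
                  WHERE tbl_income_tmp.areacode = a.areacode)
""")
print(cur.execute(
    "SELECT bankcodeself, areacodeself FROM tbl_income_tmp").fetchone())
```

Both approaches produce the same final row; the point of the single-statement form is that each business row is touched and logged once instead of once per field.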
3.1.2 Avoid updates
Here is a common example. A company runs an employee attendance system; to speed up query statistics, tables containing redundant information are built on top of the original system. Take the employee table: the data flow is shown in Figure 12. Step one inserts the employee information into the new table; step two joins on the "department ID" field to update the "department name" field.
Figure 12. Association update
Usually, to save storage, a field such as department name is declared variable-length. Updating it therefore reorganizes the data on disk and leaves fragments, which hurts query performance.
To avoid this, use the approach shown in Figure 13: populate the redundant table in a single step, joining the department table at insert time to obtain the "department name", so that no update is needed at all.
Figure 13. Avoid updates
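The one-step load of Figure 13 can be sketched as an INSERT ... SELECT with a join. This is a minimal illustration using Python's sqlite3 module; the table and column names (employee, department, employee_wide) are hypothetical stand-ins for the article's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE employee   (emp_id INTEGER, name TEXT, dept_id INTEGER);
CREATE TABLE department (dept_id INTEGER, dept_name TEXT);
CREATE TABLE employee_wide (emp_id INTEGER, name TEXT,
                            dept_id INTEGER, dept_name TEXT);
INSERT INTO employee VALUES (1, 'Alice', 10), (2, 'Bob', 20);
INSERT INTO department VALUES (10, 'Sales'), (20, 'R&D');
""")

# One-step load: the join supplies dept_name at insert time, so the
# variable-length column is never updated afterwards -- no record
# movement, no fragmentation from a second pass.
cur.execute("""
INSERT INTO employee_wide (emp_id, name, dept_id, dept_name)
SELECT e.emp_id, e.name, e.dept_id, d.dept_name
FROM employee e JOIN department d ON e.dept_id = d.dept_id
""")
print(cur.execute("SELECT * FROM employee_wide ORDER BY emp_id").fetchall())
```

The redundant table is born complete, which is exactly what makes the later update unnecessary.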
3.2 Delete operations
Beginners may assume that deletion is simple and completes quickly. In fact, this is a misconception. A delete may scan large portions of the disk and must write log records; moreover, it does not release disk space, so it wastes space and leaves the data on disk fragmented, which is fatal to the performance of subsequent queries. There are two countermeasures: periodically reorganize tables that are deleted from frequently, or avoid deletion altogether.
3.2.1 Reorganize tables that are deleted from frequently
Because a delete does not release disk space, a table becomes fragmented after deletions, which leads to serious performance degradation; the same thing happens after many updates. If statistics have been collected but no significant performance improvement is seen, reorganizing the table data may help. The reorg operation rearranges the physical order of the data according to a specified index and removes the free space left in the fragmented data, so the data can be accessed more quickly.
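The effect that reorganization reverses can be observed even in a small database. The sketch below uses SQLite's VACUUM as a stand-in for DB2's reorg (the mechanisms differ, but both rebuild the table and reclaim the free space that DELETE leaves behind); file name and sizes are just what this toy run produces.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "frag.db")
conn = sqlite3.connect(path)
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, "x" * 500) for i in range(10_000)])
conn.commit()
size_full = os.path.getsize(path)

cur.execute("DELETE FROM t WHERE id % 2 = 0")   # delete half the rows
conn.commit()
size_after_delete = os.path.getsize(path)       # file does not shrink:
                                                # freed pages stay in the file

conn.execute("VACUUM")                          # rebuild the database,
                                                # reclaiming the free pages
size_after_vacuum = os.path.getsize(path)

print(size_full, size_after_delete, size_after_vacuum)
```

On a default-configured SQLite database the file size is unchanged after the DELETE and drops only after VACUUM, mirroring the article's point that deletion alone does not return space to the system.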
3.2.2 Avoid deletion: the intermediate-table and formal-table pattern
The intermediate-table and formal-table pattern is often used when data needs complex processing: the data is processed in an intermediate table, rows that meet the condition are moved to the formal table, and rows that do not are kept in the intermediate table. Figure 14 illustrates the transfer: after processing, the rows with flag = 1 in the intermediate table temp1 must be inserted into the formal table and then deleted from temp1.
Figure 14. Transferring data from intermediate table to formal table
Because the flag field is not a clustered index, deleting from temp1 leaves a large number of fragments on disk, as shown in Figure 15. Worse, the space occupied by the deleted rows is not released automatically. The result is not only wasted disk space but a sharp decline in query performance.
Figure 15. Disk fragmentation after deletion
We can avoid the delete by using a command that empties the whole table. In addition to the intermediate table temp1 and the formal table, add an auxiliary temporary table temp2. If only about 10% of the rows in temp1 have flag = 0, this optimization improves performance significantly. The steps are as follows:
1. Insert the data with flag = 0 in temp1 into temp2
2. Clear table temp1
3. Insert the data in temp2 into temp1
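The three steps above can be sketched as follows. This uses Python's sqlite3 module; SQLite has no TRUNCATE statement, so step 2 relies on an unqualified DELETE, which SQLite internally executes as its "truncate optimization" (dropping whole pages rather than deleting row by row). In DB2 the table-clearing command would differ; the schema here is a hypothetical minimal version of temp1/temp2/formal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE formal (id INTEGER, flag INTEGER);
CREATE TABLE temp1  (id INTEGER, flag INTEGER);
CREATE TABLE temp2  (id INTEGER, flag INTEGER);
INSERT INTO temp1 VALUES (1,1),(2,1),(3,0),(4,1),(5,1);
""")

# Move the finished rows (flag = 1) to the formal table.
cur.execute("INSERT INTO formal SELECT * FROM temp1 WHERE flag = 1")
# Step 1: copy the small unfinished minority (flag = 0) into temp2.
cur.execute("INSERT INTO temp2 SELECT * FROM temp1 WHERE flag = 0")
# Step 2: empty temp1 wholesale instead of deleting the flag = 1
# rows individually -- no row-by-row logging, no fragmentation.
cur.execute("DELETE FROM temp1")
# Step 3: move the unfinished rows back into temp1 for the next round.
cur.execute("INSERT INTO temp1 SELECT * FROM temp2")
cur.execute("DELETE FROM temp2")

print(cur.execute("SELECT COUNT(*) FROM formal").fetchone()[0])  # 4
print(cur.execute("SELECT id FROM temp1").fetchall())            # [(3,)]
```

The selective DELETE on flag = 1 is replaced by copying the 10% minority out and clearing the table, which is why the pattern pays off only when the retained fraction is small.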
3.3 How to make access more efficient
Much of this section comes from the book The Art of SQL, which collects general database-development experience. Although it is not tied to any particular DBMS or hardware platform, it is a practical book.
1. Connect to the database once and do as much work as possible over that connection; do not disconnect until processing is complete.
2. Pack as many operations as possible into a single SQL statement. Figuratively: thousands of statements driven by a cursor loop are very slow; a handful of statements processing the same data are still slow; one statement that solves the whole problem is best.
3. Stay close to the DBMS core: prefer the database's built-in functions and minimize user-defined functions. No matter how smart the optimizer is, it knows nothing about the inside of a user-defined function.
4. Don’t connect too many tables in one statement. The recommended upper limit is 5.
5. Group frequently updated columns together: when a row is updated, DB2 logs the changed columns, so placing the frequently updated columns next to each other reduces the logging work. This is only a minor performance tip; do not make major application or schema changes just to implement it.
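Tip 2 above is the one most often violated in application code. The sketch below contrasts the cursor-loop style with the single-statement style using Python's sqlite3 module; the account table and the 1% interest update are hypothetical examples, and both versions apply the same change once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE account (id INTEGER, balance REAL)")
cur.executemany("INSERT INTO account VALUES (?, ?)",
                [(i, 100.0) for i in range(1000)])

# Cursor-loop style (the anti-pattern of tip 2): fetch every key,
# then issue one UPDATE per row -- 1000 statements.
for (acc_id,) in cur.execute("SELECT id FROM account").fetchall():
    conn.execute("UPDATE account SET balance = balance * 1.01 WHERE id = ?",
                 (acc_id,))

# Set-based style: the identical change as a single statement,
# letting the engine do one pass over the table.
conn.execute("UPDATE account SET balance = balance * 1.01")

final = conn.execute(
    "SELECT balance FROM account WHERE id = 0").fetchone()[0]
print(round(final, 2))  # two rounds of +1% applied to 100.0
```

Even in this toy case the loop issues a thousand statements where one would do; against a networked DBMS each of those statements also pays a round-trip cost.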
That concludes this article on how delete and update operations affect MySQL performance. I hope you find it useful.