MSSQL · ideas of cleaning up large amount of historical data

Time:2021-7-22
Reading time |0.6 minutes word count |963.2 characters
primary coverage |1. Introduction & background
“MSSQL · ideas for cleaning up large amount of historical data
Author | SCscHero Writing time | 2021/7/15 PM10:45
Article type |Series Completion |Completed
motto Every great cause has a trivial beginning.

1、 Introduction & background    Completion: 100%


a) Coping with problems

Faced with the deletion / migration of hundreds of millions of data, the deletion time is often very long. Is there a more efficient method?

b) Coping scenarios

In case of massive data migration / deletion (such as deletion of historical data of sea volume), efficient cleaning means are needed.

c) Solution principle & method

  1. Delete method: suitable for deleting small amount of data (I think it is suitable for deleting less than 2 million data).
  2. Effective data extraction, write back delete method(I think it is suitable for the situation that the amount of deleted data is much larger than the amount of retained data, and the ratio is 2:8. For example, there are 90 million invalid data and 10 million valid data in 100 million data tables.)

2、 Delete method    Completion: 100%


Delete delete statement, T-SQL foundation. It’s simple.

--Basic grammar
DELETE [DBName].[TableName] WHERE [Column]='XXXX'

3、 Extract write back delete method    Completion: 100%


a) Main ideas

  1. First write the query statement, and extract the effective data to another table.
  2. Truncate the original table, and delete the primary key constraint and auto increment attribute in GUI interface.
  3. Insert the valid data back into the original table, and set the original primary key constraint and auto increment attribute.

b) Common Q & A

  1. Why delete primary key constraint and auto increment attribute in Gui.
    At present, due to the change of self increasing attributes, the underlying processing logic of MSSQL is more complex to use T-SQL, so the GUI interface is used to change. If you encounter a window such as “prevent saving the change configuration that requires the table to be re created” and so on,Click on the portal
  2. Why is it so slow to set the primary key and auto increment attributes after inserting the valid data back into the original table?
    Because the underlying processing of MSSQL to change the data table structure is to create a new temporary table, set it as a self incrementing primary key, insert the data and then insert it back. In theory, the larger the amount of data, the slower the speed. In case of overtime,Click on the portal
  3. Is it feasible to set “allow explicit insertion of self adding columns” for self adding columns?
    This problem has been tried in data write back, but even if the auto increment column is allowed to be inserted explicitly, it still needs to be opened. So the easiest way is to turn off the auto increment property.
SET IDENTITY_INSERT [tablename] ON

4、 Statement and references    Completion: 100%


Original blog, please do not reprint without permission.

If you have any help, you are welcome to like, collect and pay attention. If you have any questions, please comment! If you need to contact the blogger, you can directly send a private message to scscshero.