Pit! Just wrote a line of SQL, and the whole system hung up


Pit! Just wrote a line of SQL, and the whole system hung up

Author: different technology house
source: juejin.im/post/5e670f0151882549274a65ef


Insert into select. On this day, XXX received a request to migrate the data in table a to table B for backup. This would like to find out through the program, and then batch insert. However, XXX felt that it was a bit slow and required a lot of network I / O, so he decided to adopt other methods to implement it. By wandering in the ocean of Baidu, he found that you can use insert into select to implement it. In this way, you can avoid using network I / O. instead, you can directly use SQL to rely on database I / O. this is not great. And then he was fired.

What happened to the accident.

Because of the order in the data database_ The amount of data in today is too large. At that time, it seemed that there were 700W, and it was increasing at the speed of 30W every day. So the boss ordered XXX to order_ Part of the data in today is migrated to order_ Record, and set order_ Data deletion in today. This will lower the order_ The amount of data in the today table.

Considering that the database I / O will be occupied, in order not to affect the business, the plan is to start the migration after 9:00. However, at 8:00, XXX tried to migrate a small part of the data (1000 pieces), and felt that there was no problem, so he began to consider mass migration.

  • In the process of migration, the emergency group is the first to reactA small number of users failed to pay, and thenA large number of users failed to payAnd the situationFailed to initialize orderAt the same timeThe cloud server also started to alarm
  • Then XXX panicked and immediately stopped the migration.

I thought we could recover by stopping the migration, but we didn’t. What happens later can be replenished by your brain.

Accident reduction

A simplified version of the database was established locally, and 100W data was generated. Simulate what happens on the line.

Establish table structure

Order form

CREATE TABLE `order_today` ( 
`ID ` varchar (32) not null comment 'primary key', 
`merchant_ id` varchar(32) CHARACTER SET utf8 COLLATE utf8_ general_ Ci not null comment 'merchant number', 
`Amount ` decimal (15,2) not null comment 'order amount', 
`pay_ success_ Time ` datetime not null comment 'payment success time', 
`order_ status` varchar(10) CHARACTER SET utf8 COLLATE utf8_ general_ Ci not null comment 'payment status s: payment succeeded, F: order payment failed', 
`remark` varchar(100) CHARACTER SET utf8 COLLATE utf8_ general_ Ci default null comment 'comment', 
`create_ time` timestamp NOT NULL DEFAULT CURRENT_ Timestamp comment 'creation time', 
`update_ time` timestamp NOT NULL DEFAULT CURRENT_ TIMESTAMP ON UPDATE CURRENT_ Timestamp comment 'modification time - automatically updated when modifying', 
KEY `idx_ merchant_ id` (`merchant_ ID ') using BTREE comment' merchant ID '

Order record form

CREATE TABLE order_record like order_today;

Today’s order table data

Pit! Just wrote a line of SQL, and the whole system hung up

Simulated migration

Move the data before the 8th to order_ In the record table.

   pay_success_time < '2020-03-08 00:00:00';

Run the migrated SQL in Navicat and open another window to insert data to simulate the order.

Pit! Just wrote a line of SQL, and the whole system hung up

Pit! Just wrote a line of SQL, and the whole system hung up

It can be seen from the above that it can be inserted normally at the beginning, but then it suddenly gets stuck, and it takes 23 seconds to succeed, and then the insertion can continue. At this time, the migration has been successful, so it can be inserted normally.

The reason for this

At the default transaction isolation level: insert into order_ record select * from order_ The lock rule of today is: order_ Record table lock_ Today step by step lock (scan one lock, one lock).

Analyze the execution process.

By observing the execution of the migration SQL, you will find the order_ Today is a full table scan, which means that when the insert into select from statement is executed, MySQL will scan the order from top to bottom_ The records in today are locked. In this way, it is not the same as directly locking the table.

This can also explain why only a small number of users fail to pay at the beginning, and a large number of subsequent users fail to pay or initialize orders. Because at the beginning, only a small part of the data is locked, and the data that is not locked can be normally modified to the normal state. As more and more data is locked, a large number of payment failures occur. Finally, all of them are locked, so that the order cannot be inserted, and the initialization order fails.


Because the query condition will cause an order_ Today full table scanning, what can avoid full table scanning? It’s very simple. Pay_ success_ Add an IDX to the time field_ pay_ suc_ Time index is OK. Since the index query is taken, the whole table will not be scanned and the table will be locked. Only the qualified records will be locked.

The final SQL

   order_today FORCE INDEX (idx_pay_suc_time)
   pay_success_time <= '2020-03-08 00:00:00';

Execution process

Pit! Just wrote a line of SQL, and the whole system hung up


When using the insert into tab select * from tableb statement, make sure that there are corresponding indexes for where, order or other conditions after tableb to avoid locking all records in tableb.

Welfare time

todayMigrant workers’ technology roadUnion【Peking University Press】Bring you “Oracle high performance system architecture practical combat encyclopedia” technical book benefits, help you better learn Oracle database technology.

Book delivery rules:The official account tells you about your Oracle learning and experience, or your unique view on the performance of the database.Like 1-5Each reader sent out a copy of the book.