MySQL update optimization strategy

Time: 2021-03-02

Simulation scenario 1:

The task was to adjust the structure of a table in the database by adding several fields, and then backfill the existing data: match each row on the existing url field, then populate the newly added type and typeid fields.

I wrote a shell script to backfill the data, but after running it I was baffled: why was it so slow?

Reproducing the scenario

The code is as follows:
CREATE TABLE `fuckSpeed` (
  `uin` bigint(20) unsigned NOT NULL DEFAULT 0,
  `id` int(11) unsigned NOT NULL DEFAULT 0,
  `url` varchar(255) NOT NULL DEFAULT '',
  `type` int(11) unsigned NOT NULL DEFAULT 0,
  `typeid` varchar(64) NOT NULL DEFAULT '',
  ……
  KEY `uin_id` (`uin`,`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
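Since the scenario starts from an existing table and adds the type and typeid columns afterwards, the schema change itself would have looked roughly like this (a sketch based on the column definitions above):

ALTER TABLE `fuckSpeed`
  ADD COLUMN `type` int(11) unsigned NOT NULL DEFAULT 0,
  ADD COLUMN `typeid` varchar(64) NOT NULL DEFAULT '';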

The table structure is as above (many fields are omitted). The table has only one composite index, uin_id. My update plan was as follows:

First, fetch a batch of rows by ID range:

The code is as follows:
select id,url from fuckSpeed where id>=101 and id<=200;

Then traverse all the rows and update each one:

# First, process the row and match its type and typeid

The code is as follows:
update fuckSpeed set type=[type],typeid=[typeid] where id=[id]

Following this approach, I found the updates were extremely slow, averaging about 3-5 rows per second. There were 320,000+ rows to update; at roughly 4 rows per second that is about 80,000 seconds, so the whole job would take around 24 hours, more than a full day. I nearly cried, and started thinking about what had to be wrong.

Finding the problem
My first thought was that it might be slow because only one process was doing the updates, so I started five processes and split the ID range between them, like this:

The code is as follows:
./update_url.sh 0 10000 &
./update_url.sh 10000 20001 &
./update_url.sh 20001 30001 &
./update_url.sh 30002 40002 &
./update_url.sh 40003 50003 &

After running them, it was still the same: the speed barely improved, still about 3-5 updates per second. That made sense on reflection: the time could not be going into the steps before the write (matching, assembling the SQL statement, ...), so something had to be wrong with the write itself.

Let's take another look at my SQL statement:

The code is as follows:
select id,url from fuckSpeed where id>=101 and id<=200;

I tried executing it on the command line, with the following result:

The code is as follows:
mysql> select id,url from fuckSpeed where id>=0 and id<=200;
Empty set (0.18 sec)

It took 0.18 seconds, and at that moment it suddenly dawned on me: the query was not using the composite index. For a composite index to be used, the leftmost column (here uin) must appear in the condition. EXPLAIN verifies this:

The code is as follows:
mysql> explain select id,url from fuckSpeed where id>=0 and id<=200;
+-----------+------+---------------+------+---------+------+--------+-------------+
| table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+-----------+------+---------------+------+---------+------+--------+-------------+
| fuckSpeed | ALL  | NULL          | NULL | NULL    | NULL | 324746 | Using where |
+-----------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)

Then compare with a query that does use the composite index:

The code is as follows:
mysql> select uin,id from fuckSpeed where uin=10023 and id=162;
+-------+-----+
| uin   | id  |
+-------+-----+
| 10023 | 162 |
+-------+-----+
1 row in set (0.00 sec)

mysql> explain select uin,id from fuckSpeed where uin=10023 and id=162;
+-----------+------+---------------+--------+---------+-------------+------+-------------+
| table     | type | possible_keys | key    | key_len | ref         | rows | Extra       |
+-----------+------+---------------+--------+---------+-------------+------+-------------+
| fuckSpeed | ref  | uin_id        | uin_id | 12      | const,const |    4 | Using index |
+-----------+------+---------------+--------+---------+-------------+------+-------------+
1 row in set (0.00 sec)

You can see the indexed lookup is essentially instantaneous. At this point it was safe to conclude that the problem was the index.
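To summarize the leftmost-prefix behaviour seen above (a rough sketch; whether the optimizer actually chooses the index also depends on table statistics):

-- Can use the composite index uin_id (the leftmost column uin is constrained):
select uin,id from fuckSpeed where uin=10023 and id=162;
select uin,id from fuckSpeed where uin=10023;
-- Cannot seek via uin_id (leftmost column uin missing), so MySQL falls back to a full table scan:
select id,url from fuckSpeed where id>=101 and id<=200;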

The SELECTs themselves run relatively few times (each covers an ID range of 10,000), so their cost can be ignored; there is no way to optimize them further here unless an index is added on id.
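If those range SELECTs ever did need speeding up, the option mentioned above (an index on id, which was not actually added here) would look roughly like this:

ALTER TABLE `fuckSpeed` ADD INDEX `idx_id` (`id`);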

The real problem lies here:

The code is as follows:
update fuckSpeed set type=[type],typeid=[typeid] where id=[id]

The UPDATE also has to look up rows via its WHERE clause. My MySQL version is 5.5, which cannot EXPLAIN an UPDATE statement, otherwise that would confirm what I just said. There are 320,000+ rows to update, and each update takes about 0.2 s to find its row by scanning, which is terrifying.
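A common workaround on MySQL 5.5 (not from the original write-up) is to EXPLAIN a SELECT with the same WHERE clause as the UPDATE, since index selection for the row lookup is the same:

-- Stand-in for: update fuckSpeed set type=[type],typeid=[typeid] where id=[id]
mysql> explain select * from fuckSpeed where id=162;
-- Expected here: type = ALL, key = NULL, i.e. a full table scan for every single update.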

Solving the problem
Once the problem was found, it was much easier to solve.

Add the uin field to the SELECT, changing it to the following:

The code is as follows:
select uin,id,url from fuckSpeed where id>=101 and id<=200;

Then use it in the UPDATE:

The code is as follows:
update fuckSpeed set type=[type],typeid=[typeid] where uin=[uin] and id=[id]

In this way, the index is used.

After changing the code I started a single process to see the effect. Sure enough, it improved considerably, averaging 30+ updates per second; at that rate all the updates would be done in about 3 hours.

Simulation scenario 2:
Six tables need their brand_id field populated based on the pid field.

Background of the problem SQL: the project has six tables whose brand_id field must be filled in according to the pid field. Two of these tables hold tens of millions of rows. After my worker started running, master-slave replication on the online MySQL immediately began to lag! After running for more than an hour the lag had grown to 40 minutes, and only 100,000+ rows had been updated. The SQL in question was as follows:

The code is as follows:
<!-- update brand ID according to commodity ID -->
    <update parameterClass="com.jd.chat.worker.domain.param.UpdateBrandIdParam">
        UPDATE $tableName$
        SET brand_id = #newBrandId#
        WHERE pid = #pid#
            AND brand_id = 0
    </update>

The MySQL experts on the project team analyzed it for me. Because the pid field has no index, the MySQL engine has to scan the whole table for rows whose pid matches before it can update them; in other words, a single statement has to scan 10 million+ rows of disk data. Worse, the statement has to be run once for every distinct pid in this ten-million-row table.
A colleague suggested splitting the SQL vertically at the code level by the id field: update about 1,000 rows per statement, so the MySQL engine never has to scan the whole table and the load on the database is a tiny fraction of what it was before. Moreover, id is the primary key and therefore indexed, which greatly improves lookup performance. The optimized SQL is as follows:

The code is as follows:
<!-- update brand ID according to commodity ID -->
    <update parameterClass="com.jd.chat.worker.domain.param.UpdateBrandIdParam">
        UPDATE $tableName$
        SET brand_id = #newBrandId#
        WHERE pid = #pid#
            AND brand_id = 0
            AND id BETWEEN #startNum# AND #endNum#
    </update>
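Outside the iBATIS mapping, the same batching in plain SQL would look roughly like this (the 1,000-row step comes from the suggestion above; the table name and literal values are placeholders standing in for the mapping parameters):

-- The worker advances the id window about 1,000 rows at a time:
UPDATE item_table
SET brand_id = 123
WHERE pid = 456
  AND brand_id = 0
  AND id BETWEEN 1 AND 1000;

Because id is the primary key, each statement only touches one bounded primary-key range, so the scan stays small even though pid itself has no index.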

With nothing more than an id range condition to slice the ten-million-row table vertically at the code level, there was no master-slave delay at all after the worker went back online! Monitoring showed it took just ten minutes to update more than a hundred thousand rows, about six times the previous efficiency. More importantly, the database load is balanced and the application keeps running healthily.

The two simulated scenarios above walk through MySQL UPDATE optimization strategies; I hope they help you with your own database work.