Optimized implementation of count() for MySQL large tables

Time:2021-10-26

The following is my judgment based on the combination of the data structure of the B + tree and the speculation of the experimental results. If there is any error, please correct it!

Today, I experimented with the count () operation optimization of MySQL. The following discussion is based on mysql5.7 InnoDB storage engine. X86 windows operating system.

The structure of the created table is as follows (the amount of data is 1 million):

表结构

The first is about which of the count (*), count (PK) and count (1) of MySQL is faster.
The results are as follows:

这里写图片描述 

这里写图片描述 

这里写图片描述 

No difference! After adding the where clause, the time of the three queries is the same. I won’t post pictures.

I wrote one before when I was in the companyselect count(*) from tableSQL statements are very slow when there is a lot of data. So how to optimize it?

This starts with the index of InnoDB, which is B + tree.

For the primary key index, it only stores data on the leaf node, and itsKey is the primary key, andValue is the whole piece of data
For auxiliary indexes: key is the column to be indexed, and value is the primary key.

This gives us two messages:
1. The whole piece of data will be found according to the primary key
2. Only the primary key can be found according to the auxiliary index, and then the remaining information must be found through the primary key.

So if we want to optimize the count (*) operation, we need to find oneshortTo create a secondary index for it.
In my casestatus, although its “severality” is almost 0

Index first:ALTER TABLE test1 ADD INDEX (status);
Then query, as shown in the following figure:

这里写图片描述 

It can be seen that the query time has decreased from 3.35s to 0.26s, and the query speed has been improved nearly13 times

If the index isstrWhat will happen to this column?
Index first:alter table test1 add index (str)
The results are as follows:

这里写图片描述

As you can see, the time is 0.422s, which is also very fast, but compared withstatusThere is still a gap of about 1.5 times in this column.

Be bold and do an experiment. I’ll take itstatusDelete the index of this column and create onestatusandleft(omdb,200)(an average of 1000 characters in this column) and then look at the query time.
Indexing:alter table test1 add index (status,omdb(200))
The results are as follows:

这里写图片描述 

The time is 1.172s


alter table test1 add index (status,imdbid);

Supplement!!
Pay attention to index failure!
Normal appearance after indexing:

这里写图片描述 

You can see the key_ Len is 6, and the description of extra is using index

If the index fails:

这里写图片描述

There are many cases of index invalidation, such as using functions= Please refer to the official documents for details.

There is no deep research on MySQL. The above is based on my judgment combined with the data structure of B + tree and the speculation of experimental results. If there is any error, please correct it!

This is the end of this article about the optimization and implementation of count() of MySQL large tables. For more information about the optimization of count() of MySQL large tables, please search the previous articles of developeppaer or continue to browse the relevant articles below. I hope you will support developeppaer in the future!