How should MySQL count the number of rows


I believe everyone has encountered the case of obtaining the number of data rows in the MySQL table when writing code. Most people use it when obtaining the number of data table rowsCOUNT(*)However, there are many other methods, such as count (1), count (primary key) and count (field). There are different opinions on which way MySQL can execute faster. In fact, I didn’t know which way MySQL can execute faster and who was right (laughing and crying). Fortunately, recently, I have been studying the MySQL column of geek time carefully. There is a special section to discuss this problem. After reading it, I have also lifted my doubts for a long time.

In this article, the InnoDB engine of MySQL is discussed. MyISAM engine records the total row number of a table on the disk, which is very efficient when querying (if the where condition is added, it cannot be returned directly from the disk). For InnoDB, due to multi version concurrency control (mvcc), even if the query InnoDB table should “return how many rows” at the same time, it is uncertain. For example, suppose there are 10000 rows of data in table t:

time Session a Session B Session C
T1 begin;
T2 select count(*) from t;
T3 Insert into t (insert a row);
T4 begin;
T5 Insert into t (insert a row);
T6 select count(*) from t; (return 10000) select count(*) from t; (return 10002); select count(*) from t; (return 10001)

Session a starts the transaction at T1 to get the consistency view. The data read at any time in the transaction at the repeatable reading level is the same. The update of other transactions has no impact on session a, socount(*)The result is 10000. Session B starts the transaction at T4 to get the consistency view. Before T4, session C has newly inserted a statement and committed (when an update statement is executed separately, InnoDB will start a transaction by itself, and commit immediately after the statement is executed). Session B inserts a new piece of data in T5 and queries in T6count(*)The result is 10002 (the insert statement of session C has been committed at T4 begin, so you can see this update in the transaction of Session B). Since the transaction has not been committed when Session B is in T6, session C cannot see the update of Session B, so session C is in T6count(*)The result is 10001.

Count is an aggregate function. Its function is to judge each row in the returned result set. If the parameter of count function is not null, it will accumulate 1, otherwise it will not accumulate, and finally return the accumulated value. Next, let’s look at the execution efficiency of each count version:

  • Count (primary key ID) InnoDB traverses the whole table, takes out the primary key value of each row and returns it to the MySQL server layer. Because the primary key cannot be null, the server layer accumulates directly by row, and finally returns the accumulated value to the client.
  • Count (1) traverses the whole table without value. The server layer puts a number “1” into each returned row and accumulates by row. Count (1) is faster than count (primary key), because no value is required, reducing data transmission.
  • Count (field) traverses the whole table, reads the field values from the records line by line to the server layer, and the server layer judges that the value is not null, and then accumulates.
  • COUNT(*)MySQL is specially optimized to find the smallest index tree in the table. The InnoDB ordinary index tree is much smaller than the primary key indexCOUNT(*)Traversing which tree is the same,count(*)MySQL does not get the record value when,count(*)It is definitely not null. It is accumulated directly by row in the server layer.

Therefore, the values of this version of count from low to high are:

Count (field) < Count (primary key) < COUNT(1)COUNT(*)

So I suggest you use it as much as possiblecount(*)To get the number of record lines.

In addition, it should be noted that many people record the number of rows in the table to redis for sales, but this can not ensure that the count in redis is accurate and consistent with the data in MySQL table. This is because the two different storage systems do not support distributed transactions, so they can not get an accurate consistent view, If the number of table rows is stored separately for efficiency, it is best to store them in a separate MySQL table, so the problem of unable to get a consistent view can be solved.

For more detailed analysis of MySQL, it is recommended to pay attention toMySQL 45 lectures

This work adoptsCC agreement, reprint must indicate the author and the link to this article

Official account: Network Management BI, Golang, Laravel, Docker, K8s and other learning experience sharing

Recommended Today

SQL statement of three-level linkage of provinces, cities and counties

The first is the table creation statement Copy codeThe code is as follows: CREATE TABLE `t_address_province` ( `id` INT AUTO_ Increment primary key comment ‘primary key’,`Code ` char (6) not null comment ‘province code’,`Name ` varchar (40) not null comment ‘province name’)Engine = InnoDB default charset = utf8 comment = ‘province information table’; CREATE TABLE […]