In SQL server, count (*) or count (1) or count ([column]) is probably the most commonly used aggregate function. In fact, many people can’t distinguish the three. This article will elaborate the function, relationship and the underlying principle of the three.
Usually, I often see some so-called optimization suggestions that use count (1) instead of count (*) to improve performance. The reason given is that count (*) will bring full table scanning. In fact, how to write count doesn’t make any difference.
Count (1) and count (*) actually mean to evaluate whether the expression in count () is null. If it is null, it will not count, and if it is not null, it will count. For example, as shown in code 1, null is specified in count (the optimizer does not allow null to be specified explicitly, so it needs to be assigned to a variable to specify it).
SELECT COUNT(@xx) FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
Null is specified in code listing 1. Count
Since all rows are null, the result is not counted to 0, as shown in Figure 1.
Figure 1. Obviously, the result is 0
So when you specify count (*) or count (1) or no matter count (‘anything ‘), the result will be the same, because these values are not null, as shown in Figure 2.
Figure 2. As long as a non null expression is specified in count, the result makes no difference
What about the count column?
For count (column), it is also applicable to the above rules. It evaluates whether the value of each row in the column is null. If it is null, it will not count, and if it is not null, it will count. Therefore, count (column) is a count in which the accounting column or its combination is not empty.
How to implement count (*)?
I mentioned count earlier ( ) When there are non null values, you only need to find out the number of non null rows in the specific table in SQL server, that is, all rows (if all the values of a row are null, then the row is equivalent to non existence). The simplest way is to find a column that is not null. If the column has an index, the index will be used. Of course, for performance, SQL server will select the narrowest index to reduce io.
We delete all the nonclustered indexes from the [person]. [address] table of the AdventureWorks 2012 sample database, and create an index on the modifydate column whose data type is datetime. Let’s look at the execution plan, as shown in Figure 3
Figure 3. Index using createDate
We continue to build an index on the stateprovinceid column, which is an int column, accounting for 4 bytes, shorter than the previous 8-byte datetime type column, so SQL Server chose the stateprovinceid index. As shown in Figure 4.
Figure 4. A shorter stateprovinceid index is selected
Therefore, if count (*) is used more in a table, it will greatly improve the performance to consider establishing a single column index in the shortest column.