The difference and execution of select count (*) and count (1) in SQL Server

Time:2021-5-10

In SQL server, count (*) or count (1) or count ([column]) is probably the most commonly used aggregate function. In fact, many people can’t distinguish the three. This article will elaborate the function, relationship and the underlying principle of the three.

Usually, I often see some so-called optimization suggestions that use count (1) instead of count (*) to improve performance. The reason given is that count (*) will bring full table scanning. In fact, how to write count doesn’t make any difference.

Count (1) and count (*) actually mean to evaluate whether the expression in count () is null. If it is null, it will not count, and if it is not null, it will count. For example, as shown in code 1, null is specified in count (the optimizer does not allow null to be specified explicitly, so it needs to be assigned to a variable to specify it).


DECLARE @xx INT
SET @xx=NULL
 
SELECT COUNT(@xx) FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]

Null is specified in code listing 1. Count

 

Since all rows are null, the result is not counted to 0, as shown in Figure 1.

image

Figure 1. Obviously, the result is 0

 

So when you specify count (*) or count (1) or no matter count (‘anything ‘), the result will be the same, because these values are not null, as shown in Figure 2.

image

Figure 2. As long as a non null expression is specified in count, the result makes no difference

What about the count column?

For count (column), it is also applicable to the above rules. It evaluates whether the value of each row in the column is null. If it is null, it will not count, and if it is not null, it will count. Therefore, count (column) is a count in which the accounting column or its combination is not empty.

 

How to implement count (*)?

As mentioned earlier, when count() has a non null value, you only need to find the number of non null rows in the specific table in SQL server, that is, all rows (if a row has all null values, then the row does not exist). The simplest way is to find a column that is not null. If the column has an index, the index will be used. Of course, for performance, SQL server will select the narrowest index to reduce io.

We delete all the nonclustered indexes from the [person]. [address] table of the AdventureWorks 2012 sample database, and create an index on the modifydate column whose data type is datetime. Let’s look at the execution plan, as shown in Figure 3

image

Figure 3. Index using createDate

 

We continue to build an index on the stateprovinceid column, which is an int column, accounting for 4 bytes, shorter than the previous 8-byte datetime type column, so SQL Server chose the stateprovinceid index. As shown in Figure 4.

image

Figure 4. A shorter stateprovinceid index is selected

 

Therefore, if count (*) is used more in a table, it will greatly improve the performance to consider establishing a single column index in the shortest column.