Analysis of the problem that index search becomes index scan in SQL Server

Time:2020-11-20

What happens in SQL server that causes its execution plan to change from index seek to index scan? The following test, summary and induction are made from several aspects combined with the context of specific scenarios.

1: Implicit conversion causes execution plan to change from index seek to index scan

Implicit Conversion will cause index scan instead of index seek. While implicit conversions occur in SQL Server to allow data evaluations against different data types, they can introduce performance problems for specific data type conversions that result in an index scan occurring during the execution.  Good design practices and code reviews can easily prevent implicit conversion issues from ever occurring in your design or workload. 

As an example, the HumanResources.Employee Table, because the field type of nationalidnumber is nvarchar, the following SQL is implicitly converted, resulting in index scan


SELECT NationalIDNumber, LoginID 
FROM HumanResources.Employee 
WHERE NationalIDNumber = 112457891 

clipboard

We can avoid implicit conversion in SQL in two ways:

1: Make sure that the two you are comparing have the same data type.

2: Use explicit conversion.

By ensuring that the data types of the two comparisons are the same, we can make SQL index seek, as shown below


SELECT nationalidnumber,
    loginid
FROM  humanresources.employee
WHERE nationalidnumber = N'112457891' 

clipboard[1]

Note: not all implicit conversions will result in index scan. Implicit conversions that cause index scans are described in the implicit conversions that cause index scans. As shown in the figure below, I will not give too much introduction here.

clipboard[2]

clipboard[3]

Some measures and methods to avoid implicit transformation

1: Good design and code specification (early stage)

2: Rereview publishing scripts (Interim)

3: Query implicitly converted SQL via script (later)

The following is the SQL statement that searches the database for implicit conversions from the execution plan


SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DECLARE @dbname SYSNAME 
SET @dbname = QUOTENAME(DB_NAME());
WITH XMLNAMESPACES 
  (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan') 
SELECT 
  stmt.value('(@StatementText)[1]', 'varchar(max)'), 
  t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)'), 
  t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)'), 
  t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)'), 
  ic.DATA_TYPE AS ConvertFrom, 
  ic.CHARACTER_MAXIMUM_LENGTH AS ConvertFromLength, 
  t.value('(@DataType)[1]', 'varchar(128)') AS ConvertTo, 
  t.value('(@Length)[1]', 'int') AS ConvertToLength, 
  query_plan 
FROM sys.dm_exec_cached_plans AS cp 
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp 
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS batch(stmt) 
CROSS APPLY stmt.nodes('.//Convert[@Implicit="1"]') AS n(t) 
JOIN INFORMATION_SCHEMA.COLUMNS AS ic 
  ON QUOTENAME(ic.TABLE_SCHEMA) = t.value('(ScalarOperator/Identifier/ColumnReference/@Schema)[1]', 'varchar(128)') 
  AND QUOTENAME(ic.TABLE_NAME) = t.value('(ScalarOperator/Identifier/ColumnReference/@Table)[1]', 'varchar(128)') 
  AND ic.COLUMN_NAME = t.value('(ScalarOperator/Identifier/ColumnReference/@Column)[1]', 'varchar(128)') 
WHERE t.exist('ScalarOperator/Identifier/ColumnReference[@Database=sql:variable("@dbname")][@Schema!="[sys]"]') = 1

2: Non Sarg predicates cause the execution plan to change from index seek to index scan

Sarg (searchable arguments) is also called query parameter. It is defined as an operation to limit search, because it usually refers to a specific match, a match within a range of values or an and connection of two or more conditions. The most typical case of statements that do not satisfy Sarg form is that they include non operator statements, such as: not,! =, < >;,! <;,! >; not exists, not in, not like, etc. in addition, there are also statements such as using functions and predicates for operations in predicates.

2.1: index scan will be caused by using function of index field


SELECT nationalidnumber,
    loginid
FROM  humanresources.employee
WHERE SUBSTRING(nationalidnumber,1,3) = '112'

clipboard[4]

2.2 operation of index field will cause index scan

An operation on an index field field causes the execution plan to change from index seek to index scan:


SELECT * FROM Person.Person WHERE BusinessEntityID + 10 < 260

clipboard[5]

Generally, this situation should be avoided as much as possible. If possible, logical conversion of SQL should be carried out as much as possible (as shown below). Although this example seems very simple, but in practice, we have seen many such cases, just like many people know that smoking is harmful to health, but they just can’t quit! Many people may know this, but they still make this mistake in practice. That’s the truth!


SELECT * FROM Person.Person WHERE BusinessEntityID < 250

clipboard[6]

2.3 like fuzzy query results in index scan

Whether the like statement belongs to Sarg depends on the type of wildcard used. Like ‘condition%’ belongs to Sarg, and like ‘condition’ belongs to non Sarg predicate operation


SELECT * FROM Person.Person WHERE LastName LIKE 'Ma%'

clipboard[7]


SELECT * FROM Person.Person WHERE LastName LIKE '%Ma%'

clipboard[8]

3: SQL query returns data pages that have reached tipping point, which will cause index scan or table scan


What is the tipping point?
It's the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

As for tipping point, let’s not tangle with the concept, but start with a vivid example


SET NOCOUNT ON;
DROP TABLE TEST
CREATE TABLE TEST (OBJECT_ID INT, NAME VARCHAR(8));
CREATE INDEX PK_TEST ON TEST(OBJECT_ID)
DECLARE @Index INT =1;
WHILE @Index <= 10000
BEGIN
  INSERT INTO TEST
  SELECT @Index, 'kerry';
  SET @Index = @Index +1;
END
UPDATE STATISTICS TEST WITH FULLSCAN;
SELECT * FROM TEST WHERE OBJECT_ID= 1

As shown above, when we query object_ When id = 1, the optimizer uses index seek

clipboard[9]

Upper object_ There is only one data with id = 1, if object_ What happens if the data with id = 1 reaches 20% of the total data in the whole table? We can update 2001 data manually. At this point, the execution plan of SQL becomes table scan.


UPDATE TEST SET OBJECT_ID =1 WHERE OBJECT_ID<=2000;
UPDATE STATISTICS TEST WITH FULLSCAN;
SELECT * FROM TEST WHERE OBJECT_ID= 1

clipboard[10]

clipboard[11]

The critical point determines whether SQL server uses a bookmark lookup or a full table / index scan. This also means that the critical point is only related to non covering, nonclustered indexes (emphasis).

Why is the tipping point interesting?
It shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn’t mean that SQL Server’s going to use that index)
It happens at a point that’s typically MUCH earlier than expected… and, in fact, sometimes this is a VERY bad thing!
Only nonclustered indexes that do not cover a query have a tipping point. Covering indexes don’t have this same issue (which further proves why they’re so important for performance tuning)
You might find larger tables/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and/or force… and, is that a good thing?

4: Missing or incorrect statistics can cause index scan

Missing or incorrect statistics can easily turn index seek into index scan. This is easy to understand, but it is difficult to construct such a case. I didn’t think of it at the moment, so I’ll skip it here.

5: Predicates that are not the first column of a federated index cause an index scan

SELECT * INTO Sales.SalesOrderDetail_Tmp FROM Sales.SalesOrderDetail;
CREATE INDEX PK_SalesOrderDetail_Tmp ON Sales.SalesOrderDetail_Tmp(SalesOrderID, SalesOrderDetailID);
UPDATE STATISTICS  Sales.SalesOrderDetail_Tmp WITH FULLSCAN;

The results of the following SQL statement are consistent, but the second SQL statement scans the index because the predicate is not the first column of the federated index


SELECT * FROM Sales.SalesOrderDetail_Tmp
WHERE SalesOrderID=43659 AND SalesOrderDetailID<10

clipboard[12]


SELECT * FROM Sales.SalesOrderDetail_Tmp WHERE SalesOrderDetailID<10

clipboard[13]