In today’s article, I want to show you how to communicate your work and thinking process to the query optimizer when you want to create an index design for a specific query. Now let’s discuss it together!
Query with questions
Let’s look at the following query:
DECLARE @i INT = 999 SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, LineTotal FROM Sales.SalesOrderDetail WHERE ProductID < @i ORDER BY CarrierTrackingNumber GO
As you can see, we use a local variable and a predicate that is not equal to the predicate Sales.SalesOrderDetail Table to get some records. When you execute that query and look at its execution plan, you will find that it has some serious problems
- SQL server needs to scan Sales.SalesOrderDetail The entire nonclustered index of the table because there are no supported nonclustered indexes. For this scan, the query needs 1382 logical reads, and the running time is nearly 800 ms.
- The query optimizer introduces the filter operator into the query plan, which makes a row by row comparison to check the matching rows (productid < @ I)
- Because of order by carriertrackingnumber, a sort operator is introduced in the execution plan.
- Sorting operators spread to tempdb because of incorrect cardinality estimation. With the combination of local variables and predicates, SQL Server estimates 30% of the rows from the base hard code of the table. In our case, the estimated number of rows is 36395 (121317 * 30%). In fact, the query returns 120621 rows, which means that the sort operator has to spread to tempdb because the requested memory grant is too small.
Now I ask you – can you improve this query? What’s your suggestion? Take a break and think for a few minutes. How can you improve the query without modifying the query itself?
Let’s debug the query!
Of course, we need to do index related adjustments to improve. Without a supported nonclustered index, it is the only query optimizer that can run our query using a schedule. But what is a good nonclustered index for the specified query? Generally speaking, I look at search predicates to think about possible non clustered quick printing. In our example, the search predicate is as follows:
WHERE ProductID < @i
We request the rows to be filtered in the productid column. So we want to create a supported nonclustered index on that column. We index:
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(ProductID) GO
After the nonclustered index is created, we need to verify the changes, so we execute the query code again. What’s the result? The query optimizer doesn’t use the nonclustered index we just created! We create a supported nonclustered index on the search predicate, which is not referenced by the query optimizer? Usually people have no idea about it. In fact, we can prompt the query optimizer to use nonclustered indexes to better understand why the query optimizer does not automatically select indexes
DECLARE @i INT = 999 SELECT SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty, LineTotal FROM Sales.SalesOrderDetail WITH (INDEX(idx_Test)) WHERE ProductID < @i ORDER BY CarrierTrackingNumber GO
When you look at the execution plan now, you will see the following wildness – a parallel plan:
The query cost 370109 logical reads! The running time is basically the same as just now. What’s going on here? When you look at the execution plan carefully, you will find that the query optimizer introduces bookmark search, because the nonclustered index created just now is not an overlay nonclustered index for the query. The query goes beyond the so-calledTipping pointBecause we use the current search predicate to get almost all the lines. So it doesn’t make sense to combine nonclustered indexes and bookmark lookups.
Don’t think about why the query optimizer doesn’t select the nonclustered index created just now. We have expressed our ideas to the query optimizer itself. We have asked the query optimizer through the query prompt why the nonclustered index is not automatically selected. As I said at the beginning: I don’t want to think too much.
We use nonclustered indexes to solve this problem. At the leaf level of nonclustered indexes, we have to include the extra columns requested from the select list. You can look at the bookmark again to find out which columns are currently missing in the leaf layer
We rebuild the nonclustered index
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(ProductID) INCLUDE (CarrierTrackingNumber, OrderQty, UnitPrice, UnitPriceDiscount) WITH ( DROP_EXISTING = ON ) GO
We’ve made another change, so we can rerun the query to verify it. But this time we don’t add a query prompt, because now the query optimizer will automatically select nonclustered indexes. What’s the result? When you look at the execution plan, the index is now selected.
SQL server now performs a lookup operation on the nonclustered index, but we also have a sort operator in the execution plan. Because of the 30% hard coding of cardinality calculation, sort still needs to spread to tempdb. My God! Our logic read has dropped to 757, but the running time is still nearly 800 milliseconds. What should you do now?
Now we can try to include the carriertrackingnumber column directly in the navigation structure of the nonclustered index. This is the column where SQL Server sorts operators. When we add this column (as the primary key) to the nonclustered index, we sort that column physically, so the sort operator should disappear. As a positive side effect, it will not spread to tempdb. In the execution plan, there is no operator concerned with wrong cardinality calculation. So we try that assumption and rebuild the nonclustered index again
CREATE NONCLUSTERED INDEX idx_Test ON Sales.SalesOrderDetail(CarrierTrackingNumber, ProductID) INCLUDE (OrderQty, UnitPrice, UnitPriceDiscount) WITH ( DROP_EXISTING = ON ) GO
As you can see from the index definition, we have now physically pre sorted the data of the carriertrackingnumber and productid columns. When you re execute the query again, when you look at the execution plan, you will see that the sort operator has disappeared, and SQL server has scanned the entire leaf layer of the nonclustered index (using the residual predicate as the search predicate).
This implementation plan is not bad! We only need 763 logical reads, and now the running time has been reduced to 600 milliseconds. Compared with just now, it has 25% improvement! But: the query optimizer suggests a better nonclustered index by missing index recommendations! For the time being, let’s create the recommended nonclustered index:
CREATE NONCLUSTERED INDEX [SQL Server doesn't care about names, why I should care about names?] ON [Sales].[SalesOrderDetail] ([ProductID]) INCLUDE ([SalesOrderID],[SalesOrderDetailID],[CarrierTrackingNumber],[OrderQty],[LineTotal]) GO
When you re execute the original query now, you will find something surprising: the query optimizer uses the nonclustered index created by “we” just now, and the lack of index suggestions has disappeared!
You have just created an index that SQL server never uses – except for insert, update and delete statements, SQL server has to maintain your nonclustered index. For your database, you just created a “pure” space wasting index. On the other hand, you’ve satisfied the query optimizer by eliminating missing index suggestions. But that’s not the goal: the goal is to create an index that will be used again.
Conclusion: never trust query optimizer!
Today’s article is a little controversial, but I want you to show me how the query optimizer helps you when you create an index, and how the query optimizer fooles you. So it’s important to make small adjustments, run your query immediately, and verify the changes.
The above is the whole content of this article, I hope to help you learn.