Using a function on an Indexed Column in the WHERE clause leads to an Index/Table Scan instead of an Index Seek. This issue occurs in majority of the scenarios, but in some cases Sql Server does Index Seek even when you are using function on an indexed column in the WHERE clause.
Let us understand how using function on an Indexed Column in the WHERE clause results in the performance issue with an example. Let us create a Customer table as shown in the below image with sample one million records by executing the following script.
CREATE DATABASE SqlHints101PerfTips2 GO USE SqlHints101PerfTips2 GO --Create Demo Table Customers CREATE TABLE dbo.Customers ( CustomerId INT IDENTITY(1,1) NOT NULL, FirstName NVARCHAR(50), LastName NVARCHAR(50), PhoneNumber VARCHAR(10), EmailAddress NVARCHAR(50), CreationDate DATETIME ) GO --Populate 1 million dummy customer records INSERT INTO dbo.Customers (FirstName, LastName, PhoneNumber, EmailAddress, CreationDate) SELECT TOP 1000000 REPLACE(NEWID(),'-',''), REPLACE(NEWID(),'-',''), CAST( CAST(ROUND(RAND(CHECKSUM(NEWID()))*1000000000+4000000000,0) AS BIGINT) AS VARCHAR(10)), REPLACE(NEWID(),'-','') + '@gmail.com', DATEADD(HOUR,CAST(RAND(CHECKSUM(NEWID())) * 19999 as INT) + 1 ,'2006-01-01') FROM sys.all_columns c1 CROSS JOIN sys.all_columns c2 GO --Update one customer record with some know values UPDATE dbo.Customers SET FirstName = 'Basavaraj', LastName = 'Biradar', PhoneNumber = '4545454545', EmailAddress = 'basav@gmail.com' WHERE CustomerId = 100000 GO --Create a PK and a Clustered Index on CustomerId column ALTER TABLE dbo.Customers ADD CONSTRAINT PK_Customers_CustomerId PRIMARY KEY CLUSTERED (CustomerId) GO
Create a Non-Clustered index on the DateOfBirth column of the Customers Table by executing following statement:
--Create a non-clustered index on FirstName column CREATE NONCLUSTERED INDEX IX_Customers_FirstName on dbo.Customers (FirstName) GO
First let us now enable the execution plan in the Sql Server Management Studio by pressing the key stroke CTRL + M and also enable the IO and TIME statistics by executing the following statement
SET STATISTICS IO,TIME ON
Use Case Scenario
Asssume that we need to write a query to return all the customers whose FirstName starts with Basav. For this requirement we can write a query like below:
SELECT * FROM dbo.Customers WHERE LEFT(FirstName,5) = 'Basav'
Let us execute the above query and observe the result, execution plan and IO and Time Statistics.
Let us go to the Execution Plan tab of the result and see the execution plan.
From the above execution plan, we can see that the index IX_Customers_FirstName on FirstName column is used, but it is doing an Index Scan instead of Index Seek. Let us hover the mouse over the Index Scan node in the execution plan and and observe the node properties
In the node properties we can clearly see that index IX_Customers_FirstName on FirstName column is used but it is doing an index scan, because of this we can see Number of Rows Read as one million records.
Let us go to Messages tab of the result and go over the IO and TIME statistics.
Why Index Scan?
Using function on the Index column in the where caluse causes the function to be evaluated against each row and thus forcing optimizer not to use the Index.
What is the solution?
One solution in this case is to re-write the query using LIKE clause as below instead of using the LEFT function.
SELECT * FROM dbo.Customers WHERE FirstName LIKE 'Basav%'
Execute the above re-written query and observe the Result, execution plan and IO and Time stats
From the above result we can see that there is no change in the result returned from the re-written query.
Now from the execution plan we can see that it is doing the Index Seek of the non-clustered index IX_Customers_FirstName on FirstName column. Also the execution plan is very simplified and un-necessary parallelism and other overheads are no-longer present. Let us hover the mouse over the Index Seek node in the execution plan and and observe the node properties
In the properties pop-up now we see the seek of the index IX_Customers_FirstName on FirstName column and also Number of Rows Read has reduced from one million rows to one row.
We can see that Logical reads have been reduced from 10341 to 8, CPU time has been reduced from 233 millseconds to 0 milliseconds and query elapsed time has reduced from 173 milliseconds to 71 milliseconds.
Comparison of the Execution Plan
From the above execution plan comparison, we can see that query with function on an index column has 99% cost, where as the query without function on an index column has 1% cost
Execute the following statement to stop displaying the IO and TIME statistics when query is executed
SET STATISTICS IO,TIME OFF
Conclusion:
In Sql Server in majority of the scenario using a function on an Indexed Column in the WHERE caluse leads to an Index/Table Scan instead of an Index Seek. If you are using function on an index column in the WHERE clause, verify the execution plan and if it is resulting in un-desired results then try to re-write the query without using the function on a Index Column in the WHERE clause.