Evils of using function on an Index Column in the WHERE clause– Tip 2: Sql Server 101 Performance Tuning Tips and Tricks

Using a function on an Indexed Column in the WHERE clause leads to an Index/Table Scan instead of an Index Seek. This issue occurs in majority of the scenarios, but in some cases Sql Server does Index Seek even when you are using function on an indexed column in the WHERE clause.

Let us understand how using function on an Indexed Column in the WHERE clause results in the performance issue with an example. Let us create a Customer table as shown in the below image with sample one million records by executing the following script.

Performance Issue Function on Index Column Customers Demo Table

CREATE DATABASE SqlHints101PerfTips2
GO
USE SqlHints101PerfTips2
GO
--Create Demo Table Customers
CREATE TABLE dbo.Customers (
	CustomerId INT IDENTITY(1,1) NOT NULL,
    FirstName NVARCHAR(50), 
	LastName  NVARCHAR(50),
	PhoneNumber VARCHAR(10),
	EmailAddress NVARCHAR(50),
	CreationDate DATETIME
)
GO
--Populate 1 million dummy customer records
INSERT INTO dbo.Customers (FirstName, LastName, PhoneNumber, EmailAddress, CreationDate)
SELECT TOP 1000000 REPLACE(NEWID(),'-',''), REPLACE(NEWID(),'-',''), 
	CAST( CAST(ROUND(RAND(CHECKSUM(NEWID()))*1000000000+4000000000,0) AS BIGINT) AS VARCHAR(10)),
	REPLACE(NEWID(),'-','') + '@gmail.com',		
	DATEADD(HOUR,CAST(RAND(CHECKSUM(NEWID())) * 19999 as INT) + 1 ,'2006-01-01')
FROM sys.all_columns c1
		CROSS JOIN sys.all_columns c2
GO
--Update one customer record with some know values
UPDATE dbo.Customers 
SET FirstName = 'Basavaraj', LastName = 'Biradar', 
	PhoneNumber = '4545454545', EmailAddress = 'basav@gmail.com'
WHERE CustomerId = 100000
GO
--Create a PK and a Clustered Index on CustomerId column
ALTER TABLE dbo.Customers 
ADD CONSTRAINT PK_Customers_CustomerId 
PRIMARY KEY CLUSTERED (CustomerId)
GO

Create a Non-Clustered index on the DateOfBirth column of the Customers Table by executing following statement:

--Create a non-clustered index on FirstName column
CREATE NONCLUSTERED INDEX IX_Customers_FirstName
	on dbo.Customers (FirstName)
GO

First let us now enable the execution plan in the Sql Server Management Studio by pressing the key stroke CTRL + M and also enable the IO and TIME statistics by executing the following statement

SET STATISTICS IO,TIME ON

Use Case Scenario

Asssume that we need to write a query to return all the customers whose FirstName starts with Basav. For this requirement we can write a query like below:

SELECT *
FROM  dbo.Customers
WHERE LEFT(FirstName,5) = 'Basav'

Let us execute the above query and observe the result, execution plan and IO and Time Statistics.

RESULT:
Performance Issue Function on Index Column Result

Let us go to the Execution Plan tab of the result and see the execution plan.

Execution Plan
Performance Issue Function on Index Column Execution Plan

From the above execution plan, we can see that the index IX_Customers_FirstName on FirstName column is used, but it is doing an Index Scan instead of Index Seek. Let us hover the mouse over the Index Scan node in the execution plan and and observe the node properties

Performance Issue Function on Index Column Execution Plan Properties

In the node properties we can clearly see that index IX_Customers_FirstName on FirstName column is used but it is doing an index scan, because of this we can see Number of Rows Read as one million records.

Let us go to Messages tab of the result and go over the IO and TIME statistics.

IO and TIME Statistics
Performance Issue Function on Index Column IO and TIME

Why Index Scan?

Using function on the Index column in the where caluse causes the function to be evaluated against each row and thus forcing optimizer not to use the Index.

What is the solution?

One solution in this case is to re-write the query using LIKE clause as below instead of using the LEFT function.

SELECT *
FROM  dbo.Customers
WHERE FirstName LIKE 'Basav%' 

Execute the above re-written query and observe the Result, execution plan and IO and Time stats

RESULT:
Performance Issue Function on Index Column Result After Fix

From the above result we can see that there is no change in the result returned from the re-written query.

Execution Plan
Performance Issue Function on Index Column Execution Plan After Fix

Now from the execution plan we can see that it is doing the Index Seek of the non-clustered index IX_Customers_FirstName on FirstName column. Also the execution plan is very simplified and un-necessary parallelism and other overheads are no-longer present. Let us hover the mouse over the Index Seek node in the execution plan and and observe the node properties

Performance Issue Function on Index Column Execution Plan Properties After Fix

In the properties pop-up now we see the seek of the index IX_Customers_FirstName on FirstName column and also Number of Rows Read has reduced from one million rows to one row.

IO and Time Statistics
Performance Issue Function on Index Column IO and TIME After Fix

We can see that Logical reads have been reduced from 10341 to 8, CPU time has been reduced from 233 millseconds to 0 milliseconds and query elapsed time has reduced from 173 milliseconds to 71 milliseconds.

Comparison of the Execution Plan

Side By Side Comparision

From the above execution plan comparison, we can see that query with function on an index column has 99% cost, where as the query without function on an index column has 1% cost

Execute the following statement to stop displaying the IO and TIME statistics when query is executed

SET STATISTICS IO,TIME OFF

Conclusion:

In Sql Server in majority of the scenario using a function on an Indexed Column in the WHERE caluse leads to an Index/Table Scan instead of an Index Seek. If you are using function on an index column in the WHERE clause, verify the execution plan and if it is resulting in un-desired results then try to re-write the query without using the function on a Index Column in the WHERE clause.

Implicit conversion an evil for Index – Tip 1: Sql Server 101 Performance Tuning Tips and Tricks

In Sql Server when we compare a column of lower data type to a value of higher data type, then it leads to an implicit conversion of the column of lower data type to the type of the value which we are comparing. This implicit conversion will result in Index Scan instead of an Index Seek and resulting in performance issue. This is one of the very common mistake resulting in performance bottleneck.

Let us understand how the implicit conversion results in a performance issue with an example. Let us create a Customer table as shown in the below image with sample one million records by executing the following script.

Implicit Conversion Performance Tip 1 Customers Table

--Create Demo Database SqlHints101PerfTips
CREATE DATABASE SqlHints101PerfTips
GO
USE SqlHints101PerfTips
GO
--Create Demo Table Customers
CREATE TABLE dbo.Customers (
	CustomerId INT IDENTITY(1,1) NOT NULL,
    FirstName VARCHAR(50), 
	LastName  VARCHAR(50),
	PhoneNumber VARCHAR(10),
	EmailAddress VARCHAR(50),
    CreationDate DATETIME
)
GO
--Populate 1 million dummy customer records
INSERT INTO dbo.Customers (FirstName, LastName, PhoneNumber, EmailAddress, CreationDate)
SELECT TOP 1000000 REPLACE(NEWID(),'-',''), REPLACE(NEWID(),'-',''), 
	CAST( CAST(ROUND(RAND(CHECKSUM(NEWID()))*1000000000+4000000000,0) AS BIGINT) AS VARCHAR(10)),
	REPLACE(NEWID(),'-','') + '@gmail.com',		
	DATEADD(HOUR,CAST(RAND(CHECKSUM(NEWID())) * 19999 as INT) + 1 ,'2006-01-01')
FROM sys.all_columns c1
		CROSS JOIN sys.all_columns c2
GO
--Update one customer record with some know values
UPDATE dbo.Customers 
SET FirstName = 'Basavaraj', LastName = 'Biradar', 
	PhoneNumber = '4545454545', EmailAddress = 'basav@gmail.com'
WHERE CustomerId = 100000
GO
--Create a PK and a Clustered Index on CustomerId column
ALTER TABLE dbo.Customers 
ADD CONSTRAINT PK_Customers_CustomerId 
PRIMARY KEY CLUSTERED (CustomerId)
GO

Create a Non-Clustered index on the EmailAddress column of the Customers Table by executing following statement:

CREATE NONCLUSTERED INDEX IX_Customers_EmailAddress
	on dbo.Customers (EmailAddress)

Execute the following script to create stored procedure to return customer details for the given input email address

CREATE PROCEDURE dbo.GetCustomerDetailsByEmailAddress
	@EmailAddress NVARCHAR(50)
AS
BEGIN
	SELECT *
	FROM dbo.Customers
	WHERE EmailAddress = @EmailAddress
END

First let us now enable the execution plan in the Sql Server Management Studio by pressing the key stroke CTRL + M and also enable the IO and TIME statistics by executing the following statement

SET STATISTICS IO,TIME ON

Let us now execute the stored procedure GetCustomerDetailsByEmailAddress to return the details for the customer whose email address is basav@gmail.com.

EXEC GetCustomerDetailsByEmailAddress 'basav@gmail.com'

RESULT:
Implicit Conversion Performance Tip 1 Result 1

Let us go to the Execution Plan tab of the result and see the execution plan.

Execution Plan
Implicit Conversion causing Index Scan

From the above execution plan, we can see that the index IX_Customers_EmailAddress on EmailAddress column is used, but it is doing an Index Scan instead of Index Seek. Let us hover over the Index Scan node in the execution plan and and observe the node properties.


Implicit Conversion Causing Index Scan details

In the node properties we can clearly see an implicit conversion of the EmailAddress column of the Customers table to NVARCHAR(50) data type and scan of the Index IX_Customers_EmailAddress on the EmailAddress column.

Let us go to Messages tab of the result and go over the IO and TIME statistics.

IO and TIME Statistics
Implicit Conversion Resulting in Index Scan IO

Why Index Scan?

The stored procedure parameter @EmailAddress is of type NVARCHAR(50), where as the EmailAddress column in the Customers table is of type VARCHAR(50). As NVARCHAR is an higher data type compared to the VARCHAR data type, so sql is trying to convert the EmailAddress column value to NVARCHAR before comparing it with the NVARCHAR @EmailAddress parameter value

What is the solution?

One solution is to change the stored procedures input parameter @EmailAddress datatype from NVARCHAR(50) to VARCHAR(50) which matches with EmailAddress column data type in the Customer table. Alter the above stored procedure by executing the following statement:

ALTER PROCEDURE dbo.GetCustomerDetailsByEmailAddress
    @EmailAddress VARCHAR(50)
AS
BEGIN
    SELECT *
    FROM dbo.Customers
    WHERE EmailAddress = @EmailAddress
END

Now re-execute the stored procedure by executing the following statement

EXEC GetCustomerDetailsByEmailAddress 'basav@gmail.com'

RESULT:
Implicit Conversion Performance Tip 1 Result After Fix

From the above result we can see that there is no change in from the data returned by the SP.

Execution Plan
Implicit Conversion Execution plan After Fix

Now from the execution plan we can see that it is doing the Index Seek of the non-clustered index IX_Customers_EmailAddress on the EmailAddress column. Also the execution plan is very simplified and un-necessary parallelism and other overheads are no-longer present.

Implicit Conversion Execution plan properties After Fix

In the properties pop-up now we don’t see any implicit conversion and also Estimated number of rows to read also changed from one million to one.

IO and Time Statistics
Implicit Conversion Resulting in Index Scan IO After fixing

We can see that Logical reads have been reduced from 7225 to 6 and also CPU time has been reduced from 280 millseconds to 0 milliseconds. And also we don’t see the use of the WorkTable.

Conclusion:

In Sql Server when we compare a column of lower data type to a value of higher data type it leads to an implicit conversion of the column of lower data type to the type of the value which we are comparing. This implicit conversion will result in Index Scan instead of an Index Scan and resulting in performance issue. To avoid such performance issues, we should always make sure that the type of parameter and table column type should always match.

Difference between INT and BIGINT data type in Sql Server

Both INT and BIGINT are exact numeric data types, used for storing integer value. Below table lists out the major difference between INT and BIGINT Data Types.

[ALSO READ] TINYINT Vs SMALLINT

INT

BIGINT

Storage Size 4 Bytes 8 Bytes
Minimum Value -2,147,483,648 (-2^31) -9,223,372,036,854,775,808 (-2^63)
Maximum Value 2,147,483,647 (2^31-1) 9,223,372,036,854,775,807 (2^63-1)
Usage Example
DECLARE @i INT
SET @i = 150
PRINT @i

RESULT:
150

DECLARE @i BIGINT
SET @i = 150
PRINT @i

RESULT:
150

Example of Storage Size used by the variable to store the value
DECLARE @i INT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
4

DECLARE @i BIGINT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
8

Example of INT out of range value
DECLARE @i INT
SET @i = 2147483648
PRINT @i

RESULT:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type int.

DECLARE @i BIGINT
SET @i = 2147483648
PRINT @i

RESULT:
2147483648

Try to store Negative value
DECLARE @i INT
SET @i = -150
PRINT @i

RESULT:
-150

DECLARE @i BIGINT
SET @i = -150
PRINT @i

RESULT:
-150

[ALSO READ] SMALLINT Vs INT

Selecting the correct data type while creating a table is very critical. In-correct selection of the data type will result in performance and storage issues over the time as the data grows. As in-correct selection of data type results requiring more storage space to store and no. of records stored in each data page will be less. And on top if index is created on such columns, it not only takes the extra space in storing the value in a row in the data page but also requires extra space in the index. Less the no. of records stored in the data page, then to serve the queries Sql Server needs to load more no. of data pages to the memory. So, it is very crucial to select the correct data type while creating table. Hope the above differences will help you in selecting the correct data type while creating the table.

ALSO READ

Difference between SMALLINT and INT data type in Sql Server

Both SMALLINT and INT are exact numeric data types, used for storing integer value. Below table lists out the major difference between SMALLINT and INT Data Types.

[ALSO READ] TINYINT Vs SMALLINT

SMALLINT

INT

Storage Size 2 Bytes 4 Bytes
Minimum Value -32,768 (-2^15) -2,147,483,648 (-2^31)
Maximum Value 32,767 (2^15-1) 2,147,483,647 (2^31-1)
Usage Example
DECLARE @i SMALLINT
SET @i = 150
PRINT @i

RESULT:
150

DECLARE @i INT
SET @i = 150
PRINT @i

RESULT:
150

Example of Storage Size used by the variable to store the value
DECLARE @i SMALLINT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
2

DECLARE @i INT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
4

Example of SMALLINT out of range value
DECLARE @i SMALLINT
SET @i = 32768
PRINT @i

RESULT:

Msg 220, Level 16, State 1, Line 2
Arithmetic overflow error for data type smallint, value = 32768.

DECLARE @i INT
SET @i = 32768
PRINT @i

RESULT:
32768

Try to store Negative value
DECLARE @i SMALLINT
SET @i = -150
PRINT @i

RESULT:
-150

DECLARE @i INT
SET @i = -150
PRINT @i

RESULT:
-150

Example of both SMALLINT and INT out of range value
DECLARE @i SMALLINT
SET @i = 2147483648
PRINT @i

RESULT:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type smallint.

DECLARE @i INT
SET @i = 2147483648
PRINT @i

RESULT:

Msg 8115, Level 16, State 2, Line 2
Arithmetic overflow error converting expression to data type int.

[ALSO READ] TINYINT Vs INT

Selecting the correct data type while creating a table is very critical. In-correct selection of the data type will result in performance and storage issues over the time as the data grows. As in-correct selection of data type results requiring more storage space to store and no. of records stored in each data page will be less. And on top if index is created on such columns, it not only takes the extra space in storing the value in a row in the data page but also requires extra space in the index. Less the no. of records stored in the data page, then to serve the queries Sql Server needs to load more no. of data pages to the memory. For example: for table column, which stores state_id, choosing an INT data type instead of TINYINT or SMALLINT column is in-efficient as the number of states in a country in worst case scenario too never exceeds a three-digit number. So, for state_id column if we choose INT data type then it will always take 4 bytes for storing it irrespective of the value stored in it. Whereas TINYINT would have taken 1 byte for storing the same value and SMALLINT would have taken 2 bytes. So, it is very crucial to select the correct data type while creating table. Hope the above differences will help you in selecting the correct data type while creating the table.

ALSO READ

Difference between TINYINT and SMALLINT data type in Sql Server

Both TINYINT and SMALLINT are exact numeric data types, used for storing integer value. Below table lists out the major difference between TINYINT and SMALLINT Data Types.

[ALSO READ] TINYINT Vs INT

TINYINT

SMALLINT

Storage Size 1 Byte 2 Bytes
Minimum Value 0 -32,768 (-2^15)
Maximum Value 255 32,767 (2^15-1)
Usage Example
DECLARE @i TINYINT
SET @i = 150
PRINT @i

RESULT:
150

DECLARE @i SMALLINT
SET @i = 150
PRINT @i

RESULT:
150

Example of Storage Size used by the variable to store the value
DECLARE @i TINYINT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
1

DECLARE @i SMALLINT
SET @i = 150
PRINT DATALENGTH( @i)

RESULT:
2

Example of TINYINT out of range value
DECLARE @i TINYINT
SET @i = 260
PRINT @i

RESULT:

Msg 220, Level 16, State 2, Line 2
Arithmetic overflow error for data type tinyint, value = 260.

DECLARE @i SMALLINT
SET @i = 260
PRINT @i

RESULT:
260

Try to store Negative value
DECLARE @i TINYINT
SET @i = -150
PRINT @i

RESULT:

Msg 220, Level 16, State 2, Line 2
Arithmetic overflow error for data type tinyint, value = -150.

DECLARE @i SMALLINT
SET @i = -150
PRINT @i

RESULT:
-150

Example of both TINYINT and SMALLINT out of range value
DECLARE @i TINYINT
SET @i = 32768
PRINT @i

RESULT:

Msg 220, Level 16, State 2, Line 2
Arithmetic overflow error for data type tinyint, value = 32768.

DECLARE @i SMALLINT
SET @i = 32768
PRINT @i

RESULT:

Msg 220, Level 16, State 1, Line 2
Arithmetic overflow error for data type smallint, value = 32768.

[ALSO READ] SMALLINT Vs INT

Selecting the correct data type while creating a table is very critical. In-correct selection of the data type will result in performance and storage issues over the time as the data grows. As in-correct selection of data type results requiring more storage space to store and no. of records stored in each data page will be less. And on top if index is created on such columns, it not only takes the extra space in storing the value in a row in the data page but also requires extra space in the index. Less the no. of records stored in the data page, then to serve the queries Sql Server needs to load more no. of data pages to the memory. For example: for table column, which stores state_id, choosing an INT data type instead of TINYINT or SMALLINT column is in-efficient as the number of states in a country in worst case scenario too never exceeds a three-digit number. So, for state_id column if we choose INT data type then it will always take 4 bytes for storing it irrespective of the value stored in it. Whereas TINYINT would have taken 1 byte for storing the same value and SMALLINT would have taken 2 bytes. So, it is very crucial to select the correct data type while creating table. Hope the above differences will help you in selecting the correct data type while creating the table.

ALSO READ