This article explains the strategy for Indexing JSON value in Sql Server 2016 with examples, as we don’t have any special indexes like the XML indexes which we had for the XML value stored in the XML data type column.
[ALSO READ]:
- Native JSON Support in Sql Server 2016
- FOR JSON Clause in Sql Server 2016
- OPENJSON Function in Sql Server 2016
- ISJSON Function in Sql Server 2016
- JSON_VALUE Function in Sql Server 2016
- JSON_QUERY Function in Sql Server 2016
- lax and strict JSON Path modes in Sql Server 2016
If we are storing JSON data in a table column, then we may come across a scenario where we may need to retrieve only the records with specific JSON property value. For example if we are storing Customer address and phone details in JSON structure in a Varchar/Nvarchar column, then we may get a request to get all the customers whose phone member matches to the input value. To handle such use cases, creating an index on the JSON column is not the correct approach and of no use. As it indexes the complete JSON value like any other value in a Varchar/NVarchar column and we are looking for particular JSON Property value which is not at the beginning of the JSON string. And also it takes lot of additional storage space as the complete JSON value is indexed. So, creating such indexes is of no use.
But for JSON we do have an alternative way of Indexing JSON Property. This article explains how we can index a JSON property from the JSON string by example and also explains the storage implications if any.
Let us create a sample Database with a Customer table with 200,000 records as shown in the following image by the following script:
CREATE DATABASE SqlHintsJSONDemo GO USE SqlHintsJSONDemo GO CREATE TABLE dbo.Customer( Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY, Name NVARCHAR(50), Detail NVARCHAR(MAX)) GO INSERT INTO dbo.Customer ( Name, Detail ) SELECT TOP 200000 NEWID(), REPLACE('{"Address":{"State":"KA","Country":"India"}, "Phone":"@Phone"}', '@Phone', 100000000-ROW_NUMBER() OVER (ORDER BY SC1.object_id)) FROM SYS.all_columns SC1 CROSS JOIN SYS.all_columns SC2
Below is the Customer table storage usage details at this point of time.
Enable the IO and TIME statistics using the following script, to measure the performance of the queries:
--Enable Statistics SET STATISTICS IO ON SET STATISTICS TIME ON GO
Let us execute the below statement to get the details of the Customer whose phone number is 99890000
SELECT * FROM dbo.Customer WHERE JSON_VALUE(Detail,'$.Phone') = '99890000'
From the above result it is clear that fetching the customer data with this approach is resulting in higher number of IO and CPU time.
Let us now explore the solution for the above problem. The solution to this problem is create a non-persisted computed column where this column value will be the value of the Phone property from the JSON string and then index it. Let us understand this by performing these steps on the Customer table created above:
First add a non-persisted computed column PhoneNumber to the Customer table and the value of this computed column is the Phone property value extracted using the JSON_VALUE function from the Detail column having Address and Phone information stored in the JSON format.
ALTER TABLE dbo.Customer ADD PhoneNumber AS JSON_VALUE(Detail,'$.Phone')
Let us verify is there any storage implication of adding a computed column
From the results it is clear that a non-persisted computed column doesn’t take any additional storage space. Non-persisted computed column value is computed/evaluated at run-time.
Let us now try to get the details of the Customer whose phone number is 99890000 by using the computed column PhoneNumber.
SELECT * FROM dbo.Customer WHERE PhoneNumber = '99890000'
From the result it is clear that the computed column didn’t improve the IO and Time taken to execute the query.
Let us now create an Index on the Computed column PhoneNumber by using the following script:
CREATE INDEX IX_Customer_PhoneNumber ON dbo.Customer(PhoneNumber)
When you create this index you may get the below warning. The reason for the warning is Sql Server doesn’t know what will be the length of the value which will be extracted by the JSON_VALUE function. And the maximum length of the value that can be returned by JSON_VALUE function is NVARCHAR(4000).
Warning! The maximum key length for a nonclustered index is 1700 bytes.
The index ‘IX_Customer_PhoneNumber’ has maximum length of 8000 bytes.
For some combination of large values, the insert/update operation will fail.
Let us verify the Customer Table storage detail after adding index on the computed column:
From the result it is clear that there is no change the storage used for the table, it has taken only the extra storage for the index. So the computed column is still not persisted only in the index tree the computed value is persisted. Which is similar to creating index on any other table column.
Now let us verify whether we have any improvement in the performance of the query to retrieve the Customer details based on the Phone JSON property value (i.e. by the indexed computed column PhoneNumber).
SELECT * FROM dbo.Customer WHERE PhoneNumber = '99890000'
From the result it is clear that adding index on the computed column has drastically improved the IO and CPU time.
We can execute the below statement to disable the IO and TIME statistics, which we have enabled in the beginning of this article.
--Disable statistics SET STATISTICS IO OFF SET STATISTICS TIME OFF GO
Conclusion: If we have a need to retrieve record from table based on a particular property value in the JSON string column, in such scenario creating a computed column whose value is extracted by the JSON_VALUE function from the JSON string column and indexing is the optimal approach. It takes extra storage for the index just like indexing any other column. And the computed column is still not persisted even after creating an index on it. And while creating index it takes some extra time to evaluate and extract the computed value to store in the index pages, again this is noting different when create a index on a computed column.
[ALSO READ]:
- Native JSON Support in Sql Server 2016
- DROP IF EXISTS Statement in Sql Server 2016
- Compare Execution Plans in Sql Server 2016
- Live Query Statistics in Sql Server 2016
- DATEDIFF_BIG Function in Sql Server 2016
- Difference between DATEDIFF and DATEDIFF_BIG functions in Sql Server
- SESSION_CONTEXT in Sql Server 2016
Thanks for article, very informative indeed.
What do you suggest when the JSON data is not structured? can that be indexed?