Tag Archives: JSON in Sql

Indexing Strategy for JSON Value in Sql Server 2016

This article explains the strategy for Indexing JSON value in Sql Server 2016 with examples, as we don’t have any special indexes like the XML indexes which we had for the XML value stored in the XML data type column.

[ALSO READ]:

If we are storing JSON data in a table column, then we may come across a scenario where we may need to retrieve only the records with specific JSON property value. For example if we are storing Customer address and phone details in JSON structure in a Varchar/Nvarchar column, then we may get a request to get all the customers whose phone member matches to the input value. To handle such use cases, creating an index on the JSON column is not the correct approach and of no use. As it indexes the complete JSON value like any other value in a Varchar/NVarchar column and we are looking for particular JSON Property value which is not at the beginning of the JSON string. And also it takes lot of additional storage space as the complete JSON value is indexed. So, creating such indexes is of no use.

But for JSON we do have an alternative way of Indexing JSON Property. This article explains how we can index a JSON property from the JSON string by example and also explains the storage implications if any.

Let us create a sample Database with a Customer table with 200,000 records as shown in the following image by the following script:

Customer Table
Script:

CREATE DATABASE SqlHintsJSONDemo
GO
USE SqlHintsJSONDemo
GO
CREATE TABLE dbo.Customer( 
	Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
	Name NVARCHAR(50), Detail NVARCHAR(MAX))
GO
INSERT INTO dbo.Customer ( Name, Detail )
SELECT TOP 200000 NEWID(),
 REPLACE('{"Address":{"State":"KA","Country":"India"},
 "Phone":"@Phone"}',
 '@Phone', 100000000-ROW_NUMBER() OVER (ORDER BY SC1.object_id))
FROM SYS.all_columns SC1
		CROSS JOIN SYS.all_columns SC2

Below is the Customer table storage usage details at this point of time.

Initial SpaceUsed by the Customer table

Enable the IO and TIME statistics using the following script, to measure the performance of the queries:

--Enable Statistics
SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

Let us execute the below statement to get the details of the Customer whose phone number is 99890000

SELECT *
FROM dbo.Customer
WHERE JSON_VALUE(Detail,'$.Phone') = '99890000'

RESULT:
Get Customer Detail using JSON_VALUE function

From the above result it is clear that fetching the customer data with this approach is resulting in higher number of IO and CPU time.

Let us now explore the solution for the above problem. The solution to this problem is create a non-persisted computed column where this column value will be the value of the Phone property from the JSON string and then index it. Let us understand this by performing these steps on the Customer table created above:

First add a non-persisted computed column PhoneNumber to the Customer table and the value of this computed column is the Phone property value extracted using the JSON_VALUE function from the Detail column having Address and Phone information stored in the JSON format.

ALTER TABLE dbo.Customer
ADD PhoneNumber AS JSON_VALUE(Detail,'$.Phone')

Let us verify is there any storage implication of adding a computed column

SpaceUsed by the Customer table after adding computed column

From the results it is clear that a non-persisted computed column doesn’t take any additional storage space. Non-persisted computed column value is computed/evaluated at run-time.

Let us now try to get the details of the Customer whose phone number is 99890000 by using the computed column PhoneNumber.

SELECT *
FROM dbo.Customer
WHERE PhoneNumber  = '99890000'

RESULT:
Get Customer Detail using Computed column

From the result it is clear that the computed column didn’t improve the IO and Time taken to execute the query.

Let us now create an Index on the Computed column PhoneNumber by using the following script:

CREATE INDEX IX_Customer_PhoneNumber
 ON dbo.Customer(PhoneNumber)

When you create this index you may get the below warning. The reason for the warning is Sql Server doesn’t know what will be the length of the value which will be extracted by the JSON_VALUE function. And the maximum length of the value that can be returned by JSON_VALUE function is NVARCHAR(4000).

Warning! The maximum key length for a nonclustered index is 1700 bytes.
The index ‘IX_Customer_PhoneNumber’ has maximum length of 8000 bytes.
For some combination of large values, the insert/update operation will fail.

Let us verify the Customer Table storage detail after adding index on the computed column:

SpaceUsed by the Customer table after adding index on computed column

From the result it is clear that there is no change the storage used for the table, it has taken only the extra storage for the index. So the computed column is still not persisted only in the index tree the computed value is persisted. Which is similar to creating index on any other table column.

Now let us verify whether we have any improvement in the performance of the query to retrieve the Customer details based on the Phone JSON property value (i.e. by the indexed computed column PhoneNumber).

SELECT *
FROM dbo.Customer
WHERE PhoneNumber  = '99890000'

RESULT:
Get Customer Detail using Indexed Computed column

From the result it is clear that adding index on the computed column has drastically improved the IO and CPU time.

We can execute the below statement to disable the IO and TIME statistics, which we have enabled in the beginning of this article.

--Disable statistics
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO

Conclusion: If we have a need to retrieve record from table based on a particular property value in the JSON string column, in such scenario creating a computed column whose value is extracted by the JSON_VALUE function from the JSON string column and indexing is the optimal approach. It takes extra storage for the index just like indexing any other column. And the computed column is still not persisted even after creating an index on it. And while creating index it takes some extra time to evaluate and extract the computed value to store in the index pages, again this is noting different when create a index on a computed column.

[ALSO READ]:

OPENJSON Function in Sql Server 2016

OPENJSON is one of the new JSON function introduced in Sql Server 2016, it is very much similar to the OPENXML function. This table valued function can be used to transform the JSON text to one or many rows. Basically, it provides a row-set view of the input JSON string.

Syntax:

OPENJSON( json_string [, json_path ] )
[ WITH (column_mapping_definition1
        [,column_mapping_definition2] 
        [,… column_mapping_definitionN])
]

Where: column_mapping_definition :: column_name column_type [ column_json_path ] [ AS JSON ]

json_string is the JSON string from which will be transformed to row(s).

json_path is the location of the JSON string in the json_string parameter, which will converted to one or many rows.

WITH clause: This is an optional clause it can used to explicitly specify the schema of the result. Each column in the resultant output can be defined by specifying the column name, column type, it’s location i.e. column json path and the [AS JSON] clause specifies that the column value will be a JSON object or an array, if it is not specified the expected column value is a scalar value.


[ALSO READ]:

Let us understand this OPENJSON function with an extensive list of examples.

OPENJSON examples without explicit schema declaration

Example 1: Basic OPENJSON example with default schema

SELECT *
FROM OPENJSON('{"FirstName":"Basavaraj","LastName":"Biradar"}')

RESULT:
OPENJSON with out schema declaration Example 1

From the result we can observe that when OPENJSON function is executed with the default schema (i.e. Without explicit schema declarartion), following three columns are returned in the result.

Key: Property name if it is present otherwise it will be index
value: Property value
type: An integer number that represents the type of the value. Next example explains all the possible type values.

Example 2: Example explaining all the possible type column values in case of an OPENJSON function with default schema

DECLARE @json_string NVARCHAR(MAX) = N'
	{"Name":"Basavaraj","Id":1,"Permanent":true,
		 "Hobbies":["Blogging","Cricket"],
		"Address":{"State":"KA","Country":"India"}}'
SELECT *
FROM OPENJSON(@json_string)

RESULT:
OPENJSON with out schema declaration Example 2

So from the result we can say that type clumn value will be 1 in case the value is STRING, 2 if the value is of type INT, 3 if the value is of type Boolean, 4 if value is of the type JSON array and 5 if the value is of the type JSON object.

Example 3: Fetching only the key and value columns (i.e. in this example not fetching the type column)

DECLARE @json_string NVARCHAR(MAX) = N'
	{"Id":1,"Name":"Basavaraj",
		"Address":{"State":"KA","Country":"India"}}'
SELECT [key],value
FROM OPENJSON(@json_string)

RESULT:
OPENJSON with default schema Example 3

Example 4: In this example specifying the json_path in the OPENJSON function which is pointing to Address JSON object in the input JSON string

DECLARE @json_string NVARCHAR(MAX) = N'
  {"Id":1,"Name":"Basavaraj",
   "Address":{"State":"KA","Country":"India"}}'
SELECT [key],value
FROM OPENJSON(@json_string, '$.Address')

RESULT:
OPENJSON with default schema  Example 4

Example 5: One more example where the JSON text has only one attribute at the root level and this attribute is pointing to a nested JSON string

DECLARE @json_string NVARCHAR(MAX) = N'
  {"Customer":{"Id":1,"Name":"Basavaraj",
   "Address":{"State":"KA","Country":"India"}}}'
SELECT [key],value
FROM OPENJSON(@json_string)

RESULT:
OPENJSON function with default schema Example 5

Example 6: Input JSON string is a JSON array of strings.

DECLARE @json_string NVARCHAR(MAX) = '["Blogging","Cricket"]'
SELECT [key],value
FROM OPENJSON(@json_string)

RESULT:
OPENJSON function with default schema Example 6

Example 7: Input JSON string is a JSON array of integers.

DECLARE @json_string NVARCHAR(MAX) = '[1,2,3,4]'
SELECT [key],value
FROM OPENJSON(@json_string)

RESULT:
OPENJSON function with default schema Example 7

From the results of example 6 and 7, we can see that OPENJSON function converts the JSON array to a result set. So such feature will be useful in converting an array of input values to a temporary result and joining with other tables.

Example 8: In this example specifying the json_path in the OPENJSON function which is pointing to Hobbies JSON array in the input JSON string

DECLARE @json_string NVARCHAR(MAX) = N'
    {"Id":1,"Name":"Basavaraj",
        "Hobbies":["Blogging","Cricket"]}'
SELECT [key],value
FROM OPENJSON(@json_string,'$.Hobbies')

RESULT:
OPENJSON function with default schema Example 8

Example 9: In this example the input JSON string is an array of nested JSON objects

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
            "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
            "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string)

RESULT:
OPENJSON function with default schema Example 9 1

Let us re-execute the above example by specifying the json_path which is pointing to an array of JSON objects

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
            "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
            "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')

RESULT:
OPENJSON function with default schema Example 9 2

OPENJSON examples with explicit schema definition

Example 1: This is an example of an OPENJSON function with explicit schema declaration.

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
      "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
      "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(Id INT, Name NVarchar(100))

RESULT:
OPENJSON with explicit schema declaration Example 1

In this example the json_path in the OPENJSON function points to the array of the JSON objects. In the schema declaration columns values json path is not mentioned, in such cases it tries to match the JSON property name by column name and returns its corresponding JSON property value.

Example 2: Let us extended the example 1, in this example try to get the customer Country from the nested JSON object Address apart from Id and Name.

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
            "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
            "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(Id INT, Name Varchar(100), [Address.Country] NVarchar(50))

RESULT:
OPENJSON with explicit schema definition Example 2

Again in this example the column values json path is not mentioned, so it goes by the column name. So the column name [Address.Country] implies the OPEJSON function returns the Country property value from the Address JSON object.

Example 3: In this example EmployeeId column’s value json path is specified. Other columns (i.e. Name and Address.Country) go by the column name in the input json text for it’s value as explained in the previous example

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
      "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
      "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(EmployeeId INT '$.Id', Name Varchar(100), 
     [Address.Country] NVarchar(50))

RESULT:
OPENJSON with explicit schema definition Example 3

If column value JSON path is mentioned then it overrides the column name approach to extract the column value. So from the result it is clear that, EmployeeId column value is extracted by the column value json path i.e. ‘$.id’, if it would have gone by the column name then we would have not got any value for this column because there is no property in the input JSON string with key name as EmployeeId.

Example 4: In this example for all the columns the column value JSON path is defined

 
DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
            "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
            "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(EmployeeId INT '$.Id', EmployeeName Varchar(100) '$.Name', 
		Country NVarchar(50) '$.Address.Country')

RESULT:
OPENJSON with explicit schema definition Example 4
[ALSO READ]:lax and strict JSON Path modes in Sql Server 2016

lax and strict JSON path modes influence on the OPENJSON JSON function output

let us understand the influence of the lax/strict JSON path modes on the OPENJSON function output with extensive list of examples

Example 1: In the below example the json_path ‘$.City’ mentioned in the OPENJSON function doesn’t exists in the input JSON string. And in this json_path the JSON path mode has not be specified explicitly, so it goes with default JSON path mode lax.

DECLARE @json_string NVARCHAR(MAX) = N'
{"Id":1,"Name":"Basavaraj",
  "Address":{"State":"KA","Country":"India"}}'
SELECT [key],value
FROM OPENJSON(@json_string, '$.City')

RESULT:
OPENJSON lax and strict JSON path modes Example 1

Let us execute this example by specifying the JSON path mode as strict

DECLARE @json_string NVARCHAR(MAX) = 
	N'{"Id":1,"Name":"Basavaraj",
	   "Address":{"State":"KA","Country":"India"}}'
SELECT [key],value
FROM OPENJSON(@json_string, 'strict$.City')

RESULT:

Msg 13608, Level 16, State 3, Line 4
Property cannot be found in specified path.

Let us execute this example by explicitly specifying the default JSON path mode lax

DECLARE @json_string NVARCHAR(MAX) = 
	N'{"Id":1,"Name":"Basavaraj",
	   "Address":{"State":"KA","Country":"India"}}'
SELECT [key],value
FROM OPENJSON(@json_string, 'lax$.City')

RESULT:
OPENJSON lax and strict JSON path modes Example 1 2

Example 2: In the below OPENJSON function with explicit schema declaration, DOB column value JSON path ‘$.Dob’ doesn’t exists in the input JSON string. In the json_path the JSON path mode has not be specified explicitly, so it goes with default JSON path mode which is lax.

DECLARE @json_string NVARCHAR(MAX) = 
 N'{"Customers":
    [{"Id":1,"Name":"Basavaraj",
            "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
            "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(EmployeeId INT '$.Id', EmployeeName Varchar(100) '$.Name',
		DOB Varchar(10) '$.Dob')

RESULT:
OPENJSON lax and strict JSON path modes Example 2

Let us execute the previous query by specifying the JSON path mode as strict i.e. ‘strict$.Dob’

DECLARE @json_string NVARCHAR(MAX) = N'
{"Customers":
    [{"Id":1,"Name":"Basavaraj",
      "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
      "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(EmployeeId INT '$.Id', EmployeeName Varchar(100) '$.Name',
		DOB Varchar(10) 'strict$.Dob')

RESULT:

Msg 13608, Level 16, State 6, Line 9
Property cannot be found in specified path.

Let us execute the above query by specifying the default JSON path mode explicitly ‘lax$.Dob’

DECLARE @json_string NVARCHAR(MAX) = N'
{"Customers":
    [{"Id":1,"Name":"Basavaraj",
      "Address":{"State":"KA","Country":"India"}},
     {"Id":2,"Name":"Kalpana",
      "Address":{"State":"NY","Country":"United State"}}
    ]
 }'
SELECT *
FROM OPENJSON(@json_string,'$.Customers')
WITH(EmployeeId INT '$.Id', EmployeeName Varchar(100) '$.Name',
		DOB Varchar(10) 'lax$.Dob')

RESULT:
OPENJSON lax and strict JSON path modes Example 2 2
[ALSO READ]:

ISJSON Function in Sql Server 2016

ISJSON function validates whether the string parameter supplied to it is a valid JSON or not. If supplied string is a valid JSON then it will return value as 1, otherwise it returns the value 0. In case input is a NULL then it returns output as NULL.

Syntax:

ISJSON( String_Expression )

WHERE String_Expression can be a table column or a string (i.e. Varchar/NVarchar) variable or a string constant. And this string expression is evaluated to check whether it is a valid JSON.

[ALSO READ]:

Let us understand this function with extensive list of examples:

Example 1: Input is a valid JSON, in this case the ISJSON function will return value as 1

DECLARE @JSONText NVarchar(Max) = '[{"Id":1,"Name":"Basavaraj"},
			             {"Id":2,"Name":"Kalpana"}]'
IF ISJSON ( @JSONText ) = 1
	PRINT 'Valid JSON'

RESULT:
ISJSON Valid JSON

Example 2: Input is an InValid JSON, in this case the ISJSON function will return value as 0

DECLARE @JSONText NVarchar(Max) = 'Basavaraj'
IF ISJSON ( @JSONText ) = 0
	PRINT 'InValid JSON'

RESULT:
ISJSON InValid JSON

Example 3: Input is a NULL value, in this case the ISJSON function will return value as 0

SELECT ISJSON ( NULL ) 'ISJSON RESULT'

RESULT:
ISJSON NULL OutPut
[ALSO READ]: