
When would you work with JSON Data?
- Storing transactional OLTP workloads in a semi-structured NoSQL Database to ensure you have flexible schemas. This applies to master data as well such as online product catalogues
- Load, query and analyze logs generated by operational systems using T-SQL to parse the JSON text in the logs.
- Real Time analytics of IoT data
- Transforming relational data stored in a database into JSON structures for REST API integration
- etc.
This article will focus only on extracting information from JSON text using T-SQL, not converting relational data into JSON structures.
JSON Functions in T-SQL
- ISJSON
- ISJSON tests whether a string contains valid JSON data
- It returns a Boolean True/False value as a result
- It is used to extract scalar (singular) attributes from a JSON object
- Returns a value of type nvarchar(4000)
- Used to extract an object or array from a JSON document
- Returns a JSON fragment of type nvarchar(max)
- OPENJSON is used to transform an array of JSON objects into a tabular structure, where each object in the array represents a single row in the table.
- You can specify data types using the WITH clause.
- You can cherry pick the JSON fields you want to expose, you don’t need to provide a type and name for each attribute that exists.
- NULL is simply returned if an attribute doesn’t exist. This ties into the idea that JSON has fluid structure.
- The optional strict prefix in the path specifies that values for the specified properties must exist in the JSON text.
- Updates the value of a property in a JSON string and returns the updated JSON string
- Example:
/*JSON_MODIFY Example*/
DECLARE @vJSONVariable NVARCHAR(255) =
'
{
"name" : "Dian",
"skills": ["Procrastinating" , "Movie Trivia"]
}
'
/*Test that the string is saved correctly into the variable*/
PRINT @vJSONVariable
/* Update Attribute*/
SET @vJSONVariable = JSON_MODIFY( @vJSONVariable, '$.name' , 'Tony Stark')
/*Test that the change was applied correctly */
PRINT @vJSONVariable
/*Output would be
{
"name" : "Tony Stark",
"skills": ["Procrastinating" , "Movie Trivia"]
}
*/
Comparing JSON_VALUE and JSON_QUERY
- JSON_VALUE is used only to extract scalar attributes from a JSON Object. It will return NULL or error depending on the strict setting in your environment when you are referencing a non scalar attribute.
- JSON_QUERY is the reverse, it extracts only an object or array from a JSON object. It will return NULL or error depending on the strict setting in your environment when you are referencing a non object/array attribute.
JSON Sample
{
"a": "[1,2]",
"b": [1, 2],
"c": "hi"
}
Results of using JSON_VALUE vs JSON_QUERY to extract attributes from the sample JSON
Path | JSON_VALUE returns | JSON_QUERY returns | Notes |
---|---|---|---|
$ | NULL or error | { "a": "[1,2]", "b": [1,2], "c":"hi"} | JSON_VALUE wont work since it is an object being referenced. |
$.a | [1,2] | NULL or error | "a" is just a scalar string, not an actual array object. |
$.b | NULL or error | [1,2] | "b" is an actual array object. |
$.b[0] | 1 | NULL or error | The first element in the array is a scalar value. |
$.c | hi | NULL or error | "c" is a simple string scalar value. |
Practical Examples
Query .json files direclty in Synapse Analytics Serverless
- Note, there is a code difference when querying classic JSON and modern JSON text this way.
Classic JSON - note it specifies the ROWTERMINATOR
SELECT TOP 100
jsonContent
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '0x0b' /*See this additional setting*/
)
WITH (
jsonContent varchar(MAX)
) AS [result]
Modern JSON - note it doesn't specify the ROWTERMINATOR
SELECT TOP 100
jsonContent
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b'
)
WITH (
jsonContent varchar(MAX)
) AS [result]
Query .json files
{
"id": 1,
"first_name": "Gordan",
"last_name": "Panton",
"email": "gpanton0@prnewswire.com",
"gender": "Genderfluid",
"ip_address": "253.89.93.33",
"stores_purchased_at": "StoreD",
"attribute_with_nested_array": [
{
"nested_record_id": "one",
"nested_record_Name": "NameOne"
},
{
"nested_record_id": "two",
"nested_record_Name": "NameTwo"
},
{
"nested_record_id": "three",
"nested_record_Name": "NameThree"
}
]
}
SELECT TOP 100
/*Scalar attributes extraction using JSON_VALUE*/
JSON_VALUE([jsonContent], '$.id') AS [id]
, JSON_VALUE([jsonContent], '$.first_name') AS [first_name]
, JSON_VALUE([jsonContent], '$.last_name') AS [last_name]
, JSON_VALUE([jsonContent], '$.email') AS [email]
, JSON_VALUE([jsonContent], '$.gender') AS [gender]
, JSON_VALUE([jsonContent], '$.ip_address') AS [ip_address]
, JSON_VALUE([jsonContent], '$.stores_purchased_at') AS [stores_purchased_at]
/*Scalar attributes extraction from a nested array of objects via OPENJSON and JSON_QUERY*/
, [NestedArray].[nested_record_id]
, [NestedArray].[nested_record_Name]
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA-SingleObjectWithNestedArray.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x0b',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '0x0b'
)
WITH(
[jsonContent] varchar(MAX) /*This now contains the full JSON document from the file. Note the datatype is varchar(MAX).*/
) AS [result]
CROSS APPLY OPENJSON
/*Use JSON_QUERY here because we are querying a nested array, so we cant use JSON_VALUE.
We perform JSON_QUERY on the [result].[JSONDocument] field since that is the column with the full JSON Document we want to extract dat from.
The path to the attribute is using the normal path to the attribute name that contains the array */
(JSON_QUERY([jsonContent], '$.attribute_with_nested_array')) /*Note, if you want only the top most record from this array, replace this line with "(JSON_QUERY([jsonContent], '$.attribute_with_nested_array[0]'))"*/
WITH(
[nested_record_id] varchar(255) '$.nested_record_id',
[nested_record_Name] varchar(255) '$.nested_record_Name'
) AS [NestedArray]
Results
id | first_name | last_name | gender | ip_address | stores_purchased_at | nested_record_id | nested_record_Name | |
---|---|---|---|---|---|---|---|---|
1 | Gordan | Panton | gpanton0@prnewswire.com | Genderfluid | 253.89.93.33 | StoreD | one | NameOne |
1 | Gordan | Panton | gpanton0@prnewswire.com | Genderfluid | 253.89.93.33 | StoreD | two | NameTwo |
1 | Gordan | Panton | gpanton0@prnewswire.com | Genderfluid | 253.89.93.33 | StoreD | three | NameThree |
Query a column in a tabular structure with embedded JSON Data
This method can be used when querying a SQL Table or a file in a Data Lake with tabular structure such as a CSV file or Parquet file. One column in the source data structure contains self-contains JSON data.
This could be the case when you source system cannot send data to the Data Lake directly, but only to an intermediate SQL Server and embed the source JSON data in a column (RavenDB is one such system).
This examples showcases a few things
- Querying a file from a data lake with JSON embedded in a column
- Querying a simple nested object using dot notation
- Querying scalar top level values using JSON_VALUE
- Querying complex nested objects using JSON_QUERY
{
"id": 1,
"first_name": "Gordan",
"last_name": "Panton",
"email": "gpanton0@prnewswire.com",
"gender": "Genderfluid",
"ip_address": "253.89.93.33",
"stores_purchased_at": "StoreD",
"attribute_with_nested_values": {
"nested_columne_one": "one",
"nested_columne_two": "two",
"nested_columne_three": "three"
}
}
SELECT
TOP 100
/*Normal Sacalar attributes at parent level in the document in the column*/
JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
, JSON_VALUE([result].[JSONDocument], '$.first_name') AS [first_name]
, JSON_VALUE([result].[JSONDocument], '$.last_name') AS [last_name]
, JSON_VALUE([result].[JSONDocument], '$.email') AS [email]
, JSON_VALUE([result].[JSONDocument], '$.gender') AS [gender]
, JSON_VALUE([result].[JSONDocument], '$.ip_address') AS [ip_address]
, JSON_VALUE([result].[JSONDocument], '$.stores_purchased_at') AS [stores_purchased_at]
/*Query fields with nested objects or arrays
Note here we are using JSON_QUERY first ,
not JSON_VALUE because we are working with nested objects.
After we have queried the complex object,
we can use JSON_VALUE to extract a scalar value from the nested object.
The key difference between JSON_VALUE and JSON_QUERY is that JSON_VALUE
returns a scalar value, while JSON_QUERY returns an object or an array.
*/
, JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values') AS [attribute_with_nested_values]
, JSON_VALUE(
JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
, '$.nested_columne_one'
) AS [attribute_with_nested_values-nested_columne_one]
, JSON_VALUE(
JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
, '$.nested_columne_two'
) AS [attribute_with_nested_values-nested_columne_two]
, JSON_VALUE(
JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
, '$.nested_columne_three'
) AS [attribute_with_nested_values-nested_columne_three]
/*Alternate method to extract scalar attributs from single level nested objects
using dot notation*/
, JSON_VALUE([result].[JSONDocument], '$.attribute_with_nested_values.nested_columne_one') AS [attribute_with_nested_values-nested_columne_one-dotnotation]
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedAttributesSample.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0',
HEADER_ROW = true
) AS [result]
Results
id | first_name | last_name | gender | ip_address | stores_purchased_at | attribute_with_nested_values | attribute_with_nested_values-nested_columne_one | attribute_with_nested_values-nested_columne_two | attribute_with_nested_values-nested_columne_three | attribute_with_nested_values-nested_columne_one-dotnotation | |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Gordan | Panton | gpanton0@prnewswire.com | Genderfluid | 253.89.93.33 | StoreD | { "nested_columne_one": "one", "nested_columne_two": "two", "nested_columne_three": "three" } | one | two | three | one |
Query nested array using OPENJSON and CROSS APPLY
{
"id": 2,
"first_name": "Tynan",
"last_name": "MacKeller",
"email": "tmackeller1@cbc.ca",
"gender": "Male",
"ip_address": "115.228.50.83",
"stores_purchased_at": "StoreA",
"attribute_with_nested_array": [
{
"nested_record_id": "one"
},
{
"nested_record_id": "two"
},
{
"nested_record_id": "three"
}
]
}
SELECT
TOP 100
/*Normal Sacalar attributes at parent level in the document in the column*/
JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
, JSON_VALUE([result].[JSONDocument], '$.first_name') AS [first_name]
, JSON_VALUE([result].[JSONDocument], '$.last_name') AS [last_name]
, JSON_VALUE([result].[JSONDocument], '$.email') AS [email]
, JSON_VALUE([result].[JSONDocument], '$.gender') AS [gender]
, JSON_VALUE([result].[JSONDocument], '$.ip_address') AS [ip_address]
, JSON_VALUE([result].[JSONDocument], '$.stores_purchased_at') AS [stores_purchased_at]
/*NormalAttributes extracted from the nested array*/
, [ArrayRecords].[nested_record_id]
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedArraySample.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0',
HEADER_ROW = true
) AS [result]
/*Use CROSS APPLY to perform a calculation for each row in the [result] output table from source, and get one or more rows from the function back*/
CROSS APPLY OPENJSON
/*Use JSON_QUERY here because we are querying a nested array, so we cant use JSON_VALUE.
We perform JSON_QUERY on the [result].[JSONDocument] field since that is the column with the full JSON Document we want to extract dat from.
The path to the attribute is using the normal path to the attribute name that contains the array */
(JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_array'))
/*Here in the WITH clause of the OPENJSON we treat the attribute_with_nested_array attribute mentioend above as the top level of the document.
Since we simple return a field name, that field name from all records in the array is returned as distinct record.
If we used the [0] array index notioan, we could have extracted specific rows as well*/
WITH(
[nested_record_id] varchar(255) '$.nested_record_id'
) AS [ArrayRecords]
Results
id | first_name | last_name | gender | ip_address | stores_purchased_at | nested_record_id | |
---|---|---|---|---|---|---|---|
2 | Tynan | MacKeller | tmackeller1@cbc.ca | Male | 115.228.50.83 | StoreA | one |
2 | Tynan | MacKeller | tmackeller1@cbc.ca | Male | 115.228.50.83 | StoreA | two |
2 | Tynan | MacKeller | tmackeller1@cbc.ca | Male | 115.228.50.83 | StoreA | three |
Nested object query using dot notation - multiple object nesting using JSON_QUERY
{
"id": 1,
"first_name": "Gordan",
"last_name": "Panton",
"email": "gpanton0@prnewswire.com",
"gender": "Genderfluid",
"ip_address": "253.89.93.33",
"stores_purchased_at": "StoreD",
"attribute_with_nested_values": {
"levelone_nested_columne_one": "one_one",
"levelone_nested_columne_two": "one_two",
"levelone_nested_columne_three": {
"leveltwo_nested_columne_one": "two_one",
"leveltwo_nested_columne_two": "two_two",
"leveltwo_nested_columne_three": ["one", "two", "three"]
}
}
}
SELECT
TOP 100
/*Normal Sacalar attributes at parent level in the document in the column*/
JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
/*Query fields with multiple nested objects
* Note here we are using JSON_QUERY,
not JSON_VALUE because we are working with nested objects.
* The key difference between JSON_VALUE and JSON_QUERY is that
JSON_VALUE returns a scalar value, while JSON_QUERY returns an object or an array.
* attribute_with_nested_values - attribute that has top level scalar attributes,
as well as another attribute with a nested complex object - this need to be queried using JSON_QUERY
* levelone_nested_columne_three - nested attributes with top level scalar attributes,
as well as complex objects such as the array.
- If we want to extract one of the top level attributes, we can just use JSON_VALUE.
- If we were to want to extract the array,
we need JSON_QUERY or use index notation with JSON_VALUE to get a single element from the array
*/
, JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values') AS [attribute_with_nested_values]
, JSON_VALUE(
JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values.levelone_nested_columne_three')
, '$.leveltwo_nested_columne_one'
) AS [leveltwo_nested_columne_one]
, JSON_VALUE(
JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values.levelone_nested_columne_three')
, '$.leveltwo_nested_columne_three[0]'
) AS [leveltwo_nested_columne_three_0]
FROM
OPENROWSET(
BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedAttributesSample_MultiObject.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0',
HEADER_ROW = true
) AS [result]
Results
id | attribute_with_nested_values | leveltwo_nested_columne_one | leveltwo_nested_columne_three_0 |
---|---|---|---|
1 | { "levelone_nested_columne_one": "one_one", "levelone_nested_columne_two": "one_two", "levelone_nested_columne_three": { "leveltwo_nested_columne_one": "two_one", "leveltwo_nested_columne_two": "two_two", "leveltwo_nested_columne_three": ["one", "two", "three"] } } | two_one | one |
Example from Microsoft with nested complex objects extraction via OPENSJON
This example shows how to extract complex attributes from nested array elements using OPENJSON.
The key here is the notation used in the WITH Clause of the OPENJSON function to extract the nested complex object correctly.
See line 15 in the SQL code below.
[
{
"id": 2,
"info": {
"name": "John",
"surname": "Smith"
},
"age": 25
},
{
"id": 5,
"info": {
"name": "Jane",
"surname": "Smith",
"skills": [
"SQL",
"C#",
"Azure"
]
},
"dob": "2005-11-04T12:00:00"
}
]
DECLARE @json NVARCHAR(MAX);
SET @json = N'[
{"id": 2, "info": {"name": "John", "surname": "Smith"}, "age": 25},
{"id": 5, "info": {"name": "Jane", "surname": "Smith", "skills": ["SQL", "C#", "Azure"]}, "dob": "2005-11-04T12:00:00"}
]';
SELECT id, firstName, lastName, age, dateOfBirth, skill
FROM OPENJSON(@json)
WITH (
id INT 'strict $.id',
firstName NVARCHAR(50) '$.info.name',
lastName NVARCHAR(50) '$.info.surname',
age INT,
dateOfBirth DATETIME2 '$.dob',
skills NVARCHAR(MAX) '$.info.skills' AS JSON
/*Note here you use NVARCHAR(MAX) as the data type for the attribute
with nested complex object. In the next OPENJSON you can
then parse this JSON again using the regular methods.
*/
)
OUTER APPLY OPENJSON(skills)
WITH (skill NVARCHAR(8) '$')
/*Since the entire value of the array element is a string value,
you can just use the $ notation to get the value.*/
;
Results
ID | firstName | lastName | age | dateOfBirth | skill |
---|---|---|---|---|---|
2 | John | Smith | 25 | null | null |
5 | Jane | Smith | null | 2005-11-04T12:00:00 | SQL |
5 | Jane | Smith | null | 2005-11-04T12:00:00 | C# |
5 | Jane | Smith | null | 2005-11-04T12:00:00 | Azure |
If you like what I do please consider supporting me on Ko-Fi
Hi there! This post couldn’t be written any better! Reading through this post reminds me of my previous room mate! He always kept talking about this. I will forward this article to him. Pretty sure he will have a good read. Thank you for sharing!
Thank you for the content. It was very useful and helpful today! The Synapse SQL editor was/is displaying an error on the second WITH clause, but the query executes successfully… in case anyone else experiences this. Error: incorrect token ( expecting BLOCKING_HIERARCHY, XMLNAMESPACES, common_table_expression
Oh my goodness!! I looked for something like this for 2 months back in 2021 and nothing existed. I had to struggle through on my own using workarounds and the process is slow and prone to breaking. Now in 2022 I have to overhaul the pipeline because of changes downstream and I’m stuck in the same position. Except this time I found you! I cannot thank you enough for this article. It made my life so much better.
Can you provide an example as how we can filter content from Json using SQL.. like filter records that do not have a particular value in their key-value pair or filter record if its fall within a time range
I have been browsing online more than 3 hours today, but I never found any fascinating article like yours. It is pretty price enough for me. In my opinion, if all website owners and bloggers made good content as you probably did, the internet will likely be a lot more useful than ever before.
My spouse and i got really relieved Ervin managed to finish up his inquiry from your precious recommendations he came across from your site. It is now and again perplexing to just happen to be freely giving tips and tricks that many people have been trying to sell. We fully grasp we now have the website owner to thank for that. These illustrations you made, the easy website navigation, the relationships you aid to foster – it’s most exceptional, and it is helping our son and us consider that that subject matter is exciting, which is certainly extremely fundamental. Thanks for the whole thing!
Great examples. I’m using Azure Synapse Microsoft Azure SQL Data Warehouse – 10.0.13670.0 Feb 24 2022 15:33:02 and I cannot get your examples to work. It simply returns: Msg 103010, Level 16, State 1, Line 1
Parse error at line: 4, column: 5: Incorrect syntax near ‘OPENROWSET’. I do have bulk permissions, so I can’t figure out what the syntax error is. Any ideas?
The error talks about syntax, so it isn’t a permission issue. Can you post your entire SQL statement perhaps?