Query JSON data in SQL Server and Synapse Analytics

When would you work with JSON Data?

JSON is a popular data representation format used on the web to exchange information between remote parties.
It is also used for storing unstructured data in log files or NoSQL Document Databases such as MongoDB or Azure CosmosDB.
SQL also has the ability to store JSON data in a text based field such as varchar(max). SQL Server (On-Premises, Azure SQL as well as Synapse Analytics) has native functions to parse JSON structures and extract attributes.
 
Many of Azure’s services store and maintain its infrastructure in JSON as well. E.g. The structure and definition of the entire Azure Data Factory is maintained in a set of JSON files. At runtime, the output of a Copy Activity in the Data Factory produces a JSON Object with all the metadata related to the copy activity’s execution. If you were to store that information in a SQL Database for custom logging purposes, you need a way to extract the metrics from those logs for reporting. 
 
Since this is a widely used, plain text data storage method that is easy to work with, there are countless other cases where you can work with JSON, such as:
  • Storing transactional OLTP workloads in a semi-structured NoSQL Database to ensure you have flexible schemas. This applies to master data as well such as online product catalogues
  • Load, query and analyze logs generated by operational systems using T-SQL to parse the JSON text in the logs.
  • Real Time analytics of IoT data
  • Transforming relational data stored in a database into JSON structures for REST API integration
  • etc.

This article will focus only on extracting information from JSON text using T-SQL, not converting relational data into JSON structures.

JSON Functions in T-SQL

The key functions you would need are:
  • ISJSON
    • ISJSON tests whether a string contains valid JSON data
    • It returns a Boolean True/False value as a result
    • It is used to extract scalar (singular) attributes from a JSON object
    • Returns a value of type nvarchar(4000)
    • Used to extract an object or array from a JSON document
    • Returns a JSON fragment of type nvarchar(max)
    • OPENJSON is used to transform an array of JSON objects into a tabular structure, where each object in the array represents a single row in the table.
    • You can specify data types using the WITH clause.
    • You can cherry pick the JSON fields you want to expose, you don’t need to provide a type and name for each attribute that exists.
      • NULL is simply returned if an attribute doesn’t exist. This ties into the idea that JSON has fluid structure.
      • The optional strict prefix in the path specifies that values for the specified properties must exist in the JSON text.
    • Updates the value of a property in a JSON string and returns the updated JSON string
    • Example:
 
/*JSON_MODIFY Example*/

DECLARE @vJSONVariable NVARCHAR(255) = 
'
    {
        "name" : "Dian",
        "skills": ["Procrastinating" , "Movie Trivia"]
    }
'

/*Test that the string is saved correctly into the variable*/
PRINT @vJSONVariable

/* Update Attribute*/
SET @vJSONVariable = JSON_MODIFY( @vJSONVariable, '$.name' , 'Tony Stark')

/*Test that the change was applied correctly */
PRINT @vJSONVariable

/*Output would be
{
    "name" : "Tony Stark",
    "skills": ["Procrastinating" , "Movie Trivia"]
}
*/

Comparing JSON_VALUE and JSON_QUERY

Comparing JSON_VALUE and JSON_QUERY is important as these will be the functions used most often when consuming JSON data in T-SQL. 
  • JSON_VALUE is used only to extract scalar attributes from a JSON Object. It will return NULL or error depending on the strict setting in your environment when you are referencing a non scalar attribute.
  • JSON_QUERY is the reverse, it extracts only an object or array from a JSON object. It will return NULL or error depending on the strict setting in your environment when you are referencing a non object/array attribute.

JSON Sample

{
    "a": "[1,2]",
    "b": [1, 2],
    "c": "hi"
}

Results of using JSON_VALUE vs JSON_QUERY to extract attributes from the sample JSON

Path

JSON_VALUE returns

JSON_QUERY returns

Notes

$

NULL or error

{ "a": "[1,2]", "b": [1,2], "c":"hi"}

JSON_VALUE wont work since it is an object being referenced.

$.a

[1,2]

NULL or error

"a" is just a scalar string, not an actual array object.

$.b

NULL or error

[1,2]

"b" is an actual array object.

$.b[0]

1

NULL or error

The first element in the array is a scalar value.

$.c

hi

NULL or error

"c" is a simple string scalar value.

Practical Examples

Query .json files direclty in Synapse Analytics Serverless

Example for when you have data inside a .json text file in a Data Lake you want to query. This could be the case when your data extraction process dumps your source JSON Documents as-is into the Data Lake, one by one.
  • Note, there is a code difference when querying classic JSON and modern JSON text this way.

Classic JSON - note it specifies the ROWTERMINATOR

SELECT TOP 100
    jsonContent
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b' /*See this additional setting*/
    )
    WITH (
        jsonContent varchar(MAX)
    ) AS [result]

Modern JSON - note it doesn't specify the ROWTERMINATOR

SELECT TOP 100
    jsonContent
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b'
    )
    WITH (
        jsonContent varchar(MAX)
    ) AS [result]

Query .json files

JSON Sample
{
    "id": 1,
    "first_name": "Gordan",
    "last_name": "Panton",
    "email": "gpanton0@prnewswire.com",
    "gender": "Genderfluid",
    "ip_address": "253.89.93.33",
    "stores_purchased_at": "StoreD",
    "attribute_with_nested_array": [
        {
            "nested_record_id": "one",
            "nested_record_Name": "NameOne"
        },
        {
            "nested_record_id": "two",
            "nested_record_Name": "NameTwo"
        },
        {
            "nested_record_id": "three",
            "nested_record_Name": "NameThree"
        }
    ]
}
SQL Code
SELECT TOP 100
    /*Scalar attributes extraction using JSON_VALUE*/
      JSON_VALUE([jsonContent], '$.id') AS [id]
    , JSON_VALUE([jsonContent], '$.first_name') AS [first_name]
    , JSON_VALUE([jsonContent], '$.last_name') AS [last_name]
    , JSON_VALUE([jsonContent], '$.email') AS [email]
    , JSON_VALUE([jsonContent], '$.gender') AS [gender]
    , JSON_VALUE([jsonContent], '$.ip_address') AS [ip_address]
    , JSON_VALUE([jsonContent], '$.stores_purchased_at') AS [stores_purchased_at]
    /*Scalar attributes extraction from a nested array of objects via OPENJSON and JSON_QUERY*/
    , [NestedArray].[nested_record_id]
    , [NestedArray].[nested_record_Name]
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockJSONData/MOCK_DATA-SingleObjectWithNestedArray.json',
        FORMAT = 'CSV',
        FIELDQUOTE = '0x0b',
        FIELDTERMINATOR ='0x0b',
        ROWTERMINATOR = '0x0b'
    )
    WITH(
        [jsonContent] varchar(MAX) /*This now contains the full JSON document from the file. Note the datatype is varchar(MAX).*/
    ) AS [result]
CROSS APPLY OPENJSON 
    /*Use JSON_QUERY here because we are querying a nested array, so we cant use JSON_VALUE.
    We perform JSON_QUERY on the [result].[JSONDocument] field since that is the column with the full JSON Document we want to extract dat from. 
    The path to the attribute is using the normal path to the attribute name that contains the array */
    (JSON_QUERY([jsonContent], '$.attribute_with_nested_array')) /*Note, if you want only the top most record from this array, replace this line with "(JSON_QUERY([jsonContent], '$.attribute_with_nested_array[0]'))"*/
WITH(
    [nested_record_id] varchar(255) '$.nested_record_id',
    [nested_record_Name] varchar(255) '$.nested_record_Name'
) AS [NestedArray]

Results

idfirst_namelast_nameemailgenderip_addressstores_purchased_atnested_record_idnested_record_Name
1GordanPantongpanton0@prnewswire.comGenderfluid253.89.93.33StoreDoneNameOne
1GordanPantongpanton0@prnewswire.comGenderfluid253.89.93.33StoreDtwoNameTwo
1GordanPantongpanton0@prnewswire.comGenderfluid253.89.93.33StoreDthreeNameThree

Query a column in a tabular structure with embedded JSON Data

This method can be used when querying a SQL Table or a file in a Data Lake with tabular structure such as a CSV file or Parquet file. One column in the source data structure contains self-contains JSON data.

This could be the case when you source system cannot send data to the Data Lake directly, but only to an intermediate SQL Server and embed the source JSON data in a column (RavenDB is one such system).

This examples showcases a few things

  • Querying a file from a data lake with JSON embedded in a column
  • Querying a simple nested object using dot notation
  • Querying scalar top level values using JSON_VALUE
  • Querying complex nested objects using JSON_QUERY
JSON Sample
{
    "id": 1,
    "first_name": "Gordan",
    "last_name": "Panton",
    "email": "gpanton0@prnewswire.com",
    "gender": "Genderfluid",
    "ip_address": "253.89.93.33",
    "stores_purchased_at": "StoreD",
    "attribute_with_nested_values": {
        "nested_columne_one": "one",
        "nested_columne_two": "two",
        "nested_columne_three": "three"
    }
}
SQL Code
SELECT
    TOP 100 
    /*Normal Sacalar attributes at parent level in the document in the column*/
    JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
    , JSON_VALUE([result].[JSONDocument], '$.first_name') AS [first_name]
    , JSON_VALUE([result].[JSONDocument], '$.last_name') AS [last_name]
    , JSON_VALUE([result].[JSONDocument], '$.email') AS [email]
    , JSON_VALUE([result].[JSONDocument], '$.gender') AS [gender]
    , JSON_VALUE([result].[JSONDocument], '$.ip_address') AS [ip_address]
    , JSON_VALUE([result].[JSONDocument], '$.stores_purchased_at') AS [stores_purchased_at]
    /*Query fields with nested objects or arrays
        Note here we are using JSON_QUERY first , 
            not JSON_VALUE because we are working with nested objects.
        After we have queried the complex object, 
            we can use JSON_VALUE to extract a scalar value from the nested object.
        The key difference between JSON_VALUE and JSON_QUERY is that JSON_VALUE 
            returns a scalar value, while JSON_QUERY returns an object or an array.
    */
    , JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values') AS [attribute_with_nested_values]
    , JSON_VALUE(
        JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
        , '$.nested_columne_one'
      ) AS [attribute_with_nested_values-nested_columne_one]
    , JSON_VALUE(
        JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
        , '$.nested_columne_two'
      ) AS [attribute_with_nested_values-nested_columne_two]
    , JSON_VALUE(
        JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values')
        , '$.nested_columne_three'
      ) AS [attribute_with_nested_values-nested_columne_three]
    /*Alternate method to extract scalar attributs from single level nested objects
        using dot notation*/
    , JSON_VALUE([result].[JSONDocument], '$.attribute_with_nested_values.nested_columne_one') AS [attribute_with_nested_values-nested_columne_one-dotnotation]
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedAttributesSample.csv',
        FORMAT = 'CSV',
        PARSER_VERSION='2.0',
        HEADER_ROW = true
    ) AS [result]

Results

idfirst_namelast_nameemailgenderip_addressstores_purchased_atattribute_with_nested_valuesattribute_with_nested_values-nested_columne_oneattribute_with_nested_values-nested_columne_twoattribute_with_nested_values-nested_columne_threeattribute_with_nested_values-nested_columne_one-dotnotation
1GordanPantongpanton0@prnewswire.comGenderfluid253.89.93.33StoreD

{

    "nested_columne_one": "one",

    "nested_columne_two": "two",

    "nested_columne_three": "three"

}

onetwothreeone

Query nested array using OPENJSON and CROSS APPLY

Example where you want to get the nested rows from an array attribute in the JSON Document, and present each attribute as a new row in a tabular dataset. Useful when a one-to-many relationship is modelled in the JSON as a JSON Doc with an attribute that has a nested array that shows the “many” side of the relationship.
JSON Sample
{
    "id": 2,
    "first_name": "Tynan",
    "last_name": "MacKeller",
    "email": "tmackeller1@cbc.ca",
    "gender": "Male",
    "ip_address": "115.228.50.83",
    "stores_purchased_at": "StoreA",
    "attribute_with_nested_array": [
        {
            "nested_record_id": "one"
        },
        {
            "nested_record_id": "two"
        },
        {
            "nested_record_id": "three"
        }
    ]
}
SQL Code
SELECT
    TOP 100 
    /*Normal Sacalar attributes at parent level in the document in the column*/
    JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
    , JSON_VALUE([result].[JSONDocument], '$.first_name') AS [first_name]
    , JSON_VALUE([result].[JSONDocument], '$.last_name') AS [last_name]
    , JSON_VALUE([result].[JSONDocument], '$.email') AS [email]
    , JSON_VALUE([result].[JSONDocument], '$.gender') AS [gender]
    , JSON_VALUE([result].[JSONDocument], '$.ip_address') AS [ip_address]
    , JSON_VALUE([result].[JSONDocument], '$.stores_purchased_at') AS [stores_purchased_at]
    /*NormalAttributes extracted from the nested array*/
    , [ArrayRecords].[nested_record_id]
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedArraySample.csv',
        FORMAT = 'CSV',
        PARSER_VERSION='2.0',
        HEADER_ROW = true
    ) AS [result]
/*Use CROSS APPLY to perform a calculation for each row in the [result] output table from source, and get one or more rows from the function back*/
CROSS APPLY OPENJSON 
    /*Use JSON_QUERY here because we are querying a nested array, so we cant use JSON_VALUE.
    We perform JSON_QUERY on the [result].[JSONDocument] field since that is the column with the full JSON Document we want to extract dat from. 
    The path to the attribute is using the normal path to the attribute name that contains the array */
    (JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_array'))
/*Here in the WITH clause of the OPENJSON we treat the attribute_with_nested_array attribute mentioend above as the top level of the document.
Since we simple return a field name, that field name from all records in the array is returned as distinct record. 
If we used the [0] array index notioan, we could have extracted specific rows as well*/
WITH(
    [nested_record_id] varchar(255) '$.nested_record_id'
) AS [ArrayRecords]

Results

idfirst_namelast_nameemailgenderip_addressstores_purchased_atnested_record_id
2TynanMacKellertmackeller1@cbc.caMale115.228.50.83StoreAone
2TynanMacKellertmackeller1@cbc.caMale115.228.50.83StoreAtwo
2TynanMacKellertmackeller1@cbc.caMale115.228.50.83StoreAthree

Nested object query using dot notation - multiple object nesting using JSON_QUERY

In this example, you have a nested object, which also has a nested object. But in this case, the second level nested object has an array as one of the properties. This complicates the method to extract the attributes from the second level nested object.
JSON Sample
{
    "id": 1,
    "first_name": "Gordan",
    "last_name": "Panton",
    "email": "gpanton0@prnewswire.com",
    "gender": "Genderfluid",
    "ip_address": "253.89.93.33",
    "stores_purchased_at": "StoreD",
    "attribute_with_nested_values": {
        "levelone_nested_columne_one": "one_one",
        "levelone_nested_columne_two": "one_two",
        "levelone_nested_columne_three": {
            "leveltwo_nested_columne_one": "two_one",
            "leveltwo_nested_columne_two": "two_two",
            "leveltwo_nested_columne_three": ["one", "two", "three"]
        }
    }    
}
SQL Code
SELECT
    TOP 100 
    /*Normal Sacalar attributes at parent level in the document in the column*/
    JSON_VALUE([result].[JSONDocument], '$.id') AS [id]
    /*Query fields with multiple nested objects
        * Note here we are using JSON_QUERY, 
            not JSON_VALUE because we are working with nested objects.
        * The key difference between JSON_VALUE and JSON_QUERY is that 
            JSON_VALUE returns a scalar value, while JSON_QUERY returns an object or an array.
        * attribute_with_nested_values - attribute that has top level scalar attributes, 
            as well as another attribute with a nested complex object - this need to be queried using JSON_QUERY
        * levelone_nested_columne_three - nested attributes with top level scalar attributes, 
            as well as complex objects such as the array. 
            - If we want to extract one of the top level attributes, we can just use JSON_VALUE. 
            - If we were to want to extract the array, 
                we need JSON_QUERY or use index notation with JSON_VALUE to get a single element from the array
    */
    , JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values') AS [attribute_with_nested_values]
    , JSON_VALUE(
        JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values.levelone_nested_columne_three')
        , '$.leveltwo_nested_columne_one'
    ) AS [leveltwo_nested_columne_one]
    , JSON_VALUE(
        JSON_QUERY([result].[JSONDocument], '$.attribute_with_nested_values.levelone_nested_columne_three')
        , '$.leveltwo_nested_columne_three[0]'
    ) AS [leveltwo_nested_columne_three_0]
FROM
    OPENROWSET(
        BULK 'https://mydatalakename.dfs.core.windows.net/rawdata/MockCSVFiles/CSVWithJsonColumnMockData_NestedAttributesSample_MultiObject.csv',
        FORMAT = 'CSV',
        PARSER_VERSION='2.0',
        HEADER_ROW = true
    ) AS [result]

Results

idattribute_with_nested_valuesleveltwo_nested_columne_oneleveltwo_nested_columne_three_0
1

{

    "levelone_nested_columne_one": "one_one",

    "levelone_nested_columne_two": "one_two",

    "levelone_nested_columne_three": {

        "leveltwo_nested_columne_one": "two_one",

        "leveltwo_nested_columne_two": "two_two",

        "leveltwo_nested_columne_three": ["one", "two", "three"]

    }

}

two_oneone

Example from Microsoft with nested complex objects extraction via OPENSJON

This example shows how to extract complex attributes from nested array elements using OPENJSON. 

The key here is the notation used in the WITH Clause of the OPENJSON function to extract the nested complex object correctly.

See line 15 in the SQL code below.

JSON Sample
[
    {
        "id": 2,
        "info": {
            "name": "John",
            "surname": "Smith"
        },
        "age": 25
    },
    {
        "id": 5,
        "info": {
            "name": "Jane",
            "surname": "Smith",
            "skills": [
                "SQL",
                "C#",
                "Azure"
            ]
        },
        "dob": "2005-11-04T12:00:00"
    }
]
SQL Code
DECLARE @json NVARCHAR(MAX);
SET @json = N'[  
  {"id": 2, "info": {"name": "John", "surname": "Smith"}, "age": 25},
  {"id": 5, "info": {"name": "Jane", "surname": "Smith", "skills": ["SQL", "C#", "Azure"]}, "dob": "2005-11-04T12:00:00"}  
]';

SELECT id, firstName, lastName, age, dateOfBirth, skill  
FROM OPENJSON(@json)  
  WITH (
    id INT 'strict $.id',
    firstName NVARCHAR(50) '$.info.name',
    lastName NVARCHAR(50) '$.info.surname',  
    age INT,
    dateOfBirth DATETIME2 '$.dob',
    skills NVARCHAR(MAX) '$.info.skills' AS JSON 
        /*Note here you use NVARCHAR(MAX) as the data type for the attribute 
            with nested complex object. In the next OPENJSON you can 
            then parse this JSON again using the regular methods.
        */
  )
OUTER APPLY OPENJSON(skills)
  WITH (skill NVARCHAR(8) '$')
    /*Since the entire value of the array element is a string value, 
    you can just use the $ notation to get the value.*/
;

Results

IDfirstNamelastNameagedateOfBirthskill
2JohnSmith25

null

null

5JaneSmith

null

2005-11-04T12:00:00SQL
5JaneSmith

null

2005-11-04T12:00:00C#
5JaneSmith

null

2005-11-04T12:00:00Azure

If you like what I do please consider supporting me on Ko-Fi

8 thoughts on “Query JSON data in SQL Server and Synapse Analytics”

  1. Hi there! This post couldn’t be written any better! Reading through this post reminds me of my previous room mate! He always kept talking about this. I will forward this article to him. Pretty sure he will have a good read. Thank you for sharing!

  2. Thank you for the content. It was very useful and helpful today! The Synapse SQL editor was/is displaying an error on the second WITH clause, but the query executes successfully… in case anyone else experiences this. Error: incorrect token ( expecting BLOCKING_HIERARCHY, XMLNAMESPACES, common_table_expression

  3. Anthinator5000

    Oh my goodness!! I looked for something like this for 2 months back in 2021 and nothing existed. I had to struggle through on my own using workarounds and the process is slow and prone to breaking. Now in 2022 I have to overhaul the pipeline because of changes downstream and I’m stuck in the same position. Except this time I found you! I cannot thank you enough for this article. It made my life so much better.

  4. ankit kumar

    Can you provide an example as how we can filter content from Json using SQL.. like filter records that do not have a particular value in their key-value pair or filter record if its fall within a time range

  5. I have been browsing online more than 3 hours today, but I never found any fascinating article like yours. It is pretty price enough for me. In my opinion, if all website owners and bloggers made good content as you probably did, the internet will likely be a lot more useful than ever before.

  6. My spouse and i got really relieved Ervin managed to finish up his inquiry from your precious recommendations he came across from your site. It is now and again perplexing to just happen to be freely giving tips and tricks that many people have been trying to sell. We fully grasp we now have the website owner to thank for that. These illustrations you made, the easy website navigation, the relationships you aid to foster – it’s most exceptional, and it is helping our son and us consider that that subject matter is exciting, which is certainly extremely fundamental. Thanks for the whole thing!

  7. Great examples. I’m using Azure Synapse Microsoft Azure SQL Data Warehouse – 10.0.13670.0 Feb 24 2022 15:33:02 and I cannot get your examples to work. It simply returns: Msg 103010, Level 16, State 1, Line 1
    Parse error at line: 4, column: 5: Incorrect syntax near ‘OPENROWSET’. I do have bulk permissions, so I can’t figure out what the syntax error is. Any ideas?

Comments are closed.