Dynamic External Tables in Azure Synapse Analytics On-Demand

What is an External Table?

This article will focus on the Synapse Analytics implementation of External Tables.
However, note that there are other flavours of external tables and they behave slightly differently depending on which product you are using to defined it.
External Tables in Azure Synapse Analytics are used to query data via a T-SQL interface (the table) which is stored outside of an SQL Server Database or SQL Pool. As at the time of writing, the potential external storage locations are:
  • Hadoop (Only available for dedicated SQL Pools though, not serverless SQL Pools)
  • Azure Blob Store (Flat Namespaces)
  • Azure Data Lake Store (Hierarchical Namespaces)

CET vs CETAS

The command used to define an external table is CREATE EXTERNAL TABLE or CET. However, there are a few other objects that the CREATE EXTERNAL TABLE  is dependent on that needs to be instantiated first.
  • An EXTERNAL DATA SOURCE created via the command CREATE EXTERNAL DATA SOURCE is needed to reference an external Azure Storage Location and specify the credential that should be used to access the storage. The credential object is only needed if you are not going to allow AD Passthrough authentication.
  • An EXTERNAL FILE FORMAT created via the command  CREATE EXTERNAL FILE FORMAT is needed to describe format of the files you are going to read from and the various options needed to consume them correctly i.e. CSV or Parquet files.
  • Then lastly the EXTERNAL TABLE can be created on top of the files placed on the data source with the same file format.

The CET command only creates an object in the SQL Serverless Database where the command is executed (either the default master database or a custom one if created beforehand ). Running this command does not move data around, generate new data in the targeted storage account or delete any data in that storage account. It merely creates the SQL object that points to the storage location and allows one to query the files stored there as if they were stored in a normal SQL Database.

On the other hand, the CREATE EXTERNAL TABLE AS SELECT OR CETAS statement operates slightly differently.

CETAS, as the name suggests, contains a SELECT statement in the definition as well. This means that when the command is executed, the SELECT statement is run and the results are physically serialized using the FILE FORMAT specified and stored in the location specified by the DATA SOURCE as files.

Dropping an external table via the DROP EXTERNAL TABLE command does not affect the files it pointed to. 

For this article, we will focus on the `CET` flavour only.

Implementation

Since the external table is simply an SQL Object, we are able to dynamically generate it using a combination of Dynamic SQL and a Stored Procedure with a few parameters.
CREATE OR ALTER PROCEDURE [dbo].[uspGenerateDynamicExternalTable]
(   
   /*External Table SCHEMA and TABLE name*/
     @pSQLSchema VARCHAR(255)
   , @pSQLTableName VARCHAR(255)
   /*COLUMN Structure*/
   , @pColumnDefinitions VARCHAR(MAX)
   /*LOCATION*/
   , @pBlobStorageDirectory VARCHAR(255)
   /*DATA_SOURCE*/
   , @pExternalDataSourceName VARCHAR(255)
   /*FILEFORMAT*/
   , @pFileExtention VARCHAR(255)
   , @pFileFormatName VARCHAR(255)
)
AS
/*=====================================================================================================================================================
Author:         Dian Germishuizen
Created Date:   2021-10-17
Description:    Stored Procedure that can generate a new external table in Azure Synapse Analytics Serverless on a dynamic location in a Data lake. 
-------------------------------------------------------------------------------------------------------------------------------------------------------
Test Code
-------------------------------------------------------------------------------------------------------------------------------------------------------
DECLARE 
    --External Table SCHEMA and TABLE name
      @pSQLSchema VARCHAR(255) = 'dbo'
    , @pSQLTableName VARCHAR(255) = 'DynamicExternalTableName'
    --COLUMN Structure
    , @pColumnDefinitions VARCHAR(MAX) = '[DateTimeField] DATETIME2(7),[StringField] VARCHAR(8000),[IntegerField] INT,[FloatField] FLOAT'
    --LOCATION
    , @pBlobStorageDirectory VARCHAR(255) = 'ParentDirectoryInsideTheContainer/SubDirectory/TableName'
    --DATA_SOURCE
    , @pExternalDataSourceName VARCHAR(255) = 'mystorageaccountname_mycontainername'
    --FILEFORMAT
    , @pFileExtention VARCHAR(255) = 'parquet'
    , @pFileFormatName VARCHAR(255) = 'SynapseParquetFileFormat'
    ;

--Execute the procedure with dummy values
EXEC [dbo].[uspGenerateDynamicExternalTable]
      @pSQLSchema
    , @pSQLTableName
    , @pColumnDefinitions
    , @pBlobStorageDirectory
    , @pExternalDataSourceName
    , @pFileExtention
    , @pFileFormatName
    ;
=====================================================================================================================================================*/
BEGIN
    /*======================================================================
    Declare a string variable to hold the dynamically generated SQL Code. 
    This string variable's contents will get executed later to actually generate the external table
    ======================================================================*/
    DECLARE @vSQL VARCHAR(MAX) =
    '
    IF EXISTS 
    (    
        /*Drop the pre-existing version of this table if it should already exist*/
        SELECT *
        FROM [sys].[external_tables]
        INNER JOIN [sys].[schemas]
            ON [external_tables].[schema_id] = [schemas].[schema_id]
        WHERE [schemas].[name] = ''' + @pSQLSchema + '''
            AND [external_tables].[name] = ''' + @pSQLTableName + '''
    ) 
    BEGIN
        DROP EXTERNAL TABLE [' + @pSQLSchema + '].[' + @pSQLTableName + ']
    END

    /*CREATE conmmand that will generate the new object*/
    CREATE EXTERNAL TABLE [' + @pSQLSchema + '].[' + @pSQLTableName + ']
    (
        ' + @pColumnDefinitions + '
    )
    WITH
    (
        /*Note here, due to the wild card indicator, all files in the directory will be included with the applicable extention*/
        LOCATION = ''' + @pBlobStorageDirectory + '/*.' + @pFileExtention + '''
        , DATA_SOURCE = [' + @pExternalDataSourceName + ']
        , FILE_FORMAT = [' + @pFileFormatName + ']
    ) ' ;

   /*======================================================================
   Execute the dynamic SQL that was generated
   ======================================================================*/
   PRINT( @vSQL ); /*Also printing the string value in order to inspect the contents for thoubleshooting purpouses*/   
   EXEC( @vSQL );

END
The resulting code that gets generated and executed:
IF EXISTS 
(    
    /*Drop the pre-existing version of this table if it should already exist*/
    SELECT *
    FROM [sys].[external_tables]
    INNER JOIN [sys].[schemas]
        ON [external_tables].[schema_id] = [schemas].[schema_id]
    WHERE [schemas].[name] = 'dbo'
        AND [external_tables].[name] = 'DynamicExternalTableName'
) 
BEGIN
    DROP EXTERNAL TABLE [dbo].[DynamicExternalTableName]
END

/*CREATE conmmand that will generate the new object*/
CREATE EXTERNAL TABLE [dbo].[DynamicExternalTableName]
(
    [DateTimeField] DATETIME2(7)
    , [StringField] VARCHAR(8000)
    , [IntegerField] INT
    , [FloatField] FLOAT
)
WITH
(
    /*Note here, due to the wild card indicator, all files in the directory will be included with the applicable extention*/
    LOCATION = 'ParentDirectoryInsideTheContainer/SubDirectory/TableName/*.parquet'
    , DATA_SOURCE = [mystorageaccountname_mycontainername]
    , FILE_FORMAT = [SynapseParquetFileFormat]
) 

Use Cases

This type of dynamic sql object creation is especially useful when automating sections of your virtual lakehouse and you need SQL Objects to expose Data Lake directories as tables to downstream consumers.
 
Let me know in the comments down below in which other situations this would be useful!

If you like what I do please consider supporting me on Ko-Fi

4 thoughts on “Dynamic External Tables in Azure Synapse Analytics On-Demand”

  1. I¦ve been exploring for a little for any high-quality articles or blog posts in this sort of space . Exploring in Yahoo I finally stumbled upon this website. Studying this information So i¦m glad to express that I’ve an incredibly good uncanny feeling I discovered just what I needed. I so much without a doubt will make certain to do not put out of your mind this web site and give it a look on a constant basis.

  2. Excellent blog here! Also your site loads up fast! What web host are you using? Can I get your affiliate link to your host? I wish my website loaded up as fast as yours lol

Comments are closed.