
What is an External Table?
- Hadoop (Only available for dedicated SQL Pools though, not serverless SQL Pools)
- Azure Blob Store (Flat Namespaces)
- Azure Data Lake Store (Hierarchical Namespaces)
CET vs CETAS
- An EXTERNAL DATA SOURCE created via the command CREATE EXTERNAL DATA SOURCE is needed to reference an external Azure Storage Location and specify the credential that should be used to access the storage. The credential object is only needed if you are not going to allow AD Passthrough authentication.
- An EXTERNAL FILE FORMAT created via the command CREATE EXTERNAL FILE FORMAT is needed to describe format of the files you are going to read from and the various options needed to consume them correctly i.e. CSV or Parquet files.
- Then lastly the EXTERNAL TABLE can be created on top of the files placed on the data source with the same file format.
The CET command only creates an object in the SQL Serverless Database where the command is executed (either the default master database or a custom one if created beforehand ). Running this command does not move data around, generate new data in the targeted storage account or delete any data in that storage account. It merely creates the SQL object that points to the storage location and allows one to query the files stored there as if they were stored in a normal SQL Database.
On the other hand, the CREATE EXTERNAL TABLE AS SELECT OR CETAS statement operates slightly differently.
CETAS, as the name suggests, contains a SELECT statement in the definition as well. This means that when the command is executed, the SELECT statement is run and the results are physically serialized using the FILE FORMAT specified and stored in the location specified by the DATA SOURCE as files.
Dropping an external table via the DROP EXTERNAL TABLE command does not affect the files it pointed to.
For this article, we will focus on the `CET` flavour only.
Implementation
CREATE OR ALTER PROCEDURE [dbo].[uspGenerateDynamicExternalTable]
(
/*External Table SCHEMA and TABLE name*/
@pSQLSchema VARCHAR(255)
, @pSQLTableName VARCHAR(255)
/*COLUMN Structure*/
, @pColumnDefinitions VARCHAR(MAX)
/*LOCATION*/
, @pBlobStorageDirectory VARCHAR(255)
/*DATA_SOURCE*/
, @pExternalDataSourceName VARCHAR(255)
/*FILEFORMAT*/
, @pFileExtention VARCHAR(255)
, @pFileFormatName VARCHAR(255)
)
AS
/*=====================================================================================================================================================
Author: Dian Germishuizen
Created Date: 2021-10-17
Description: Stored Procedure that can generate a new external table in Azure Synapse Analytics Serverless on a dynamic location in a Data lake.
-------------------------------------------------------------------------------------------------------------------------------------------------------
Test Code
-------------------------------------------------------------------------------------------------------------------------------------------------------
DECLARE
--External Table SCHEMA and TABLE name
@pSQLSchema VARCHAR(255) = 'dbo'
, @pSQLTableName VARCHAR(255) = 'DynamicExternalTableName'
--COLUMN Structure
, @pColumnDefinitions VARCHAR(MAX) = '[DateTimeField] DATETIME2(7),[StringField] VARCHAR(8000),[IntegerField] INT,[FloatField] FLOAT'
--LOCATION
, @pBlobStorageDirectory VARCHAR(255) = 'ParentDirectoryInsideTheContainer/SubDirectory/TableName'
--DATA_SOURCE
, @pExternalDataSourceName VARCHAR(255) = 'mystorageaccountname_mycontainername'
--FILEFORMAT
, @pFileExtention VARCHAR(255) = 'parquet'
, @pFileFormatName VARCHAR(255) = 'SynapseParquetFileFormat'
;
--Execute the procedure with dummy values
EXEC [dbo].[uspGenerateDynamicExternalTable]
@pSQLSchema
, @pSQLTableName
, @pColumnDefinitions
, @pBlobStorageDirectory
, @pExternalDataSourceName
, @pFileExtention
, @pFileFormatName
;
=====================================================================================================================================================*/
BEGIN
/*======================================================================
Declare a string variable to hold the dynamically generated SQL Code.
This string variable's contents will get executed later to actually generate the external table
======================================================================*/
DECLARE @vSQL VARCHAR(MAX) =
'
IF EXISTS
(
/*Drop the pre-existing version of this table if it should already exist*/
SELECT *
FROM [sys].[external_tables]
INNER JOIN [sys].[schemas]
ON [external_tables].[schema_id] = [schemas].[schema_id]
WHERE [schemas].[name] = ''' + @pSQLSchema + '''
AND [external_tables].[name] = ''' + @pSQLTableName + '''
)
BEGIN
DROP EXTERNAL TABLE [' + @pSQLSchema + '].[' + @pSQLTableName + ']
END
/*CREATE conmmand that will generate the new object*/
CREATE EXTERNAL TABLE [' + @pSQLSchema + '].[' + @pSQLTableName + ']
(
' + @pColumnDefinitions + '
)
WITH
(
/*Note here, due to the wild card indicator, all files in the directory will be included with the applicable extention*/
LOCATION = ''' + @pBlobStorageDirectory + '/*.' + @pFileExtention + '''
, DATA_SOURCE = [' + @pExternalDataSourceName + ']
, FILE_FORMAT = [' + @pFileFormatName + ']
) ' ;
/*======================================================================
Execute the dynamic SQL that was generated
======================================================================*/
PRINT( @vSQL ); /*Also printing the string value in order to inspect the contents for thoubleshooting purpouses*/
EXEC( @vSQL );
END
IF EXISTS
(
/*Drop the pre-existing version of this table if it should already exist*/
SELECT *
FROM [sys].[external_tables]
INNER JOIN [sys].[schemas]
ON [external_tables].[schema_id] = [schemas].[schema_id]
WHERE [schemas].[name] = 'dbo'
AND [external_tables].[name] = 'DynamicExternalTableName'
)
BEGIN
DROP EXTERNAL TABLE [dbo].[DynamicExternalTableName]
END
/*CREATE conmmand that will generate the new object*/
CREATE EXTERNAL TABLE [dbo].[DynamicExternalTableName]
(
[DateTimeField] DATETIME2(7)
, [StringField] VARCHAR(8000)
, [IntegerField] INT
, [FloatField] FLOAT
)
WITH
(
/*Note here, due to the wild card indicator, all files in the directory will be included with the applicable extention*/
LOCATION = 'ParentDirectoryInsideTheContainer/SubDirectory/TableName/*.parquet'
, DATA_SOURCE = [mystorageaccountname_mycontainername]
, FILE_FORMAT = [SynapseParquetFileFormat]
)
Use Cases
If you like what I do please consider supporting me on Ko-Fi
I¦ve been exploring for a little for any high-quality articles or blog posts in this sort of space . Exploring in Yahoo I finally stumbled upon this website. Studying this information So i¦m glad to express that I’ve an incredibly good uncanny feeling I discovered just what I needed. I so much without a doubt will make certain to do not put out of your mind this web site and give it a look on a constant basis.
WONDERFUL Post.thanks for share..more wait .. …
Excellent blog here! Also your site loads up fast! What web host are you using? Can I get your affiliate link to your host? I wish my website loaded up as fast as yours lol
Thank you very much, I really appreciate it! I am using Hostinger to host the site, here is the referral link: https://hostinger.com?REFERRALCODE=1DIANGERMIS98