What is dynamic SQL?
- In Azure Synapse Analytics (Serverless and Dedicated Pool) you can dynamically generate an external table pointing to a location in the data lake which is determined at runtime. This is ideal when you want the external table to only point to the latest file available in a lake, not all historical files.
- Maintenance – you can perform the same maintenance task on a large set of objects when you make the maintenance code dynamic in terms of the object it interacts on. These actions can be to maintain indexes, drop tables not needed, clear the cache of databases etc.
- Pivot and Unpivot – you can create a stored procedure to pivot or unpivot a table structure based on the column names sent in as parameters. This means you never have to google how to do the actual pivot or unpivot again – joy!
- JSON Normalization – you can dynamically normalize a JSON object or array by dynamically getting all the attributes from the JSON string and generating the code to persist the normalized structure.
- Optimized data retrieval – this is very useful for databases that get requests from multiple users at a time. You can create a dynamic query that will filter a column based on the value sent in from a front end application at runtime. This means a reduced set of data is returned, increasing performance and security.
- Dynamic SQL can be hard to debug. I typically print out the SQL statement generated first before ever adding the EXEC statement at all so I can investigate the code generated and run it manually before letting SQL run it for me.
- It is vulnerable to SQL Injection attacks, especially if you rely on input from users and not input from secure backend sources the users don’t interact with.
- It can be slower to execute due to the execution plans the SQL engine uses to perform the actions the code instructs having to be generated anew each time – depending on the degree to which the query statement is altered as part of you dynamic-ness.
SQL Server Example
/*Variable to hold the value we will filter on at runtime*/ DECLARE @vCustomerID INT; /*Variable to hold the query string we will alter at runtime then execute*/ DECLARE @vSqlQuery VARCHAR(4000); /*Define the base version of the query*/ SET @vSqlQuery = ' SELECT * FROM [dbo].[Customers] WHERE [Customers].[CustomerID] = ' + @vCustomerID + '' /*Execute the code*/ EXEC (@vSqlQuery)
SELECT * FROM [dbo].[Customers] WHERE [Customers].[CustomerID] = 123
- You need to have a dataframe already defined earlier in the notebook and have it saved as a temporary view called sourceTemporaryView
- The dataframe must have a field called ProductCategory to apply the partitioning on. This line can actually also be made dynamic to pass in a dynamic field name to partition on.
- You must have a linked data lake where the data for this delta table will be stored. This location is defined by the vDeltaTablePath variable value. This should ideally be the path to a mounted data lake location.
# pDataLakeContainer: "silver" # pTableName": "Product" # vDeltaTablePath: "/mnt/silver/WorldWideImporters/Batch/Product/" vDeltaTableCreateStatement = 'CREATE TABLE IF NOT EXISTS ' \ + pDataLakeContainer + '.' + pTableName + ' \n' \ + 'USING DELTA ' + '\n' \ + 'LOCATION \'' + vDeltaTablePath + '\' ' + '\n' \ + 'PARTITIONED BY ( ProductCategory )' + '\n' \ + 'AS' + '\n' \ + 'SELECT * FROM sourceTemporaryView' #Check final output print('vDeltaTableCreateStatement: ' + vDeltaTableCreateStatement) #Execute the SQL spark.sql(vDeltaTableCreateStatement)
CREATE TABLE IF NOT EXISTS silver.Product USING DELTA LOCATION '/mnt/silver/WorldWideImporters/Batch/Product/' PARTITIONED BY ( ProductCategory ) AS SELECT * FROM sourceTemporaryView
If you like what I do please consider supporting me on Ko-Fi