Dynamic Datasets in Azure Data Factory

In a previous post, linked at the bottom, I showed how you can set up global parameters in your Data Factory that are accessible from any pipeline at run time. This post will show you how you can leverage global parameters to minimize the number of datasets you need to create. Specifically, I will show how …


Parameterize Synapse Analytics Spark Notebooks Efficiently

When creating pipelines in any sort of data flow to move data from an incoming source to a target location, ideally you don’t want to create single-purpose activities. These would only perform one action or set of actions on a specific set of source and target objects. E.g. take the source.sales table, filter out where …


Query JSON data in SQL Server and Synapse Analytics

When would you work with JSON Data? JSON is a popular data representation format used on the web to exchange information between remote parties. It is also used for storing unstructured data in log files or NoSQL document databases such as MongoDB or Azure Cosmos DB. SQL also has the ability to store JSON data in a text …


Global Parameters 101 in Azure Data Factory

What the heck are they? Global Parameters are constants defined once for the entire Data Factory that can be referenced in a pipeline at execution time. They have many applications, e.g. when multiple pipelines require identical parameters and values at run time and you don’t want to duplicate them in variables across said pipelines. When you utilize …


Query Azure Data Lake via Synapse Serverless Security Credentials Setup

Overview Azure Synapse Analytics Serverless SQL Endpoint has the capability to query files that are stored in an Azure Data Lake using T-SQL code as if they were regular tables in a relational database. These files can be semi-structured or unstructured in nature. Using the Create External Table As Select (CETAS) functionality, you can even generate new …


Dynamic External Tables in Azure Synapse Analytics On-Demand

What is an External Table? This article will focus on the Synapse Analytics implementation of External Tables. However, note that there are other flavours of external tables, and they behave slightly differently depending on which product you are using to define them: SQL Server, SQL Database, Azure Synapse Analytics, Analytics Platform System (PDW). External Tables in …


Convert the Character Set/Encoding of a String field in a PySpark DataFrame on Databricks

TL;DR When defining your PySpark dataframe using spark.read, use the .withColumn() function to override the contents of the affected column. Use the encode function of the pyspark.sql.functions library to change the Character Set Encoding of the column.
import pyspark.sql.functions
dataFrame = spark.read.json(varFilePath).withColumn("affectedColumnName", pyspark.sql.functions.encode("affectedColumnName", "utf-8"))
Scenario The scenario where this would be needed …
