Databricks

Posts about content that can be used on the Databricks platform.

Date Table Generation Notebook for Databricks Unity Catalog 

In this episode of “How do I create a date table from nothing in my chosen platform?”, I am covering Databricks. No lengthy life story about what bubble tea I was drinking when creating this or childhood memories like some food bloggers. The code is linked below; look at the comments in the notebook if …
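As a flavour of the approach (a minimal sketch, not the notebook's exact code), the example below generates one row per day with Spark SQL's sequence() function and saves the result as a Unity Catalog table; the date range and three-part table name are illustrative, and spark is the session a Databricks notebook provides.

```python
from pyspark.sql import functions as F

# Generate one row per calendar day, then derive common date attributes.
dim_date = (
    spark.sql(
        "SELECT explode(sequence(to_date('2000-01-01'), "
        "to_date('2030-12-31'), interval 1 day)) AS calendar_date"
    )
    .withColumn("date_key", F.date_format("calendar_date", "yyyyMMdd").cast("int"))
    .withColumn("year", F.year("calendar_date"))
    .withColumn("month", F.month("calendar_date"))
    .withColumn("month_name", F.date_format("calendar_date", "MMMM"))
    .withColumn("day_of_week", F.dayofweek("calendar_date"))
)

# Save to Unity Catalog (the catalog.schema.table name is illustrative).
dim_date.write.mode("overwrite").saveAsTable("main.dim.dim_date")
```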


Dynamic SQL in Databricks and SQL Server

What is dynamic SQL? Dynamic SQL is a programming technique where you write a general-purpose query and store it in a string variable, then alter key words in the string at runtime to change the type of actions it will perform, the data it will return, or the objects it will perform these actions …
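To make the idea concrete, here is a minimal sketch of dynamic SQL in a Databricks notebook; the table and column names are illustrative, and SQL Server does the equivalent with sp_executesql. Because the query is just a string until it is executed, treat any externally supplied values with care (SQL injection).

```python
# Build a general-purpose query as a string, substituting the parts
# that change at runtime; nothing runs until spark.sql() is called.
table_name = "sales_2024"          # chosen at runtime
column_list = "order_id, amount"   # key words swapped into the string

query = f"SELECT {column_list} FROM {table_name} WHERE amount > 0"

df = spark.sql(query)
df.show()
```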


Function to mount a storage account container to Azure Databricks

What does it mean to mount a storage account to Azure Databricks? Databricks has a built-in “Databricks File System (DBFS)”. It is a distributed file system mounted onto your Databricks workspace. It is mounted directly to your cluster and is only accessible while the cluster is running. It is an abstraction over object storage. …
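As a flavour of what such a function can look like (a minimal sketch, not the post's exact code), the example below mounts a blob container using an account key read from a secret scope; the storage account, container, secret scope, and key names are all illustrative.

```python
def mount_container(storage_account: str, container: str, mount_point: str) -> None:
    """Mount a storage account container to DBFS if it is not already mounted."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return  # nothing to do, the mount already exists

    dbutils.fs.mount(
        source=f"wasbs://{container}@{storage_account}.blob.core.windows.net",
        mount_point=mount_point,
        extra_configs={
            f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
                dbutils.secrets.get(scope="my-scope", key="storage-account-key")
        },
    )

mount_container("mystorageacct", "raw", "/mnt/raw")
```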


Get the latest file from Azure Data Lake in Databricks

There are many ways to orchestrate a data flow in the cloud. One such option is to have an independent process pull data from source systems and land the latest batch of data in an Azure Data Lake as a single file. The next layer where you process the data can be handled in many …
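One simple pattern for that next layer, sketched below, is to list the landing folder with dbutils and read the most recently modified file. This assumes the folder is mounted (or otherwise reachable) and a Databricks Runtime recent enough for dbutils.fs.ls to return a modificationTime; the path and file format are illustrative.

```python
# List the landing folder and pick the newest file by modification time.
landing_path = "/mnt/raw/landing/"

files = [f for f in dbutils.fs.ls(landing_path) if not f.isDir()]
latest = max(files, key=lambda f: f.modificationTime)

# Read whichever format your process lands; JSON here is illustrative.
df = spark.read.json(latest.path)
print(f"Loaded latest file: {latest.name}")
```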


Convert the Character Set/Encoding of a String field in a PySpark DataFrame on Databricks

TL;DR When defining your PySpark DataFrame using spark.read, use the .withColumn() function to overwrite the contents of the affected column. Use the encode function of the pyspark.sql.functions library to change the character set encoding of the column: from pyspark.sql.functions import encode; df = spark.read.json(varFilePath).withColumn("affectedColumnName", encode("affectedColumnName", "utf-8")) Scenario The scenario where this would be needed …
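Spelled out as a runnable sketch, the TL;DR looks like this; varFilePath and the column name are placeholders carried over from the excerpt, and note that encode() returns a binary column in the requested character set.

```python
from pyspark.sql import functions as F

# Read the source file, then re-encode the affected string column;
# F.encode() produces a binary column in the given character set.
varFilePath = "/mnt/raw/landing/source.json"  # placeholder path

df = (
    spark.read.json(varFilePath)
    .withColumn("affectedColumnName", F.encode("affectedColumnName", "utf-8"))
)
```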
