pyspark

Function to mount a storage account container to Azure Databricks

What does it mean to mount a storage account to Azure Databricks? Databricks has a built-in “Databricks File System” (DBFS). It is a distributed file system mounted onto your Databricks workspace. It is mounted directly to your cluster and is only accessible while the cluster is running. It is an abstraction on top of object storage. …
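The full post builds this into a reusable function; as a rough sketch of the core call such a function wraps, mounting an ADLS Gen2 container with dbutils.fs.mount and a service principal looks something like the following (the storage account, container, secret scope, and secret names here are all placeholders, not values from the post):

    # All names below are hypothetical placeholders for your own resources
    configs = {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": dbutils.secrets.get("my-scope", "sp-client-id"),
        "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-client-secret"),
        "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    }

    # Mount the container so it appears under /mnt in DBFS
    dbutils.fs.mount(
        source="abfss://mycontainer@mystorageaccount.dfs.core.windows.net/",
        mount_point="/mnt/mycontainer",
        extra_configs=configs,
    )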

Get the latest file from Azure Data Lake in Databricks

There are many ways to orchestrate a data flow in the cloud. One such option is to have an independent process pull data from source systems and land the latest batch of data in an Azure Data Lake as a single file. The next layer where you process the data can be handled in many …
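One minimal sketch of the “grab the newest file” step in a Databricks notebook (the landing path is a placeholder, and the FileInfo.modificationTime attribute assumes a reasonably recent Databricks Runtime):

    # List everything the upstream process has landed in the folder
    files = dbutils.fs.ls("abfss://landing@mydatalake.dfs.core.windows.net/source-system/")

    # Pick the file with the most recent modification timestamp
    latest = max(files, key=lambda f: f.modificationTime)

    # Read only that latest batch into a DataFrame
    df = spark.read.json(latest.path)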

Convert the Character Set/Encoding of a String field in a PySpark DataFrame on Databricks

TL;DR: When defining your PySpark DataFrame using spark.read, use the .withColumn() function to override the contents of the affected column. Use the encode function of the pyspark.sql.functions library to change the character set encoding of the column.

    import pyspark.sql.functions as F

    dataFrame = (
        spark.read.json(varFilePath)
        .withColumn("affectedColumnName", F.encode("affectedColumnName", "utf-8"))
    )

Scenario: The scenario where this would be needed …
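As a self-contained illustration of what encode does (the sample data is invented for the example), the affected column becomes the binary representation of each string in the target character set:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.getOrCreate()

    # Toy DataFrame with one string column (made-up values)
    df = spark.createDataFrame([("café",), ("naïve",)], ["affectedColumnName"])

    # encode() returns the UTF-8 bytes of each value, so the column type becomes binary
    converted = df.withColumn("affectedColumnName", F.encode("affectedColumnName", "utf-8"))

    converted.printSchema()  # affectedColumnName: binary (nullable = true)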
