Convert the Character Set/Encoding of a String field in a PySpark DataFrame on Databricks

TL;DR: When defining your PySpark DataFrame with spark.read, use the .withColumn() method to overwrite the contents of the affected column. Use the encode function from the pyspark.sql.functions module to change the character set encoding of the column.

```python
import pyspark.sql.functions as F

dataFrame = (
    spark.read.json(varFilePath)
    .withColumn("affectedColumnName", F.encode("affectedColumnName", "utf-8"))
)
```

Scenario

The scenario where this would be needed …
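For a fuller picture of the conversion, a minimal sketch follows. It assumes (hypothetically, for illustration) a binary column named raw_bytes holding text that was produced in ISO-8859-1 (Latin-1); decode first interprets those bytes as a string, and encode then re-encodes that string as UTF-8. The sample data and column names are not from the original article.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: b"caf\xe9" is "café" encoded as ISO-8859-1 (Latin-1)
df = spark.createDataFrame([(bytearray(b"caf\xe9"),)], ["raw_bytes"])

converted = (
    df
    # decode() reads the binary column as a string, using the source charset
    .withColumn("as_string", F.decode(F.col("raw_bytes"), "ISO-8859-1"))
    # encode() re-encodes the string as binary in the target charset
    .withColumn("utf8_bytes", F.encode(F.col("as_string"), "utf-8"))
)

converted.show(truncate=False)
```

Note that encode returns a BinaryType column; if you only need the readable string in the DataFrame, the intermediate as_string column is usually the one to keep.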
