In this episode of “How do I create a date table from nothing in my chosen platform?”, I am covering Databricks.
No lengthy life story about what bubble tea I was drinking when I created this, and no childhood memories like some food bloggers. The code is linked below; look at the comments in the notebook if you have questions.
Features of the date table
Gregorian fields – Year, Month, Day, Weekday etc.
Fiscal Period Fields – These are configurable to start at any month of the year. There is no day-level shift to have the fiscal period start on, e.g., the 25th; only a month-level shift.
Boolean fields – these are to be used in reporting platforms for easy filtering of special date ranges such as current and last month, previous 12 months, current day etc.
Retail calendar – companies often want to bucket weeks into perfectly comparable periods so that a given day of the week can be compared more accurately with the same day in previous weeks.
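To make the fiscal shift and the boolean flags a bit more concrete, here is a minimal sketch in plain Python (standard library only, not PySpark, and not the code from the linked notebook). The function names, the July fiscal start, the rule that a fiscal year is named after the calendar year in which it ends, and the reading of "previous 12 months" as the 12 full calendar months before the current one are all my assumptions for illustration.

```python
from datetime import date, timedelta

FISCAL_START_MONTH = 7  # assumption: fiscal year starts in July


def fiscal_fields(d: date, start_month: int = FISCAL_START_MONTH) -> dict:
    """Month-level fiscal shift: fiscal period 1 is `start_month`."""
    # Months elapsed since the fiscal start month, wrapped to 0..11.
    offset = (d.month - start_month) % 12
    # Assumed convention: the fiscal year is named after the calendar
    # year in which it ends.
    if start_month > 1 and d.month >= start_month:
        fiscal_year = d.year + 1
    else:
        fiscal_year = d.year
    return {
        "fiscal_year": fiscal_year,
        "fiscal_period": offset + 1,
        "fiscal_quarter": offset // 3 + 1,
    }


def month_index(d: date) -> int:
    """Running month number, so month differences are plain subtraction."""
    return d.year * 12 + d.month


def flag_fields(d: date, today: date) -> dict:
    """Boolean helper columns for filtering in a reporting tool."""
    months_back = month_index(today) - month_index(d)
    return {
        "is_current_day": d == today,
        "is_current_month": months_back == 0,
        "is_last_month": months_back == 1,
        # Assumed definition: the 12 full months before the current month.
        "is_previous_12_months": 1 <= months_back <= 12,
    }


def build_date_table(start: date, end: date, today: date,
                     start_month: int = FISCAL_START_MONTH) -> list:
    """One dict per day from `start` to `end`, both inclusive."""
    rows = []
    d = start
    while d <= end:
        row = {
            "date": d,
            "year": d.year,
            "month": d.month,
            "day": d.day,
            "weekday": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
        }
        row.update(fiscal_fields(d, start_month))
        row.update(flag_fields(d, today))
        rows.append(row)
        d += timedelta(days=1)
    return rows


rows = build_date_table(date(2024, 6, 29), date(2024, 7, 2), today=date(2024, 7, 2))
for r in rows:
    print(r["date"], r["fiscal_year"], r["fiscal_period"], r["is_current_month"])
```

Because the shift only looks at the month number, a fiscal year starting on the 25th cannot be expressed this way, which matches the limitation noted above. In a Databricks notebook the same expressions would more likely be written as column expressions on a generated date range; the list of dicts here is only meant to show the arithmetic.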