Data is extracted from an Azure SQL server with Azure Data Factory and landed in Azure Data Lake Storage Gen2 as Parquet files. The data is then transformed in Databricks and modelled into a star schema consisting of a fact table and multiple dimension tables.
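As a rough sketch of how the modelling step in Databricks might look in PySpark, the snippet below derives one dimension with a surrogate key and joins it back into a fact table. All paths, table names, and columns are hypothetical placeholders rather than the project's actual schema.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Read the raw Parquet files landed by Azure Data Factory (path is a placeholder).
orders = spark.read.parquet(
    "abfss://raw@<storage_account>.dfs.core.windows.net/sales/orders/"
)

# Customer dimension: deduplicate on the natural key and assign a surrogate key.
dim_customer = (
    orders
    .select("customer_id", "customer_name", "city", "country")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_key", F.row_number().over(Window.orderBy("customer_id")))
)

# Fact table: keep the measures and swap the natural key for the surrogate key.
fact_sales = (
    orders
    .join(dim_customer.select("customer_id", "customer_key"),
          on="customer_id", how="left")
    .select("customer_key", "order_date", "quantity", "unit_price")
    .withColumn("revenue", F.col("quantity") * F.col("unit_price"))
)

# Persist the model as Delta tables for downstream use (names hypothetical).
dim_customer.write.format("delta").mode("overwrite").saveAsTable("gold.dim_customer")
fact_sales.write.format("delta").mode("overwrite").saveAsTable("gold.fact_sales")
```

In a Kimball-style model the fact table references each dimension through a surrogate key rather than the source system's natural key, which is what the join above illustrates.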
The final result is the following star schema:
ETL
The data was extracted from the SQL server into the Data Lake using Azure Data Factory.
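The copy pipeline is typically triggered from the Data Factory UI or a schedule; purely for illustration, a run can also be started and monitored from Python with the azure-mgmt-datafactory SDK. All resource names below are hypothetical placeholders.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Hypothetical placeholders for the Azure resources.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-data-platform"
FACTORY_NAME = "adf-sales-ingestion"
PIPELINE_NAME = "pl_copy_sql_to_lake"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start the copy pipeline that lands the SQL tables as Parquet in the lake.
run = client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={}
)

# Poll the run until it leaves the queued/in-progress states.
while True:
    status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id).status
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)

print(f"Pipeline run {run.run_id} finished with status: {status}")
```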
Notebooks
Sample images from the data notebooks:
Workflow
Sample images of the workflow in Databricks:
Built a dimensional model using PySpark in Databricks.
Key concepts: dimensional modelling, the Kimball methodology, star schemas.