Data-Driven Using Airflow, dbt Cloud, and AWS Tech Stack

Ahmad Kamiludin

Data Analyst

Data Engineer

AWS

dbt

Python

Introduction

This project integrates modern data engineering tools to collect, process, and visualize football statistics from the top five European leagues. Apache Airflow automates the data scraping, Amazon S3 and Redshift store the data and make it queryable, dbt Cloud transforms it, and Amazon QuickSight presents the results in interactive dashboards.

Architecture

DAG in Airflow (Extract and Load)
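
The load half of an extract-and-load step like this one can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual DAG code: the bucket name, IAM role ARN, table name, and key layout are all hypothetical, and in a real pipeline each function would typically be called from an Airflow task.

```python
from datetime import date

# Hypothetical names for illustration only; the project's real bucket,
# IAM role, and table names are not given in the source.
BUCKET = "football-stats-raw"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"


def s3_key(league: str, season: int, run_date: date) -> str:
    """Build a partitioned object key so each run writes to its own path."""
    return f"understat/league={league}/season={season}/{run_date.isoformat()}.csv"


def copy_statement(table: str, key: str) -> str:
    """Render a Redshift COPY command that loads one CSV object from S3."""
    return (
        f"COPY {table}\n"
        f"FROM 's3://{BUCKET}/{key}'\n"
        f"IAM_ROLE '{IAM_ROLE}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


key = s3_key("EPL", 2023, date(2024, 11, 28))
print(copy_statement("raw.team_stats", key))
```

Keeping the key and the SQL as pure functions makes them easy to check without touching AWS; the upload to S3 and the execution against Redshift would happen in separate steps.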

Data Lineage in dbt (Only One Source/Table for Demonstration)

Simple Dashboard

Technology Used

Amazon S3
Amazon Redshift
dbt (data build tool)
Amazon QuickSight
Docker
Apache Airflow
Python
SQL

Dataset Used

The dataset was scraped from understat.com using Python. The site provides detailed statistics on team and player performance for the top five European leagues during the 2023 season: the English Premier League (EPL), La Liga, Bundesliga, Serie A, and Ligue 1.
More information about the dataset: https://understat.com/
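
As a rough sketch of how the scraped data might be organized before upload, the snippet below builds one page URL per league and serializes rows to CSV. The league slugs and URL pattern are assumptions based on the site's public URL scheme, the helper names are hypothetical, and the sample row contains placeholder values rather than real statistics.

```python
import csv
import io

# Assumed league slugs, based on Understat's public URL scheme.
LEAGUE_SLUGS = {
    "EPL": "EPL",
    "La Liga": "La_liga",
    "Bundesliga": "Bundesliga",
    "Serie A": "Serie_A",
    "Ligue 1": "Ligue_1",
}


def league_url(slug: str, season: int) -> str:
    """Return the assumed league page URL for one season."""
    return f"https://understat.com/league/{slug}/{season}"


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize a list of uniform dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


urls = {name: league_url(slug, 2023) for name, slug in LEAGUE_SLUGS.items()}

# Placeholder row for illustration; not real Understat data.
sample = [{"league": "EPL", "team": "Example FC", "xg": 1.5}]
print(rows_to_csv(sample))
```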

Posted Nov 28, 2024

This project leverages modern data engineering tools to automate the collection, processing, and visualization of football data from Europe’s top leagues.

Ahmad Kamiludin

Data Engineer | Python Developer | Cloud Data Architect

Real-Time Data Pipeline Using Apache Kafka, Flink, and MongoDB
Scalable Data Pipeline with MySQL, Google Cloud, & Looker Studio
Analytics Engineering with Airbnb Data Using dbt
Real-Time Music Data Pipeline Using Apache Kafka