This project automates the extraction, transformation, and loading (ETL) of YouTube channel data using Python and Apache Airflow. It fetches video statistics from Alex The Analyst's channel and stores them in a PostgreSQL database.
Tech Stack
Python
YouTube Data API v3
Apache Airflow (workflow orchestration)
pandas (data processing)
PostgreSQL (data storage)
dotenv (environment variable management)
Project Structure
├── youtube_extract.py                  # Extracts video data from YouTube
├── youtube_transform.py                # Transforms the data
├── youtube_load.py                     # Saves data to PostgreSQL
├── dags/youtube_etl_dag.py             # Airflow DAG definition
├── .env                                # Environment variables
├── alex_videos.csv                     # Raw extracted data
├── youtube_alex_data_transformed.csv   # Cleaned/transformed data
└── README.md                           # Project documentation
How It Works
Extract
Uses the YouTube Data API v3 to fetch video metadata from a channel playlist.
Transform
Parses timestamps, then cleans and structures the data.
Load
Saves the cleaned dataset to a PostgreSQL database.
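The extract and transform steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `youtube_extract.py` or `youtube_transform.py`: it uses only the standard library instead of pandas so it runs without dependencies, and the names `iter_playlist_rows`, `transform`, `fake_fetch`, and `SAMPLE_PAGES` are hypothetical. The response fields (`items`, `snippet.title`, `contentDetails.videoId`, `contentDetails.videoPublishedAt`, `nextPageToken`) follow the shape of a YouTube Data API v3 `playlistItems.list` response; the sample data is made up.

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator, Optional

# Made-up sample pages shaped like playlistItems.list responses (assumption).
SAMPLE_PAGES = {
    None: {
        "items": [
            {"snippet": {"title": " Intro to SQL "},
             "contentDetails": {"videoId": "abc123",
                                "videoPublishedAt": "2025-06-01T12:00:00Z"}},
        ],
        "nextPageToken": "PAGE2",
    },
    "PAGE2": {
        "items": [
            # Duplicate of the first video, to exercise de-duplication.
            {"snippet": {"title": "Intro to SQL"},
             "contentDetails": {"videoId": "abc123",
                                "videoPublishedAt": "2025-06-01T12:00:00Z"}},
            {"snippet": {"title": "Pandas Basics"},
             "contentDetails": {"videoId": "def456",
                                "videoPublishedAt": "2025-06-02T08:30:00Z"}},
        ],
    },
}


def fake_fetch(page_token: Optional[str]) -> dict:
    """Stand-in for a real API call; returns a canned response page."""
    return SAMPLE_PAGES[page_token]


def iter_playlist_rows(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Extract: yield one flat row per video, following nextPageToken."""
    token = None
    while True:
        page = fetch_page(token)
        for item in page.get("items", []):
            snippet = item.get("snippet", {})
            details = item.get("contentDetails", {})
            yield {
                "video_id": details.get("videoId"),
                "title": snippet.get("title", "").strip(),
                "published_at": details.get("videoPublishedAt"),
            }
        token = page.get("nextPageToken")
        if not token:
            break


def transform(rows: Iterable[dict]) -> list:
    """Transform: drop incomplete or duplicate rows and parse timestamps."""
    seen = set()
    cleaned = []
    for row in rows:
        video_id = row["video_id"]
        if not video_id or not row["published_at"] or video_id in seen:
            continue
        seen.add(video_id)
        # Replace the trailing "Z" so fromisoformat works on older Pythons.
        ts = datetime.fromisoformat(row["published_at"].replace("Z", "+00:00"))
        cleaned.append({**row, "published_at": ts,
                        "published_date": ts.date().isoformat()})
    return cleaned


rows = transform(iter_playlist_rows(fake_fetch))
```

In a real run, `fake_fetch` would be replaced by a function that calls the API with the stored key, and the resulting rows would be handed to the load step. Passing the fetcher in as a callable keeps the pagination and cleaning logic testable without network access.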
Use Cases
YouTube performance analytics
Content strategy reporting
Automated creator dashboards
Article
Step-by-step write-up coming soon.
Posted Jun 15, 2025
An ETL pipeline that analyses a YouTube channel using Airflow, the YouTube Data API v3, and Python to extract, process, and load data, with insights visualised in a Grafana dashboard.