Uber Data Engineering ETL Project

Pankaj Kumar Pramanik

Built an end-to-end data engineering pipeline to analyze Uber-like trip data on Google Cloud Platform. The workflow stages raw files in Cloud Storage, uses Mage.ai (running on a Compute Engine instance) to orchestrate the ETL steps in Python and SQL, loads the curated tables into BigQuery, and feeds analytics-ready reporting in Looker Studio.
The project models the dataset in a fact/dimension style for efficient analytics and includes reusable SQL queries for insights. It is based on the NYC TLC trip record dataset (yellow and green taxi trips with pickup and dropoff times, fares, distances, payment type, passenger count, and related fields).
Tech stack: Python, SQL, Mage.ai, BigQuery, Cloud Storage, Compute Engine, Looker Studio.
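
To make the fact/dimension modeling mentioned above more concrete, here is a minimal sketch in plain Python of how flat trip records could be split into one fact table and two small dimension tables. It is illustrative only and not the project's actual transformation code: the function name `build_star_schema`, the surrogate key names, the sample rows, and the payment-type labels are assumptions, and the input column names are assumed to follow the TLC yellow taxi layout.

```python
from datetime import datetime

# Assumed labels for TLC payment_type codes (illustrative subset).
PAYMENT_TYPE_NAMES = {1: "Credit card", 2: "Cash", 3: "No charge", 4: "Dispute"}


def build_star_schema(trips):
    """Split flat trip records into a fact table plus two dimension tables."""
    payment_dim = {}   # payment_type code -> dimension row
    datetime_dim = {}  # pickup timestamp string -> dimension row
    fact_rows = []

    for trip_id, trip in enumerate(trips):
        # Payment dimension: one row per distinct payment code.
        code = trip["payment_type"]
        if code not in payment_dim:
            payment_dim[code] = {
                "payment_type_id": len(payment_dim),
                "payment_type": code,
                "payment_type_name": PAYMENT_TYPE_NAMES.get(code, "Unknown"),
            }

        # Datetime dimension: one row per distinct pickup timestamp.
        pickup = trip["tpep_pickup_datetime"]
        if pickup not in datetime_dim:
            ts = datetime.fromisoformat(pickup)
            datetime_dim[pickup] = {
                "datetime_id": len(datetime_dim),
                "pickup_datetime": pickup,
                "pickup_hour": ts.hour,
                "pickup_weekday": ts.weekday(),
            }

        # Fact row: numeric measures plus foreign keys into the dimensions.
        fact_rows.append({
            "trip_id": trip_id,
            "datetime_id": datetime_dim[pickup]["datetime_id"],
            "payment_type_id": payment_dim[code]["payment_type_id"],
            "passenger_count": trip["passenger_count"],
            "trip_distance": trip["trip_distance"],
            "fare_amount": trip["fare_amount"],
        })

    return fact_rows, list(payment_dim.values()), list(datetime_dim.values())


# Hypothetical sample rows, not taken from the real dataset.
sample_trips = [
    {"tpep_pickup_datetime": "2016-03-01 00:00:00", "payment_type": 1,
     "passenger_count": 1, "trip_distance": 2.5, "fare_amount": 9.0},
    {"tpep_pickup_datetime": "2016-03-01 08:15:00", "payment_type": 2,
     "passenger_count": 2, "trip_distance": 1.1, "fare_amount": 6.5},
    {"tpep_pickup_datetime": "2016-03-01 08:15:00", "payment_type": 1,
     "passenger_count": 1, "trip_distance": 7.3, "fare_amount": 24.0},
]

fact, payment_dim, datetime_dim = build_star_schema(sample_trips)
```

Keeping descriptive attributes in the dimension tables leaves the fact table narrow, so a query such as total fares by payment type reduces to a join on `payment_type_id` followed by a group-by.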
Posted Feb 10, 2025

Built scalable data pipelines on GCP for Uber-like trip data, enabling analytics and insights with modern data engineering tools such as Mage.ai, BigQuery, and Looker Studio.