Built an end-to-end data engineering pipeline to analyze Uber-like trip data on Google Cloud Platform. The workflow uses Mage.ai to orchestrate ETL in Python/SQL, stages raw files in Cloud Storage, runs pipeline compute on a GCP Compute Instance, loads curated datasets into BigQuery, and powers analytics-ready reporting in Looker Studio. GitHub
The project models the dataset for efficient analytics (fact/dimension style), includes reusable SQL queries for insights, and is based on the NYC TLC trip record dataset (yellow/green taxi trips with pickup/dropoff, fares, distances, payment type, passenger count, etc.). GitHub
Built scalable data pipelines on GCP for Uber, enabling advanced analytics and insights with modern data engineering tools like BigQuery, Dataflow and Pub/Sub.