Scalable Data Pipeline with MySQL, Google Cloud, & Looker Studio

Ahmad Kamiludin

Data Analyst

Data Engineer

dbt

Google Cloud Platform

Python

Introduction

This project builds an automated, scalable ELT (Extract, Load, Transform) pipeline on cloud services. Product data is extracted from MySQL, staged in Google Cloud Storage (GCS), and loaded into Google BigQuery, where dbt Core runs the transformations. The transformed data is then visualized in Looker Studio to show product and sales trends. Apache Airflow orchestrates each step so that data flows end to end without manual intervention. The project uses the Amazon Products 2023 dataset from Kaggle and is designed to handle large datasets efficiently, supporting deeper analysis and faster decision-making.
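
The stage ordering described above can be sketched as a small dependency graph. This is only an illustration using Python's standard library, not the project's actual Airflow DAG; the task names are hypothetical and each one stands in for a real task (a MySQL export, a GCS upload, a BigQuery load job, a dbt run).

```python
from graphlib import TopologicalSorter

# Hypothetical task names (assumptions, not taken from the project).
# Each task maps to the set of tasks that must finish before it starts.
PIPELINE = {
    "extract_mysql": set(),
    "load_to_gcs": {"extract_mysql"},
    "load_to_bigquery": {"load_to_gcs"},
    "dbt_run": {"load_to_bigquery"},
}


def run_order(graph):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(PIPELINE)))
```

An orchestrator such as Airflow resolves the same kind of graph before scheduling tasks. Looker Studio is left out of the graph on the assumption that it simply reads the transformed BigQuery tables rather than running as a pipeline task.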

Architecture

Simple Dashboard

Technology Used

MySQL Database
Google Cloud Storage
Google BigQuery
dbt (data build tool)
Looker Studio
Docker
Apache Airflow
Python
SQL

Dataset Used

Amazon Products Dataset 2023 (1.4M Products). Amazon is one of the biggest online retailers in the USA and sells over 12 million products. This dataset gives an in-depth view of which products sell best, which SEO titles generate the most sales, the best price range for a product in a given category, and much more.
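
As one example of the analysis mentioned above, the "best price range per category" question could be approached as in the sketch below. The field names (`category`, `price`, `units_sold`), the band width, and the sample rows are all assumptions made up for illustration; they are not taken from the dataset. In this project the equivalent logic would more likely live in a dbt SQL model.

```python
from collections import defaultdict

# Made-up sample rows for illustration only; not real dataset values.
ROWS = [
    {"category": "Toys", "price": 9.99, "units_sold": 500},
    {"category": "Toys", "price": 14.50, "units_sold": 300},
    {"category": "Toys", "price": 32.00, "units_sold": 900},
    {"category": "Books", "price": 12.00, "units_sold": 200},
]


def price_band(price, width=10):
    """Return the lower bound of the fixed-width band containing price."""
    return int(price // width) * width


def best_price_band(rows, width=10):
    """Map each category to the price band with the most units sold."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        band = price_band(row["price"], width)
        totals[row["category"]][band] += row["units_sold"]
    # Pick, per category, the band whose summed units are highest.
    return {cat: max(bands, key=bands.get) for cat, bands in totals.items()}


if __name__ == "__main__":
    print(best_price_band(ROWS))
```

Each band is identified by its lower bound, so a result of 30 with a width of 10 means the 30 to 39.99 range.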

Article About this Project

Posted Nov 28, 2024

This project builds an automated, scalable ELT pipeline using MySQL, Google Cloud Platform, and dbt Core.

Ahmad Kamiludin

Data Engineer | Python Developer | Cloud Data Architect

Analytics Engineering with Airbnb Data Using dbt
Real-Time Music Data Pipeline Using Apache Kafka
Data Migration from Snowflake to Microsoft Azure
Data-Driven Using Airflow, Dbt Cloud, and AWS Tech Stack