Data Engineering Projects in Karachi City

Problem:
Financial data from stock and crypto APIs was scattered, refreshed manually, and not ready for analytics or ML use.
Solution:
Built an Apache Airflow pipeline to collect, transform, validate, and load real-time financial data from multiple APIs into PostgreSQL, MongoDB, AWS RDS, and Qdrant.
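The transform and validate stages can be sketched in plain Python. This is a minimal illustration, not the project's actual code. It assumes, hypothetically, that each API returns records with `symbol`, `price`, and Unix-second `timestamp` fields. The `Quote` type and the function names are also illustrative. In Airflow, each function could run as its own task (for example through the TaskFlow `@task` decorator), and a separate load task would write the valid rows to PostgreSQL, MongoDB, AWS RDS, or Qdrant.

```python
"""Sketch of the transform and validate steps of a financial-data ETL.

Assumption (hypothetical): every API returns a list of dicts with
"symbol", "price", and "timestamp" (Unix seconds) keys.
"""
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Quote:
    source: str            # which API the record came from, e.g. "stocks" or "crypto"
    symbol: str
    price: float
    observed_at: datetime  # always timezone-aware UTC


def transform(source: str, raw_records: list[dict]) -> list[Quote]:
    """Normalize raw API records into one common Quote shape."""
    return [
        Quote(
            source=source,
            symbol=str(rec["symbol"]).upper(),
            price=float(rec["price"]),
            observed_at=datetime.fromtimestamp(int(rec["timestamp"]), tz=timezone.utc),
        )
        for rec in raw_records
    ]


def validate(quotes: list[Quote]) -> tuple[list[Quote], list[Quote]]:
    """Split quotes into (valid, rejected): drop non-positive prices and duplicates."""
    valid: list[Quote] = []
    rejected: list[Quote] = []
    seen = set()
    for q in quotes:
        key = (q.source, q.symbol, q.observed_at)
        if q.price <= 0 or key in seen:
            rejected.append(q)
        else:
            seen.add(key)
            valid.append(q)
    return valid, rejected


if __name__ == "__main__":
    raw = [
        {"symbol": "btc", "price": "64000.5", "timestamp": 1700000000},
        {"symbol": "btc", "price": "64000.5", "timestamp": 1700000000},  # duplicate
        {"symbol": "eth", "price": "-1", "timestamp": 1700000000},       # invalid price
    ]
    valid, rejected = validate(transform("crypto", raw))
    print([q.symbol for q in valid])  # ['BTC']
    print(len(rejected))              # 2
```

Rejected records are returned rather than silently dropped, so a downstream task could log or quarantine them.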
Tools:
Apache Airflow, Python, PostgreSQL, MongoDB, AWS RDS, Qdrant, APIs
Result:
Automated sub-hourly data refresh, processed thousands of records daily, and delivered clean data for dashboards, analytics, and vector search.

Problem:
Batch data processing was not suitable for real-time analytics and scalable cloud-based data ingestion.
Solution:
Created a real-time streaming pipeline with Apache Kafka running on AWS EC2. Processed data was stored in Amazon S3, cataloged with AWS Glue, and queried with Amazon Athena.
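One way the Kafka-to-S3 step could lay out objects so that AWS Glue and Athena recognize partitions is with Hive-style `name=value` prefixes. This sketch assumes, hypothetically, that each Kafka message value is a JSON object with an ISO-8601 `event_time` that includes a UTC offset. The prefix, batch numbering, and function names are placeholders, not the project's actual code. The sketch only builds object keys and JSON Lines bodies. Uploading them (for example with boto3's `put_object`) and running the Glue crawler are left out.

```python
"""Sketch: group streamed records into Hive-style partitioned S3 object keys.

Assumption (hypothetical): each Kafka message value is a JSON object with an
ISO-8601 "event_time" that carries a UTC offset, e.g. "2024-01-01T09:15:00+00:00".
"""
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(prefix: str, event_time: datetime, batch_id: int) -> str:
    """Build a key like 'prefix/dt=2024-01-01/hour=09/batch-00001.json' (UTC-based)."""
    # astimezone() would treat a naive datetime as local time, hence the offset assumption.
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}/dt={utc:%Y-%m-%d}/hour={utc:%H}/batch-{batch_id:05d}.json"


def build_batch(prefix: str, messages: list[bytes], batch_id: int) -> dict[str, str]:
    """Group raw message values into JSON Lines bodies keyed by S3 object key."""
    grouped = defaultdict(list)
    for raw in messages:
        record = json.loads(raw)
        event_time = datetime.fromisoformat(record["event_time"])
        grouped[partition_key(prefix, event_time, batch_id)].append(json.dumps(record))
    # One JSON object per line is the layout Athena's JSON SerDes expect.
    return {key: "\n".join(lines) + "\n" for key, lines in grouped.items()}


if __name__ == "__main__":
    msgs = [
        b'{"symbol": "AAPL", "price": 190.1, "event_time": "2024-01-01T09:15:00+00:00"}',
        b'{"symbol": "MSFT", "price": 370.0, "event_time": "2024-01-01T09:45:00+00:00"}',
        b'{"symbol": "AAPL", "price": 190.4, "event_time": "2024-01-01T10:05:00+00:00"}',
    ]
    for key, body in sorted(build_batch("raw/quotes", msgs, 1).items()):
        print(key, body.count("\n"), "records")
```

Because the `dt` and `hour` path segments use the `name=value` form, a Glue crawler can register them as partition columns, and Athena queries that filter on them only scan the matching prefixes.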
Tools:
Python, Apache Kafka, AWS EC2, Amazon S3, AWS Glue, Amazon Athena, Pandas
Result:
Built an end-to-end cloud data streaming workflow that supports real-time ingestion, storage, cataloging, and SQL-based analytics.