I architect robust data pipelines using Apache Spark, Airflow, PostgreSQL, and MySQL. At TCS, this work cut manual processing by 30% for 50+ stakeholders. From raw ingestion to clean, governed data ready for analytics, I deliver near-real-time pipelines that scale.
✅ PySpark + SQL (CTEs, stored procedures, window functions)
✅ Airflow orchestration
✅ AWS S3 integration
✅ Data quality & governance standards
Starting at: $400/project