ETL Migration to Databricks: Full migration of legacy Informatica PowerCenter ETL workflows to Azure Databricks and Azure Data Factory. Converted complex Informatica mappings to optimized PySpark transformations. Built parameterized ADF pipelines for automated scheduling, monitoring, and error alerting. Delivered 100% reconciliation validation between legacy and migrated pipeline outputs. Tech: Informatica PowerCenter, Azure Databricks, Azure Data Factory, PySpark, Delta Lake.
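The reconciliation step can be illustrated with a minimal sketch. It assumes both pipeline outputs fit in memory as lists of dictionaries; the function names `row_fingerprint` and `reconcile` are hypothetical and not taken from the project, which would more likely run an equivalent comparison in PySpark.

```python
import hashlib
from collections import Counter


def row_fingerprint(row: dict) -> str:
    """Hash a row's values in sorted-column order so column order does not matter."""
    canonical = "|".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(legacy_rows: list, migrated_rows: list) -> dict:
    """Compare two outputs as multisets of row fingerprints (duplicates count)."""
    legacy = Counter(row_fingerprint(r) for r in legacy_rows)
    migrated = Counter(row_fingerprint(r) for r in migrated_rows)
    missing = legacy - migrated      # rows present only in the legacy output
    unexpected = migrated - legacy   # rows present only in the migrated output
    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing_in_migrated": sum(missing.values()),
        "unexpected_in_migrated": sum(unexpected.values()),
        "reconciled": not missing and not unexpected,
    }


# Illustrative data only: same rows, different row and column order.
legacy_output = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]
migrated_output = [{"amount": 5.5, "id": 2}, {"id": 1, "amount": 10.0}]
report = reconcile(legacy_output, migrated_output)
print(report)
```

Comparing multisets rather than sets means a duplicated or dropped row is flagged even when every distinct row value appears on both sides.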
Real-Time Streaming Pipeline: Fault-tolerant real-time data pipeline reducing data latency from 12+ hours to under 60 seconds. Built on Azure Event Hub and Databricks Structured Streaming with exactly-once processing semantics via checkpointing. Implemented watermarking for late-arriving data and Delta Lake sink for concurrent dashboard reads. Tech: Azure Event Hub, Databricks Structured Streaming, PySpark, Delta Lake, ADLS Gen2.
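The idea behind watermarking for late-arriving data can be shown with a toy model in plain Python. This is a simplification, not Structured Streaming code: it assumes the watermark is the maximum event time seen in earlier micro-batches minus a fixed delay, and that any event older than the watermark is dropped. The function name `apply_watermark` and the 5-minute delay are hypothetical; in Spark the exact behavior also depends on the operator and output mode.

```python
from datetime import datetime, timedelta


def apply_watermark(batches, delay):
    """Process micro-batches of (event_time, payload) pairs.

    The watermark used for a batch is the max event time seen in *earlier*
    batches minus `delay`. Events older than the watermark are dropped.
    """
    max_event_time = None
    accepted, dropped = [], []
    for batch in batches:
        watermark = max_event_time - delay if max_event_time is not None else None
        for event_time, payload in batch:
            if watermark is not None and event_time < watermark:
                dropped.append(payload)   # arrived after the allowed lateness
            else:
                accepted.append(payload)
        # Advance the high-water mark only after the batch has been processed.
        batch_max = max((t for t, _ in batch), default=None)
        if batch_max is not None and (max_event_time is None or batch_max > max_event_time):
            max_event_time = batch_max
    return accepted, dropped


t0 = datetime(2024, 1, 1, 12, 0, 0)
batches = [
    [(t0, "a"), (t0 + timedelta(minutes=10), "b")],
    [(t0 + timedelta(minutes=6), "late-but-ok"), (t0 + timedelta(minutes=2), "too-late")],
]
accepted, dropped = apply_watermark(batches, delay=timedelta(minutes=5))
print(accepted, dropped)
```

In the second batch the watermark sits at 12:05, so the 12:06 event is still accepted while the 12:02 event is discarded.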
Azure Data Lakehouse: End-to-end Azure data lakehouse for retail analytics using Medallion Architecture (Bronze/Silver/Gold). Built scalable PySpark ETL pipelines ingesting structured and semi-structured data from multiple source systems. Applied partitioning, caching, and broadcast joins for performance optimization. Delivered analytics-ready Gold datasets enabling downstream BI reporting and stakeholder dashboards. Tech: Azure Databricks, ADLS Gen2, Azure Data Factory, PySpark, Delta Lake.
AWS Sentiment & NLP Analyzer: NLP application replicating Amazon Comprehend: performs sentiment analysis (Positive/Negative/Neutral/Mixed), key phrase extraction, and named entity detection with confidence scores. Outputs JSON in an Amazon Comprehend-equivalent format. Supports single text and batch analysis of 20+ texts simultaneously. Tech: Python, NLP, Streamlit, Pandas.
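A rough sketch of what a Comprehend-style sentiment response looks like is given here. The response keys (`Sentiment`, `SentimentScore`, `ResultList`, `ErrorList`, `Index`) follow the shape of Amazon Comprehend's sentiment responses; everything else is an assumption for illustration, including the tiny word lists, the fixed 0.85 confidence, and the function names, none of which come from the project.

```python
import json

POSITIVE_WORDS = {"good", "great", "love", "excellent"}   # hypothetical toy lexicon
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}      # hypothetical toy lexicon


def detect_sentiment(text: str) -> dict:
    """Label one text and return a Comprehend-shaped dictionary."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    pos = sum(tok in POSITIVE_WORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_WORDS for tok in tokens)
    if pos and neg:
        label = "MIXED"
    elif pos:
        label = "POSITIVE"
    elif neg:
        label = "NEGATIVE"
    else:
        label = "NEUTRAL"
    # Crude placeholder confidence: 0.85 on the chosen label, 0.05 on each other.
    scores = {name: 0.05 for name in ("Positive", "Negative", "Neutral", "Mixed")}
    scores[label.capitalize()] = 0.85
    return {"Sentiment": label, "SentimentScore": scores}


def batch_detect_sentiment(texts: list) -> dict:
    """Label several texts, keeping each result tied to its input position."""
    results = [{"Index": i, **detect_sentiment(t)} for i, t in enumerate(texts)]
    return {"ResultList": results, "ErrorList": []}


batch = batch_detect_sentiment(["I love this great product!", "Good price but terrible support."])
print(json.dumps(batch, indent=2))
```

A real analyzer would replace the word lists with a trained model; only the output shape is the point of this sketch.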
Employee Attrition Predictor: Gradient Boosting ML classifier predicting employee attrition risk with 85%+ accuracy. Models 8 risk factors including job satisfaction, compensation, overtime, and promotion history. Generates actionable HR retention recommendations per employee. Batch analysis scores 500+ employees at once; projects $900K+ in annual savings for a 500-person company. Tech: Python, scikit-learn, Gradient Boosting, Pandas, Streamlit.
Coffee Demand Predictor: ML forecasting model predicting daily coffee demand using weather and local event data. Improved forecast accuracy from 60% to 90%, reducing ingredient waste by 50% and stockouts by 75%. Demonstrated $12K+ projected annual savings per store. UC Berkeley Executive Education Capstone. Tech: Python, scikit-learn, Random Forest, Streamlit, Pandas, NumPy.
Resume Matcher: AI-powered semantic matching system that scores a resume against any job description, identifies skill gaps, and generates specific improvement suggestions. Uses Sentence Transformers for semantic similarity scoring and keyword gap analysis across 40+ tech skills. Tech: Python, Sentence Transformers, scikit-learn, NLP, Streamlit.
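The scoring and gap-analysis flow can be sketched in plain Python. As a stand-in for Sentence Transformers embeddings, this example uses bag-of-words cosine similarity, which captures word overlap rather than meaning; the five-item `SKILLS` set and the function names are hypothetical placeholders for the project's 40+ skill list.

```python
import math
import re
from collections import Counter

SKILLS = {"python", "spark", "sql", "docker", "airflow"}  # hypothetical skill list


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match(resume: str, job_description: str) -> dict:
    """Score a resume against a job description and list missing skills."""
    resume_tokens, job_tokens = tokenize(resume), tokenize(job_description)
    score = cosine_similarity(Counter(resume_tokens), Counter(job_tokens))
    required = SKILLS & set(job_tokens)            # skills the job mentions
    missing = sorted(required - set(resume_tokens))  # ...that the resume lacks
    return {"score": round(score, 3), "missing_skills": missing}


result = match(
    "Data engineer with Python, SQL and Spark experience",
    "Looking for a data engineer skilled in Python, Spark, Airflow and Docker",
)
print(result)
```

Swapping `cosine_similarity` over term counts for cosine similarity over sentence embeddings would leave the gap analysis unchanged.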
RAG Chatbot: Built a full-stack Retrieval-Augmented Generation system enabling users to query any PDF document in plain English. Designed the complete pipeline: text extraction, chunking with an overlap strategy, FAISS vector indexing, semantic retrieval, and LangChain-based response generation. Deployed to production on Streamlit Cloud. Tech: Python, LangChain, FAISS, Sentence Transformers, Streamlit, PyPDF2.
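The chunking-with-overlap step can be sketched as a character-based sliding window. This is an assumption about the approach, not the project's code: the function name `chunk_text` and the default sizes are hypothetical, and the project may instead rely on a LangChain text splitter that breaks on separators.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of indexing some text twice.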