Problem:
Many organizations still process invoices manually: someone reads each PDF and types the key details (invoice number, vendor, amount, etc.) into downstream systems. This process is slow, error-prone, and hard to scale, and it makes duplicate invoices and incorrect totals harder to detect.
Solution:
This project builds an automated invoice processing pipeline that converts uploaded invoice PDFs into structured data. It uses OCR to extract text, LLMs to identify invoice fields, validation checks to ensure correctness, and Kafka-based event streaming to manage the processing pipeline. The extracted data is stored in PostgreSQL and visualized through a dashboard, enabling faster, scalable, and more reliable invoice processing.
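The validation step could look roughly like the sketch below. It is only an illustration: the `Invoice` fields, the `validate` function, and the choice of vendor plus invoice number as the duplicate key are assumptions, not details taken from the project.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    # Hypothetical field set; the real extraction schema is not described.
    invoice_number: str
    vendor: str
    line_amounts: tuple
    tax: Decimal
    total: Decimal


def validate(invoice, seen):
    """Return a list of problems; an empty list means the invoice passed."""
    problems = []
    if not invoice.invoice_number.strip():
        problems.append("missing invoice number")

    # Incorrect totals: line items plus tax must equal the stated total.
    expected = sum(invoice.line_amounts, Decimal("0")) + invoice.tax
    if expected != invoice.total:
        problems.append(f"total mismatch: expected {expected}, got {invoice.total}")

    # Duplicates: assumed here to mean same vendor and same invoice number.
    key = (invoice.vendor.strip().lower(), invoice.invoice_number.strip())
    if key in seen:
        problems.append("duplicate invoice")
    else:
        seen.add(key)
    return problems
```

Amounts use `Decimal` rather than `float` so that currency sums compare exactly. In a real pipeline the `seen` set would more plausibly be a uniqueness check against the database than an in-memory set.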
Problem Statement
Urban traffic management systems lack real-time, integrated data combining traffic conditions with weather patterns. This results in poor routing decisions, delayed emergency responses, and inefficient traffic flow management.
Solution
This project is a real-time ETL pipeline that integrates traffic APIs and weather data sources, processes millions of data points, and delivers actionable insights through interactive dashboards for traffic management and route optimization.
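One way to picture the integration step is a time-based join that attaches the most recent weather observation to each traffic reading. The sketch below is a minimal illustration under assumed record shapes (`ts`, `segment`, `speed_kmh`, `condition` are hypothetical field names), not the project's actual transform.

```python
from bisect import bisect_right


def enrich(traffic, weather):
    """Attach to each traffic reading the latest weather at or before its timestamp."""
    weather = sorted(weather, key=lambda w: w["ts"])
    times = [w["ts"] for w in weather]
    enriched = []
    for reading in traffic:
        # Index of the last weather observation with ts <= reading ts.
        i = bisect_right(times, reading["ts"]) - 1
        condition = weather[i]["condition"] if i >= 0 else None
        enriched.append({**reading, "weather": condition})
    return enriched


weather = [{"ts": 100, "condition": "clear"}, {"ts": 200, "condition": "rain"}]
traffic = [
    {"ts": 50, "segment": "A", "speed_kmh": 40},
    {"ts": 150, "segment": "A", "speed_kmh": 35},
    {"ts": 200, "segment": "B", "speed_kmh": 20},
]
rows = enrich(traffic, weather)
```

Readings that arrive before any weather observation get `None` rather than a guessed value, which leaves the choice of how to handle them to a later stage.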
A fully local RAG pipeline that transforms your PDFs into a queryable knowledge base using FAISS vector search and Ollama LLMs. No cloud, no API keys - just private, grounded document intelligence running entirely on your machine.
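The retrieve-then-prompt idea behind such a pipeline can be sketched in plain Python. This sketch deliberately does not use FAISS or Ollama: a toy bag-of-words vector and brute-force cosine similarity stand in for the embedding model and the vector index, and every function name here is hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Ground the question in retrieved text before it is sent to a local LLM."""
    context = "\n".join(top_k(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "invoices are due in 30 days",
    "the office is closed on sunday",
    "late invoices incur a fee",
]
prompt = build_prompt("when are invoices due", chunks, k=1)
```

Grounding comes from the prompt containing only retrieved passages. A vector index replaces the brute-force `sorted` call once the number of chunks makes scanning all of them too slow.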
A fully Dockerized real-time IoT ETL pipeline that simulates device telemetry, processes events through MQTT and Kafka, orchestrates workflows with Airflow, and delivers real-time alerts and insights to CRM systems with monitoring via Grafana and Loki.
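The alerting stage can be pictured as a filter over the telemetry stream. The sketch below assumes a hypothetical event shape (`device_id`, `temp_c`) and an arbitrary threshold. In the pipeline described, the events would arrive via MQTT and Kafka rather than from a Python list.

```python
def detect_alerts(events, max_temp_c=75.0):
    """Yield one alert for each telemetry event above the temperature threshold."""
    for event in events:
        if event["temp_c"] > max_temp_c:
            yield {
                "device_id": event["device_id"],
                "temp_c": event["temp_c"],
                "severity": "high",
            }


events = [
    {"device_id": "sensor-1", "temp_c": 61.0},
    {"device_id": "sensor-2", "temp_c": 82.5},
    {"device_id": "sensor-1", "temp_c": 75.0},
]
alerts = list(detect_alerts(events))
```

Writing the detector as a generator keeps it usable on an unbounded stream, since it never needs the full list of events in memory.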