Freelancers using Apache Airflow in Delhi
Problem:
Many organizations still process invoices manually by reading PDF documents and entering key details (invoice number, vendor, amount, etc.) into systems. This process is slow, error-prone, and difficult to scale, and it also makes it harder to detect duplicate invoices or incorrect totals.
Solution:
This project builds an automated invoice processing pipeline that converts uploaded invoice PDFs into structured data. It uses OCR to extract text, LLMs to identify invoice fields, validation checks to catch errors such as duplicate invoices and incorrect totals, and Kafka-based event streaming to coordinate the processing stages. The extracted data is stored in PostgreSQL and visualized through a dashboard, making invoice processing faster, more scalable, and more reliable.
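The validation step could look roughly like the following. This is a minimal sketch, not the project's actual code: the `Invoice` fields, the `validate` function, and the rule that a duplicate is the same vendor plus the same invoice number are all assumptions made for illustration. It uses only the Python standard library, with `Decimal` so that currency sums compare exactly.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    # Hypothetical schema: the project's real field names are not specified.
    invoice_number: str
    vendor: str
    line_amounts: tuple[Decimal, ...]
    tax: Decimal
    total: Decimal


def validate(invoice: Invoice, seen: set[tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the invoice passed."""
    errors: list[str] = []

    if not invoice.invoice_number.strip():
        errors.append("missing invoice number")

    # The stated total should equal the sum of line items plus tax.
    expected = sum(invoice.line_amounts, Decimal("0")) + invoice.tax
    if expected != invoice.total:
        errors.append(f"total mismatch: expected {expected}, got {invoice.total}")

    # Assumed duplicate rule: same vendor (case-insensitive) and invoice number.
    key = (invoice.vendor.strip().lower(), invoice.invoice_number.strip())
    if key in seen:
        errors.append("duplicate invoice")
    else:
        seen.add(key)

    return errors
```

In a real deployment the `seen` set would more likely be a unique constraint or lookup in PostgreSQL, so that duplicates are caught across workers and restarts. The in-memory set here only keeps the sketch self-contained.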