Invoice Scanner | Automated Invoice Extraction Tool by Anthony Marquez

Anthony Marquez

AI Tool · Web App · Generative AI

Invoice Scanner

Upload any invoice — PDF, image, or email — and instantly extract vendor, totals, line items, and dates. Powered by GPT-4o and Azure AI.

The Story
Invoice processing is one of the most tedious tasks in any business. I built this open-source tool to take the manual work out of it: upload a PDF, image, or forwarded email, and the app extracts every critical field in seconds using a structured GPT-4o agent backed by regex fallbacks and MIME parsing.
The system handles messy, real-world invoices: scanned PDFs, low-quality images, multi-page documents, and inconsistent layouts. Azure OpenAI's structured output mode ensures the extracted data always matches a fixed schema, so results are clean and ready to use with no reformatting.


My Role
Full-Stack Developer
AI / Prompt Engineer
Product Designer

How I Built It

01

Document Ingestion & MIME Parsing

The pipeline starts before any AI touches the file. A MIME parser identifies the input type (PDF, image, or email attachment) and routes it to the correct pre-processor. PDFs are converted to page images, emails have their HTML stripped and their content extracted, and images are normalized for consistent OCR quality.
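A minimal sketch of the routing step, using only the Python standard library. It is an illustration under stated assumptions, not the project's actual code: the names `detect_kind` and `split_email` are hypothetical, file types are detected from leading magic bytes, and any input that carries a From or Subject header is treated as an email.

```python
from email import message_from_bytes, policy
from typing import List, Tuple

# Leading "magic bytes" for the binary formats mentioned in the text.
SIGNATURES = (
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "image"),  # PNG
    (b"\xff\xd8\xff", "image"),       # JPEG
)


def detect_kind(data: bytes) -> str:
    """Classify a raw upload as 'pdf', 'image', 'email', or 'unknown'."""
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    # Heuristic (assumption): anything with a From or Subject header is email.
    msg = message_from_bytes(data, policy=policy.default)
    if msg["From"] or msg["Subject"]:
        return "email"
    return "unknown"


def split_email(data: bytes) -> Tuple[str, List[Tuple[str, bytes]]]:
    """Return the plain-text body and a list of (filename, payload) attachments."""
    msg = message_from_bytes(data, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    attachments = [
        (part.get_filename() or "attachment", part.get_payload(decode=True))
        for part in msg.iter_attachments()
    ]
    return text, attachments
```

Each attachment payload can be passed back through `detect_kind`, so a PDF forwarded inside an email ends up on the same path as a PDF uploaded directly.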

02

GPT-4o Structured Output Agent

The pre-processed content is sent to an Azure OpenAI agent using Structured Output mode, which forces the model to return JSON that conforms to a fixed schema with specific fields: vendor, invoice number, date, line items, subtotal, tax, and total. This rules out malformed or missing fields and makes downstream parsing deterministic.
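To make the schema idea concrete, here is a hedged sketch of what such a schema and its parsing step might look like. The field names follow the list above, but the exact keys (`invoice_number`, `line_items`, the line-item properties) and the helpers `parse_invoice` and `totals_consistent` are assumptions for illustration. `RESPONSE_FORMAT` shows the general shape of the `response_format` argument used for strict structured outputs; the API call itself is omitted.

```python
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Hypothetical line-item shape; strict mode expects every property to be
# listed in "required" and additionalProperties to be False.
LINE_ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "description": {"type": "string"},
        "quantity": {"type": "number"},
        "unit_price": {"type": "number"},
        "amount": {"type": "number"},
    },
    "required": ["description", "quantity", "unit_price", "amount"],
    "additionalProperties": False,
}

INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "date": {"type": "string", "description": "ISO 8601, YYYY-MM-DD"},
        "line_items": {"type": "array", "items": LINE_ITEM_SCHEMA},
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
    },
    "required": ["vendor", "invoice_number", "date", "line_items",
                 "subtotal", "tax", "total"],
    "additionalProperties": False,
}

# General shape of the response_format argument for a chat completion request.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "invoice", "strict": True, "schema": INVOICE_SCHEMA},
}


@dataclass
class Invoice:
    vendor: str
    invoice_number: str
    date: str
    line_items: List[dict]
    subtotal: Decimal
    tax: Decimal
    total: Decimal


def parse_invoice(raw_json: str) -> Invoice:
    """Parse the model's JSON reply, keeping money values as Decimal."""
    data = json.loads(raw_json, parse_float=Decimal, parse_int=Decimal)
    missing = [key for key in INVOICE_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Invoice(**{key: data[key] for key in INVOICE_SCHEMA["required"]})


def totals_consistent(invoice: Invoice, tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that subtotal + tax matches total within a small tolerance."""
    return abs(invoice.subtotal + invoice.tax - invoice.total) <= tolerance
```

The schema constrains the shape of the reply, not the values in it, so a cheap arithmetic check such as `totals_consistent` is one way to flag an invoice whose extracted numbers do not add up.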

03

Regex Fallback Layer

For fields the AI misses or is uncertain about, a regex fallback layer runs pattern matching for common invoice formats — dates, currency amounts, tax ID patterns, and PO numbers. This hybrid AI + rule-based approach achieves near-perfect extraction accuracy across diverse invoice layouts.
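The fallback idea can be sketched as follows. The patterns here are illustrative assumptions rather than the project's actual rules: they cover ISO and US-style (month-first) dates, a "Total" line with an optional currency symbol, and a PO number that contains at least one digit. The function name `regex_fallback` and the `po_number` key are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})\b")
# Anchored at line start so "Subtotal" is not mistaken for "Total".
TOTAL_RE = re.compile(
    r"(?im)^\s*(?:grand\s+)?total(?:\s+due)?\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})\b"
)
# Requires a digit in the captured value so "PO Box" is not matched.
PO_RE = re.compile(
    r"(?i)\bP\.?O\.?\s*(?:number|no\.?|#)?\s*[:\-]?\s*([A-Z0-9\-]*\d[A-Z0-9\-]*)"
)


def normalize_date(raw: str) -> Optional[str]:
    """Convert a matched date to ISO format; assumes month-first for slashes."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def regex_fallback(fields: dict, text: str) -> dict:
    """Fill only the fields the model left empty; never overwrite its values."""
    out = dict(fields)
    if not out.get("date"):
        match = DATE_RE.search(text)
        if match:
            out["date"] = normalize_date(match.group(1))
    if not out.get("total"):
        match = TOTAL_RE.search(text)
        if match:
            out["total"] = match.group(1).replace(",", "")
    if not out.get("po_number"):
        match = PO_RE.search(text)
        if match:
            out["po_number"] = match.group(1)
    return out
```

Running the rules only on empty fields keeps the model's answer authoritative whenever it provides one, which is one reasonable way to combine the two layers.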

04

Auth, Storage & Deployment

Secure cookie sessions handle auth with no external OAuth dependencies. Extracted results are stored in Supabase PostgreSQL with user-scoped access. The app is deployed on Render with environment parity to Azure App Service, keeping both options open for enterprise deployment.
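The text does not say how the session cookie is protected, so the following is only one plausible approach: an HMAC-signed token with an expiry, built from the standard library. The helper names `sign_session` and `verify_session` and the `exp` field are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def sign_session(payload: dict, secret: bytes) -> str:
    """Serialize the payload and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(
        json.dumps(payload, separators=(",", ":")).encode()
    ).decode()
    signature = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{signature}"


def verify_session(token: str, secret: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and not expired, else None."""
    try:
        body, signature = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is not None and False else (now if now is not None else time.time())
    if payload.get("exp", 0) < current:
        return None
    return payload
```

In a FastAPI app, a token like this would typically be sent with `response.set_cookie(..., httponly=True, secure=True, samesite="lax")` so that scripts cannot read it and it only travels over HTTPS.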

Tech Stack
Jinja2
FastAPI
Azure OpenAI (GPT-4o)
Regex + MIME
Supabase + PostgreSQL
Render

Developed by Anthony Marquez Camacho

Posted Apr 30, 2026

Automate invoice processing in seconds. Upload emails or invoice text, and the system extracts key fields and stores them in a secure database.