Built an AI‑driven platform that ingests scanned building permits and outputs clean, structured data for real estate workflows. It combines high‑accuracy OCR with LLM reasoning to extract entities, validate fields, and deliver API‑ready results through FastAPI microservices, accelerating underwriting, compliance, and asset management.
Key Features
Document Ingestion & OCR: Supports PDFs/images; de‑skews, denoises, and runs OCR (multi‑engine fallback) for robust text capture.
LLM‑Powered Extraction: Identifies permit numbers, property addresses, APNs, contractor info, scope of work, fees, statuses, dates, and jurisdictions.
Validation & Cross‑Checks: Normalizes addresses, verifies against jurisdiction patterns, checks date/ID formats, and flags inconsistencies with confidence scores.
Schema‑First Outputs: Emits structured JSON conforming to a typed schema for easy downstream use.
Review UI (Optional): Human‑in‑the‑loop interface for quick field verification, diffing, and one‑click corrections.
Batch Processing: Queue‑based bulk uploads with progress tracking, retries, and partial success handling.
Compliance & Audit Trails: Stores source docs, versions, extraction prompts, model outputs, and decisions for traceability.
Integrations: Webhooks and REST endpoints to push results into CRMs, underwriting tools, data warehouses, or RPA flows.
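To make the schema‑first output and format checks described above concrete, here is a minimal sketch using only the Python standard library. The field names, the `PermitRecord` type, and the permit‑number pattern are illustrative assumptions, not the platform's actual schema; real jurisdictions each use their own formats.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional

# Hypothetical pattern for illustration only; each jurisdiction has its own format.
PERMIT_NUMBER_RE = re.compile(r"^[A-Z]{1,4}-?\d{4}-\d{3,6}$")


@dataclass
class PermitRecord:
    """Assumed shape of one extracted permit (not the real schema)."""
    permit_number: str
    property_address: str
    issue_date: str  # ISO 8601 date string, e.g. "2024-03-15"
    status: str
    apn: Optional[str] = None


def validate(record: PermitRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not PERMIT_NUMBER_RE.match(record.permit_number):
        problems.append("permit_number does not match the expected pattern")
    if not record.property_address.strip():
        problems.append("property_address is required")
    try:
        date.fromisoformat(record.issue_date)
    except ValueError:
        problems.append("issue_date is not a valid ISO date")
    return problems


def to_json(record: PermitRecord) -> str:
    """Serialize the record as JSON with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record would typically be validated before export, so that any problems can be flagged alongside the extracted values instead of silently passing downstream.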
Tech Stack
AI Layer: OCR (e.g., Tesseract/Google Vision/Azure OCR) + LLM for entity extraction and reasoning
Services: FastAPI microservices (ingestion, extraction, validation, export) behind an API gateway
Data: PostgreSQL for metadata and results; object storage for documents; Redis/queues for jobs
Ops: Docker/Kubernetes, observability (logs/metrics/traces), circuit breakers, and retries
Security: JWT auth, signed URLs, role‑based access, encryption at rest/in transit
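As an illustration of the signed‑URL idea in the security layer, the sketch below signs a document path with an expiry time using HMAC‑SHA256. It is one common way to build such links, not necessarily how this platform or any particular object store does it; the `SECRET` value, the query parameter names, and both function names are assumptions.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: in practice the key would come from a secret store, not source code.
SECRET = b"replace-with-a-real-secret"


def _signature(path: str, expires: int, secret: bytes) -> str:
    """HMAC-SHA256 over the path and expiry timestamp, hex encoded."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path, ttl_seconds, secret=SECRET, now=None) -> str:
    """Append an expiry and signature to the path as query parameters."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires, secret)})
    return f"{path}?{query}"


def verify_url(url, secret=SECRET, now=None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        supplied = params["sig"][0]
    except (KeyError, ValueError, IndexError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _signature(parsed.path, expires, secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied, expected)
```

The `now` parameter exists only to make the expiry logic easy to test; callers would normally omit it.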
Workflow
Upload/Watch: User uploads permits or drops into a watched bucket.
Preprocess & OCR: Clean images, run OCR with fallbacks; merge multi‑page outputs.
LLM Extraction: Parse entities and relationships; generate structured JSON with confidence scores.
Human Review (if needed): Resolve low‑confidence fields; approve to finalize.
Publish & Integrate: Push to REST/Webhooks; store in DB; notify downstream systems.
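The hand‑off between extraction and human review in the steps above can be sketched as a simple confidence gate. The threshold value, the `ExtractedField` type, and the status strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-off; a real deployment would tune this per field or jurisdiction.
REVIEW_THRESHOLD = 0.85


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


def fields_needing_review(fields, threshold=REVIEW_THRESHOLD):
    """Names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


def route(fields, threshold=REVIEW_THRESHOLD):
    """Decide whether a document goes to a reviewer or straight to publishing."""
    flagged = fields_needing_review(fields, threshold)
    status = "needs_review" if flagged else "ready_to_publish"
    return {"status": status, "review_fields": flagged}
```

Returning the flagged field names, not just a yes/no decision, lets a review screen show only the values that need attention.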
Challenges & Solutions
Low‑quality Scans: Applied de‑skewing, noise reduction, and multi‑engine OCR voting to raise accuracy.
Jurisdiction Variability: Encoded locale‑specific patterns and rule sets; used LLM reasoning with tool hints.
Data Consistency: Implemented schema validation, required‑field checks, and idempotent upserts.
Throughput & Reliability: Used queues, autoscaling workers, and backoff to handle spikes and rate limits.
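One simple form of the multi‑engine OCR voting mentioned above is a per‑field majority vote: each engine proposes a value, and the value most engines agree on wins. The sketch below assumes that approach; the engine names, the whitespace and case normalization, and the `vote` function itself are illustrative assumptions.

```python
from collections import Counter


def vote(candidates: dict[str, str]) -> tuple[str, float]:
    """Pick the value most engines agree on and return it with the agreement ratio.

    `candidates` maps an engine name to the text it read for one field.
    Blank readings are ignored. Ties go to the value that appears first.
    """
    normalized = {
        engine: " ".join(text.split()).upper()  # collapse whitespace, ignore case
        for engine, text in candidates.items()
        if text.strip()
    }
    if not normalized:
        return "", 0.0
    counts = Counter(normalized.values())
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(normalized)
```

For example, if two of three engines read `BP-2024-00123` and one misreads a zero as the letter O, the majority value wins with an agreement ratio of two thirds, which could then feed into the field's confidence score.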
Results
Faster turnaround from document intake to usable data.
Higher accuracy vs. manual keying, with auditable decisions.
Seamless integration into underwriting and compliance pipelines.
Goal
Deliver a secure, scalable AI platform that converts messy permit documents into validated, structured data—reducing manual effort and powering real estate decision‑making end to end.
Posted Jan 12, 2026
OCR + LLM platform that extracts, validates, and structures data from scanned permits via FastAPI microservices for seamless real estate integrations.