AI Platform for Real Estate Permit Data Extraction by Joven Garcia

AI Platform for Real Estate Permit Data Extraction

Joven Garcia

Business Analyst

AI Developer

AI Engineer

Docker

FastAPI

PostgreSQL

Real Estate

Overview

Built an AI‑driven platform that ingests scanned building permits and outputs clean, structured data for real estate workflows. It combines high‑accuracy OCR with LLM reasoning to extract entities, validate fields, and deliver API‑ready results through FastAPI microservices, speeding up underwriting, compliance, and asset management.

Key Features

Document Ingestion & OCR: Supports PDFs/images; de‑skews, denoises, and runs OCR (multi‑engine fallback) for robust text capture.

LLM‑Powered Extraction: Identifies permit numbers, property addresses, APNs, contractor info, scope of work, fees, statuses, dates, and jurisdictions.

Validation & Cross‑Checks: Normalizes addresses, verifies against jurisdiction patterns, checks date/ID formats, and flags inconsistencies with confidence scores.

Schema‑First Outputs: Emits structured JSON conforming to a typed schema for easy downstream use.

Review UI (Optional): Human‑in‑the‑loop interface for quick field verification, diffing, and one‑click corrections.

Batch Processing: Queue‑based bulk uploads with progress tracking, retries, and partial success handling.

Compliance & Audit Trails: Stores source docs, versions, extraction prompts, model outputs, and decisions for traceability.

Integrations: Webhooks and REST endpoints to push results into CRMs, underwriting tools, data warehouses, or RPA flows.
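A minimal sketch of how the schema‑first output and format checks described above might fit together. The field names, the permit‑number pattern, and the flag wording are illustrative assumptions, not the platform's actual schema; real permit formats vary by jurisdiction.

```python
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass, field
from datetime import date

# Hypothetical pattern for illustration only; each jurisdiction has its own format.
PERMIT_NUMBER_PATTERN = re.compile(r"^[A-Z]{1,4}-?\d{4}-\d{3,6}$")


@dataclass
class ExtractedField:
    value: str | None
    confidence: float  # 0.0 to 1.0, as reported by the extraction step


@dataclass
class PermitRecord:
    permit_number: ExtractedField
    property_address: ExtractedField
    issue_date: ExtractedField  # expected as an ISO date (YYYY-MM-DD)
    flags: list[str] = field(default_factory=list)


def validate(record: PermitRecord) -> PermitRecord:
    """Append one flag per failed check instead of raising, so partial results survive."""
    number = record.permit_number.value
    if not number or not PERMIT_NUMBER_PATTERN.match(number):
        record.flags.append("permit_number: unexpected format")
    if not record.property_address.value:
        record.flags.append("property_address: missing")
    try:
        date.fromisoformat(record.issue_date.value or "")
    except ValueError:
        record.flags.append("issue_date: not a valid ISO date")
    return record


def to_json(record: PermitRecord) -> str:
    """Serialize the typed record to JSON for downstream consumers."""
    return json.dumps(asdict(record))
```

Flagging instead of raising keeps a partially valid record available for human review, which fits the confidence‑score and review features listed above.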

Tech Stack

AI Layer: OCR (e.g., Tesseract/Google Vision/Azure OCR) + LLM for entity extraction and reasoning

Services: FastAPI microservices (ingestion, extraction, validation, export) behind an API gateway

Data: PostgreSQL for metadata and results; object storage for documents; Redis/queues for jobs

Ops: Docker/Kubernetes, observability (logs/metrics/traces), circuit breakers, and retries

Security: JWT auth, signed URLs, role‑based access, encryption at rest/in transit
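As one illustration of the signed‑URL item in the security layer, the sketch below signs a document path with an expiry using HMAC‑SHA256 from the Python standard library. The secret, the query parameter names, and the path layout are assumptions made for the example; object stores typically offer their own presigned‑URL mechanisms, which a real deployment would more likely use.

```python
from __future__ import annotations

import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder; load from a secret store in practice


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int = 300, now: float | None = None) -> str:
    """Return the path with an expiry timestamp and signature appended as query parameters."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: float | None = None) -> bool:
    """Accept the URL only if it has not expired and the signature matches the path."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["signature"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(signature, _signature(parsed.path, expires))
```

Because the signature covers both the path and the expiry, changing either one invalidates the link.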

Workflow

Upload/Watch: User uploads permits or drops into a watched bucket.

Preprocess & OCR: Clean images, run OCR with fallbacks; merge multi‑page outputs.

LLM Extraction: Parse entities and relationships; generate structured JSON with confidence scores.

Validation Layer: Normalize addresses/APNs, verify jurisdiction rules, detect missing/ambiguous fields.

Human Review (if needed): Resolve low‑confidence fields; approve to finalize.

Publish & Integrate: Push to REST/Webhooks; store in DB; notify downstream systems.
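The hand‑off between validation, human review, and publishing in the workflow above could be sketched as a simple confidence gate. The 0.85 threshold, the `FieldResult` type, and the route names are hypothetical and chosen only for illustration; in practice the cutoff would likely be tuned per field.

```python
from __future__ import annotations

from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed cutoff, not a value from the project


@dataclass(frozen=True)
class FieldResult:
    name: str
    value: str | None
    confidence: float  # 0.0 to 1.0


def fields_needing_review(
    fields: list[FieldResult], threshold: float = REVIEW_THRESHOLD
) -> list[str]:
    """Return names of fields that are missing or fall below the confidence threshold."""
    return [f.name for f in fields if f.value is None or f.confidence < threshold]


def route(fields: list[FieldResult], threshold: float = REVIEW_THRESHOLD) -> str:
    """Send the document to human review if any field needs attention, else publish it."""
    return "human_review" if fields_needing_review(fields, threshold) else "publish"
```

For example, a record with a confident permit number but a missing APN would be routed to `human_review`, with only the APN listed for the reviewer.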

Challenges & Solutions

Low‑quality Scans: Applied de‑skewing, noise reduction, and multi‑engine OCR voting to raise accuracy.

Jurisdiction Variability: Encoded locale‑specific patterns and rule sets; used LLM reasoning with tool hints.

Data Consistency: Implemented schema validation, required‑field checks, and idempotent upserts.

Throughput & Reliability: Used queues, autoscaling workers, and backoff to handle spikes and rate limits.
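The backoff mentioned under throughput and reliability can be illustrated with a small retry helper using exponential backoff with full jitter. This is a sketch under assumptions: `TransientError`, the default delays, and the attempt limit are hypothetical, and the project may well rely on a queue or client library's built‑in retry policy instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as rate limits or timeouts."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on TransientError, doubling the delay ceiling after each failure."""
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up and surface the last error to the caller
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: a random delay in [0, ceiling] spreads out retries from many workers.
            sleep(random.uniform(0, ceiling))
            attempt += 1
```

The `sleep` parameter is injectable so the helper can be exercised in tests without actually waiting.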

Results

Faster turnaround from document intake to usable data.

Higher accuracy vs. manual keying, with auditable decisions.

Seamless integration into underwriting and compliance pipelines.

Goal

Deliver a secure, scalable AI platform that converts messy permit documents into validated, structured data—reducing manual effort and powering real estate decision‑making end to end.

Posted Jan 12, 2026

OCR + LLM platform that extracts, validates, and structures data from scanned permits via FastAPI microservices for seamless real estate integrations.

Timeline

Jun 11, 2025 - Aug 15, 2025