Custom PDF Processing with AI

David Yerrington

Data Scientist

Product Manager

UI Designer

Python

PyTorch

React

Finance

PDF Processing and AI Automation For Real Estate Finance

A client came to me with a prototype built with Excel, Python, and a set of system scripts that processed PDFs procedurally. During the discovery phase, I evaluated the client's end goals, helped define new technical requirements, and developed the product scope, including the user roles that informed the design goals.
One of the key features the client wanted was a responsive frontend where users could quickly review the model's results and auto-scroll to specific extracted metadata, so they could check the outcome of any AI prediction, make corrections, and feed them back into the system.
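To illustrate the auto-scroll idea described above, here is a minimal sketch of how a field's position on a page could be turned into a scroll offset for a vertically stacked document viewer. This is not the project's actual code: the function name `scroll_offset`, its parameters, and the assumption that a field's position arrives as a fraction of page height are all hypothetical.

```python
def scroll_offset(
    page_heights: list[float],
    page_index: int,
    top_fraction: float,
    margin: float = 24.0,
) -> float:
    """Return the vertical pixel offset that brings an extracted field into view.

    page_heights: rendered height of each page in pixels, stacked top to bottom.
    page_index: zero-based index of the page that contains the field.
    top_fraction: top edge of the field, as a fraction (0 to 1) of its page height.
    margin: pixels of context to leave visible above the field (hypothetical default).
    """
    if not 0 <= page_index < len(page_heights):
        raise IndexError("page_index out of range")
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be between 0 and 1")

    # Height of all pages above the target page, plus the offset within that page.
    offset = sum(page_heights[:page_index]) + top_fraction * page_heights[page_index]

    # Never scroll to a negative position.
    return max(0.0, offset - margin)


if __name__ == "__main__":
    # A field halfway down the second page of a two-page document.
    print(scroll_offset([1000.0, 1000.0], page_index=1, top_fraction=0.5))
```

The frontend would then pass the returned value to whatever scrolling mechanism the viewer uses.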
Another key requirement, spanning both frontend and backend, was the ability to process documents submitted from the frontend in parallel. During our research and development phase, we found that the client's solution took roughly 1.5 minutes per document and reached about 70% coverage, which became our baseline.
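As a rough illustration of the parallel handling described above, the sketch below fans a batch of documents out to a small worker pool using Python's standard library. It is an assumption-laden example rather than the production design: `process_document` is a hypothetical stand-in for the real extraction step, and the worker count is arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_document(path: str) -> dict:
    """Hypothetical stand-in for extracting data from one document."""
    time.sleep(0.01)  # simulates I/O-bound work, such as a call to an extraction service
    return {"path": path, "status": "done"}


def process_batch(paths: list[str], max_workers: int = 4) -> list[dict]:
    """Process documents concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `paths` even when tasks finish out of order.
        return list(pool.map(process_document, paths))


if __name__ == "__main__":
    results = process_batch(["a.pdf", "b.pdf", "c.pdf"])
    print([r["path"] for r in results])
```

Threads suit this sketch because the simulated work is I/O-bound; CPU-heavy model inference would more likely call for separate processes or a queue of workers.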
I compared a self-hosted solution, my own CNN deep learning model with a Kafka queue system, against the following services:
Azure Cognitive Services
Google Cloud Vision
Amazon Bedrock
What I found was that my solution could reach around 15 seconds per document at roughly 73% accuracy on average, but it would require a more complex setup at scale. Building a small prototype with each major cloud provider showed that they could process our documents in 1 to 2.5 seconds each on average, with nearly 90% accuracy and at a much lower operating cost. Ultimately we went with Azure because it provided specific positional metadata and had slightly higher accuracy overall.
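To put the timings quoted above side by side, the short sketch below converts each per-document time into documents per hour and a speedup over the client's prototype. The only inputs are the figures already stated in this write-up; the calculation assumes a single worker processing documents one after another, which is a simplification.

```python
# Per-document processing times quoted in the text, in seconds.
SECONDS_PER_DOC = {
    "client prototype": 90.0,          # roughly 1.5 minutes
    "self-hosted CNN": 15.0,
    "cloud service (slow end)": 2.5,
    "cloud service (fast end)": 1.0,
}

BASELINE_SECONDS = SECONDS_PER_DOC["client prototype"]


def docs_per_hour(seconds_per_doc: float) -> float:
    """Documents processed per hour by one sequential worker."""
    return 3600.0 / seconds_per_doc


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


for name, secs in SECONDS_PER_DOC.items():
    print(
        f"{name}: {docs_per_hour(secs):.0f} docs/hour, "
        f"{speedup(BASELINE_SECONDS, secs):.0f}x vs prototype"
    )
```

Under that assumption, the prototype handles 40 documents per hour, the self-hosted model 240 (6x), and the cloud services between 1,440 and 3,600 (36x to 90x).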
Another important aspect was integrating role-based user management. I developed the frontend and backend to handle discretionary access schemas and third-party logins with Google, and I provided comprehensive design mockups based on my product roadmap, use case definitions, and technical specifications (and the outcome was responsive!).
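The access model described above can be sketched in plain Python as a role check combined with an owner-granted sharing check. Everything in this example is hypothetical, including the role names, the permission strings, and the `can` helper; the real system was built on a Django-based backend, whose own permission framework is not shown here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Hypothetical roles, chosen only for illustration.
    REVIEWER = "reviewer"
    MANAGER = "manager"
    ADMIN = "admin"


# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    Role.REVIEWER: {"view_document", "correct_prediction"},
    Role.MANAGER: {"view_document", "correct_prediction", "share_document"},
    Role.ADMIN: {"view_document", "correct_prediction", "share_document", "manage_users"},
}


@dataclass
class Document:
    owner: str
    shared_with: set[str] = field(default_factory=set)


def can(user: str, role: Role, action: str, doc: Document) -> bool:
    """Allow an action only if the role permits it and the user has access to the document."""
    # Role-based part: the role must include the requested action.
    if action not in ROLE_PERMISSIONS[role]:
        return False
    # Admins bypass per-document sharing in this sketch.
    if role is Role.ADMIN:
        return True
    # Discretionary part: access is held by the owner or granted by sharing.
    return user == doc.owner or user in doc.shared_with


if __name__ == "__main__":
    doc = Document(owner="alice", shared_with={"bob"})
    print(can("bob", Role.REVIEWER, "view_document", doc))
    print(can("carol", Role.REVIEWER, "view_document", doc))
```

Keeping the role check separate from the per-document check makes it possible to change who may share a document without touching the role definitions.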

Scope of Deliverables

Review of Prototype
Development of product requirements
Exploratory Analysis of PDF documents
Development of hybrid ML/AI solution
Review of 3rd party PDF processing services
Design Requirements
Lead Implementation of React frontend and Django-based backend
Docker-based CI/CD

Posted Mar 18, 2025

I delivered a custom AI solution for processing PDFs that reduced the review and oversight needed to complete bid reviews by 75%.

Timeline

Jan 1, 2024 - Oct 15, 2024
