OCR System

Hemen Ashodia

Data Scientist

ML Engineer

Software Engineer

Django

Kubernetes

Python

Built a document processing system from the ground up that makes PDF documents searchable with OCR and performs redaction in real time.

Implemented an OCR system capable of processing millions of pages; it has already processed several billion words.

Trained several state-of-the-art text-processing models, including GPT-2 and BERT, to perform high-accuracy tag recognition. Invented a patentable real-time NER algorithm that processes and learns without catastrophic interference.

Customized the source code of Tesseract, the largest OCR tool on the market, to allow table recognition. Built a Kubernetes cluster for on-demand auto-scaling and processing.

Led the team across multiple projects and coordinated work across different time zones.

Posted Feb 6, 2024

Clients

Braided Data