Built a document-processing system from the ground up that makes PDF
documents searchable via OCR and performs redaction in real time.
Implemented an OCR system capable of processing millions of pages;
it has already processed several billion words.
Trained several state-of-the-art text-processing models (GPT-2 and BERT) for high-accuracy tag recognition. Invented a patentable real-time NER algorithm that processes and learns without catastrophic interference.
Customized the source code of Tesseract, the largest OCR tool on the market, to support table recognition. Built a Kubernetes cluster for on-demand processing with auto-scaling.
Led the team across multiple projects and coordinated several projects across different time zones.