PDF Parser by David Halloran

PDF Parser

David Halloran

One client regularly received thousands of PDFs in varying layouts, and their staff were reading each one and typing the contact information in by hand. I built a parser that:
1) took the key-value pairs returned by the AWS Textract API
2) used regex patterns to match the major variations of each key and return the correct value
3) wrote the output to a CSV that was appended to a master list
This cut their manual workload by over 90%.
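
Here is a minimal sketch of steps 2 and 3 in Python. It is not the project's actual code. It assumes the Textract key-value pairs have already been flattened into a plain dict of raw key text to value text, and the field names, regex patterns, and helper names (`KEY_PATTERNS`, `normalize`, `append_row`) are hypothetical, chosen only for illustration.

```python
import csv
import re
from pathlib import Path

# Hypothetical patterns: canonical field -> regex covering common key variants.
KEY_PATTERNS = {
    "name": re.compile(r"^\s*(full\s+|contact\s+)?name\s*:?\s*$", re.I),
    "email": re.compile(r"^\s*e[-\s]?mail(\s+address)?\s*:?\s*$", re.I),
    "phone": re.compile(
        r"^\s*(phone|tel(ephone)?|mobile)(\s*(no\.?|number|#))?\s*:?\s*$", re.I
    ),
}
FIELDS = list(KEY_PATTERNS)


def normalize(pairs):
    """Map raw extracted keys onto canonical fields; unmatched keys are dropped."""
    row = {field: "" for field in FIELDS}
    for raw_key, value in pairs.items():
        for field, pattern in KEY_PATTERNS.items():
            # Keep the first value found for each field.
            if pattern.match(raw_key) and not row[field]:
                row[field] = value.strip()
                break
    return row


def append_row(csv_path, row):
    """Append one row to the master CSV, writing the header if the file is new."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

For example, `normalize({"Full Name:": "Jane Doe", "E-mail Address": "jane@example.com"})` fills the `name` and `email` fields and leaves `phone` empty, and `append_row("master.csv", row)` adds that row to the running list. Keeping one pattern per canonical field means a new key variation only requires widening a single regex.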

Posted Aug 7, 2024

Got a lot of manual data entry coming from PDFs or paper invoices? There's an automation for that.