PDF Parser

David Halloran

One client regularly received thousands of differently formatted PDFs, which staff were reading manually to enter the contact information by hand. I built a parser that:
1) took the output of the AWS Textract API, which returns the extracted information as key-value pairs
2) used regular expressions to match the major variations in the key names, so that each field returned the correct value
3) wrote the output to a CSV file to be appended to a master list
This cut their workload by over 90%.
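
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration and not the client's actual code: it assumes the Textract key-value pairs have already been flattened into a plain Python dict, and the field names, regex patterns, and the helper names `normalize` and `append_row` are all hypothetical.

```python
import csv
import re
from pathlib import Path

# Hypothetical patterns: one regex per output field, written to cover
# common variations of the key text found on different PDF layouts.
FIELD_PATTERNS = {
    "name": re.compile(r"^\s*(full\s+|contact\s+)?name\s*:?\s*$", re.I),
    "email": re.compile(r"^\s*e[-\s]?mail(\s+address)?\s*:?\s*$", re.I),
    "phone": re.compile(
        r"^\s*(phone|tel(ephone)?|mobile|cell)(\s+(no\.?|number|#))?\s*:?\s*$",
        re.I,
    ),
}


def normalize(pairs):
    """Map raw key-value pairs onto the fixed set of output fields."""
    row = {field: "" for field in FIELD_PATTERNS}
    for key, value in pairs.items():
        for field, pattern in FIELD_PATTERNS.items():
            # Keep the first value found for each field.
            if pattern.match(key) and not row[field]:
                row[field] = value.strip()
                break
    return row


def append_row(row, csv_path):
    """Append one row to the master CSV, writing the header if the file is new."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(FIELD_PATTERNS))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

For example, `normalize({"Full Name:": " Jane Doe ", "Tel No.": "555-0100"})` fills the `name` and `phone` fields and leaves `email` empty, and `append_row(row, "master.csv")` adds that row to the master list. Keys that match no pattern are ignored, so in practice unmatched keys would be worth logging to find variations the patterns miss.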

Posted Aug 7, 2024

Got a lot of manual data entry coming from PDFs or paper invoices? There's an automation for that.