PDF data extraction

Francisco Perez

Business Analyst
Data Analyst
Data Engineer
Microsoft Excel
Python
SQL
Evalueserve

Problem

The client needed JP Morgan data to analyse in excel, but this data came into 30 tables inside a PDF, making it impossible to analyse each time the report was out.

Task

Extract these 30 tables from the PDF report on a daily basis.
Store the data in a database and export it into a report.
Create a design the report according to the client specifications and the data from the database (dashboards, variables analysis, finantial analisis).

Solution

Use a pdf reader library in python to extract the data from the file.
Use pandas library in python to format and clean this data.
Update this data into a SQL database.
Export the updated data into an Excel file, use VBA (Excel Macros) to transfer this data into the report.
Use excel functions and graphs to show the data dynamically.
Add and easy to use code interface + Excel interface + documentation on how to update the database daily.
Partner With Francisco
View Services

More Projects by Francisco