PDF data extraction

Francisco Perez

0

Business Analyst

Data Analyst

Data Engineer

Microsoft Excel

Python

SQL

Problem

The client needed JP Morgan data to analyse in excel, but this data came into 30 tables inside a PDF, making it impossible to analyse each time the report was out.

Task

Extract these 30 tables from the PDF report on a daily basis.
Store the data in a database and export it into a report.
Create a design the report according to the client specifications and the data from the database (dashboards, variables analysis, finantial analisis).

Solution

Use a pdf reader library in python to extract the data from the file.
Use pandas library in python to format and clean this data.
Update this data into a SQL database.
Export the updated data into an Excel file, use VBA (Excel Macros) to transfer this data into the report.
Use excel functions and graphs to show the data dynamically.
Add and easy to use code interface + Excel interface + documentation on how to update the database daily.
Like this project
0

Posted Jul 8, 2024

Extract data from PDFs using python, the store it into a database and finally present it in a dynamic report with dropdowns and graphs.

Likes

0

Views

2

Clients

Evalueserve

Tags

Business Analyst

Data Analyst

Data Engineer

Microsoft Excel

Python

SQL

Francisco Perez

Quality Data Solutions for Your Business Needs

Recommender system
Recommender system
Quarterly report automation
Quarterly report automation
Airplane data automation
Airplane data automation