PDF Scraper Using OCR

Safeer Abbas


Data Scraper

Automation Engineer

Software Engineer

This project is a Python-based PDF scraper that uses Optical Character Recognition (OCR) to extract information from PDF documents. It processes each PDF file, extracts the relevant fields, and saves the results to an Excel spreadsheet. It is designed to handle property-related documents in a variety of formats, which makes it useful for real estate professionals, researchers, and data analysts.
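The overall flow described above (OCR, then extraction, then Excel output) might be sketched as follows. This is a minimal illustration, not the project's actual code: the function names, the column subset, and the use of pandas for the Excel step are assumptions. The step that reads text out of the searchable PDF is left out because the source does not name the library used for it.

```python
from pathlib import Path

# Hypothetical subset of the output columns; the real project extracts more fields.
COLUMNS = ["CFN", "Parcel ID", "Property Address", "Mailing Address"]


def make_searchable(src: Path, dst: Path) -> None:
    """Run OCR on a scanned PDF and write a searchable copy to dst."""
    import ocrmypdf  # third-party; imported lazily so the helpers below work without it

    # skip_text=True leaves pages that already contain a text layer untouched.
    ocrmypdf.ocr(src, dst, skip_text=True)


def to_row(record: dict) -> list:
    """Order one extracted record by COLUMNS, filling missing fields with ''."""
    return [record.get(col, "") for col in COLUMNS]


def save_rows(records: list, xlsx_path: Path) -> None:
    """Write extracted records to an Excel file (assumes pandas and openpyxl)."""
    import pandas as pd  # assumption: the source does not say how the Excel file is written

    frame = pd.DataFrame([to_row(r) for r in records], columns=COLUMNS)
    frame.to_excel(xlsx_path, index=False)
```

Keeping the row-building step separate from the OCR and Excel calls means it can be checked without either third-party dependency installed.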

Features

OCR Processing: Uses ocrmypdf to convert scanned PDF documents into searchable PDFs.
Data Extraction: Extracts key information such as:
CFN (Case File Number)
Parcel ID
Property Address
Mailing Address
Company Name
Owner's Name
Violation Details
Penalty Costs
Dates of Violations and Compliance
Duplicate Removal: Cleans up mailing addresses by removing duplicate entries.
Excel Output: Saves the extracted data into an Excel file for easy access and analysis.
Logging: Provides detailed logging of the processing steps and any errors encountered.
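The data extraction and duplicate removal features could look roughly like the sketch below. The regular expressions, the label formats they expect, and the comma-based splitting of addresses are all assumptions made for illustration. The project's real patterns depend on the layout of the documents it processes, and the sample values in the usage note are made up.

```python
import re

# Hypothetical label patterns; real documents may format these fields differently.
PATTERNS = {
    "CFN": re.compile(r"CFN[:#\s]*([0-9]+)"),
    "Parcel ID": re.compile(r"Parcel ID[:\s]*([0-9][0-9\-]*)"),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each known field, or None if it is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def dedupe_address(address: str) -> str:
    """Drop repeated comma-separated parts of an address, ignoring case and extra spaces."""
    seen = set()
    parts = []
    for part in address.split(","):
        cleaned = " ".join(part.split())  # collapse runs of whitespace
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            parts.append(cleaned)
    return ", ".join(parts)
```

For example, `extract_fields("CFN: 20240012345\nParcel ID: 12-3456-789")` returns both values as strings, and `dedupe_address("123 Main St, Miami, FL, 123 main st, Miami")` keeps only the first occurrence of each part.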

Posted Dec 9, 2024



