Cyril-7/Image-to-Text

Cyril Johnson

Text Extraction Tool

This Python script extracts text from PDF, DOCX, and image files (JPG, JPEG, PNG). It uses pdfminer for PDFs, docx for DOCX files, and easyocr for images, then saves the extracted text to a text file.
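The extraction flow described above can be sketched roughly as follows. This is a minimal illustration, not the contents of text_extraction.py: the function names (`extract_pdf`, `extract_docx`, `extract_image`, `pick_extractor`, `extract_any`) are hypothetical, and it assumes that `pdfminer` refers to the pdfminer.six distribution and `docx` to python-docx.

```python
from pathlib import Path

# Image extensions the README lists as supported.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def extract_pdf(path):
    # Assumes pdfminer.six, which provides this one-call helper.
    from pdfminer.high_level import extract_text
    return extract_text(str(path))


def extract_docx(path):
    # python-docx is imported as `docx`; join the text of each paragraph.
    from docx import Document
    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def extract_image(path, languages=("en",)):
    # Creating a Reader loads the OCR models, which can be slow the first time.
    import easyocr
    reader = easyocr.Reader(list(languages))
    # detail=0 returns only the recognized strings, without boxes or scores.
    return "\n".join(reader.readtext(str(path), detail=0))


def pick_extractor(path):
    """Choose an extractor from the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return extract_pdf
    if suffix == ".docx":
        return extract_docx
    if suffix in IMAGE_EXTENSIONS:
        return extract_image
    raise ValueError(f"Unsupported file type: {suffix!r}")


def extract_any(path):
    """Extract text from a PDF, DOCX, or image file."""
    return pick_extractor(path)(path)
```

The third-party imports sit inside each function so that a missing library only matters for the file type that needs it.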

Usage

Installation:

Running the Script:

Output:
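The details of the output step are not shown here. As a minimal sketch of saving extracted text to a text file, the helpers below assume (hypothetically) that the output file takes the input file's name with a `.txt` extension; the names `output_path_for` and `save_text` are illustrative and may not match the actual script.

```python
from pathlib import Path


def output_path_for(input_path, output_dir=None):
    """Return <output_dir>/<input name>.txt, defaulting to the input's folder."""
    input_path = Path(input_path)
    directory = Path(output_dir) if output_dir else input_path.parent
    return directory / (input_path.stem + ".txt")


def save_text(text, input_path, output_dir=None):
    """Write the extracted text as UTF-8 and return the path that was written."""
    target = output_path_for(input_path, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Writing as UTF-8 explicitly avoids platform-dependent encodings when the extracted text contains non-ASCII characters.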

Libraries Used

pdfminer: For extracting text from PDF files.

docx: For extracting text from DOCX files.

easyocr: For extracting text from image files.
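Before running the script, a short check like the one below can report which of these libraries are missing. It is a sketch under one assumption: the PyPI package names in `REQUIRED` (pdfminer.six, python-docx, easyocr) are the usual distributions for these import names, but the original install command is not shown here, so they may differ from what the author used.

```python
from importlib.util import find_spec

# Import name -> assumed PyPI package name.
REQUIRED = {
    "pdfminer": "pdfminer.six",
    "docx": "python-docx",
    "easyocr": "easyocr",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import name cannot be found."""
    return [pkg for module, pkg in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing libraries. Try: pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```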

File Structure

text_extraction.py: The main Python script.

README.md: This file, providing information about the script.

Other files: Any PDF, DOCX, or image files you want to extract text from.

Notes

Make sure to install the necessary libraries before running the script.

Extraction may be imperfect for documents with complex layouts or for low-quality images.

For more complex requirements or specific use cases, consider extending or modifying the script accordingly.
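As one example of such an extension, low-quality images can be handled more conservatively by discarding OCR results with a low confidence score. The sketch below relies on easyocr's default `readtext` output, a list of `(bounding_box, text, confidence)` tuples. The function names and the 0.5 threshold are illustrative assumptions, not part of the original script.

```python
def keep_confident(results, min_confidence=0.5):
    """Keep only the text of OCR results at or above the confidence threshold."""
    return [text for _bbox, text, confidence in results if confidence >= min_confidence]


def ocr_confident_text(image_path, languages=("en",), min_confidence=0.5):
    """Run easyocr on an image and join the confident lines."""
    import easyocr  # imported here so keep_confident works without easyocr installed
    reader = easyocr.Reader(list(languages))
    results = reader.readtext(str(image_path))  # default detail=1 includes scores
    return "\n".join(keep_confident(results, min_confidence))
```

A higher threshold drops more misread fragments at the cost of losing some genuine text, so the value is worth tuning per image set.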

Posted Jul 21, 2024
