Qlytics Document Intelligence

Shiming He

Qlytics Document Intelligence leverages Natural Language Processing (NLP) and Machine Learning (ML) to extract information from PDF files. The system first translates PDF files into text, then applies a custom-made text sequence classifier to tag sections, paragraphs, and tables. It then saves the representation to support information retrieval. For retrieval, I designed a scripting language named "Document Query Language" (DQL) that has a similar syntax to Python. With just several lines of code, you get to loop through paragraphs and tables, locate key/value pairs and save the output to a CSV. The entire app is designed like an IDE for documents. You develop DQL scripts in it. Then you run a batch job for similar PDFs against the same script. Extract results can also be piped through some API.
Like this project
0

Posted Sep 29, 2023

Qlytics Document Intelligence leverages Natural Language Processing (NLP) and Machine Learning (ML) to extract information from PDF files. The system first tra

Likes

0

Views

4

Private trading software for US and China financial markets
Private trading software for US and China financial markets
OS-Climate Data Extraction
OS-Climate Data Extraction