Qlytics Document Intelligence

Shiming He

Software Engineer

Qlytics Document Intelligence leverages Natural Language Processing (NLP) and Machine Learning (ML) to extract information from PDF files. The system first translates PDF files into text, then applies a custom-made text sequence classifier to tag sections, paragraphs, and tables. It then saves the representation to support information retrieval. For retrieval, I designed a scripting language named "Document Query Language" (DQL) that has a similar syntax to Python. With just several lines of code, you get to loop through paragraphs and tables, locate key/value pairs and save the output to a CSV. The entire app is designed like an IDE for documents. You develop DQL scripts in it. Then you run a batch job for similar PDFs against the same script. Extract results can also be piped through some API.

Like this project

Posted Sep 29, 2023

Qlytics Document Intelligence leverages Natural Language Processing (NLP) and Machine Learning (ML) to extract information from PDF files. The system first tra

Likes

Views

Tags

Software Engineer