Qlytics Document Intelligence

Shiming He

Software Engineer
Qlytics Document Intelligence leverages Natural Language Processing (NLP) and Machine Learning (ML) to extract information from PDF files. The system first translates PDF files into text, then applies a custom-made text sequence classifier to tag sections, paragraphs, and tables. It then saves the representation to support information retrieval. For retrieval, I designed a scripting language named "Document Query Language" (DQL) that has a similar syntax to Python. With just several lines of code, you get to loop through paragraphs and tables, locate key/value pairs and save the output to a CSV. The entire app is designed like an IDE for documents. You develop DQL scripts in it. Then you run a batch job for similar PDFs against the same script. Extract results can also be piped through some API.
Partner With Shiming
View Services

More Projects by Shiming