Method and system for analyzing and classifying relationships between tokens in a text corpus. I designed a distributed system to preprocess a large text corpus, as a co-inventor on this patent-pending application:
https://patents.google.com/patent/US20230143418A1/en?oq=US-20230143418-A1
The system was built with Ray, MongoDB, and GlusterFS. Its pipeline ran in multiple steps, combining several NLP methods and machine learning models. I worked on this project as a software engineer at nference.