Hybrid traineeship-contract work at NLP + generative AI startup
Prodhi Manisha
Data Scientist
Backend Engineer
ML Engineer
Google Cloud Platform
Python
PyTorch
Data cleaning, extraction, preparation and analysis
- Built modules for extraction and cleaning of text from PDF research articles
- Constructed datasets for downstream NLP model training and text generation
- Performed lexical and stylometric analysis using packages such as spaCy and NLTK
- Web scraping using BeautifulSoup and Selenium to gather journal and article information
- Performed linear-algebraic analysis of academic text features
Natural Language Processing and Generation
- Fine-tuned Transformer and GPT-2 models (Hugging Face, PyTorch and TensorFlow) for abstract and outline generation
- Fine-tuned controlled text generation models for text style transfer using custom stylometric control codes
- Built Streamlit apps to facilitate user interfacing for scientific text generation
- Built topic modelling and research article clustering algorithms using LDA
- Examined word embeddings in PyTorch and studied bias detection techniques