The requirement was to find a way to sift through thousands of user comments from forums, Discord channels, and social media channels in order to bring the ones that required a response to the moderators' attention.
There was also a need to flag comments that were offensive or involved hate speech or racism.
Process
Built pipelines to extract the data from all sources, clean it, and push it into a data lake.
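The extract-and-load step can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline that was actually built: the per-source field names in FIELD_MAP are hypothetical, the cleaning here only trims whitespace and drops empty comments, and a local directory stands in for the data lake, with records written as JSON Lines under Hive-style source=/date= partitions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical field names per source; real forum, Discord and social media
# exports will differ and need their own mappings.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "forum": {"id": "post_id", "author": "user", "text": "body", "created_at": "created"},
    "discord": {"id": "message_id", "author": "author", "text": "content", "created_at": "timestamp"},
    "social": {"id": "id", "author": "handle", "text": "text", "created_at": "created_at"},
}


def normalize(source: str, raw: dict) -> dict:
    """Map one raw record onto a shared comment schema."""
    fields = FIELD_MAP[source]  # raises KeyError for an unknown source
    record = {key: raw.get(src_key) for key, src_key in fields.items()}
    record["id"] = str(record["id"])
    record["text"] = (record["text"] or "").strip()
    record["source"] = source
    return record


def run_pipeline(source: str, raw_records: Iterable[dict], lake_root: Path) -> Path:
    """Normalize, drop empty comments and append to a date-partitioned JSON Lines file."""
    records: List[dict] = [normalize(source, r) for r in raw_records]
    records = [r for r in records if r["text"]]
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = lake_root / f"source={source}" / f"date={day}"
    partition.mkdir(parents=True, exist_ok=True)
    out_path = partition / "comments.jsonl"
    with out_path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return out_path
```

Keeping every source in one schema means the later cleaning and modelling steps never need to know where a comment came from; in an object store the same source=/date= layout would simply become the key prefix.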
Used Pandas for exploratory data analysis to work out how the data needed to be cleaned.
Wrote data transformation scripts to clean the data: removing repeats and links, expanding short forms, and handling everything else required along the way.
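The exploration and cleaning steps could look roughly like the following sketch. It uses only the standard library so it runs on its own; the SHORT_FORMS dictionary, the regular expressions, and the reading of "repeats" as both duplicate comments and runs of repeated characters are assumptions for illustration, not the project's actual rules.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical short-form dictionary; in practice it would be grown from the
# abbreviations found during exploratory analysis.
SHORT_FORMS: Dict[str, str] = {
    "u": "you",
    "r": "are",
    "pls": "please",
    "thx": "thanks",
    "idk": "i do not know",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
REPEATED_CHARS_RE = re.compile(r"([a-z!?])\1{2,}")  # 3+ of the same letter, "!" or "?"
WORD_RE = re.compile(r"\b\w+\b")
WHITESPACE_RE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Lowercase, strip links, squeeze character runs and expand short forms."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = REPEATED_CHARS_RE.sub(r"\1\1", text)  # "sooooo" -> "soo", "!!!" -> "!!"
    text = WORD_RE.sub(lambda m: SHORT_FORMS.get(m.group(0), m.group(0)), text)
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_corpus(comments: Iterable[str]) -> List[str]:
    """Clean every comment, dropping empty results and exact repeats."""
    seen = set()
    cleaned: List[str] = []
    for comment in comments:
        text = clean_comment(comment)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

In a Pandas workflow the same function could be applied column-wise, for example df["text"].map(clean_comment) followed by drop_duplicates(subset="text"), where df and its "text" column are hypothetical.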
Trained four NLP models: GloVe embeddings and BERT embeddings, each paired with both a GRU and an LSTM.
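For the GloVe-based models, the pretrained vectors have to be turned into an embedding matrix aligned with the comment vocabulary before a GRU or LSTM can use them. A minimal sketch, assuming whitespace-tokenized cleaned text and GloVe's plain-text format (one word followed by its vector components per line); the helper names, the <pad> and <oov> tokens, and the zero initialization for words without a vector are illustrative choices, not details from the project. The BERT variants would instead take contextual embeddings from a BERT encoder and are not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, TextIO, Tuple

PAD, OOV = "<pad>", "<oov>"


def build_vocab(texts: Iterable[str], min_count: int = 1) -> Dict[str, int]:
    """Index words by frequency; 0 is reserved for padding and 1 for unknown words."""
    counts = Counter(word for text in texts for word in text.split())
    vocab = {PAD: 0, OOV: 1}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab


def load_glove(fh: TextIO, vocab: Dict[str, int]) -> Dict[str, List[float]]:
    """Read GloVe's text format ("word v1 v2 ... vd" per line), keeping vocabulary words only."""
    vectors: Dict[str, List[float]] = {}
    for line in fh:
        word, *values = line.rstrip().split(" ")
        if word in vocab:
            vectors[word] = [float(v) for v in values]
    return vectors


def embedding_matrix(
    vocab: Dict[str, int], vectors: Dict[str, List[float]], dim: int
) -> Tuple[List[List[float]], int]:
    """Row i holds the vector for the word with index i; words without a GloVe vector stay zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    hits = 0
    for word, idx in vocab.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
            hits += 1
    return matrix, hits


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map a cleaned comment to word indices, truncated or padded to max_len."""
    ids = [vocab.get(word, vocab[OOV]) for word in text.split()][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

The resulting matrix could then initialize the embedding layer of a GRU or LSTM, for example through torch.nn.Embedding.from_pretrained after converting it to a tensor, while encode produces the fixed-length index sequences such a layer expects.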
Eventually combined the two bidirectional GRU variants (GloVe and BERT embeddings) into an ensemble, producing two separate models, one for each use case.
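The source does not say how the GloVe and BERT models were combined. One common option is soft voting: average each model's predicted probability per comment, optionally weighted, and flag comments at or above a threshold. The sketch below assumes each bidirectional GRU outputs one probability per comment for its use case; the function names, the default weight and the threshold are placeholders that would normally be tuned on held-out data.

```python
from typing import List, Sequence


def ensemble_probs(
    glove_probs: Sequence[float], bert_probs: Sequence[float], glove_weight: float = 0.5
) -> List[float]:
    """Weighted average (soft voting) of two models' probabilities for the same comments."""
    if len(glove_probs) != len(bert_probs):
        raise ValueError("both models must score the same comments")
    if not 0.0 <= glove_weight <= 1.0:
        raise ValueError("glove_weight must be between 0 and 1")
    bert_weight = 1.0 - glove_weight
    return [glove_weight * g + bert_weight * b for g, b in zip(glove_probs, bert_probs)]


def flag(probs: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Mark comments whose ensembled probability reaches the threshold."""
    return [p >= threshold for p in probs]
```

With two use cases, this would run once for the "needs a response" models and once for the offensive-content models, each with its own threshold, since a missed hateful comment and a missed question may carry different costs for moderators.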