Content Moderation using NLP & LLMs for a SaaS startup

Muhammad Jarir Kanji

Data Scientist
Data Engineer
AI Developer
ChatGPT
Python
PyTorch
Pinwheel

Key Outcomes

Please see the Project Description and Implementation Details sections below for more information on the project.
✔️ Used NLP and LLMs as content moderation models to automatically flag concerning conversations.
✔️ Achieved over 80% accuracy and recall.
✔️ Deployed the model as a REST API.
✔️ Built a custom data annotation platform for building in-house datasets and managed a team of 20+ annotators.
✔️ Managed a team including one NLP Engineer and one external NLP consultant.
✔️ Managed a team of 42 NLP interns for four months and directed them across a variety of functions, including building NLP models, setting up MLOps tooling, and reviewing cutting-edge research on model quantization and knowledge distillation.

Project description

Pinwheel is a startup focusing on improving children's relationship with technology. Its primary offering is a smartphone with age-appropriate guardrails and parental controls.
One of the Pinwheel phone's most important features is call and text history monitoring. However, parents are often overwhelmed by the volume of messages and cannot thoroughly review their children's communications, leaving children exposed to potentially dangerous interactions. Manually reviewing communications can also lead to friction between parents and children over privacy.
This project aimed to use natural language processing (NLP) to build content moderation models that could understand the child's texting history and automatically alert the parent to concerning encounters, saving parents time and preserving the child's privacy.
I managed one junior NLP engineer and one NLP consultant during this project. For a four-month period, I also managed a team of 42 NLP interns and directed them across a variety of functions, including building NLP models, setting up MLOps tooling, and reviewing cutting-edge research.

Implementation details

We had a large dataset with millions of messages for this project. However, all of the data was unlabeled, so we had to build labeled datasets in-house.
To that end, I built a detailed taxonomy identifying the relationships between the different events of interest. For example, is self-harm a form of violence? If so, what are examples of violence that are not self-harm? Working through edge cases and nuances like these gives a more accurate representation of the problem. When building custom datasets, it also pays off in the design of the annotation solution: a single annotation session can answer multiple questions or identify multiple events, which significantly reduces annotation costs.
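As a rough illustration of how a taxonomy lets one annotation answer several questions, the sketch below encodes a small label hierarchy and expands a single label into every category it implies. The label names, the parent/child structure, and the `expand_labels` helper are hypothetical; they only mirror the self-harm/violence question above and are not the taxonomy used in the project.

```python
# Hypothetical taxonomy: each label maps to its parent (None for a root).
# These names are illustrative assumptions, not the project's real labels.
TAXONOMY = {
    "violence": None,
    "self_harm": "violence",
    "threat_to_others": "violence",
    "bullying": None,
}


def expand_labels(labels):
    """Return the given labels plus all of their ancestors in the taxonomy."""
    expanded = set()
    for label in labels:
        if label not in TAXONOMY:
            raise ValueError(f"unknown label: {label!r}")
        # Walk up the hierarchy until we reach a root category.
        current = label
        while current is not None:
            expanded.add(current)
            current = TAXONOMY[current]
    return expanded
```

Under this assumed hierarchy, an annotator who marks a message as `self_harm` has implicitly labeled it for a `violence` classifier as well, so one pass over the data can feed more than one model.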
Having designed the taxonomy, I built a bespoke data annotation system using open-source tooling, customized to our problem. I also managed the entire annotation process, including procuring contractors to serve as annotators and supervising them.
I developed a basic web app that gave annotators easy access to the platform and let them label data efficiently. Built-in checks immediately asked annotators to correct erroneous submissions, so that invalid labels never entered our dataset.
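The kind of submission check described above might look something like the following minimal sketch. The label set, the specific rules, and the `validate_submission` function are assumptions made for illustration; the actual checks in the annotation platform are not described in this write-up.

```python
# Hypothetical label set; the real platform's labels are not listed here.
ALLOWED_LABELS = {"safe", "bullying", "self_harm", "violence"}


def validate_submission(labels):
    """Return a list of error messages; an empty list means the submission is valid."""
    errors = []
    if not labels:
        errors.append("Select at least one label.")
    # Reject anything outside the known label set.
    unknown = sorted(set(labels) - ALLOWED_LABELS)
    if unknown:
        errors.append(f"Unknown labels: {', '.join(unknown)}")
    # Assumed rule: 'safe' contradicts every other label.
    if "safe" in labels and len(set(labels)) > 1:
        errors.append("'safe' cannot be combined with other labels.")
    return errors
```

A web app could run a check like this on every submission and return the messages to the annotator instead of saving the record whenever the list is non-empty.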
I then fine-tuned open-source LLMs (e.g., Falcon, WizardLM, and BERT) and also used advanced prompting techniques with third-party LLMs. These models served as content moderators with over 80% accuracy and recall.
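For reference, accuracy and recall for a binary "flag / don't flag" moderator can be computed as in the sketch below. It assumes labels are encoded as 1 (concerning) and 0 (not concerning); the function name and the toy data are illustrative and do not come from the project's evaluation code.

```python
def accuracy_and_recall(y_true, y_pred):
    """Compute (accuracy, recall) for binary labels where 1 means 'concerning'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Accuracy: share of all messages classified correctly.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)

    # Recall: share of truly concerning messages that the model flagged.
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_positives = sum(1 for t in y_true if t == 1)
    recall = true_positives / actual_positives if actual_positives else 0.0

    return accuracy, recall


# Toy example with made-up labels, not project data.
acc, rec = accuracy_and_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Recall is worth reporting alongside accuracy in this setting because concerning messages are typically rare, and a model that never flags anything could still score a high accuracy.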
Finally, I deployed these models to production as a REST API, powering a demo app that let internal stakeholders experiment with the model.
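A minimal sketch of what such an endpoint could look like, using only the Python standard library, is shown below. The `/moderate` path, the request and response shapes, and the keyword-based `classify` stand-in are all assumptions for illustration; the production service's framework, routes, and model loading are not described here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def classify(text):
    """Placeholder for the real model: flags text containing a toy keyword."""
    return {"flagged": "hurt" in text.lower()}


def handle_moderation(raw_body):
    """Parse a JSON request body and return (status_code, response_dict)."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be valid JSON"}
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        return 400, {"error": "missing string field 'text'"}
    return 200, classify(text)


class ModerationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/moderate":  # hypothetical route
            status, body = 404, {"error": "not found"}
        else:
            length = int(self.headers.get("Content-Length", 0))
            status, body = handle_moderation(self.rfile.read(length))
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port=8000):
    """Serve the moderation endpoint on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ModerationHandler).serve_forever()
```

Calling `run_server()` starts the service locally, and a demo app could then POST `{"text": "..."}` to `/moderate`. Keeping the request logic in `handle_moderation` separates it from the HTTP plumbing, which makes it easy to test without starting a server.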