BERTweet (model name: vinai/bertweet-base), a RoBERTa-based model pretrained on a huge corpus of English tweets, was used as the base model of choice for training. Because it has already learned how tweets are constructed, it generalizes easily to our tweets, and as a base-sized model it offers excellent accuracy while remaining compact, so it trains faster and its inference latency is lower.

Four metrics were used (accuracy, recall, precision, and F1), but the most important metric here is recall, as the key is to identify as many potentially bad-language tweets as possible to protect children. The focus is on minimizing false negatives (the situation where a tweet contains bad language but the model predicts it to contain good language). It is acceptable to flag some good-language tweets as bad, but it is not acceptable to let bad language get through, so reducing false negatives (i.e., achieving higher recall) matters most here. However, precision is also important, to ensure that the recommendations provided by the application are accurate and trustworthy. Below are the metrics obtained:
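As context for the numbers, the four metrics can all be derived from the confusion-matrix counts of the positive ("bad language") class. The counts below are purely hypothetical, for illustration only; they are not the evaluation results of this model.

```python
# Hypothetical confusion-matrix counts for the bad-language classifier
# (positive class = "bad language"); real values come from the eval set.
tp, fp, fn, tn = 90, 15, 5, 190

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # overall fraction correct
precision = tp / (tp + fp)                    # of tweets flagged bad, how many were bad
recall    = tp / (tp + fn)                    # of truly bad tweets, how many were caught
f1        = 2 * precision * recall / (precision + recall)
```

Note how the five false negatives (fn) directly depress recall, while the fifteen false positives (fp) only depress precision; tolerating a few extra false positives in exchange for fewer false negatives is exactly the trade-off described above.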