Multilabel Urdu Comments Classification by Hassan Tahir

Multilabel Urdu Comments Classification

Hassan Tahir

Data Scientist

Data Scraper

Data Analyst

Matplotlib

Python

TensorFlow

Collected Urdu comments on various topics from Facebook, Twitter and YouTube, then processed this raw data and annotated each comment with five labels: Toxic, Rude, Offensive, Hate Speech and Abusive. A comment can belong to more than one class at the same time.
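Because a comment can carry several labels at once, its annotation can be stored as a five-element 0/1 vector with one position per label (sometimes called multi-hot encoding) instead of a single class index. The Python sketch below assumes the label order shown. `LABELS`, `encode_labels` and `decode_labels` are hypothetical names, not code from the project.

```python
# Assumption: this column order is illustrative; the project's order is not stated.
LABELS = ["Toxic", "Rude", "Offensive", "Hate Speech", "Abusive"]


def encode_labels(comment_labels: list[str], labels: list[str] = LABELS) -> list[int]:
    """Turn the set of labels annotated on one comment into a 0/1 vector."""
    unknown = set(comment_labels) - set(labels)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    present = set(comment_labels)
    return [1 if label in present else 0 for label in labels]


def decode_labels(vector: list[int], labels: list[str] = LABELS) -> list[str]:
    """Inverse of encode_labels: recover label names from a 0/1 vector."""
    return [label for label, bit in zip(labels, vector) if bit]


# Example: one comment annotated as both Rude and Abusive.
print(encode_labels(["Rude", "Abusive"]))  # → [0, 1, 0, 0, 1]
```

A comment with no offensive content simply encodes to all zeros, so "clean" does not need its own class.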

One-hot encoded the labels (one binary column per label, so several columns can be 1 for the same comment) and then performed data preprocessing: tokenization, stemming, lemmatization, stop-word removal and part-of-speech (POS) tagging.
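A minimal sketch of the first preprocessing steps for Urdu text, assuming plain Python `re` for tokenization and a small hand-picked stop-word list (both assumptions, not the project's actual tools). Stemming, lemmatization and POS tagging need Urdu-specific resources, so they are left out here. `preprocess`, `STOP_WORDS`, `DIACRITICS` and `TOKEN` are hypothetical names.

```python
import re

# Arabic-script diacritics (harakat U+064B-U+0652, plus superscript alef U+0670).
# Social-media comments use them inconsistently, so they are stripped first.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Tokens are runs of Unicode word characters; punctuation such as the
# Urdu full stop "۔" and comma "،" is not matched and acts as a separator.
TOKEN = re.compile(r"\w+")

# Assumption: a tiny illustrative stop-word list; a real pipeline would use
# a curated Urdu stop-word list.
STOP_WORDS = {"کا", "کی", "کے", "ہے", "ہیں", "میں", "اور", "سے", "کو", "نے", "یہ", "وہ"}


def preprocess(comment: str) -> list[str]:
    """Normalize, tokenize and drop stop words from one Urdu comment."""
    text = DIACRITICS.sub("", comment)
    tokens = TOKEN.findall(text)
    return [tok for tok in tokens if tok not in STOP_WORDS]


print(preprocess("کتاب اور قلم۔"))  # → ['کتاب', 'قلم']
```

Stripping diacritics before tokenizing keeps a word written with and without a short-vowel mark from turning into two different tokens.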

Finally, compared several machine learning (ML) and deep learning (DL) models to see which performed best. An LSTM model gave the best results and was fine-tuned to better fit the data, reaching 90% accuracy on the test data and 99% on the training data.
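In a multilabel setup, a model such as an LSTM typically ends in one sigmoid output per label, and each probability is thresholded independently. "Accuracy" can then mean either the share of comments whose labels are all predicted correctly (subset accuracy) or the share of individual label decisions that are correct; the write-up does not specify which one the reported figures use. The sketch below computes both, assuming a 0.5 cutoff. `threshold`, `subset_accuracy` and `label_accuracy` are hypothetical names, and the data is made up for illustration.

```python
# Assumption: each model output is an independent sigmoid probability per label,
# thresholded at 0.5; the project's actual cutoff is not stated.
def threshold(probs: list[list[float]], cutoff: float = 0.5) -> list[list[int]]:
    """Convert per-label probabilities into 0/1 predictions."""
    return [[1 if p >= cutoff else 0 for p in row] for row in probs]


def subset_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Share of comments whose whole label vector is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def label_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Share of individual (comment, label) decisions that are correct, i.e. 1 - Hamming loss."""
    decisions = [a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(decisions) / len(decisions)


# Hypothetical example: two comments, five labels
# (Toxic, Rude, Offensive, Hate Speech, Abusive).
y_true = [[0, 1, 0, 0, 1], [1, 0, 0, 0, 0]]
probs = [[0.1, 0.8, 0.3, 0.2, 0.6], [0.7, 0.6, 0.1, 0.0, 0.2]]
y_pred = threshold(probs)
print(subset_accuracy(y_true, y_pred))  # → 0.5
print(label_accuracy(y_true, y_pred))   # → 0.9
```

In this toy case one wrong "Rude" decision leaves label-level accuracy at 90% while subset accuracy drops to 50%, which is why the definition matters when reporting multilabel results.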

Posted Feb 21, 2024

Trained an LSTM model for multilabel Urdu comment classification.