Human vs AI Text Classifier by Anastasiya Kotelnikova

Completed work

Data Scientist

ML Engineer

Python

PyTorch

SQL

Artificial Intelligence

Human vs AI Text Classifier

This project builds a binary text classification system to distinguish between human-written and AI-generated text using a custom-labeled dataset. By combining TF-IDF vectorization with multiple machine learning models, it captures subtle linguistic patterns and style differences across writing sources.

Key Highlights

Custom Dataset: 5,000 samples (2,500 human + 2,500 AI-generated), curated and balanced by the author.

Models Trained:

Logistic Regression (Top performer: 100% accuracy)

Random Forest

Multinomial Naive Bayes

Calibrated Linear SVC
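The four models above could be trained on TF-IDF features roughly as follows. This is a minimal sketch that assumes scikit-learn (the source does not name the library, though the model names and joblib suggest it). The toy texts, labels, and hyperparameters are illustrative stand-ins, not the author's dataset or settings.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical): 0 = human-written, 1 = AI-generated.
texts = [
    "my dog was soaked because the weather turned on us",
    "i was late again and my coffee went cold",
    "my sister was visiting so we walked despite the weather",
    "we missed the bus and my phone was dead",
    "artificial intelligence is transforming the industry at scale",
    "the industry leverages ai to optimize intelligence workflows",
    "ai systems deliver intelligence solutions across every industry",
    "intelligence platforms enable the industry to adopt ai rapidly",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hyperparameters here are assumptions chosen for the toy example.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Multinomial Naive Bayes": MultinomialNB(),
    # LinearSVC has no predict_proba; wrapping it in CalibratedClassifierCV adds one.
    "Calibrated Linear SVC": CalibratedClassifierCV(LinearSVC(), cv=2),
}

fitted = {}
for name, clf in models.items():
    # Each pipeline vectorizes raw text with TF-IDF, then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe
    # Training accuracy only; a real comparison needs a held-out test split.
    print(name, pipe.score(texts, labels))
```

Keeping the vectorizer and classifier in one pipeline means the same TF-IDF vocabulary is applied at training and prediction time.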

Text Preprocessing:

Lowercasing

Punctuation removal

Token cleaning
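The three preprocessing steps above can be sketched with the standard library alone. The source does not define "token cleaning", so this sketch assumes it means stripping leftover non-alphanumeric characters and dropping empty tokens. The function name `preprocess` is hypothetical.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str) -> str:
    """Lowercase, remove punctuation, and clean tokens (assumed meaning)."""
    text = text.lower()                   # 1. lowercasing
    text = text.translate(_PUNCT_TABLE)   # 2. punctuation removal
    # 3. token cleaning (assumption): strip anything that is not a-z or 0-9,
    #    such as curly quotes, then drop tokens that end up empty.
    #    Note this also removes non-ASCII letters.
    tokens = (re.sub(r"[^a-z0-9]", "", tok) for tok in text.split())
    return " ".join(tok for tok in tokens if tok)


print(preprocess("Hello, World!  It's 2025..."))  # hello world its 2025
```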

Evaluation Metrics:

Confusion Matrices

ROC Curves & AUC Scores

Precision, Recall, F1-Score
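To make these metrics concrete, here is a from-scratch sketch of how they are computed for a binary classifier. It is illustrative only and not the author's code; in practice the equivalent functions in scikit-learn's `sklearn.metrics` module would normally be used. The labels and scores below are made-up values.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Requires at least one example
    of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = AI-generated, 0 = human-written.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # predicted probability of class 1

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
auc = roc_auc(y_true, scores)
print(tp, fp, fn, tn)  # 2 1 1 2
```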

Visual Interpretations:

Word Clouds

Feature Importance Bar Charts

Model Accuracy Comparison

Deployment Ready:

Final model serialized with joblib

Predicts new input instantly
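Serializing and reloading a fitted model with joblib might look like the sketch below. It assumes the saved object is a scikit-learn pipeline that bundles the TF-IDF vectorizer with the classifier; the source does not say what the `.joblib` file contains. The toy data and the file name `text_classifier_demo.joblib` are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in data (hypothetical): 0 = human-written, 1 = AI-generated.
texts = [
    "my dog was soaked because the weather turned on us",
    "i was late again and my coffee went cold",
    "artificial intelligence is transforming the industry at scale",
    "the industry leverages ai to optimize intelligence workflows",
]
labels = [0, 0, 1, 1]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pipe.fit(texts, labels)

# Save the whole pipeline so the vectorizer travels with the classifier.
path = os.path.join(tempfile.mkdtemp(), "text_classifier_demo.joblib")
joblib.dump(pipe, path)

# Later, or in another process: load and predict on raw text directly.
loaded = joblib.load(path)
new_text = ["my train was delayed by the weather"]
pred = loaded.predict(new_text)[0]          # 0 or 1
proba = loaded.predict_proba(new_text)[0]   # [P(human), P(AI)]
print(pred, proba)
```

A joblib file should be loaded only from a trusted source and with a compatible scikit-learn version, since it is pickle-based.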

Results Summary

Model                 Accuracy   AUC Score
Logistic Regression   1.000      1.00
Linear SVC            0.998      -
Naive Bayes           0.997      1.00
Random Forest         0.992      1.00

TF-IDF feature weights made the model's decisions easy to interpret.

Words like “industry”, “AI”, “intelligence” were predictive of AI text.

Words like “my”, “was”, “weather” were strong indicators of human-written text.

ROC curves confirmed excellent model separability.

Note: Perfect accuracy is expected here because the dataset is highly separable and structured (it is intended for educational use).

Project Structure

HUMAN_VS_AI_CUSTOM/
│
├── data/
│   └── your_dataset_5000.csv
│
├── model/
│   ├── model_info.txt
│   └── text_classifier_5000.joblib
│
├── notebooks/
│   ├── Human vs AI Custom Dataset.ipynb
│   └── human_vs_ai_text_classifier.ipynb
│
├── requirements.txt
└── README.md

Author

Anastasiya Kotelnikova

MS Data Science Candidate | NJIT

Email: anastasiyakotelnikova21@gmail.com

GitHub Profile • Portfolio Website • LinkedIn

Posted Jun 24, 2025

Built a classifier to distinguish human vs AI text with 100% accuracy.

Timeline

Apr 24, 2025 - Apr 30, 2025