This project builds a binary text classification system that distinguishes human-written from AI-generated text, trained on a custom-labeled dataset. It combines TF-IDF vectorization with several machine learning models to capture differences in word choice and writing style between the two sources.
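The general TF-IDF-plus-classifier approach can be sketched without external dependencies. This is a minimal illustration under stated assumptions, not the project's implementation. The names `tokenize`, `SimpleTfidf`, and `NearestCentroid` are hypothetical. The TF-IDF weighting uses the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is also what scikit-learn's `TfidfVectorizer` applies by default. A nearest-centroid classifier stands in for the project's models, which are not named here, and the training sentences are invented toy examples.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed simple tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def l2_normalize(vec: dict[str, float]) -> dict[str, float]:
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    # Sparse dot product; equals cosine similarity when both inputs are L2-normalized.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class SimpleTfidf:
    """Hypothetical TF-IDF: raw term counts times smoothed IDF, then L2-normalized."""

    def fit(self, docs: list[str]) -> "SimpleTfidf":
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency: each term counted once per doc
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, doc: str) -> dict[str, float]:
        # Terms unseen during fit are dropped, as with a fixed vocabulary.
        counts = Counter(t for t in tokenize(doc) if t in self.idf)
        return l2_normalize({t: c * self.idf[t] for t, c in counts.items()})


class NearestCentroid:
    """Stand-in model: picks the label whose normalized centroid is most similar."""

    def fit(self, vectors: list[dict[str, float]], labels: list[str]) -> "NearestCentroid":
        sums: dict[str, dict[str, float]] = {}
        for vec, label in zip(vectors, labels):
            acc = sums.setdefault(label, {})
            for t, v in vec.items():
                acc[t] = acc.get(t, 0.0) + v
        # Normalizing the summed vector gives the same direction as the mean vector.
        self.centroids = {label: l2_normalize(acc) for label, acc in sums.items()}
        return self

    def predict(self, vec: dict[str, float]) -> str:
        return max(self.centroids, key=lambda label: dot(vec, self.centroids[label]))


# Hypothetical toy examples; not drawn from the project's dataset.
train_texts = [
    "honestly i just grabbed coffee and ran late again lol",
    "ugh my cat knocked the plant over again this morning",
    "in conclusion, it is important to note that technology offers numerous benefits",
    "furthermore, it is essential to consider the various benefits of technology",
]
train_labels = ["human", "human", "ai", "ai"]

vectorizer = SimpleTfidf().fit(train_texts)
model = NearestCentroid().fit([vectorizer.transform(t) for t in train_texts], train_labels)

print(model.predict(vectorizer.transform("it is important to consider the benefits")))  # → ai
print(model.predict(vectorizer.transform("ran late again and my cat knocked the coffee over")))  # → human
```

Nearest centroid is used here only because it fits in a few lines. Any classifier that accepts TF-IDF feature vectors, such as a linear model, can replace it without changing the vectorization step.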
Key Highlights
Custom Dataset: 5,000 samples (2,500 human-written and 2,500 AI-generated), curated and class-balanced by the author.