A fully offline document summarizer built in pure Python. Uses TF-IDF scoring, position weighting, and Jaccard deduplication to extract the most important sentences from any PDF, DOCX, or TXT file — each labeled with a relevance percentage.
The result looks like this:
[1] [100% relevance] The algorithm achieved 94% accuracy on benchmark tests.
[2] [81% relevance] Training was performed on 50,000 labeled samples.
[3] [67% relevance] Results were validated using 5-fold cross validation.
Supports PDF, Word, and TXT files. Saves summaries to your computer. Runs completely offline. No subscriptions, no API keys, no internet required.
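For readers curious how the three techniques named above fit together, here is a minimal sketch using only the standard library. It is not the project's actual code: the function names (`tokenize`, `jaccard`, `summarize`), the linear position boost, the 0.6 overlap threshold, and the smoothed IDF formula are all assumptions chosen for illustration. Relevance is expressed relative to the top-scoring sentence, which is one plausible way the best sentence could end up labeled 100%.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarize(sentences, top_n=3, position_boost=0.2, max_overlap=0.6):
    """Return up to top_n (relevance_percent, sentence) pairs, best first.

    position_boost and max_overlap are hypothetical tuning knobs.
    """
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: how many sentences contain each term.
    df = Counter(term for toks in tokens for term in set(toks))

    scores = []
    for i, toks in enumerate(tokens):
        tf = Counter(toks)
        # TF-IDF with a smoothed IDF so every term weight stays positive.
        tfidf = sum(
            (count / len(toks)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        )
        # Position weighting (assumed scheme): earlier sentences get a
        # linearly larger boost than later ones.
        weight = 1.0 + position_boost * (1 - i / n)
        scores.append(tfidf * weight)

    # Walk sentences from highest to lowest score, skipping near-duplicates
    # of anything already kept (Jaccard deduplication).
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in ranked:
        candidate = set(tokens[i])
        if all(jaccard(candidate, set(tokens[j])) < max_overlap for j in kept):
            kept.append(i)
        if len(kept) == top_n:
            break

    # Relevance is relative to the best kept sentence, which scores 100.
    best = scores[kept[0]] if kept else 0.0
    return [
        (round(100 * scores[i] / best) if best else 0, sentences[i])
        for i in kept
    ]


if __name__ == "__main__":
    demo = [
        "The algorithm achieved 94% accuracy on benchmark tests.",
        "Training was performed on 50,000 labeled samples.",
        "Results were validated using 5-fold cross validation.",
    ]
    for rank, (pct, sentence) in enumerate(summarize(demo), start=1):
        print(f"[{rank}] [{pct}% relevance] {sentence}")
```

The sketch takes a list of sentences that has already been extracted; reading PDF or DOCX files and splitting text into sentences are left out, since the libraries the project uses for that are not stated here.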
Posted Mar 9, 2026