Tokenization: Breaking text into smaller units (tokens), such as words, subwords, or sentences.
Stopword removal: Removing common words that do not carry significant meaning (e.g., "the", "a", "is").
Stemming/Lemmatization: Reducing words to a base form. Stemming heuristically strips affixes (e.g., "studies" becomes "studi"), while lemmatization maps each word to its dictionary form, or lemma (e.g., "studies" becomes "study").
Normalization: Converting text to a consistent form (e.g., lowercasing, stripping punctuation).
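
A minimal sketch of these four steps chained together, assuming NLTK is installed (spaCy or plain string methods would work just as well); the example sentence and the resource downloads are illustrative:

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads for the tokenizer model, stopword list, and WordNet.
# Newer NLTK releases may also require the "punkt_tab" resource.
nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The cats were running quickly through the gardens."

# Normalization: lowercase the text and strip punctuation.
normalized = text.lower().translate(str.maketrans("", "", string.punctuation))

# Tokenization: split the normalized text into word tokens.
tokens = word_tokenize(normalized)

# Stopword removal: drop common low-information words ("the", "were", ...).
stop_words = set(stopwords.words("english"))
content_tokens = [t for t in tokens if t not in stop_words]

# Stemming vs. lemmatization on the remaining tokens. Note that the
# WordNet lemmatizer treats words as nouns by default, so "running"
# survives unchanged unless a part-of-speech tag is supplied.
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content_tokens])
# ['cat', 'run', 'quickli', 'garden']
print([lemmatizer.lemmatize(t) for t in content_tokens])
# ['cat', 'running', 'quickly', 'garden']
```

The output pair also illustrates the trade-off between the two reduction strategies: the stemmer is fast but can produce non-words like "quickli", while the lemmatizer always returns valid dictionary forms at the cost of needing part-of-speech information to handle verbs correctly.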