Predicting Movie Adaptations with Machine Learning | Books vs Movies Sentiment Analysis
Context
The question behind this project was simple to ask but hard to answer with confidence: can you predict whether a book is likely to be adapted into a successful movie before it happens? Studios and publishers make these bets constantly, often on instinct. This project explored whether sentiment data and audience response could make that decision more data-driven.
Strategic Approach
Rather than relying on a single data source, the approach combined multiple datasets and sentiment analysis techniques to build a model that could actually generalize, not just fit one example well.
The approach included:
Pulling and cleaning data from 4-5 separate data files, plus scraping IMDB review data directly
Using TextBlob for sentiment analysis to quantify audience reaction to both books and their film adaptations
Testing two modeling approaches to answer two different questions: a regression that predicts a movie's rating, and a classifier that estimates whether a book will be successfully adapted
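As a rough illustration of the sentiment step, the sketch below averages TextBlob polarity over a set of reviews. The function names (`textblob_polarity`, `mean_polarity`) and the choice to average polarity per title are assumptions made for this example, not details taken from the project. The scorer is passed in as a parameter so the averaging logic can be tried without TextBlob installed.

```python
from statistics import mean
from typing import Callable, Iterable, Optional


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity score for one review, in [-1.0, 1.0]."""
    # Imported lazily so the rest of the module works without the package.
    # Requires: pip install textblob
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def mean_polarity(reviews: Iterable[str],
                  scorer: Optional[Callable[[str], float]] = None) -> float:
    """Average polarity over non-empty reviews; 0.0 when there are none."""
    scorer = scorer or textblob_polarity
    scores = [scorer(r) for r in reviews if r and r.strip()]
    return mean(scores) if scores else 0.0
```

Calling `mean_polarity(book_reviews)` and `mean_polarity(movie_reviews)` for the same title would give two numbers that can be compared directly, which is one plausible way to quantify the reaction to a book against its adaptation.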
Execution
Preprocessed and merged 4-5 data files alongside scraped IMDB review data
Applied TextBlob sentiment analysis to quantify book and movie audience reception
Built a linear regression model to estimate a movie's likely rating based on its source book's rating
Built a logistic regression model to estimate the likelihood of a book successfully turning into a movie
Validated findings through feature importance analysis and sentiment comparison visualizations across the top 10 movies with the greatest book-to-film differences
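The linear regression step and the top-10 comparison could look roughly like the sketch below. It fits a one-feature least-squares line from book rating to movie rating, then ranks titles by the absolute gap between book and movie sentiment. The function names, the tuple layout of `rows`, and the use of an absolute sentiment gap as the "difference" are all assumptions for illustration. The fit is written out by hand to keep the example dependency-free; the project itself may well have used a library.

```python
def fit_rating_model(book_ratings, movie_ratings):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(book_ratings)
    x_bar = sum(book_ratings) / n
    y_bar = sum(movie_ratings) / n
    sxx = sum((x - x_bar) ** 2 for x in book_ratings)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(book_ratings, movie_ratings))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict_movie_rating(model, book_rating):
    """Estimate a movie rating from its source book's rating."""
    slope, intercept = model
    return slope * book_rating + intercept


def top_sentiment_gaps(rows, n=10):
    """rows: (title, book_sentiment, movie_sentiment) tuples.

    Returns the n rows with the largest absolute sentiment gap, largest first.
    """
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)[:n]
```

The output of `top_sentiment_gaps` is the kind of table a bar chart of book versus movie sentiment could be drawn from.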
Why It Matters
This project shows the ability to go beyond standard marketing analytics into applied data science by building and validating an actual predictive model, not just reporting on existing data. It demonstrates the kind of technical depth that lets data-driven decisions go further than dashboards and reporting alone.
Results
The logistic regression model reached 88% accuracy in predicting whether a book is likely to be adapted into a successful movie, using book rating, number of ratings, and number of text reviews as its key features.
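To make the classification step concrete, here is a minimal sketch of a logistic regression over the three features named above (book rating, number of ratings, number of text reviews), with an accuracy helper. It uses plain gradient descent in standard-library Python purely for illustration; the learning rate, epoch count, and function names are assumptions, and the project may have relied on a library implementation instead. The write-up does not say how the 88% figure was measured, so the evaluation split is left to the caller; a held-out test set is the usual choice.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Scale each column to zero mean and unit variance.

    With standardized inputs, coefficient magnitudes give a rough
    indication of feature importance.
    """
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on log loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, X):
    """Label 1 (adapted) when the estimated probability is at least 0.5."""
    w, b = model
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Each input row would be `[book_rating, ratings_count, text_reviews_count]`, passed through `standardize` before fitting so that the large count features do not dominate the rating.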
Tools: Python, Google Colab, TextBlob, Linear Regression, Logistic Regression
Posted Jun 24, 2026
Built an 88% accuracy ML model predicting whether a book is likely to become a successful movie, using sentiment analysis and regression