📚 Book Recommendation Engine using K-Nearest Neighbors
A machine learning-based book recommendation system that uses collaborative filtering and the K-Nearest Neighbors (KNN) algorithm to suggest similar books based on user ratings.
Quick Overview
Problem: Recommend relevant books using large-scale, sparse user rating data, where traditional rule-based methods fail to capture user preference patterns.
Solution: Built a collaborative filtering recommendation engine using K-Nearest Neighbors with cosine distance, leveraging a user–book rating matrix and sparse representations to identify similar books based on shared rating behavior.
Impact: Generated book recommendations with cosine-distance scores from a dataset of 1.1 million ratings, demonstrating applied knowledge of recommender systems, distance-based learning, and data preprocessing for real-world scale datasets.
Source: The dataset is automatically downloaded in the notebook from FreeCodeCamp.
🚀 How It Works
1. Data Preprocessing
Load book and rating data from CSV files
Filter out sparse data:
Remove users with fewer than 200 ratings
Remove books with fewer than 100 ratings
This keeps only users and books with enough ratings for the similarity estimates to be reliable
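The filtering step above can be sketched with pandas. This is a minimal illustration rather than the notebook's exact code: the column names `user`, `isbn`, and `rating` and the helper `filter_sparse` are assumptions, and the toy example uses tiny thresholds in place of 200 and 100.

```python
import pandas as pd

def filter_sparse(ratings, min_user_ratings=200, min_book_ratings=100):
    """Keep ratings made by active users on frequently rated books.

    Column names ("user", "isbn") are hypothetical placeholders.
    """
    user_counts = ratings["user"].value_counts()
    book_counts = ratings["isbn"].value_counts()
    active_users = user_counts[user_counts >= min_user_ratings].index
    popular_books = book_counts[book_counts >= min_book_ratings].index
    keep = ratings["user"].isin(active_users) & ratings["isbn"].isin(popular_books)
    return ratings[keep]

# Toy data: thresholds lowered to 2 so the effect is visible.
toy = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "isbn": ["a", "b", "c", "a", "b", "a"],
    "rating": [5, 3, 4, 4, 2, 1],
})
filtered = filter_sparse(toy, min_user_ratings=2, min_book_ratings=2)
print(filtered)
```

Both counts are computed on the unfiltered data here; counting books only after removing inactive users is an equally plausible variant and would keep fewer titles.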
2. Create User-Book Matrix
         User1  User2  User3  User4  ...
Book A       5      0      4      5  ...
Book B       0      3      0      4  ...
Book C       4      5      3      0  ...
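A matrix of this shape can be built with a pandas pivot and then stored sparsely. The sketch below reproduces the three example rows; the column names `title`, `user`, and `rating` are assumptions, and filling missing ratings with 0 is assumed from the zeros shown in the matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Long-format ratings matching the example matrix (hypothetical column names).
ratings = pd.DataFrame({
    "title": ["Book A", "Book A", "Book A", "Book B", "Book B",
              "Book C", "Book C", "Book C"],
    "user": ["User1", "User3", "User4", "User2", "User4",
             "User1", "User2", "User3"],
    "rating": [5, 4, 5, 3, 4, 4, 5, 3],
})

# One row per book, one column per user; unrated pairs become 0.
book_user = ratings.pivot_table(index="title", columns="user", values="rating").fillna(0)

# Compressed sparse row storage keeps only the non-zero entries.
book_user_sparse = csr_matrix(book_user.values)
print(book_user)
```

With the real dataset most entries are 0, which is why the sparse form saves memory.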
3. Train KNN Model
Uses cosine distance metric to measure similarity
Finds the 5 nearest neighbors (most similar books), excluding the input book itself
Algorithm: Brute force (exact search that compares the query book against every other book)
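The training step might look like the following with scikit-learn's `NearestNeighbors`, assuming that is the library used (the text above does not name it). The three-book matrix is toy data.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Rows are books, columns are users (toy data).
matrix = csr_matrix(np.array([
    [5, 0, 4, 5],  # Book A
    [0, 3, 0, 4],  # Book B
    [4, 5, 3, 0],  # Book C
], dtype=float))

# "Fitting" a KNN model only stores the data; the work happens at query time.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(matrix)

# Query the two nearest neighbors of Book A; the first is Book A itself.
distances, indices = model.kneighbors(matrix[0], n_neighbors=2)
print(indices[0], distances[0])
```

On the full dataset the query would ask for 6 neighbors so that 5 remain after the input book is dropped.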
4. Generate Recommendations
The system compares rating patterns (book "fingerprints") to find similar books.
💻 Usage
Basic Function Call
get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
Expected Output
[
  "Where the Heart Is (Oprah's Book Club (Paperback))",
  [
    ["I'll Be Seeing You", 0.8],
    ['The Weight of Water', 0.77],
    ['The Surgeon', 0.77],
    ['I Know This Much Is True', 0.77],
    ['The Lovely Bones: A Novel', 0.72]
  ]
]
Output Format:
First element: Input book title
Second element: List of 5 recommended books with their distances, ordered from largest to smallest distance as in the example above
Lower distance = More similar books
Distance ranges from 0 (rating vectors point in the same direction) to 1 (no raters in common), because ratings are non-negative
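To show how a result in this format could be assembled, here is a hedged sketch on toy data. The name `get_recommends_sketch`, the four-book matrix, and the use of scikit-learn are assumptions for illustration; the notebook's own `get_recommends` may differ.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy book-by-user matrix (rows: books, columns: users).
book_user = pd.DataFrame(
    [[5, 0, 4, 5], [0, 3, 0, 4], [4, 5, 3, 0], [5, 1, 4, 4]],
    index=["Book A", "Book B", "Book C", "Book D"],
    columns=["User1", "User2", "User3", "User4"],
    dtype=float,
)
model = NearestNeighbors(metric="cosine", algorithm="brute").fit(book_user.values)

def get_recommends_sketch(title, n_recommendations=2):
    """Return [title, [[neighbor_title, distance], ...]] for one book."""
    query = book_user.loc[[title]].values
    # Ask for one extra neighbor because the closest match is the book itself.
    distances, indices = model.kneighbors(query, n_neighbors=n_recommendations + 1)
    neighbors = [
        [book_user.index[i], round(float(d), 2)]
        for d, i in zip(distances[0], indices[0])
        if book_user.index[i] != title
    ]
    # Reverse so the list runs from largest to smallest distance.
    return [title, neighbors[::-1]]

result = get_recommends_sketch("Book A")
print(result)
```

A production version would also need to handle titles that are missing from the matrix, which this sketch does not.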
🧮 Algorithm Details
K-Nearest Neighbors (KNN)
Algorithm Type: Lazy learning (instance-based)
Distance Metric: Cosine distance
K Value: 6 (queries 6 neighbors and skips the first, which is the input book itself)
Search Method: Brute force
Why Cosine Distance?
Cosine distance measures the angle between two rating vectors and ignores their length, so two books whose ratings follow the same pattern count as similar even if one is rated on a consistently higher scale.
Distance = 1 - (A · B) / (||A|| × ||B||)
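The formula can be checked directly with NumPy. This is a small sketch that assumes both vectors are non-zero; the function name `cosine_distance` and the sample vectors are illustrative.

```python
import numpy as np

def cosine_distance(a, b):
    """Compute 1 - (A . B) / (||A|| * ||B||) for two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

book_a = [5, 0, 4, 5]
book_a_doubled = [10, 0, 8, 10]  # same pattern, twice the scale
book_b = [0, 3, 0, 4]

print(cosine_distance(book_a, book_a_doubled))  # scaling does not change the angle
print(cosine_distance(book_a, book_b))
print(cosine_distance([1, 0], [0, 1]))          # no overlap between the vectors
```

The first call illustrates the scale invariance described above, and the last shows the upper bound reached when two books share no raters.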
📈 Key Features
✅ Collaborative filtering based on user ratings
✅ Handles sparse data efficiently using sparse matrices
✅ More reliable recommendations by filtering out rarely rated books and inactive users
✅ Fast recommendations using brute-force KNN over a sparse matrix
✅ Returns books with their cosine distances
Created as part of the FreeCodeCamp Machine Learning with Python certification.
📄 License
This project is open source and available for educational purposes.
Note: This is a learning project demonstrating collaborative filtering and KNN algorithms for recommendation systems. For production use, consider additional optimizations and error handling.