Book Recommendation Engine using K-Nearest Neighbors
A machine learning-based book recommendation system that uses collaborative filtering and K-Nearest Neighbors (KNN) algorithm to suggest similar books based on user ratings.
Quick Overview
Problem: Recommend relevant books using large-scale, sparse user rating data, where traditional rule-based methods fail to capture user preference patterns.
Solution: Built a collaborative filtering recommendation engine using K-Nearest Neighbors with cosine distance, leveraging a user-book rating matrix and sparse representations to identify similar books based on shared rating behavior.
Impact: Generated book recommendations with cosine distance scores from 1.1 million ratings, demonstrating applied knowledge of recommender systems, distance-based learning, and data preprocessing for real-world scale datasets.
Source: The dataset is automatically downloaded in the notebook from FreeCodeCamp.
How It Works
1. Data Preprocessing
Load book and rating data from CSV files
Filter out sparse data:
Remove users with fewer than 200 ratings
Remove books with fewer than 100 ratings
This keeps only users and books with enough ratings for the similarity estimates to be reliable
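The filtering step above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the column names `user`, `isbn`, and `rating`, the helper `filter_ratings`, and the toy data are assumptions, and the thresholds are lowered so the toy example has something left after filtering.

```python
import pandas as pd

def filter_ratings(ratings, min_user_ratings=200, min_book_ratings=100):
    """Keep only ratings from active users on frequently rated books.

    Column names ("user", "isbn") are assumed for illustration.
    Counts are computed on the unfiltered data.
    """
    user_counts = ratings["user"].value_counts()
    book_counts = ratings["isbn"].value_counts()
    active_users = user_counts[user_counts >= min_user_ratings].index
    popular_books = book_counts[book_counts >= min_book_ratings].index
    keep = ratings["user"].isin(active_users) & ratings["isbn"].isin(popular_books)
    return ratings[keep]

# Toy data: user 3 has one rating and book "C" has one rating.
toy = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 3],
    "isbn": ["A", "B", "C", "A", "B", "A"],
    "rating": [5, 3, 4, 4, 2, 1],
})

# Small thresholds so the toy data survives; the project uses 200 and 100.
filtered = filter_ratings(toy, min_user_ratings=2, min_book_ratings=2)
print(filtered)
```

With the toy thresholds, user 3 and book "C" are dropped and four ratings remain.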
2. Create User-Book Matrix
        User1  User2  User3  User4  ...
Book A      5      0      4      5  ...
Book B      0      3      0      4  ...
Book C      4      5      3      0  ...
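One way to build a matrix like the one above is to pivot a long table of ratings so that books become rows and users become columns, filling missing ratings with 0, and then convert it to a SciPy sparse matrix. The sketch below assumes column names `user`, `title`, and `rating` and at most one rating per (user, title) pair; the real notebook may differ.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Long-format ratings that reproduce the example matrix (illustrative data).
ratings = pd.DataFrame({
    "user": ["User1", "User1", "User2", "User2",
             "User3", "User3", "User4", "User4"],
    "title": ["Book A", "Book C", "Book B", "Book C",
              "Book A", "Book C", "Book A", "Book B"],
    "rating": [5, 4, 3, 5, 4, 3, 5, 4],
})

# Rows are books, columns are users; a missing rating becomes 0.
book_user = ratings.pivot(index="title", columns="user", values="rating").fillna(0)

# Store only the non-zero entries.
sparse_matrix = csr_matrix(book_user.values)

print(book_user)
print(sparse_matrix.shape, sparse_matrix.nnz)
```

The sparse form stores only the eight actual ratings rather than all twelve cells, which is what makes the approach practical when most users have rated only a small fraction of the books.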
3. Train KNN Model
Uses cosine distance metric to measure similarity
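Finds the 5 nearest neighbors (most similar books) by querying 6 and discarding the input book itself
Algorithm: Brute force (exact search that compares the query against every book, which suits sparse, high-dimensional rating vectors)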
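As a rough sketch of this step, assuming scikit-learn's `NearestNeighbors` is the KNN implementation (the README does not name the library), the model can be fitted on the sparse book-by-user matrix and queried with one book's row. The three-book matrix here is toy data, so only 3 neighbors are requested; on the full dataset the query would ask for 6.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

titles = ["Book A", "Book B", "Book C"]

# Rows are books, columns are users (toy data).
matrix = csr_matrix(np.array([
    [5, 0, 4, 5],
    [0, 3, 0, 4],
    [4, 5, 3, 0],
], dtype=float))

# "Training" only stores the data; the work happens at query time.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(matrix)

# Query with Book A's own row: the nearest neighbor is Book A itself.
distances, indices = model.kneighbors(matrix[0], n_neighbors=3)

for dist, idx in zip(distances[0], indices[0]):
    print(titles[idx], round(float(dist), 2))
```

Because the query row is part of the fitted data, the first result is the book itself at distance 0, which is why the project asks for one more neighbor than it returns.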
4. Generate Recommendations
The system compares rating patterns (book "fingerprints") to find similar books.
Usage
Basic Function Call
get_recommends("Where the Heart Is (Oprah's Book Club (Paperback))")
Expected Output
[
  "Where the Heart Is (Oprah's Book Club (Paperback))",
  [
    ["I'll Be Seeing You", 0.8],
    ['The Weight of Water', 0.77],
    ['The Surgeon', 0.77],
    ['I Know This Much Is True', 0.77],
    ['The Lovely Bones: A Novel', 0.72]
  ]
]
Output Format:
First element: Input book title
Second element: List of 5 recommended books with their distances
Lower distance = More similar books
Because ratings are non-negative, distance ranges from 0 (same rating pattern) to 1 (no raters in common)
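A function that returns this format could look roughly like the sketch below. It is not the notebook's implementation: the toy `book_user` table, the `n_recommendations` parameter, and the use of scikit-learn are assumptions. The sample output above lists the most distant of the five books first, so the sketch reverses the neighbor order to match.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy book-by-user rating table (illustrative data only).
book_user = pd.DataFrame(
    [[5, 0, 4, 5], [0, 3, 0, 4], [4, 5, 3, 0], [5, 1, 4, 4]],
    index=["Book A", "Book B", "Book C", "Book D"],
    columns=["User1", "User2", "User3", "User4"],
)
model = NearestNeighbors(metric="cosine", algorithm="brute").fit(book_user.values)

def get_recommends(title, n_recommendations=5):
    """Return [title, [[neighbor_title, distance], ...]]."""
    # Ask for one extra neighbor because the input book matches itself.
    k = min(n_recommendations + 1, len(book_user))
    query = book_user.loc[[title]].values
    distances, indices = model.kneighbors(query, n_neighbors=k)
    neighbors = [
        [book_user.index[i], round(float(d), 2)]
        for d, i in zip(distances[0], indices[0])
        if book_user.index[i] != title  # drop the input book itself
    ]
    # Reverse so the most distant recommendation comes first.
    return [title, neighbors[::-1]]

result = get_recommends("Book A", n_recommendations=2)
print(result)
```

Looking up a title that is not in the table raises a `KeyError` here; a more defensive version would check for that first.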
Algorithm Details
K-Nearest Neighbors (KNN)
Algorithm Type: Lazy learning (instance-based)
Distance Metric: Cosine distance
K Value: 6 (the query returns 6 neighbors; the first is the input book itself and is skipped)
Search Method: Brute force
Why Cosine Distance?
Cosine distance depends only on the angle between two rating vectors, not on their magnitude, so it compares the pattern of ratings rather than their overall scale.
Distance = 1 - (A · B) / (||A|| × ||B||)
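The formula can be checked by hand with a few lines of NumPy. The helper `cosine_distance` and the example vectors are illustrative and not part of the project code.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - (A . B) / (||A|| * ||B||) for two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

book_a = [5, 0, 4, 5]
half_scale = [2.5, 0, 2, 2.5]   # same pattern, every rating halved
no_overlap = [0, 3, 0, 0]       # rated only by a user who skipped book_a

d_same = cosine_distance(book_a, half_scale)
d_disjoint = cosine_distance(book_a, no_overlap)
print(d_same, d_disjoint)
```

The halved vector points in the same direction as `book_a`, so its distance is 0 up to floating-point error, while the vector with no raters in common has a dot product of 0 and therefore a distance of 1.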
Key Features
Collaborative filtering based on user ratings
Handles sparse data efficiently using sparse matrices
More reliable similarity estimates through filtering of rarely rated books and low-activity users
Fast recommendations using KNN on the filtered, sparse matrix
Returns books with their cosine distances
Created as part of the FreeCodeCamp Machine Learning with Python certification.
License
This project is open source and available for educational purposes.
Note: This is a learning project demonstrating collaborative filtering and KNN algorithms for recommendation systems. For production use, consider additional optimizations and error handling.
Posted Dec 27, 2025
Developed a K-Nearest Neighbors book recommendation engine using user ratings.