Image 1 – Project Overview & Dataset Information
Customer Churn Prediction Using Random Forest
This project focuses on predicting customer churn using machine learning techniques to help businesses proactively identify customers who are likely to discontinue their services.
The predictive solution was developed using a structured approach involving Random Forest classification, SMOTE oversampling for handling class imbalance, GridSearchCV for hyperparameter optimization, and threshold tuning to improve recall performance.
The dataset contains customer demographic and behavioral attributes, including:
Age, Membership Years, Lifetime Value, Total Purchases, Days Since Last Purchase, Average Order Value, Returns Rate, Cart Abandonment Rate
The target variable is customer churn status, where:
0 = Active Customer, 1 = Churned Customer
Business Objective: The primary objective of this project is to identify customers at risk of churn so businesses can implement preventive retention strategies and reduce customer attrition.
Image 2 – Machine Learning Pipeline
End-to-End Machine Learning Workflow:
A comprehensive machine learning pipeline was designed to ensure robustness, reproducibility, and business relevance throughout the modeling process.
The workflow consisted of:
1. Data Cleaning
Prepared and validated the dataset by handling inconsistencies and ensuring data quality.
2. Exploratory Data Analysis (EDA)
Investigated customer behavior patterns and feature distributions to understand underlying trends.
3. Baseline Random Forest Modeling
Established an initial benchmark using Random Forest classification.
4. SMOTE Oversampling
Addressed class imbalance by synthesizing additional minority-class (churned) training examples, improving the model's ability to detect churned customers.
5. Hyperparameter Tuning
Optimized model performance using GridSearchCV.
6. Threshold Tuning
Adjusted classification thresholds to maximize business-oriented objectives, particularly recall.
7. Model Evaluation
Assessed predictive performance using multiple evaluation metrics.
Professional Value
This structured workflow follows established practice (handling class imbalance, tuning hyperparameters, and adjusting the decision threshold) rather than relying on default model settings.
Image 3 – Correlation Heatmap
Exploratory Correlation Analysis
A correlation heatmap was generated to identify relationships between customer attributes and churn behavior.
The analysis revealed several noteworthy insights:
Customers with longer periods since their last purchase exhibited a stronger tendency to churn.
Higher cart abandonment rates were moderately associated with increased churn risk.
Demographic variables such as age showed minimal correlation with churn outcomes.
Key Insight
The strongest relationship with churn was observed for Days Since Last Purchase (correlation = 0.312), suggesting that customer inactivity is a meaningful indicator of potential attrition.
Business Relevance
Understanding these relationships enables organizations to focus their retention initiatives on the factors most strongly associated with customer loss.
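A correlation ranking of this kind can be computed with pandas. The sketch below uses synthetic data and hypothetical column names, so its numbers will not match the 0.312 reported above. It only shows the mechanics of ranking features by their correlation with the churn label.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data; column names are hypothetical.
days_since = rng.integers(1, 365, size=n)
age = rng.integers(18, 70, size=n)

# Make churn more likely as inactivity grows, so a positive
# correlation appears by construction. Age has no effect here.
churn_prob = 0.1 + 0.6 * days_since / 365
churned = (rng.random(n) < churn_prob).astype(int)

df = pd.DataFrame({
    "days_since_last_purchase": days_since,
    "age": age,
    "churned": churned,
})

# Pearson correlation of each feature with the binary target,
# sorted from most positive to most negative.
corr_with_churn = (
    df.corr()["churned"].drop("churned").sort_values(ascending=False)
)
print(corr_with_churn)
```

Passing the full matrix from df.corr() to seaborn.heatmap would produce a heatmap-style view of the same numbers.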
Image 4 – Key Insights, Recommendations & Technologies Used
Key Insights
Several actionable findings emerged from the analysis:
1. Customers with extended inactivity periods are more likely to churn.
2. Elevated cart abandonment behavior may signal disengagement.
3. Improving recall is critical because recall measures the share of actual churners the model identifies, which aligns directly with the business objective.
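Threshold tuning for recall can be illustrated with a short sketch. The helper name pick_threshold, the 0.80 recall target, and the toy scores are all assumptions made for illustration, not values from the project. The idea is to choose the highest probability cutoff that still catches the required share of churners.

```python
import numpy as np

def pick_threshold(y_true, churn_proba, min_recall=0.80):
    """Return the highest grid threshold whose recall is at least min_recall."""
    y_true = np.asarray(y_true)
    churn_proba = np.asarray(churn_proba)
    n_churners = np.sum(y_true == 1)
    best = 0.0  # a threshold of 0 flags every customer, so recall is 1.0
    for threshold in np.linspace(0.05, 0.95, 19):
        predicted = churn_proba >= threshold
        true_pos = np.sum(predicted & (y_true == 1))
        recall = true_pos / n_churners
        # Recall never rises as the threshold rises, so the last
        # threshold that passes is the highest one that passes.
        if recall >= min_recall:
            best = float(threshold)
    return best

# Toy labels and predicted churn probabilities, invented for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]
churn_proba = [0.12, 0.31, 0.37, 0.41, 0.47, 0.62, 0.22, 0.81, 0.93, 0.56]

threshold = pick_threshold(y_true, churn_proba, min_recall=0.80)
print(round(threshold, 2))
```

On this toy data the selected cutoff is 0.45, below the default of 0.5, which catches four of the five churners. In practice the threshold would usually be chosen on a validation split rather than on the final test set.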
Business Recommendations:
Based on the findings, the following strategies are recommended:
Target High-Risk Customers:
Deploy retention campaigns aimed at customers identified as likely to churn.
Personalize Customer Communication:
Develop personalized email and promotional initiatives to improve engagement.
Strengthen Loyalty Programs:
Offer incentives and rewards to reactivate inactive customers.
Monitor Behavioral Indicators:
Continuously track customer activity metrics to detect early warning signs of churn.
Technologies Used
The project was implemented using the following technologies:
Python
Pandas
NumPy
Matplotlib
Seaborn
Scikit-Learn
Imbalanced-Learn
The reports folder contains model evaluation outputs generated during the machine learning workflow. These reports provide insights into model performance, feature importance, and predictive capabilities, helping stakeholders understand both the effectiveness and business implications of the solution.
1. Feature Importance Report
Feature importance analysis was performed to identify the variables that contributed most to customer churn predictions, providing valuable business insights.
2. ROC Curve Report
ROC-AUC analysis was used to compare multiple machine learning models and identify the model with the strongest predictive performance.
3. Confusion Matrix Report
A confusion matrix was generated to evaluate classification outcomes and understand the strengths and limitations of the predictive model.
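The three reports above could be produced along the following lines. This is a sketch under stated assumptions: the data is synthetic, the feature names are hypothetical, and logistic regression is included only as an example comparison model, since the models actually compared are not named here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature names standing in for the dataset's columns.
feature_names = [
    "age", "membership_years", "lifetime_value", "days_since_last_purchase",
]

X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: logistic regression is the comparison model.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# ROC curve report: compare models by ROC-AUC on held-out data.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, churn_proba)

rf = models["random_forest"]

# Feature importance report: impurity-based importances, which sum to 1.
importances = dict(zip(feature_names, rf.feature_importances_))

# Confusion matrix report: rows are actual classes, columns are predicted.
cm = confusion_matrix(y_test, rf.predict(X_test))

print(auc_by_model)
print(importances)
print(cm)
```

With labels 0 (active) and 1 (churned), cm[1, 0] counts churners the model missed, which is the cell most relevant to a recall-focused objective.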