End-to-End Machine Learning Pipeline for Telecom Customer Churn
1. The Business Problem
Customer churn is a major challenge for telecommunications companies, driven by competition, service issues, and changing consumer preferences. This project was designed to move the company from reactive support to proactive retention using data-driven strategies such as customer segmentation, personalized offers, and loyalty programs.
2. Data Exploration & Insights (EDA)
I performed a descriptive analysis on a dataset of 7,043 customers with 21 variables. Key findings included:
Contractual Risk: Customers on month-to-month contracts showed significantly higher churn than those on one- or two-year commitments.
Service Preference: Fiber Optic plans were the most popular, but their higher price points made them a critical segment to monitor.
Financial Indicators: Churned customers had a higher average monthly charge of $74.44, compared to $61.27 for retained customers.
Payment Behavior: The "Electronic Check" payment method was the one most strongly associated with service cancellation.
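The group comparisons above can be reproduced with a few pandas aggregations. The following is a minimal sketch on a tiny made-up sample, not the project's actual notebook: the column names (Contract, PaymentMethod, MonthlyCharges, Churn) and every value in the frame are assumptions chosen for illustration.

```python
import pandas as pd

# Tiny made-up sample. Column names and values are assumptions,
# not the project's confirmed schema or data.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Two year"],
    "PaymentMethod": ["Electronic check", "Electronic check", "Mailed check",
                      "Credit card", "Mailed check", "Electronic check"],
    "MonthlyCharges": [80.0, 70.0, 60.0, 50.0, 90.0, 40.0],
    "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Encode the target as 0/1 so that a group mean is a churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Churn rate per contract type and per payment method.
churn_by_contract = df.groupby("Contract")["churned"].mean()
churn_by_payment = df.groupby("PaymentMethod")["churned"].mean()

# Average monthly charge for churned vs. retained customers.
avg_charge_by_churn = df.groupby("Churn")["MonthlyCharges"].mean()

print(churn_by_contract)
print(churn_by_payment)
print(avg_charge_by_churn)
```

On the real table, the same three aggregations would be the natural way to check the contract, payment-method, and monthly-charge findings.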
3. Engineering & Preprocessing Pipeline
To prepare the data for high-performance modeling, I implemented a rigorous preprocessing workflow:
Data Cleaning: Removed irrelevant identifiers such as customerID and checked for data quality issues. The dataset was verified to have zero missing or NaN values.
Feature Engineering: Applied Label Encoding to convert categorical text variables into a numerical format suitable for machine learning algorithms.
Data Splitting: Adopted a standard 80/20 train-test split so that model performance could be measured on unseen data.
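The three preprocessing steps can be sketched with pandas and scikit-learn as follows. This is an illustrative outline under stated assumptions, not the project's exact code: only customerID is named in the write-up, so the other column names, the sample rows, and the random_state value are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration; only customerID is named in the write-up.
raw = pd.DataFrame({
    "customerID": [f"C{i:03d}" for i in range(10)],
    "Contract": ["Month-to-month", "One year"] * 5,
    "MonthlyCharges": [70.0 + i for i in range(10)],
    "Churn": ["Yes", "No"] * 5,
})

# Data cleaning: drop the identifier and count missing values.
data = raw.drop(columns=["customerID"])
missing_total = int(data.isna().sum().sum())

# Label encoding: one encoder per text column, kept for later inverse mapping.
categorical_cols = ["Contract", "Churn"]  # assumed list of text columns
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])

# 80/20 split; random_state is an arbitrary choice for reproducibility.
X = data.drop(columns=["Churn"])
y = data["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(missing_total, X_train.shape, X_test.shape)
```

LabelEncoder assigns integer codes in sorted order of the class labels, so the codes carry no business meaning. Keeping the fitted encoders makes it possible to map predictions back to the original labels.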
4. Model Development & Benchmarking
I developed and benchmarked eight distinct machine learning algorithms to identify the most effective solution for this specific application:
Linear & Probabilistic: Logistic Regression, Naive Bayes.
Tree-Based: Decision Tree, Random Forest.
Boosting Frameworks: AdaBoost, Gradient Boosting, XGBoost, and LightGBM.
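A benchmarking loop over several classifiers might look like the sketch below. It runs on synthetic data from make_classification rather than the churn table, so the printed accuracies say nothing about the project's results. The hyperparameters and the choice of GaussianNB as the Naive Bayes variant are assumptions. XGBoost and LightGBM are left out to keep the example limited to scikit-learn; their XGBClassifier and LGBMClassifier estimators follow the same fit/predict interface and could be added to the dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded churn table (not the real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models; settings here are illustrative defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Report models from highest to lowest test accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {acc:.4f}")
```

Scoring every model on the same held-out split keeps the comparison fair.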
5. Performance Evaluation & Results
Models were evaluated using ROC curves, confusion matrices, and detailed classification reports.
Winner: Logistic Regression achieved the highest accuracy at 81.83%.
Secondary Performers: Gradient Boosting (81.05%) and AdaBoost (80.98%) also showed strong predictive power.
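The evaluation artifacts named above (confusion matrix, classification report, ROC curve) can all be produced with scikit-learn's metrics module. The sketch below uses synthetic data and a Logistic Regression model as a stand-in. The class names "retained" and "churned" are illustrative labels, and the numbers it prints are unrelated to the accuracies reported in this section.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the churn dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

print(cm)
print(classification_report(y_test, y_pred, target_names=["retained", "churned"]))
print(f"accuracy={accuracy:.4f}  ROC AUC={auc:.4f}")
```

Accuracy is the sum of the confusion matrix diagonal divided by the total count. The ROC curve is built from predicted probabilities rather than hard labels, so it describes model behavior across all decision thresholds.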
6. Technical Conclusion
This data-driven approach shows how proactive churn prediction can support business sustainability. By identifying that customers favor high-speed fiber optic services but are sensitive to pricing and contract terms, the company can now optimize its pricing and retention strategies to maximize user satisfaction and revenue.