Projects using Keras
End-to-End Machine Learning Pipeline for Telecom Customer Churn

1. The Business Problem
Customer churn is a major challenge for telecommunications companies, driven by competition, service issues, and changing consumer preferences. This project was designed to transition the company from reactive support to proactive retention using data-driven strategies such as customer segmentation, personalized offers, and loyalty programs.

2. Data Exploration & Insights (EDA)
I performed a comprehensive descriptive analysis on a database of 7,043 customers with 21 distinct variables. Key findings included:
- Contractual Risk: Customers on month-to-month contracts showed significantly higher churn than those on one- or two-year commitments.
- Service Preference: While Fiber Optic plans were the most popular, they also represented a critical segment to monitor because of their higher price points.
- Financial Indicators: Churned customers had a higher average monthly charge ($74.44) than retained customers ($61.27).
- Payment Behavior: The "Electronic Check" payment method was most strongly associated with service cancellation.

3. Engineering & Preprocessing Pipeline
To prepare the data for high-performance modeling, I implemented a rigorous preprocessing workflow:
- Data Cleaning: Removed irrelevant identifiers such as customerID and addressed potential data quality issues; the dataset was verified to have zero missing or NaN values.
- Feature Engineering: Applied Label Encoding to transform categorical text variables into a numerical format suitable for machine learning algorithms.
- Data Splitting: Adopted a standard 80/20 train-test split to ensure the model could generalize to unseen data.

4. Model Development & Benchmarking
I developed and benchmarked eight distinct machine learning algorithms to identify the most effective solution for this application:
- Linear & Probabilistic: Logistic Regression, Naive Bayes.
- Tree-Based: Decision Tree, Random Forest.
- Boosting Frameworks: AdaBoost, Gradient Boosting, XGBoost, and LightGBM.

5. Performance Evaluation & Results
Models were evaluated using ROC curves, confusion matrices, and detailed classification reports.
- Winner: Logistic Regression achieved the highest accuracy at 81.83%.
- Secondary Performers: Gradient Boosting (81.05%) and AdaBoost (80.98%) also showed strong predictive power.

6. Technical Conclusion
This data-driven approach shows that proactive churn prediction is essential for business sustainability. By identifying that customers prioritize high-speed fiber optic services but are sensitive to pricing and contract terms, the company can now optimize its pricing and retention strategies to maximize user satisfaction and revenue.
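The preprocessing and evaluation steps described above can be sketched in plain Python. This is a minimal illustration on a toy dataset, not the project's actual code: the column values, the tiny table, and the naive "predict churn for month-to-month contracts" baseline are all hypothetical stand-ins for the real Telco data and trained models.

```python
import random

# Toy records standing in for the Telco churn table (hypothetical values).
rows = [
    {"Contract": "Month-to-month", "PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"Contract": "Two year",       "PaymentMethod": "Mailed check",     "Churn": "No"},
    {"Contract": "One year",       "PaymentMethod": "Credit card",      "Churn": "No"},
    {"Contract": "Month-to-month", "PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"Contract": "Month-to-month", "PaymentMethod": "Bank transfer",    "Churn": "No"},
] * 4  # 20 rows, so an 80/20 split is non-trivial

def label_encode(rows, column):
    """Map each distinct category to an integer, as Label Encoding does."""
    mapping = {v: i for i, v in enumerate(sorted({r[column] for r in rows}))}
    return [mapping[r[column]] for r in rows], mapping

X_contract, contract_map = label_encode(rows, "Contract")
X_payment, payment_map = label_encode(rows, "PaymentMethod")
y, _ = label_encode(rows, "Churn")  # "No" -> 0, "Yes" -> 1
X = list(zip(X_contract, X_payment))

# Standard 80/20 train-test split after shuffling with a fixed seed.
random.seed(42)
idx = list(range(len(X)))
random.shuffle(idx)
cut = int(0.8 * len(idx))
train_idx, test_idx = idx[:cut], idx[cut:]

def evaluate(y_true, y_pred):
    """Accuracy plus a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / len(y_true), [[tn, fp], [fn, tp]]

# Naive baseline standing in for a trained model:
# predict churn whenever the contract is month-to-month.
y_test = [y[i] for i in test_idx]
y_pred = [1 if X[i][0] == contract_map["Month-to-month"] else 0 for i in test_idx]
acc, cm = evaluate(y_test, y_pred)
print(len(train_idx), len(test_idx))  # 16 4
print(acc, cm)
```

In the real project the baseline would be replaced by the eight fitted classifiers, with the same split and metrics reused so the accuracy figures stay comparable across models.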
RAG is only as good as the data you feed it. 📄➡️🤖

I am excited to share that I've completed the Build an AI-Powered Document Retrieval System with IBM Granite and Docling lab from IBM SkillsBuild! While my previous work focused on the RAG pipeline, this lab went deeper into the most critical step: document parsing. We often forget that real-world data isn't clean text; it's locked in complex PDFs and formatted documents.

What I built in this hands-on lab:
🔹 Advanced Parsing with Docling: Used Docling not just to "read" text, but to understand the structure of documents, preserving context for the AI.
🔹 Embeddings with Granite: Leveraged the IBM Granite granite-embedding-30m-english model to convert text into high-quality vector representations.
🔹 Orchestration: Used LangChain to manage the flow between the user, the database, and the model.
🔹 Data Processing: Implemented document loading and chunking strategies to optimize context windows.
🔹 Synthesis: Created a system that retrieves relevant data and generates accurate, fact-based summaries.

This skill allows me to build AI agents that don't just "guess" answers but can accurately retrieve information from complex business documents. The experience has given me the practical skills to build AI applications that are not just "smart," but also accurate and domain-specific.
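The chunking step mentioned above can be illustrated with a minimal fixed-size, overlapping splitter. This is a from-scratch sketch of the general technique, not Docling's or LangChain's actual API, and the chunk/overlap sizes are hypothetical:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content cut at a
    chunk boundary also appears whole in the neighboring chunk -- a common
    strategy for fitting retrieved context into an LLM's context window."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

doc = "word " * 100  # stand-in for parsed document text (500 characters)
chunks = chunk_text(doc, chunk_size=120, overlap=30)
print(len(chunks), len(chunks[0]))  # 6 120
```

Each chunk would then be embedded (with granite-embedding-30m-english in the lab) and indexed; production splitters usually also respect sentence or section boundaries rather than raw character offsets.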