Customer Segmentation Using Python

Joy Johnson

Table of Contents

❖ Case Study
❖ Data Source
❖ Tools for Analysis
❖ Method of Analysis
❖ Exploratory Data Analysis
❖ Hypothesis
❖ Data Clustering
❖ Findings and Recommendations

Case Study

A service providing company with 5, 174 customers across Chicago, wants to identify its high-value customers in order to tailor and develop focused marketing campaigns as well as allocate resources more effectively.

Tools for Analysis

❖ Python
Libraries and packages used:
• Pandas
• Numpy
• Matplotlib
• Seaborn
• Scipy
• Scikit-learn

Method of Analysis

I conducted a customer segmentation based on value using the K-means clustering approach and applied an analytic pipeline which involved five stages:
• Importing the necessary libraries and packages and loading the dataset
• Data preprocessing which involved data cleaning and transformation
• Exploratory Data Analysis(EDA)
• Data clustering using K-means algorithm
• Clustering analysis and visualization

Exploratory Data Analysis

After the data cleaning and data transformation process, I performed statistical analysis using the Pearson correlation coefficient method to detect if there were any relationships, associations or patterns present between variables in the dataset.
The result of the analysis showed that there was a strong positive relationship (0.83) between the total revenue generated by customers and the length of time (tenure) spent with the company. There was also a slightly positive correlation (0.26) between the tenure of customers and the number of referrals brought to the company.

Hypothesis

This is my assumption based on the findings of the EDA conducted:
❖ Customers that spend a significant number of time(tenure) with the company yield profitability for the company in terms of revenue.
It is therefore important to pinpoint these valuable customers and tailor marketing strategies to retain them.

Data Clustering

I applied the unsupervised machine learning approach using K-means algorithm for the customer segmentation. The several steps of the algorithm are as follows:
• Utilized the Elbow method to determine the optimal number of clusters.
• Initialized K as the centroid randomly .
• Calculated the Euclidean distance of every data point from every centroid in the space.
• Allocated every data point to the nearest centroid based on the calculated distance.
• Iterated until the centroids and data points remain the same and fixed.

Findings and Recommendations

Segment_1

Insights

Cluster A: These customers are fairly new to the company and don't have any referrals. This might be because they recently joined.
Cluster B: These customers have longer tenures but have no referrals. This indicates a possibility of inactiveness or nonchalance.
Cluster C: These customers range from those that joined newly to those with longer tenures who have more referrals. This shows that they are active and contribute significantly to the company.
Cluster D: These customers are fairly new to the company but they also have more referrals. This also shows that they are active and contribute significantly to the company.

Marketing Strategy

It's always less expensive to maintain an existing relationship than to create a new one. So, customers in Cluster C and Cluster D are valuable customers that show great potentials and should be targeted with strategies for retention.
To retain these customers and encourage repeat business, the company could reward them for bringing in referrals, and offer loyalty programs. These programs can include rewards for long-term contracts, discounts, or exclusive access to new products and services.

Segment_2

Insights

Cluster A: These customers are newer to the company but generate more revenue. This shows profitability and there are prospects for longevity .
Cluster B: These customers have longer tenures and they also generate a lot revenue. This indicates that they are high-value customers.
Cluster C: These customers range from those that newly joined to those with longer tenures, but they also generate a substantial amount of revenue. They also have the potential of being profitable customers.
Cluster D: These customers have been with the company for some time but generate a bit lower revenue. There seems to be potential for growth among these customers as most of them are fairly new to the company .

Marketing Strategy

These customers that show high level of customer lifetime value (CLV), which is a measure of the average revenue generated over their entire relationship with the company, should be provided compelling services that drive retention. These users can then be passed over with regard to acquisition campaigns, saving budget that can be better spent elsewhere.
Like this project

Posted Feb 15, 2025

I analyzed customer data in order to conduct customer segmentation based on value, utilizing the unsupervised machine learning K-means algorithm for clustering.

Interactive Excel Dashboard
Interactive Excel Dashboard
Sales Performance Analysis
Sales Performance Analysis
Customer Churn Analysis
Customer Churn Analysis

Join 50k+ companies and 1M+ independents

Contra Logo

© 2025 Contra.Work Inc