Data analyst

Cristian Barrientos

Python
### Project: **Customer Purchase Behavior Analysis**
#### **Objective:**
To analyze customer purchasing behavior and identify trends and insights that can help a retail company improve its marketing and sales strategies.
#### **Steps:**
### 1. **Data Collection:**
- **Dataset**: A CSV file containing transactional data with columns such as `Customer ID`, `Product ID`, `Purchase Date`, `Purchase Amount`, and `Product Category`.
- You can use publicly available datasets from platforms like Kaggle or simulate data.
```python
import pandas as pd
# Load dataset
df = pd.read_csv("customer_data.csv")
```
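If no public dataset is at hand, the "simulate data" option can be sketched as follows. This is a minimal, hypothetical generator: only the column names come from this project, while the function name `simulate_transactions`, the category labels, the date range and the gamma-distributed amounts are illustrative assumptions, not real data.

```python
import numpy as np
import pandas as pd

def simulate_transactions(n_rows=1_000, n_customers=200, seed=42):
    """Generate a synthetic transactions table with the columns used in this project."""
    rng = np.random.default_rng(seed)
    # Hypothetical category labels, chosen only for illustration
    categories = ["Electronics", "Clothing", "Home", "Beauty", "Sports"]
    start = pd.Timestamp("2023-01-01")
    return pd.DataFrame({
        # Customer IDs drawn uniformly from 1..n_customers
        "Customer ID": rng.integers(1, n_customers + 1, size=n_rows),
        "Product ID": rng.integers(1000, 1100, size=n_rows),
        # Random purchase dates within one year of the start date
        "Purchase Date": start + pd.to_timedelta(rng.integers(0, 365, size=n_rows), unit="D"),
        # Right-skewed, non-negative amounts (an assumed distribution, not real data)
        "Purchase Amount": rng.gamma(shape=2.0, scale=25.0, size=n_rows).round(2),
        "Product Category": rng.choice(categories, size=n_rows),
    })
```

Running `simulate_transactions().to_csv("customer_data.csv", index=False)` once produces a file that the `pd.read_csv` call above can load. Because the values are random, any "insights" drawn from them are only useful for testing the pipeline.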
### 2. **Data Cleaning:**
- Handle missing values, duplicates, and outliers.
```python
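# Check for missing values in each column (print so the result shows up when run as a script)
print(df.isnull().sum())
# Drop exact duplicate rows
df = df.drop_duplicates()
# Fill missing values (example for Purchase Amount); assign the result back instead of
# calling fillna(inplace=True) on a column, which newer pandas versions warn about
df['Purchase Amount'] = df['Purchase Amount'].fillna(df['Purchase Amount'].median())
# Handle outliers (example using IQR method): keep values within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
Q1 = df['Purchase Amount'].quantile(0.25)
Q3 = df['Purchase Amount'].quantile(0.75)
IQR = Q3 - Q1
# .copy() makes df an independent frame, so adding columns later does not trigger SettingWithCopyWarning
df = df[~((df['Purchase Amount'] < (Q1 - 1.5 * IQR)) | (df['Purchase Amount'] > (Q3 + 1.5 * IQR)))].copy()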
```
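The IQR rule (Tukey's fences) can be wrapped in a small reusable function so it can be applied to any numeric column and checked on a toy example. This is a sketch: the function name `remove_iqr_outliers` and the `k` multiplier parameter are hypothetical, with `k=1.5` matching the conventional fence width used above.

```python
import pandas as pd

def remove_iqr_outliers(data, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = data[column].quantile(0.25)
    q3 = data[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    values = data[column]
    # Same condition as the inline filter: drop only values strictly outside the fences
    return data[~((values < lower) | (values > upper))]
```

For the values `[10, 12, 11, 13, 12, 500]`, pandas' default linear interpolation gives Q1 = 11.25 and Q3 = 12.75, so IQR = 1.5 and the fences are 9 and 15; only the 500 row is removed.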
### 3. **Exploratory Data Analysis (EDA):**
- Compute summary statistics and create basic visualizations to understand the data.
```python
import matplotlib.pyplot as plt
import seaborn as sns
# Summary statistics
print(df.describe())
# Visualize the distribution of purchase amounts
sns.histplot(df['Purchase Amount'], bins=30)
plt.title('Distribution of Purchase Amounts')
plt.show()
# Category-wise purchase count
category_counts = df['Product Category'].value_counts()
sns.barplot(x=category_counts.index, y=category_counts.values)
plt.title('Product Category Distribution')
plt.show()
```
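The category bar chart counts transactions only. A per-category table of typical purchase amounts can complement it. The sketch below is one possible approach; the function name `category_summary` is hypothetical and the choice of statistics (count, mean, median) is an assumption.

```python
import pandas as pd

def category_summary(data):
    """Per-category transaction count, mean and median Purchase Amount, most frequent first."""
    return (
        data.groupby("Product Category")["Purchase Amount"]
            .agg(["count", "mean", "median"])
            .sort_values("count", ascending=False)
    )
```

Comparing the mean with the median per category gives a quick hint of skew: a mean well above the median suggests a few large purchases dominate that category.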
### 4. **Customer Segmentation:**
- Use the RFM (Recency, Frequency, Monetary) model to segment customers.
```python
# Recency: days between each purchase and the most recent date in the dataset
df['Purchase Date'] = pd.to_datetime(df['Purchase Date'])
max_date = df['Purchase Date'].max()
df['Recency'] = (max_date - df['Purchase Date']).dt.days
# Combine into one RFM dataframe with one row per customer, using named aggregation:
#   Recency   = days since the customer's latest purchase (smallest per-row recency)
#   Frequency = number of purchase records for the customer
#   Monetary  = total purchase amount for the customer
rfm_df = (
    df.groupby('Customer ID')
      .agg(Recency=('Recency', 'min'),
           Frequency=('Purchase Date', 'size'),
           Monetary=('Purchase Amount', 'sum'))
      .reset_index()
)
```
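Computing R, F and M values is only the first half of segmentation; a common next step is to turn each value into a quantile score and combine the scores into a segment code. The sketch below assumes a per-customer table with `Recency`, `Frequency` and `Monetary` columns; the function name `score_rfm`, the five-bin scale and the concatenated "RFM Score" code are illustrative choices, not the only way to score RFM.

```python
import pandas as pd

def score_rfm(rfm, n_bins=5):
    """Add 1..n_bins quantile scores for R, F and M plus a combined 'RFM Score' string."""
    scored = rfm.copy()
    # rank(method="first") breaks ties, so qcut can always form n_bins unique bin edges
    # (requires at least n_bins customers).
    # A smaller Recency means a more recent purchase, so its labels run from high to low.
    scored["R"] = pd.qcut(scored["Recency"].rank(method="first"), n_bins,
                          labels=list(range(n_bins, 0, -1))).astype(int)
    scored["F"] = pd.qcut(scored["Frequency"].rank(method="first"), n_bins,
                          labels=list(range(1, n_bins + 1))).astype(int)
    scored["M"] = pd.qcut(scored["Monetary"].rank(method="first"), n_bins,
                          labels=list(range(1, n_bins + 1))).astype(int)
    # e.g. "555" = recent, frequent, high-spending; "111" = lapsed, rare, low-spending
    scored["RFM Score"] = scored["R"].astype(str) + scored["F"].astype(str) + scored["M"].astype(str)
    return scored
```

Scores of 5 on all three dimensions identify the most engaged customers, while low R scores combined with high F and M scores point at valuable customers who may be lapsing.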
### 5. **Data Visualization:**
- Create visualizations to present the insights.
```python
# Top 10 customers by purchase amount
top_customers = df.groupby('Customer ID')['Purchase Amount'].sum().nlargest(10)
top_customers.plot(kind='bar')
plt.title('Top 10 Customers by Purchase Amount')
plt.show()
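# Correlation heatmap of numeric columns only; ID columns are dropped because their
# values are labels rather than measurements, and numeric_only=True skips text and date columns
sns.heatmap(df.drop(columns=['Customer ID', 'Product ID']).corr(numeric_only=True), annot=True, cmap='coolwarm')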
plt.title('Correlation Heatmap')
plt.show()
```
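A revenue-over-time chart is a natural addition when the goal is to spot trends. The sketch below only aggregates the data; the function name `monthly_revenue` is hypothetical, and the monthly granularity is an assumption.

```python
import pandas as pd

def monthly_revenue(data):
    """Total Purchase Amount per calendar month, indexed by monthly period."""
    dates = pd.to_datetime(data["Purchase Date"])
    # Group every row by the month of its purchase date and sum the amounts
    return data.groupby(dates.dt.to_period("M"))["Purchase Amount"].sum()
```

Calling `monthly_revenue(df).plot(kind='line')` and then `plt.show()` draws the trend. Months with no purchases are absent from the result rather than shown as zero, so gaps in the data appear as straight line segments.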
### 6. **Insights and Recommendations:**
- **Trend**: Identify which product categories generate the most revenue.
- **Customer Segmentation**: Categorize customers by their purchase frequency and monetary value to create targeted marketing strategies.
- **Action**: Recommend offering personalized discounts to high-frequency or high-value customers to boost loyalty.
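The first and last points above can be made concrete with two small helpers: one ranks product categories by revenue share, and one shortlists customers who are high on both frequency and monetary value. This is a sketch under assumptions: the function names are hypothetical, the per-customer table is assumed to have `Frequency` and `Monetary` columns, and the 80th-percentile cutoff is an arbitrary example threshold.

```python
import pandas as pd

def category_revenue_share(data):
    """Revenue per Product Category and its share of total revenue, largest first."""
    revenue = (data.groupby("Product Category")["Purchase Amount"]
                   .sum()
                   .sort_values(ascending=False))
    return pd.DataFrame({"Revenue": revenue, "Share": revenue / revenue.sum()})

def loyalty_candidates(rfm, quantile=0.8):
    """Customers at or above the given quantile on both Frequency and Monetary."""
    f_cut = rfm["Frequency"].quantile(quantile)  # hypothetical cutoff for "high-frequency"
    m_cut = rfm["Monetary"].quantile(quantile)   # hypothetical cutoff for "high-value"
    return rfm[(rfm["Frequency"] >= f_cut) & (rfm["Monetary"] >= m_cut)]
```

Requiring both conditions keeps the discount list short. Using `|` instead of `&` would target customers who are strong on either dimension.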
### 7. **Conclusion:**