SQL PROJECTS

Laksmi Wulandiari

Data Analyst
PostgreSQL
Python

BANK CUSTOMER CHURN ANALYSIS

This notebook will analyze the possibility of bank customer churn based on insights gathered from the datasets. Insights from personal info such as age, gender, income and demographic location, type of card, products, their credit score and how long the customer has been with the bank. It is advantageous for the bank to know how to retain customer loyalty, rather than looking out for another customer.

IMPORT LIBRARY

Import psycopg2 for multi-threaded applications and manages its own connection pool, pandas for data analysis and manipulation, and sqlalchemy to facilitates the communication between Python programs and databases.

Dataset Explanation

RowNumber : Corresponds to the record (row) number
CustomerId : Customer's id number
Surname : Customer's surname
CreditScore : Prediction of customer credit behavior
Geography : Customer’s location
Gender : Customer's gender information
Age : Customer's age information
Tenure : Number of years the customer has been stay with the bank
Balance : Customer's balance information in their account
NumOfProducts : Number of products that a customer has purchased
HasCrCard : Whether or not a customer has a credit card
IsActiveMember : Whether or not a customer is active
EstimatedSalary : Customer's estimated salary amount
Exited : Whether or not the customer left the bank
Complain : Whether customer has complaint or not
Satisfaction Score : Customer satisfaction score against the bank
Card Type : Type of card hold by the customer
Points Earned : Points earned by the customer for using credit card

QUERY

# count total customer that is churn/exited or not 


q='''

WITH temp_churn AS(
SELECT exited,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
COUNT(exited) as Total
FROM temp_churn
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()


status total
0 Not Churn 7962
1 Churn 2038
# count how many active customer that churn


q='''

WITH temp_isactivemember AS(
SELECT exited,
CASE
WHEN isactivemember = 1 THEN 'Active'
ELSE 'Not Active'
END AS isactivemember
from customer_churn_records
)

SELECT isactivemember,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM temp_isactivemember
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df

isactivemember churn not_churn
0 Not Active 1303 3546
1 Active 735 4416
# count total customer that is churn/exited or not based on their gender


q='''

SELECT gender,
COUNT(gender) as Total,
COUNT(case when exited = 1 then 1 end) as Churn,
COUNT(case when exited = 0 then 1 end) as Not_churn
FROM customer_churn_records
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

gender total churn not_churn
0 Female 4543 1139 3404
1 Male 5457 899 4558

# count how many customer that churn based on their demographic

q='''

SELECT geography,
COUNT (CASE WHEN exited = 1 THEN 1 END) AS Churn,
COUNT (CASE WHEN exited = 0 THEN 1 END) AS Not_Churn
FROM customer_churn_records
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df

geography churn not_churn
0 Spain 413 2064
1 France 811 4203
2 Germany 814 1695

# count how many customer that churn based on their group age

q='''

SELECT
CASE
WHEN age <= 20 THEN 'Group <= 20'
WHEN age >= 21 AND age <= 40 THEN 'Group 21-40'
WHEN age >= 41 AND age <= 60 THEN 'Group 41-60'
ELSE 'Group > 60'
END AS age_category,
COUNT(CASE WHEN exited = 1 then 1 end) as Churn,
COUNT(CASE WHEN exited = 0 then 1 end) as Not_Churn
FROM customer_churn_records
GROUP BY 1
ORDER BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df

age_category churn not_churn
0 Group <= 20 5 84
1 Group 21-40 682 5648
2 Group 41-60 1236 1881
3 Group > 60 115 349

# count total customer that is churn/exited or not based on how long
# they've been as customer

q='''

WITH temp_tenure AS(
SELECT tenure,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(tenure) as Average_tenure
FROM temp_tenure
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status average_tenure
0 Not Churn 5.032781
1 Churn 4.934740

# count how many customer that churn based on the amount of credit score
# (average, highest and lowest)

q='''

WITH temp_creditscore AS(
SELECT creditscore,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(creditscore) as avg_creditscore,
MAX(creditscore) as max_creditscore,
MIN(creditscore) as min_creditscore
FROM temp_creditscore
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status avg_creditscore max_creditscore min_creditscore
0 Not Churn 651.837855 850 405
1 Churn 645.414622 850 350

# count how many customer that churn based on how many bank product they use

q='''

WITH temp_bankprod AS(
SELECT numofproducts,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(numofproducts) as avg_numofproducts
FROM temp_bankprod
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status avg_numofproducts
0 Not Churn 1.544210
1 Churn 1.475466

# count how many customer that churn based on the average score of customer
# satisfaction score towards the bank

q='''

WITH temp_satisfaction AS(
SELECT satisfaction_score,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(satisfaction_score) as satisfaction_level
FROM temp_satisfaction
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status satisfaction_level
0 Not Churn 3.017960
1 Churn 2.997547

# count how many customer that churn based on the amount of complain filed by customer

q='''

WITH temp_complain AS(
SELECT complain,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
COUNT(complain) as complain
FROM temp_complain
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status complain
0 Not Churn 7962
1 Churn 2038

# count how many customer that churn based on credit card ownership

q='''

WITH temp_hascrcard AS(
SELECT hascrcard,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
COUNT (CASE WHEN hascrcard = 1 THEN 1 END) AS has_creditcard,
COUNT (CASE WHEN hascrcard = 0 THEN 1 END) AS no_creditcard
FROM temp_hascrcard
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df

status has_creditcard no_creditcard
0 Not Churn 5630 2332
1 Churn 1425 613

# count how many customer that churn based on the their card type

q='''

WITH temp_card AS(
SELECT card_type,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
COUNT(CASE WHEN card_type = 'PLATINUM' THEN 1 END) as PLATINUM,
COUNT(CASE WHEN card_type = 'DIAMOND' THEN 1 END) as DIAMOND,
COUNT(CASE WHEN card_type = 'GOLD' THEN 1 END) as GOLD,
COUNT(CASE WHEN card_type = 'SILVER' THEN 1 END) as SILVER
FROM temp_card
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status platinum diamond gold silver
0 Not Churn 1987 1961 2020 1994
1 Churn 508 546 482 502

# count how many customer that churn based on the amount of point customer earned

q='''

WITH temp_point AS(
SELECT point_earned,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(point_earned) as avg_point,
MAX(point_earned) as max_point,
MIN(point_earned) as min_point
FROM temp_point
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status avg_point max_point min_point
0 Not Churn 607.044084 1000 119
1 Churn 604.448479 1000 163

# count how many customer that churn based on customer estimated salary

q='''

WITH temp_salary AS(
SELECT estimatedsalary,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(estimatedsalary) as avg_salary,
MAX(estimatedsalary) as max_salary,
MIN(estimatedsalary) as min_salary
FROM temp_salary
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status avg_salary max_salary min_salary
0 Not Churn 99726.853103 199992.0 90.07
1 Churn 101509.908819 199808.0 11.58

# count how many customer that churn based on the amount of customer's bank balance
# (average, highest, lowest)

q='''

WITH temp_balance AS(
SELECT balance,
CASE
WHEN exited = 1 THEN 'Churn'
ELSE 'Not Churn'
END AS STATUS
from customer_churn_records
)

SELECT STATUS,
AVG(balance) as avg_balance,
MAX(balance) as max_balance,
MIN(balance) as min_balance
FROM temp_balance
GROUP BY 1

'''

df = pd.read_sql(q,engine_postgresql)
df.head()

status avg_balance max_balance min_balance
0 Not Churn 72742.446747 221532 0
1 Churn 91109.104024 250898 0

CONCLUSION

Based on above QUERY, there are some category that can help us identify what aspects that really affected customer to churn. Even though, customer that already stay quite long, can still churn, or customer that has, let's say, fairly amount of saving on their bank account, can still churn.
Customer in the age category between 41 to 60, are more likely to churn than any other age category. To fix this problem, bank can focused on creating or enhancing their product and services that can help attract and maintain customer in certain age category, like smoother service with minimum queue time for older customer.
Customer who has credit card tends to not churn and stay with the bank. It probably best for the bank to convince more customer to apply for credit cards with various promotion, that depends on customer segmentation, which can be based on their type of card [ Diamond, Platinum, Gold, Silver ], gender, age, expenses, and demographic location.
Satisfaction score from customer who stay and churn is a little alarming [ 3.017960 / 2.997547 ]. Bank need to evaluate, to keep the satisfaction score between churn customer and customer who stay, at distance, and keep active customer which is less likely to churn.
Partner With Laksmi
View Services

More Projects by Laksmi