Predicting Bank Customer Churn Using Classification
Assignment - 02
RMIT University
Authors
1) Hewayalage Vishva Lahiru Kantha Abeyrathne (s3735195)
Student
RMIT University, Melbourne City Campus
s3735195@student.rmit.edu.au
2) Kodithuwakku Arachchige Iresh Udara Kaushalya (s3704769)
Student
RMIT University, Melbourne City Campus
s3704769@student.rmit.edu.au
Date of Report: 2nd of June 2019
Table of Contents
1. Introduction
2. Methodology
  2.1 Dataset
  2.2 Data Pre-processing
  2.3 Data Exploration
    2.3.1 Exploring Columns
    2.3.2 Relationship of Features with Target Feature
  2.4 Feature Selection and Ranking
  2.5 Model Fitting
    2.5.1 K-Nearest Neighbour (KNN)
    2.5.2 Hyper Parameter Tuning with K-Nearest Neighbour (KNN)
    2.5.3 Decision Tree (DT)
    2.5.4 Hyper Parameter Tuning with Decision Tree (DT)
3. Results
  3.1 Evaluation of K-Nearest Neighbour (KNN) with Default Parameters
  3.2 Evaluation of Decision Tree (DT) with Default Parameters
  3.3 Evaluation of K-Nearest Neighbour (KNN) with Hyper Parameter Tuning
  3.4 Evaluation of Decision Tree (DT) with Hyper Parameter Tuning
  3.5 Confusion Matrix of KNN and DT
  3.6 Classification Error Rate of KNN and DT
  3.7 Precision, Recall and F1-Score of KNN and DT
4. Discussion
5. Conclusion
6. References
Abstract
The main purpose of this report was to predict customer churn using data related to a particular bank and its customers with the support of classification. The dataset was obtained from the Kaggle repository, and necessary pre-processing tasks were performed prior to classification. K-Nearest Neighbours and Decision Tree classifiers were comparatively evaluated with parameter tuning in order to identify the best model, and the Decision Tree classifier with parameter tuning was selected after the evaluation process. The report concludes that the selected model performs well on unseen data for customers who are not going to churn, whereas its predictive power decays for customers who are going to churn. It is recommended to feed more data on customers who are going to churn into the model in order to obtain better outcomes.
1. Introduction
With the growth of the data science and data analytics fields, their technologies have become immensely popular across many domains and industries. Utilizing the capabilities of data science is even more valuable for the banking domain, since banks deal with vast amounts of data on a daily basis. One of the main requirements of a bank is to predict customer churn in order to retain its most valuable customers without allowing them to move to another bank. Churn prediction has been an active research topic in recent years due to the increasing demand from the banking sector [1-2]. This report discusses a project that aims to identify the better classification model out of K-Nearest Neighbours and Decision Trees for predicting which customers are going to churn.
2. Methodology
The main steps of the model-building procedure are data collection, data pre-processing, training with two classification algorithms, and evaluation of the built models using test data. The methodology is divided into subsections: section 2.1 describes the dataset, section 2.2 covers the necessary data pre-processing steps, section 2.3 explores the data and feature relationships using several visualizations, and sections 2.4 and 2.5 discuss feature ranking and the classification tasks using the selected algorithms. Finally, the algorithms are evaluated using test data to identify the model best suited to the problem.
2.1 Dataset
The dataset relates to the banking domain, as the main goal of this project is to predict customer churn using classification. It was obtained from the Kaggle repository, which is popular in data science research. It consists of 10,000 observations with 14 columns, where each column represents data related to a customer. 'Exited' is the binary target feature of the dataset, stating whether a customer is going to churn or not. All the attributes, their descriptions and their possible ranges are shown below in Table 2.1.
2.2 Data Pre-processing
Python pandas functions were used to search for unnecessary whitespace and data entry errors in the categorical variables as the very first step of data pre-processing. In the next step, missing values were checked for each column, and none were found in the dataset. Summary statistics were used to detect outliers or impossible values in the numerical attributes. Finally, the attributes Row Number, Customer ID and Surname were removed from the dataset, since they have no impact on the final prediction results.
Table 2.1. Description of the dataset
Attribute | Description | Possible Ranges
Row Number (Integer) | Row number in the dataset | 1 – 10,000
Customer ID (Integer) | ID of the bank customer | Random numbers
Surname (Character) | Surname of the customer |
Credit Score (Integer) | Score based on customer behaviour | 350 – 850
Geography (Categorical) | Country of the respective customer | France, Germany, Spain
Gender (Categorical) | Gender of the customer | Male, Female
Age (Integer) | Age of the customer | 18 – 92
Tenure (Integer) | Period of having the account, in months | 0 – 10
Balance | Balance of the customer's account | 0 – 25,089.09
Number of Products (Integer) | Number of accounts that the customer has | 1 – 4
Has Credit Card (Categorical) | Does the customer have a credit card | Yes - 1, No - 0
Is Active Member (Categorical) | Is the customer an active member | Yes - 1, No - 0
Estimated Salary | Estimated salary of the customer | 11.58 – 199,992.48
Exited (Categorical) | Customer is going to churn or not | Yes - 1, No - 0
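The cleaning steps in section 2.2 can be sketched with pandas as follows. The column names follow Table 2.1, but the miniature DataFrame and the exact calls are illustrative assumptions, not the authors' code:

```python
import pandas as pd

# Hypothetical miniature sample shaped like the Kaggle churn dataset.
df = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [101, 102, 103],
    "Surname": ["Smith ", " Perera", "Kim"],   # stray whitespace on purpose
    "Geography": ["France", "Spain ", "France"],
    "Exited": [1, 0, 1],
})

# 1) Trim unnecessary whitespace in the string/categorical columns.
for col in df.select_dtypes(include="object"):
    df[col] = df[col].str.strip()

# 2) Confirm there are no missing values in any column.
assert df.isna().sum().sum() == 0

# 3) Drop identifier columns that carry no predictive signal.
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])
```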
2.3 Data Exploration
2.3.1 Exploring columns
A pie chart was plotted to explore the class proportions of the target variable: 79.63% of the data relate to retained customers while the other 20.37% relate to exited customers. A bar chart showed that the majority of customers are from France, with Germany and Spain in second and third place respectively. A further bar chart showed that the majority of customers are male. Bar charts were also used to examine credit card ownership and active membership: the majority of customers are active members, and most of them hold a credit card with the bank.
Numerical columns were explored using histograms. Credit Score shows a symmetric distribution. Age shows a right-skewed histogram, indicating fewer older customers. Tenure displays an almost uniform distribution, with lower counts at 0 and 10. A symmetric distribution was observed for Balance, apart from a spike at 0 indicating customers with zero balance. The histogram of Number of Products shows that most customers have one or two accounts. Finally, the histogram of Estimated Salary shows a uniform distribution across the different salary ranges.
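The class split quoted above (79.63% retained versus 20.37% exited) can be reproduced with a one-liner; the series below is a stand-in built to match the reported counts, not the actual dataset:

```python
import pandas as pd

# Stand-in target column matching the reported 10,000-row class counts.
exited = pd.Series([0] * 7963 + [1] * 2037, name="Exited")

# Proportion of each class, as visualized in the pie chart.
proportions = exited.value_counts(normalize=True)
```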
2.3.2 Relationship of Features with Target Feature
This section focuses on the relationship of each feature with the target feature, in order to find out the impact of those features on the final outcome of the prediction.
Figure 2.3.2.1. Relationship between Categorical Features and Target Feature
As can be observed from the figure above, France has the majority of the customers dealing with the bank while Germany has the lowest proportion. The graph was plotted with the expectation that France, as the country with the most customers, would also produce the most churners. However, the relationship is almost inverse: customers from Germany have the highest probability of leaving the bank. The bank therefore needs to focus more on countries where it has fewer customers.
The majority of customers are male, so more male customers were expected to leave the bank. Surprisingly, according to the figure, female customers have a higher chance of moving to another bank. Focusing more on women, or supplying benefits for their accounts, would be a way to stop them churning.
Customers without credit cards were expected to appear more often among the churners. However, customers with a credit card have a higher probability of churning than customers who do not hold any credit cards with the bank, which is a surprising aspect of the dataset.
As can also be seen in the figure, and as expected, customers who are not active with the bank have the highest probability of leaving, and there should be a way to engage them with the bank.
Figure 2.3.2.2. Relation between Numerical Features and Target Feature
As expected, not much of a difference can be observed in the figure above for credit score: it has no clear impact on the target feature. Older customers tend to churn at a much higher rate than younger customers, which goes against the hypothesis made when plotting the graph; the bank needs to reconsider its approach to its target market.
Customers who have spent either a very short or a very long time with the bank have a higher chance of churning than customers with an average tenure, which is an expected result.
Contrary to expectation, the bank is more likely to lose customers with higher account balances, which is a negative effect for the bank; facilities or various loan options should be introduced to retain those valuable customers. Number of products does not seem to have an impact on the target feature, in line with the hypothesis. Finally, it is clear from the figure that the estimated salary of a customer does not have much predictive power towards the final outcome, as expected.
2.4 Feature Selection and Ranking
Features were ranked according to their importance, or contribution towards the final outcome, using Random Forest feature importance. The attributes 'Age', 'Estimated Salary', 'Credit Score', 'Balance' and 'Number of Products' gained the highest scores, in that order, while the other five attributes shared almost similar, lower scores. Finally, all 10 attributes were retained for the model-building phase.
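This ranking step can be sketched with scikit-learn's impurity-based importances; synthetic data stands in for the bank dataset, and the model settings are assumptions rather than the authors' configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 10 predictive attributes, as in the cleaned dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

# Impurity-based importances sum to 1; argsort yields a ranking.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important feature first
```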
2.5 Model Fitting
Since this is a classification problem, K-Nearest Neighbour (KNN) and Decision Tree (DT) classifiers were used to fit models to the banking data in order to identify the better of the two. Prior to fitting, the categorical attributes in the dataset were encoded as numerical attributes using one-hot encoding so they could be fed into the algorithms.
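One-hot encoding of the multi-valued categorical attributes can be done with `pandas.get_dummies` (a sketch; whether the authors used pandas or scikit-learn's encoder is not stated in the report):

```python
import pandas as pd

# Hypothetical slice of the two multi-valued categorical attributes.
df = pd.DataFrame({
    "Geography": ["France", "Germany", "Spain"],
    "Gender": ["Male", "Female", "Male"],
})

# Each category value becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["Geography", "Gender"])
```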
2.5.1 K-Nearest Neighbour (KNN)
As the first step of training with the KNN algorithm, it is necessary to specify an adequate k value to achieve good results. K-fold cross-validation was used to find the best k over the range 1 – 7. The optimal value was 5, as it gave the lowest misclassification error.
Figure 2.5.1.1. Misclassification error vs k.
The dataset was trained using the KNN algorithm with several train/test splits (80%:20%, 60%:40% and 50%:50%) in order to identify the best split. Default parameters were used as the initial step.
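The k selection and split evaluation described above can be sketched as follows; synthetic data stands in for the bank dataset, so the chosen k need not come out as 5 here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Choose k in 1..7 by cross-validated misclassification error.
errors = {}
for k in range(1, 8):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors[k] = 1 - scores.mean()
best_k = min(errors, key=errors.get)

# Evaluate one of the splits (80%:20%) with otherwise-default parameters.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
```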
2.5.2 Hyper Parameter Tuning with K-Nearest Neighbour (KNN)
Having identified the best train/test split with the default parameter setting, parameter values were varied in order to maximise the performance of the KNN model. Distance-based weighting was used instead of the default 'uniform' weighting, with different values of the exponent p to bring the Minkowski, Manhattan and Euclidean distance-based methods into contention (Manhattan is the Minkowski metric with p = 1, Euclidean with p = 2).
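In scikit-learn terms, this tuning amounts to switching `weights` from 'uniform' to 'distance' and searching over the Minkowski exponent `p`. The sketch below uses synthetic data, and the exact grid the authors searched is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# p=1 is Manhattan distance, p=2 Euclidean; both are special cases of
# the Minkowski metric. 'distance' weighting replaces default 'uniform'.
grid = GridSearchCV(
    KNeighborsClassifier(weights="distance"),
    param_grid={"p": [1, 2]},
    cv=5,
)
grid.fit(X, y)
```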
2.5.3 Decision Tree (DT)
After KNN, the data were separately trained using DT with default parameters and the same train/test splits (80%:20%, 60%:40% and 50%:50%) as for KNN.
2.5.4 Hyper Parameter Tuning with Decision Tree (DT)
The default parameter values were changed so that the maximum tree depth was 3, the minimum number of samples for a leaf node was 5, and the minimum number of samples for a split was 2.
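With scikit-learn, those values map directly onto `DecisionTreeClassifier` arguments (a sketch on synthetic data, not the authors' script):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Parameter values quoted in the report.
tree = DecisionTreeClassifier(
    max_depth=3, min_samples_split=2, min_samples_leaf=5, random_state=0)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)
```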
3. Results
Evaluation results of both the classification models are discussed in this section.
3.1 Evaluation of K-Nearest Neighbour (KNN) with default parameters
Table 3.1.1. Classification Results of KNN model
Classification Model | Train/Test Split | Accuracy
KNN | 50%:50% | 75.10%
KNN | 60%:40% | 75.63%
KNN | 80%:20% | 76.45%
Maximum accuracy was produced by the 80%:20% train/test split, at 76.45%.
3.2 Evaluation of Decision Tree (DT) with default parameters
Table 3.2.1. Classification Results of DT model
Classification Model | Train/Test Split | Accuracy
DT | 50%:50% | 78.42%
DT | 60%:40% | 78.23%
DT | 80%:20% | 79.50%
The 80%:20% train/test split provided the maximum accuracy for DT as well, at 79.5%.
3.3 Evaluation of K-Nearest Neighbour (KNN) with Hyper Parameter Tuning
Table 3.3.1. Classification Results of KNN model with parameter tuning
Classification Model | Accuracy
KNN (Minkowski) | 74.0%
KNN (Manhattan) | 74.7%
KNN (Euclidean) | 74.0%
With parameter tuning, the Manhattan distance method provided the best result of 74.7% compared to the other distance-based methods, but it was not able to surpass the accuracy obtained with the default parameter settings.
3.4 Evaluation of Decision Tree (DT) with Hyper Parameter Tuning
Table 3.4.1. Classification Results of DT model with parameter tuning
Classification Model | Parameters | Accuracy
DT | max depth = 3, min samples split = 2, min samples leaf = 5 | 84.25%
With the changed parameter values, the DT algorithm performed well on this dataset, with a higher accuracy of 84.25%.
3.5 Confusion matrix of KNN and DT
A confusion matrix is a measurement used to assess the performance of a model.
[[1430 65]
[ 355 50]]
The confusion matrix above was obtained from the KNN model with default parameters, which provided the best accuracy for that model. From the matrix, the model predicts class 0 correctly 1430 times while misclassifying 0 as 1 65 times. The model performs poorly on class 1, predicting it correctly only 50 times while misclassifying 1 as 0 355 times.
[[1576 19]
[ 296 109]]
The matrix above shows the results of the DT model with hyperparameter tuning, which had the best accuracy for that model. The results make it clear that DT also predicts class 0 with great accuracy, while its prediction of class 1 is rather poor.
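As a quick consistency check, the accuracy implied by this matrix reproduces the 84.25% reported for the tuned DT in section 3.4:

```python
# Tuned DT confusion matrix from the report:
# rows = actual class, columns = predicted class.
tn, fp = 1576, 19   # actual 0: predicted 0, predicted 1
fn, tp = 296, 109   # actual 1: predicted 0, predicted 1

accuracy = (tn + tp) / (tn + fp + fn + tp)  # 1685 / 2000
```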
3.6 Classification Error Rate of KNN and DT
In the simplest terms, the classification error rate is 1 – accuracy. The best accuracies of the KNN and DT models assessed in the previous sections were 76.45% and 84.25% respectively. Therefore, the error rate of the KNN model is 23.55% whereas DT has an error rate of 15.75%.
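The arithmetic behind those figures is a one-line subtraction:

```python
# Error rate = 1 - accuracy, using the best accuracies reported above.
knn_error = 1 - 0.7645   # KNN with default parameters
dt_error = 1 - 0.8425    # DT with hyperparameter tuning
```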
3.7 Precision, Recall and F1-Score of KNN and DT
Precision, Recall and the F1-score are measures used to interpret model performance. Precision is the ratio of correctly predicted positive observations to all predicted positive observations. Recall is the ratio of correctly predicted positive observations to all observations actually in the positive class. The F1-score is a weighted average of precision and recall.
Figure 3.7.1. Performance measurements of optimized KNN
Figure 3.7.2. Performance measurements of optimized DT
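These measures can be computed by hand for class 1 (the churners) from the tuned DT confusion matrix in section 3.5, which makes the low recall behind the figures concrete:

```python
# Class 1 counts from the tuned DT confusion matrix.
tp, fp, fn = 109, 19, 296

precision = tp / (tp + fp)  # predicted churners who actually churned
recall = tp / (tp + fn)     # actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)
```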
Having analysed all the performance measurements, it can be stated that both algorithms work well for class 0, while neither is ideal for predicting class 1, though DT has some capability there over KNN.
4. Discussion
Having assessed the results of both classification models, the Decision Tree classifier can be selected as the best model for this project, with an accuracy of 84.25% surpassing the 76.45% accuracy of the K-Nearest Neighbour classifier. This can be considered a reasonably high accuracy given the data. The only drawback of the model is its inability to predict class 1 with high accuracy; the low recall value for that class, compared to class 0, confirms it. Class 1 of the target feature relates to customers who are going to churn, so this model works well for predicting the customers who are not going to move from this bank to another.
5. Conclusion
In this report, a comparative study between the K-Nearest Neighbours classifier and the Decision Tree classifier was carried out to predict which customers are going to move to another bank, using banking data from the Kaggle repository. The Decision Tree model with parameter tuning was selected as the best model based on the accuracy results and performance measures. The selected model lacks the ability to predict customers who are going to churn, while performing well for customers who are not going to churn. This could be further improved by feeding in more data related to that particular class over time.
6. References
[1] Hadden, J., Tiwari, A., Roy, R., & Ruta, D. (2007). Computer assisted customer churn management: State-of-the-art and future trends. Computers & Operations Research, 34(10), 2902-2917.
[2] Sayed, H., Abdel-Fattah, M. A., & Kholief, S. (2018). Predicting potential banking customer churn using Apache Spark ML and MLlib packages: A comparative study. International Journal of Advanced Computer Science and Applications, 9(11), 674-677.