MACHINE LEARNING
Telecom Churn Analysis
DATA SCIENCE
PROJECT
Team Info
• Ali Qaiser Syed
• Annada Prasad
• Chia Aik Lee
• Diego Jaramillo
• Kiran Devathi
• Manoj Kalamkar
• Praveen Parvataneni
• Subrata Roy
• Ravi Bodkai
• Vasudev Pendyala
Introduction
Domain: Telecom
Topic: Telecom Churn Analysis
Churn (the loss of customers to competitors) is a problem for telecom companies because
acquiring a new customer is expensive, so companies want to retain their existing
customers. Most telecom companies suffer from voluntary churn. Churn rate has a strong
impact on customer lifetime value because it affects both the length of service
and the company's future revenue. For example, a company with a 25% annual churn rate
has an average customer lifetime of 4 years (1/0.25); similarly, a company with a churn
rate of 50% has an average customer lifetime of 2 years.
In the targeted approach, the company tries to identify in advance the customers who are
likely to churn and then targets them with special programs or
incentives. This approach can cause large losses if churn predictions are
inaccurate, because the firm then wastes incentive money on customers who would
have stayed anyway.
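The lifetime figures quoted in the introduction follow from treating churn as a constant yearly rate, in which case the expected customer lifetime is 1 / churn rate. A minimal R sketch of that arithmetic; the helper name `avg_lifetime_years` is ours and is not part of the project code:

```r
# Expected customer lifetime (in years) under a constant annual churn rate.
# avg_lifetime_years is a hypothetical helper used only for illustration.
avg_lifetime_years <- function(annual_churn_rate) {
  stopifnot(all(annual_churn_rate > 0), all(annual_churn_rate <= 1))
  1 / annual_churn_rate
}

lifetimes <- avg_lifetime_years(c(0.25, 0.50))
print(lifetimes)  # 4 and 2 years, the two cases quoted in the introduction
```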
Project Objective
• Predict customer churn.
• Highlight the main variables/factors influencing customer churn.
• Use various ML algorithms to build prediction models, and evaluate the
accuracy and performance of these models.
• Find the best model for our business case and provide an executive
summary.
Dataset Description
• Source: provided by UpX Academy for the
data science machine learning project
evaluation.
• The source dataset is a txt file with comma-separated values.
• The dataset contains 4617 rows and 21
columns.
• There are no missing values in the
provided input dataset.
• churn_status is the variable that
indicates whether a particular customer has
churned. It is the only variable our
models are built to predict.
Column Names          Type
State                 factor
Account_Len           integer
Area                  integer
Ph_No.                factor
Int_Plan              factor
Vmail_Plan            factor
messgs                integer
tot_day_mins          number
tot_day_calls         integer
tot_day_chrgs         number
tot_evening_mins      number
tot_evening_calls     integer
tot_evening_chrgs     number
tot_ngt_mins          number
tot_ngt_calls         integer
tot_ngt_chrgs         number
tot_int_mins          number
tot_int_calls         integer
tot_int_chrgs         number
cust_calls_made       integer
churn_status          factor
Model Building Steps
Define Problem -> Data Collection -> Data Preparation -> Perform EDA ->
Feature Selection -> Build Models -> Validate & Measure Models Performance ->
Improve Models Performance -> Execute Models for Prediction ->
Select the Best Fit Model for our Business Case
Steps Description
Define Problem
In this project, we address the customer churn problem for a fictitious
US telecom company. UpX Academy has provided the usage patterns of 4617
customers over a period of time, along with whether each customer
has churned.
Using this input data, we have to build a model that can predict, well in advance,
which customers are going to churn.
Data Collection
The input data was provided by UpX Academy in txt format with comma-separated
values. We read the data into an R workbook, taking care
not to lose any data. To load the data into R we
used the read.table command with sep = ','. Because the column names were not part of the input
dataset and were provided separately, we also passed them to
read.table through col.names.
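The loading step described above can be sketched as follows. This is an illustration under assumptions, not the project's script: the two data rows are made up to stand in for the real file, and the object names `churn_cols`, `sample_txt` and `churn` are ours. The column names are the ones listed in the dataset description.

```r
# Column names as listed in the dataset description (they are not in the file).
churn_cols <- c("State", "Account_Len", "Area", "Ph_No.", "Int_Plan", "Vmail_Plan",
                "messgs", "tot_day_mins", "tot_day_calls", "tot_day_chrgs",
                "tot_evening_mins", "tot_evening_calls", "tot_evening_chrgs",
                "tot_ngt_mins", "tot_ngt_calls", "tot_ngt_chrgs",
                "tot_int_mins", "tot_int_calls", "tot_int_chrgs",
                "cust_calls_made", "churn_status")

# Two made-up rows standing in for the real file, which has no header line.
sample_txt <- paste(
  "KS,128,415,382-4657,no,yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.70,1,False.",
  "OH,107,415,371-7191,no,yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.70,1,False.",
  sep = "\n")

# With the real file, pass its path as the first argument instead of text = sample_txt.
churn <- read.table(text = sample_txt, sep = ",", header = FALSE,
                    col.names = churn_cols, stringsAsFactors = TRUE,
                    strip.white = TRUE)

str(churn)            # structure: factor vs. integer/numeric columns
print(dim(churn))     # rows x columns
print(anyNA(churn))   # TRUE if any value is missing
```

Checking `dim()` and `anyNA()` right after loading is a quick way to confirm that no rows or fields were lost on the way in.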
Data Preparation
Perform general housekeeping: check the completeness of
the content, look at the dimensions and structure of the input dataset, peek into
the data, summarize it, and get a snapshot of all the features. Check whether
any customers have missing data; in our case there is no missing
data.
As a final step, transform the data into a format ready for the algorithms
and modeling.
Perform EDA
Perform EDA (exploratory data analysis) to understand the data and its
applicability to the problem. Understand how the features are
interrelated and correlated, and evaluate the presence of outliers and their effects. Here we
use box plots and bar plots to understand which features
have the greatest impact on our outcome variable ‘churn_status’. The following
slides show the outcomes of our EDA.
Survival Analysis (Cox Hazard Method)
Survival analysis methods examine the
time it takes for a particular event to occur. For our business problem, the event is
an existing customer churning. One of the
input variables, Account_Len (in days), denotes how long
a particular customer stayed before he/she churned, so our input data is well suited to
survival analysis.
Create a Survival Object & Cox Model:
library(survival)  # provides Surv() and coxph()
# Event = churned; this assumes churn_status was recoded so that 2 marks a churned customer
train_survival$survival <- Surv(train_survival$Account_Len, train_survival$churn_status == 2)
results <- coxph(survival ~ Int_Plan + Vmail_Plan + messgs + tot_day_calls + tot_day_chrgs +
                 tot_day_mins + tot_int_calls + tot_int_chrgs + tot_ngt_chrgs +
                 tot_int_mins + cust_calls_made, data = train_survival)
Now let's see what the summary statistics from the Cox model tell us:
                       coef  exp(coef)  se(coef)      z       p
Int_Plan yes       1.29e+00   3.64e+00  1.12e-01  11.53  <2e-16
Vmail_Plan yes    -1.74e+00   1.75e-01  5.54e-01  -3.14  0.0017
messgs             3.38e-02   1.03e+00  1.68e-02   2.02  0.0438
tot_day_calls      1.14e-03   1.00e+00  2.50e-03   0.46  0.6480
tot_day_chrgs      8.29e+00   3.97e+03  1.73e+01   0.48  0.6314
tot_day_mins      -1.40e+00   2.47e-01  2.94e+00  -0.48  0.6336
tot_int_calls     -6.52e-02   9.37e-01  2.26e-02  -2.89  0.0039
tot_int_chrgs     -1.46e+01   4.47e-07  1.79e+01  -0.82  0.4131
tot_ngt_chrgs      6.75e-02   1.07e+00  2.17e-02   3.11  0.0019
tot_int_mins       4.03e+00   5.64e+01  4.82e+00   0.84  0.4032
cust_calls_made    3.07e-01   1.36e+00  3.16e-02   9.72  <2e-16
Note: From the summary we can see that the most statistically significant
coefficients are Int_Plan yes and cust_calls_made. We have also checked the
proportional hazards assumption for these (see the R code).
Now let's plot the survival curve for our customers.
The figure shows that for the first
50-70 days there is no customer churn; that is,
we retain all of our customers
in the first 50-70 days. Over the next 50-70 days the
share of customers who have not churned falls from
100% to approximately 80%. It then decreases
considerably, so that after 200 days of
tenure the share of customers who have not churned is
down to almost 40%. In other words, by the end of
a 200-day tenure, 60% of the
customers who joined on day 1 have churned.
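Retention percentages like the ones read off the survival plot can be computed with a Kaplan-Meier estimate. The sketch below is an assumption about how such a curve might be built, since the deck does not show its plotting code. The `toy` data frame and its `churned` column are synthetic stand-ins, so the printed numbers will not match the slide.

```r
library(survival)  # Surv(), survfit()

set.seed(42)
n <- 300
# Synthetic stand-in for the training data: tenure in days and a 0/1 churn flag.
toy <- data.frame(
  Account_Len = sample(1:240, n, replace = TRUE),
  churned     = rbinom(n, size = 1, prob = 0.15)
)

# Kaplan-Meier estimate of the share of customers not yet churned over tenure.
km_fit <- survfit(Surv(Account_Len, churned) ~ 1, data = toy)

# Estimated share not churned at a few tenures (days); values depend on the toy data.
km_at <- summary(km_fit, times = c(50, 100, 200))
print(data.frame(day = km_at$time, not_churned = round(km_at$surv, 3)))

# plot(km_fit, xlab = "Tenure (days)", ylab = "Share not churned")
```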
Main Observations from our EDA
• Usage by churned customers, mainly minutes
(Day/Evening/Night/Int) and the charges incurred, is higher than for customers who
did not churn. Churned customers therefore appear to be relatively high-value
customers and more business relevant.
• The box plots and bar plots complement each other in their findings.
• The box plot for customer calls made gives strong evidence that customers who go on
to churn call Customer Care many times before churning.
• There are no missing values in our input dataset.
• Survival analysis confirms that Customer Calls Made and International
Plan are the most influential features. Total Day Mins Used is also
suggested by our traditional EDA methods.
Feature Selection
After EDA, we can identify the major features/variables that
significantly influence customer churn, and we use these features for
model preparation. We can also drop the features that do not, or may not, influence our
outcome variable.
Build Models
Before building models, we split our input data into 3 datasets
in the proportions 60 (train) : 20 (test) : 20 (pred, for final prediction).
As our business case calls for a classification model, we build models with
the ML algorithms that can be applied to classification problems. Our
approach is to take the ML algorithms one by one, validating and
improving the outcome of each as we go.
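A 60:20:20 split of this kind can be sketched in base R as below. The seed and the index-shuffling approach are our assumptions; the deck does not show how its split was made, and the set sizes it reports differ slightly from the ones this sketch produces. The one-column `churn` data frame is only a stand-in for the loaded dataset.

```r
set.seed(123)  # arbitrary seed so the split is reproducible

# Stand-in for the loaded dataset: only the row count matters for this sketch.
churn <- data.frame(id = seq_len(4617))

n        <- nrow(churn)
shuffled <- sample(n)          # random permutation of the row indices
n_train  <- floor(0.6 * n)
n_test   <- floor(0.2 * n)

train <- churn[shuffled[1:n_train], , drop = FALSE]
test  <- churn[shuffled[(n_train + 1):(n_train + n_test)], , drop = FALSE]
pred  <- churn[shuffled[(n_train + n_test + 1):n], , drop = FALSE]

print(c(train = nrow(train), test = nrow(test), pred = nrow(pred)))
```

Shuffling the indices once and slicing them keeps the three sets disjoint, so no customer appears in more than one of train, test and pred.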
Build Models
Logistic Regression
logit.model = glm(churn_status ~ Account_Len + tot_day_chrgs +
                  tot_evening_chrgs + tot_ngt_chrgs +
                  tot_int_chrgs + cust_calls_made, family = "binomial", data = train)
Validate & Measure
Now let's test the model built above on the test dataset. As the model is a logistic
regression, we assume that if the predicted probability of churn for an observation
is > 0.5 the customer has churned, and if it is < 0.5
the customer has not churned. Let's validate and measure this assumption.
Logit Pred   Actual False.   Actual True.
False.                 741            115
True.                   13             13
Specificity   98.27
Accuracy      85.48
Sensitivity   10.15
Improve Model Performance
Varying the Threshold Value
In the previous step we used a threshold of 0.5 to decide whether a customer has
churned, depending on whether the predicted probability is above 0.5. Let's vary
the threshold a little and check whether the performance of the model improves.
After testing other values (0.4 and 0.3), we found that the performance
of the overall model increases when the threshold is set to 0.3. Please
refer to the attached R code for more details.
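The effect of moving the threshold can be illustrated on synthetic data. This is a sketch, not the attached project code: the helper `metrics_at`, the `toy` data and the single-predictor model are all assumptions made for illustration, and for brevity the model is scored on the same rows it was fitted on.

```r
# Confusion-matrix metrics at a given probability threshold.
# actual: logical vector (TRUE = churned); prob: predicted churn probabilities.
metrics_at <- function(actual, prob, threshold) {
  predicted <- prob > threshold
  tp <- sum(predicted & actual)
  tn <- sum(!predicted & !actual)
  fp <- sum(predicted & !actual)
  fn <- sum(!predicted & actual)
  c(threshold   = threshold,
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),   # share of churners caught
    specificity = tn / (tn + fp))   # share of non-churners left alone
}

set.seed(7)
n <- 1000
# Synthetic data: churn becomes more likely as cust_calls_made grows.
cust_calls_made <- rpois(n, lambda = 1.5)
churned <- rbinom(n, 1, plogis(-3 + 0.6 * cust_calls_made))
toy <- data.frame(churned, cust_calls_made)

fit  <- glm(churned ~ cust_calls_made, family = "binomial", data = toy)
prob <- predict(fit, newdata = toy, type = "response")

sweep <- t(sapply(c(0.5, 0.4, 0.3),
                  function(th) metrics_at(toy$churned == 1, prob, th)))
print(round(sweep, 3))
```

Lowering the threshold can only flag more customers as churners, so sensitivity never falls and specificity never rises; whether accuracy improves depends on the data.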
Final Prediction
We make the final prediction on the pred dataset with a threshold of 0.3 and calculate the
logit model's performance one last time.
Logit Pred   Actual False.   Actual True.
False.                 756             86
True.                   67             41
Specificity   91.85
Accuracy      84.21
Sensitivity   32.28
Build Models
K-nearest neighbour
knn_model <- knn(train = knn_train, test = knn_test, cl = train$churn_status, k = 53)
k = 53 was chosen as the square root of the number of training observations (sqrt(2785) ≈ 52.8).
Validate & Measure
Validating the above model against our test dataset, let's see what the confusion
matrix and other performance parameters tell us:
Knn.Pred     Actual False.   Actual True.
False.                 752            111
True.                    2             17
Specificity   99.73
Accuracy      87.55
Sensitivity   13.28
Improve Model Performance
-> Use the elbow method to identify the
optimum value of k.
-> Normalize the data to suppress
scale irregularities, as kNN works on the
distance between nearest
neighbors.
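The two improvements listed above can be sketched together. This is an illustration under assumptions: the data is synthetic, `min_max` is a helper we define here, and the scan over k is scored on held-out rows of the toy data rather than on the project's test set.

```r
library(class)  # knn()

# Min-max scaling to [0, 1], using the training range for both sets.
min_max <- function(x, lo, hi) (x - lo) / (hi - lo)

set.seed(11)
n <- 400
# Synthetic features on very different scales (minutes vs. call counts).
toy <- data.frame(
  tot_day_mins    = runif(n, 0, 350),
  cust_calls_made = rpois(n, 1.5)
)
toy$churn_status <- factor(ifelse(toy$tot_day_mins > 250 | toy$cust_calls_made > 3,
                                  "True.", "False."))

idx   <- sample(n, 300)                      # rows used for training
feats <- c("tot_day_mins", "cust_calls_made")
lo <- sapply(toy[idx, feats], min)
hi <- sapply(toy[idx, feats], max)

knn_train <- as.data.frame(Map(min_max, toy[idx,  feats], lo, hi))
knn_test  <- as.data.frame(Map(min_max, toy[-idx, feats], lo, hi))

# Rule-of-thumb starting point: k close to sqrt(number of training rows).
k0 <- round(sqrt(nrow(knn_train)))

# Elbow-style scan: held-out error rate for a range of odd k values.
ks  <- seq(1, 2 * k0 + 1, by = 2)
err <- sapply(ks, function(k) {
  p <- knn(train = knn_train, test = knn_test, cl = toy$churn_status[idx], k = k)
  mean(p != toy$churn_status[-idx])
})
print(data.frame(k = ks, error = round(err, 3)))
```

Without scaling, `tot_day_mins` (hundreds) would dominate the distance over `cust_calls_made` (single digits). The "elbow" is the k beyond which the error stops improving noticeably.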
Final Prediction
After applying the above improvement techniques, let's calculate the final model
performance of the kNN algorithm on the pred dataset.
Knn.Pred     Actual False.   Actual True.
False.                 805             84
True.                   18             43
Specificity   92.04
Accuracy      89.26
Sensitivity   33.85
Build Models
Decision Trees
tree_model <- tree(churn_status ~ . - State - Area - Ph_No., data = train)
Classification tree: tree
Variables actually used in tree construction:
"tot_day_mins" "cust_calls_made" "Int_Plan" "tot_evening_mins" "Vmail_Plan"
"tot_int_calls" "tot_int_mins" "tot_ngt_mins"
Number of terminal nodes: 15
Validate & Measure
Validating the above model against our test dataset, let's see what the confusion
matrix and other performance parameters tell us:
Tree.Pred    Actual False.   Actual True.
False.                 739             38
True.                   15             90
Specificity   98.01
Accuracy      93.99
Sensitivity   70.31
Improve Model Performance
-> Pruning, so that the model doesn't overfit the
dataset.
tree_validate <- cv.tree(object = tree_model, FUN = prune.misclass)
plot(x = tree_validate$size, y = tree_validate$dev, type = "b")
From the plot one can see that tree_validate$dev is unchanged
for tree sizes 11-14, so we can take the best tree
size to be 12 (rather than the original 15) at the
cost of some bias.
Final Prediction
After applying the above improvement technique, let's calculate the
final model performance of the decision tree algorithm on the pred dataset.
Tree Pred    Actual False.   Actual True.
False.                 801             44
True.                   22             83
Specificity   97.32
Accuracy      93.05
Sensitivity   65.35
Build Models
Random Forest
library(randomForest)
random_model <- randomForest(as.factor(churn_status) ~ Account_Len
    + Int_Plan + Vmail_Plan + tot_day_mins + tot_evening_mins + tot_ngt_mins
    + tot_int_mins + cust_calls_made, data = train, ntree = 2000,
    importance = TRUE)
Please note that the variable importance plot also complements our earlier
findings from EDA.
Validate & Measure
Validating the above model against our test dataset, let's see what the confusion matrix
and other performance parameters tell us:
Ran.Pred     Actual False.   Actual True.
False.                 746             44
True.                    8             84
Specificity   98.93
Accuracy      94.10
Sensitivity   64.62
Improve Model Performance
-> Performance boosting by cross-validation with k = 10 folds.
library(caret)  # trainControl() and train()
train_control <- trainControl(method = "cv", number = 10)
random_model_train <- train(as.factor(churn_status) ~ Account_Len + Int_Plan + Vmail_Plan
    + tot_day_mins + tot_evening_mins + tot_ngt_mins
    + tot_int_mins + cust_calls_made, data = train, method = "rf",
    metric = "Accuracy", trControl = train_control)
Final Prediction
After applying the above improvement technique, let's calculate the
final model performance of the Random Forest algorithm on the pred dataset.
Ran Pred     Actual False.   Actual True.
False.                 808             47
True.                   15             80
Specificity   98.17
Accuracy      93.05
Sensitivity   62.99
Build Models
Support Vector Machine
library(e1071)  # svm() and tune()
# The slide omits the right-hand side of the formula; "~ ." (all columns) is assumed here.
svm_model <- svm(churn_status ~ ., data = train, kernel = 'radial', gamma = 1, cost = 100)
SVM-Type: C-classification   SVM-Kernel: radial
cost: 100
gamma: 1
Number of Support Vectors: 2785
Validate & Measure
Validating the above model against our test dataset, let's see what the confusion matrix
and other performance parameters tell us:
SVM.Pred     Actual False.   Actual True.
False.                 754            128
True.                    0              0
Specificity   100
Accuracy      85.49
Sensitivity   0
Improve Model Performance
-> Using cross-validation to find the best values for gamma and cost.
svm.tune <- tune(svm, churn_status ~ Account_Len + Int_Plan + Vmail_Plan
    + tot_day_mins + tot_evening_mins + tot_ngt_mins
    + tot_int_mins + cust_calls_made, data = train, kernel = 'radial',
    ranges = list(cost = c(0.1, 1, 10, 100, 1000), gamma = c(0.5, 1, 2, 3, 4)))
Final Prediction
After applying the above improvement technique, let's calculate the
final model performance of the SVM algorithm on the pred dataset.
SVM Pred     Actual False.   Actual True.
False.                 813             59
True.                   10             68
Specificity   98.78
Accuracy      92.74
Sensitivity   55.54
Select the Best Fit Model
In the past few slides, for our classification business problem, we created
several models, measured their performance, applied various improvement
techniques, and at the end of each step arrived at the best model for each
ML algorithm. With the performance results in hand,
let's select the model that best suits our business case.
              Logit    kNN   Decision Tree   Random Forest    SVM
Accuracy      84.21  89.26           93.05           93.05  92.74
Specificity   91.85  92.04           97.32           98.17  98.78
Sensitivity   32.28  33.85           65.35           62.99  55.54
The Decision Tree (CART) and Random Forest models suit our business
scenario best.
Summary
• The customers who churned were high-value customers; their usage
was higher than that of customers who did not churn.
• The survival analysis suggests that we retain all our
customers for the first 50-70 days, about 80% after the next 50-70 days,
and only about 40% by the end of 200 days.
• Customers who have an International Plan or who call Customer Care
with service-related queries churn more. These two factors are the most
influential for churn. One possible measure is for the
company to keep track of customers who call Customer Care
frequently and place them in a “Possible to be Churned Customers in future” pool,
so that their issues can be prioritized and addressed rapidly. The same
check can be applied to International Plan customers.
• Of the models developed to predict which customers are going to
churn, the Decision Tree (CART) and Random Forest models suit our
business case best. The two models perform almost identically: the accuracy
of both is 93.05%. The difference is that
RF correctly identifies 62.99% of the customers who churn (sensitivity)
with a fall-out of 1.83%; that is, it also incorrectly flags 1.83% of
the customers who do not churn as churned.
The Decision Tree (CART) model correctly identifies 65.35% of the
customers who churn, but incorrectly flags 2.68% of the customers who do not churn.
The choice therefore depends on the business: if it wants
higher predictive power at the cost of a somewhat higher fall-out, the Decision Tree is
the better choice; if it wants to minimize the error rate, RF is the best bet.
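The fall-out figures quoted in the summary are the complement of specificity (fall-out = 100 - specificity, in %). A short R sketch that rebuilds the comparison from the reported final-prediction metrics; the data frame name `results` and the column `fall_out` are ours:

```r
# Final-prediction metrics (in %) as reported in the model comparison table.
results <- data.frame(
  model       = c("Logit", "kNN", "Decision Tree", "Random Forest", "SVM"),
  accuracy    = c(84.21, 89.26, 93.05, 93.05, 92.74),
  specificity = c(91.85, 92.04, 97.32, 98.17, 98.78),
  sensitivity = c(32.28, 33.85, 65.35, 62.99, 55.54)
)

# Fall-out (false positive rate) is the complement of specificity.
results$fall_out <- round(100 - results$specificity, 2)

# Rank by sensitivity, the share of actual churners each model catches.
print(results[order(-results$sensitivity), ])
```

Sorted this way, the Decision Tree leads on sensitivity while Random Forest has the lower fall-out of the two, which is the trade-off the summary describes.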
In God we trust, all others must bring Data…..

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 

Último (20)

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 

Telecom Churn Analysis

  • 1. MACHINE LEARNING Telecom Churn Analysis DATA SCIENCE PROJECT
  • 2. Team Info: Ali Qaiser Syed, Annada Prasad, Chia Aik Lee, Diego Jaramillo, Kiran Devathi, Manoj Kalamkar, Praveen Parvataneni, Subrata Roy, Ravi Bodkai, Vasudev Pendyala
  • 3. Domain: Telecom. Topic: Telecom Churn Analysis. Churn (loss of customers to competition) is a problem for telecom companies because acquiring a new customer is expensive, so companies want to retain their existing customers. Most telecom companies suffer from voluntary churn. Churn rate has a strong impact on customer lifetime value because it affects the length of service and the company's future revenue. For example, a company with a 25% annual churn rate has an average customer lifetime of 4 years; similarly, a company with a 50% churn rate has an average customer lifetime of 2 years. In the targeted approach, the company tries to identify in advance the customers who are likely to churn and then targets them with special programs or incentives. This approach can cause large losses if churn predictions are inaccurate, because the firm then wastes incentive money on customers who would have stayed anyway.
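The lifetime figures quoted on this slide follow from the usual rule of thumb that, under a constant churn rate, the average customer lifetime is the reciprocal of that rate. A minimal sketch, assuming a constant annual rate (the function name `avg_lifetime` is illustrative and not from the project code):

```r
# Average customer lifetime (in years) implied by a constant annual churn rate.
avg_lifetime <- function(churn_rate) {
  stopifnot(all(churn_rate > 0), all(churn_rate <= 1))
  1 / churn_rate
}

avg_lifetime(c(0.25, 0.50))  # 4 and 2 years, the two examples on the slide
```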
  • 4. Introduction. Project Objective: (1) predict customer churn; (2) highlight the main variables/factors influencing customer churn; (3) use various ML algorithms to build prediction models and evaluate their accuracy and performance; (4) identify the best model for our business case and provide an executive summary.
  • 5. Dataset Description: The source data was provided by UpX Academy for the data science machine learning project evaluation. The source dataset is a txt file with comma-separated values. It contains 4617 rows and 21 columns and has no missing values. churn_status is the variable that indicates whether a particular customer has churned, and it is the only variable our models are built to predict. Column names and types: State (factor), Account_Len (integer), Area (integer), Ph_No. (factor), Int_Plan (factor), Vmail_Plan (factor), messgs (integer), tot_day_mins (number), tot_day_calls (integer), tot_day_chrgs (number), tot_evening_mins (number), tot_evening_calls (integer), tot_evening_chrgs (number), tot_ngt_mins (number), tot_ngt_calls (integer), tot_ngt_chrgs (number), tot_int_mins (number), tot_int_calls (integer), tot_int_chrgs (number), cust_calls_made (integer), churn_status (factor).
  • 6. Model Building Steps: (1) Define Problem; (2) Data Collection; (3) Data Preparation; (4) Perform EDA; (5) Feature Selection; (6) Build Models; (7) Validate & Measure Model Performance; (8) Improve Model Performance; (9) Execute Models for Prediction; (10) Select the Best-Fit Model for our Business Case.
  • 7. Steps Description. Define Problem: In this project we address the customer churn problem for a fictitious US telecom company. UpX Academy has provided the usage patterns of 4617 customers over a period of time, along with whether each customer churned. Using this input data, we have to build a model that can predict well in advance which customers are going to churn. Data Collection: The input data was provided by UpX Academy in txt format with comma-separated values. We read the data into an R workbook, taking care not to lose any data. To load the data we used the read.table command with sep = ','. Because the column names were not part of the input dataset and were provided separately, we also passed them to read.table through col.names.
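The loading step described above can be sketched as follows. The two rows and the three-column subset are made-up stand-ins for the real file, which is not reproduced here; only `read.table`, `sep` and `col.names` come from the slide.

```r
# Two made-up rows in the same comma-separated, header-less layout
# (hypothetical values; only three of the 21 columns are shown).
raw_text  <- "KS,128,no\nOH,107,yes"
col_names <- c("State", "Account_Len", "Int_Plan")

churn_demo <- read.table(text = raw_text, sep = ",", header = FALSE,
                         col.names = col_names, stringsAsFactors = TRUE)
str(churn_demo)  # State and Int_Plan are read as factors, Account_Len as integer
```

For the real file, the `text =` argument would be replaced by the file path.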
  • 8. Steps Description (contd.). Data Preparation: Perform general housekeeping activities: check the completeness of the content, look at the dimensions and review the structure of the input dataset, peek into the data, summarize it, and get a snapshot of all the features. Check whether any data is missing for particular customers; in our case there is none. As a final step, transform the data into a format ready for the algorithms and modeling. Perform EDA: Perform exploratory data analysis (EDA) to understand the data and its applicability to the problem. Understand how the features are interrelated and correlated, and evaluate the presence of outliers and their effects. Here we use various box plots and bar plots to understand which features have a major impact on our outcome variable 'churn_status'. The next slides show the outcomes of our EDA.
  • 12. Steps Description (contd.). Perform EDA: Survival Analysis (Cox Proportional Hazards Method). Survival analysis methods examine the time it takes for a particular event to occur. For our business problem, the event is an existing customer churning. One of the input variables, Account_Len (in number of days), denotes how long a customer stayed before he/she churned, so our input data is well suited to survival analysis. Create a survival object and Cox model: train_survival$survival <- Surv(train_survival$Account_Len, train_survival$churn_status == 2); results <- coxph(survival ~ Int_Plan + Vmail_Plan + messgs + tot_day_calls + tot_day_chrgs + tot_day_mins + tot_int_calls + tot_int_chrgs + tot_ngt_chrgs + tot_int_mins + cust_calls_made, data = train_survival)
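To show the shape of this step end to end, here is a minimal sketch on simulated data. It uses the same `Surv` and `coxph` calls from the `survival` package, but the data frame `sim` and its columns (`tenure`, `churned`, `has_plan`) are hypothetical stand-ins, not the project's variables.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(1)
n <- 500
# Simulated stand-in data: customers with has_plan == 1 have twice the churn hazard.
has_plan <- rbinom(n, 1, 0.3)
tenure   <- rexp(n, rate = 0.01 * exp(log(2) * has_plan))  # days until churn
churned  <- rep(1, n)                                      # every churn is observed (no censoring)

sim <- data.frame(tenure, churned, has_plan)
fit <- coxph(Surv(tenure, churned) ~ has_plan, data = sim)
summary(fit)$coefficients  # coef, exp(coef), se(coef), z and p for has_plan
```

A positive coefficient for `has_plan` means a higher churn hazard for that group, which is how the coefficients on the next slide are read.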
  • 13. Steps Description (contd.). Perform EDA: Survival Analysis (Cox Hazard Method). Summary statistics from the Cox model:
        Term              coef       exp(coef)  se(coef)  z      p
        Int_Plan yes       1.29e+00  3.64e+00   1.12e-01  11.53  <2e-16
        Vmail_Plan yes    -1.74e+00  1.75e-01   5.54e-01  -3.14  0.0017
        messgs             3.38e-02  1.03e+00   1.68e-02   2.02  0.0438
        tot_day_calls      1.14e-03  1.00e+00   2.50e-03   0.46  0.6480
        tot_day_chrgs      8.29e+00  3.97e+03   1.73e+01   0.48  0.6314
        tot_day_mins      -1.40e+00  2.47e-01   2.94e+00  -0.48  0.6336
        tot_int_calls     -6.52e-02  9.37e-01   2.26e-02  -2.89  0.0039
        tot_int_chrgs     -1.46e+01  4.47e-07   1.79e+01  -0.82  0.4131
        tot_ngt_chrgs      6.75e-02  1.07e+00   2.17e-02   3.11  0.0019
        tot_int_mins       4.03e+00  5.64e+01   4.82e+00   0.84  0.4032
        cust_calls_made    3.07e-01  1.36e+00   3.16e-02   9.72  <2e-16
    Note: From the summary we can see that the most statistically significant coefficients are Int_Plan yes and cust_calls_made. We have also checked the proportional hazards assumption for these (see the R code).
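In a Cox model the exp(coef) column is the hazard ratio: the factor by which the churn hazard is multiplied for a one-unit increase in the predictor, holding the others fixed. A small sketch that recomputes it from three coefficients reported on the slide (the vector names are illustrative):

```r
# Coefficients as reported in the Cox summary on the slide (rounded values).
coefs <- c(Int_Plan_yes = 1.29, Vmail_Plan_yes = -1.74, cust_calls_made = 0.307)

hazard_ratio <- exp(coefs)
round(hazard_ratio, 2)

# Percent change in churn hazard per unit increase of each predictor.
round(100 * (hazard_ratio - 1), 1)
```

Read this way, an international plan multiplies the churn hazard by roughly 3.6, and each additional customer-care call raises it by roughly 36%.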
  • 14. Steps Description (contd.). Perform EDA: Survival Analysis (Cox Hazard Method). Next, we plot the survival curve for our customers. The figure shows no customer churn for the first 50-70 days, meaning we retain all of our customers over that period. Over the next 50-70 days, the share of customers who have not churned falls from almost 100% to approximately 80%. It then decreases considerably, so that after 200 days of tenure the share of customers who have not churned is down to almost 40%. In other words, by the end of 200 days of tenure, 60% of the customers who joined us on day 1 have churned.
  • 15. Steps Description (contd.). Perform EDA: Main Observations from our EDA. (1) Usage by churned customers, mainly minutes (day/evening/night/international) and the charges incurred, is higher than for customers who did not churn, so churned customers appear to be relatively high-value and more business-relevant. (2) The box plots and bar plots complement each other in their findings. (3) The box plot for customer calls made gives enough evidence that customers who are going to churn call Customer Care many times before churning. (4) There are no missing values in our input dataset. (5) Survival analysis confirms that Customer Calls Made and International Plan are the most influential features. Total Day Mins Used is also suggested by our traditional EDA methods.
  • 16. Steps Description. Feature Selection: After EDA, we can identify the major features/variables that significantly influence customer churn, and we use these features for model preparation. We can also drop the features that do not, or may not, influence our outcome variable. Build Models: Before building models, we split the input data into three datasets in the proportions 60 (train) : 20 (test) : 20 (pred, for final prediction). Because our business case is a classification problem, we build models with the ML algorithms that can be applied to a classification problem. Our approach is to proceed one ML algorithm at a time, validating and improving each algorithm's outcome as we go.
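One way to produce a 60:20:20 split is to assign each row to a partition at random with those probabilities. This is a sketch of that idea only; the slides do not show the authors' exact procedure, and the data frame name `churn` in the comment is hypothetical.

```r
set.seed(42)
n <- 4617  # number of rows reported for the dataset

# Assign each row to one of three partitions with 60/20/20 probabilities.
part <- sample(c("train", "test", "pred"), size = n, replace = TRUE,
               prob = c(0.6, 0.2, 0.2))
table(part)

# With a data frame `churn` (hypothetical name), the subsets would be
# churn[part == "train", ], churn[part == "test", ] and churn[part == "pred", ].
```

Because the assignment is random, the partition sizes are only approximately 60/20/20 of the rows.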
  • 17. Steps Description. Build Models: Logistic Regression. logit.model = glm(churn_status ~ Account_Len + tot_day_chrgs + tot_evening_chrgs + tot_ngt_chrgs + tot_int_chrgs + cust_calls_made, family = "binomial", data = train). Validate & Measure: We now test the model built above on the test dataset. Because the model is a logistic regression, we classify an observation as churned if the predicted probability of churn is > 0.5 and as not churned if it is < 0.5. Confusion matrix (Logit Pred): predicted False for 741 actual non-churners and 115 actual churners; predicted True for 13 actual non-churners and 13 actual churners. Specificity 98.27%, Sensitivity 10.15%, Accuracy 85.48%.
  • 18. Steps Description (contd.). Improve Model Performance: Varying the Threshold Value. In the previous step we used a threshold of 0.5 to decide whether a customer is churned, depending on whether the predicted probability is above 0.5. We now vary the threshold and check whether the model's performance improves. Upon testing other values (0.4 and 0.3), we found that the overall performance of the model increases when the threshold is set to 0.3. Please refer to the attached R code for more details. Final Prediction: We predict on the pred dataset with a threshold of 0.3 and calculate the performance of the logit model one last time. Confusion matrix (Logit Pred): predicted False for 756 actual non-churners and 86 actual churners; predicted True for 67 actual non-churners and 41 actual churners. Specificity 91.85%, Sensitivity 32.28%, Accuracy 84.21%.
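The effect of lowering the threshold can be illustrated on simulated data: every customer flagged at 0.5 is also flagged at 0.3, so sensitivity can only stay the same or rise while specificity can only stay the same or fall. The data, the predictor `calls` and the helper `rates` below are hypothetical and are not the project's model.

```r
set.seed(7)
n <- 1000
calls   <- rpois(n, 1.5)                              # simulated predictor
churned <- rbinom(n, 1, plogis(-2.5 + 0.6 * calls))   # simulated outcome (1 = churned)

fit  <- glm(churned ~ calls, family = "binomial")
prob <- predict(fit, type = "response")               # fitted churn probabilities

# Sensitivity and specificity of the rule "predict churn when prob > threshold".
rates <- function(threshold) {
  pred <- prob > threshold
  c(sensitivity = sum(pred & churned == 1) / sum(churned == 1),
    specificity = sum(!pred & churned == 0) / sum(churned == 0))
}

rbind(`0.5` = rates(0.5), `0.3` = rates(0.3))
```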
  • 19. Steps Description. Build Models: K-Nearest Neighbour. knn_model <- knn(train = knn_train, test = knn_test, cl = train$churn_status, k = 53). k = 53 was calculated as the square root of the number of training observations, 2785. Validate & Measure: Validating the model against our test dataset gives the following confusion matrix and performance parameters. Confusion matrix (Knn.Pred): predicted False for 752 actual non-churners and 111 actual churners; predicted True for 2 actual non-churners and 17 actual churners. Specificity 99.73%, Sensitivity 13.28%, Accuracy 87.55%.
  • 20. Steps Description (contd.). Improve Model Performance: (1) Use the elbow method to identify the optimum value of k. (2) Normalize the data to suppress scale irregularities, because kNN works on the distance between nearest neighbours. Final Prediction: After applying the above improvement techniques, we calculate the final performance of the kNN model on the pred dataset. Confusion matrix (Knn.Pred): predicted False for 805 actual non-churners and 84 actual churners; predicted True for 18 actual non-churners and 43 actual churners. Specificity 92.04%, Sensitivity 33.85%, Accuracy 89.26%.
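The square-root starting value for k and the normalization step can be sketched as below. Min-max scaling is an assumption here, since the slides do not say which normalization was used, and `normalize` is an illustrative helper name.

```r
# Rule-of-thumb starting value: k is the square root of the training-set size.
k <- round(sqrt(2785))  # 2785 training rows, as stated on the slides
k                       # 53

# Min-max scaling to [0, 1] so that no feature dominates the distance.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
normalize(c(10, 20, 40))  # 0, 1/3, 1
```

Each numeric feature would be passed through `normalize` before calling `knn`, so that minutes and call counts contribute on a comparable scale.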
  • 21. Steps Description. Build Models: Decision Trees. tree_model <- tree(churn_status ~ . - State - Area - Ph_No., data = train). Classification tree. Variables actually used in tree construction: "tot_day_mins", "cust_calls_made", "Int_Plan", "tot_evening_mins", "Vmail_Plan", "tot_int_calls", "tot_int_mins", "tot_ngt_mins". Number of terminal nodes: 15. Validate & Measure: Validating the model against our test dataset gives the following confusion matrix and performance parameters. Confusion matrix (Tree.Pred): predicted False for 739 actual non-churners and 38 actual churners; predicted True for 15 actual non-churners and 90 actual churners. Specificity 98.01%, Sensitivity 70.31%, Accuracy 93.99%.
  • 22. Steps Description (contd.). Improve Model Performance: Pruning, so that the model does not overfit the dataset. tree_validate <- cv.tree(object = tree_model, FUN = prune.misclass); plot(x = tree_validate$size, y = tree_validate$dev, type = "b"). The plot shows that tree_validate$dev is essentially the same for tree sizes 11-14, so we take the best tree size to be 12 (rather than the original 15) at the cost of some bias. Final Prediction: After applying the above improvement techniques, we calculate the final performance of the decision tree model on the pred dataset. Confusion matrix (Tree Pred): predicted False for 801 actual non-churners and 44 actual churners; predicted True for 22 actual non-churners and 83 actual churners. Specificity 97.32%, Sensitivity 65.35%, Accuracy 93.05%.
  • 23. Steps Description. Build Models: Random Forest. random_model <- randomForest(as.factor(churn_status) ~ Account_Len + Int_Plan + Vmail_Plan + tot_day_mins + tot_evening_mins + tot_ngt_mins + tot_int_mins + cust_calls_made, data = train, ntree = 2000, importance = TRUE). Please note that the variable importance plot also complements our earlier findings from EDA. Validate & Measure: Validating the model against our test dataset gives the following confusion matrix and performance parameters. Confusion matrix (Ran.Pred): predicted False for 746 actual non-churners and 44 actual churners; predicted True for 8 actual non-churners and 84 actual churners. Specificity 98.93%, Sensitivity 64.62%, Accuracy 94.10%.
  • 24. Steps Description (contd.). Improve Model Performance: Performance boosting by cross-validation with k = 10 folds. train_control <- trainControl(method = "cv", number = 10); random_model_train <- train(as.factor(churn_status) ~ Account_Len + Int_Plan + Vmail_Plan + tot_day_mins + tot_evening_mins + tot_ngt_mins + tot_int_mins + cust_calls_made, data = train, method = "rf", metric = "Accuracy", trControl = train_control). Final Prediction: After applying the above improvement techniques, we calculate the final performance of the Random Forest model on the pred dataset. Confusion matrix (Ran Pred): predicted False for 808 actual non-churners and 47 actual churners; predicted True for 15 actual non-churners and 80 actual churners. Specificity 98.17%, Sensitivity 62.99%, Accuracy 93.05%.
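What 10-fold cross-validation does with the training rows can be sketched in base R: each row gets one of ten fold labels, and every fold is held out once while the model is fitted on the other nine. This illustrates the resampling scheme only and is not a reproduction of what `trainControl` does internally.

```r
set.seed(123)
n <- 2785                                   # training rows reported on the slides
fold <- sample(rep(1:10, length.out = n))   # shuffled fold labels, sizes differ by at most 1
table(fold)

# In iteration i, rows with fold == i are held out and the rest are used for fitting.
held_out_sizes <- sapply(1:10, function(i) sum(fold == i))
held_out_sizes
```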
  • 25. Steps Description. Build Models: Support Vector Machine. svm_model <- svm(churn_status ~ ., data = train, kernel = 'radial', gamma = 1, cost = 100). SVM-Type: C-classification; SVM-Kernel: radial; cost: 100; gamma: 1; Number of Support Vectors: 2785. Validate & Measure: Validating the model against our test dataset gives the following confusion matrix and performance parameters. Confusion matrix (SVM.Pred): predicted False for 754 actual non-churners and 128 actual churners; predicted True for 0 actual non-churners and 0 actual churners. Specificity 100%, Sensitivity 0%, Accuracy 85.49%.
  • 26. Steps Description (contd.). Improve Model Performance: Use cross-validation to find the best values for gamma and cost. svm.tune <- tune(svm, churn_status ~ Account_Len + Int_Plan + Vmail_Plan + tot_day_mins + tot_evening_mins + tot_ngt_mins + tot_int_mins + cust_calls_made, data = train, kernel = 'radial', ranges = list(cost = c(0.1, 1, 10, 100, 1000), gamma = c(0.5, 1, 2, 3, 4))). Final Prediction: After applying the above improvement techniques, we calculate the final performance of the SVM model on the pred dataset. Confusion matrix (SVM predictions): predicted False for 813 actual non-churners and 59 actual churners; predicted True for 10 actual non-churners and 68 actual churners. Specificity 98.78%, Sensitivity 55.54%, Accuracy 92.74%.
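The `ranges` argument describes a grid: every combination of the listed cost and gamma values is a candidate that the tuning step evaluates. A small base-R sketch that enumerates that grid (the object name `grid` is illustrative):

```r
# All (cost, gamma) pairs implied by the ranges given on the slide.
grid <- expand.grid(cost  = c(0.1, 1, 10, 100, 1000),
                    gamma = c(0.5, 1, 2, 3, 4))
nrow(grid)     # 25 candidate pairs
head(grid, 3)
```

With 25 candidates, each evaluated by cross-validation, the tuning run fits many more models than a single `svm` call, which is worth keeping in mind for run time.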
  • 27. Steps Description (contd.). Select the Best-Fit Model: In the past few slides, for our classification business problem, we created many models, measured their performance, applied various improvement techniques, and at the end of each step arrived at the best model for each ML algorithm. With the performance results for all models in hand, we now select the one that best suits our business case.
                     Logit   kNN     Decision Tree  Random Forest  SVM
        Accuracy     84.21   89.26   93.05          93.05          92.74
        Specificity  91.85   92.04   97.32          98.17          98.78
        Sensitivity  32.28   33.85   65.35          62.99          55.54
    The Decision Tree (CART) model or the Random Forest model suits our business scenario best.
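The comparison above can be held in a small data frame and ranked, which makes the selection reproducible. The figures are the ones reported on the slide; ranking by accuracy and then sensitivity is one reasonable ordering, not necessarily the criterion the authors applied.

```r
# Final-prediction metrics (in %) as reported on the slide.
results <- data.frame(
  model       = c("Logit", "kNN", "Decision Tree", "Random Forest", "SVM"),
  accuracy    = c(84.21, 89.26, 93.05, 93.05, 92.74),
  specificity = c(91.85, 92.04, 97.32, 98.17, 98.78),
  sensitivity = c(32.28, 33.85, 65.35, 62.99, 55.54)
)

# Rank by accuracy, breaking ties by sensitivity (share of churners caught).
ranked <- results[order(-results$accuracy, -results$sensitivity), ]
ranked
```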
  • 28. Summary: (1) Churned customers were high-business-value customers; their usage was higher than that of customers who did not churn. (2) The survival analysis suggests that we retain all our customers for the first 50-70 days, retain about 80% over the next 50-70 days, and that this share falls to 40% by the end of 200 days. (3) Customers who have an International Plan, or who call Customer Care with service-related queries, churn more. These two factors are the most influential ones for customer churn. One possible measure is for the company to keep track of all customers who call Customer Care frequently and pool them as "possible future churners", so that their issues can be prioritized and addressed rapidly. The same check can be implemented for International Plan customers as well.
  • 29. Summary (contd.): Of the various models developed to predict which customers will churn, the Decision Tree (CART) and Random Forest models suit our business case best. The two models perform similarly, with almost the same accuracy of 93.05%. The difference is that Random Forest correctly identifies 62.99% of churning customers with a fall-out ratio of 1.83%; that is, it also incorrectly flags 1.83% of non-churning customers as churned. The Decision Tree (CART) model correctly identifies 65.35% of churning customers but incorrectly flags 2.68% of non-churning customers as churned. The choice therefore depends on the business: if it wants higher predictive power at the cost of a somewhat higher fall-out ratio, the Decision Tree is best; if it wants to stress minimizing the error rate, then Random Forest is the better bet.
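The fall-out ratios quoted here are simply 100% minus the specificity reported for each model, as this short check shows (the vector names are illustrative):

```r
# Specificity (in %) of the two final models, as reported on the slides.
specificity <- c(random_forest = 98.17, decision_tree = 97.32)

# Fall-out (false positive rate) is the complement of specificity.
fall_out <- 100 - specificity
round(fall_out, 2)  # 1.83 and 2.68
```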
  • 30. In God we trust, all others must bring Data…..