Project Bank Marketing
By: Rupa Dutta
Gautam Buddha - A philosopher and a thinker
Legend has it that he attained enlightenment sitting under a tree and taught the world a new philosophy called...
The Middle Path
[Figure: three fits side by side: underfitting model, optimal model, overfitting model]
In analytics, we constantly face the challenge of avoiding both underfitting and overfitting models and finding a balance between the two. A balanced model has a better chance of working on previously unseen data. I would like to present this data mining project as a quest for a reasonable model that fares well against all metrics, i.e. finding that “middle path”.
Critical to avoiding an underfitting model:
• Gather enough data, but not too much
• Identify and remove noise and outliers
• Remove irrelevant features, which can confuse models
• Clean and prepare the data well
• Identify nominal, ordinal and continuous features
• Condition the features before feeding them to models
Critical to avoiding an overfitting model:
• Test, test and test some more
• Cross-validate models against different mixes of the data
• Weigh models against multiple performance metrics
• If possible, test against real-time, unseen data
Let’s get started…
Business problem at hand
Feature analysis - interesting observation
Feature selection and transformation
Model building and evaluation
Conclusion
Business problem at hand
What we have
Data gathered from a recent campaign by a bank
The campaign was about getting people to sign up for term deposits
We have customer information, along with whether each customer signed up for the term deposit
What we want
A machine learning model that can tell whether a new customer is likely to sign up for a term deposit
Feature analysis - interesting observation
Feature Analysis
Will a new customer sign up for term deposit?
Strong indicator for yes: previous outcome
65% of those who previously said yes said yes again!
Although many previous outcomes were unknown, this is still a good feature
[Figure: bar chart of % who signed up for term deposit, previously said yes vs. previously said no]
Feature Analysis
Will a new customer sign up for term deposit?
Strong indicator for yes: no housing loan
20% of those who did not have a housing loan said yes!
[Figure: bar chart of % who signed up for term deposit, no housing loan vs. housing loan]
Feature Analysis
Will a new customer sign up for term deposit?
Strong indicator for yes: no loan default
13% of those with no loan default said yes.
Nobody with a loan default said yes, the type of information that classification algorithms can use
[Figure: bar chart of % who signed up for term deposit, no loan default vs. loan default]
Feature Analysis
Will a new customer sign up for term deposit?
Moderate indicator for yes: age
The percentage is almost constant across a wide age range, so age is not much of a differentiating factor
[Figure: bar chart of % who signed up for term deposit by age, ages 21 to 60]
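The per-category sign-up percentages discussed in the feature-analysis slides can be reproduced with a simple group-and-count. The sketch below is only an illustration: the function name, the field names (`previous_outcome`, `signed_up`) and the toy records are assumptions, not the bank's data or the author's code.

```python
from collections import defaultdict


def signup_rate_by_category(records, feature, target="signed_up"):
    """Return {category: percentage of records whose target is 'yes'}."""
    totals = defaultdict(int)
    yeses = defaultdict(int)
    for row in records:
        totals[row[feature]] += 1
        if row[target] == "yes":
            yeses[row[feature]] += 1
    return {cat: 100.0 * yeses[cat] / totals[cat] for cat in totals}


# Hypothetical toy records, made up for illustration only.
customers = [
    {"previous_outcome": "yes", "signed_up": "yes"},
    {"previous_outcome": "yes", "signed_up": "yes"},
    {"previous_outcome": "yes", "signed_up": "no"},
    {"previous_outcome": "no", "signed_up": "no"},
    {"previous_outcome": "no", "signed_up": "no"},
    {"previous_outcome": "no", "signed_up": "no"},
    {"previous_outcome": "no", "signed_up": "yes"},
    {"previous_outcome": "unknown", "signed_up": "no"},
]

rates = signup_rate_by_category(customers, "previous_outcome")
for category, pct in sorted(rates.items()):
    print(f"{category}: {pct:.1f}% signed up")
```

A large gap between categories (as with previous outcome) suggests a useful feature; a nearly flat profile (as with age) suggests a weak one.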
Feature selection and transformation
Just as a country is only as good as its people, a model is only as good as the quality of its input data
Feature selection and transformation
Feature Selection table
Feature                         | Description | Pre-processing
age                             | Continuous  | None
job                             | Categorical | Converted to binary matrix
marital status                  | Categorical | Converted to binary matrix
education                       | Categorical | Converted to ordinal: 1 = primary, 2 = secondary, 3 = tertiary
average yearly balance          | Continuous  | Numerically scaled
contact communication mode      | Categorical | Discarded, feature irrelevant
last contact day of the month   | Categorical | Discarded, feature irrelevant
last contact month of year      | Categorical | Discarded, feature irrelevant
last contact duration           | Continuous  | Numerically scaled
Feature selection and transformation
Feature Selection table
Feature                                                            | Description | Pre-processing
number of contacts performed during this campaign                  | Continuous  | Numerically scaled
number of days that passed by after the client was last contacted  | Continuous  | Weak feature, discarded
marital status                                                     | Categorical | Converted to binary matrix
outcome of the previous marketing campaign                         | YES/NO      | Converted to binary
has credit in default?                                             | YES/NO      | Converted to binary
has housing loan?                                                  | YES/NO      | Converted to binary
has personal loan?                                                 | YES/NO      | Converted to binary
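The three pre-processing steps named in the feature selection tables (binary matrix, YES/NO to binary, numeric scaling) might look roughly like the sketch below. The slides do not say which scaling method was used, so min-max scaling is an assumption here, as are the helper names and the sample values.

```python
def one_hot(values):
    """Encode a categorical column as a binary matrix, one column per category."""
    categories = sorted(set(values))
    matrix = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, matrix


def yes_no_to_binary(values):
    """Map 'yes' to 1 and anything else to 0."""
    return [1 if v == "yes" else 0 for v in values]


def min_max_scale(values):
    """Rescale numbers to the range [0, 1] (assumed scaling method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample columns, made up for illustration only.
marital = ["married", "single", "divorced", "married"]
housing = ["yes", "no", "no", "yes"]
balance = [1500, -200, 300, 800]

marital_cols, marital_matrix = one_hot(marital)
housing_bin = yes_no_to_binary(housing)
balance_scaled = min_max_scale(balance)

print(marital_cols, marital_matrix)
print(housing_bin)
print([round(b, 3) for b in balance_scaled])
```

In practice the scaling bounds would be learned on the training split only and reused on unseen data, so that test data does not leak into pre-processing.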
Feature selection and transformation
Special mention about the pre-processing done on education:
Analysis showed that the higher the education level, the greater the chance of a person signing up for a term deposit. Converting education to a binary matrix would have discarded this ordering. Therefore, the categories were manually converted to a numerical scale of 1, 2 and 3, with 1 = primary, 2 = secondary and 3 = tertiary
[Figure: bar chart of % who signed up for term deposit by education level: primary, secondary, tertiary]
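A minimal sketch of the ordinal encoding described for education is shown below. The mapping values come from the slide; the function name and the handling of values outside the three levels (returned as `None` so the caller can decide what to do) are assumptions, since the slides do not cover that case.

```python
# Mapping stated on the slide: 1 = primary, 2 = secondary, 3 = tertiary.
EDUCATION_ORDER = {"primary": 1, "secondary": 2, "tertiary": 3}


def encode_education(values, unknown=None):
    """Replace education labels with their ordinal rank.

    Labels outside the mapping get `unknown` (an assumed convention).
    """
    return [EDUCATION_ORDER.get(v, unknown) for v in values]


encoded = encode_education(["tertiary", "primary", "secondary", "unknown"])
print(encoded)
```

Unlike a binary matrix, this single column lets a model see that tertiary > secondary > primary.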
Feature selection and transformation
The special processing of the “education” feature improved the MCC score of several algorithms, especially gradient descent and AdaBoost, which rely heavily on previous errors
[Figure: bar chart of MCC scores for Gradient Descent and AdaBoost, axis range 0.38 to 0.42]
Model building and evaluation
Model building and evaluation
Choice of models - ensemble models
Random Forest - everyone’s favourite
An ensemble model that combines decision trees
Parameters used
Depth = 5
No of classifiers = 100
AdaBoost - acclaimed
Introduced by Freund and Schapire in the mid-1990s (they received the Gödel Prize for it in 2003), it is considered one of the best out-of-the-box classifiers. It combines several weak learners and learns from their mistakes. It is often less susceptible to overfitting
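As a rough illustration of the two ensemble models, the sketch below sets them up with scikit-learn. The slides do not name the library, so scikit-learn is an assumption; the Random Forest parameters (depth 5, 100 classifiers) come from the slide, while the AdaBoost setting and the synthetic data are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Synthetic stand-in data; the bank dataset is not reproduced here.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Parameters stated on the slide: depth = 5, number of classifiers = 100.
forest = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)

# AdaBoost settings are not given in the slides; 100 rounds is an assumption.
booster = AdaBoostClassifier(n_estimators=100, random_state=0)

forest.fit(X, y)
booster.fit(X, y)

forest_preds = forest.predict(X[:5])
booster_preds = booster.predict(X[:5])
print(forest_preds, booster_preds)
```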
Model building and evaluation
Choice of models - non-ensemble models: linear models worked well on the data!
Model building and evaluation
Matthews Correlation Coefficient (MCC) scores of each model
[Figure: bar chart of MCC scores for Gradient Descent, AdaBoost, Regression and Neural Net, with bands marked Moderate and Strong]
Any model with an MCC score greater than 0.40 is considered strong here. By that measure, 4 different models qualify, with gradient descent scoring the highest. Does that mean gradient descent is the right choice? Is it a good fit?
The real question is: does it overfit?
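For reference, MCC is computed from the four cells of the confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements that formula directly; the counts in the example are hypothetical and are not the project's results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for an imbalanced problem (120 positives in 1000 cases).
tp, tn, fp, fn = 40, 850, 30, 80
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.89
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {accuracy:.2f}, MCC = {mcc:.2f}")
```

With imbalanced classes, accuracy can look high even when most positives are missed, which is why MCC is a useful complement.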
Model building and evaluation
Let’s seek the answer using evaluation metrics from 5-fold cross-validation
[Figure: 5-fold cross-validation, metric: accuracy, for Gradient Descent, AdaBoost, Regression and Neural Net]
[Figure: 5-fold cross-validation, metric: ROC score, for Gradient Descent, AdaBoost, Regression and Neural Net]
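A 5-fold cross-validation comparison on accuracy and ROC score could be sketched as follows with scikit-learn. Everything here is an assumption for illustration: the library, the synthetic imbalanced data, the model settings, and the mapping of the slide's "Gradient Descent" and "Regression" labels to `SGDClassifier` and `LogisticRegression`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_validate

# Synthetic, imbalanced stand-in data (roughly 12% positives).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.88, 0.12], random_state=0
)

# Assumed stand-ins for the models named on the slides.
models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Regression": LogisticRegression(max_iter=1000),
    "Gradient Descent": SGDClassifier(random_state=0),
}

cv_results = {}
for name, model in models.items():
    # cv=5 gives stratified 5-fold splits for classifiers.
    scores = cross_validate(model, X, y, cv=5, scoring=["accuracy", "roc_auc"])
    cv_results[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "roc_auc": scores["test_roc_auc"].mean(),
    }

for name, metrics in cv_results.items():
    print(f"{name}: accuracy={metrics['accuracy']:.3f}, ROC AUC={metrics['roc_auc']:.3f}")
```

A model whose cross-validated scores stay close to its training scores across all folds is less likely to be overfitting than one that only shines on a single split.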
Model building and evaluation
Preferred Model
AdaBoost
MCC score = 0.41, accuracy = 90%, ROC score = 0.88
Conclusion
Conclusion
Linear and ensemble models fitted the data well
With more effort, better relationships between the features can be gleaned. For example, marital status is strongly related to financial position. Such information can help improve the models further
The quest for an optimal model demonstrated that cross-validation is quite a useful strategy: it can not only save time in testing but also help in choosing a better model
In a real-world scenario, it won’t harm to test all 4 top models on unseen data
Final Thoughts...
May the light of Buddha’s wisdom shine on all of us and guide us towards well-fitting models.

Data mining - Machine Learning

  • 2. Gautam Buddha - A philosopher and a thinker
  • 3. Legends have it that he obtained enlightenment sitting under a tree and advocated to the world a new philosophy that is called..
  • 5. Underfitting model Overfitting model In the world of analytics, we are often faced with the challenge to avoid under-fitting and overfitting models and find a balance. A balanced model has better chances of working for previously unseen data. I would like to present this Data mining project as a quest for finding a reasonable model that fares well against all matrices i.e. finding that “middle-path” Optimalmodel
  • 6. Critical to avoiding an under-Fitting model is: • Gather enough but not too much data • Identify and get rid of noise and outliers • Remove irrelevant features - can confuse models • Massage the data well • Identify nominal , ordinal and continuous feature • Condition the features before feeding to models
  • 7. Critical to avoiding an overfitting model is: • Test , test and more test • Cross validate models against different mix • Weigh against multiple performance matrices • If possible, test against real time unseen data
  • 8. Let’s get started…
      • Business problem at hand
      • Feature analysis - interesting observation
      • Feature selection and transformation
      • Model building and evaluation
      • Conclusion
  • 9. Business problem at hand
      What we have:
      • Data gathered from a recent campaign by a bank
      • The campaign was about getting people to sign up for term deposits
      • Customer information, along with whether each customer signed up for the term deposit
      What we want:
      • A machine learning model that can tell whether a new customer is likely to sign up for a term deposit
  • 11. Feature Analysis: Will a new customer sign up for a term deposit? Strong indicator for yes: previous outcome. 65% of those who previously said yes said yes again! Although many previous outcomes were unknown, this is still a good feature. [Bar chart: % who signed up for term deposit, previously said yes vs. previously said no]
  • 12. Feature Analysis: Will a new customer sign up for a term deposit? Strong indicator for yes: housing loan. 20% of those who did not have a housing loan said yes! [Bar chart: % who signed up for term deposit, no housing loan vs. housing loan]
  • 13. Feature Analysis: Will a new customer sign up for a term deposit? Strong indicator for yes: loan default. 13% of those who had no loan default said yes. Nobody with a loan default said yes, the type of information that classification algorithms can use. [Bar chart: % who signed up for term deposit, no loan default vs. loan default]
  • 14. Feature Analysis: Will a new customer sign up for a term deposit? Moderate indicator for yes: age. The percentage is almost constant across a wide range of ages, so age is not much of a differentiating factor. [Chart: % who signed up for term deposit by age, ages 21 to 60]
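The per-category percentages quoted in slides 11 to 14 can be reproduced with a simple group-and-count. The sketch below is a minimal illustration, not the author's code: the function name `signup_rate_by_category`, the field names `housing_loan` and `signed_up`, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict


def signup_rate_by_category(records, feature, target="signed_up"):
    """Return {category: percentage of records whose target is 'yes'}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        category = row[feature]
        totals[category] += 1
        if row[target] == "yes":
            positives[category] += 1
    return {c: 100.0 * positives[c] / totals[c] for c in totals}


# Hypothetical toy records (not the campaign data).
customers = [
    {"housing_loan": "no", "signed_up": "yes"},
    {"housing_loan": "no", "signed_up": "no"},
    {"housing_loan": "no", "signed_up": "no"},
    {"housing_loan": "no", "signed_up": "no"},
    {"housing_loan": "yes", "signed_up": "no"},
    {"housing_loan": "yes", "signed_up": "no"},
]

rates = signup_rate_by_category(customers, "housing_loan")
print(rates)
```

A large gap between categories, as in the housing loan and loan default slides, suggests the feature separates the classes; a flat profile, as with age, suggests it does not.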
  • 15. Feature selection and transformation: Just as a country is only as good as its people, a model is only as good as the quality of its input data.
  • 16. Feature selection and transformation: feature selection table

      | Feature | Description | Pre-processing |
      |---|---|---|
      | age | Continuous | None |
      | job | Categorical | Converted to binary matrix |
      | marital status | Categorical | Converted to binary matrix |
      | education | Categorical | Converted to ordinal: 1 = primary, 2 = secondary, 3 = tertiary |
      | has credit in default? | YES/NO | Converted to binary |
      | average yearly balance | Continuous | Numerically scaled |
      | contact communication mode | Categorical | Discarded, feature irrelevant |
      | last contact day of the month | Categorical | Discarded, feature irrelevant |
      | last contact month of year | Categorical | Discarded, feature irrelevant |
      | last contact duration | Continuous | Numerically scaled |
  • 17. Feature selection and transformation: feature selection table (continued)

      | Feature | Description | Pre-processing |
      |---|---|---|
      | number of contacts performed during this campaign | Continuous | Numerically scaled |
      | number of days that passed by after the client was last contacted | Continuous | Weak feature, discarded |
      | marital status | Categorical | Converted to binary matrix |
      | outcome of the previous marketing campaign | YES/NO | Converted to binary |
      | has credit in default? | YES/NO | Converted to binary |
      | has housing loan? | YES/NO | Converted to binary |
      | has personal loan? | YES/NO | Converted to binary |
  • 18. Feature selection and transformation: a special mention of the pre-processing done on education. Analysis showed that the higher the education level, the greater the chance of a person signing up for a term deposit. Converting education to a binary matrix would have lost this ordering. Therefore, the categories were manually converted to a numerical scale of 1, 2 and 3, with 1 = primary, 2 = secondary and 3 = tertiary. [Bar chart: % who signed up for term deposit by education level (primary, secondary, tertiary)]
  • 19. Feature selection and transformation: The special processing of the “education” feature improved the MCC score of several algorithms, especially gradient descent and AdaBoost, which rely heavily on previous errors. [Bar chart: MCC scores for gradient descent and AdaBoost, axis from 0.38 to 0.42]
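The pre-processing steps listed in slides 16 to 18 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides say "numerically scaled" without naming a method, so min-max scaling is assumed here; the helper names, the category list and the sample row are hypothetical; and the ordinal mapping covers only the three education levels named in the slides.

```python
# Ordinal mapping from slide 18: the order of the levels is preserved.
EDUCATION_LEVELS = {"primary": 1, "secondary": 2, "tertiary": 3}


def encode_education(value):
    """Map an education level to its ordinal code (1, 2 or 3)."""
    return EDUCATION_LEVELS[value]


def one_hot(value, categories):
    """'Binary matrix' encoding: one 0/1 column per category."""
    return [1 if value == c else 0 for c in categories]


def yes_no_to_binary(value):
    """Convert a YES/NO feature to 1/0."""
    return 1 if value.lower() == "yes" else 0


def min_max_scale(values):
    """Scale values to [0, 1]. Min-max is an assumption, not from the slides."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical category list and sample row, for illustration only.
MARITAL = ["married", "single", "divorced"]
row = {"education": "tertiary", "marital": "single", "housing": "yes"}

encoded = (
    [encode_education(row["education"])]
    + one_hot(row["marital"], MARITAL)
    + [yes_no_to_binary(row["housing"])]
)
scaled_balances = min_max_scale([0, 500, 1000])
print(encoded, scaled_balances)
```

The one-hot columns treat categories as unrelated, while the single ordinal column keeps primary < secondary < tertiary, which is the information slide 18 wants the models to see.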
  • 20. Model building and evaluation
  • 21. Model building and evaluation: choice of models - ensemble models
      • Random Forest - everyone’s favourite. An ensemble model that combines decision trees. Parameters used: depth = 5, number of classifiers = 100
      • AdaBoost - acclaimed. Introduced by Freund and Schapire in the mid-1990s (the work received the Gödel Prize in 2003), it is considered one of the best out-of-the-box classifiers. It combines several weak learners and learns from their mistakes. Less susceptible to overfitting
  • 22. Model building and evaluation: choice of models - non-ensemble models. Linear models worked well on the data!
  • 23. Model building and evaluation: Matthews Correlation Coefficient (MCC) scores of each model. Any model with an MCC score greater than 0.40 is considered strong. By that measure, 4 different models qualify, with gradient descent scoring the highest. Does that mean gradient descent is the right choice? Is it a good fit? The real question is: does it overfit? [Chart: MCC scores for Gradient Descent, AdaBoost, Regression and Neural Net, with bands labelled Moderate and Strong]
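For reference, the MCC is computed from the four cells of a binary confusion matrix as (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below implements that standard formula directly; the function name and the example counts are made up for illustration and are not the project's results.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 when the denominator is zero,
    which is the usual convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
example = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
perfect = matthews_corrcoef(tp=5, tn=5, fp=0, fn=0)
always_no = matthews_corrcoef(tp=0, tn=18, fp=0, fn=2)
print(round(example, 3), perfect, always_no)
```

A classifier that always predicts "no" scores 0 here even though most of its predictions are correct, which is why MCC is a useful companion to accuracy on imbalanced data.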
  • 24. Model building and evaluation: Let’s seek the answer using evaluation metrics from 5-fold cross-validation. [Charts: 5-fold cross-validation accuracy and 5-fold cross-validation ROC score for Gradient Descent, AdaBoost, Regression and Neural Net]
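The idea behind 5-fold cross-validation is to split the data into five parts, train on four, score on the fifth, and rotate. The sketch below is a minimal pure-Python version under stated assumptions: `k_fold_indices`, `cross_validate`, `fit_majority` and `accuracy` are hypothetical helpers written for this example, the data is a toy set, and the "model" is just a majority-class baseline standing in for a real classifier.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, return all k scores."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return scores


def fit_majority(X, y):
    """Stand-in 'model': remembers the most frequent training label."""
    return max(set(y), key=y.count)


def accuracy(model, X, y):
    """Fraction of held-out labels equal to the remembered label."""
    return sum(1 for label in y if label == model) / len(y)


# Toy imbalanced data: 18 "no" (0) and 2 "yes" (1) labels.
X = [[i] for i in range(20)]
y = [0] * 18 + [1] * 2

scores = cross_validate(fit_majority, accuracy, X, y)
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

Comparing the spread of the per-fold scores, rather than a single number, is what helps reveal a model that only does well on one particular split.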
  • 25. Model building and evaluation: preferred model - AdaBoost. MCC score = 0.41, accuracy = 90%, ROC score = 0.88
  • 27. Conclusion
      • Linear and ensemble models fitted well
      • With more effort, better relationships between the features can be gleaned. For example, marital status is strongly related to financial position. Such information can help improve the models further
      • The quest for an optimal model demonstrated that cross-validation is quite a useful strategy that can not only save time in testing but also help in choosing a better model
      • In a real-world scenario, it won’t harm to test all 4 top models on unseen data
  • 28. Final thoughts: May the light of Buddha’s wisdom shine on all of us and guide us towards well-fitting models.