Adventures in Segmentation: Using Applied Data Mining to Add Business Value
Drew Minkin

Agenda
- The Value Add of Data Mining
- Segmentation 101
- Segmentation Tools in Analysis Services
- Methodology for Segmentation Analysis
- Building Confidence in your Model

The Value Add of Data Mining

Value Add - What is Data Mining?
- Statistics for the Computer Age
- Evolution, not revolution, with traditional statistics
- Statistics enriched with the brute-force capabilities of modern computing
- Associated with industrial-sized data sets

Value Add - Data Mining in the BI Spectrum
[Figure: BI spectrum chart. Labels: Reports (Static), Reports (Ad hoc), OLAP, Data Mining; Relative Business Value; Business Knowledge; Easy to Difficult; SQL Server 2008]

Value Add - Data Mining and Democracy
- VoterVault
- From the mid-1990s
- Massive get-out-the-vote drive for those expected to vote Republican
- Demzilla
- Names typically have 200 to 400 information items

Value Add - The Promise of Data Mining
“The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.”
-- Ian Hacking

Value Add - Spheres of Influence

Value Add - Operational Benefits
- Improved efficiency
- Inventory management
- Risk management

Value Add - Strategic Benefits
- The Bottom Line
- Increased agility
- Brand building
- Differentiate message
- “Relationship” building

Value Add - Tactical Benefits
- Reduction of costs
- Transactional leakage
- Outlier analysis

Value Add - Customer Attrition Analysis
- Identify a group of customers who are expected to attrite
- Conduct marketing campaigns to change their behavior in the desired direction and reduce the attrition rate

Value Add - Target Result
- Slow attriters: customers who slowly pay down their outstanding balance until they become inactive
- Fast attriters: customers who quickly pay down their balance and either let the account lapse or close it by phone call or in writing

Value Add - Sample Applications
- Credit models
- Retention models
- Elasticity models
- Cross-sell models
- Lifetime Value models
- Agent/agency monitoring
- Target marketing
- Fraud detection

Segmentation 101

Segmentation - Machine Learning
- Unsupervised learning
  - Associations and patterns among many entities, with no target information
  - Market basket analysis (“diapers and beer”)
- Supervised learning
  - Predict the value of a target variable from well-defined predictive variables
  - Credit / non-credit scoring engines

Segmentation - Sample Data Sources
- Data Warehouse: credit card data warehouse containing about 200 product-specific fields
- Third Party Data: a set of account-related demographic and credit bureau information
- Segmentation files: a set of account-related segmentation values based on our client's segmentation scheme, which combines Risk, Profitability and External potential
- Payment Database: a database that stores all checks processed and can categorize the source of checks

Methodology for Segmentation Analysis

Methodology - Distribution of Effort

Methodology - Segmentation Lifecycle

Methodology - Acquiring Raw Data
- Research/evaluate possible data sources
  - Availability
  - Hit rate
  - Implementability
  - Cost-effectiveness
- Extract/purchase data
- Check data for quality (QA)
- At this stage, data is still in a “raw” form
  - Often start with voluminous transactional data
  - Much of the data mining process is “messy”

Methodology - Goals of Refinement
- Reflects data changes over time
- Recognizes and removes statistically insignificant fields
- Defines and introduces the "target" field
- Allows for second-stage preprocessing and statistical analysis

Methodology - Scoring Engines
- Scoring engine: a formula that classifies or separates policies (or risks, accounts, agents…) into
  - profitable vs. unprofitable
  - retaining vs. non-retaining…
- (Non-)linear equation f() of several predictive variables
- Produces a continuous range of scores
- score = f(X1, X2, …, XN)
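
As a minimal sketch of score = f(X1, X2, …, XN), the block below assumes a logistic form for f. The variable names, weights, and cutoff are made up for illustration and do not come from the deck; a real engine would fit them to training data.

```python
import math

# Hypothetical weights for illustration only; not taken from the presentation.
WEIGHTS = {"months_on_book": 0.02, "utilization": -1.5, "late_payments": -0.8}
INTERCEPT = 0.5

def score(account):
    """Return a continuous score in (0, 1) from several predictive variables."""
    z = INTERCEPT + sum(WEIGHTS[name] * account[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(account, cutoff=0.5):
    """Separate accounts into two groups by comparing the score to a cutoff."""
    return "retaining" if score(account) >= cutoff else "non-retaining"

good = {"months_on_book": 48, "utilization": 0.2, "late_payments": 0}
risky = {"months_on_book": 6, "utilization": 0.9, "late_payments": 3}
print(round(score(good), 3), classify(good))
print(round(score(risky), 3), classify(risky))
```

The score is continuous; the classification into two groups only appears once a cutoff is applied, so the same engine can serve different business thresholds.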

Methodology - Deployed Model
[Figure: deployed-model diagram. Labels: Training Data, Data To Predict, DB data, Client data, Application log, “Just one row”, New Entry, New Txion, DM Engine, Mining Model, Predicted Data]

Methodology - Testing
- Randomly divide data into 3 pieces
  - Training data
  - Test data
  - Validation data
- Use Training data to fit models
- Score the Test data to create a lift curve
  - Perform the train/test steps iteratively until you have a model you’re happy with
  - During this iterative phase, validation data is set aside in a “lock box”
- Score the Validation data and produce a lift curve
  - Unbiased estimate of future performance
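
The three-way split and the lift curve can be sketched as follows. This is an illustration under stated assumptions: the 60/20/20 fractions, the five buckets, and the synthetic scored rows are all invented here, and the deck does not say how its lift curves were computed.

```python
import random

def three_way_split(rows, seed=0, fractions=(0.6, 0.2, 0.2)):
    """Randomly divide rows into training, test, and validation pieces."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_test = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # stays in the "lock box" until the end
    return train, test, validation

def lift_curve(scored, buckets=5):
    """scored: list of (score, outcome) pairs with outcome 0 or 1.

    Returns one lift value per bucket, highest scores first:
    bucket response rate divided by overall response rate.
    """
    if len(scored) < buckets:
        raise ValueError("need at least one row per bucket")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    overall = sum(outcome for _, outcome in ranked) / len(ranked)
    if overall == 0:
        # Lift is undefined without any positive outcomes; zeros are a placeholder.
        return [0.0] * buckets
    size = len(ranked) // buckets
    lifts = []
    for b in range(buckets):
        end = (b + 1) * size if b < buckets - 1 else len(ranked)
        chunk = ranked[b * size:end]
        rate = sum(outcome for _, outcome in chunk) / len(chunk)
        lifts.append(rate / overall)
    return lifts

# Synthetic data: the score is i/100 and the top 30 rows are responders.
rows = [(i / 100, 1 if i >= 70 else 0) for i in range(100)]
train, test, validation = three_way_split(rows)
print(len(train), len(test), len(validation))
print([round(x, 2) for x in lift_curve(test)])
```

A lift above 1 in the first buckets means the model concentrates responders among its highest scores; the validation piece is scored only once, after the train/test iterations are finished.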

Methodology - Multivariate Analysis
- Examine correlations among the variables
- Weed out redundant, weak, poorly distributed variables
- Model design
- Build candidate models
  - Regression/GLM
  - Decision Trees/MARS
  - Neural Networks
- Select final model
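
One simple way to weed out redundant variables is to drop any variable that is almost perfectly correlated with one already kept. The sketch below assumes Pearson correlation and a 0.95 threshold; both are illustrative choices, and the column names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def weed_out_redundant(columns, threshold=0.95):
    """Keep a variable only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

columns = {
    "balance": [100, 250, 300, 80, 500],
    "balance_in_cents": [10000, 25000, 30000, 8000, 50000],  # rescaled copy of balance
    "tenure_months": [12, 3, 40, 25, 7],
}
print(weed_out_redundant(columns))
```

The result depends on column order, since the first of two correlated variables is the one kept; weak or poorly distributed variables would need separate checks.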

Segmentation Tools in Analysis Services

Data Mining - Algorithm Matrix
[Table: matrix of algorithms against tasks; cell values not recoverable]
- Tasks: Segmentation, Advanced Data Exploration, Classification, Forecasting, Association, Text Analysis, Estimation
- Algorithms: Association Rules, Clustering, Decision Trees, Linear Regression, Logistic Regression, Naïve Bayes, Neural Nets, Sequence Clustering, Time Series

Data Mining - SQL Server Algorithms
- Decision Trees
- Time Series
- Neural Net
- Clustering
- Sequence Clustering
- Association
- Naïve Bayes
- Linear and Logistic Regression

Data Mining - Blueprint for Toolset
- Offline and online modes
  - Everything you do stays on the server
  - Offline requires server admin privileges to deploy
- Define Mining Structure and Models
- Train (process) the Structures
- Verify accuracy
- Explore and visualise
- Perform predictions
- Deploy for other users
- Regularly update and re-validate the Model

Data Mining - Cross-Validation
- SQL Server 2008
- X iterations of retraining and retesting the model
- Results from each test statistically collated
- Model deemed accurate (and perhaps reliable) when variance is low and results meet expectations
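
The retrain-and-retest loop can be sketched generically. This is plain k-fold cross-validation in Python, not the SQL Server 2008 feature itself; the toy threshold model, the fold count, and the synthetic rows are assumptions made for illustration.

```python
import random
import statistics

def cross_validate(rows, fit, accuracy, folds=5, seed=0):
    """Retrain and retest `folds` times; return the accuracy of each iteration."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    results = []
    for i in range(folds):
        test = shuffled[i::folds]  # every folds-th row is held out this round
        train = [r for j, r in enumerate(shuffled) if j % folds != i]
        model = fit(train)
        results.append(accuracy(model, test))
    return results

# Toy model (illustrative only): predict 1 when x exceeds the midpoint
# between the two class means seen in the training rows.
def fit(train):
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    return (statistics.mean(ones) + statistics.mean(zeros)) / 2

def accuracy(threshold, test):
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)

rows = [(x, 1 if x >= 50 else 0) for x in range(100)]
scores = cross_validate(rows, fit, accuracy)
print("mean accuracy:", statistics.mean(scores))
print("variance:", statistics.pvariance(scores))
```

Collating the per-fold results into a mean and a variance mirrors the slide's criterion: low variance across iterations suggests the model is stable rather than lucky on one split.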

Data Mining - Microsoft Decision Trees
- Use for:
  - Classification: churn and risk analysis
  - Regression: predict profit or income
  - Association analysis based on multiple predictable variables
- Builds one tree for each predictable attribute
- Fast

Data Mining - Decision Tree Parameters
- COMPLEXITY_PENALTY
- FORCE_REGRESSOR
- MAXIMUM_INPUT_ATTRIBUTES
- MAXIMUM_OUTPUT_ATTRIBUTES
- MINIMUM_SUPPORT
- SCORE_METHOD
- SPLIT_METHOD

Data Mining - Microsoft Naïve Bayes
- Use for:
  - Classification
  - Association with multiple predictable attributes
- Assumes all inputs are independent
- Simple classification technique based on conditional probability
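
To make the independence assumption concrete, here is a small categorical Naïve Bayes written from scratch. It is a generic sketch, not the Microsoft implementation; the add-one smoothing and the sample rows (plan, region, churn) are assumptions for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> distinct values observed
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                value_counts[(attr, row[target])][value] += 1
                values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_proba(model, case):
    """P(class | case), multiplying per-attribute conditional probabilities."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for attr, value in case.items():
            count = value_counts[(attr, cls)][value]
            # Add-one smoothing keeps unseen values from zeroing the product.
            p *= (count + 1) / (n_cls + len(values_seen[attr]))
        joint[cls] = p
    norm = sum(joint.values())
    return {cls: p / norm for cls, p in joint.items()}

rows = [
    {"plan": "gold", "region": "north", "churn": "no"},
    {"plan": "gold", "region": "south", "churn": "no"},
    {"plan": "basic", "region": "north", "churn": "yes"},
    {"plan": "basic", "region": "south", "churn": "yes"},
    {"plan": "basic", "region": "north", "churn": "no"},
]
model = train_naive_bayes(rows, target="churn")
probs = predict_proba(model, {"plan": "basic", "region": "south"})
print(probs)
```

Each attribute contributes its own conditional probability independently of the others, which is what makes the technique fast and easy to inspect, and also what limits it when inputs are strongly related.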

Data Mining - Naïve Bayes Parameters
- MAXIMUM_INPUT_ATTRIBUTES
- MAXIMUM_OUTPUT_ATTRIBUTES
- MAXIMUM_STATES
- MINIMUM_DEPENDENCY_PROBABILITY

Data Mining - Clustering
- Applied to
  - Segmentation: customer grouping, mailing campaigns
  - Also: classification and regression
  - Anomaly detection
- Discrete and continuous
- Note: “Predict Only” attributes are not used for clustering

Data Mining - Clustering Parameters
- CLUSTER_COUNT
- CLUSTER_SEED
- CLUSTERING_METHOD
- MAXIMUM_INPUT_ATTRIBUTES
- MAXIMUM_STATES
- MINIMUM_SUPPORT
- MODELLING_CARDINALITY
- SAMPLE_SIZE
- STOPPING_TOLERANCE
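
For the customer-grouping use case, a plain k-means loop shows the general idea of segmentation by clustering. This is a textbook sketch, not the Analysis Services algorithm; the `k` and `seed` arguments only loosely echo CLUSTER_COUNT and CLUSTER_SEED, and the two-dimensional customer points are invented.

```python
import random

def k_means(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, (x, y) in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Move each centroid to the mean of its members; leave it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignments

# Hypothetical customers described by (visits per month, average basket size).
low_activity = [(1, 1), (1, 2), (2, 1), (2, 2)]
high_activity = [(10, 10), (10, 11), (11, 10), (11, 11)]
centroids, assignments = k_means(low_activity + high_activity, k=2)
print(centroids)
print(assignments)
```

Which cluster gets label 0 or 1 depends on the random start, so downstream code should describe segments by their centroids rather than by label number.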

Data Mining - Neural Network
- Applied to
  - Classification
  - Regression
- Great for finding complicated relationships among attributes
- Difficult to interpret results
- Gradient Descent method
[Figure: network diagram with an input layer (Age, Education, Sex, Income), hidden layers, and an output layer (Loyalty)]

Data Mining - Neural Network Parameters
- HIDDEN_NODE_RATIO
- HOLDOUT_PERCENTAGE
- HOLDOUT_SEED
- MAXIMUM_INPUT_ATTRIBUTES
- MAXIMUM_OUTPUT_ATTRIBUTES
- MAXIMUM_STATES
- SAMPLE_SIZE

Data Mining - Sequence Clustering
- Analysis of:
  - Customer behaviour
  - Transaction patterns
  - Click stream
  - Customer segmentation
  - Sequence prediction
- Mix of clustering and sequence technologies
- Groups individuals based on their profiles, including sequence data

Data Mining - What is a Sequence?
- To discover the most likely beginning, paths, and ends of a customer’s journey through our domain, consider using:
  - Association Rules
  - Sequence Clustering

Data Mining - Sequence Data

Data Mining - Minor Introduction to DMX
- Your “if” statement will test the value returned from a prediction, typically a predicted probability or outcome
- Steps:
  - Build a case (set of attributes) representing the transaction you are processing at the moment
    - E.g. the shopping basket of a customer plus their shipping info
  - Execute a “SELECT ... PREDICTION JOIN” on the pre-loaded mining model
  - Read the returned attributes, especially the case probability for some outcome
    - E.g. probability > 50% that “TransactionOutcome=ShippingDeliveryFailure”
  - Your application has just made an intelligent decision!
- Remember to refresh and retest the model regularly (daily?)
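
The steps above can be sketched in application code. Everything named here is hypothetical: `[ShippingModel]`, its columns, and the `run_prediction` stub are invented for illustration, the DMX text is indicative only and has not been run against a server, and the stub returns made-up probabilities instead of calling Analysis Services.

```python
# Indicative DMX text only; the model and column names are illustrative assumptions.
DMX_QUERY = """
SELECT PredictProbability([TransactionOutcome], 'ShippingDeliveryFailure')
FROM [ShippingModel]
NATURAL PREDICTION JOIN
(SELECT @country AS [ShippingCountry], @total AS [BasketTotal]) AS t
"""

def run_prediction(case):
    """Stand-in for sending DMX_QUERY to the server; returns a made-up probability."""
    return 0.8 if case["ShippingCountry"] not in {"US", "CA"} else 0.1

def decide(case, threshold=0.5):
    """The application's 'if' statement: act on the predicted probability."""
    probability = run_prediction(case)
    if probability > threshold:
        return "flag for manual address check"
    return "ship normally"

# Step 1: build a case representing the transaction being processed.
case = {"ShippingCountry": "FR", "BasketTotal": 120.0}
# Steps 2 and 3: run the prediction and read the returned probability.
print(decide(case))
```

In a real deployment the stub would be replaced by a client call that executes the query against the trained mining model; the decision logic around it stays the same.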

Data Mining - Sequence Clustering Parameters
- CLUSTER_COUNT
- MAXIMUM_SEQUENCE_STATES
- MAXIMUM_STATES
- MINIMUM_SUPPORT

Data Mining - Detailed Workflow

Data Mining - Detailed Mining Model

Data Mining - Detailed Mining Model (continued)

Presentation Title

  • 1. Adventures in SegmentationUsing Applied Data Mining to add Business Value   Drew Minkin
  • 2. The Value Add of Data Mining Segmentation 101 Segmentation Tools in Analysis Services Methodology for Segmentation Analysis Building Confidence in your Model 2 Agenda
  • 3. 3 The Value Add of Data Mining
  • 4. Statistics for the Computer Age Evolution, not revolution with traditional statistics Statistics enriched with brute-force capabilities of modern computing Associated with industrial-sized data sets 4 Value Add - What is Data Mining?
  • 5. 5 Data Mining OLAP Reports (Ad hoc) Reports (Static) Value Add - Data Mining in the BI Spectrum Business Knowledge SQL-Server 2008 Relative Business Value Easy Difficult
  • 6. VoterVault From Mid-1990s Massive get-out-the-vote drive for those expected to vote Republican Demzilla Names typically have 200 to 400 information items 6 Value Add – Data Mining and Democracy
  • 7. “The quiet statisticians have changed our world; not by discovering new facts or technical developments, but by changing the ways that we reason, experiment and form our opinions.” -- Ian Hacking Value Add – The Promise of Data Mining 7
  • 8. 8 Value Add – Spheres of Influence
  • 9. Value Add – Operational Benefits Improved efficiency Inventory management Risk management
  • 10. Value Add – Strategic Benefits The Bottom Line Increased agility Brand building Differentiate message “Relationship” building
  • 11. Value Add – Tactical Benefits Reduction of costs Transactional leakage Outlier analysis
  • 12. Identify a group of customers who are expected to attrite Conduct marketing campaigns to change the behavior in the desired direction change their behavior, reduce the attrition rate. Value Add - Customer Attrition Analysis
  • 13. Slow attriters: Customers who slowly pay down their outstanding balance until they become inactive. Fast attriters: Customers who quickly pay down their balance and either lapse it or close it via phone call or write in. Value Add - Target Result
  • 14. Credit models Retention models Elasticity models Cross-sell models Lifetime Value models Agent/agency monitoring Target marketing Fraud detection Value Add - Sample Applications 14
  • 16. Unsupervised learning Associations and patterns many entities target information Market basket analysis (“diapers and beer”) Supervised learning Predict the value target variable well-defined predictive variables Credit / non-credit scoring engines 16 Segmentation – Machine Learning
  • 17. Segmentation –Sample Data Sources Data Warehouse: Credit Card Data Warehouse containing about 200 product specific fields Third Party Data : A set of account related demographic and credit bureau information Segmentation files :Set of account related segmentation values based on our client's segmentation scheme which combines Risk, Profitability and External potential Payment Database :Database that stores all checks processed. The database can categorize source of checks
  • 18. 18 Methodology for Segmentation Analysis
  • 20. 20 Methodology – Segmentation Lifecycle
  • 21. Research/Evaluate possible data sources Availability Hit rate Implementability Cost-effectiveness Extract/purchase data Check data for quality (QA) At this stage, data is still in a “raw” form Often start with voluminous transactional data Much of the data mining process is “messy” Methodology – Acquiring Raw Data 21
  • 22. Reflects data changes over time. Recognizes and removes statistically insignificant fields Defines and introduces the "target" field Allows for second stage preprocessing and statistical analysis. Methodology – Goals of Refinement
  • 23. Scoring engine Formula that classifies or separates policies (or risks, accounts, agents…) into profitable vs. unprofitable Retaining vs. non-retaining… (Non-)Linear equation f() of several predictive variables Produces continuous range of scores score = f(X1, X2, …, XN) Methodology - Scoring Engines 23
  • 24. Data To Predict Training Data Mining Model Mining Model Mining Model Methodology – Deployed Model DB data Client data Application log “Just one row” New Entry New Txion DM Engine DM Engine Predicted Data
  • 25. Randomly divide data into 3 pieces Training data Test data Validation data Use Training data to fit models Score the Test data to create a lift curve Perform the train/test steps iteratively until you have a model you’re happy with During this iterative phase, validation data is set aside in a “lock box” Score the Validation data and produce a lift curve Unbiased estimate of future performance Methodology - Testing 25
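The three-way split described above can be sketched like this. The 60/20/20 proportions and the fixed seed are assumptions for illustration; the slide does not state the proportions used.

```python
import random

def three_way_split(rows, train_pct=60, test_pct=20, seed=42):
    """Randomly divide rows into training, test and validation pieces."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Whatever remains is the "lock box": untouched until the final model is chosen.
    validation = shuffled[n_train + n_test:]
    return train, test, validation

train, test, validation = three_way_split(range(1000))
print(len(train), len(test), len(validation))
```

Only the training and test pieces take part in the iterative loop; the validation piece is scored once at the end.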
  • 26. Examine correlations among the variables Weed out redundant, weak, poorly distributed variables Model design Build candidate models Regression/GLM Decision Trees/MARS Neural Networks Select final model 26 Methodology - Multivariate Analysis
  • 27. 27 Segmentation Tools in Analysis Services
  • 28. Data Mining - Algorithm Matrix Segmentation Advanced Data Exploration Classification Forecasting Association Text Analysis Estimation Association Rules Clustering Decision Trees Linear Regression Logistic Regression Naïve Bayes Neural Nets Sequence Clustering Time Series
  • 29. 29 Data Mining - SQL-Server Algorithms Decision Trees Time Series Neural Net Clustering Sequence Clustering Association Naïve Bayes Linear and Logistic Regression
  • 32. Train (process) the Structures
  • 37. Regularly update and re-validate the Model Data Mining - Blueprint for Toolset
  • 38. Data Mining - Cross-Validation SQL Server 2008 X iterations of retraining and retesting the model Results from each test statistically collated Model deemed accurate (and perhaps reliable) when variance is low and results meet expectations
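The idea of repeated retraining and retesting, then collating the results, can be sketched generically. This is not SQL Server's implementation; the majority-class "model" and the toy labels are stand-ins chosen only to keep the example self-contained.

```python
from collections import Counter
import statistics

def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size

def fit_majority(train_labels):
    """Stand-in model: always predict the most common training label."""
    return Counter(train_labels).most_common(1)[0][0]

def accuracy(model, test_labels):
    return sum(1 for y in test_labels if y == model) / len(test_labels)

def cross_validate(labels, k):
    """Retrain and retest k times; return mean and variance of fold accuracy."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit_majority([labels[i] for i in train_idx])
        scores.append(accuracy(model, [labels[i] for i in test_idx]))
    return statistics.mean(scores), statistics.pvariance(scores)

labels = [0, 0, 0, 1] * 5  # toy data: 20 cases, 25% positive
mean_acc, var_acc = cross_validate(labels, k=5)
print(mean_acc, var_acc)
```

A low variance across folds suggests the model behaves consistently on different slices of the data; the mean still has to meet expectations on its own.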
  • 39. Data Mining - Microsoft Decision Trees Use for: Classification: churn and risk analysis Regression: predict profit or income Association analysis based on multiple predictable variables Builds one tree for each predictable attribute Fast
  • 40. COMPLEXITY_PENALTY FORCE_REGRESSOR MAXIMUM_INPUT_ATTRIBUTES MAXIMUM_OUTPUT_ATTRIBUTES MINIMUM_SUPPORT SCORE_METHOD SPLIT_METHOD Data Mining - Decision Tree Parameters
  • 41. Data Mining - Microsoft Naïve Bayes Use for: Classification Association with multiple predictable attributes Assumes all inputs are independent Simple classification technique based on conditional probability
  • 42. MAXIMUM_INPUT_ATTRIBUTES MAXIMUM_OUTPUT_ATTRIBUTES MAXIMUM_STATES MINIMUM_DEPENDENCY_PROBABILITY Data Mining - Naïve Bayes Parameters
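To make "classification based on conditional probability with independent inputs" concrete, here is a small generic Naïve Bayes sketch. It is not the Microsoft algorithm; the add-one (Laplace) smoothing, the attribute names and the six training rows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row in rows:
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[(attr, row[target])][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def predict_proba(model, case):
    """Multiply the prior by each P(value | class), treating inputs as independent."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    joint = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total
        for attr, value in case.items():
            count = value_counts[(attr, cls)][value]
            p *= (count + 1) / (n_cls + len(values_seen[attr]))  # add-one smoothing
        joint[cls] = p
    norm = sum(joint.values())
    return {cls: p / norm for cls, p in joint.items()}

# Hypothetical training cases.
rows = [
    {"plan": "basic", "late": "yes", "churn": "yes"},
    {"plan": "basic", "late": "yes", "churn": "yes"},
    {"plan": "basic", "late": "no", "churn": "no"},
    {"plan": "premium", "late": "no", "churn": "no"},
    {"plan": "premium", "late": "no", "churn": "no"},
    {"plan": "premium", "late": "yes", "churn": "yes"},
]
model = train_naive_bayes(rows, target="churn")
proba = predict_proba(model, {"plan": "basic", "late": "yes"})
print(proba)
```

The independence assumption is what lets the per-attribute probabilities simply be multiplied together.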
  • 43. Data Mining - Clustering Applied to Segmentation: Customer grouping, Mailing campaign Also: classification and regression Anomaly detection Discrete and continuous Note: “Predict Only” attributes not used for clustering
  • 44. CLUSTER_COUNT CLUSTER_SEED CLUSTERING_METHOD MAXIMUM_INPUT_ATTRIBUTES MAXIMUM_STATES MINIMUM_SUPPORT MODELLING_CARDINALITY SAMPLE_SIZE STOPPING_TOLERANCE Data Mining - Clustering Parameters
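Customer grouping by clustering can be illustrated with a plain k-means loop. This is a generic sketch, not the Analysis Services implementation; the (age, monthly spend) points and the starting centroids are hypothetical, and the number of starting centroids plays the role of a cluster count.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, then recompute centroids until stable."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers as (age, monthly spend).
customers = [(25, 20), (27, 25), (23, 22), (55, 80), (60, 85), (58, 90)]
centroids, clusters = k_means(customers, centroids=[(25, 20), (55, 80)])
print(centroids)
print(clusters)
```

Each resulting cluster is a candidate segment; whether it is useful still depends on whether the business can act on it.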
  • 45. Data Mining - Neural Network Applied to Classification Regression Great for finding complicated relationships among attributes Difficult to interpret results Gradient Descent method Output Layer Loyalty Hidden Layers Input Layer Age Education Sex Income
  • 46. HIDDEN_NODE_RATIO HOLDOUT_PERCENTAGE HOLDOUT_SEED MAXIMUM_INPUT_ATTRIBUTES MAXIMUM_OUTPUT_ATTRIBUTES MAXIMUM_STATES SAMPLE_SIZE Data Mining - Neural Network Parameters
  • 47. Data Mining - Sequence Clustering Analysis of: Customer behaviour Transaction patterns Click stream Customer segmentation Sequence prediction Mix of clustering and sequence technologies Groups individuals based on their profiles including sequence data
  • 48. To discover the most likely beginning, paths, and ends of a customer’s journey through our domain consider using: Association Rules Sequence Clustering Data Mining - What is a Sequence?
  • 49. Data Mining - Sequence Data
  • 50. Your “if” statement will test the value returned from a prediction – typically, predicted probability or outcome Steps: Build a case (set of attributes) representing the transaction you are processing at the moment E.g. Shopping basket of a customer plus their shipping info Execute a “SELECT ... PREDICTION JOIN” on the pre-loaded mining model Read returned attributes, especially case probability for some outcome E.g. Probability > 50% that “TransactionOutcome=ShippingDeliveryFailure” Your application has just made an intelligent decision! Remember to refresh and retest the model regularly – daily? Data Mining – Minor Introduction to DMX
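The application-side "if" test can be sketched as below. The predictor here is a hypothetical stand-in for whatever call executes the PREDICTION JOIN against the mining model; the function names, attribute names, actions and the returned probabilities are all assumptions for illustration.

```python
def decide_shipping_action(case, predict_probability, threshold=0.5):
    """Branch on the predicted probability of a delivery failure for this case."""
    probability = predict_probability(case)
    if probability > threshold:
        return "hold_for_review"
    return "ship"

def fake_predictor(case):
    """Stand-in for the model query; a real one would send the case to the mining model."""
    return 0.72 if case["shipping_country"] != case["billing_country"] else 0.05

# Build a case: the shopping basket plus shipping information.
case = {"basket": ["laptop", "mouse"], "shipping_country": "FR", "billing_country": "US"}
action = decide_shipping_action(case, fake_predictor)
print(action)
```

Passing the predictor in as a parameter keeps the decision logic testable without a live model.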
  • 51. CLUSTER_COUNT MAXIMUM_SEQUENCE_STATES MAXIMUM_STATES MINIMUM_SUPPORT Data Mining- Sequence Clustering Parameters
  • 52. 45 Data Mining – Detailed Workflow
  • 53. 46 Data Mining – Detailed Mining Model
  • 54. 47 Data Mining – Detailed Mining Model
  • 55. 48 Data Mining – Detailed Mining Model
  • 56. 49 Building Confidence in your Segmentation
  • 57. Which target variable to use? Frequency & severity Loss Ratio, other profitability measures Binary targets: defection, cross-sell …etc How to prepare the target variable? Period - 1-year or Multi-year? Losses evaluated @? Cap large losses? Cat losses? How / whether to re-rate, adjust premium? What counts as a “retaining” policy? …etc Building Confidence - Model Design 50
  • 58. Approaches Change the algorithm Change model parameters Change inputs/outputs to avoid bad correlations Clean the data set Perhaps there are no good patterns in data Verify statistics (Data Explorer) Building Confidence - Improving Models
  • 59. Capping: Outliers are reduced in influence to produce better estimates. Binning: Small and insignificant levels of character variables are regrouped. Box-Cox Transformations: These transformations are commonly included, especially the square root and logarithm. Johnson Transformations: Performed on numeric variables to make them more ‘normal’. Weight of Evidence: Created for character variables and binned numeric variables. 52 Building Confidence – Alternate Methods
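Two of these preprocessing steps, capping and the Box-Cox family, are simple enough to sketch. The cap limits and sample values are hypothetical. The Box-Cox form used is the standard one, (x^λ − 1)/λ for λ ≠ 0 and ln(x) for λ = 0, so λ = 0.5 gives a square-root-type transform and λ = 0 gives the logarithm.

```python
import math

def cap(values, lower, upper):
    """Clamp each value into [lower, upper] so outliers carry less influence."""
    return [min(max(v, lower), upper) for v in values]

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

balances = [1, 50, 1000]              # hypothetical raw values
capped = cap(balances, lower=5, upper=100)
sqrt_like = box_cox(9.0, 0.5)         # (sqrt(9) - 1) / 0.5
log_like = box_cox(math.e, 0)         # ln(e)
print(capped, sqrt_like, log_like)
```

In practice the cap limits are usually taken from percentiles of the training data rather than fixed by hand.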
  • 60. 53 Building Confidence - Confusion Matrix 1241 correct predictions (516 + 725). 35 incorrect predictions (25 + 10). The model scored 1276 cases (1241 + 35). The error rate is 35/1276 = 0.0274. The accuracy rate is 1241/1276 = 0.9726.
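The confusion matrix arithmetic can be reproduced directly from the four cell counts. Which count belongs to which (actual, predicted) cell is an assumption here, since the slide text gives only the totals; the class labels "yes" and "no" are also hypothetical.

```python
# Keys are (actual, predicted); the cell assignment is assumed for illustration.
confusion = {
    ("yes", "yes"): 516,
    ("no", "no"): 725,
    ("yes", "no"): 25,
    ("no", "yes"): 10,
}

correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
incorrect = sum(n for (actual, predicted), n in confusion.items() if actual != predicted)
total = correct + incorrect

accuracy = correct / total      # 1241 / 1276
error_rate = incorrect / total  # 35 / 1276
print(correct, incorrect, total, round(accuracy, 4), round(error_rate, 4))
```

Accuracy and error rate always sum to 1, which is a quick sanity check on any confusion matrix.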
  • 61. “All models are wrong, but some are useful.” George Box 54 Building Confidence – Warning Signs
  • 62. Extrapolation Applying models from unrelated disciplines Equality The real world contains a surprising amount of uncertainty, fuzziness, and precariousness. Copula Binding probabilities can mask errors Distribution functions Small miscalculations can make coincidences look like certainties Gamma Human behavior difficult to quantify as a linear parameter 55 Building Confidence –Li’s Revenge
  • 63. 56 Building Confidence - Lift Curves Sort data by score Break the dataset into 10 equal pieces Best “decile”: lowest score → lowest LR (loss ratio) Worst “decile”: highest score → highest LR Difference: “Lift” Lift = segmentation power Lift translates into ROI of the modeling project
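The decile procedure can be sketched as follows. The records are synthetic (loss rises steadily with score, premium is constant), so the numbers only show the mechanics; LR is computed as total losses divided by total premium within each decile, which is an assumed definition.

```python
def decile_loss_ratios(records, n_bins=10):
    """records: (score, loss, premium) tuples. Returns one loss ratio per bin, lowest scores first."""
    ordered = sorted(records, key=lambda r: r[0])  # sort data by score
    size = len(ordered) // n_bins
    ratios = []
    for b in range(n_bins):
        # The last bin absorbs any remainder so every record is counted once.
        chunk = ordered[b * size:(b + 1) * size] if b < n_bins - 1 else ordered[b * size:]
        losses = sum(r[1] for r in chunk)
        premium = sum(r[2] for r in chunk)
        ratios.append(losses / premium)
    return ratios

# Synthetic book of 100 policies: score i/100, loss i, premium 100.
records = [(i / 100, float(i), 100.0) for i in range(100)]
ratios = decile_loss_ratios(records)
lift = ratios[-1] - ratios[0]  # worst decile minus best decile
print([round(r, 3) for r in ratios], round(lift, 3))
```

A flat set of decile ratios means the score has no segmentation power; the steeper the rise from best to worst decile, the larger the lift.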
  • 64. Building Confidence – Vetted Results ~Top 5% of 750000 2 groups with 10000 customers from random sampling 37500 top customers from the prediction list sorted by the score Group 1 Engaged or offered incentives by marketing department Group 2 No action Results Group 1 has an attrition rate of 0.8%, Group 2 has 10.6% Average attrition rate is 2.2% Lift is 4.8 (10.6% / 2.2%)
  • 66. Xiaohua Hu, Drexel University Jerome Friedman, Trevor Hastie, Robert Tibshirani, The Elements of Statistical Learning James Guszcza, Deloitte Jeff Kaplan, Apollo Data Technologies Rafal Lukawiecki, Project Botticelli Ltd David L. Olson, University of Nebraska Lincoln Donald Farmer, ZhaoHui Tang and Jamie MacLennan, Microsoft ASA Corporation Richard Boire, Boire Filler Group, John Spooner, SAS Corporation Shin-Yuan Hung, Hsiu-Yu Wang, National Chung-Cheng University Felix Salmon and Chris Anderson, Wired Magazine 59 Credits