Data Mining with SQL Server 2005
Presented to fwPASS on 1/26/2010

DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
About Me
- Work for Systemental as a consultant and software developer
- Software development to support corporate business process improvement since 2000 (Lean or continuous improvement initiatives)
  - .NET since 2004
- President, fwPASS.org
- Manufacturing Engineering Technology degrees from Ball State University
- Certified Six Sigma Black Belt
What We Will Cover

- Data mining – what is it?
- “Cash for Clunkers”
- Other examples
  - Amazon.com
  - Coke Freestyle
- Basic data mining concepts
- Demo time
Wikipedia

Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
Cash for Clunkers

Columbia City: SR 30 & SR 9
Objectives of “Cash for Clunkers”
- Jump-start automotive sector sales
  - Specifically, sales of higher-mileage (more fuel-efficient) vehicles
- Get gas guzzlers off the street
Cash for Clunkers

- How did they decide whom to target, and how?
- How would you do it?
- Where did the data come from?
- Where should the data come from?
Who to Target?

- Anyone, everyone, or a targeted group
- Self-qualified
- Organic growth, or just “pull up” (pull forward) existing sales
- Convert foreign-brand sales to GM
  - Conflict of interest? (“Government Motors”)
- Discriminatory?
Estimating the Effectiveness

- Effect of “pull up” vs. organic growth
- Peripheral commercial effects
- Estimation of payback
  - Sales, plates, and excise tax
  - Income tax from recalled laid-off workers
  - Reduction of unemployment
  - Auto insurance
- Reduction in tax revenue at the gas pump
Data Content and Source

- Public records
- CAFE (Corporate Average Fuel Economy) data
- GM data
- Industry-sponsored studies
Amazon.com
SQL Server 2005 Data Mining

- Nine algorithms (third-party algorithms can be plugged in)
- Both modeling and exploration in Visual Studio
- Integrated tools: SS*S (SSIS, SSAS, SSRS)
- API
- Data Mining Extensions to SQL (DMX)
Types of Analysis

- Optimization vs. predictive
- Descriptive – provides a deeper understanding of existing data
- Predictive – estimates the probability of future conditions
Data Mining Objectives

- Classification – assign data to known (discrete) classes
- Segmentation – cluster data into groups of similar cases
- Estimation – predict continuous values
- Association – find which events occur together
- Forecasting – time-series estimation of future values
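To make the association objective concrete, here is a minimal pure-Python sketch that counts which items appear together in a few hypothetical shopping baskets (the Amazon.com style of "customers who bought X also bought Y") and reports support and confidence for each item pair. It is not the Microsoft Association Rules algorithm, only the two measures such rule miners commonly rank by; the function name `pair_rules` and the basket data are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Return {(a, b): (support, confidence)} for every rule a -> b.

    support    = fraction of baskets containing both a and b
    confidence = fraction of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)  # ignore duplicate items within one basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = {}
    for (x, y), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # Each unordered pair yields two directed rules with different confidence.
        rules[(x, y)] = (support, both / item_counts[x])
        rules[(y, x)] = (support, both / item_counts[y])
    return rules


# Hypothetical market-basket data.
baskets = [
    ["book", "bookmark"],
    ["book", "bookmark", "lamp"],
    ["book", "lamp"],
    ["bookmark"],
]
rules = pair_rules(baskets)
print(rules[("book", "bookmark")])  # support = 0.5, confidence = 2/3
```

Note that the rule lamp → book has confidence 1.0 while book → lamp has only 2/3: association rules are directional even though support is symmetric.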
Algorithms

1. Decision Trees (uses only the attributes that appear in the tree)
2. Naive Bayes (uses all attributes)
3. Clustering
4. Linear Regression
5. Logistic Regression
6. Neural Network
7. Sequence Clustering
8. Time Series
9. Association Rules (discrete attributes only)
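The note that Naive Bayes "uses all attributes" can be seen in a small sketch: every input attribute contributes a factor to each class score, and the normalized scores double as a measure of confidence in the prediction. This is a generic textbook Naive Bayes over discrete values with add-one (Laplace) smoothing, not the Microsoft Naive Bayes implementation; the function names, the smoothing choice, and the "traded in a clunker" data are all hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, target):
    """Count class priors and per-attribute value frequencies for each class."""
    class_counts = Counter(row[target] for row in rows)
    # value_counts[attr][cls][value] -> number of training rows
    value_counts = defaultdict(lambda: defaultdict(Counter))
    domains = defaultdict(set)  # distinct values seen for each attribute
    for row in rows:
        cls = row[target]
        for attr, value in row.items():
            if attr == target:
                continue
            value_counts[attr][cls][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def predict_nb(model, case):
    """Return {class: probability} for a case, using every input attribute."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for attr, value in case.items():
            # Add-one smoothing so an unseen value does not zero out the product.
            count = value_counts[attr][cls][value]
            score *= (count + 1) / (n_cls + len(domains[attr]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical training cases: did the household trade in a vehicle?
rows = [
    {"vehicle_age": "old", "mpg": "low", "traded_in": "yes"},
    {"vehicle_age": "old", "mpg": "low", "traded_in": "yes"},
    {"vehicle_age": "old", "mpg": "high", "traded_in": "no"},
    {"vehicle_age": "new", "mpg": "low", "traded_in": "no"},
    {"vehicle_age": "new", "mpg": "high", "traded_in": "no"},
]
model = train_nb(rows, "traded_in")
probs = predict_nb(model, {"vehicle_age": "old", "mpg": "low"})
print(probs)  # yes ≈ 0.70, no ≈ 0.30
```

This sketch handles only discrete attribute values; a continuous column such as an exact MPG figure would first need to be discretized.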
DMX

- Column syntax: name, data type, content type, [usage]
- The case being analyzed is identified by its key
- Content types: key, key sequence, key time, discrete, continuous, discretized (number of buckets)
- Usage: input, predict, predict-only (predicted, but not used as an input to build any other part of the model)
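The "discretized (number of buckets)" content type turns a continuous column into a fixed number of ranges. SSAS picks bucket boundaries with its own discretization methods; the sketch below uses plain equal-width binning only to show what "N buckets" means in practice. The function name and the MPG values are hypothetical.

```python
def equal_width_buckets(values, n_buckets):
    """Map each continuous value to a bucket index 0..n_buckets-1 of equal width."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    if width == 0:  # all values identical: everything lands in one bucket
        return [0] * len(values)
    # The maximum value would compute to index n_buckets; clamp it into the last bucket.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


# Hypothetical fuel economy (MPG) values, discretized into 3 buckets.
mpg = [12, 15, 18, 22, 27, 30]
print(equal_width_buckets(mpg, 3))  # [0, 0, 1, 1, 2, 2]
```

Equal-width bins are easy to explain but can leave some buckets nearly empty when the data is skewed, which is one reason tools offer alternatives such as equal-frequency or cluster-based boundaries.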
Structure

- Data mart, data warehouse (DW), or cube
  - Data source
    - Mining structure (which fields)
      - Mining models (algorithms, attributes)
        - Viewers (tree, clusters, discrimination, classification)
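One way to picture this hierarchy is as plain data: a mining structure names the fields taken from the data source, and each mining model picks an algorithm plus a subset of those fields. The classes below are hypothetical and do not correspond to any SSAS API; the point they illustrate is that several models can share one structure, and a model cannot use a field the structure does not define.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiningModel:
    """An algorithm plus the attributes it uses (hypothetical model)."""
    name: str
    algorithm: str
    inputs: list[str]
    predict: str


@dataclass
class MiningStructure:
    """Which fields are pulled from the data source (hypothetical model)."""
    name: str
    columns: list[str]
    models: list[MiningModel] = field(default_factory=list)

    def add_model(self, model: MiningModel) -> None:
        # A model may only use columns that the structure already defines.
        missing = (set(model.inputs) | {model.predict}) - set(self.columns)
        if missing:
            raise ValueError(f"{model.name}: unknown columns {sorted(missing)}")
        self.models.append(model)


structure = MiningStructure(
    "Clunkers", ["household_id", "vehicle_age", "mpg", "traded_in"]
)
# Two models with different algorithms share the same structure.
structure.add_model(MiningModel("DT", "Decision Trees", ["vehicle_age", "mpg"], "traded_in"))
structure.add_model(MiningModel("NB", "Naive Bayes", ["vehicle_age"], "traded_in"))
print([m.algorithm for m in structure.models])
```

Keeping the field list on the structure and the algorithm choice on each model is what makes side-by-side comparison of algorithms over the same data cheap.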
Training the Model

- SSIS Percentage Sampling data flow component
- Split the data into training and testing sets
- Estimate the error on the testing set
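The train/test idea behind the Percentage Sampling step can be sketched in plain Python: draw a random number per row to split the data, train on one part, and estimate error on the held-out part. Because each row is assigned independently, the split is approximately, not exactly, the requested percentage. The 70/30 split, the function names, the fixed seed, and the trivial majority-class "model" are assumptions made only so the sketch runs.

```python
import random
from collections import Counter


def percentage_split(rows, train_pct, seed=42):
    """Randomly assign roughly train_pct percent of rows to training, the rest to testing."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_pct / 100 else test).append(row)
    return train, test


def error_rate(predict, rows, target):
    """Fraction of held-out rows whose prediction differs from the actual value."""
    if not rows:
        raise ValueError("no rows to evaluate")
    wrong = sum(1 for row in rows if predict(row) != row[target])
    return wrong / len(rows)


# Hypothetical cases: low-MPG vehicles were traded in.
rows = [{"mpg": m, "traded_in": "yes" if m < 18 else "no"} for m in range(10, 40)]
train, test = percentage_split(rows, 70)
# Hypothetical baseline "model": always predict the most common training label.
majority = Counter(r["traded_in"] for r in train).most_common(1)[0][0]
print(len(train), len(test), error_rate(lambda r: majority, test, "traded_in"))
```

Measuring error only on rows the model never saw is what exposes over-fitting: a model can score perfectly on its training rows and still do poorly on the test set.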
Demos

- Visual Studio
- SSMS (SQL Server Management Studio)
- Windows client
- Web client
Miscellaneous

- Sequence or timing
- Prediction plus a measure of confidence
- Caution: over-fitting the model
- Nested tables, e.g., transactional detail data
  - The nested table's key is never the foreign key to the case table
  - The key is what the nested table is about
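A nested table holds the many-side detail for each case. The sketch below groups flat, hypothetical transaction rows into cases: the case key is the customer, and each nested row is keyed by the product it describes. The customer foreign key is used only to attach a row to its case and never becomes the nested key. The function name and data are illustrative only.

```python
from collections import defaultdict


def build_cases(transactions):
    """Group flat (customer_id, product, quantity) rows into cases with a nested table.

    Case key:         customer_id
    Nested table key: product (what the nested table is about)
    """
    cases = defaultdict(dict)
    for customer_id, product, quantity in transactions:
        # customer_id only routes the row to its case; it is not stored in the nested table.
        nested = cases[customer_id]
        nested[product] = nested.get(product, 0) + quantity
    return dict(cases)


transactions = [
    ("C1", "book", 1),
    ("C1", "lamp", 2),
    ("C2", "book", 1),
    ("C1", "book", 1),
]
cases = build_cases(transactions)
print(cases)  # {'C1': {'book': 2, 'lamp': 2}, 'C2': {'book': 1}}
```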
Thank you!

- Website: http://www.systemental.com
- Blogs:
  - http://dean-o.blogspot.com/
  - http://practicalhoshin.blogspot.com
- Twitter: http://www.twitter.com/deanwillson
- Email: dean@systemental.com
- LinkedIn: http://www.linkedin.com/in/deanwillson
