Data Mining beyond
Adventure Works

Mark Tabladillo Ph.D.
http://marktab.net
October 3, 2009
Approach of this Presentation
• Emphasize
       – Conceptual value of data mining
       – Relationship of data mining to the real
         world
• Reserve
       – Specific procedures and mechanics
       – Specific mathematics
       – Production implementation


© 2009 Mark Tabladillo Ph.D.
Outline
• Data Mining Fundamentals
• Interactive Demos
• Conclusion




Interactive Demos
• Sports
• Government Forecasting




Data Mining Definitions
• Data mining is the automatic or semi-automatic
  process of exploring data for meaningful or
  useful patterns.
• Data mining algorithms typically use
  estimation or optimization to achieve
  results (as opposed to only calculations).
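
The contrast between calculation and estimation can be sketched in a few lines of Python. This is a minimal illustration, not taken from the presentation: the data values, the function names, the step count, and the learning rate are all assumptions chosen for the example. The first function computes an average directly; the second reaches the same number by repeatedly improving a guess, which is the general pattern that estimation-based algorithms follow.

```python
def mean(values):
    """Direct calculation: one pass over the data, one exact answer."""
    return sum(values) / len(values)


def estimate_center(values, steps=200, rate=0.1):
    """Estimation by optimization: start from a guess and repeatedly
    step downhill on the mean squared error until the guess settles."""
    guess = 0.0
    for _ in range(steps):
        # Gradient of mean((guess - v)^2) with respect to guess.
        gradient = 2 * sum(guess - v for v in values) / len(values)
        guess -= rate * gradient
    return guess


data = [2.0, 4.0, 9.0]  # hypothetical values, for illustration only
print("calculated:", mean(data))
print("estimated: ", estimate_center(data))
```

For a simple average the direct calculation is obviously preferable. The iterative form matters when no closed-form answer exists, which is the usual situation for clustering, neural networks, and logistic regression.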




Microsoft Data Mining
• Microsoft Data Mining refers to
  Microsoft’s specific implementation of
  certain common data mining algorithms,
  which are accessed through the DMX
  (Data Mining Extensions) language.
• Also called SQL Server Data Mining, the
  technology is integrated into SQL Server
  rather than presented as an independent
  application.

Data Mining Tasks
• Supervised
       – Answer known, what is correlated?
• Unsupervised
       – Answer unknown (unspecified), what are the
         groups?
• Forecasting
       – Given a trend, what is next?
List the Data Mining Algorithms
• Ten Answers
• Each one is a field of academic focus




The Data Mining Algorithms
•    Microsoft Naive Bayes
•    Microsoft Linear Regression
•    Microsoft Decision Trees
•    Microsoft Time Series
•    Microsoft Clustering
•    Microsoft Sequence Clustering
•    Microsoft Association Rules
•    Microsoft Neural Networks
•    Microsoft Logistic Regression
•    Text Mining
The Analyze Tab


            Menu Option                     Data Mining Algorithm
            Analyze Key Influencers         Naïve Bayes
            Detect Categories               Clustering
            Fill from Example               Logistic Regression
            Forecast                        Time Series
            Highlight Exceptions            Clustering
            Scenario Analysis (Goal Seek)   Logistic Regression
            Scenario Analysis (What If)     Logistic Regression
            Prediction Calculator           Logistic Regression
            Shopping Basket Analysis        Association Rules
Demo One:
National League Baseball
• Directions:
  You are on the management team for the
  Atlanta Braves. To better serve the team,
  you have been instructed by the owner to
  group the players by considering both their
  position and their salary.




Demo One:
National League Baseball
• The following rules apply:
       – You must make more than one group
       – Each group must have at least two players
       – Players of different positions may be in the
         same group




Demo One:
National League Baseball
• Individual attributes can be used to make
  groups
• Historical statistics can be used to group
  new players
• Both supervised and unsupervised
  algorithms can be applied to the same
  data
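
One way to see how attributes produce groups, and how those groups then place a new player, is a small k-means sketch. Everything here is an assumption made for illustration: the roster is invented, the encoding of position as a half-weighted pitcher flag is an arbitrary choice, and plain k-means stands in for whatever the Microsoft Clustering algorithm does internally.

```python
import math

# Hypothetical roster: (name, position, salary in millions of dollars).
PLAYERS = [
    ("A", "pitcher", 0.5),
    ("B", "catcher", 0.6),
    ("C", "pitcher", 0.8),
    ("D", "outfielder", 9.0),
    ("E", "pitcher", 10.0),
    ("F", "infielder", 12.0),
]
MAX_SALARY = max(salary for _, _, salary in PLAYERS)


def encode(position, salary):
    """Turn one player into a numeric point (assumed encoding)."""
    pitcher_flag = 0.5 if position == "pitcher" else 0.0
    return (pitcher_flag, salary / MAX_SALARY)


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (evenly spaced seed points)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a group empties out
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


points = [encode(position, salary) for _, position, salary in PLAYERS]
centroids, labels = kmeans(points, k=2)
for (name, position, salary), label in zip(PLAYERS, labels):
    print(f"group {label}: {name} ({position}, ${salary}M)")

# A new player is placed in the group whose centroid is closest.
new_group = nearest(encode("pitcher", 11.0), centroids)
print("new pitcher at $11M joins group", new_group)
```

With this encoding the two groups split along salary and mix positions, which satisfies the demo rules. Raising the weight on the pitcher flag would make position dominate instead, so the choice of encoding is itself a modeling decision.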



Demo Two:
Government Forecasting
• Directions:
  The President is asking your opinion on
  how the following numbers will increase
  over the next few months. Because this
  project is sensitive, you do not know what
  these numbers measure. However, based
  on the available history, make your best
  projection for the next six periods.


Demo Two:
Government Forecasting
[Chart: monthly values from Jan 2007 to Dec 2008; vertical axis from 0 to 8]

Demo Two:
Government Forecasting
[Chart: monthly values from Sep 2007 to Aug 2009; vertical axis from 0 to 12]

Demo Two:
Government Forecasting
• Rapid response is as useful as prediction
• Seek intelligent correlations among related
  metrics
• Projections depend on time frame –
  modeling is continual
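
The dependence of a projection on its time frame can be shown with a simple trend line. The sketch below is an assumption-laden illustration: the history values are invented (the numbers behind the charts are not recoverable here), and an ordinary least-squares line is used as a stand-in, not the Microsoft Time Series algorithm.

```python
def fit_trend(values):
    """Ordinary least-squares line through the points (0, v0), (1, v1), ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def project(values, periods=6):
    """Extend the fitted line for the given number of future periods."""
    slope, intercept = fit_trend(values)
    n = len(values)
    return [intercept + slope * (n + i) for i in range(periods)]


# Hypothetical monthly history that is flat at first and then rises.
history = [4.6, 4.4, 4.5, 4.7, 4.9, 5.0, 5.4, 5.8, 6.1, 6.5, 6.8, 7.2]

full_window = project(history)         # fitted on all twelve months
recent_window = project(history[-6:])  # fitted on the last six months only

print("six periods ahead, full history:  ", round(full_window[-1], 2))
print("six periods ahead, recent history:", round(recent_window[-1], 2))
```

Because the invented series accelerates, the line fitted to the recent window projects higher than the line fitted to the whole history. Neither is wrong in itself; the gap is the reason the model has to be refit as new periods arrive.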




Forecasting Algorithms
• Microsoft Time Series




Supervised Algorithms
•    Microsoft Naive Bayes
•    Microsoft Linear Regression
•    Microsoft Decision Trees
•    Microsoft Neural Networks
•    Microsoft Logistic Regression


Unsupervised Algorithms
•    Microsoft Clustering
•    Microsoft Sequence Clustering
•    Microsoft Association Rules
•    Text Mining



Resources
• MarkTab.NET
     Links, video resources and information for data mining

•    Data Mining with Microsoft SQL Server 2008
     by Jamie MacLennan (Author), ZhaoHui Tang (Author), Bogdan Crivat (Author)

•    Smart Business Intelligence Solutions with Microsoft® SQL Server® 2008
     (PRO-Developer)
     by Lynn Langit (Author), Matthew Roche (Author)




Regroup and Conclusion
• Main Points from this Presentation




Contact Information
• Mark Tabladillo
  Twitter @marktabnet

• Also on:
  LinkedIn
  Facebook




Bonus:
Sequence Clustering Ideas
•    Trading players in professional sports
•    Assigning players to certain positions
•    Moving from city to city
•    Store path at the mall
•    Cancer treatment path
•    Taking up a musical instrument
•    Taking up sports
•    Blogging
•    Viral news


Data Mining Beyond Adventure Works (Redmond WA 10/3/2009)
