Document Classification using DMX in Analysis Services
Mark Tabladillo Ph.D.
http://marktab.net
September 18, 2010
SQL Saturday 46 -- Raleigh NC
#sqlsat46 #MarkTabNet




                                © 2010 Mark Tabladillo Ph.D.
MarkTab & Text Mining




Outline
• Tools for Text Mining
• Demos
Data Mining as a Service




Text Mining Product Comparison from 2008

Feinerer, I., Hornik, K., & Meyer, D. (2008). Text Mining Infrastructure in R. Journal of Statistical Software, 25(5).
SQL Server Data Mining
Activity     How
Preprocess   T-SQL; Integration Services; Data Mining Add-In for Excel; .NET programming
Associate    Microsoft Association Rules (algorithm)
Cluster      Microsoft Clustering (algorithm)
Summarize    Integration Services (Term Extraction, Term Lookup)
Categorize   Integration Services
API          Includes DMX, XMLA, AMO, ADOMD.NET
APIs for Data Mining
Acronym     Term                                               Definition
DMX         Data Mining Extensions (OLE DB for Data Mining)    SQL-like queries
XMLA        Extensible Markup Language for Analysis            Client communication protocol
AMO         Analysis Management Objects                        .NET library to manage Analysis Services
ADOMD.NET   ActiveX Data Objects (Multidimensional) for .NET   .NET Framework data provider
DMX Tasks
• Data Definition
  • Create, Alter, Drop – Mining Structure
  • Create, Drop – Mining Model
  • Export and Import Models
• Data Manipulation
  • Query Models, Content, Cases, Sample Cases, Dimension Content
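The DMX tasks listed above can be sketched as the statements a client would send to Analysis Services. The following is a minimal illustration only: it builds DMX text in Python and does not connect to a server. The structure name `Speeches`, the model name `SpeechClassifier`, the column names, and the helper functions are all hypothetical and chosen for this example; they do not come from the presentation.

```python
from typing import Dict


def bracket(name: str) -> str:
    """Wrap an identifier in square brackets, the delimiter DMX uses for names."""
    if "]" in name:
        raise ValueError("this sketch does not handle ']' inside identifiers")
    return f"[{name}]"


def create_structure(structure: str, columns: Dict[str, str]) -> str:
    """Data definition: CREATE MINING STRUCTURE from {column: 'TYPE CONTENT'} pairs."""
    cols = ",\n  ".join(f"{bracket(c)} {spec}" for c, spec in columns.items())
    return f"CREATE MINING STRUCTURE {bracket(structure)} (\n  {cols}\n)"


def add_model(structure: str, model: str, columns: Dict[str, str], algorithm: str) -> str:
    """Data definition: add a mining model; each value is an optional flag such as PREDICT."""
    cols = ", ".join(f"{bracket(c)} {flag}".strip() for c, flag in columns.items())
    return (f"ALTER MINING STRUCTURE {bracket(structure)} "
            f"ADD MINING MODEL {bracket(model)} ({cols}) USING {algorithm}")


def drop_structure(structure: str) -> str:
    """Data definition: DROP MINING STRUCTURE."""
    return f"DROP MINING STRUCTURE {bracket(structure)}"


def query_model(model: str, what: str = "CONTENT") -> str:
    """Data manipulation: query a model's learned content or its training cases."""
    if what not in ("CONTENT", "CASES"):
        raise ValueError("this sketch only covers CONTENT and CASES queries")
    return f"SELECT * FROM {bracket(model)}.{what}"


if __name__ == "__main__":
    # Hypothetical document-classification structure: one row per speech.
    print(create_structure("Speeches", {"SpeechId": "LONG KEY",
                                        "President": "TEXT DISCRETE"}))
    print(add_model("Speeches", "SpeechClassifier",
                    {"SpeechId": "", "President": "PREDICT"},
                    "Microsoft_Naive_Bayes"))
    print(query_model("SpeechClassifier", "CONTENT"))
    print(drop_structure("Speeches"))
```

In practice the generated text would be run from SSMS, BIDS, or an ADOMD.NET client; keeping statement construction separate from execution makes the scripts easy to review before they reach a production server.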




SQL Server Data Mining Applications (User Interfaces)
User Interface                                    Activity
Excel (and PowerPivot for Excel)                  DMX
BIDS (Business Intelligence Development Studio)   Analysis Services Project; Integration Services Project (T-SQL; DMX; XMLA)
SSMS (SQL Server Management Studio)               T-SQL; DMX; XMLA
PowerShell version 2.0                            T-SQL; DMX; XMLA; AMO; ADOMD.NET
SharePoint                                        (Requires Setup or Customization)
Your Name Here (Develop Your Own)                 ?
Outline
• Tools for Text Mining
• Demos
Data: Presidential Addresses





 http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470277742,descCd-DOWNLOAD.html
Excel
• Use the 32-bit Excel add-in for Data Mining
  • Written for SQL Server 2008, ok for 2008 R2
  • Written for Office 2007, ok for 2010
• (Optional) Add the free PowerPivot add-in
  (http://powerpivot.com)




[Diagram, ©2010 Predixion Software. Labels: PowerPivot data sources (SQL Server, Access, Oracle, Teradata, Sybase, Informix, DB2, data feeds, text files); datasets and models; SQL Server Analysis Services in a public cloud or on-premise private cloud.]
BIDS
• The preferred application for production data mining
• Analysis Services Projects
  • Make Mining Structures and Models
  • Data Mining for OLAP Cubes
  • Excellent for Experimentation
• Integration Services Projects
  • Term Extraction and Term Lookup Text Mining
  • Excellent for Production
• Reporting Services Projects
  • Similar to Crystal Reports
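To make the Term Extraction and Term Lookup idea above concrete, here is a simplified analogy in Python. It is not how Integration Services implements these transformations (the real Term Extraction works on nouns and noun phrases and is configured in the designer); this sketch assumes plain lowercase word counts, and the stopword list, function names, and sample sentences are all hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical stopword list; a real project would use a fuller exclusion table.
STOPWORDS = frozenset({"the", "of", "and", "a", "to", "in", "for", "our", "we"})


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_terms(documents: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Extraction step: build a term dictionary from the whole corpus,
    keeping non-stopword terms that occur at least min_count times."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return {term: n for term, n in counts.items() if n >= min_count}


def lookup_terms(document: str, terms: Iterable[str]) -> Dict[str, int]:
    """Lookup step: count how often each dictionary term appears in one document."""
    counts = Counter(tokenize(document))
    return {term: counts[term] for term in terms if counts[term] > 0}


if __name__ == "__main__":
    corpus = [
        "The union of the states",
        "The states and the people",
        "Peace for the people",
    ]
    dictionary = extract_terms(corpus)
    for doc in corpus:
        print(doc, "->", lookup_terms(doc, dictionary))
```

The per-document term counts produced by the lookup step are the kind of rows that can then feed a mining structure for classification.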

SSMS
• Production management and maintenance
• Scripts can become stored procedures
• T-SQL, DMX, MDX, XMLA




PowerShell
• Object-oriented command prompt, now in version 2
• Provides complete access to AMO, ADOMD.NET and DMX




Excel in Production
• Can create and manage permanent data mining models
• Can document data mining models
• Can do some preprocessing (ETL)




BIDS in Production
• Can create a production workflow with Integration Services projects
• Can create production data mining models with Analysis Services projects




SSMS in Production
• The standard production user interface for SQL Server
• Also the standard production user interface for Analysis Services databases
• Built for
  • Scripting (T-SQL, MDX, DMX, XMLA)
  • Security
  • Assembly Registration (Analysis Services)
  • Stored Procedures (SQL Server)




PowerShell in Production
• Features
  • Object-oriented
  • Command window or ISE (Integrated Scripting Environment)
  • Accesses .NET libraries and WMI (Windows Management Instrumentation)
  • Version two adds event and exception handling




Resources
• MarkTab.NET
  Blog, links, video resources and information for data mining
• Blog: http://marktab.net/datamining
• Twitter: @MarkTabNet




Regroup and Conclusion
• Main Points from this Presentation




Contact Information
• Mark Tabladillo
  http://marktab.net

• Also on:
  Twitter @marktabnet
  LinkedIn




Ten Organizational Design Models to align structure and operations to busines...
 
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
Intermediate Accounting, Volume 2, 13th Canadian Edition by Donald E. Kieso t...
 
MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?MAHA Global and IPR: Do Actions Speak Louder Than Words?
MAHA Global and IPR: Do Actions Speak Louder Than Words?
 
1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf1911 Gold Corporate Presentation Apr 2024.pdf
1911 Gold Corporate Presentation Apr 2024.pdf
 
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptxFinancial-Statement-Analysis-of-Coca-cola-Company.pptx
Financial-Statement-Analysis-of-Coca-cola-Company.pptx
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 

Document Classification using DMX in SQL Server Analysis Services

  • 1. Document Classification using DMX in Analysis Services Mark Tabladillo Ph.D. http://marktab.net September 18, 2010
  • 2. SQL Saturday 46 -- Raleigh NC #sqlsat46 #MarkTabNet © 2010 Mark Tabladillo Ph.D.
  • 3. MarkTab & Text Mining
  • 4.
  • 5. Outline: Tools for Text Mining; Demos
  • 6. Data Mining as a Service
  • 7. Text Mining Product Comparison from 2008. Source: Feinerer, I., Hornik, K., & Meyer, D. (2008). Text Mining Infrastructure in R. Journal of Statistical Software, 25(5).
  • 8. SQL Server Data Mining (Activity: How)
      • Preprocess: T-SQL; Integration Services; Data Mining Add-In for Excel; .NET programming
      • Associate: Microsoft Association Rules (algorithm)
      • Cluster: Microsoft Clustering (algorithm)
      • Summarize: Integration Services (Term Extraction, Term Lookup)
      • Categorize: Integration Services
      • API: Includes DMX, XMLA, AMO, ADOMD.NET
  • 9. APIs for Data Mining (Acronym: Term; Definition)
      • DMX: Data Mining Extensions (OLE DB for Data Mining); SQL-like queries
      • XMLA: Extensible Markup Language for Analysis; client communication protocol
      • AMO: Analysis Management Objects; .NET library to manage Analysis Services
      • ADOMD.NET: ActiveX Data Objects (Multidimensional) for .NET; .NET Framework data provider
  • 10. DMX Tasks
      • Data Definition
          • Mining Structure: Create, Alter, Drop
          • Mining Model: Create, Drop
          • Export and Import Models
      • Data Manipulation
          • Query Models, Content, Cases, Sample Cases, Dimension Content
  • 11. SQL Server Data Mining Applications (User Interface: Activity)
      • Excel (and PowerPivot for Excel): DMX
      • BIDS (Business Intelligence Development Studio): Analysis Services Project; Integration Services Project (T-SQL; DMX; XMLA)
      • SSMS (SQL Server Management Studio): T-SQL; DMX; XMLA
      • PowerShell version 2.0: T-SQL; DMX; XMLA; AMO; ADOMD.NET
      • SharePoint (Requires Setup or Customization)
      • Your Name Here (Develop Your Own): ?
  • 12. Outline: Tools for Text Mining; Demos
  • 13. Data: Presidential Addresses. http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470277742,descCd-DOWNLOAD.html
  • 14. Excel
      • Use the 32-bit Excel add-in for Data Mining
      • Written for SQL Server 2008, ok for 2008 R2
      • Written for Office 2007, ok for 2010
      • (Optional) Add the free PowerPivot add-in (http://powerpivot.com)
  • 15. Datasets & Models (©2010 Predixion Software)
      • Public Cloud or On-Premise Private Cloud
      • SQL Server Analysis Services
      • PowerPivot
      • Data Sources: SQL Server; Access; Oracle; Teradata; Sybase; Informix; DB2; Data Feeds; Text Files
  • 16. BIDS
      • The preferred application for production data mining
      • Analysis Services Projects
          • Make Mining Structures and Models
          • Data Mining for OLAP Cubes
          • Excellent for Experimentation
      • Integration Services Projects
          • Term Extraction and Term Lookup Text Mining
          • Excellent for Production
      • Reporting Services Projects
          • Similar to Crystal Reports
  • 17. SSMS
      • Production management and maintenance
      • Scripts can become stored procedures
      • T-SQL, DMX, MDX, XMLA
  • 18. PowerShell
      • Object-oriented command prompt, now in version 2
      • Provides complete access to AMO, ADOMD.NET and DMX
  • 19. Excel in Production
      • Can create and manage permanent data mining models
      • Can document data mining models
      • Can do some preprocessing (ETL)
  • 20. BIDS in Production
      • Can create a production workflow with Integration Services projects
      • Can create production data mining models with Analysis Services projects
  • 21. SSMS in Production
      • The standard production user interface for SQL Server
      • Also the standard production user interface for Analysis Services Databases
      • Built for
          • Scripting (T-SQL, MDX, DMX, XMLA)
          • Security
          • Assembly Registration (Analysis Services)
          • Stored Procedures (SQL Server)
  • 22. PowerShell in Production
      • Features
          • Object-oriented
          • Command window or ISE (Integrated Scripting Environment)
          • Accesses .NET libraries and WMI (Windows Management Instrumentation)
          • Version two adds event and exception handling
  • 23. Resources
      • MarkTab.NET: Blog, links, video resources and information for data mining
      • Blog: http://marktab.net/datamining
      • Twitter: @MarkTabNet
  • 24. Regroup and Conclusion
      • Main Points from this Presentation
  • 25. Contact Information
      • Mark Tabladillo http://marktab.net
      • Also on: Twitter @marktabnet; LinkedIn