Data Warehousing & Data Mining

By Mandar Kulkarni
PRN 10030141129
MBA-IT
SICSR
Contents
•   Data warehousing
•   Understanding data warehousing
•   Data warehouse architecture
•   Data Mining
•   Data mining techniques
Warehouse?

Real time example?
Data Warehousing
Samsung
[Figure: branch databases in Mumbai, Delhi, Chennai and Bangalore; the sales manager needs sales per item type per branch for the first quarter.]
• Now the sales manager wants to know the
  sales for the first quarter.

• Solution
  – Extract the information from each branch database,
    store it in a single place, and process it there
    with query and analysis tools.
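
The solution above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory SQLite databases stand in for the branch systems, and the table layout, item types and sales figures are all made up rather than taken from the slides.

```python
import sqlite3

# Hypothetical branch data: (item_type, quarter, amount). All figures are made up.
BRANCH_SALES = {
    "Mumbai": [("TV", "Q1", 120), ("Phone", "Q1", 300)],
    "Delhi": [("TV", "Q1", 80), ("Phone", "Q1", 250)],
    "Chennai": [("TV", "Q1", 60), ("Phone", "Q2", 90)],
    "Bangalore": [("Phone", "Q1", 210)],
}


def make_branch_db(rows):
    """Create a stand-in operational database for one branch."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, quarter TEXT, amount INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def build_warehouse(branch_dbs):
    """Copy every branch's rows into one central table, tagging each row with its branch."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, quarter TEXT, amount INTEGER)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, quarter, amount FROM sales").fetchall()
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)", [(branch, *r) for r in rows]
        )
    return wh


def first_quarter_report(wh):
    """Sales per item type per branch for the first quarter."""
    return wh.execute(
        "SELECT branch, item_type, SUM(amount) FROM sales "
        "WHERE quarter = 'Q1' GROUP BY branch, item_type "
        "ORDER BY branch, item_type"
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
warehouse = build_warehouse(branch_dbs)
report = first_quarter_report(warehouse)
for row in report:
    print(row)
```

The point of the sketch is that the manager's question becomes a single query once the data sits in one place; without the central store, the same question would have to be asked of four separate systems and merged by hand.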
Solution
[Figure: data from the Mumbai, Delhi, Chennai and Bangalore databases flows into the data warehouse; query and analysis tools produce a report for the sales manager.]
Operational Systems
• Run the business in real time
• Handle routine tasks
• Decision Support Systems (DSS)
  – Help in taking actions
• Used by people who deal with customers and
  products
• Increasingly used by customers themselves
Data Warehouse
• A single, complete and consistent store of
  data obtained from a variety of different
  sources and made available to end users in a way
  they can understand and use in a business
  context.

• A process of transforming data into
  information and making it available to users in
  a timely enough manner to make a difference.
Definition

• An integrated, subject-oriented, time-variant,
  nonvolatile database that provides support
  for decision making
Data warehouse architecture
[Figure: source data (external, production, internal, archived) passes through data staging into data storage (data warehouse DBMS, data marts, MDDB) and on to information delivery (report/query, data mining), with metadata and management & control components.]
Components
• Source Data
• Data Staging (data extraction, cleaning and loading)
   – Talend is an open source ETL tool
• Data Storage
• Information Delivery (EIS)
• Management and control
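
As a rough illustration of the data staging component, the sketch below extracts raw rows, cleans them, and loads the survivors into a list that stands in for data storage. The field names, the spelling-fix table and the rejection rule are assumptions made for this example; they do not come from the slides or from any particular ETL tool.

```python
# Hypothetical raw rows as they might arrive from the source systems.
RAW_ROWS = [
    {"branch": " mumbai ", "item": "TV", "amount": "120"},
    {"branch": "Banglore", "item": "Phone", "amount": "210"},
    {"branch": "Delhi", "item": "Phone", "amount": ""},  # missing amount
    {"branch": "Delhi", "item": "TV", "amount": "80"},
]

# Assumed spelling corrections; a real mapping would come from the business.
BRANCH_FIXES = {"banglore": "Bangalore"}


def extract():
    """Extraction: read the raw rows from the (stand-in) source."""
    return list(RAW_ROWS)


def clean(rows):
    """Cleaning: normalize branch names, convert amounts, set aside bad rows."""
    cleaned, rejected = [], []
    for row in rows:
        name = row["branch"].strip()
        name = BRANCH_FIXES.get(name.lower(), name.title())
        if not row["amount"].isdigit():
            rejected.append(row)
            continue
        cleaned.append(
            {"branch": name, "item": row["item"], "amount": int(row["amount"])}
        )
    return cleaned, rejected


def load(rows, storage):
    """Loading: append the cleaned rows to data storage and report how many."""
    storage.extend(rows)
    return len(rows)


data_storage = []
cleaned_rows, rejected_rows = clean(extract())
loaded = load(cleaned_rows, data_storage)
print(loaded, "rows loaded,", len(rejected_rows), "rejected")
```

Keeping the rejected rows instead of silently dropping them is a design choice of this sketch: it leaves a trail that the management and control component could report on.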
OLAP
• Online Analytical Processing Tools
• DSS tools that use multidimensional data
  analysis techniques
  – Support for a DSS data store
  – Data extraction and integration filter
  – Specialized presentation interface
• Oracle OLAP 11G
Multidimensional analysis
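
One way to picture multidimensional analysis is as a small cube of facts that can be sliced along one dimension and rolled up along the others. The sketch below is plain Python, not Oracle OLAP; the dimensions (branch, item type, quarter) and the figures are assumptions chosen to match the Samsung sales example.

```python
from collections import defaultdict

# Assumed dimensions of the cube; the last field of each fact is the measure.
DIMENSIONS = ("branch", "item_type", "quarter")
FACTS = [
    ("Mumbai", "TV", "Q1", 120),
    ("Mumbai", "Phone", "Q1", 300),
    ("Delhi", "TV", "Q1", 80),
    ("Delhi", "Phone", "Q2", 250),
    ("Chennai", "TV", "Q2", 60),
]


def roll_up(facts, keep):
    """Sum the measure over every dimension that is not listed in keep."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[i] for i in idx)
        totals[key] += fact[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where the given dimension has the given value."""
    i = DIMENSIONS.index(dimension)
    return [f for f in facts if f[i] == value]


by_branch = roll_up(FACTS, ["branch"])
q1_by_item = roll_up(slice_cube(FACTS, "quarter", "Q1"), ["item_type"])
print(by_branch)
print(q1_by_item)
```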
OLAP architecture
12 Rules of Data Warehouse
1. Data warehouse and operational environments are separated
2. Data is integrated
3. Contains historical data over a long period of time
4. Data is snapshot data captured at a given point in time
5. Data is subject-oriented
6. Mainly read-only with periodic batch updates
7. Development life cycle has a data-driven approach versus the
   traditional process-driven approach
8. Data contains several levels of detail
   – Current, old, lightly summarized, highly summarized
9. Environment is characterized by read-only transactions to very
   large data sets
10. System traces data sources, transformations, and storage
11. Metadata is a critical component
   – Source, transformation, integration, storage, relationships,
     history, etc.
12. Contains a chargeback mechanism for resource usage that enforces
    optimal use of data by end users
OLTP v/s Data warehousing
OLTP                        Data Warehousing
• Application Oriented      • Subject Oriented
• Used to Run Business      • Used to Analyze Business
• Detailed Data             • Summarized and Refined
• Current, Up-to-Date       • Snapshot Data
• Isolated Data             • Integrated Data
• Repetitive Access         • Ad-Hoc Access
• Performance Sensitive     • Performance Relaxed
• Few Records Accessed      • Large Volume Accessed at a Time
• Read/Update Access        • Mostly Read
Data Warehouse summary

• Integrated platform for OLAP and DSS

• Helps optimize business operations

• Easy access to multidimensional data
Data Mining
Why Data Mining?
• Wealth generation
• Analyzing trends
• Strategic decision making
• Security
Data Mining
• Look for hidden patterns and trends in data
  that are not immediately apparent from
  summarizing the data

• No query…

• …but an “interestingness criterion”
Data Mining

Data + Interestingness criteria = Hidden patterns

• Each term comes in several types: type of data, type of
  interestingness criteria, type of patterns
Type of Data
• Tabular            (Ex: Transaction data)
   – Relational
   – Multi-dimensional

• Tree               (Ex: XML data)

• Graphs

• Sequence               (Ex: DNA, activity logs)

• Text, Multimedia …
Type of Interestingness
•   Frequency
•   Rarity
•   Correlation
•   Length of occurrence (for sequence and temporal data)
•   Consistency
•   Repeating / periodicity
•   “Abnormal” behavior
•   Other patterns of interestingness…
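
Two of these criteria, frequency and rarity, are simple enough to sketch directly. The event log and the thresholds below are made up for illustration; real criteria would depend on the data and the business question.

```python
from collections import Counter

# Hypothetical activity log; the event names are invented for this sketch.
EVENTS = ["login", "view", "view", "buy", "view", "login", "refund", "view"]


def frequent(events, min_count):
    """Frequency criterion: events that occur at least min_count times."""
    counts = Counter(events)
    return {e for e, c in counts.items() if c >= min_count}


def rare(events, max_count):
    """Rarity criterion: events that occur at most max_count times."""
    counts = Counter(events)
    return {e for e, c in counts.items() if c <= max_count}


frequent_events = frequent(EVENTS, min_count=3)
rare_events = rare(EVENTS, max_count=1)
print(frequent_events)
print(rare_events)
```

The same data yields different "interesting" results depending on which criterion is applied, which is the reason the criterion is treated as an input to mining alongside the data.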
Data Mining vs Statistical Inference

• Statistics: conceptual model (hypothesis) → statistical reasoning →
  “proof” (validation of hypothesis)

• Data mining: data → mining algorithm based on interestingness →
  pattern (model, rule, hypothesis) discovery
Used for
• Data mining is used for
  – Frequent itemsets
  – Associations
  – Classifications
  – Clustering
Techniques
• Algorithms
   – Apriori algorithm
   – Decision tree
      • SLIQ (Supervised Learning in QUEST), from IBM

• “GROUP BY”
  mysql> SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
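
A minimal sketch of the Apriori idea for finding frequent itemsets follows. It is a teaching version under stated assumptions, not an optimized or reference implementation: the transactions are invented, and support is taken as an absolute count rather than a fraction.

```python
from itertools import combinations

# Hypothetical market-basket transactions; the items are made up.
TRANSACTIONS = [
    {"TV", "Phone", "Cable"},
    {"TV", "Cable"},
    {"Phone", "Charger"},
    {"TV", "Phone", "Cable", "Charger"},
    {"Phone", "Charger"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""

    def count(candidates):
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    current = count({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


frequent_itemsets = apriori(TRANSACTIONS, min_support=3)
for itemset, support in sorted(
    frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force counting: an itemset is only counted if all of its smaller subsets were already found frequent, so most large candidates are never scanned against the transactions.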
Data Mining Summary
• Helps in analyzing patterns and thus in taking
  action, both in real time and for the future.

• Helps in analyzing trends and clusters in business
  operations.
References
• http://www.datawarehousing.com/
• http://www.dw-institute.com/
• http://www.almaden.ibm.com/cs/quest/index.html
Thank you



Any Questions?

Editor's notes

  1. Our bag is a data warehouse containing databases of different subjects and in different formats (books, notes, PPT).
  2. Example of Samsung products: the sales manager wants to know quarterly sales all over India.