Data Warehousing & Data Mining

By Mandar Kulkarni
PRN 10030141129
MBA-IT
SICSR
Contents
•   Data warehousing
•   Understanding data warehousing
•   Data warehouse architecture
•   Data Mining
•   Data mining techniques
What is a warehouse?

A real-world example
Data Warehousing
[Figure: Samsung's branches in Mumbai, Delhi, Chennai and Bangalore each hold their own sales data; the Sales Manager wants sales per item type per branch for the first quarter.]
• Now the sales manager wants to know the sales
  for the first quarter.

• Solution
  – Extract the information from each branch database,
    store it in a single place, and process it using
    operational systems.
Solution
[Figure: data from the Mumbai, Delhi, Chennai and Bangalore branches is loaded into a data warehouse; query and analysis tools produce a report for the Sales Manager.]
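The Samsung scenario can be sketched in Python with the standard-library `sqlite3` module. This is a minimal illustration, not the author's system: the branch rows, the table names `sales` and `sales_fact`, and the helper functions are all hypothetical, and each branch's operational database is stood in for by an in-memory SQLite database. The sketch extracts every branch's rows into one warehouse table, tagging each row with its branch, and then answers the sales manager's question (sales per item type per branch for the first quarter) with a single query.

```python
import sqlite3

# Hypothetical branch data: (item_type, sale_date, amount).
# Each branch's operational database is simulated with in-memory SQLite.
BRANCH_SALES = {
    "Mumbai": [("TV", "2024-01-15", 500.0), ("Phone", "2024-02-03", 300.0)],
    "Delhi": [("TV", "2024-03-20", 450.0), ("TV", "2024-04-02", 470.0)],
    "Chennai": [("Phone", "2024-01-28", 320.0)],
    "Bangalore": [("Fridge", "2024-02-14", 700.0)],
}


def make_branch_db(rows):
    """Create one branch's (hypothetical) operational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def build_warehouse(branch_dbs):
    """Extract rows from every branch and load them into a single store."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales_fact (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        # Tag each extracted row with the branch it came from.
        wh.executemany(
            "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return wh


def q1_sales_per_item_per_branch(wh, year="2024"):
    """Total sales per item type per branch for January to March of `year`."""
    return wh.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales_fact
        WHERE sale_date BETWEEN ? AND ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-03-31"),
    ).fetchall()


if __name__ == "__main__":
    branches = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
    warehouse = build_warehouse(branches)
    for row in q1_sales_per_item_per_branch(warehouse):
        print(row)
```

Because every row carries its branch, the question becomes one `GROUP BY` over a single table instead of four separate queries; the April sale in Delhi falls outside the date filter and is not counted.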
Operational Systems
• Run the business in real time
• Handle routine tasks
• Decision Support Systems (DSS)
  – Help in taking actions
• Used by people who deal with customers and
  products
• Increasingly used by customers themselves
Data Warehouse
• A single, complete and consistent store of data
  obtained from a variety of different sources and
  made available to end users in a form they can
  understand and use in a business context.

• A process of transforming data into information
  and making it available to users in a timely
  enough manner to make a difference.
Definition

• An integrated, subject-oriented, time-variant,
  nonvolatile database that provides support
  for decision making
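Two of the properties in this definition, time-variant and nonvolatile, can be illustrated with a small append-only store. This is a hypothetical sketch in plain Python (the class and method names are invented for illustration): each periodic batch load appends values stamped with a snapshot date, nothing is ever updated in place, and queries can ask what the data looked like as of any date.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """One loaded fact: a value as it was on a given snapshot date."""
    snapshot_date: str  # ISO date string, e.g. "2024-03-31"
    key: str            # hypothetical subject key, e.g. a product name
    value: float


@dataclass
class AppendOnlyStore:
    """Nonvolatile store: loads only append; existing rows are never changed."""
    _rows: list = field(default_factory=list)

    def load(self, snapshot_date, values):
        """Periodic batch load of {key: value} captured at snapshot_date."""
        for key, value in values.items():
            self._rows.append(Snapshot(snapshot_date, key, value))

    def as_of(self, snapshot_date):
        """Time-variant view: the latest value of each key on or before a date."""
        latest = {}
        for row in sorted(self._rows, key=lambda r: r.snapshot_date):
            if row.snapshot_date <= snapshot_date:
                latest[row.key] = row.value
        return latest

    def history(self, key):
        """All captured values for one key, oldest first."""
        return [
            (r.snapshot_date, r.value)
            for r in sorted(self._rows, key=lambda r: r.snapshot_date)
            if r.key == key
        ]
```

For example, after loading `{"TV": 100.0, "Phone": 80.0}` for 2024-01-31 and `{"TV": 120.0}` for 2024-02-29, `as_of("2024-01-31")` still returns the January figures, while `history("TV")` returns both values.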
Data warehouse architecture
[Figure: source data (external, production, internal, archived) passes through data staging into data storage (data warehouse DBMS, MDDB, data marts) and on to information delivery (report/query, data mining), with metadata and management & control spanning the whole flow.]
Components
• Source data
• Data staging (data extraction, cleaning and loading)
   – Talend was one of the first open-source ETL tools
• Data storage
• Information delivery (EIS)
• Management and control
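The data staging step (extract, clean, load) can be sketched as three small Python generator stages. This is not how Talend or any other ETL product works internally; it is a minimal illustration that assumes two hypothetical source feeds (named after the "production" and "external" source data above) with different delimiters and inconsistent capitalization, and an SQLite table standing in for warehouse storage.

```python
import sqlite3

# Hypothetical raw extracts from two source systems with inconsistent formats.
RAW_SOURCES = {
    "production": ["TV;Mumbai;500", "phone;delhi;300", "TV;Chennai;"],
    "external": ["Fridge, Bangalore, 700", "TV , mumbai , 450"],
}


def extract(sources):
    """Extract: pull raw records from each source, splitting on its delimiter."""
    for name, lines in sources.items():
        sep = ";" if name == "production" else ","
        for line in lines:
            yield [part.strip() for part in line.split(sep)]


def clean(records):
    """Clean: standardize case and reject records with a missing amount."""
    for item, branch, amount in records:
        if not amount:
            continue  # incomplete record, rejected during staging
        yield item.upper(), branch.title(), float(amount)


def load(rows, conn):
    """Load: bulk insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (item TEXT, branch TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
```

Running `load(clean(extract(RAW_SOURCES)), conn)` on an in-memory connection loads four rows: the Chennai record with no amount is rejected during cleaning, and the two spellings of Mumbai end up as one branch.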
OLAP
• Online Analytical Processing tools
• DSS tools that use multidimensional data
  analysis techniques
  – Support for a DSS data store
  – Data extraction and integration filter
  – Specialized presentation interface
• Example: Oracle OLAP 11g
Multidimensional analysis
OLAP architecture
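Multidimensional analysis treats each fact (here, a sale) as a point in a cube whose axes are dimensions such as branch, item type and quarter. The sketch below, in plain Python with hypothetical data and invented function names, shows two common OLAP operations: a roll-up that aggregates away dimensions, and a slice that fixes one dimension to a single value. Dedicated OLAP engines typically precompute and store such aggregates; this only shows the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (branch, item_type, quarter, sales).
FACTS = [
    ("Mumbai", "TV", "Q1", 500.0),
    ("Mumbai", "Phone", "Q1", 300.0),
    ("Delhi", "TV", "Q1", 450.0),
    ("Delhi", "TV", "Q2", 470.0),
    ("Chennai", "Phone", "Q2", 320.0),
]
DIMENSIONS = ("branch", "item_type", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(facts, dimension, value):
    """Fix one dimension to a single value (an OLAP 'slice')."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]
```

`roll_up(FACTS, ["branch"])` gives total sales per branch, `roll_up(slice_(FACTS, "quarter", "Q1"), ["item_type"])` gives first-quarter sales per item type, and `roll_up(FACTS, [])` collapses the whole cube to a grand total.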
12 Rules of a Data Warehouse
1. The data warehouse and operational environments
   are separated
2. Data is integrated
3. Contains historical data over a long period of
   time
4. Data is snapshot data captured at a given
   point in time
5. Data is subject-oriented
6. Mainly read-only, with periodic batch updates
7. The development life cycle follows a data-driven
   approach rather than the traditional process-driven
   approach
8. Data contains several levels of detail
   – Current, old, lightly summarized, highly
     summarized
9. The environment is characterized by read-only
   transactions against very large data sets
10. A system traces data sources, transformations
    and storage
11. Metadata is a critical component
    – Source, transformation, integration, storage,
      relationships, history, etc.
12. Contains a chargeback mechanism for resource
    usage that enforces optimal use of data by end users
OLTP vs Data Warehousing
   OLTP                        Data Warehousing
•  Application oriented      • Subject oriented
•  Used to run the business  • Used to analyze the business
•  Detailed data             • Summarized and refined
•  Current, up-to-date       • Snapshot data
•  Isolated data             • Integrated data
•  Repetitive access         • Ad hoc access
•  Performance sensitive     • Performance relaxed
•  Few records accessed      • Large volume accessed at a time
•  Read/update access        • Mostly read
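The contrast between the two columns shows up in the shape of the queries. In the hypothetical sketch below, a single SQLite table (`orders`, invented for illustration) serves both styles only to keep the example short; as rule 1 of the list above says, a real warehouse would be kept separate from the operational system. The OLTP-style statement reads and updates one record by its key, while the warehouse-style query reads and summarizes every row without changing anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, branch TEXT, amount REAL, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (branch, amount, status) VALUES (?, ?, ?)",
    [("Mumbai", 500.0, "open"), ("Delhi", 450.0, "open"), ("Mumbai", 300.0, "shipped")],
)

# OLTP style: a repetitive, performance-sensitive update of one record by key.
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (1,))

# Warehouse style: an ad hoc, read-only query that scans and summarizes many rows.
summary = conn.execute(
    "SELECT branch, COUNT(*), SUM(amount) FROM orders GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)
```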
Data Warehouse summary

• Integrated platform for OLAP and DSS

• Helps optimize business operations

• Easy access to multidimensional data
Data Mining
Why Data Mining?
• Wealth generation
• Analyzing trends
• Strategic decision making
• Security
Data Mining
• Look for hidden patterns and trends in data
  that are not immediately apparent from
  summarizing the data

• No query…

• …but “interestingness criteria”
Data Mining
Data + Interestingness criteria = Hidden patterns
  – Type of data
  – Type of interestingness criteria
  – Type of patterns
Type of Data
• Tabular            (Ex: Transaction data)
   – Relational
   – Multi-dimensional

• Tree               (Ex: XML data)

• Graphs

• Sequence               (Ex: DNA, activity logs)

• Text, Multimedia …
Type of Interestingness
•   Frequency
•   Rarity
•   Correlation
•   Length of occurrence (for sequence and temporal data)
•   Consistency
•   Repeating / periodicity
•   “Abnormal” behavior
•   Other patterns of interestingness…
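The idea that the same data yields different patterns under different interestingness criteria can be sketched with item pairs from shopping-basket transactions. In this hypothetical Python example (the transactions and function names are invented), each pair's support is the fraction of transactions containing it; a frequency criterion keeps pairs with high support and a rarity criterion keeps pairs with low support. Only pairs that occur at least once are scored.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: sets of items bought together.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_support(transactions):
    """Fraction of transactions containing each (sorted) item pair."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


def mine(transactions, is_interesting):
    """Data + interestingness criterion = patterns: keep pairs the criterion accepts."""
    return sorted(p for p, s in pair_support(transactions).items() if is_interesting(s))


frequent = mine(TRANSACTIONS, lambda s: s >= 0.4)  # frequency criterion
rare = mine(TRANSACTIONS, lambda s: s <= 0.2)      # rarity criterion
```

Swapping the criterion function, rather than the data or the mining loop, is what changes the patterns returned.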
Data Mining vs Statistical Inference

Statistics:
Conceptual model (hypothesis) → Statistical reasoning → “Proof” (validation of hypothesis)
Data Mining vs Statistical Inference

Data mining:
Data → Mining algorithm based on interestingness → Pattern (model, rule, hypothesis) discovery
Used for
• Data mining is used for
  – Frequent itemsets
  – Associations
  – Classification
  – Clustering
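Frequent itemsets are the usual starting point for association rules, and the Apriori algorithm listed under Techniques is the classic way to find them. A compact sketch of its level-wise search follows, assuming transactions are given as Python sets and support is the fraction of transactions containing an itemset. The data and names are hypothetical, and the sketch favors clarity over efficiency (support is counted with a full scan per candidate).

```python
from itertools import combinations

# Hypothetical transactions: sets of items bought together.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        """Keep candidates that meet min_support, paired with their support."""
        kept = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                kept[c] = s
        return kept

    # Level 1: frequent single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that have size k + 1.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


frequent_itemsets = apriori(TRANSACTIONS, 0.4)
```

With a minimum support of 0.4, all four single items and the pairs {bread, milk}, {bread, butter} and {eggs, milk} survive; both 3-item candidates are pruned because each contains an infrequent pair, so their support is never counted.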
Techniques
• Algorithms
   – Apriori algorithm

   – Decision tree
      • SLIQ
          – Supervised Learning In Quest
          – Developed at IBM

• “GROUP BY”
  mysql> SELECT deptno, SUM(sal) FROM emp GROUP BY deptno;
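The `GROUP BY` query above can be run end to end with Python's built-in `sqlite3` module. The `emp` rows below are hypothetical (modeled loosely on the classic EMP demo table), and SQLite stands in for MySQL; the query is standard SQL and should behave the same way in both. Aggregation of this kind summarizes data rather than discovering hidden patterns, but it is a common first step before mining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical emp table with salary and department number.
conn.execute("CREATE TABLE emp (ename TEXT, sal REAL, deptno INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("KING", 5000, 10),
        ("CLARK", 2450, 10),
        ("SMITH", 800, 20),
        ("FORD", 3000, 20),
        ("ALLEN", 1600, 30),
    ],
)

# Same aggregation as the slide: total salary per department.
rows = conn.execute(
    "SELECT deptno, SUM(sal) AS total_sal FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()
for deptno, total_sal in rows:
    print(deptno, total_sal)
```

The explicit `ORDER BY` is added only so the output order is predictable; `GROUP BY` on its own does not guarantee any row order.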
Data Mining Summary
• Helps in pattern analysis and thus in taking
  actions, both in real time and for the future.

• Analyzes trends and clusters in business
  operations.
Thank you



Any Questions?


Speaker notes

  1. Our bag is a data warehouse: it contains material on different subjects and in different formats (books, notes, PPTs).
  2. Example of Samsung products: the sales manager wants to know quarterly sales across India.