Data Warehouse
                               An Introduction

                                   Lecture - 2


Dept of MCA, NIT, Durgapur.           September 6, 2012   1
Data, Data everywhere yet ...
                              I can’t find the data I need
                                 data is scattered over the network
                                 many versions, subtle differences

                              I can’t get the data I need
                                 need an expert to get the data


                              I can’t understand the data I found
                                 available data poorly documented


                              I can’t use the data I found
                                 results are unexpected
                                 data needs to be transformed from one form to
                                 another




What We Need?
     A single, complete and consistent
     store of data obtained from a variety
     of different sources made available to
     end users in a way they can
     understand and use, in a Business
     Context / Subject.

                                   [Barry Devlin]



 Leads towards Business Analysis




Subject Orientation
       Organized around major subjects, such as
        customer, product, sales.

       Focusing on the modeling and analysis of data for
       decision makers, not on daily operations or
       transaction processing.

       Provide a simple and concise view around
       particular subject issues, by excluding data that are
       not useful in the decision support process.

What Are Analytical Needs?
       Which are our lowest/highest margin customers?
       Who are my customers and what products are they buying?
       What is the most effective distribution channel?
       What product promotions have the biggest impact on revenue?
       Which customers are most likely to go to the competition?
       What impact will new products/services have on revenue and margins?
Decision Support System
                  Used to manage and control business
                  Data is historical or point-in-time
                  Optimized for inquiry rather than update
                  Use of the system is loosely defined and can
                  be ad-hoc
                  Used by managers and end-users to
                  understand the business and make
                  judgements




Evolution of Decision Support
          60’s: Batch reports
                hard to find and analyze information

                inflexible and expensive, reprogram every request

          70’s: Terminal based DSS and EIS

          80’s: Desktop data access and analysis tools
                query tools, spreadsheets, GUIs

                easy to use, but access only operational db

          90’s: Data warehousing with integrated OLAP engines and
          tools
                To meet the analytical needs of the business.

What are the users saying...

           Data should be integrated across the
           enterprise
           Summary data has real value to
           the organization
           Historical data holds the key to
           understanding data over time
           What-if capabilities are required




Need Separate Process?

                               A technique for assembling and
                               managing data from various sources
                               for the purpose of answering business
                               questions, thus enabling decisions that
                               were not previously possible.

                               A decision support database
                               maintained separately from the
                               organization’s operational database




Traditional RDBMS used for OLTP
                  Database Systems have been used traditionally
                  for OLTP
                        clerical data processing tasks
                        detailed, up to date data
                        structured repetitive tasks
                        read/update a few records
                        isolation, recovery and integrity are critical
                        Normalization is mandatory



                  We will call these operational databases.
Decision Support Database
               Defined in many different ways, but not
               rigorously.
                     A decision support database that is
                     maintained separately from the
                     organization’s operational database
                     Supports information processing by providing
                     a solid platform of consolidated, historical
                     data for analysis.

Some Common Terms
     Operational databases: Operational databases are detail oriented
     databases defined to meet the needs of sometimes very complex
     processes in a company. This detailed view is reflected in the data
     arrangement in the database. The data is highly normalized to avoid data
     redundancy and "complex maintenance".


     OLTP: On-Line Transaction Processing (OLTP) describes the way data
     is processed by an end user or a computer system. It is detail oriented,
     highly repetitive with massive amounts of updates and changes of the
     data by the end user. It is also very often described as the use of
     computers to run the on-going operation of a business.

Some Common Terms
                                         Cont…

          Data warehouse: A data warehouse collects, organizes, and makes
          data available for the purpose of analysis — to give management the
          ability to access and analyze information about its business. This type
          of data can be called "informational data". The systems used to work
          with informational data are referred to as OLAP (On-Line Analytical
          Processing).


          We will call it the informational database.




Some Common Terms
                                            Cont…




          Operational versus informational databases
          The major difference between operational and informational databases is the
          update frequency:
          1. On operational databases a high number of transactions take place every
          hour. The database is always "up to date" and represents a snapshot of
          the current business situation, commonly referred to as point-in-time
          data.

          2. Informational databases are usually stable over a period of time to
          represent a situation at a specific point in time in the past, which can be
          noted as historical data.
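
The difference in update behaviour can be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` module, not something taken from the lecture: the tables `stock` and `stock_history`, the item name, the quantities, and the dates are all hypothetical. The operational table is overwritten in place, while the informational table only receives dated copies and is never updated afterwards.

```python
import sqlite3


def setup():
    """Create a hypothetical operational table and an informational history table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER);
        CREATE TABLE stock_history (as_of TEXT, item TEXT, qty INTEGER);
        INSERT INTO stock VALUES ('widget', 10);
        """
    )
    return conn


def sell(conn, item, qty):
    """Operational update: the current row is overwritten in place."""
    with conn:
        conn.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))


def take_snapshot(conn, as_of):
    """Informational load: copy the current state, stamped with a date."""
    with conn:
        conn.execute(
            "INSERT INTO stock_history SELECT ?, item, qty FROM stock", (as_of,)
        )


def current_qty(conn, item):
    return conn.execute("SELECT qty FROM stock WHERE item = ?", (item,)).fetchone()[0]


def history(conn):
    return conn.execute(
        "SELECT as_of, item, qty FROM stock_history ORDER BY as_of"
    ).fetchall()


if __name__ == "__main__":
    db = setup()
    take_snapshot(db, "2012-06-30")
    sell(db, "widget", 6)
    take_snapshot(db, "2012-09-30")
    print(current_qty(db, "widget"))
    print(history(db))
```

After the sale, the operational table knows only the current quantity; the earlier quantity survives only in the history table.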
Some Common Terms
                                             Cont…

          OLAP: On-Line Analytical Processing (OLAP) is a category of software
          technology that enables analysts, managers and executives to gain insight into
          data through fast, consistent, interactive access to a wide variety of possible
          views of information that has been transformed from raw data to reflect the real
          dimensionality of the enterprise as understood by the user.

          OLAP is implemented in a multi-user client/server mode and offers
          consistently rapid response to queries, regardless of database size and
          complexity. OLAP helps the user synthesize enterprise information through
          comparative, personalized viewing, as well as through analysis of historical
          and projected data in various "what-if" data model scenarios. This is achieved
          through use of an OLAP Server.
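
The idea of viewing the same facts along different dimensions can be sketched in a few lines of plain Python. This is only an illustration of a roll-up, not how an OLAP server is implemented; the function `roll_up`, the dimension names, and the sample figures are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: each has three dimensions and one measure.
FACTS = [
    {"product": "XYZ", "region": "East", "quarter": "4Q96", "sales": 30},
    {"product": "XYZ", "region": "West", "quarter": "4Q96", "sales": 27},
    {"product": "XYZ", "region": "East", "quarter": "4Q97", "sales": 66},
    {"product": "ABC", "region": "East", "quarter": "4Q96", "sales": 29},
]


def roll_up(facts, *dims):
    """Sum the 'sales' measure, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact["sales"]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, "product"))
    print(roll_up(FACTS, "product", "quarter"))
    print(roll_up(FACTS, "region"))
```

Each call gives a different view of the same data: by product, by product and quarter, or by region. An OLAP server typically precomputes or indexes such aggregates so that every view answers quickly.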



OLTP vs. Data Warehouse
                  OLTP                                     Warehouse (OLAP)
                  Application oriented                     Subject oriented
                  Used to run business                     Used to analyze business
                  Clerical user                            Manager/analyst
                  Detailed data                            Summarized and refined
                  Current, up to date                      Snapshot data
                  Isolated data                            Integrated data
                  Repetitive access by small transactions  Ad-hoc access using large queries
                  Read/update access                       Mostly read access (batch update)
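
The two access patterns in the table can be contrasted with a small sketch. It assumes Python's standard `sqlite3` module and a hypothetical `orders` table with invented rows; a real warehouse would hold far more data and usually live in a separate database.

```python
import sqlite3


def make_orders():
    """Build a tiny, hypothetical orders table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
        "region TEXT, amount INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Asha", "East", 120, "open"),
            (2, "Bimal", "East", 80, "open"),
            (3, "Chitra", "West", 200, "shipped"),
            (4, "Asha", "West", 50, "open"),
        ],
    )
    conn.commit()
    return conn


def ship_order(conn, order_id):
    """OLTP style: read/update one record inside a short transaction."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE orders SET status = 'shipped' WHERE order_id = ?", (order_id,)
        )
    return cur.rowcount


def revenue_by_region(conn):
    """Warehouse style: read-only query that summarizes many rows."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    db = make_orders()
    print(ship_order(db, 1))
    print(revenue_by_region(db))
```

`ship_order` touches a single row by key, which is the repetitive small transaction of the left column. `revenue_by_region` scans the whole table and returns a summary, which is the ad-hoc large query of the right column.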

Some Common Terms
                                              Cont…

          Metadata — a definition

          Metadata is the kind of information that describes the data stored in a
          database and includes such information as:

          • A description of tables and fields in the data warehouse, including data
          types and the range of acceptable values.

          • A similar description of tables and fields in the source databases, with a
          mapping of fields from the source to the warehouse.

          • A description of how the data has been transformed, including formulae,
          formatting, currency conversion, and time aggregation.

          • Any other information that is needed to support and manage the operation
          of the data warehouse.
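
The kinds of metadata listed above could be recorded as a small data structure. The following is a minimal sketch in Python, not a standard metadata format: the class `FieldMetadata`, its attribute names, and the sample entry `SALES_AMOUNT` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldMetadata:
    """Describes one warehouse field, its source, and its transformation."""
    warehouse_table: str
    warehouse_field: str
    data_type: str
    min_value: Optional[float]   # lower bound of acceptable values, if any
    max_value: Optional[float]   # upper bound of acceptable values, if any
    source_table: str
    source_field: str
    transformation: str          # human-readable description of the rule

    def accepts(self, value: float) -> bool:
        """Check a value against the documented range of acceptable values."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True

    def lineage(self) -> str:
        """Render the source-to-warehouse mapping as one line of text."""
        return (
            f"{self.source_table}.{self.source_field} -> "
            f"{self.warehouse_table}.{self.warehouse_field} ({self.transformation})"
        )


# Hypothetical entry: quarterly totals built from individual sales amounts.
SALES_AMOUNT = FieldMetadata(
    warehouse_table="sales_fact",
    warehouse_field="quarterly_amount",
    data_type="INTEGER",
    min_value=0,
    max_value=None,
    source_table="Sales",
    source_field="AMOUNT",
    transformation="summed per quarter",
)

if __name__ == "__main__":
    print(SALES_AMOUNT.lineage())
    print(SALES_AMOUNT.accepts(-5))
```

One record covers the first three bullets: the field description with its type and range, the source mapping, and the transformation (here a time aggregation).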


Some Common Terms
                                       Cont…

     Data mart: A data mart contains a subset of corporate data that is of
     value to a specific business unit, department, or set of users. This subset
     consists of historical, summarized, and possibly detailed data captured
     from transaction processing systems, or from an enterprise data
     warehouse. It is important to realize that a data mart is defined by the
     functional scope of its users, and not by the size of the data mart
     database. Most data marts today involve less than 100 GB of data. Some
     are larger, and data marts are expected to grow rapidly as their usage
     increases.
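
A data mart as a departmental subset can be sketched as follows, assuming Python's standard `sqlite3` module. The table `warehouse_sales`, the department names, the figures, and the helper `build_mart` are all hypothetical; the sketch only shows that the mart is defined by whose data it holds, not by its size.

```python
import sqlite3


def build_warehouse():
    """Create a hypothetical enterprise-wide table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE warehouse_sales (dept TEXT, quarter TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
        [
            ("marketing", "4Q96", 10),
            ("marketing", "4Q96", 15),
            ("marketing", "4Q97", 30),
            ("finance", "4Q96", 40),
        ],
    )
    conn.commit()
    return conn


def build_mart(conn, dept):
    """Copy one department's summarized data into its own table."""
    if not dept.isidentifier():
        # The name becomes part of a table name, so reject anything unusual.
        raise ValueError(f"unsupported department name: {dept!r}")
    table = f"mart_{dept}"
    with conn:
        conn.execute(f"CREATE TABLE {table} (quarter TEXT, total INTEGER)")
        conn.execute(
            f"INSERT INTO {table} "
            "SELECT quarter, SUM(amount) FROM warehouse_sales "
            "WHERE dept = ? GROUP BY quarter",
            (dept,),
        )
    return table


def read_mart(conn, table):
    return conn.execute(
        f"SELECT quarter, total FROM {table} ORDER BY quarter"
    ).fetchall()


if __name__ == "__main__":
    db = build_warehouse()
    mart = build_mart(db, "marketing")
    print(read_mart(db, mart))
```

The marketing mart contains only marketing rows, already summarized by quarter; finance data stays in the warehouse.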

     Data mining: Data mining is the process of extracting valid, useful,
     previously unknown, and comprehensible information from data and using
     it to make business decisions.
Problem in General Purpose SQL
            Consider the following database schema:
            1. Product ( P_ID, P_NAME, P_DESC);
            2. Sales (R_NO, P_ID, Q_ID, AMOUNT);
            3. Time (Q_ID, Q_DESC);


            Suppose the organization needs to generate the following report:

              Product         4Q96 Sales        4Q97 Sales
                  XYZ              57                66
                  ABC              29                24
                  PQR             115               89


Problem in SQL                   Cont…


       The SQL needed to display the fourth quarter 1996 sales may be
       as follows:


       SELECT Product.P_NAME, SUM(Sales.AMOUNT)
       FROM Sales, Product, Time
       WHERE . . . Time.Q_ID = '4Q96'
       AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
       GROUP BY Product.P_NAME

       If one expands the Time constraint to include both quarters, as follows:

       WHERE . . . Time.Q_ID IN ('4Q96', '4Q97')

       then the SUM expression adds up the sales from both quarters, which
       we do not want. Plain SQL offers no direct way to place each quarter's
       total in its own column; it needs workarounds such as CASE expressions
       or one subquery per quarter.

          Hence a general SQL engine handles queries like the one above poorly.
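
The behaviour can be reproduced with a short sketch, assuming SQLite through Python's standard `sqlite3` module. The join conditions on P_ID and Q_ID are an assumption, since the slide elides them, and the function names are illustrative. The per-quarter totals are the ones from the report above; XYZ's 4Q96 figure is split over two invented receipts. The second query uses conditional aggregation (a CASE expression inside SUM), one common workaround in plain SQL, shown here for comparison rather than as the lecture's own solution.

```python
import sqlite3


def build_db():
    """Create the three tables from the slide and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Product (P_ID INTEGER PRIMARY KEY, P_NAME TEXT, P_DESC TEXT);
        CREATE TABLE "Time" (Q_ID TEXT PRIMARY KEY, Q_DESC TEXT);
        CREATE TABLE Sales (R_NO INTEGER PRIMARY KEY, P_ID INTEGER,
                            Q_ID TEXT, AMOUNT INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO Product VALUES (?, ?, ?)",
        [(1, "XYZ", ""), (2, "ABC", ""), (3, "PQR", "")],
    )
    conn.executemany(
        'INSERT INTO "Time" VALUES (?, ?)',
        [("4Q96", "Fourth quarter 1996"), ("4Q97", "Fourth quarter 1997")],
    )
    # Quarter totals match the report; XYZ in 4Q96 is split as 50 + 7.
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?, ?, ?)",
        [
            (1, 1, "4Q96", 50), (2, 1, "4Q96", 7), (3, 1, "4Q97", 66),
            (4, 2, "4Q96", 29), (5, 2, "4Q97", 24),
            (6, 3, "4Q96", 115), (7, 3, "4Q97", 89),
        ],
    )
    conn.commit()
    return conn


# Assumed join conditions; the slide shows only ". . ." here.
FROM_CLAUSE = """
    FROM Sales
    JOIN Product ON Product.P_ID = Sales.P_ID
    JOIN "Time" ON "Time".Q_ID = Sales.Q_ID
    WHERE "Time".Q_ID IN ('4Q96', '4Q97')
    GROUP BY Product.P_NAME
    ORDER BY Product.P_NAME
"""


def naive_totals(conn):
    """One SUM per product: both quarters collapse into a single figure."""
    return conn.execute(
        "SELECT Product.P_NAME, SUM(Sales.AMOUNT)" + FROM_CLAUSE
    ).fetchall()


def quarterly_report(conn):
    """Conditional aggregation: one SUM per quarter, one row per product."""
    return conn.execute(
        """
        SELECT Product.P_NAME,
               SUM(CASE WHEN "Time".Q_ID = '4Q96' THEN Sales.AMOUNT ELSE 0 END),
               SUM(CASE WHEN "Time".Q_ID = '4Q97' THEN Sales.AMOUNT ELSE 0 END)
        """
        + FROM_CLAUSE
    ).fetchall()


if __name__ == "__main__":
    db = build_db()
    print(naive_totals(db))
    print(quarterly_report(db))
```

The first query returns one combined total per product (for example 123 for XYZ), which is the unwanted result described above. The second returns the two-column report, but each extra quarter needs another hand-written CASE expression, which is the kind of effort OLAP tools are meant to remove.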


20230202 - Introduction to tis-py
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 

Data Warehouse: Basics

  • 1. Data Warehouse: An Introduction. Lecture 2. Dept of MCA, NIT, Durgapur. September 6, 2012
  • 2. Data, data everywhere, yet ...
      • I can’t find the data I need: data is scattered over the network; there are many versions with subtle differences.
      • I can’t get the data I need: an expert is needed to get the data.
      • I can’t understand the data I found: available data is poorly documented.
      • I can’t use the data I found: results are unexpected; data needs to be transformed from one form to another.
  • 3. What Do We Need? A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use, in a business context / subject. [Barry Devlin] This leads towards business analysis.
  • 4. Subject Orientation
      • Organized around major subjects, such as customer, product, sales.
      • Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
      • Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
  • 5. What Are Analytical Needs?
      • Which are our lowest/highest margin customers?
      • Who are my customers and what products are they buying?
      • What is the most effective distribution channel?
      • What product promotions have the biggest impact on revenue?
      • Which customers are most likely to go to the competition?
      • What impact will new products/services have on revenue and margins?
  • 6. Decision Support System
      • Used to manage and control the business.
      • Data is historical or point-in-time.
      • Optimized for inquiry rather than update.
      • Use of the system is loosely defined and can be ad hoc.
      • Used by managers and end users to understand the business and make judgements.
  • 7. Evolution of Decision Support
      • 1960s: Batch reports. Information was hard to find and analyze; the approach was inflexible and expensive, with reprogramming for every request.
      • 1970s: Terminal-based DSS (decision support systems) and EIS (executive information systems).
      • 1980s: Desktop data access and analysis tools (query tools, spreadsheets, GUIs). Easy to use, but with access only to the operational database.
      • 1990s: Data warehousing with integrated OLAP engines and tools, to meet the analytical needs of the business.
  • 8. What are the users saying...
      • Data should be integrated across the enterprise.
      • Summary data has real value to the organization.
      • Historical data holds the key to understanding data over time.
      • What-if capabilities are required.
  • 9. Need a Separate Process? Data warehousing is a technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible. It relies on a decision support database maintained separately from the organization’s operational database.
  • 10. Traditional RDBMS used for OLTP. Database systems have traditionally been used for OLTP:
      • clerical data processing tasks
      • detailed, up-to-date data
      • structured, repetitive tasks
      • read/update of a few records
      • isolation, recovery and integrity are critical
      • normalization is mandatory
    We will call these operational databases.
  • 11. Decision Support Database
      • Defined in many different ways, but not rigorously.
      • A decision support database that is maintained separately from the organization’s operational database.
      • Supports information processing by providing a solid platform of consolidated, historical data for analysis.
  • 12. Some Common Terms
      • Operational databases: Operational databases are detail-oriented databases defined to meet the needs of sometimes very complex processes in a company. This detailed view is reflected in the data arrangement in the database. The data is highly normalized to avoid data redundancy and "complex maintenance".
      • OLTP: On-Line Transaction Processing (OLTP) describes the way data is processed by an end user or a computer system. It is detail oriented and highly repetitive, with massive amounts of updates and changes of the data by the end user. It is also very often described as the use of computers to run the ongoing operation of a business.
  • 13. Some Common Terms Cont… Data warehouse: A data warehouse collects, organizes, and makes data available for the purpose of analysis, to give management the ability to access and analyze information about its business. This type of data can be called "informational data". The systems used to work with informational data are referred to as OLAP (On-Line Analytical Processing). We will call it an informational database.
  • 14. Some Common Terms Cont… Operational versus informational databases. The major difference between operational and informational databases is the update frequency:
      1. On operational databases a high number of transactions take place every hour. The database is always "up to date" and represents a snapshot of the current business situation, commonly referred to as point in time.
      2. Informational databases are usually stable over a period of time, so that they represent the situation at a specific point in time in the past; this is referred to as historical data.
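The difference in update behaviour described on slide 14 can be sketched with two in-memory SQLite databases. This is only an illustration under stated assumptions: the `stock` and `stock_snapshot` tables, the `take_snapshot` helper, and the sample figures are all hypothetical and do not come from the lecture.

```python
import sqlite3

# Hypothetical operational database: rows are updated in place.
op = sqlite3.connect(":memory:")
op.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, on_hand INTEGER)")
op.execute("INSERT INTO stock VALUES ('XYZ', 100)")

# Hypothetical informational database: dated snapshots are only appended.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE stock_snapshot (as_of TEXT, product TEXT, on_hand INTEGER)")


def take_snapshot(as_of):
    """Copy the current operational state into the warehouse, tagged with a date."""
    rows = op.execute("SELECT product, on_hand FROM stock").fetchall()
    dw.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(as_of, product, on_hand) for product, on_hand in rows],
    )


take_snapshot("1996-12-31")
# A transaction overwrites the operational value; the old value is gone there.
op.execute("UPDATE stock SET on_hand = on_hand - 30 WHERE product = 'XYZ'")
take_snapshot("1997-12-31")

current = op.execute("SELECT on_hand FROM stock WHERE product = 'XYZ'").fetchone()[0]
history = dw.execute(
    "SELECT as_of, on_hand FROM stock_snapshot ORDER BY as_of"
).fetchall()
print(current)  # only the latest state survives in the operational database
print(history)  # the warehouse keeps one row per point in time
```

The operational table answers "what is the value now", while the snapshot table can answer "what was the value at the end of 1996", which is the kind of historical question the slides assign to the informational database.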
  • 15. Some Common Terms Cont… OLAP: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. OLAP is implemented in a multi-user client/server mode and offers consistently rapid response to queries, regardless of database size and complexity. OLAP helps the user synthesize enterprise information through comparative, personalized viewing, as well as through analysis of historical and projected data in various "what-if" data model scenarios. This is achieved through use of an OLAP server.
  • 16. OLTP vs. Data Warehouse
      OLTP                                       Warehouse (OLAP)
      Application oriented                       Subject oriented
      Used to run the business                   Used to analyze the business
      Clerical user                              Manager/analyst
      Detailed data                              Summarized and refined data
      Current, up-to-date data                   Snapshot data
      Isolated data                              Integrated data
      Repetitive access by small transactions    Ad-hoc access using large queries
      Read/update access                         Mostly read access (batch update)
  • 17. Some Common Terms Cont… Metadata, a definition: Metadata is the kind of information that describes the data stored in a database and includes such information as:
      • A description of tables and fields in the data warehouse, including data types and the range of acceptable values.
      • A similar description of tables and fields in the source databases, with a mapping of fields from the source to the warehouse.
      • A description of how the data has been transformed, including formulae, formatting, currency conversion, and time aggregation.
      • Any other information that is needed to support and manage the operation of the data warehouse.
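As a rough sketch of what such metadata might look like in practice, the record below captures the first three items from slide 17 (field description, source-to-warehouse mapping, transformation). The class names, the `sales_fact` table, and the source field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FieldMetadata:
    name: str                             # field name in the warehouse
    data_type: str                        # declared data type
    allowed_range: Optional[Tuple]        # (min, max) of acceptable values, if any
    source_field: str                     # where the value comes from
    transformation: str                   # how the source value was transformed


@dataclass
class TableMetadata:
    table: str
    description: str
    fields: List[FieldMetadata] = field(default_factory=list)

    def lineage(self):
        """Map each warehouse field to the source field it was derived from."""
        return {f.name: f.source_field for f in self.fields}


# Hypothetical warehouse table described by its metadata.
sales_fact = TableMetadata(
    table="sales_fact",
    description="Quarterly sales per product",
    fields=[
        FieldMetadata("amount_usd", "DECIMAL(12,2)", (0, None),
                      "orders.amount_local", "currency conversion to USD"),
        FieldMetadata("quarter", "CHAR(4)", None,
                      "orders.order_date", "time aggregation to quarter"),
    ],
)

print(sales_fact.lineage())
```

Keeping the mapping and transformation next to the field description lets a user trace any warehouse value back to its operational source.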
  • 18. Some Common Terms Cont…
      • Data mart: A data mart contains a subset of corporate data that is of value to a specific business unit, department, or set of users. This subset consists of historical, summarized, and possibly detailed data captured from transaction processing systems or from an enterprise data warehouse. It is important to realize that a data mart is defined by the functional scope of its users, and not by the size of the data mart database. Most data marts today involve less than 100 GB of data; some are larger, and it is expected that data marts will grow rapidly in size as their usage increases.
      • Data mining: Data mining is the process of extracting valid, useful, previously unknown, and comprehensible information from data and using it to make business decisions.
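One way to picture a data mart as "a subset of corporate data" is to derive it from a warehouse table by filtering on the business unit and summarizing. The sketch below assumes a hypothetical `warehouse_sales` table and a `retail_mart` derived from it; neither name, nor the sample rows, appears in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical enterprise warehouse table covering all business units.
    CREATE TABLE warehouse_sales (
        business_unit TEXT, product TEXT, quarter TEXT, amount INTEGER
    );
    INSERT INTO warehouse_sales VALUES
        ('retail',    'XYZ', '4Q96', 40),
        ('retail',    'ABC', '4Q96', 29),
        ('wholesale', 'XYZ', '4Q96', 17),
        ('retail',    'XYZ', '4Q97', 66);

    -- The data mart: only the retail unit's rows, summarized per product and quarter.
    CREATE TABLE retail_mart AS
        SELECT product, quarter, SUM(amount) AS amount
        FROM warehouse_sales
        WHERE business_unit = 'retail'
        GROUP BY product, quarter;
""")

rows = conn.execute(
    "SELECT product, quarter, amount FROM retail_mart ORDER BY product, quarter"
).fetchall()
print(rows)
```

The mart is scoped by who uses it (the retail unit), not by how many rows it holds, which matches the point made on slide 18.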
  • 19. Problem in General Purpose SQL. Let the database schemas be as follows:
      1. Product (P_ID, P_NAME, P_DESC);
      2. Sales (R_NO, P_ID, Q_ID, AMOUNT);
      3. Time (Q_ID, Q_DESC);
    Say the organization needs to generate a report as follows:
      Product    4Q96 Sales    4Q97 Sales
      XYZ        57            66
      ABC        29            24
      PQR        115           89
  • 20. Problem in SQL Cont… The SQL needed to display the fourth-quarter 1996 sales may be as follows:
      SELECT Product.P_NAME, SUM(Sales.AMOUNT)
      FROM Sales, Product, Time
      WHERE . . . Time.Q_ID = '4Q96'
        AND Product.P_NAME IN ('XYZ', 'ABC', 'PQR')
      GROUP BY Product.P_NAME
    If one expands the Time constraint to include both quarters, as follows:
      WHERE . . . Time.Q_ID IN ('4Q96', '4Q97')
    then the SUM expression adds up the sales from both quarters, which we do not want. Plain SQL offers no direct construct for placing each quarter in its own column, so a general-purpose SQL engine handles queries like this one poorly.
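The behaviour described on slides 19 and 20 can be reproduced with a small in-memory SQLite database. This is a sketch under assumptions: the schema and report figures come from slide 19 (using its `AMOUNT` column), but storing one sales row per product and quarter, the `P_ID` values, and the explicit join conditions standing in for the elided `WHERE . . .` are all made up for illustration. The second query uses conditional aggregation with `CASE`, a common workaround that the slides do not cover.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (P_ID INTEGER PRIMARY KEY, P_NAME TEXT, P_DESC TEXT);
    CREATE TABLE Time    (Q_ID TEXT PRIMARY KEY, Q_DESC TEXT);
    CREATE TABLE Sales   (R_NO INTEGER PRIMARY KEY, P_ID INTEGER, Q_ID TEXT, AMOUNT INTEGER);

    INSERT INTO Product VALUES (1, 'XYZ', ''), (2, 'ABC', ''), (3, 'PQR', '');
    INSERT INTO Time VALUES ('4Q96', 'Fourth quarter 1996'), ('4Q97', 'Fourth quarter 1997');
    -- Assumed sample rows: one sale per product and quarter, matching the report.
    INSERT INTO Sales VALUES
        (1, 1, '4Q96', 57),  (2, 1, '4Q97', 66),
        (3, 2, '4Q96', 29),  (4, 2, '4Q97', 24),
        (5, 3, '4Q96', 115), (6, 3, '4Q97', 89);
""")

# Assumed join conditions in place of the elided part of the WHERE clause.
JOINS = """
    FROM Sales
    JOIN Product ON Product.P_ID = Sales.P_ID
    JOIN Time    ON Time.Q_ID    = Sales.Q_ID
"""

# Widening the time filter to both quarters merges them into a single sum.
merged = conn.execute(
    "SELECT Product.P_NAME, SUM(Sales.AMOUNT)" + JOINS +
    "WHERE Time.Q_ID IN ('4Q96', '4Q97') "
    "GROUP BY Product.P_NAME ORDER BY Product.P_NAME"
).fetchall()

# Workaround (not from the slides): one conditional SUM per quarter column.
pivoted = conn.execute(
    "SELECT Product.P_NAME, "
    "SUM(CASE WHEN Time.Q_ID = '4Q96' THEN Sales.AMOUNT ELSE 0 END), "
    "SUM(CASE WHEN Time.Q_ID = '4Q97' THEN Sales.AMOUNT ELSE 0 END)" + JOINS +
    "GROUP BY Product.P_NAME ORDER BY Product.P_NAME"
).fetchall()

print(merged)   # one combined total per product
print(pivoted)  # one column per quarter, as in the report
```

The workaround needs one hand-written `CASE` expression per quarter, so the query text grows with the number of columns in the report; that verbosity is one way to read the slide's point about general-purpose SQL.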