SlideShare a Scribd company logo
1 of 63
Download to read offline
DATA WAREHOUSING
DATA WAREHOUSING




 SUSHIL KULKARNI
A producer wants to know….
                              Which are our
                               Which are our
                          lowest/highest margin
                           lowest/highest margin       Who are my customers
                                                       Who are my customers
                               customers ?
                                customers ?             and what products
                                                         and what products
  What is the most
   What is the most                                      are they buying?
                                                          are they buying?
effective distribution
 effective distribution
       channel?
        channel?




    What impact will
    What impact will                                         Which customers
                                                              Which customers
 new products/services
  new products/services                                    are most likely to go
                                                            are most likely to go
    have on revenue
     have on revenue                                       to the competition ?
                                                            to the competition ?
      and margins?
       and margins?           What product prom-
                               What product prom-
                           -otions have the biggest
                            -otions have the biggest
                              impact on revenue?
                               impact on revenue?
Lot of data everywhere
yet ...
             • I can’t find the data I need
                – data is scattered over the network
                – many versions, subtle differences

             • I can’t get the data I need
                – need an expert to get the data

             • I can’t understand the data I found
                – available data poorly documented


             • I can’t use the data I found
                – results are unexpected
                – data needs to be transformed from
                  one form to other
What is a Data Warehouse?
A single, complete and
consistent store of data
obtained from a variety of
different sources made
available to end users in
a what they can
understand and use in a
business context.

[Barry Devlin]
What users says...
• Data should be integrated
  across the enterprise
• Summary data has a real
  value to the organization
• Historical data holds the
  key to understanding data
  over time
• What-if capabilities are
  required
What is Data Warehousing?
  A process of transforming
  data into information and
  making it available to users
  in a timely enough manner
  to make a difference

[Forrester Research, April 1996]




                                   Data
Evolution
• 60’s: Batch reports
   – hard to find and analyze information
   – inflexible and expensive, reprogram every new
     request
• 70’s: Terminal-based DSS and EIS
  (executive information systems)
   – still inflexible, not integrated with desktop tools
• 80’s: Desktop data access and analysis
  tools
   – query tools, spreadsheets, GUIs
   – easier to use, but only access operational
     databases
• 90’s: Data warehousing with integrated
  OLAP engines and tools
Warehouses are Very Large
                           Databases
              35%

              30%

              25%
Respondents




              20%

              15%

              10%
                                                                      Initial
              5%                                                      Projected 2Q96

              0%                                                    Source: META Group, Inc.
                    5GB       10-19GB   50-99GB   250-499GB
                          5-9GB   20-49GB   100-249GB   500GB-1TB
Very Large Data Bases
• Terabytes -- 10^12 bytes:    Walmart -- 24 Terabytes

• Petabytes -- 10^15 bytes:    Geographic Information
                                 Systems
• Exabytes -- 10^18 bytes:     National Medical Records

• Zettabytes -- 10^21 bytes:   Weather images

• Zottabytes -- 10^24 bytes:   Intelligence Agency Videos
Data Warehousing --
             It is a process
• Technique for assembling and
  managing data from various
  sources for the purpose of
  answering business questions.
  Thus making decisions that were
  not previous possible
• A decision support database
  maintained separately from the
  organization’s operational
  database
Data Warehouse
• A data warehouse is a
  – subject-oriented
  – integrated
  – time-varying
  – non-volatile
 collection of data that is used primarily
 in organizational decision making.
          -- Bill Inmon, Building the Data Warehouse 1996
Data Warehouse Subject-oriented
                  Customers: Get
                   information of
                   different prices
                   of a beer


                   Farmers: Harvest
                   information from
                   known access
                   paths
Data Warehouse Subject-oriented
                  Students: Get
                  information about
                  various
                  universities in
                  U.K.

                   Explorers: Seek
                   out the unknown
                   and previously
                   unsuspected
                   rewards hiding in
                   the detailed data
Data Warehouse Subject-oriented
• Focusing on the modelling and
 analysis of data for decision makers,
 not on daily operations or transaction
 processing
• Provide a simple and concise view
 around particular subject issues by
 excluding data that are not useful in the
 decision support process
Data Warehouse Subject-oriented

                      Enterprise
                      “Database”


                  Customers         Orders
   Transactions

                  Vendors          Etc…


                                                       Data Miners:
                            Etc…                       • “Farmers” – they know
                                                       • “Explorers” - unpredictable
                               Copied,
                               organized
                               summarized



                       Data                  Data Mining
                     Warehouse
Data Warehouse :Time - variant

Use to study trends
and changes
Data Warehouse :Time - variant
• The time horizon for the data warehouse is
  significantly longer than that of operational
  systems
  – Data warehouse data: provide information from a historical
    perspective (e.g., past 5-10 years)
  – Operational database: current value data

• Every key structure in the data warehouse
  – Contains an element of time explicitly or implicitly, while
    the key of operational data may or may not contain “time
    element”
Data Warehouse : Non-volatile


cannot
updated by
end users
Data Warehouse Architecture

Relational
Databases
                          Optimized Loader
             Extraction
   ERP
 Systems     Cleansing

                          Data Warehouse
                               Engine           Analyze
Purchased                                        Query
   Data



 Legacy
  Data                    Metadata Repository
Data Mart

• A Data Mart is a smaller, more focused
 Data Warehouse – a mini-warehouse.

• A Data Mart typically reflects the business
 rules of a specific business unit within an
 enterprise.
Data Warehouse to Data Mart

                           Decision
                           Support
             Data Mart   Information




                           Decision
   Data                    Support
             Data Mart   Information
 Warehouse


                           Decision
                           Support
             Data Mart   Information
DATA MARTS
•   Create many DM’s
•   Limited scope




Examples:

1. Financial DM
2. Marketing DM
3. Supply chain DM
Generic Architecture of Data




  (synonym) Transaction data
Transaction (Operational)
                Data
• Operational (production) systems create
  (massive number of) transactions, such
  as sales, purchases, deposits,
  withdrawals, returns, refunds, phone
  calls, toll roads, web site “hits”, etc…
• Transactions are the base level of data –
  the raw material for understanding
  customer behavior
• Unfortunately, operational systems
  change due to changing business needs
• Fortunately, operational systems can
  usually be changed to support changing
  business needs
• Data warehousing strategies need to be
  aware of operational system changes
Operational Summary Data
Summaries are for a
specific time period    Other Examples???
and utilize the
transaction data for
that time period
Decision Support Summary Data

• The data that are used to help make
  decisions about the business
   – Financial Data, such as:
      • Income Statements (Profit & Loss)
      • Balance Sheets (Assets – Liabilities = Net
        Worth)
   – Sales summaries
   – Other examples???
• Data warehouses maintain this type of
  data, however financial data “of record”
  (for audit purposes) usually comes
  from databases and not the data
  warehouse (confusing???)
• Generally, it is a bad idea to use the
  same system for analytic and
  operational purposes
Data Warehouse for Decision
          Support
• Putting Information technology to help
  the knowledge worker make faster and
  better decisions
  – Which of my customers are most likely to
    go to the competition?
  – What product promotions have the biggest
    impact on revenue?
  – How did the share price of software
    companies correlate with profits over last
    10 years?
Decision Support
• Used to manage and control business
• Data is historical or point-in-time
• Optimized for inquiry rather than update
• Use of the system is loosely defined and can
  be ad-hoc
• Used by managers and end-users to
  understand the business and make
  judgements
Database Schema
• Database schema defines the structure of data,
  not the values of the data (e.g., first name, last
  name = structure; Ron Norman = values of the
  data)
• In RDBMS:
   – Columns = fields = attributes (A,B,C)
   – Rows = records = tuples (1-7)
Logical Database Schema
• Describes data in a way that is familiar to
  business users
Physical Database Schema
• Describes the data the way it will be stored in an
  RDBMS which might be different than the way the
  logical shows it
Metadata
• General definition: Data about data !!!
   – Examples:
       • A library’s card catalog (metadata)
         describes publications (data)
       • A file system maintains permissions
         (metadata) about files (data)
• A form of system documentation
  including:
   – Values legally allowed in a field (e.g., AZ,
     CA, OR, UT, WA, etc.)
   – Description of the contents of each field
     (e.g., start date)
   – Date when data were loaded
   – Indication of currency of the data
     (last updated)
   – Mappings between systems
     (e.g., A.this = B.that)
• Invaluable, otherwise have to
  research to find it
Business Rules
• Highest level of abstraction from
  operational (transaction) data
• Describes why relationships exist and
  how they are applied
• Examples:
   – Need to have 3 forms of ID for credit
   – Only allow a maximum daily withdrawal of
     $200
   – After the 3rd log-in attempt, lock the log-in
     screen
   – Accept no bills larger than $20
   – Others???
General Architecture for Data
                        Warehousing

•   Source systems
•   Extraction, (Clean),
    Transformation, &
    Load (ETL)
•   Central repository
•   Metadata repository
•   Data marts
•   Operational
    feedback
•   End users
    (business)
DATA WAREHOUSE SCOPE
Broad :

 Required for
 companies, Very
 costly, May
 be divided according
 to Depts.


Narrow:

 Required for
 Personal information
Design of a Data Warehouse: A
   Business Analysis Framework
• Four views regarding the design of a data
  warehouse
  – Top-down view
     • allows selection of the relevant information necessary for
       the data warehouse
  – Data source view
     • exposes the information being captured, stored, and
       managed by operational systems
  – Data warehouse view
     • consists of fact tables and dimension tables
  – Business query view
     • sees the perspectives of data in the warehouse from the
       view of end-user
Data Warehouse Design Process
•   Top-down, bottom-up approaches or a combination of both
    – Top-down: Starts with overall design and planning
    – Bottom-up: Starts with experiments and prototypes (rapid)


•   From software engineering point of view
    – Waterfall: structured and systematic analysis at each step before
      proceeding to the next
    – Spiral: rapid generation of increasingly functional systems, short turn
      around time, quick turn around


•   Typical data warehouse design process
    –   Choose a business process to model, e.g., orders, invoices, etc.
    –   Choose the grain (atomic level of data) of the business process
    –   Choose the dimensions that will apply to each fact table record
    –   Choose the measure that will populate each fact table record
Multi-Tiered Architecture

                               Monitor
                                  &          OLAP Server
  other          Metadata
  sources                     Integrator

                                                           Analysis
 Operational   Extract                                     Query
 DBs           Transform      Data             Serve       Reports
               Load
               Refresh
                            Warehouse                      Data mining




                             Data Marts


Data Sources        Data Storage           OLAP Engine Front-End Tools
Three Data Warehouse Models
• Enterprise warehouse
  – collects all of the information about subjects spanning
    the entire organization
• Data Mart
  – a subset of corporate-wide data that is of value to a
    specific groups of users. Its scope is confined to
    specific, selected groups, such as marketing data mart
     • Independent vs. dependent (directly from warehouse) data
       mart
• Virtual warehouse
  – A set of views over operational databases
  – Only some of the possible summary views may be
    materialized
Data Mining works with
          Warehouse Data
                       • Data Warehousing provides
                         the Enterprise with a
                         memory




• Data Mining provides the
  Enterprise with intelligence
We want to know ...

•   Given a database of 100,000 names, which persons are the
    least likely to default on their credit cards?
•   Which types of transactions are likely to be fraudulent given
    the demographics and transactional history of a particular
    customer?
•   If I raise the price of my product by Rs. 2, what is the effect on
    my ROI?
•   If I offer only 2,500 airline miles as an incentive to purchase
    rather than 5,000, how many lost responses will result?
•   If I emphasize ease-of-use of the product as opposed to its
    technical capabilities, what will be the net effect on my
    revenues?
•   Which of my customers are likely to be the most loyal?

         Data Mining helps extract such information
Application Areas

Industry                 Application
Finance                  Credit Card Analysis
Insurance                Claims, Fraud Analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           promotion analysis
Data Service providers   Value added data
Utilities                Power usage analysis
Data Mining in Use
• Data Mining can be used to track fraud
• A Supermarket becomes an information
  broker
• Basketball teams use it to track game strategy
• Cross Selling
• Warranty Claims Routing
• Holding on to Good Customers
• Weeding out Bad Customers
Two Systems

• Operational System




• Information System
Operational Systems
• Run the business in real time
• Based on up-to-the-second
  data
• Optimized to handle large
  numbers of simple read/write
  transactions
• Optimized for fast response
  to predefined transactions
• Used by people who deal with
  customers, products --
  clerks, salespeople etc.
• They are increasingly used
  by customers
On Line Transaction Process
                (OLTP)
It refers to a class of
systems that facilitate
and manage
transaction-oriented
applications, typically
for data entry and
retrieval transaction
processing
On Line Transaction Process
               (OLTP)
OLTP technology is used in a
number of industries, including
banking, airlines, mail order,
supermarkets, and manufacturing.
Applications include electronic
banking, order processing,
employee time clock systems, e-
commerce, and eTrading. The
most widely used OLTP system is
probably IBM's CICS.
What are Operational Systems?
• They are OLTP systems
• Run mission critical
  applications
• Need to work with
  stringent performance
  requirements for routine
  tasks
• Used to run a business!
RDBMS used for OLTP
• Database Systems have been
  used traditionally for OLTP
  –   clerical data processing tasks
  –   detailed, up to date data
  –   structured repetitive tasks
  –   read/update a few records
  –   isolation, recovery and
       integrity are critical
Operational Summary Data
Summaries are for a
specific time period   Other Examples???
and utilize the
transaction data for
that time period
Examples of Operational Data
Data        Industry Usage             Technology             Volumes
Customer    All        Track           Legacy application, flat Small-medium
File                   Customer        files, main frames
                       Details
Account     Finance    Control         Legacy applications,    Large
Balance                account         hierarchical databases,
                       activities      mainframe
Point-of-   Retail     Generate        ERP, Client/Server,     Very Large
Sale data              bills, manage   relational databases
                       stock
Call        Telecomm- Billing          Legacy application,    Very Large
Record      unications                 hierarchical database,
                                       mainframe
Production Manufact-   Control         ERP,                   Medium
Record     uring       Production      relational databases,
                                       AS/400
So, what’s different?
Application-Orientation vs.
      Subject-Orientation
Application-Orientation        Subject-Orientation

               Operational                 Data
               Database                    Warehouse

                      Credit
       Loans                    Customer
                      Card
                                             Vendor
                   Trust        Product

  Savings                                     Activity
OLTP vs. Data Warehouse
• OLTP systems are tuned for known transactions
  and workloads while workload is not known a
  priori in a data warehouse

• Special data organization, access methods and
  implementation methods are needed to support
  data warehouse queries (typically
  multidimensional queries)
  – e.g., average amount spent on phone calls between
    9AM-5PM in Pune during the month of December
OLTP vs Data Warehouse
• OLTP                       • Warehouse (DSS)
  –   Application Oriented     – Subject Oriented
  –   Used to run business     – Used to analyze
  –   Detailed data              business
  –   Current up to date       – Summarized and refined
  –   Isolated Data            – Snapshot data
  –   Repetitive access        – Integrated Data
  –   Clerical User            – Ad-hoc access
                               – Knowledge User
                                 (Manager)
OLTP vs Data Warehouse
• OLTP                        • Data Warehouse
  – Performance Sensitive       – Performance relaxed
  – Few Records accessed at     – Large volumes accessed
    a time (tens)                 at a time(millions)
                                – Mostly Read (Batch
  – Read/Update Access            Update)
                                – Redundancy present
  – No data redundancy
                                – Database Size       100
  – Database Size 100MB -         GB - few terabytes
    100 GB
OLTP vs Data Warehouse
• OLTP                    • Data Warehouse
  – Transaction             – Query throughput is
    throughput is the         the performance
    performance metric        metric
  – Thousands of users      – Hundreds of users
  – Managed in entirety     – Managed by subsets
To summarize ...
• OLTP Systems are
  used to “run” a
  business




                     • The Data Warehouse
                       helps to “optimize” the
                       business
Why Separate Data
                Warehouse?
• Performance
  – Op dbs designed & tuned for known txs & workloads.
  – Complex OLAP queries would degrade perf. for op txs.
  – Special data organization, access & implementation methods
    needed for multidimensional views & queries.

• Function
  – Missing data: Decision support requires historical data, which op
    dbs do not typically maintain.
  – Data consolidation: Decision support requires consolidation
    (aggregation, summarization) of data from many heterogeneous
    sources: op dbs, external sources.
  – Data quality: Different sources typically use inconsistent data
    representations, codes, and formats which have to be reconciled.
INFORMATION SYSTEMS
• Designed to support
  decision-making based on

   1. Historical data
   2. Prediction data.

• Designed for complex queries
  or data-mining applications.

  Examples:

  1. Sales trend analysis,
  2. Customer segmentation
  3. Human resources planning
INFORMATION SYSTEMS
DIFFERENCE
Characteristics   Operational Systems       Informational Systems

Purpose           Real time data entry      Real and analyze
                                            historical data.

Primary users     Clerks, sales-persons,    Managers, business
                  administrations           analysts, customers

Scope of usage    Narrow, planned, and      Broad, ad hoc, complex
                  simple updates and        queries and analysis
                  queries

Design goal       Performance throughput,   Ease of flexible access
                  availability              and use

Volume            Many, constant updates    Periodical batch updates
                  and queries on one or a   and queries requiring
                  few table rows            many or all rows
THANKS!

More Related Content

What's hot

Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)Pratik Tambekar
 
Introduction to-data-mining chapter 1
Introduction to-data-mining  chapter 1Introduction to-data-mining  chapter 1
Introduction to-data-mining chapter 1Mahmoud Alfarra
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseKartik Kalpande Patil
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningDataminingTools Inc
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data MiningAmritanshu Mehra
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Houw Liong The
 
Data mining
Data miningData mining
Data miningpradeepa n
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentationmillerca2
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
An introduction to data mining and its techniques
An introduction to data mining and its techniquesAn introduction to data mining and its techniques
An introduction to data mining and its techniquesSandhya Tarwani
 
Data Mining
Data MiningData Mining
Data Miningksanthosh
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introductionbutest
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)Kartik Kalpande Patil
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 

What's hot (20)

Data mining
Data miningData mining
Data mining
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)What Is DATA MINING(INTRODUCTION)
What Is DATA MINING(INTRODUCTION)
 
Introduction to-data-mining chapter 1
Introduction to-data-mining  chapter 1Introduction to-data-mining  chapter 1
Introduction to-data-mining chapter 1
 
Introduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in DatabaseIntroduction-to-Knowledge Discovery in Database
Introduction-to-Knowledge Discovery in Database
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques Chapter 08 Data Mining Techniques
Chapter 08 Data Mining Techniques
 
Data mining
Data miningData mining
Data mining
 
A Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining PresentationA Practical Approach To Data Mining Presentation
A Practical Approach To Data Mining Presentation
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Introduction
IntroductionIntroduction
Introduction
 
An introduction to data mining and its techniques
An introduction to data mining and its techniquesAn introduction to data mining and its techniques
An introduction to data mining and its techniques
 
Data Mining Overview
Data Mining OverviewData Mining Overview
Data Mining Overview
 
Data Mining
Data MiningData Mining
Data Mining
 
Chapter 1. Introduction
Chapter 1. IntroductionChapter 1. Introduction
Chapter 1. Introduction
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 

Similar to Ch 1 intro_dw

Data mining & warehousing
Data mining & warehousingData mining & warehousing
Data mining & warehousingSamoneh Dashti
 
Krithi talk-impact
Krithi talk-impactKrithi talk-impact
Krithi talk-impactKaran7755
 
Data warehousing
Data warehousingData warehousing
Data warehousingOwais Ashraf
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxvipush1
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingVibrant Technologies & Computers
 
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyfBig Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyfVijayKaran7
 
Data warehousev2.1
Data warehousev2.1Data warehousev2.1
Data warehousev2.1Tuan Luong
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptxhqlm1
 
The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015CMR WORLD TECH
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data WarehouseCaserta
 
05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdfINyomanSwitrayana
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasiryasir873
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.pptKRISHNARAJ207
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.pptsrirupadasgupta1
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?Findwise
 

Similar to Ch 1 intro_dw (20)

Data mining & warehousing
Data mining & warehousingData mining & warehousing
Data mining & warehousing
 
Krithi talk-impact
Krithi talk-impactKrithi talk-impact
Krithi talk-impact
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
Data ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housingData ware housing- Introduction to data ware housing
Data ware housing- Introduction to data ware housing
 
Lecture1
Lecture1Lecture1
Lecture1
 
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyfBig Data Analytics.pdfbgfjgjgghfhhffhdfyf
Big Data Analytics.pdfbgfjgjgghfhhffhdfyf
 
Data warehousev2.1
Data warehousev2.1Data warehousev2.1
Data warehousev2.1
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
Datawarehouse
DatawarehouseDatawarehouse
Datawarehouse
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 
The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015The Data Axioms lecture-overview-big data-usama-9-2015
The Data Axioms lecture-overview-big data-usama-9-2015
 
Hadoop and Your Data Warehouse
Hadoop and Your Data WarehouseHadoop and Your Data Warehouse
Hadoop and Your Data Warehouse
 
DWM
DWMDWM
DWM
 
05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf05_Decision Support and OLAP.pdf
05_Decision Support and OLAP.pdf
 
Introduction To Msbi By Yasir
Introduction To Msbi By YasirIntroduction To Msbi By Yasir
Introduction To Msbi By Yasir
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Ch 1 intro_dw

  • 2. A producer wants to know…. Which are our Which are our lowest/highest margin lowest/highest margin Who are my customers Who are my customers customers ? customers ? and what products and what products What is the most What is the most are they buying? are they buying? effective distribution effective distribution channel? channel? What impact will What impact will Which customers Which customers new products/services new products/services are most likely to go are most likely to go have on revenue have on revenue to the competition ? to the competition ? and margins? and margins? What product prom- What product prom- -otions have the biggest -otions have the biggest impact on revenue? impact on revenue?
  • 3. Lot of data everywhere yet ... • I can’t find the data I need – data is scattered over the network – many versions, subtle differences • I can’t get the data I need – need an expert to get the data • I can’t understand the data I found – available data poorly documented • I can’t use the data I found – results are unexpected – data needs to be transformed from one form to other
  • 4. What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin]
  • 5. What users says... • Data should be integrated across the enterprise • Summary data has a real value to the organization • Historical data holds the key to understanding data over time • What-if capabilities are required
  • 6. What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research, April 1996] Data
  • 7. Evolution • 60’s: Batch reports – hard to find and analyze information – inflexible and expensive, reprogram every new request • 70’s: Terminal-based DSS and EIS (executive information systems) – still inflexible, not integrated with desktop tools • 80’s: Desktop data access and analysis tools – query tools, spreadsheets, GUIs – easier to use, but only access operational databases • 90’s: Data warehousing with integrated OLAP engines and tools
  • 8. Warehouses are Very Large Databases 35% 30% 25% Respondents 20% 15% 10% Initial 5% Projected 2Q96 0% Source: META Group, Inc. 5GB 10-19GB 50-99GB 250-499GB 5-9GB 20-49GB 100-249GB 500GB-1TB
  • 9. Very Large Data Bases • Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes • Petabytes -- 10^15 bytes: Geographic Information Systems • Exabytes -- 10^18 bytes: National Medical Records • Zettabytes -- 10^21 bytes: Weather images • Zottabytes -- 10^24 bytes: Intelligence Agency Videos
  • 10. Data Warehousing -- It is a process • Technique for assembling and managing data from various sources for the purpose of answering business questions. Thus making decisions that were not previous possible • A decision support database maintained separately from the organization’s operational database
  • 11. Data Warehouse • A data warehouse is a – subject-oriented – integrated – time-varying – non-volatile collection of data that is used primarily in organizational decision making. -- Bill Inmon, Building the Data Warehouse 1996
  • 12. Data Warehouse Subject-oriented Customers: Get information of different prices of a beer Farmers: Harvest information from known access paths
  • 13. Data Warehouse Subject-oriented Students: Get information about various universities in U.K. Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data
  • 14. Data Warehouse Subject-oriented • Focusing on the modelling and analysis of data for decision makers, not on daily operations or transaction processing • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
  • 15. Data Warehouse Subject-oriented Enterprise “Database” Customers Orders Transactions Vendors Etc… Data Miners: Etc… • “Farmers” – they know • “Explorers” - unpredictable Copied, organized summarized Data Data Mining Warehouse
  • 16. Data Warehouse :Time - variant Use to study trends and changes
  • 17. Data Warehouse :Time - variant • The time horizon for the data warehouse is significantly longer than that of operational systems – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) – Operational database: current value data • Every key structure in the data warehouse – Contains an element of time explicitly or implicitly, while the key of operational data may or may not contain “time element”
  • 18. Data Warehouse : Non-volatile cannot updated by end users
  • 19. Data Warehouse Architecture Relational Databases Optimized Loader Extraction ERP Systems Cleansing Data Warehouse Engine Analyze Purchased Query Data Legacy Data Metadata Repository
  • 20. Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
  • 21. Data Warehouse to Data Mart Decision Support Data Mart Information Decision Data Support Data Mart Information Warehouse Decision Support Data Mart Information
  • 22. DATA MARTS • Create many DM’s • Limited scope Examples: 1. Financial DM 2. Marketing DM 3. Supply chain DM
  • 23. Generic Architecture of Data (synonym) Transaction data
  • 24. Transaction (Operational) Data • Operational (production) systems create (massive number of) transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll roads, web site “hits”, etc… • Transactions are the base level of data – the raw material for understanding customer behavior • Unfortunately, operational systems change due to changing business needs • Fortunately, operational systems can usually be changed to support changing business needs • Data warehousing strategies need to be aware of operational system changes
  • 25. Operational Summary Data Summaries are for a specific time period Other Examples??? and utilize the transaction data for that time period
  • 26. Decision Support Summary Data • The data that are used to help make decisions about the business – Financial Data, such as: • Income Statements (Profit & Loss) • Balance Sheets (Assets – Liabilities = Net Worth) – Sales summaries – Other examples??? • Data warehouses maintain this type of data, however financial data “of record” (for audit purposes) usually comes from databases and not the data warehouse (confusing???) • Generally, it is a bad idea to use the same system for analytic and operational purposes
  • 27. Data Warehouse for Decision Support • Putting Information technology to help the knowledge worker make faster and better decisions – Which of my customers are most likely to go to the competition? – What product promotions have the biggest impact on revenue? – How did the share price of software companies correlate with profits over last 10 years?
  • 28. Decision Support • Used to manage and control business • Data is historical or point-in-time • Optimized for inquiry rather than update • Use of the system is loosely defined and can be ad-hoc • Used by managers and end-users to understand the business and make judgements
  • 29. Database Schema • Database schema defines the structure of data, not the values of the data (e.g., first name, last name = structure; Ron Norman = values of the data) • In RDBMS: – Columns = fields = attributes (A,B,C) – Rows = records = tuples (1-7)
  • 30. Logical Database Schema • Describes data in a way that is familiar to business users
  • 31. Physical Database Schema • Describes the data the way it will be stored in an RDBMS which might be different than the way the logical shows it
  • 32. Metadata • General definition: Data about data !!! – Examples: • A library’s card catalog (metadata) describes publications (data) • A file system maintains permissions (metadata) about files (data) • A form of system documentation including: – Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.) – Description of the contents of each field (e.g., start date) – Date when data were loaded – Indication of currency of the data (last updated) – Mappings between systems (e.g., A.this = B.that) • Invaluable, otherwise have to research to find it
  • 33. Business Rules • Highest level of abstraction from operational (transaction) data • Describes why relationships exist and how they are applied • Examples: – Need to have 3 forms of ID for credit – Only allow a maximum daily withdrawal of $200 – After the 3rd log-in attempt, lock the log-in screen – Accept no bills larger than $20 – Others???
  • 34. General Architecture for Data Warehousing • Source systems • Extraction, (Clean), Transformation, & Load (ETL) • Central repository • Metadata repository • Data marts • Operational feedback • End users (business)
  • 35. DATA WAREHOUSE SCOPE Broad : Required for companies, Very costly, May be divided according to Depts. Narrow: Required for Personal information
  • 36. Design of a Data Warehouse: A Business Analysis Framework • Four views regarding the design of a data warehouse – Top-down view • allows selection of the relevant information necessary for the data warehouse – Data source view • exposes the information being captured, stored, and managed by operational systems – Data warehouse view • consists of fact tables and dimension tables – Business query view • sees the perspectives of data in the warehouse from the view of end-user
  • 37. Data Warehouse Design Process • Top-down, bottom-up approaches or a combination of both – Top-down: Starts with overall design and planning – Bottom-up: Starts with experiments and prototypes (rapid) • From software engineering point of view – Waterfall: structured and systematic analysis at each step before proceeding to the next – Spiral: rapid generation of increasingly functional systems, short turn around time, quick turn around • Typical data warehouse design process – Choose a business process to model, e.g., orders, invoices, etc. – Choose the grain (atomic level of data) of the business process – Choose the dimensions that will apply to each fact table record – Choose the measure that will populate each fact table record
  • 38. Multi-Tiered Architecture Monitor & OLAP Server other Metadata sources Integrator Analysis Operational Extract Query DBs Transform Data Serve Reports Load Refresh Warehouse Data mining Data Marts Data Sources Data Storage OLAP Engine Front-End Tools
  • 39. Three Data Warehouse Models • Enterprise warehouse – collects all of the information about subjects spanning the entire organization • Data Mart – a subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart • Independent vs. dependent (directly from warehouse) data mart • Virtual warehouse – A set of views over operational databases – Only some of the possible summary views may be materialized
  • 40. Data Mining works with Warehouse Data • Data Warehousing provides the Enterprise with a memory • Data Mining provides the Enterprise with intelligence
  • 41. We want to know ... • Given a database of 100,000 names, which persons are the least likely to default on their credit cards? • Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer? • If I raise the price of my product by Rs. 2, what is the effect on my ROI? • If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result? • If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues? • Which of my customers are likely to be the most loyal? Data Mining helps extract such information
  • 42. Application Areas Industry Application Finance Credit Card Analysis Insurance Claims, Fraud Analysis Telecommunication Call record analysis Transport Logistics management Consumer goods promotion analysis Data Service providers Value added data Utilities Power usage analysis
  • 43. Data Mining in Use • Data Mining can be used to track fraud • A Supermarket becomes an information broker • Basketball teams use it to track game strategy • Cross Selling • Warranty Claims Routing • Holding on to Good Customers • Weeding out Bad Customers
  • 44. Two Systems • Operational System • Information System
  • 45. Operational Systems • Run the business in real time • Based on up-to-the-second data • Optimized to handle large numbers of simple read/write transactions • Optimized for fast response to predefined transactions • Used by people who deal with customers, products -- clerks, salespeople etc. • They are increasingly used by customers
  • 46. On Line Transaction Process (OLTP) It refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing
  • 47. On Line Transaction Process (OLTP) OLTP technology is used in a number of industries, including banking, airlines, mail order, supermarkets, and manufacturing. Applications include electronic banking, order processing, employee time clock systems, e- commerce, and eTrading. The most widely used OLTP system is probably IBM's CICS.
  • 48. What are Operational Systems? • They are OLTP systems • Run mission critical applications • Need to work with stringent performance requirements for routine tasks • Used to run a business!
  • 49. RDBMS used for OLTP • Database Systems have been used traditionally for OLTP – clerical data processing tasks – detailed, up to date data – structured repetitive tasks – read/update a few records – isolation, recovery and integrity are critical
  • 50. Operational Summary Data Summaries are for a specific time period Other Examples??? and utilize the transaction data for that time period
  • 51. Examples of Operational Data Data Industry Usage Technology Volumes Customer All Track Legacy application, flat Small-medium File Customer files, main frames Details Account Finance Control Legacy applications, Large Balance account hierarchical databases, activities mainframe Point-of- Retail Generate ERP, Client/Server, Very Large Sale data bills, manage relational databases stock Call Telecomm- Billing Legacy application, Very Large Record unications hierarchical database, mainframe Production Manufact- Control ERP, Medium Record uring Production relational databases, AS/400
  • 53. Application-Orientation vs. Subject-Orientation Application-Orientation Subject-Orientation Operational Data Database Warehouse Credit Loans Customer Card Vendor Trust Product Savings Activity
  • 54. OLTP vs. Data Warehouse • OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse • Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries) – e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
  • 55. OLTP vs Data Warehouse • OLTP • Warehouse (DSS) – Application Oriented – Subject Oriented – Used to run business – Used to analyze – Detailed data business – Current up to date – Summarized and refined – Isolated Data – Snapshot data – Repetitive access – Integrated Data – Clerical User – Ad-hoc access – Knowledge User (Manager)
  • 56. OLTP vs Data Warehouse • OLTP • Data Warehouse – Performance Sensitive – Performance relaxed – Few Records accessed at – Large volumes accessed a time (tens) at a time(millions) – Mostly Read (Batch – Read/Update Access Update) – Redundancy present – No data redundancy – Database Size 100 – Database Size 100MB - GB - few terabytes 100 GB
  • 57. OLTP vs Data Warehouse • OLTP • Data Warehouse – Transaction – Query throughput is throughput is the the performance performance metric metric – Thousands of users – Hundreds of users – Managed in entirety – Managed by subsets
  • 58. To summarize ... • OLTP Systems are used to “run” a business • The Data Warehouse helps to “optimize” the business
  • 59. Why Separate Data Warehouse? • Performance – Op dbs designed & tuned for known txs & workloads. – Complex OLAP queries would degrade perf. for op txs. – Special data organization, access & implementation methods needed for multidimensional views & queries. • Function – Missing data: Decision support requires historical data, which op dbs do not typically maintain. – Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: op dbs, external sources. – Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.
  • 60. INFORMATION SYSTEMS • Designed to support decision-making based on 1. Historical data 2. Prediction data. • Designed for complex queries or data-mining applications. Examples: 1. Sales trend analysis, 2. Customer segmentation 3. Human resources planning
  • 62. DIFFERENCE Characteristics Operational Systems Informational Systems Purpose Real time data entry Real and analyze historical data. Primary users Clerks, sales-persons, Managers, business administrations analysts, customers Scope of usage Narrow, planned, and Broad, ad hoc, complex simple updates and queries and analysis queries Design goal Performance throughput, Ease of flexible access availability and use Volume Many, constant updates Periodical batch updates and queries on one or a and queries requiring few table rows many or all rows