The document discusses data warehousing and the star schema. It defines a data warehouse as a repository of integrated information available for queries and analysis. The data comes from heterogeneous sources and can be queried together. It describes how a star schema organizes data into a central fact table surrounded by dimension tables. The fact table contains keys linking to attributes in the dimension tables. Star queries are processed by first using bitmap indexes on the fact table keys to retrieve relevant rows, then joining the results to the dimension tables.
2. Finally we are talking about something not invented by IBM!
The inventor is unknown. It was popularized by Ralph Kimball and his company, Red Brick, with their Red Brick Warehouse product.
3. History
The first product was the Red Brick Warehouse, a standalone system for data warehousing.
The algorithm was figured out by Oracle and Sybase: Oracle built it into their DBMS, while Sybase made a separate software product.
IBM later bought Red Brick.
4. Agenda
Definition
Why data warehouse
Product History
Processing Star queries
Data warehouse in the enterprise
Data warehouse design
Relevance of normalization
Star schema
Processing the star schema
5. Definition
Data warehouse: a repository of integrated information, available for queries and analysis. Data and information are extracted from heterogeneous sources as they are generated.
The point is that it's not used for transaction processing; that is, it's read-only. And the data can come from heterogeneous sources and can all be queried in one database.
6. Why Data Warehouse
A data warehouse collects information from many sources and converts the differing data into a uniform format, following the data analytics requirements. Data warehousing ensures production standards and quality information, so different departments can generate consistent results.
7. Data Warehouse vs. OLTP

                     OLTP                          DW
Purpose              Automate day-to-day           Analysis
                     operations
Structure            RDBMS                         RDBMS
Data model           Normalized                    Dimensional
Access               SQL                           SQL and business analysis programs
Data                 Data that runs the business   Current and historical information
Condition of data    Changing, incomplete          Historical, complete, descriptive
8. Red Brick
Pioneered the data warehouse as a product; they sold a hardware product with a star schema database.
You loaded the Red Brick Warehouse and then queried it for analysis (OLAP).
It featured new optimizations for star schemas and was very fast.
9. Enter Sybase
Sybase learned the optimization and developed their own product.
The Sybase product was a stand-alone software data warehouse product: it couldn't do general-purpose database work, it was just a data warehouse.
They appear to have copied the Red Brick idea, without selling hardware.
10. Enter Oracle
Oracle, later, also copied the same optimization.
They added a bitmap index to their database product, along with the star schema optimization.
Now their product could do data warehousing as well as general database work.
11. Status Today
Oracle dominates the field today.
IBM eventually bought Red Brick, so it still offers some form of Red Brick product.
Sybase offers their OLTP product, now as an offering of SAP.
13. Star Schema
The data warehouse relies on the star schema.
The data is not normalized.
The DW is loaded from a normalized database.
There is a fact table surrounded by multiple
dimension tables.
The fact table has all the measures for the
subject area, with foreign keys to the
dimension tables.
14. A Sample OLTP Schema
[Diagram: a normalized OLTP schema with four tables:
orders, order items, products, and customers.]
15. Transformed to a Star Schema
[Diagram: a central sales fact table surrounded by four
dimension tables: products, customers, channels, and times.]
Dimensions: Time, Product, Customer, Channel.
The Time dimension has levels Hour, Day of Week,
Month, Year, and Season.
17. Processing Star Queries
Build a bitmap index on each foreign key
column of the fact table.
The index is a 2-dimensional array of bits:
one bitmap per distinct value of the indexed
column, with one bit per row of the table.
Bitmap indexes are typically much smaller
than b-tree indexes, which can be larger than
the data itself.
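The bitmap idea can be sketched in a few lines of Python. This is a toy illustration (not any real database's implementation): each distinct key value maps to an integer used as a bit array, and a conjunction of predicates becomes a single bitwise AND.

```python
# Toy bitmap index: one integer-as-bitset per distinct column value.
# Illustrative sketch only; real databases compress and page these bitmaps.

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitset of matching row ids."""
    index = {}
    for rowid, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << rowid)
    return index

def rows_matching(bitmap):
    """Expand a bitset back into a sorted list of row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap & (1 << i)]

# A tiny fact table: each row carries two foreign keys.
fact = [
    {"store_key": 1, "product_key": 7},
    {"store_key": 2, "product_key": 7},
    {"store_key": 1, "product_key": 9},
]
by_store = build_bitmap_index(fact, "store_key")
by_product = build_bitmap_index(fact, "product_key")

# "store 1 AND product 7" is one bitwise AND of two bitmaps.
hits = by_store[1] & by_product[7]
print(rows_matching(hits))  # [0]: only row 0 has store 1 and product 7
```

The key property is that the AND touches only the bitmaps, not the fact rows, which is why the fact-table filtering phase of a star query is fast.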
19. Query Processing
The typical query is a join of foreign keys of
dimension tables to the fact table
This is processed in two phases:
1. From the fact table, retrieve all rows that are part
of the result, using bitmap indexes
2. Join the result of the step above to the
dimension tables
20. Example Query
Find sales and profits from the grocery
departments of stores in the West and
Southwest districts over the last three quarters
21. Example Query
SELECT
store.sales_district,
time.fiscal_period,
SUM(sales.dollar_sales) revenue,
SUM(sales.dollar_sales) - SUM(sales.dollar_cost) income
FROM
sales, store, time, product
WHERE
sales.store_key = store.store_key AND
sales.time_key = time.time_key AND
sales.product_key = product.product_key AND
time.fiscal_period IN ('3Q95', '4Q95', '1Q96') AND
product.department = 'Grocery' AND
store.sales_district IN ('WEST', 'SOUTHWEST')
GROUP BY
store.sales_district, time.fiscal_period;
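To see a query of this shape run end to end, here is a self-contained sqlite3 sketch. The schema follows the slides; the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE store   (store_key INTEGER PRIMARY KEY, sales_district TEXT);
CREATE TABLE time    (time_key INTEGER PRIMARY KEY, fiscal_period TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE sales   (store_key INTEGER, time_key INTEGER,
                      product_key INTEGER, dollar_sales REAL, dollar_cost REAL);

INSERT INTO store VALUES (1, 'WEST'), (2, 'EAST');
INSERT INTO time VALUES (1, '3Q95'), (2, '2Q95');
INSERT INTO product VALUES (1, 'Grocery'), (2, 'Hardware');
-- Two qualifying fact rows plus two that fail a dimension predicate.
INSERT INTO sales VALUES (1, 1, 1, 100.0, 60.0), (1, 1, 1, 50.0, 30.0),
                         (2, 1, 1, 999.0, 0.0), (1, 2, 1, 999.0, 0.0);
""")

rows = cur.execute("""
SELECT store.sales_district, time.fiscal_period,
       SUM(sales.dollar_sales) AS revenue,
       SUM(sales.dollar_sales) - SUM(sales.dollar_cost) AS income
FROM sales, store, time, product
WHERE sales.store_key = store.store_key
  AND sales.time_key = time.time_key
  AND sales.product_key = product.product_key
  AND time.fiscal_period IN ('3Q95', '4Q95', '1Q96')
  AND product.department = 'Grocery'
  AND store.sales_district IN ('WEST', 'SOUTHWEST')
GROUP BY store.sales_district, time.fiscal_period
""").fetchall()
print(rows)  # [('WEST', '3Q95', 150.0, 60.0)]
```

Only the two rows that satisfy all three dimension predicates survive, and the GROUP BY collapses them into one result row.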
22. Phase 1
Finding the rows in the SALES table (using bitmap indexes):
SELECT ... FROM sales
WHERE
store_key IN (SELECT store_key FROM store WHERE
sales_district IN ('WEST', 'SOUTHWEST')) AND
time_key IN (SELECT time_key FROM time WHERE
fiscal_period IN ('3Q95', '4Q95', '1Q96')) AND
product_key IN (SELECT product_key FROM product WHERE
department = 'Grocery');
23. Phase 2
Now the fact table is joined to dimension
tables. For dimension tables of small
cardinality, a full-table scan may be used. For
large cardinality, a hash join could be used.
24. The Star Transformation
Use bitmap indexes to retrieve all relevant
rows from the fact table, based on foreign
key values
– This happens very fast
Join this result set to the dimension tables
– If there are many values, a hash join may be used
– If there are fewer values, a b-tree driven join may
be used
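The two phases can be sketched in Python, with sets standing in for the bitmap results of phase 1 and a plain dict lookup standing in for the phase 2 hash join. The data is invented for illustration.

```python
# Sketch of the two-phase star transformation. Dimension tables are dicts
# keyed by surrogate key; the fact table is a list of tuples.

store = {1: "WEST", 2: "EAST"}        # store_key -> sales_district
period = {1: "3Q95", 2: "2Q95"}       # time_key  -> fiscal_period
fact = [                              # (store_key, time_key, dollar_sales)
    (1, 1, 100.0), (2, 1, 40.0), (1, 2, 70.0), (1, 1, 50.0),
]

# Phase 1: turn each dimension predicate into a set of qualifying keys,
# then scan the fact table once, keeping rows whose keys are in every set.
store_keys = {k for k, d in store.items() if d in ("WEST", "SOUTHWEST")}
time_keys = {k for k, p in period.items() if p in ("3Q95", "4Q95", "1Q96")}
hits = [r for r in fact if r[0] in store_keys and r[1] in time_keys]

# Phase 2: hash-join the surviving fact rows back to the dimension tables
# to pick up descriptive columns, then aggregate.
revenue = {}
for store_key, time_key, dollars in hits:
    group = (store[store_key], period[time_key])
    revenue[group] = revenue.get(group, 0.0) + dollars

print(revenue)  # {('WEST', '3Q95'): 150.0}
```

Phase 1 never touches the dimension tables' descriptive columns; phase 2 only joins the already-filtered rows, which is the point of the transformation.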
25. How DW Fits into the Enterprise
[Diagram: Applications A, B, and C feed OLTP systems
(OLTP1, OLTP2, OLTP3). An Extract, Transform and Load
process integrates the OLTP data into the data warehouse,
which in turn feeds several data marts. Users query the
data marts and the warehouse.]
26. Data Warehouse Database Design
A conventional database design for data
warehouse would lead to joins on large
amounts of data that would run slowly
The star schema allows for fast processing of
very large quantities of data in the data
warehouse
It also allows for very compact representation
of events that occur many times
27. A Sample OLTP Schema
[Diagram: the normalized OLTP schema from slide 14:
orders, order items, products, and customers.]
28. Transformed to a Star Schema
[Diagram: the star schema from slide 15: a central sales
fact table surrounded by the products, customers, channels,
and times dimension tables.]
30. Fact Table
The fact table contains the actual business process
measurements or metrics for a specific event, called facts,
usually numbers.
A fact table links its facts to context through foreign keys
to other tables, called "dimension" tables.
These foreign keys are usually generated keys, in order to
save fact table space.
If you are building a DW of monthly sales in dollars, your
fact table will contain monthly sales, one row per month.
If you are building a DW of retail sales, the fact table
might have one row for each item sold.
31. Fact Table Design
A fact table may contain one or more facts. Usually you
create one fact table per business event. For example,
if you want to analyze sales numbers and also
advertising spending, those are two separate business
processes, so you create two separate fact tables: one
for sales data and one for advertising cost data. On
the other hand, if you want to track sales tax in
addition to the sales number, you simply add one more
fact column to the Sales fact table, called Tax.
32. Dimension Table
Dimension tables have a small number of rows
(compared to fact tables) but a large number of
columns.
For the lowest level of granularity of a fact in the fact
table, a dimension table has one row that gives all
the category values for that fact.
The natural key of a dimension is often wide, so a
generated key is used so that the fact table's
reference to the dimension table can be small.
34. Time Dimension Schema
Column Name   Type
Dim_Id        INTEGER (4)
Month         SMALLINT (2)
Month_Name    VARCHAR (3)
Quarter       SMALLINT (2)
Quarter_Name  VARCHAR (2)
Year          SMALLINT (2)
35. Time Dimension Data
TM _Dim_Id TM _Month TM_Month_Name TM _Quarter
TM_Quarter_N
ame
TM_Year
1001 1 Jan 1 Q1 2003
1002 2 Feb 1 Q1 2003
1003 3 Mar 1 Q1 2003
1004 4 Apr 2 Q2 2003
1005 5 May 2 Q2 2003
35
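Rows like these are normally generated rather than typed in. A minimal Python sketch (the column names follow the slide; the year and starting id are arbitrary):

```python
import calendar

def time_dimension_rows(year, start_id=1001):
    """Generate one time-dimension row per month of `year`,
    matching the TM_* columns shown above."""
    rows = []
    for month in range(1, 13):
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        rows.append({
            "TM_Dim_Id": start_id + month - 1,
            "TM_Month": month,
            "TM_Month_Name": calendar.month_abbr[month],
            "TM_Quarter": quarter,
            "TM_Quarter_Name": f"Q{quarter}",
            "TM_Year": year,
        })
    return rows

rows = time_dimension_rows(2003)
print(rows[3])  # the April row: id 1004, month 4, quarter 2, 'Q2'
```

Pre-computing the quarter, month name, and year per row is what lets later queries roll up by any of these levels with a simple join and GROUP BY.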
36. Location Dimension Schema
Column Name Type
Dim_Id INTEGER (4)
Loc_Code VARCHAR (4)
Name VARCHAR (50)
State_Name VARCHAR (20)
Country_Name VARCHAR (20)
37. Location Dimension Data
Dim_Id Loc_Code Name State_Name Country_Name
1001 IL01 Chicago Loop Illinois USA
1002 IL02 Arlington Hts Illinois USA
1003 NY01 Brooklyn New York USA
1004 TO01 Toronto Ontario Canada
1005 MX01 Mexico City Distrito Federal Mexico
38. Product Data Schema
Column Name Type
Dim_Id INTEGER (4)
SKU VARCHAR (10)
Name VARCHAR (30)
Category VARCHAR (30)
39. Product Data
Dim_Id SKU Name Category
1001 DOVE6K Dove Soap 6Pk Sanitary
1002 MLK66F# Skim Milk 1 Gal Dairy
1003 SMKSAL55 Smoked Salmon 6oz Meat
40. Categories in Dimension Tables
Categories may be hierarchical, flat, or a
mix of both.
Categories provide canned values that can
be offered to users for queries.
41. Granularity (Grain) of the Fact Table
The level of detail of the fact table is known as
the grain of the fact table. In this example the
grain of the fact table is monthly sales per
location per product.
42. Note about Granularity
There may be multiple star schemas at
different levels of granularity, especially for
very large data warehouses
The first could be the finest—say, each
transaction such as a sale
The next could be an aggregation, like the
previous example
There could be more levels of aggregation
43. Design Approach
1. Identify the business process.
In this step you determine which business process your data warehouse
represents. This process will be the source of your metrics or
measurements.
2. Identify the grain.
You determine what one row of the fact table means. In the previous example
you decided that your grain is 'monthly sales per location per product'. It
might be daily sales, or even each sale could be one row.
3. Identify the dimensions.
Your dimensions should be as descriptive as possible (SQL VARCHAR or
CHARACTER) and conform to your grain.
4. Finally, identify the facts.
In this step you identify your measurements (metrics, or facts). The
facts should be numeric and should conform to the grain defined in step 2.
44. Monthly Sales Fact Table Schema
Field Name  Type
TM_Dim_Id   INTEGER (4)
PR_Dim_Id   INTEGER (4)
LOC_Dim_Id  INTEGER (4)
Sales       INTEGER (4)
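Putting slides 34 through 44 together, here is a small sqlite3 sketch that loads this star and rolls monthly sales up to the quarter level. The table and column names follow the slides; the sales figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE time_dim (
    TM_Dim_Id INTEGER PRIMARY KEY, TM_Month INTEGER,
    TM_Quarter INTEGER, TM_Quarter_Name TEXT, TM_Year INTEGER);
CREATE TABLE product_dim (
    PR_Dim_Id INTEGER PRIMARY KEY, SKU TEXT, Name TEXT, Category TEXT);
CREATE TABLE location_dim (
    LOC_Dim_Id INTEGER PRIMARY KEY, Loc_Code TEXT, Name TEXT);
CREATE TABLE monthly_sales (
    TM_Dim_Id INTEGER, PR_Dim_Id INTEGER, LOC_Dim_Id INTEGER, Sales INTEGER);

INSERT INTO time_dim VALUES (1001, 1, 1, 'Q1', 2003), (1002, 2, 1, 'Q1', 2003),
                            (1004, 4, 2, 'Q2', 2003);
INSERT INTO product_dim VALUES (1001, 'DOVE6K', 'Dove Soap 6Pk', 'Sanitary');
INSERT INTO location_dim VALUES (1001, 'IL01', 'Chicago Loop');
-- One fact row per month: the grain from slide 41.
INSERT INTO monthly_sales VALUES (1001, 1001, 1001, 100),
                                 (1002, 1001, 1001, 150),
                                 (1004, 1001, 1001, 80);
""")

# Roll the monthly grain up to quarterly totals via the time dimension.
rollup = cur.execute("""
    SELECT t.TM_Year, t.TM_Quarter_Name, SUM(s.Sales)
    FROM monthly_sales s JOIN time_dim t ON s.TM_Dim_Id = t.TM_Dim_Id
    GROUP BY t.TM_Year, t.TM_Quarter_Name
    ORDER BY t.TM_Quarter_Name
""").fetchall()
print(rollup)  # [(2003, 'Q1', 250), (2003, 'Q2', 80)]
```

Because the quarter is pre-computed in the time dimension, the roll-up is a plain join plus GROUP BY; no date arithmetic happens at query time.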
46. Data Mart
A data mart is a collection of subject areas organized for decision
support based on the needs of a given department. Examples: finance
has their data mart, marketing has theirs, sales has theirs, and so on.
Each department generally runs its own data mart. Ownership of the
data mart allows each department to bypass the control that might
otherwise coordinate the data found in the different departments.
Each department's data mart is specific to its own needs. Typically,
the database design for a data mart is built around a star-join
structure designed for that department.
The data mart contains only a modicum of historical information and is
granular only to the point that it suits the needs of the department.
The data mart may also include data from outside the organization,
such as normative salary data purchased by an HR department.
47. About the Data Mart
The structure of the data in the data mart may or may not be
compatible with the structure of data in the data warehouse.
The amount of historical data found in the data mart is different
from the history of the data found in the warehouse. Data
warehouses contain robust amounts of history, while data marts
usually contain modest amounts of history.
The subject areas found in the data mart are only faintly related
to the subject areas found in the data warehouse.
The relationships found in the data mart may not be those
relationships that are found in the data warehouse.
The types of queries satisfied in the data mart are quite
different from those queries found in the data warehouse.
48. Walmart’s Data Warehouse
Half a petabyte in capacity (0.5 x 10^15 bytes)
World’s largest DW
Tracks 100 million customers buying billions of
products every week
Every sale from every store is transmitted to
Bentonville every night
Walmart has more than 18,000 retail stores, employs
2.2 million, serves 245 million customers every week
49. Typical Questions
How much orange juice did we sell last year,
last month, last week in store X?
What internal factors (position in store,
advertising campaigns...) influence orange
juice sales?
How much orange juice are we going to sell
next week, next month, next year?