2. Using the Star Schema
The queries against a star schema follow a consistent pattern.
One or more facts are typically requested, along with the
dimensional attributes that provide the desired context. The
facts are summarized as appropriate, based on the dimensions.
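The query pattern described above can be sketched with Python's built-in sqlite3 module. The tables (day, product, order_facts), their columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day VALUES (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'),
                       (3, '2024-02-03', '2024-02');
INSERT INTO product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                           (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 1, 100), (1, 3, 20), (2, 2, 50),
                               (3, 1, 70), (3, 3, 10);
""")

# The pattern: select dimension attributes, sum the fact, group by the attributes.
star_query = """
SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN product p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""
rows = conn.execute(star_query).fetchall()
for row in rows:
    print(row)
```

The dimension attributes (month, category) appear in both the SELECT list and the GROUP BY clause, and the fact is wrapped in SUM.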
3. Aggregate tables
Aggregate tables improve data warehouse performance
by reducing the number of rows the RDBMS must access
when responding to a query.
[Figure: base schema and aggregate schema]
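As a rough illustration of the row reduction, the sketch below builds a month-level aggregate from a day-level fact table and compares row counts. All table names, column names, and data are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base schema at day/product grain.
CREATE TABLE day (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day VALUES (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'),
                       (3, '2024-02-03', '2024-02');
INSERT INTO order_facts VALUES (1, 1, 100), (1, 2, 20), (2, 1, 50),
                               (2, 2, 5), (3, 1, 70), (3, 2, 10);

-- Hypothetical aggregate schema at month/product grain.
CREATE TABLE month (month_key INTEGER PRIMARY KEY, month TEXT);
INSERT INTO month (month) SELECT DISTINCT month FROM day ORDER BY month;
CREATE TABLE order_facts_agg_month AS
SELECT m.month_key AS month_key, f.product_key AS product_key,
       SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN month m ON m.month = d.month
GROUP BY m.month_key, f.product_key;
""")

base_rows = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_agg_month").fetchone()[0]
print(f"base rows: {base_rows}, aggregate rows: {agg_rows}")
```

A monthly query against the aggregate reads fewer rows than the same query against the base table. On real data volumes the difference is usually far larger than in this toy example.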
5. Aggregate Characteristics
The more highly summarized an aggregate table is, the
fewer queries it will be able to accelerate.
This means that choosing aggregates involves making careful
tradeoffs between the performance gain offered and the
number of queries that will benefit.
6. The Aggregate Navigator
To receive the performance benefit offered by an
aggregate schema, a query must be written to use the
aggregate.
Aggregate navigator: a component of the data warehouse
infrastructure that assumes the task of rewriting user
queries to utilize aggregate tables.
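One part of what an aggregate navigator does, picking the table to query, might be sketched as follows. This is a simplified illustration and not how any particular product works: the catalog, table names, attribute sets, and row counts are all hypothetical, and a real navigator would also rewrite the joins to aggregate dimension tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTable:
    name: str
    attributes: frozenset  # dimension attributes the table can group or filter by
    row_count: int


# Hypothetical catalog: the base fact table plus two aggregates.
CATALOG = [
    FactTable("order_facts",
              frozenset({"day", "month", "product", "category", "salesperson"}),
              1_000_000),
    FactTable("order_facts_agg_month_product",
              frozenset({"month", "product", "category"}),
              50_000),
    FactTable("order_facts_agg_month_category",
              frozenset({"month", "category"}),
              600),
]


def choose_table(required_attributes):
    """Return the smallest table whose attributes cover what the query needs."""
    needed = frozenset(required_attributes)
    candidates = [t for t in CATALOG if needed <= t.attributes]
    if not candidates:
        raise ValueError(f"no table can answer a query on {sorted(needed)}")
    return min(candidates, key=lambda t: t.row_count)


print(choose_table({"month", "category"}).name)
print(choose_table({"month", "product"}).name)
print(choose_table({"day", "category"}).name)
```

Because the base table covers every attribute, a query that no aggregate can satisfy falls back to it.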
7. Principles of Aggregation
An aggregate schema must always provide exactly the
same results as the base schema.
The attributes of each aggregate table must be a subset of
those from a base schema table.
The only exception to this rule is the surrogate key for an
aggregate dimension table.
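The first principle can be checked mechanically. The sketch below, using hypothetical tables and data, runs the same monthly question against the base schema and against an aggregate schema and compares the answers. The aggregate dimension (month) reuses an attribute of the base dimension (day) and adds only its own surrogate key, month_key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
INSERT INTO order_facts VALUES (1, 1, 100), (2, 1, 50), (2, 2, 5), (3, 1, 70);

-- Aggregate dimension: "month" comes from the base dimension;
-- month_key is the new surrogate key.
CREATE TABLE month (month_key INTEGER PRIMARY KEY, month TEXT);
INSERT INTO month (month) SELECT DISTINCT month FROM day ORDER BY month;
CREATE TABLE order_facts_agg_month AS
SELECT m.month_key AS month_key, f.product_key AS product_key,
       SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN month m ON m.month = d.month
GROUP BY m.month_key, f.product_key;
""")

base_result = conn.execute("""
SELECT d.month, SUM(f.order_dollars)
FROM order_facts f JOIN day d ON d.day_key = f.day_key
GROUP BY d.month ORDER BY d.month
""").fetchall()

agg_result = conn.execute("""
SELECT m.month, SUM(a.order_dollars)
FROM order_facts_agg_month a JOIN month m ON m.month_key = a.month_key
GROUP BY m.month ORDER BY m.month
""").fetchall()

print(base_result == agg_result)
```

A comparison like this could be run after each load as a sanity check on the aggregate.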
9. Pre-Joined Aggregates
A pre-joined aggregate summarizes a fact across a set of
dimension values. Unlike an aggregate star schema, however,
the pre-joined aggregate places the results in a single
table.
By doing so, the pre-joined aggregate eliminates the need for
the RDBMS to perform a join operation at query time.
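A minimal sketch of a pre-joined aggregate, assuming hypothetical tables and data: the dimension attributes and the summarized fact are stored together, so the query at the end needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
INSERT INTO product VALUES (1, 'Hardware'), (2, 'Hardware'), (3, 'Books');
INSERT INTO order_facts VALUES (1, 1, 100), (1, 3, 20), (2, 2, 50), (3, 1, 70);

-- Pre-joined aggregate: dimension attributes and the summarized fact in one table.
CREATE TABLE orders_by_month_category AS
SELECT d.month AS month, p.category AS category,
       SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day d ON d.day_key = f.day_key
JOIN product p ON p.product_key = f.product_key
GROUP BY d.month, p.category;
""")

# The joins were paid for once, at load time; this query touches a single table.
prejoined_rows = conn.execute(
    "SELECT month, category, order_dollars FROM orders_by_month_category "
    "ORDER BY month, category"
).fetchall()
print(prejoined_rows)
```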
10. Derived Tables
Derived tables alter the structure of the tables summarized
or change the scope of their content.
Types:
The merged fact table: combines facts from more than one fact
table at a common grain.
The pivoted fact table: transforms a set of metrics in a single
row into multiple rows with a single metric, or vice versa.
The sliced fact table: contains a subset of the records of the
base fact table, usually in coordination with a specific
dimension attribute.
In all three cases, the derived fact tables are not expected
to serve as invisible stand-ins for the base schema.
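Of the three types, the pivoted fact table is perhaps the easiest to picture in code. The sketch below turns one row carrying several metrics into several rows carrying one metric each. The function name, the output column names (metric_name, metric_value), and the sample row are hypothetical.

```python
def pivot_to_rows(rows, key_columns, metric_columns):
    """Turn each multi-metric row into one row per metric."""
    pivoted = []
    for row in rows:
        for metric in metric_columns:
            new_row = {key: row[key] for key in key_columns}
            new_row["metric_name"] = metric
            new_row["metric_value"] = row[metric]
            pivoted.append(new_row)
    return pivoted


wide = [{"month": "2024-01", "order_dollars": 150, "order_quantity": 7}]
tall = pivot_to_rows(wide, ["month"], ["order_dollars", "order_quantity"])
for row in tall:
    print(row)
```

The reverse transformation ("or vice versa") would group the tall rows by their key columns and spread metric_name back into separate columns.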
11. Tables with New Facts
Semi-additive facts may not be added together across a
particular dimension; non-additive facts are never added
together. In these situations, you may choose to aggregate
by means other than summation.
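An account balance is a common example of a semi-additive fact: balances can be summed across accounts on a single day, but not across days. The sketch below assumes hypothetical daily snapshots and uses an average as the alternative to summation over time. The choice of average, rather than for instance a period-ending balance, is an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily balance snapshots: (date, account, balance).
snapshots = [
    ("2024-01-01", "A", 100.0), ("2024-01-02", "A", 200.0),
    ("2024-01-01", "B", 50.0), ("2024-01-02", "B", 50.0),
]


def total_balance_on(rows, date):
    """Summing across accounts for one day is meaningful."""
    return sum(balance for d, _, balance in rows if d == date)


def monthly_average_balance(rows):
    """Across days, average the balance instead of summing it."""
    by_month_account = defaultdict(list)
    for date, account, balance in rows:
        by_month_account[(date[:7], account)].append(balance)
    return {key: mean(values) for key, values in by_month_account.items()}


print(total_balance_on(snapshots, "2024-01-02"))
print(monthly_average_balance(snapshots))
```

Summing account A's two daily balances would give 300, a figure that corresponds to nothing the account ever held.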
12. Choosing Aggregates
One of the most vexing tasks in deploying dimensional
aggregates is choosing which aggregates to design and
deploy.
Your aim is to strike the correct balance between the
performance gain provided by aggregate schemas and their cost
in terms of resource requirements.
13. Choosing Aggregates
What Is a Potential Aggregate?
Identifying Potentially Useful Aggregates
Assessing the Value of Potential Aggregates
14. What Is a Potential Aggregate?
Aggregate Fact Tables: A Question of Grain
Aggregate Dimensions Must Conform
Pre-Joined Aggregates Have Grain Too
Enumerating Potential Aggregates
15. What Is a Potential Aggregate?
Express potential aggregates as fact table grain statements
Orders by day, salesperson, and product
Orders by day, customer, and product
Orders by month, product, and salesperson
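Enumerating potential aggregates can be sketched as taking every combination of summarization levels across the dimensions. The hierarchy levels below are hypothetical, and None stands for omitting a dimension entirely.

```python
from itertools import product

# Hypothetical summarization levels per dimension, finest first.
# None means the dimension is left out of the aggregate altogether.
LEVELS = {
    "day": ["day", "month", "year", None],
    "product": ["product", "category", None],
    "salesperson": ["salesperson", None],
}


def grain_statements(fact="Orders"):
    """List a grain statement for every combination except the base grain."""
    base_grain = tuple(levels[0] for levels in LEVELS.values())
    statements = []
    for combo in product(*LEVELS.values()):
        if combo == base_grain:
            continue  # this is the base fact table, not an aggregate
        parts = [level for level in combo if level is not None]
        if parts:
            statements.append(f"{fact} by " + ", ".join(parts))
        else:
            statements.append(f"{fact} (grand total)")
    return statements


statements = grain_statements()
print(len(statements))
for statement in statements[:5]:
    print(statement)
```

Even three small dimensions yield 23 candidates here, which is why the pool has to be narrowed down before anything is built.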
17. Identifying Potentially Useful Aggregates
Drawing on Initial Design
Design Decisions
Listening to Users
Where Subject Areas Meet
The Conformance Bus
Aggregates for Drilling Across
Query Patterns of an Existing System
Analyzing Reports for Potential Aggregates
Choosing Which Reports to Analyze
18. Identifying Potentially Useful Aggregates
Identify and document potential aggregates during schema
design, even if initial implementation will not include
aggregates. This information will be useful in the future.
Any decision to set the grain of a fact table at a finer level
reveals a potential aggregate.
Decisions about where to place groups of dimensional
attributes reveal potential levels of aggregation.
Discussions of hierarchies or drill paths point to potential
aggregates.
User work products reveal potential aggregates. These may
include reports from operational systems, manually compiled
briefings, or spreadsheets. They will also be revealed by manual
processes and requirements not currently met.
19. Aggregates for Drilling Across
The process of combining information from multiple fact
tables is called drilling across.
Consult the conformance bus to identify aggregates that will
be used in drill-across reports.
The lowest common dimensionality between two fact tables
often suggests one or more aggregates.
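One way to picture the lowest common dimensionality is as a comparison of two grains, dimension by dimension. The hierarchies, fact tables, and grains below are hypothetical stand-ins for what a conformance bus would record.

```python
# Hypothetical hierarchies, finest level first.
HIERARCHIES = {
    "time": ["day", "month", "year"],
    "product": ["product", "category"],
    "salesperson": ["salesperson"],
}

# Hypothetical grain of each fact table: dimension -> level stored.
GRAIN = {
    "order_facts": {"time": "day", "product": "product", "salesperson": "salesperson"},
    "budget_facts": {"time": "month", "product": "category"},
}


def lowest_common_grain(table_a, table_b):
    """For each shared dimension, keep the finest level both tables can supply."""
    common = {}
    for dimension in GRAIN[table_a].keys() & GRAIN[table_b].keys():
        levels = HIERARCHIES[dimension]
        # The coarser of the two stored levels is the finest one both can reach.
        index = max(levels.index(GRAIN[table_a][dimension]),
                    levels.index(GRAIN[table_b][dimension]))
        common[dimension] = levels[index]
    return common


print(lowest_common_grain("order_facts", "budget_facts"))
```

In this example the result suggests an aggregate of orders by month and category, which would line up row for row with the budget facts in a drill-across report.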
20. Analyzing Reports for Potential Aggregates
The detail rows require order facts by product and month.
The summary rows require order facts by category and month.
The grand total requires order facts by month.
22. Assessing the Value of Potential Aggregates
After identifying a pool of potential aggregates, the next
step is to sort through them and determine which ones
to build.
23. Assessing the Value of Potential Aggregates
Number of Aggregates
Presence of an Aggregate Navigator
Space Consumed by Aggregate Tables
How Many Rows Are Summarized
Examining the Number of Rows Summarized
The Cardinality Trap and Sparsity
Who Will Benefit from the Aggregate
24. Examining the Number of Rows Summarized
A good starting rule of thumb is to identify aggregate fact
tables where each row summarizes an average of 20
rows.
The savings afforded by aggregates can be lopsided,
favoring a particular attribute value.
Remember that, like a base fact table, a dimensional
aggregate can be aggregated during a query. Aggregates
may be competing with other aggregates to offer
performance gains.
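The average number of rows summarized, and how lopsided it can be, might be measured along these lines. The key layout (date, product) and the sample data are hypothetical.

```python
from collections import Counter


def rows_summarized(base_keys, to_aggregate_key):
    """Return the average base rows per aggregate row and the per-row counts."""
    groups = Counter(to_aggregate_key(key) for key in base_keys)
    return len(base_keys) / len(groups), groups


# Hypothetical base fact keys: product A sells on 30 days, product B on only 2.
base_keys = [(f"2024-01-{day:02d}", "A") for day in range(1, 31)]
base_keys += [("2024-01-05", "B"), ("2024-01-20", "B")]

# Summarize day up to month; product stays as it is.
average, groups = rows_summarized(base_keys, lambda key: (key[0][:7], key[1]))
print(average)
print(groups)
```

The average of 16 hides the imbalance: queries about product A save 30 rows per aggregate row, while queries about product B save almost nothing.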
25. The Cardinality Trap and Sparsity
Cardinality: the number of distinct values taken on by a
given attribute.
Sparse: not all combinations of keys are present.
Don’t assume aggregate fact tables will exhibit the same
sparsity as the tables they summarize.
The higher the degree of summarization, the more dense the
aggregate fact table will be.
The best way to get an idea of the relative size of the
aggregate is to count the number of rows.
As before, count the distinct combination of keys and/or
summarized dimension attributes.
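The trap can be seen in a small sketch that counts distinct key combinations rather than multiplying cardinalities. The data and the product-to-category mapping are hypothetical.

```python
from math import prod


def density(rows):
    """Return (distinct rows, possible combinations, fraction actually present)."""
    distinct = set(rows)
    width = len(next(iter(distinct)))
    possible = prod(len({row[i] for row in distinct}) for i in range(width))
    return len(distinct), possible, len(distinct) / possible


# Hypothetical base fact keys at (day, product) grain.
base = [
    ("2024-01-05", "Widget"), ("2024-01-20", "Manual"),
    ("2024-02-03", "Gadget"), ("2024-02-17", "Atlas"),
]
CATEGORY = {"Widget": "Hardware", "Gadget": "Hardware",
            "Manual": "Books", "Atlas": "Books"}

# The same facts summarized to (month, category) grain.
aggregate = [(day[:7], CATEGORY[product_name]) for day, product_name in base]

print(density(base))
print(density(aggregate))
```

The base table is 25 percent dense. Applying that figure to the aggregate's 4 possible combinations would predict a single row, yet the distinct count shows 4 rows, so in this case the aggregate saves nothing.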
26. Who Will Benefit from the Aggregate
The first aggregates you add to your implementation are
those that offer benefits across the greatest number of
user requirements. Aggregates that fall in the 20:1 range
of savings are compared with one another to identify
those that support the most common user requirements.
Start by selecting aggregates that provide solid
performance boosts for a large number of common
queries. To these, add more powerful (but more narrowly
used) aggregates as space permits. Use the relative
importance of one aggregate over another as a tiebreaker.
27. Bibliography
Christopher Adamson. Mastering Data Warehouse Aggregates:
Solutions for Star Schema Performance.