Aggregates I

Data Warehouse Aggregates

Ing. Julio Ernesto Carreño Vargas

Using the Star Schema
 Queries against a star schema follow a consistent pattern. One or
more facts are typically requested, along with the dimensional
attributes that provide the desired context. The facts are
summarized as appropriate, based on the dimensions.
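
The query pattern described above can be sketched with Python's built-in sqlite3 module. The table names (day_dim, product_dim, order_facts), their columns, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day_dim VALUES (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'),
                           (3, '2024-02-03', '2024-02');
INSERT INTO product_dim VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                               (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 1, 100), (1, 3, 20), (2, 2, 50), (3, 1, 70), (3, 3, 10);
""")

# The consistent pattern: join the facts to the dimensions, group by the
# dimensional attributes that give context, and summarize the fact.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
    FROM order_facts f
    JOIN day_dim d ON d.day_key = f.day_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()

for row in rows:
    print(row)
```

Changing the attributes in the SELECT and GROUP BY clauses changes the context; the shape of the query stays the same.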

Aggregate tables
 Aggregate tables improve data warehouse performance
by reducing the number of rows the RDBMS must access
when responding to a query.

[Figure: a base schema shown beside its aggregate schema]
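
As a rough illustration of the row reduction, the sketch below builds a small base fact table and a monthly aggregate of it, then compares row counts. The table names, the 60-day data set, and the shortcut of storing the month directly in the aggregate (instead of using an aggregate dimension table with its own surrogate key) are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
""")

# 60 days spread over 2 hypothetical months; 3 products are ordered every day.
conn.executemany(
    "INSERT INTO day_dim VALUES (?, ?)",
    [(d, "M1" if d <= 30 else "M2") for d in range(1, 61)],
)
conn.executemany(
    "INSERT INTO order_facts VALUES (?, ?, ?)",
    [(d, p, 10) for d in range(1, 61) for p in (1, 2, 3)],
)

# Aggregate fact table at the grain "orders by month and product".
conn.execute("""
    CREATE TABLE order_facts_month AS
    SELECT d.month, f.product_key, SUM(f.order_dollars) AS order_dollars
    FROM order_facts f
    JOIN day_dim d ON d.day_key = f.day_key
    GROUP BY d.month, f.product_key
""")

base_rows = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_month").fetchone()[0]
print(base_rows, agg_rows)
```

A query for monthly totals by product reads 6 rows from the aggregate where it would otherwise read 180 from the base table.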

Aggregate dimension table

Aggregate characteristic
 The more highly summarized an aggregate table is, the
fewer queries it will be able to accelerate.
 This means that choosing aggregates involves making careful
tradeoffs between the performance gain offered and the
number of queries that will benefit.

The Aggregate Navigator
 To receive the performance benefit offered by an
aggregate schema, a query must be written to use the
aggregate.
 Aggregate navigator: a component of the data warehouse
infrastructure that takes on the task of rewriting user queries
to use aggregate tables.
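
The core decision an aggregate navigator makes can be sketched as a lookup. This is a toy model, not how any particular product works: the catalog, the table names, and the row counts are hypothetical, and a real navigator would also rewrite the SQL text.

```python
# Hypothetical catalog: table name -> (attributes available, approximate row count).
CATALOG = {
    "order_facts": ({"day", "month", "product", "category", "salesperson"}, 1_000_000),
    "order_facts_month_product": ({"month", "product", "category"}, 40_000),
    "order_facts_month_category": ({"month", "category"}, 500),
}


def choose_table(requested_attributes):
    """Return the smallest table that can supply every requested attribute."""
    candidates = [
        (row_count, name)
        for name, (attributes, row_count) in CATALOG.items()
        if set(requested_attributes) <= attributes
    ]
    # The base table carries every attribute in this catalog, so the list is
    # never empty for a query that only uses catalogued attributes.
    return min(candidates)[1]


print(choose_table({"month", "category"}))
print(choose_table({"month", "product"}))
print(choose_table({"day", "product"}))
```

The most highly summarized table is chosen only for the first request, which mirrors the tradeoff noted earlier: the more summarized an aggregate is, the fewer queries it can accelerate.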

Principles of Aggregation
 An aggregate schema must always provide exactly the
same results as the base schema.
 The attributes of each aggregate table must be a subset of
those from a base schema table.
 The only exception to this rule is the surrogate key for an
aggregate dimension table.
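
Both principles can be checked mechanically. The sketch below is one possible test, assuming hypothetical tables in which the dimension attributes are stored inline with the facts to keep the example short: it runs the same query against the base table and the aggregate and compares the results, then confirms that the aggregate's columns are a subset of the base table's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (month TEXT, product TEXT, category TEXT, order_dollars INTEGER);
INSERT INTO order_facts VALUES
    ('2024-01', 'Widget', 'Hardware', 100), ('2024-01', 'Widget', 'Hardware', 40),
    ('2024-01', 'Manual', 'Books', 20), ('2024-02', 'Gadget', 'Hardware', 70);
CREATE TABLE order_facts_agg AS
    SELECT month, category, SUM(order_dollars) AS order_dollars
    FROM order_facts GROUP BY month, category;
""")

QUERY = """
    SELECT month, category, SUM(order_dollars)
    FROM {table} GROUP BY month, category ORDER BY month, category
"""


def columns(table):
    """Column names of a table, read from SQLite's table_info pragma."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


base_result = conn.execute(QUERY.format(table="order_facts")).fetchall()
agg_result = conn.execute(QUERY.format(table="order_facts_agg")).fetchall()

same_results = base_result == agg_result
attributes_are_subset = columns("order_facts_agg") <= columns("order_facts")
print(same_results, attributes_are_subset)
```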

Summarization techniques
 Aggregate Tables
 Pre-Joined Aggregates
 Derived Tables

Pre-Joined Aggregates
 A pre-joined aggregate summarizes a fact across a set of
dimension values. Unlike an aggregate star schema, however,
the pre-joined aggregate places the results in a single
table.
 By doing so, the pre-joined aggregate eliminates the need for
the RDBMS to perform a join operation at query time.
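
A minimal sketch of a pre-joined aggregate, again using sqlite3. The schema and the table name orders_by_month_category are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars INTEGER);
INSERT INTO day_dim VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
INSERT INTO product_dim VALUES (1, 'Hardware'), (2, 'Hardware'), (3, 'Books');
INSERT INTO order_facts VALUES (1, 1, 100), (2, 2, 50), (2, 3, 20), (3, 1, 70);

-- Pre-joined aggregate: dimension attributes and the summarized fact
-- are stored together in one table.
CREATE TABLE orders_by_month_category AS
    SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
    FROM order_facts f
    JOIN day_dim d ON d.day_key = f.day_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY d.month, p.category;
""")

# At query time the join has already been done; a single-table scan is enough.
rows = conn.execute("""
    SELECT month, category, order_dollars
    FROM orders_by_month_category
    ORDER BY month, category
""").fetchall()
print(rows)
```

The join cost is paid once, when the table is built, rather than on every query.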

Derived Tables
 Derived tables alter the structure of the tables summarized or
change the scope of their content.
 Types:
 The merged fact table: combines facts from more than one fact
table at a common grain.
 The pivoted fact table: transforms a set of metrics in a single
row into multiple rows with a single metric, or vice versa.
 The sliced fact table: contains a subset of the records of the
base fact table, usually in coordination with a specific
dimension attribute.
 In all three cases, the derived fact tables are not expected
to serve as invisible stand-ins for the base schema.
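
Of the three derived table types, the pivoted fact table is the easiest to show in a few lines. The sketch below pivots in both directions on plain Python dictionaries; the account, month, budget, and actual fields are hypothetical.

```python
# Hypothetical fact rows carrying several metrics per row
# (one row per account and month).
wide_rows = [
    {"account": "A1", "month": "2024-01", "budget": 100, "actual": 90},
    {"account": "A2", "month": "2024-01", "budget": 50, "actual": 65},
]
METRICS = ("budget", "actual")


def pivot_to_rows(rows):
    """Turn each multi-metric row into one row per metric."""
    return [
        {"account": r["account"], "month": r["month"], "metric": m, "amount": r[m]}
        for r in rows
        for m in METRICS
    ]


def pivot_to_columns(rows):
    """Reverse direction: gather single-metric rows back into wide rows."""
    wide = {}
    for r in rows:
        key = (r["account"], r["month"])
        wide.setdefault(key, {"account": r["account"], "month": r["month"]})
        wide[key][r["metric"]] = r["amount"]
    return list(wide.values())


narrow_rows = pivot_to_rows(wide_rows)
print(len(narrow_rows))
print(pivot_to_columns(narrow_rows) == wide_rows)
```

Because the two shapes have different columns, a query written for one cannot simply be redirected to the other, which is consistent with the point that derived tables are not invisible stand-ins for the base schema.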

Tables with New Facts
 Semi-additive facts may not be added together across a
particular dimension; non-additive facts are never added
together. In these situations, you may choose to aggregate
by means other than summation.
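
An inventory balance is a common example of a semi-additive fact. The sketch below assumes hypothetical daily snapshot rows and uses a period average as the alternative to summation; other choices, such as a period-end balance or a minimum or maximum, are equally possible.

```python
from collections import defaultdict

# Hypothetical daily inventory snapshots: (day, product, quantity_on_hand).
snapshots = [
    ("2024-01-01", "Widget", 10), ("2024-01-01", "Gadget", 4),
    ("2024-01-02", "Widget", 8), ("2024-01-02", "Gadget", 6),
]

# Across products on a single day, summation is meaningful.
on_hand_by_day = defaultdict(int)
for day, _product, qty in snapshots:
    on_hand_by_day[day] += qty

# Across days, summing balances would count the same stock repeatedly, so
# this sketch stores a period average instead. The result is a new fact,
# not a sum of the base fact.
daily_by_product = defaultdict(list)
for _day, product, qty in snapshots:
    daily_by_product[product].append(qty)
avg_on_hand = {p: sum(v) / len(v) for p, v in daily_by_product.items()}

print(dict(on_hand_by_day))
print(avg_on_hand)
```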

Choosing Aggregates
 One of the most vexing tasks in deploying dimensional
aggregates is choosing which aggregates to design and
deploy.
 Your aim is to strike the correct balance between the
performance gain provided by aggregate schemas and their cost
in terms of resource requirements.

Choosing Aggregates
 What Is a Potential Aggregate?
 Identifying Potentially Useful Aggregates
 Assessing the Value of Potential Aggregates

What Is a Potential Aggregate?
 Aggregate Fact Tables: A Question of Grain
 Aggregate Dimensions Must Conform
 Pre-Joined Aggregates Have Grain Too
 Enumerating Potential Aggregates

What Is a Potential Aggregate?
 Express potential aggregates as fact table grain statements
 Orders by day, salesperson and product
 Orders by day, customer, and product
 Orders by month, product, and salesperson

Enumerating Potential Aggregates

6*4*4*4*2*2 = 1536 → 1534 possible aggregates
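
The product on this slide can be checked directly; it evaluates to 1536. Reading each factor as the number of summarization levels available in one dimension is an assumption, since the diagram that defined the factors is not reproduced here.

```python
from math import prod

# Factors as given on the slide; interpreting each one as the number of
# levels in a dimension is an assumption.
levels_per_dimension = [6, 4, 4, 4, 2, 2]

combinations = prod(levels_per_dimension)
print(combinations)
```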

Identifying Potentially Useful Aggregates
 Drawing on Initial Design
 Design Decisions
 Listening to Users
 Where Subject Areas Meet
 The Conformance Bus
 Aggregates for Drilling Across
 Query Patterns of an Existing System
 Analyzing Reports for Potential Aggregates
 Choosing Which Reports to Analyze

Identifying Potentially Useful Aggregates
 Identify and document potential aggregates during schema
design, even if initial implementation will not include
aggregates. This information will be useful in the future.
 Any decision to set the grain of a fact table at a finer level
reveals a potential aggregate.
 Decisions about where to place groups of dimensional
attributes reveal potential levels of aggregation.
 Discussions of hierarchies or drill paths point to potential
aggregates.
 User work products reveal potential aggregates. These may
include reports from operational systems, manually compiled
briefings, or spreadsheets. They will also be revealed by manual
processes and requirements not currently met.

Aggregates for Drilling Across
 The process of combining information from multiple fact
tables is called drilling across.
 Consult the conformance bus to identify aggregates that will
be used in drill-across reports.
 The lowest common dimensionality between two fact
tables often suggests one or more aggregates.
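
Finding the lowest common dimensionality amounts to a set intersection over the conformance bus. The bus below is hypothetical; the fact table and dimension names are placeholders.

```python
# Hypothetical conformance bus: fact table -> conformed dimensions it uses.
CONFORMANCE_BUS = {
    "order_facts": {"day", "product", "customer", "salesperson"},
    "shipment_facts": {"day", "product", "customer", "warehouse"},
    "inventory_facts": {"day", "product", "warehouse"},
}


def common_dimensions(*fact_tables):
    """Dimensions shared by every listed fact table."""
    return set.intersection(*(CONFORMANCE_BUS[name] for name in fact_tables))


print(sorted(common_dimensions("order_facts", "shipment_facts")))
print(sorted(common_dimensions("order_facts", "inventory_facts")))
```

In this sketch, a report that drills across orders and inventory can only be grouped by day and product, so aggregates of both fact tables at that grain (or a summary of it) are natural candidates.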

Analyzing Reports for Potential Aggregates
 The detail rows require order facts by product and month.
 The summary rows require order facts by category and month.
 The grand total requires order facts by month.
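
The three grains are nested, so one aggregate at the finest of them could in principle feed the whole report. The sketch below shows the roll-up on hypothetical product-by-month rows; whether a single aggregate or several is the better choice depends on the sizing considerations discussed later.

```python
from collections import defaultdict

# Hypothetical aggregate rows at the finest grain the report needs:
# (month, category, product, order_dollars).
by_product_month = [
    ("2024-01", "Books", "Manual", 20),
    ("2024-01", "Hardware", "Gadget", 50),
    ("2024-01", "Hardware", "Widget", 100),
]

by_category_month = defaultdict(int)
by_month = defaultdict(int)
for month, category, _product, dollars in by_product_month:
    by_category_month[(month, category)] += dollars  # summary rows
    by_month[month] += dollars  # grand total

print(dict(by_category_month))
print(dict(by_month))
```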

Drilling
 Drill paths suggest aggregates.

Assessing the Value of Potential Aggregates
 After identifying a pool of potential aggregates, the next
step is to sort through them and determine which ones
to build.

Assessing the Value of Potential Aggregates
 Number of Aggregates
 Presence of an Aggregate Navigator
 Space Consumed by Aggregate Tables
 How Many Rows Are Summarized
 Examining the Number of Rows Summarized
 The Cardinality Trap and Sparsity
 Who Will Benefit from the Aggregate

Examining the Number of Rows Summarized
 A good starting rule of thumb is to identify aggregate fact
tables where each row summarizes an average of 20
rows.
 The savings afforded by aggregates can be lopsided,
favoring a particular attribute value.
 Remember that, like a base fact table, a dimensional
aggregate can be aggregated during a query. Aggregates
may be competing with other aggregates to offer
performance gains.
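
The average and the lopsidedness can both be read from a simple count. The data below is hypothetical and deliberately skewed: the average clears the 20-row rule of thumb, yet one of the two aggregate rows summarizes only 5 base rows.

```python
from collections import Counter

RULE_OF_THUMB = 20  # average base rows per aggregate row, from the slide

# Hypothetical base fact rows, reduced to the attributes the aggregate keeps.
base_rows = [("2024-01", "Hardware")] * 95 + [("2024-01", "Books")] * 5

# Each distinct combination becomes one aggregate row.
rows_per_aggregate_row = Counter(base_rows)

average = len(base_rows) / len(rows_per_aggregate_row)
below_threshold = [key for key, n in rows_per_aggregate_row.items() if n < RULE_OF_THUMB]

print(average)
print(dict(rows_per_aggregate_row))
print(below_threshold)
```

Queries restricted to the Books category gain far less from this aggregate than the average suggests.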

The Cardinality Trap and Sparsity
 Cardinality: the number of distinct values taken on by a
given attribute.
 Sparse: not all combinations of keys are present.
 Don’t assume aggregate fact tables will exhibit the same
sparsity as the tables they summarize.
 The higher the degree of summarization, the denser the
aggregate fact table will be.
 The best way to get an idea of the relative size of the
aggregate is to count the number of rows.
 As before, count the distinct combination of keys and/or
summarized dimension attributes.
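
The sketch below contrasts the two ways of sizing an aggregate: multiplying cardinalities (the trap) versus counting the distinct combinations that actually occur. The fact keys are hypothetical, and the month is derived from the day by slicing the date string.

```python
# Hypothetical base fact rows keyed by (day, product, customer).
fact_keys = [
    ("2024-01-05", "Widget", "C1"),
    ("2024-01-20", "Widget", "C2"),
    ("2024-01-07", "Gadget", "C1"),
    ("2024-02-03", "Widget", "C1"),
]


def density(rows):
    """Share of possible key combinations that actually occur in rows."""
    possible = 1
    for position in range(len(rows[0])):
        possible *= len({row[position] for row in rows})  # cardinality of one key
    return len(set(rows)) / possible


# Candidate aggregate grain: month and product.
month_product_keys = [(day[:7], product) for day, product, _customer in fact_keys]

base_density = density(fact_keys)
aggregate_density = density(month_product_keys)
aggregate_row_count = len(set(month_product_keys))

print(base_density, aggregate_density, aggregate_row_count)
```

Here the base table fills a quarter of its possible key combinations while the aggregate fills three quarters, so an estimate that reuses the base table's sparsity would understate the aggregate's size.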

Who Will Benefit from the Aggregate
 The first aggregates you add to your implementation are
those that offer benefits across the largest number of
user requirements. Aggregates that fall in the 20:1 range
of savings are compared with one another to identify
those that support the most common user requirements.
 Start by selecting aggregates that provide solid
performance boosts for a wide range of common
queries. To this, add more powerful (but more narrowly
used) aggregates as space permits. Use the relative
importance of one aggregate over another as a
tiebreaker.

Bibliography
 Christopher Adamson. Mastering Data Warehouse Aggregates:
Solutions for Star Schema Performance.
