Data Warehouse Aggregates

     Ing. Julio Ernesto Carreño Vargas
Using the Star Schema
   Queries against a star schema follow a consistent pattern. One or
    more facts are typically requested, along with the dimensional
    attributes that provide the desired context. The facts are
    summarized as appropriate, based on the dimensions.
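The query pattern described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the tables (order_facts, day_dim, product_dim), their columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);
INSERT INTO day_dim VALUES
  (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'), (3, '2024-02-03', '2024-02');
INSERT INTO product_dim VALUES
  (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 1, 100), (1, 3, 20), (2, 2, 50), (3, 1, 70), (3, 3, 10);
""")

# The fact is summed; the dimension attributes supply context and grouping.
rows = conn.execute("""
SELECT d.month, p.category, SUM(f.order_dollars)
FROM order_facts f
JOIN day_dim d ON d.day_key = f.day_key
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

Every query of this kind has the same shape: join the fact table to the dimensions it needs, group by the requested attributes, and aggregate the facts.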

Aggregate tables
   Aggregate tables improve data warehouse performance
    by reducing the number of rows the RDBMS must access
    when responding to a query
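As a rough illustration of the row reduction, the sketch below builds a small base fact table and a monthly summary of it, then compares row counts. All table names, column names, and data are hypothetical; for brevity the month is stored directly in the summary table, whereas an aggregate star would normally reference a month dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);
""")
# 60 days across 2 months, 5 products, one order row per day and product.
conn.executemany("INSERT INTO day_dim VALUES (?, ?)",
                 [(d, "M1" if d <= 30 else "M2") for d in range(1, 61)])
conn.executemany("INSERT INTO order_facts VALUES (?, ?, ?)",
                 [(d, p, 10.0) for d in range(1, 61) for p in range(1, 6)])

# Aggregate fact table: the same fact, summarized by month and product.
conn.execute("""
CREATE TABLE order_facts_by_month AS
SELECT d.month AS month, f.product_key AS product_key,
       SUM(f.order_dollars) AS order_dollars
FROM order_facts f JOIN day_dim d ON d.day_key = f.day_key
GROUP BY d.month, f.product_key
""")

base_rows = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_by_month").fetchone()[0]
print(base_rows, agg_rows)
```

A monthly query answered from the summary table reads far fewer rows than the same query against the base table.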




   [Figure: base schema and aggregate schema]


Aggregate Dimension Table




Aggregate Characteristics
   The more highly summarized an aggregate table is, the
    fewer queries it will be able to accelerate.
       This means that choosing aggregates involves making careful
        tradeoffs between the performance gain offered and the
        number of queries that will benefit.




The Aggregate Navigator
   To receive the performance benefit offered by an
    aggregate schema, a query must be written to use the
    aggregate.
   Aggregate navigator: a component of the data warehouse
    infrastructure that takes on the task of rewriting user
    queries to use aggregate tables.
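One way to picture the core decision a navigator makes is the toy sketch below. It is not how any particular product works: the catalog, table names, attribute sets, and row counts are all hypothetical, and a real navigator would also rewrite the SQL itself.

```python
# Hypothetical catalog: the dimensional attributes each table can answer, and its size.
TABLES = [
    {"name": "order_facts",
     "attributes": {"day", "month", "product", "category", "salesperson"},
     "rows": 1_000_000},
    {"name": "order_facts_by_month_product",
     "attributes": {"month", "product", "category"},
     "rows": 50_000},
    {"name": "order_facts_by_month_category",
     "attributes": {"month", "category"},
     "rows": 600},
]

def choose_table(requested):
    """Return the smallest table whose attributes cover the query's needs."""
    candidates = [t for t in TABLES if set(requested) <= t["attributes"]]
    if not candidates:
        raise ValueError(f"no table can answer {sorted(requested)}")
    return min(candidates, key=lambda t: t["rows"])["name"]

print(choose_table({"month", "category"}))
print(choose_table({"day", "category"}))
```

The sketch also shows the tradeoff from the previous slide: the most summarized table is the smallest, but it can serve only queries that need nothing beyond month and category.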




Principles of Aggregation
   An aggregate schema must always provide exactly the
    same results as the base schema.
   The attributes of each aggregate table must be a subset of
    those from a base schema table.
       The only exception to this rule is the surrogate key for an
        aggregate dimension table.
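Both principles can be checked on a small example. The sketch below assumes a hypothetical day dimension and order fact table, derives a month dimension whose only new column is its surrogate key, and confirms that the aggregate returns the same answer as the base schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE order_facts (day_key INTEGER, order_dollars REAL);
INSERT INTO day_dim VALUES
  (1, '2024-01-05', '2024-01'), (2, '2024-01-20', '2024-01'), (3, '2024-02-03', '2024-02');
INSERT INTO order_facts VALUES (1, 100), (2, 50), (2, 25), (3, 70);

-- Aggregate dimension: only attributes that exist in day_dim, plus its own surrogate key.
CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY, month TEXT);
INSERT INTO month_dim (month) SELECT DISTINCT month FROM day_dim ORDER BY month;

CREATE TABLE order_facts_by_month AS
SELECT m.month_key AS month_key, SUM(f.order_dollars) AS order_dollars
FROM order_facts f
JOIN day_dim d ON d.day_key = f.day_key
JOIN month_dim m ON m.month = d.month
GROUP BY m.month_key;
""")

base = conn.execute("""
SELECT d.month, SUM(f.order_dollars) FROM order_facts f
JOIN day_dim d ON d.day_key = f.day_key
GROUP BY d.month ORDER BY d.month
""").fetchall()
agg = conn.execute("""
SELECT m.month, SUM(a.order_dollars) FROM order_facts_by_month a
JOIN month_dim m ON m.month_key = a.month_key
GROUP BY m.month ORDER BY m.month
""").fetchall()
print(base == agg)
```

A comparison like this is a reasonable regression check to run whenever an aggregate is rebuilt.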




Summarization Techniques
   Aggregate Tables
   Pre-Joined Aggregates
   Derived Tables




Pre-Joined Aggregates
   A pre-joined aggregate summarizes a fact across a set of
    dimension values. Unlike an aggregate star schema, however,
    it places the results in a single table.
       By doing so, the pre-joined aggregate eliminates the need for
        the RDBMS to perform a join operation at query time.
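A minimal sketch of a pre-joined aggregate, again with hypothetical table names and data: the dimension attribute and the summarized fact are stored together, so the final query touches one table only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE order_facts (product_key INTEGER, order_dollars REAL);
INSERT INTO product_dim VALUES
  (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO order_facts VALUES (1, 100), (2, 50), (1, 70), (3, 10);

-- Pre-joined aggregate: dimension attribute and summarized fact in one table.
CREATE TABLE orders_by_category AS
SELECT p.category AS category, SUM(f.order_dollars) AS order_dollars
FROM order_facts f JOIN product_dim p ON p.product_key = f.product_key
GROUP BY p.category;
""")

# At query time no join is needed.
rows = conn.execute(
    "SELECT category, order_dollars FROM orders_by_category ORDER BY category"
).fetchall()
print(rows)
```

The join is paid once, when the table is built, rather than on every query.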




Derived Tables
   Derived tables alter the structure of the tables they summarize
    or change the scope of their content.
   Types:
       Merged fact table: combines facts from more than one fact
        table at a common grain.
       Pivoted fact table: transforms a set of metrics in a single
        row into multiple rows with a single metric each, or vice versa.
       Sliced fact table: contains a subset of the records of the
        base fact table, usually in coordination with a specific
        dimension attribute.
   In all three cases, the derived fact tables are not expected
    to serve as invisible stand-ins for the base schema.
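Of the three types, the pivoted fact table is perhaps the least obvious. The sketch below shows the transformation in both directions on plain Python dictionaries; the column names and values are hypothetical.

```python
# One row carrying several metrics (hypothetical column names).
wide = [{"month": "2024-01", "order_dollars": 150.0, "order_quantity": 12}]

def unpivot(rows, key, metrics):
    """Produce one output row per (key, metric) pair."""
    return [{key: r[key], "metric": m, "value": r[m]} for r in rows for m in metrics]

def pivot(rows, key):
    """Inverse: collect single-metric rows back into one row per key."""
    out = {}
    for r in rows:
        out.setdefault(r[key], {key: r[key]})[r["metric"]] = r["value"]
    return list(out.values())

tall = unpivot(wide, "month", ["order_dollars", "order_quantity"])
print(tall)
print(pivot(tall, "month") == wide)
```

Because the two shapes answer differently phrased queries, users pick the one they need explicitly, which fits the point that derived tables are not invisible stand-ins.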
Tables with New Facts
   Semi-additive facts may not be added together across a
    particular dimension; non-additive facts are never added
    together. In these situations, you may choose to aggregate
    by means other than summation.
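A common illustration of a semi-additive fact is an account balance, which can be summed across accounts but not across time. The sketch below uses hypothetical daily balance snapshots and summarizes over days with an average and a period-end value instead of a sum.

```python
from statistics import mean

# Hypothetical daily balance snapshots: (day, account, balance).
snapshots = [
    (1, "A", 100.0), (1, "B", 50.0),
    (2, "A", 120.0), (2, "B", 50.0),
    (3, "A", 120.0), (3, "B", 70.0),
]

# Balances are additive across accounts within a single day...
total_by_day = {}
for day, _account, balance in snapshots:
    total_by_day[day] = total_by_day.get(day, 0.0) + balance

# ...but not across days, so summarize over time by other means.
average_daily_total = mean(total_by_day.values())
period_end_total = total_by_day[max(total_by_day)]
print(average_daily_total, period_end_total)
```

An aggregate that stores the average or period-end balance holds a fact that does not exist in the base table, which is why such tables are treated separately.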




Choosing Aggregates
   One of the most vexing tasks in deploying dimensional
    aggregates is choosing which aggregates to design and
    deploy.
       Your aim is to strike the correct balance between the
        performance gain provided by aggregate schemas and their cost
        in terms of resource requirements.




Choosing Aggregates
   What Is a Potential Aggregate?
   Identifying Potentially Useful Aggregates
   Assessing the Value of Potential Aggregates




What Is a Potential Aggregate?
   Aggregate Fact Tables: A Question of Grain
   Aggregate Dimensions Must Conform
   Pre-Joined Aggregates Have Grain Too
   Enumerating Potential Aggregates




What Is a Potential Aggregate?
   Express potential aggregates as fact table grain statements
       Orders by day, salesperson, and product
       Orders by day, customer, and product
       Orders by month, product, and salesperson




Enumerating Potential Aggregates




        6*4*4*4*2*2 = 1536 combinations; 1534 possible aggregates
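The product can be checked directly. The sketch below assumes each factor is the number of summarization levels available for one dimension; the figure that named the dimensions is not reproduced here, so only the counts are used.

```python
from itertools import product
from math import prod

# Candidate summarization levels per dimension, taken from the factors on the slide.
levels_per_dimension = [6, 4, 4, 4, 2, 2]

combinations = prod(levels_per_dimension)

# Enumerating one level choice per dimension yields the same count.
enumerated = sum(1 for _ in product(*(range(n) for n in levels_per_dimension)))
print(combinations, enumerated)
```

The count grows multiplicatively with every dimension and level added, which is why candidates have to be filtered rather than built exhaustively.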



Identifying Potentially Useful Aggregates
   Drawing on Initial Design
       Design Decisions
       Listening to Users
   Where Subject Areas Meet
       The Conformance Bus
       Aggregates for Drilling Across
   Query Patterns of an Existing System
       Analyzing Reports for Potential Aggregates
       Choosing Which Reports to Analyze




Identifying Potentially Useful Aggregates
   Identify and document potential aggregates during schema
    design, even if initial implementation will not include
    aggregates. This information will be useful in the future.
   Any decision to set the grain of a fact table at a finer level
    reveals a potential aggregate.
   Decisions about where to place groups of dimensional
    attributes reveal potential levels of aggregation.
   Discussions of hierarchies or drill paths point to potential
    aggregates.
   User work products reveal potential aggregates. These may
    include reports from operational systems, manually compiled
    briefings, or spreadsheets. Potential aggregates are also revealed
    by manual processes and by requirements not currently met.

Aggregates for Drilling Across
   The process of combining information from multiple fact
    tables is called drilling across.
   Consult the conformance bus to identify aggregates that will
    be used in drill-across reports.
   The lowest common dimensionality between two fact tables
    often suggests one or more aggregates.
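Drilling across can be sketched as two steps: summarize each fact table to the dimensions both share, then merge the results. The fact rows below (orders with a salesperson, shipments with a warehouse) are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, product, other dimension, amount).
orders = [("2024-01", "Widget", "Ann", 100.0),
          ("2024-01", "Widget", "Bob", 50.0),
          ("2024-02", "Gadget", "Ann", 70.0)]
shipments = [("2024-01", "Widget", "East", 120.0),
             ("2024-02", "Gadget", "West", 70.0),
             ("2024-02", "Widget", "East", 30.0)]

def summarize(rows):
    """Summarize to the dimensions both fact tables share: month and product."""
    totals = defaultdict(float)
    for month, product, _other, amount in rows:
        totals[(month, product)] += amount
    return totals

order_totals, shipment_totals = summarize(orders), summarize(shipments)

# Merge the two result sets on the common dimensions (a full outer join).
report = {
    key: (order_totals.get(key, 0.0), shipment_totals.get(key, 0.0))
    for key in sorted(set(order_totals) | set(shipment_totals))
}
print(report)
```

The month-by-product grain used in `summarize` is the lowest common dimensionality of the two tables, so an aggregate at that grain on each side would serve this report directly.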




Analyzing Reports for Potential Aggregates
   The detail rows require order facts by product and month.
   The summary rows require order facts by category and month.
   The grand total requires order facts by month.

Drilling
   Drill paths suggest
    aggregates




Assessing the Value of Potential Aggregates
   After identifying a pool of potential aggregates, the next
    step is to sort through them and determine which ones
    to build.




Assessing the Value of Potential Aggregates
   Number of Aggregates
       Presence of an Aggregate Navigator
       Space Consumed by Aggregate Tables
   How Many Rows Are Summarized
       Examining the Number of Rows Summarized
       The Cardinality Trap and Sparsity
   Who Will Benefit from the Aggregate




Examining the Number of Rows Summarized
   A good starting rule of thumb is to identify aggregate fact
    tables where each row summarizes an average of 20
    rows.
   The savings afforded by aggregates can be lopsided,
    favoring a particular attribute value.
   Remember that, like a base fact table, a dimensional
    aggregate can be aggregated during a query. Aggregates
    may be competing with other aggregates to offer
    performance gains.
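The average and the lopsidedness can both be read off a simple count. In the sketch below the base rows are hypothetical and are represented only by the month and product they roll up to.

```python
from collections import Counter

# Hypothetical base fact rows, identified by the (month, product) they roll up to.
base_rows = ([("2024-01", "Widget")] * 45
             + [("2024-01", "Gadget")] * 3
             + [("2024-02", "Widget")] * 12)

# Number of base rows behind each aggregate row.
rows_per_aggregate_row = Counter(base_rows)
average_summarized = len(base_rows) / len(rows_per_aggregate_row)
print(average_summarized)
print(rows_per_aggregate_row.most_common())
```

Here the average meets the 20-row rule of thumb, yet one aggregate row summarizes 45 base rows and another only 3, so queries on some attribute values gain much more than others.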




The Cardinality Trap and Sparsity
   Cardinality: the number of distinct values taken on by a
    given attribute.
   Sparse: not all combinations of keys are present.
   Don’t assume aggregate fact tables will exhibit the same
    sparsity as the tables they summarize.
       The higher the degree of summarization, the denser the
        aggregate fact table will be.
   The best way to get an idea of the relative size of the
    aggregate is to count the number of rows.
       As before, count the distinct combinations of keys and/or
        summarized dimension attributes.
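The difference between multiplying cardinalities and counting distinct combinations can be seen on a tiny example. The fact rows and the `density` helper below are hypothetical.

```python
# Hypothetical base fact keys: (day, month, product, category).
facts = [
    (1, "M1", "Widget", "Hardware"), (2, "M1", "Gadget", "Hardware"),
    (3, "M1", "Manual", "Books"),    (31, "M2", "Widget", "Hardware"),
    (32, "M2", "Manual", "Books"),   (33, "M2", "Gadget", "Hardware"),
]

def density(rows, columns):
    """Return (distinct combinations, product of cardinalities, their ratio)."""
    keys = {tuple(r[c] for c in columns) for r in rows}
    possible = 1
    for c in columns:
        possible *= len({r[c] for r in rows})
    return len(keys), possible, len(keys) / possible

base = density(facts, (0, 2))        # day x product
aggregate = density(facts, (1, 3))   # month x category
print(base, aggregate)
```

The base table fills only a third of its possible day and product combinations, while the month and category aggregate is fully dense. Scaling the aggregate's cardinality product by the base table's sparsity would therefore underestimate its size, which is why counting the distinct combinations is the safer estimate.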


Who Will Benefit from the Aggregate
   The first aggregates you add to your implementation are
    those that offer benefits across the widest range of
    user requirements. Aggregates that fall in the 20:1 range
    of savings are compared with one another to identify
    those that support the most common user requirements.
   Start by selecting aggregates that provide solid
    performance boosts for a wide range of common
    queries. To these, add more powerful (but more narrowly
    used) aggregates as space permits. Use the relative
    importance of one aggregate over another as a tiebreaker.
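The selection order described above could be approximated by a simple greedy pass. This is only one possible reading, not a prescribed algorithm: the candidate list, the row budget used as a stand-in for space, and the `choose` function are all hypothetical.

```python
# Hypothetical candidates: (name, rows as a space proxy, savings ratio, common queries served).
candidates = [
    ("by_month_product", 50_000, 20, 14),
    ("by_month_category", 600, 1_600, 5),
    ("by_quarter_region", 40, 25_000, 2),
    ("by_day_category", 400_000, 3, 9),
]

def choose(candidates, row_budget, min_savings=20):
    """Pick widely used aggregates first, then narrower ones while space permits."""
    useful = [c for c in candidates if c[2] >= min_savings]
    useful.sort(key=lambda c: c[3], reverse=True)
    chosen, used = [], 0
    for name, rows, _savings, _queries in useful:
        if used + rows <= row_budget:
            chosen.append(name)
            used += rows
    return chosen

print(choose(candidates, row_budget=50_100))
```

Candidates below the savings threshold are dropped first, the rest are ranked by how many common queries they serve, and the space budget decides how far down the list to go.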

Bibliography
   Christopher Adamson. Mastering Data Warehouse Aggregates:
    Solutions for Star Schema Performance.




