Statistics on Partitioned Objects Doug Burns
Introduction
Agenda: Introduction; Simple Fundamentals; Statistics on Partitioned Objects; The Quality/Performance Trade-off; Aggregation Scenarios; Alternative Strategies; Incremental Statistics; Conclusions and References
12/03/2011
Introduction: Who am I? Why am I talking? Setting Expectations

Who am I?
  - Possibly a question some of us will be asking ourselves at 8:30 am tomorrow after tonight's party
  - I am Doug. Doug I am. Actually I am Douglas ... or, if you're Scottish, Dougie or Doogie
  - I'm not from round here. You will probably have noticed that already
  - See Twitter @doug_conference for lots of whining about my 21 hour journey

A Bitter Old Drunk Man
A Pioneer
A Sports Fan
A Family Man
A Performance Guy: 1986, Zilog Z80A (3.5MHz), 32KB usable RAM. Yes, Cary, we used profiles!

Why am I talking?
  - Partitioned objects are a given when working with large databases
  - Maintaining statistics on partitioned objects is one of the primary challenges of the DW designer/developer/DBA
  - There are many options that vary between versions, but the fundamental challenges are the same
  - Trade-off between statistics quality and collection effort
  - People keep getting it wrong!

Setting Expectations
  - What I will and won't include: No Histograms, No Sampling Sizes, No Indexes, No Detail
  - Level of depth – paper
  - WeDoNotUseDemos
  - A lot to get through!
  - Questions
Simple Fundamentals

Cost-Based Optimiser
  - The CBO evaluates potential execution plans using rules and formulae embedded in the code
  - Some control through
    - Configuration parameters
    - Hints
    - Statistics
      - Describing the content of data objects (Object Statistics), e.g. Tables, Indexes, Clusters
      - Describing system characteristics (System Statistics)

Statistics Quality
  - The CBO uses statistics to estimate row source cardinalities: how many rows do we expect a specific operation to return?
  - Cardinality is the primary driver in selecting the best operations to perform and their order
  - Inaccurate or missing statistics are the most common cause of sub-optimal execution plans
  - Hard work on designing and implementing appropriate statistics maintenance will pay off across the system
Statistics on Partitioned Objects

Statistics at all levels
  - Global
    - Describe the entire table or index and all of its underlying partitions and subpartitions as a whole
    - Important – GLOBAL_STATS=YES/NO
  - Partition
    - Describe individual partitions and potentially the underlying subpartitions as a whole
    - Important – GLOBAL_STATS=YES/NO
  - Subpartition
    - Describe individual subpartitions
    - Implicitly, GLOBAL_STATS=YES
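To see these three levels and their GLOBAL_STATS flags side by side, a query along the following lines can help. It is only a sketch: it assumes the demo table TEST_TAB1 named elsewhere in the slides and that you are connected as its owner, and it relies on the standard USER_TAB_STATISTICS dictionary view.

```sql
-- Sketch: list the statistics stored at table, partition and subpartition level.
-- TEST_TAB1 is the demo table named on the slides; run as the table owner.
SELECT object_type,          -- TABLE, PARTITION or SUBPARTITION
       partition_name,
       subpartition_name,
       num_rows,
       global_stats,         -- YES = gathered at this level, NO = aggregated from below
       last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'TEST_TAB1'
ORDER  BY partition_position NULLS FIRST,
          subpartition_position NULLS FIRST;
```

The ordering puts the table row first, then each partition followed by its subpartitions, which mirrors the tree diagrams used in the slides.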
How Statistics Levels are used
  - If a statement accesses multiple partitions, the CBO will use Global Statistics
  - If a statement is able to limit access to a single partition, then the partition statistics can be used
  - If a statement accesses a single subpartition, then subpartition statistics can be used. However, prior to 10.2.0.4, subpartition statistics are rarely used
  - For most applications you will need both Global and Partition stats for the CBO to operate effectively

The Quality/Performance Trade-off

Collecting Global Statistics
  - Data loaded for Moscow / 20110202
  - Potentially Stale Statistics

GRANULARITY Parameter
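(The slides omit ownname and tabname from these calls; TESTUSER and TEST_TAB1, as used on the later copy_table_stats slides, are filled in here.)

GRANULARITY => SUBPARTITION
    dbms_stats.gather_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        granularity => 'SUBPARTITION',
        partname    => 'P_20110202_MOSCOW');

GRANULARITY => ALL
    dbms_stats.gather_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        granularity => 'ALL');

GRANULARITY => GLOBAL
    dbms_stats.gather_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        granularity => 'GLOBAL');

GRANULARITY => DEFAULT
(DEFAULT is the older name for GLOBAL AND PARTITION, so the two calls below do the same work.)
    dbms_stats.gather_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        granularity => 'DEFAULT',
        partname    => 'P_20110202_MOSCOW');
    dbms_stats.gather_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        granularity => 'GLOBAL AND PARTITION',
        partname    => 'P_20110202_MOSCOW');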
Aggregated Global Statistics
  - To address the high cost of collecting Global Stats, Oracle provides another option – Aggregated or Approximate Global Stats
  - Only gather stats on the lower levels of the object
    - Partition on partitioned tables
    - Subpartition on composite-partitioned tables
  - DBMS_STATS will aggregate the underlying statistics to generate approximate global statistics at higher levels
  - Important – GLOBAL_STATS=NO
Aggregated Row Counts (GRANULARITY => 'SUBPARTITION')
8 rows inserted for Moscow 20110202, stats not yet regathered:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 11
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS = 8
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 5
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

Aggregated Row Counts
Stats gathered on subpartition (old value -> new value):
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 11 -> 19
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS = 8 -> 16
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 5
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3 -> 11
Aggregated High/Low and NDVs
NDV = Number of Distinct Values in STATUS; H/L = Highest and Lowest
    TEST_TAB1          STATUS NDV = 1   STATUS H/L = P/P
      P_20110201       STATUS NDV = 1   STATUS H/L = P/P
        MOSCOW         STATUS NDV = 1   STATUS H/L = P/P
      P_20110202       STATUS NDV = 1   STATUS H/L = P/P
        LONDON         STATUS NDV = 1   STATUS H/L = P/P
        MOSCOW         STATUS NDV = 1   STATUS H/L = P/P

Aggregated High/Low and NDVs
New STATUS=U appeared (old value -> new value):
    TEST_TAB1          STATUS NDV = 1 -> 4   STATUS H/L = P/P -> P/U
      P_20110201       STATUS NDV = 1        STATUS H/L = P/P
        MOSCOW         STATUS NDV = 1        STATUS H/L = P/P
      P_20110202       STATUS NDV = 1 -> 3   STATUS H/L = P/P -> P/U
        LONDON         STATUS NDV = 1        STATUS H/L = P/P
        MOSCOW         STATUS NDV = 1 -> 2   STATUS H/L = P/P -> P/U
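One way to see how far the aggregated NDVs have drifted is to put the stored figures next to a real distinct count. The following is a sketch only, assuming the TEST_TAB1 table and STATUS column from the slides and a connection as the table owner. On the slide the table-level NDV rises to 4 even though only the values P and U are mentioned.

```sql
-- NDVs as currently stored in the dictionary, at each level.
SELECT 'TABLE' AS stats_level,
       CAST(NULL AS VARCHAR2(128)) AS part_name,
       num_distinct
FROM   user_tab_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS'
UNION ALL
SELECT 'PARTITION', partition_name, num_distinct
FROM   user_part_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS'
UNION ALL
SELECT 'SUBPARTITION', subpartition_name, num_distinct
FROM   user_subpart_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS';

-- The real figure. This is the expensive full scan that aggregation avoids,
-- so run it as an occasional check rather than as part of every load.
SELECT COUNT(DISTINCT status) AS actual_ndv
FROM   test_tab1;
```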
Quality/Performance Trade-off
You have a choice:
  - Gather True Global Stats
    - More accurate NDVs
    - Requires high-cost full table scan (which will get progressively slower and more expensive as tables grow)
    - Maybe an occasional activity?
  - Gather True Partition Stats and Aggregated Global Stats
    - Accurate row counts and column High/Low values
    - Wildly inaccurate NDVs
    - Requires low-cost partition scan activity plus aggregation
Aggregation Scenarios
  - Take care if you decide to use Aggregated Global Stats
  - Several implicit rules govern the aggregation process
  - I have seen every issue I'm about to describe
    - In the past 18 months
    - Working on systems with people who are usually pretty smart

Missing Subpartition Stats (Scenario 1)
  - Aggregated Global Stats at Table-level
  - Subpartition Stats gathered at subpartition-level as part of new subpartition load process
  - Emergency hits when someone tries to INSERT data for which there is no valid subpartition
  - Solution – quickly add a new partition and gather stats on new subpartition
Missing Subpartition Stats
Starting point:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 11
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 11
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 11

Missing Subpartition Stats
What will number of rows be?
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS IS ?
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 11
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 11
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS IS ?
        LONDON         GLOBAL_STATS=NO   NUM_ROWS = NULL   (new subpartition with no stats yet)
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3      (new data inserted and stats gathered)

Missing Subpartition Stats
Aggregated global stats invalidated:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS IS NULL
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 11
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 11
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS IS NULL  (no partition stats as not all subpartitions have stats)
        LONDON         GLOBAL_STATS=NO   NUM_ROWS = NULL
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

Missing Subpartition Stats
Gathering stats on all subpartitions updates aggregated stats on the partition and fixes aggregated global stats:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS IS 14
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 11
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 11
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS IS 3
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 0
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
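The scenario above suggests a simple safety net: after any emergency partition maintenance, look for subpartitions that have never been analysed and gather on them. The block below is a minimal sketch of that idea, not the author's own tooling. It assumes the TEST_TAB1 demo table and a connection as its owner.

```sql
-- Sketch: gather subpartition stats wherever none exist yet, so that
-- aggregation up to partition and table level becomes possible again.
BEGIN
  FOR r IN (SELECT subpartition_name
            FROM   user_tab_subpartitions
            WHERE  table_name = 'TEST_TAB1'
            AND    last_analyzed IS NULL)
  LOOP
    dbms_stats.gather_table_stats(
      ownname     => USER,
      tabname     => 'TEST_TAB1',
      partname    => r.subpartition_name,
      granularity => 'SUBPARTITION');
  END LOOP;
END;
/
```

Empty subpartitions are included on purpose: in the slide it is the LONDON subpartition with NUM_ROWS = 0 whose stats allow aggregation to resume.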
Incorrectly Gathered Global Stats (Scenario 2)
  - Aggregated Global Stats at Table-level
  - Partition Stats gathered at Partition-level as part of new partition load process
  - Performance of several queries is horrible and poor NDVs at the Table-level are identified as root cause
  - Solution – Gather Global Stats quickly!
Incorrectly Gathered Global Stats
Starting point:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

Incorrectly Gathered Global Stats
Global Stats gathered:
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

Incorrectly Gathered Global Stats
New partition & subpartitions with stats gathered. What will new number of rows be?
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = ?
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS = 8
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 5
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

Incorrectly Gathered Global Stats
The table-level figure is unchanged, because gathered Global Stats are not replaced by aggregated results:
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS = 8
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 5
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
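A rough way to spot tables caught in this state is to look for gathered table-level stats that are older than the newest partition or subpartition stats beneath them. This heuristic is an assumption added here for illustration and does not come from the slides; it only flags candidates for a closer look.

```sql
-- Sketch: partitioned tables whose table-level stats were gathered
-- (GLOBAL_STATS = 'YES') but are older than the newest stats on any
-- of their partitions or subpartitions.
SELECT t.table_name,
       t.num_rows            AS table_num_rows,
       t.last_analyzed       AS table_last_analyzed,
       MAX(p.last_analyzed)  AS newest_lower_level_analyzed
FROM   user_tab_statistics t
JOIN   user_tab_statistics p
       ON  p.table_name  = t.table_name
       AND p.object_type IN ('PARTITION', 'SUBPARTITION')
WHERE  t.object_type  = 'TABLE'
AND    t.global_stats = 'YES'
GROUP  BY t.table_name, t.num_rows, t.last_analyzed
HAVING MAX(p.last_analyzed) > t.last_analyzed;
```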
Partition Exchange Issues (Scenario 3)
  - Aggregated Global Stats at Table-level
  - Statistics are gathered on temporary Load Table
  - Load Table is exchanged with partition of target table
  - Objective is to minimise activity on target table and ensure that stats are available on partition immediately on exchange
Gather-then-Exchange
Temporary Load Table with stats:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
    LOAD_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 10

Gather-then-Exchange
New Partition & Subpartition without stats:
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS IS NULL
        LONDON         GLOBAL_STATS=NO   NUM_ROWS IS NULL
    LOAD_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 10

Gather-then-Exchange
Data and stats appear at partition exchange. All subpartitions have stats, so what happened to Global Stats?
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = ?
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS = ?
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 10
    LOAD_TAB1          GLOBAL_STATS=NO   NUM_ROWS IS NULL

Gather-then-Exchange
No statistics aggregation!
    TEST_TAB1          GLOBAL_STATS=NO   NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=NO   NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=NO   NUM_ROWS IS NULL
        LONDON         GLOBAL_STATS=YES  NUM_ROWS = 10

_minimal_stats_aggregation
  - Hidden parameter used to minimise the impact of the statistics aggregation process
  - Default is TRUE, which means minimise aggregation
  - Partition exchange will not trigger the aggregation process!
  - Solutions
    - Change hidden parameter – speak to Support
    - Exchange-then-Gather (another good reason for this later)
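Exchange-then-Gather simply reverses the order of the two steps, so that the gather runs against the target table and can drive aggregation. A minimal sketch follows, assuming the LOAD_TAB1 and TEST_TAB1 tables from the slides. The subpartition name P_20110202_LONDON is a guess based on the P_20110202_MOSCOW naming pattern and should be replaced with the real name.

```sql
-- Step 1: swap the loaded data into the target subpartition.
-- P_20110202_LONDON is an assumed name (see lead-in).
ALTER TABLE test_tab1
  EXCHANGE SUBPARTITION p_20110202_london WITH TABLE load_tab1;

-- Step 2: gather on the subpartition in place, so that DBMS_STATS can
-- aggregate up to partition and table level as part of the gather.
BEGIN
  dbms_stats.gather_table_stats(
    ownname     => 'TESTUSER',
    tabname     => 'TEST_TAB1',
    partname    => 'P_20110202_LONDON',
    granularity => 'SUBPARTITION');
END;
/
```

The trade-off is that the new subpartition is briefly visible without statistics between the two steps, which is what gather-then-exchange was trying to avoid.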
Aggregated Stats – Summary
  - Wildly inaccurate NDVs which will impact Execution Plans
  - Take care with the aggregation process
  - Do not use aggregated statistics unless you really don't have time to gather true Global Stats
  - But the problem is, what if your table is so damn big that you can never manage to update those Global Stats?
Alternative Strategies

Dynamic Sampling
  - If stats collection is such a nightmare, perhaps we shouldn't bother gathering stats at all?
  - Dynamic Sampling could be used
    - Gather no stats manually
    - When statements are parsed, Oracle will execute queries against objects to generate temporary stats on-the-fly
  - I would not recommend this as a system-wide strategy
    - What happened when stats were missing in earlier examples!
    - Recurring overhead for every query
    - Either expensive or low quality stats
Setting Statistics
  - Gathering stats takes time and resources
  - The resulting stats describe your data to help the CBO determine optimal execution plans
  - If you know your data well enough to know the appropriate stats, why not just set them manually and avoid the collection overhead?
  - Plenty of appropriate DBMS_STATS procedures
  - Not a new idea and discussed in several places on the net (including the Jonathan Lewis chapter in the latest Oak Table book)

Setting Statistics - Summary
  - Positives
    - Very fast and low resource method for setting statistics on new partitions
    - Potential improvements to plan stability when accessing time-period partitions that are filled over time
  - Negatives
    - You need to know your data well, particularly any time periodicity
    - You need to develop your own code implementation
    - You could undermine the CBO's ability to use more appropriate execution plans as data changes over time
    - Does not eliminate the difficulty in maintaining accurate Global Statistics, although these could be set manually too
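As an illustration of what setting statistics by hand can look like, the sketch below uses DBMS_STATS.SET_TABLE_STATS and SET_COLUMN_STATS on one partition. Every numeric value is a made-up placeholder, not a recommendation; the table, partition and column names are the ones used on the slides.

```sql
-- Sketch: set partition-level statistics manually instead of gathering.
-- All numbers below are hypothetical and must come from your own
-- knowledge of the data.
BEGIN
  dbms_stats.set_table_stats(
    ownname  => 'TESTUSER',
    tabname  => 'TEST_TAB1',
    partname => 'P_20110202',
    numrows  => 8,      -- expected rows in the partition
    numblks  => 1,      -- expected blocks
    avgrlen  => 20);    -- average row length in bytes

  dbms_stats.set_column_stats(
    ownname  => 'TESTUSER',
    tabname  => 'TEST_TAB1',
    colname  => 'STATUS',
    partname => 'P_20110202',
    distcnt  => 2,      -- number of distinct values
    nullcnt  => 0,      -- number of NULLs
    avgclen  => 2);     -- average column length in bytes
END;
/
```

Setting column low and high values as well requires building a STATREC with DBMS_STATS.PREPARE_COLUMN_VALUES, which is part of the "develop your own code" cost noted above.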
Copying Statistics
  - Extending the concept of setting statistics manually
  - Instead of trying to work out what the appropriate statistics are for a new partition, copy the statistics from another partition
    - The previous partition – increasing volumes?
    - A golden template partition – plan stability?
    - A prior partition to reflect the periodicity of your data: the second Tuesday from last month, Tuesday from last week, the 8th of last month
  - Supported from 10.2.0.4
Copying Statistics
Before the copy:
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=YES  NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3

    -- Copy partition-level stats to the new partition
    dbms_stats.copy_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        srcpartname => 'P_20110201',
        dstpartname => 'P_20110202');
    -- Copy subpartition-level stats to the new subpartition
    dbms_stats.copy_table_stats(
        ownname     => 'TESTUSER',
        tabname     => 'TEST_TAB1',
        srcpartname => 'P_20110201_MOSCOW',
        dstpartname => 'P_20110202_MOSCOW');

Copy Statistics
After the copy:
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=YES  NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       GLOBAL_STATS=YES  NUM_ROWS = 3
        MOSCOW         GLOBAL_STATS=YES  NUM_ROWS = 3
Copying Statistics – Bug 1
  - The previous example doesn't work on an unpatched 10.2.0.4
  - It fails when copying stats between partitions on a composite partitioned object (one with subpartitions)

    SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110202');
    BEGIN dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110202'); END;

    *
    ERROR at line 1:
    ORA-06533: Subscript beyond count
    ORA-06512: at "SYS.DBMS_STATS", line 17408
    ORA-06512: at line 1

Copying Statistics – Bug 1
  - Bug number 8318020
  - Merge Label Request 8866627: fixes a variety of stats-related bugs
  - Patchset 10.2.0.5
  - Upgrade to 11.2.0.2
Copying Statistics – Bug 2
Before the copy:
    TEST_TAB1          REPORTING_DATE High/Low = 20110201
      P_20110201       REPORTING_DATE High/Low = 20110201
      P_20110202       (no stats)

Copying Statistics – Bug 2
After the copy, the new partition carries the source partition's key value:
    TEST_TAB1          REPORTING_DATE High/Low = 20110201
      P_20110201       REPORTING_DATE High/Low = 20110201
      P_20110202       REPORTING_DATE High/Low = 20110201

Copying Statistics – Bug 2
  - We might reasonably expect Oracle to understand the implicit High/Low values of a partition key
  - Merge Label Request 8866627
  - Patchset 10.2.0.5
  - Upgrade to 11.2
  - The wider issue here is that NDVs and High/Low values (other than for Partition Key columns) will simply be copied
  - Are you sure that's what you want?
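After a copy it is worth checking what actually landed on the destination partition. The query below is a sketch that assumes the TEST_TAB1 table and REPORTING_DATE column from the slides and a connection as the table owner.

```sql
-- Sketch: compare column stats on the source and destination partitions.
-- LOW_VALUE and HIGH_VALUE are stored as RAW in Oracle's internal format,
-- so identical raw values on both partitions mean the partition key range
-- was copied rather than adjusted.
SELECT partition_name,
       column_name,
       num_distinct,
       low_value,
       high_value
FROM   user_part_col_statistics
WHERE  table_name  = 'TEST_TAB1'
AND    column_name = 'REPORTING_DATE'
ORDER  BY partition_name;
```

Dropping the column_name filter shows the same comparison for every column, which helps with the wider question of whether copied NDVs and High/Low values are what you want.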
Copying Statistics – Bug 3
    TEST_TAB1          GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110201       GLOBAL_STATS=YES  NUM_ROWS = 3
        OTHERS         GLOBAL_STATS=YES  NUM_ROWS = 3
      P_20110202       (no stats)
        OTHERS         (no stats)

Copying Statistics – Bug 3
  - ORA-03113 / 07445 while copying list partition statistics
  - Core dump in qospMinMaxPartCol
  - I initially thought this was because the OTHERS subpartition was the last one I copied stats for
  - It is because it is a DEFAULT list subpartition
  - Bug number 10268597
  - Still in 10.2.0.5 and 11.2.0.2
  - Marked as fixed in 11.2.0.3 and 12.1.0.0

Copying Statistics - Summary
  - Positives
    - Very fast and low resource method for setting statistics on new partitions
    - Potential improvements to plan stability when accessing time-period partitions that are filled over time
  - Negatives
    - Bugs and related patches, although the situation is better on 10.2.0.5 or 11.2
    - Does not eliminate the difficulty in maintaining accurate Global Statistics
    - Does not work well with composite partitioned tables
    - Does not work in current releases with List Partitioning where there is a DEFAULT partition
APPROX_GLOBAL AND PARTITION
  - New 10.2 GRANULARITY option as an alternative to GLOBAL AND PARTITION
  - Uses the aggregation process, but can replace gathered global statistics
  - If the aggregation process is unavailable, e.g. because there are missing partition statistics, it falls back to GLOBAL AND PARTITION
  - All the same NDV issues as with aggregated stats, so you should combine it with an occasional Global Stats gather process
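In use, the option is just another GRANULARITY value on the normal gather call. The sketch below uses the owner, table and partition names from the slides, and the choice of partition is an arbitrary example.

```sql
-- Sketch: gather real stats on one partition and let DBMS_STATS rebuild
-- the table-level stats by aggregation, replacing any previously
-- gathered global stats.
BEGIN
  dbms_stats.gather_table_stats(
    ownname     => 'TESTUSER',
    tabname     => 'TEST_TAB1',
    partname    => 'P_20110202',
    granularity => 'APPROX_GLOBAL AND PARTITION');
END;
/
```

Because of the fallback described above, it is worth timing this call: a sudden jump in elapsed time may mean it has quietly become a full GLOBAL AND PARTITION gather.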
Incremental Statistics

Incremental Statistics
  - What's the problem with the process for aggregating NDVs?
    - Oracle knows the number of distinct values in the other partitions but not what those values were
    - This might seem counter-intuitive. Oracle must have known what the values were when stats were gathered, but they are not stored anywhere
    - Aggregation is a destructive process
  - Incremental Statistics feature tracks the distinct values, stored as synopses
  - Stored in WRI$_OPTSTAT_SYNOPSIS_HEAD$ and WRI$_OPTSTAT_SYNOPSIS$

Incremental Statistics Prerequisites
  - INCREMENTAL setting for the partitioned table is TRUE
    - Set using DBMS_STATS.SET_TABLE_PREFS
  - PUBLISH setting for the partitioned table is TRUE
    - Which is the default setting anyway
  - The user specifies (both defaults)
    - ESTIMATE_PERCENT => AUTO_SAMPLE_SIZE
    - GRANULARITY => 'AUTO'
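Put together, the prerequisites amount to one preference change followed by a gather with default settings. This is a minimal sketch using the owner and table names from the slides; it applies to 11g and later, where SET_TABLE_PREFS is available.

```sql
-- Sketch: switch the table to incremental statistics and gather with the
-- defaults that the feature requires.
BEGIN
  dbms_stats.set_table_prefs(
    ownname => 'TESTUSER',
    tabname => 'TEST_TAB1',
    pname   => 'INCREMENTAL',
    pvalue  => 'TRUE');

  dbms_stats.gather_table_stats(
    ownname          => 'TESTUSER',
    tabname          => 'TEST_TAB1',
    estimate_percent => dbms_stats.auto_sample_size,
    granularity      => 'AUTO');
END;
/

-- Confirm the preferences now in force for the table.
SELECT dbms_stats.get_prefs('INCREMENTAL', 'TESTUSER', 'TEST_TAB1') AS incremental,
       dbms_stats.get_prefs('PUBLISH',     'TESTUSER', 'TEST_TAB1') AS publish
FROM   dual;
```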
New Process
  - Gather initial statistics using the default settings
  - Oracle will gather statistics at all appropriate levels using one-pass distinct sampling and store initial synopses
  - As partitions are added or stats become stale, keep gathering using AUTO granularity and Oracle will
    - Gather missing or stale partition stats
    - Update synopses for those partitions
    - Merge the synopses with synopses for higher levels of the same object, maintaining all Global Stats along the way
  - Intelligent and accurate aggregation process

Other Resources
  - Amit Poddar's excellent paper and presentation from an earlier Hotsos Symposium
  - Robin Moffatt's blog post
    - Synopses can take a lot of space in SYSAUX
    - Aggregation seems hopelessly slow in older releases, probably because WRI$_OPTSTAT_SYNOPSIS$ is not partitioned (it is in 11.2.0.2)
  - Incremental Stats looks like the solution to our problems
    - If you have the time to gather using defaults
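To keep an eye on how much space the synopses are taking, the size of the underlying segments can be checked directly. This sketch assumes access to DBA_SEGMENTS; the LIKE pattern is deliberately loose so that it also picks up the HEAD$ table and any partitions or indexes that share the name prefix.

```sql
-- Sketch: space used by the synopsis tables in SYSAUX.
SELECT segment_name,
       segment_type,
       ROUND(SUM(bytes) / 1024 / 1024) AS size_mb
FROM   dba_segments
WHERE  owner = 'SYS'
AND    segment_name LIKE 'WRI$_OPTSTAT_SYNOPSIS%'
GROUP  BY segment_name, segment_type
ORDER  BY size_mb DESC;
```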
Conclusions and References

Issues
  - Aggregated NDVs are very low quality
  - DBMS_STATS will only update aggregated stats when stats have been gathered appropriately on all underlying structures
  - DBMS_STATS will never overwrite properly gathered Global Stats with aggregated results
    - Unless you use 'APPROX_GLOBAL AND PARTITION'
  - APPROX_GLOBAL stats otherwise suffer from the same problems as any other aggregated stats
  - If aggregation fails because of missing partition stats, you will suddenly be using GLOBAL AND PARTITION

Issues
  - Dynamic Sampling is almost certainly not the answer to your problems
  - The default setting of _minimal_stats_aggregation implies that you should normally use exchange-then-gather
  - If you are using Incremental Stats you must use exchange-then-gather anyway

Suggestions
  - Try the Oracle default options first, particularly on 11.2 and up
  - If you do not have time to gather using the default granularity, gather the best statistics you can as data is loaded and gather proper global statistics later
  - DBMS_STATS is constantly evolving, so you should try to be on the latest patchsets with all relevant one-off patches applied
  - Checking stats means checking all levels, including the GLOBAL_STATS column, NUM_DISTINCT and High/Low Values

Suggestions
  - Design a strategy
  - Develop any surrounding code
  - Stick to the strategy
  - Always gather stats using the wrapper code
  - Lock and unlock stats programmatically to prevent human errors ruining the strategy
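A wrapper of the kind suggested above might look roughly like this. It is a hypothetical sketch: the procedure name gather_subpart_stats and its parameters are invented for illustration, and a real wrapper would also carry whatever granularity and logging rules your strategy calls for.

```sql
-- Sketch: the one sanctioned way to gather stats on a subpartition.
-- Stats stay locked outside this procedure, so an ad hoc gather by a
-- well-meaning colleague fails instead of silently changing the strategy.
CREATE OR REPLACE PROCEDURE gather_subpart_stats (
  p_ownname  IN VARCHAR2,
  p_tabname  IN VARCHAR2,
  p_partname IN VARCHAR2
) AS
BEGIN
  dbms_stats.unlock_table_stats(ownname => p_ownname, tabname => p_tabname);

  dbms_stats.gather_table_stats(
    ownname     => p_ownname,
    tabname     => p_tabname,
    partname    => p_partname,
    granularity => 'SUBPARTITION');

  dbms_stats.lock_table_stats(ownname => p_ownname, tabname => p_tabname);
EXCEPTION
  WHEN OTHERS THEN
    -- Re-lock before re-raising so a failed gather does not leave stats unlocked.
    dbms_stats.lock_table_stats(ownname => p_ownname, tabname => p_tabname);
    RAISE;
END gather_subpart_stats;
/
```

It could then be called from the load process as, for example, exec gather_subpart_stats('TESTUSER', 'TEST_TAB1', 'P_20110202_MOSCOW').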
Additional References
  - Optimiser Development Group blog
  - Greg Rahn's blog
  - Amit Poddar's paper
  - Jonathan Lewis chapter in latest Oak Table book
  - Lots of others in references section of paper
Statistics on Partitioned Objects Doug Burns dougburns@yahoo.com http://oracledoug.com/stats.docx

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Statistics on Partitioned Objects Guide

  • 1. Statistics on Partitioned Objects. Doug Burns
  • 2. Introduction. Agenda: Introduction; Simple Fundamentals; Statistics on Partitioned Objects; The Quality/Performance Trade-off; Aggregation Scenarios; Alternative Strategies; Incremental Statistics; Conclusions and References. 12/03/2011
  • 3. Introduction. Who am I? Why am I talking? Setting Expectations.
  • 4. Who am I? Possibly a question some of us will be asking ourselves at 8:30 am tomorrow after tonight's party. I am Doug; Doug I am. Actually I am Douglas ... or, if you're Scottish, Dougie or Doogie. I'm not from round here; you will probably have noticed that already. See Twitter @doug_conference for lots of whining about my 21-hour journey.
  • 5. A Bitter Old Drunk Man
  • 7. A Sports Fan
  • 8. A Family Man
  • 9. A Performance Guy. 1986: Zilog Z80A (3.5MHz), 32KB usable RAM. Yes, Cary, we used profiles!
  • 10. Why am I talking? Partitioned objects are a given when working with large databases. Maintaining statistics on partitioned objects is one of the primary challenges of the DW designer/developer/DBA. There are many options that vary between versions, but the fundamental challenges are the same. Trade-off between statistics quality and collection effort. People keep getting it wrong!
  • 11. Setting Expectations. What I will and won't include: No Histograms; No Sampling Sizes; No Indexes; No Detail. Level of depth: paper. WeDoNotUseDemos. A lot to get through! Questions.
  • 12. Simple Fundamentals
  • 13. Cost-Based Optimiser. The CBO evaluates potential execution plans using: rules and formulae embedded in the code (some control through configuration parameters and hints); statistics describing the content of data objects (Object Statistics, e.g. tables, indexes, clusters) and system characteristics (System Statistics).
  • 14. Statistics Quality. The CBO uses statistics to estimate row source cardinalities: how many rows do we expect a specific operation to return? This is the primary driver in selecting the best operations to perform and their order. Inaccurate or missing statistics are the most common cause of sub-optimal execution plans. Hard work on designing and implementing appropriate statistics maintenance will pay off across the system.
  • 15. Statistics on Partitioned Objects
  • 16. Statistics on Partitioned Objects
  • 17. Statistics at all levels. Global: describe the entire table or index and all of its underlying partitions and subpartitions as a whole (important: GLOBAL_STATS=YES/NO). Partition: describe individual partitions and potentially the underlying subpartitions as a whole (important: GLOBAL_STATS=YES/NO). Subpartition: describe individual subpartitions (implicitly, GLOBAL_STATS=YES).
  • 18. How Statistics Levels are used. If a statement accesses multiple partitions, the CBO will use Global Statistics. If a statement is able to limit access to a single partition, then the partition statistics can be used. If a statement accesses a single subpartition, then subpartition statistics can be used. However, prior to 10.2.0.4, subpartition statistics are rarely used. For most applications you will need both Global and Partition stats for the CBO to operate effectively.
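The levels described on slides 17 and 18 can be inspected directly in the data dictionary. The query below is a minimal sketch, assuming the current schema owns a composite-partitioned table called TEST_TAB1 (the table name used on later slides); it lists one row per table, partition and subpartition so that NUM_ROWS and GLOBAL_STATS can be compared across levels.

```sql
-- Sketch only: assumes TEST_TAB1 exists in the current schema.
SELECT object_type,          -- TABLE, PARTITION or SUBPARTITION
       partition_name,
       subpartition_name,
       num_rows,
       global_stats,         -- YES = gathered at this level; NO = aggregated from lower levels or not gathered
       last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'TEST_TAB1'
ORDER  BY partition_name NULLS FIRST,
          subpartition_name NULLS FIRST;
```

Ordering with NULLS FIRST puts the table-level row first, then each partition followed by its subpartitions, which mirrors the tree diagrams used on the slides.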
  • 19. The Quality/Performance Trade-off
  • 20. Collecting Global Statistics. Data loaded for Moscow / 20110202.
  • 21. Collecting Global Statistics. Potentially stale statistics.
  • 23. GRANULARITY => SUBPARTITION. dbms_stats.gather_table_stats( GRANULARITY => 'SUBPARTITION', PARTNAME => 'P_20110202_MOSCOW');
  • 24. GRANULARITY => ALL. dbms_stats.gather_table_stats( GRANULARITY => 'ALL');
  • 25. GRANULARITY => GLOBAL. dbms_stats.gather_table_stats( GRANULARITY => 'GLOBAL');
  • 26. GRANULARITY => DEFAULT. dbms_stats.gather_table_stats( GRANULARITY => 'DEFAULT', PARTNAME => 'P_20110202_MOSCOW'); dbms_stats.gather_table_stats( GRANULARITY => 'GLOBAL AND PARTITION', PARTNAME => 'P_20110202_MOSCOW');
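The calls on slides 23 to 26 are abbreviated to highlight the GRANULARITY parameter and omit the owner and table name. A complete call might look like the sketch below, which assumes the TESTUSER schema and TEST_TAB1 table that appear on the later copy-statistics slides; all other parameters are left at their defaults.

```sql
BEGIN
  -- Cheap option: gather only the subpartition that has just been loaded.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'TESTUSER',            -- assumed owner (from slide 57)
    tabname     => 'TEST_TAB1',           -- assumed table (from slide 57)
    partname    => 'P_20110202_MOSCOW',
    granularity => 'SUBPARTITION');

  -- Expensive option: gather true global statistics with a full scan of the table.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'TESTUSER',
    tabname     => 'TEST_TAB1',
    granularity => 'GLOBAL');
END;
/
```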
  • 27. Aggregated Global Statistics. To address the high cost of collecting Global Stats, Oracle provides another option: Aggregated or Approximate Global Stats. Only gather stats on the lower levels of the object: Partition on partitioned tables, Subpartition on composite-partitioned tables. DBMS_STATS will aggregate the underlying statistics to generate approximate global statistics at higher levels. Important: GLOBAL_STATS=NO.
  • 28. Aggregated Row Counts (GRANULARITY => 'SUBPARTITION'). TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 11. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = 8, with subpartitions LONDON: GLOBAL_STATS=YES, NUM_ROWS = 5 and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. 8 rows inserted for Moscow 20110202.
  • 29. Aggregated Row Counts. Stats gathered on subpartition. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 11 -> 19. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = 8 -> 16, with subpartitions LONDON: GLOBAL_STATS=YES, NUM_ROWS = 5 and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3 -> 11.
  • 30. Aggregated High/Low and NDVs (NDV = Number of Distinct Values in STATUS; H/L = Highest and Lowest). TEST_TAB1: STATUS NDV = 1, STATUS H/L = P/P. Partition P_20110201: STATUS NDV = 1, STATUS H/L = P/P, with subpartition MOSCOW: STATUS NDV = 1, STATUS H/L = P/P. Partition P_20110202: STATUS NDV = 1, STATUS H/L = P/P, with subpartitions LONDON: STATUS NDV = 1, STATUS H/L = P/P and MOSCOW: STATUS NDV = 1, STATUS H/L = P/P.
  • 31. Aggregated High/Low and NDVs. New STATUS=U appeared. TEST_TAB1: STATUS NDV = 1 -> 4, STATUS H/L = P/P -> P/U. Partition P_20110201: STATUS NDV = 1, STATUS H/L = P/P, with subpartition MOSCOW: STATUS NDV = 1, STATUS H/L = P/P. Partition P_20110202: STATUS NDV = 1 -> 3, STATUS H/L = P/P -> P/U, with subpartitions LONDON: STATUS NDV = 1, STATUS H/L = P/P and MOSCOW: STATUS NDV = 1 -> 2, STATUS H/L = P/P -> P/U.
  • 32. Quality/Performance Trade-off. You have a choice. Gather True Global Stats: more accurate NDVs; requires a high-cost full table scan (which will get progressively slower and more expensive as tables grow); maybe an occasional activity? Gather True Partition Stats and Aggregated Global Stats: accurate row counts and column High/Low values; wildly inaccurate NDVs; requires low-cost partition scan activity plus aggregation.
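One way to see how far aggregated NDVs have drifted is to compare the stored NUM_DISTINCT at each level with the true distinct count. The following is a sketch under the same assumptions as the slides (table TEST_TAB1 in the current schema, column STATUS holding character values); the alias names are illustrative only.

```sql
-- Stored column statistics for STATUS at table, partition and subpartition level.
SELECT 'TABLE' AS stat_level, table_name AS object_name,
       num_distinct, low_value, high_value
FROM   user_tab_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS'
UNION ALL
SELECT 'PARTITION', partition_name,
       num_distinct, low_value, high_value
FROM   user_part_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS'
UNION ALL
SELECT 'SUBPARTITION', subpartition_name,
       num_distinct, low_value, high_value
FROM   user_subpart_col_statistics
WHERE  table_name = 'TEST_TAB1' AND column_name = 'STATUS';

-- The true number of distinct values, for comparison (this scans the whole table).
SELECT COUNT(DISTINCT status) AS true_ndv
FROM   test_tab1;
```

LOW_VALUE and HIGH_VALUE are stored as RAW, so they display in hexadecimal; for a character column they can be wrapped in UTL_RAW.CAST_TO_VARCHAR2 to make them readable.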
  • 33. Aggregation Scenarios
  • 34. Aggregation Scenarios. Take care if you decide to use Aggregated Global Stats. Several implicit rules govern the aggregation process. I have seen every issue I'm about to describe in the past 18 months, working on systems with people who are usually pretty smart.
  • 35. Missing Subpartition Stats (Scenario 1). Aggregated Global Stats at Table-level. Subpartition Stats gathered at subpartition-level as part of new subpartition load process. Emergency hits when someone tries to INSERT data for which there is no valid subpartition. Solution: quickly add a new partition and gather stats on new subpartition.
  • 36. Missing Subpartition Stats. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 11. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 11, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 11.
  • 37. Missing Subpartition Stats. What will number of rows be? TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = ?. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 11, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 11. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = ?, with subpartitions LONDON: GLOBAL_STATS=NO, NUM_ROWS = NULL (new subpartition with no stats yet) and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3 (new data inserted and stats gathered).
  • 38. Missing Subpartition Stats. Aggregated global stats invalidated. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS IS NULL. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 11, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 11. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS IS NULL (no partition stats as not all subpartitions have stats), with subpartitions LONDON: GLOBAL_STATS=NO, NUM_ROWS = NULL and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 39. Missing Subpartition Stats. Gathering stats on all subpartitions updates aggregated stats on partition and fixes aggregated global stats. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 14. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 11, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 11. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartitions LONDON: GLOBAL_STATS=YES, NUM_ROWS = 0 and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 40. Incorrectly Gathered Global Stats (Scenario 2). Aggregated Global Stats at Table-level. Partition Stats gathered at Partition-level as part of new partition load process. Performance of several queries is horrible and poor NDVs at the Table-level are identified as root cause. Solution: gather Global Stats quickly!
  • 41. Incorrectly Gathered Global Stats. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 42. Incorrectly Gathered Global Stats. Global Stats gathered. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 43. Incorrectly Gathered Global Stats. What will new number of rows be? New partition and subpartitions with stats gathered. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = ?. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = 8, with subpartitions LONDON: GLOBAL_STATS=YES, NUM_ROWS = 5 and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 44. Incorrectly Gathered Global Stats. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 3 (unchanged). Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = 8, with subpartitions LONDON: GLOBAL_STATS=YES, NUM_ROWS = 5 and MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 45. Partition Exchange Issues (Scenario 3). Aggregated Global Stats at Table-level. Statistics are gathered on temporary Load Table. Load Table is exchanged with partition of target table. Objective is to minimise activity on target table and ensure that stats are available on partition immediately on exchange.
  • 46. Gather-then-Exchange. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Temporary Load Table with stats, LOAD_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 10.
  • 47. Gather-then-Exchange. New Partition and Subpartition without stats. TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS IS NULL, with subpartition LONDON: GLOBAL_STATS=NO, NUM_ROWS IS NULL. LOAD_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 10.
  • 48. Gather-then-Exchange. Data and stats appear at partition exchange. All subpartitions have stats, so what happened to Global Stats? TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = ?. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS = ?, with subpartition LONDON: GLOBAL_STATS=YES, NUM_ROWS = 10. LOAD_TAB1: GLOBAL_STATS=NO, NUM_ROWS IS NULL.
  • 49. Gather-then-Exchange. No statistics aggregation! TEST_TAB1: GLOBAL_STATS=NO, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=NO, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=NO, NUM_ROWS IS NULL, with subpartition LONDON: GLOBAL_STATS=YES, NUM_ROWS = 10.
  • 50. _minimal_stats_aggregation. Hidden parameter used to minimise the impact of the statistics aggregation process. Default is TRUE, which means minimise aggregation. Partition exchange will not trigger the aggregation process! Solutions: change the hidden parameter (speak to Support), or Exchange-then-Gather (another good reason for this later).
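The Exchange-then-Gather alternative from slide 50 could be sketched as follows. This is an illustration rather than the author's load code: the TESTUSER owner and the LOAD_TAB1 and TEST_TAB1 names come from the slides, while the subpartition name P_20110202_LONDON is an assumption made by analogy with P_20110202_MOSCOW.

```sql
-- 1. Swap the loaded data into the target subpartition first.
ALTER TABLE testuser.test_tab1
  EXCHANGE SUBPARTITION p_20110202_london   -- assumed subpartition name
  WITH TABLE testuser.load_tab1;

-- 2. Then gather on the target table, so that DBMS_STATS itself runs
--    and gets the chance to aggregate to the partition and table levels.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'TESTUSER',
    tabname     => 'TEST_TAB1',
    partname    => 'P_20110202_LONDON',
    granularity => 'SUBPARTITION');
END;
/
```

The design choice is to pay for the gather on the target table after the exchange, in return for not relying on the exchange to trigger aggregation. As Scenario 1 shows, the aggregated values only appear once every underlying subpartition has statistics.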
  • 51. Aggregated Stats – Summary. Wildly inaccurate NDVs, which will impact execution plans. Take care with the aggregation process. Do not use aggregated statistics unless you really don't have time to gather true Global Stats. But the problem is, what if your table is so damn big that you can never manage to update those Global Stats?
  • 52. Alternative Strategies
  • 53. Dynamic Sampling. If stats collection is such a nightmare, perhaps we shouldn't bother gathering stats at all? Dynamic Sampling could be used: gather no stats manually; when statements are parsed, Oracle will execute queries against objects to generate temporary stats on-the-fly. I would not recommend this as a system-wide strategy. What happened when stats were missing in earlier examples! Recurring overhead for every query. Either expensive or low quality stats.
  • 54. Setting Statistics. Gathering stats takes time and resources. The resulting stats describe your data to help the CBO determine optimal execution plans. If you know your data well enough to know the appropriate stats, why not just set them manually and avoid the collection overhead? Plenty of appropriate DBMS_STATS procedures. Not a new idea and discussed in several places on the net (including Jonathan Lewis's chapter in the latest Oak Table book).
  • 55. Setting Statistics - Summary. Positives: very fast and low resource method for setting statistics on new partitions; potential improvements to plan stability when accessing time-period partitions that are filled over time. Negatives: you need to know your data well, particularly any time periodicity; you need to develop your own code implementation; you could undermine the CBO's ability to use more appropriate execution plans as data changes over time; does not eliminate the difficulty in maintaining accurate Global Statistics, although these could be set manually too.
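As an illustration of the DBMS_STATS procedures mentioned on slide 54, the sketch below sets table-level and column-level statistics on a new partition by hand. Every figure is a hypothetical placeholder, not a value from the presentation; the owner, table, partition and column names are the ones used on the slides.

```sql
BEGIN
  -- Set row count, block count and average row length for the new partition.
  DBMS_STATS.SET_TABLE_STATS(
    ownname  => 'TESTUSER',
    tabname  => 'TEST_TAB1',
    partname => 'P_20110202',
    numrows  => 1000000,    -- hypothetical expected row count
    numblks  => 20000,      -- hypothetical block count
    avgrlen  => 120);       -- hypothetical average row length in bytes

  -- Set basic column statistics for STATUS in the same partition.
  DBMS_STATS.SET_COLUMN_STATS(
    ownname  => 'TESTUSER',
    tabname  => 'TEST_TAB1',
    colname  => 'STATUS',
    partname => 'P_20110202',
    distcnt  => 2,          -- hypothetical NDV, e.g. values P and U
    density  => 0.5,        -- hypothetical, 1 / distcnt
    nullcnt  => 0);
END;
/
```

Column High/Low values are not set here; doing so requires preparing a statistics record and passing it through the srec parameter, which is part of the custom code the slide warns you will need to develop.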
  • 56. Copying Statistics. Extending the concept of setting statistics manually. Instead of trying to work out what the appropriate statistics are for a new partition, copy the statistics from another partition. The previous partition (increasing volumes?). A golden template partition (plan stability?). A prior partition to reflect the periodicity of your data: the second Tuesday from last month, Tuesday from last week, the 8th of last month. Supported from 10.2.0.4.
  • 57. Copying Statistics. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=YES, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. dbms_stats.copy_table_stats( 'TESTUSER', 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110202'); dbms_stats.copy_table_stats( 'TESTUSER', 'TEST_TAB1', srcpartname => 'P_20110201_MOSCOW', dstpartname => 'P_20110202_MOSCOW');
  • 58. Copying Statistics. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=YES, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: GLOBAL_STATS=YES, NUM_ROWS = 3, with subpartition MOSCOW: GLOBAL_STATS=YES, NUM_ROWS = 3.
  • 59. Copying Statistics – Bug 1. The previous example doesn't work on an unpatched 10.2.0.4 when copying stats between partitions on a composite partitioned object (one with subpartitions). SQL> exec dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110202'); BEGIN dbms_stats.copy_table_stats(ownname => 'TESTUSER', tabname => 'TEST_TAB1', srcpartname => 'P_20110201', dstpartname => 'P_20110202'); END; * ERROR at line 1: ORA-06533: Subscript beyond count; ORA-06512: at "SYS.DBMS_STATS", line 17408; ORA-06512: at line 1
  • 60. Copying Statistics – Bug 1. Bug number 8318020. Merge Label Request 8866627 (fixes a variety of stats-related bugs). Patchset 10.2.0.5. Upgrade to 11.2.0.2.
  • 61. Copying Statistics – Bug 2. TEST_TAB1: REPORTING_DATE High/Low = 20110201. Partition P_20110201: REPORTING_DATE High/Low = 20110201. Partition P_20110202: no values shown.
  • 62. Copying Statistics – Bug 2. TEST_TAB1: REPORTING_DATE High/Low = 20110201. Partition P_20110201: REPORTING_DATE High/Low = 20110201. Partition P_20110202: REPORTING_DATE High/Low = 20110201 (copied unchanged from P_20110201).
  • 63. Copying Statistics – Bug 2. We might reasonably expect Oracle to understand the implicit High/Low values of a partition key. Merge Label Request 8866627. Patchset 10.2.0.5. Upgrade to 11.2. The wider issue here is that High/Low values (other than Partition Key columns) and NDVs will simply be copied. Are you sure that's what you want?
  • 64. Copying Statistics – Bug 3. TEST_TAB1: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110201: GLOBAL_STATS=YES, NUM_ROWS = 3, with subpartition OTHERS: GLOBAL_STATS=YES, NUM_ROWS = 3. Partition P_20110202: no values shown, with subpartition OTHERS: no values shown.
  • 65. Copying Statistics. ORA-03113 / 07445 while copying list partition statistics. Core dump in qospMinMaxPartCol. I initially thought this was because the OTHERS subpartition was the last one I copied stats for. It is because it is a DEFAULT list subpartition. Bug number 10268597. Still in 10.2.0.5 and 11.2.0.2. Marked as fixed in 11.2.0.3 and 12.1.0.0.
  • 66. Copying Statistics - Summary. Positives: very fast and low resource method for setting statistics on new partitions; potential improvements to plan stability when accessing time-period partitions that are filled over time. Negatives: bugs and related patches, although better using 10.2.0.5 or 11.2; does not eliminate the difficulty in maintaining accurate Global Statistics; does not work well with composite partitioned tables; does not work in current releases with List Partitioning where there is a DEFAULT partition.
  • 67. APPROX_GLOBAL AND PARTITION. New 10.2 GRANULARITY option as an alternative to GLOBAL AND PARTITION. Uses the aggregation process, but can replace gathered global statistics. If the aggregation process is unavailable, e.g. because there are missing partition statistics, it falls back to GLOBAL AND PARTITION. All the same NDV issues as with aggregated stats, so you should use it with an occasional Global Stats gather process.
  • 68. Incremental Statistics
  • 69. Incremental Statistics. What's the problem with the process for aggregating NDVs? Oracle knows the number of distinct values in the other partitions but not what those values were. This might seem counter-intuitive: Oracle must have known what the values were when stats were gathered, but they are not stored anywhere. Aggregation is a destructive process. The Incremental Statistics feature tracks the distinct values, stored as synopses. Stored in WRI$_OPTSTAT_SYNOPSIS_HEAD$ and WRI$_OPTSTAT_SYNOPSIS$.
  • 70. Incremental Statistics Prerequisites. INCREMENTAL setting for the partitioned table is TRUE (set using DBMS_STATS.SET_TABLE_PREFS). PUBLISH setting for the partitioned table is TRUE (which is the default setting anyway). The user specifies (both defaults) ESTIMATE_PERCENT => AUTO_SAMPLE_SIZE and GRANULARITY => 'AUTO'.
  • 71. New Process. Gather initial statistics using the default settings. Oracle will gather statistics at all appropriate levels using one-pass distinct sampling and store initial synopses. As partitions are added or stats become stale, keep gathering using AUTO granularity and Oracle will: gather missing or stale partition stats; update synopses for those partitions; merge the synopses with synopses for higher levels of the same object, maintaining all Global Stats along the way. Intelligent and accurate aggregation process.
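The prerequisites and process on slides 70 and 71 could be sketched like this for an 11g database. It assumes the TESTUSER.TEST_TAB1 table from the earlier slides and that the ESTIMATE_PERCENT and GRANULARITY preferences have been left at their defaults, so they are not passed explicitly.

```sql
BEGIN
  -- Prerequisite: switch on incremental statistics for this table only.
  DBMS_STATS.SET_TABLE_PREFS(
    ownname => 'TESTUSER',
    tabname => 'TEST_TAB1',
    pname   => 'INCREMENTAL',
    pvalue  => 'TRUE');
END;
/

-- Confirm the preferences that incremental statistics depend on.
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'TESTUSER', 'TEST_TAB1') AS incremental,
       DBMS_STATS.GET_PREFS('PUBLISH',     'TESTUSER', 'TEST_TAB1') AS publish
FROM   dual;

BEGIN
  -- Initial and subsequent gathers use the same call. With defaults in place,
  -- only missing or stale partitions should be rescanned on later runs.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'TESTUSER',
    tabname => 'TEST_TAB1');
END;
/
```

The first gather after enabling the preference still has to read every partition in order to build the initial synopses, so the saving only shows up on later runs.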
  • 72. Other Resources. Amit Poddar's excellent paper and presentation from an earlier Hotsos Symposium. Robin Moffatt's blog post. Synopses can take a lot of space in SYSAUX. Aggregation seems hopelessly slow in older releases, probably because WRI$_OPTSTAT_SYNOPSIS$ is not partitioned (it is in 11.2.0.2). Incremental Stats looks like the solution to our problems, if you have the time to gather using defaults.
  • 73. Conclusions and References
  • 74. Issues. Aggregated NDVs are very low quality. DBMS_STATS will only update aggregated stats when stats have been gathered appropriately on all underlying structures. DBMS_STATS will never overwrite properly gathered Global Stats with aggregated results, unless you use 'APPROX_GLOBAL AND PARTITION'. APPROX_GLOBAL stats otherwise suffer from the same problems as any other aggregated stats. If aggregation fails because of missing partition stats, you will suddenly be using GLOBAL AND PARTITION.
  • 75. Issues. Dynamic Sampling is almost certainly not the answer to your problems. The default setting of _minimal_stats_aggregation implies that you should normally use exchange-then-gather. If you are using Incremental Stats you must use exchange-then-gather anyway.
  • 76. Suggestions. Try the Oracle default options first, particularly on 11.2 and up. If you do not have time to gather using the default granularity, gather the best statistics you can as data is loaded and gather proper global statistics later. DBMS_STATS is constantly evolving, so you should try to be on the latest patchsets with all relevant one-off patches applied. Checking stats means checking all levels, including the GLOBAL_STATS column, NUM_DISTINCT and High/Low values.
  • 77. Suggestions. Design a strategy. Develop any surrounding code. Stick to the strategy. Always gather stats using the wrapper code. Lock and unlock stats programmatically to prevent human errors ruining the strategy.
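The wrapper idea on slide 77 might look something like the sketch below. The procedure name and its parameters are hypothetical, invented for illustration; only the DBMS_STATS calls are real. It unlocks the table's statistics, gathers at the requested granularity, and locks them again so that ad hoc gathers outside the wrapper fail.

```sql
-- Hypothetical wrapper: the name and parameters are illustrative only.
CREATE OR REPLACE PROCEDURE gather_part_stats (
  p_owner       IN VARCHAR2,
  p_table       IN VARCHAR2,
  p_partname    IN VARCHAR2,
  p_granularity IN VARCHAR2 DEFAULT 'SUBPARTITION')
AS
BEGIN
  -- Unlock only for the duration of the controlled gather.
  DBMS_STATS.UNLOCK_TABLE_STATS(ownname => p_owner, tabname => p_table);

  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => p_owner,
    tabname     => p_table,
    partname    => p_partname,
    granularity => p_granularity);

  -- Lock again so that gathers outside this wrapper are rejected.
  DBMS_STATS.LOCK_TABLE_STATS(ownname => p_owner, tabname => p_table);
EXCEPTION
  WHEN OTHERS THEN
    -- Make sure the stats are not left unlocked if the gather fails.
    DBMS_STATS.LOCK_TABLE_STATS(ownname => p_owner, tabname => p_table);
    RAISE;
END gather_part_stats;
/
```

A call such as `EXEC gather_part_stats('TESTUSER', 'TEST_TAB1', 'P_20110202_MOSCOW')` would then be the only supported way to gather. An alternative design is to leave the statistics locked permanently and pass `force => TRUE` to GATHER_TABLE_STATS inside the wrapper.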
  • 78. Additional References. Optimiser Development Group blog. Greg Rahn's blog. Amit Poddar's paper. Jonathan Lewis's chapter in the latest Oak Table book. Lots of others in the references section of the paper.
  • 79. Statistics on Partitioned Objects. Doug Burns, dougburns@yahoo.com, http://oracledoug.com/stats.docx

Editor's Notes

  1. Which is probably why Oracle introduced automatic stats gathering, dynamic sampling etc
  2. So what should we do about these different levels? What is involved in updating them?
  3. Slide corrected. Originally presented as: Missing Partition Stats. Scenario 1. Aggregated Global Stats at Table-level. Partition Stats gathered at Partition-level as part of new partition load process. Emergency hits when someone tries to INSERT data for which there is no valid partition. Solution: quickly add a new partition!
  4. Slide corrected. Originally presented without the subpartitions used in the white paper, so it was difficult to show the correct issue. The next sequence of diagrams was modified to show subpartitions.