SlideShare una empresa de Scribd logo
1 de 35
HISTOGRAMS
PRE-12c AND NOW
1
ANJU GARG
About me
Sangam 14 Anju Garg
• More than 11 years of experience in IT Industry
• Sr. Corporate Trainer (Oracle DBA) with Koenig Solutions Ltd.
• Oracle blog : http://oracleinaction.com/
• Email : anjugarg66@gmail.com
• Oracle Certified Expert
2
Agenda
• Need For Histograms
• Pre-12c Histograms
– Frequency Histograms
– Height Balanced Histograms
• Issues With Histograms In 11g
• Histograms in 12c
– Top Frequency Histograms
– Hybrid Histograms
– Hybrid Histograms - Corollary
• Conclusion
• References
• Q & A
Sangam 14 Anju Garg 3
Need For Histograms
Skewed Data Distribution
Sangam 14 Anju Garg
• 26 distinct values in the column ID
• Out of 120 rows, more than 40% (50 rows) have ID = 8
• If optimizer assumes uniform distribution – will result in a bad execution plan
• Histogram desirable
4
Pre-12c Histograms
• Two types of histograms:
– Frequency histograms
– Height balanced histograms
Sangam 14 Anju Garg 5
Frequency Histogram
• NDV <= 254 and Nb >= NDV
• Records each different value and its exact cardinality.
• One bucket contains exactly one value.
• One value resides entirely in one bucket.
• Bucket size = cardinality of the value
• Precise
Nb – Number of buckets
NDV – No. of Distinct Values Sangam 14 Anju Garg
SQL>exec dbms_stats.gather_table_stats -
(ownname => 'HR', -
tabname => 'HIST', -
method_opt => 'FOR COLUMNS ID', -
cascade => true);
SQL>select table_name, column_name, histogram,num_distinct,num_buckets
Where table_name = 'HIST'
And column_name = 'ID';
TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
---------- --------------- --------------- ------------ -----------
HIST ID FREQUENCY 26 26
6
Frequency Histogram
•
Sangam 14 Anju Garg 7
Frequency Histogram
• Optimizer uses frequency histogram to estimate cardinality accurately
• Uses FTS access path as desired even though column ID is indexed
Sangam 14 Anju Garg
SQL>explain plan for select * from hr.hist where id = 8;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------
Plan hash value: 538080257
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50 | 50200 | 7 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| HIST | 50 | 50200 | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------
8
Height-Balanced Histogram
• NDV > 254 or Nb < NDV
• Distributes the count of all rows evenly across all histogram buckets
• All buckets have almost the same number of rows
• Less precise
• No. of buckets = 20 (< NDV (=26) )
Nb – Number of buckets
NDV – No. of Distinct Values
DB11g>exec dbms_stats.gather_table_stats -
(ownname => 'HR', -
tabname => 'HIST', -
method_opt => 'FOR COLUMNS ID size 20', -
cascade => true);
DB11g>select table_name,column_name,histogram,num_distinct,num_buckets
from dba_tab_col_statistics
where table_name = 'HIST' and column_name = 'ID';
TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
---------- --------------- --------------- ------------ -----------
HIST ID HEIGHT BALANCED 26 20
Sangam 14 Anju Garg 9
Height-Balanced Histogram
Sangam 14 Anju Garg 10
Height-Balanced Histogram
• One bucket can have multiple values.
• One value can span multiple buckets.
• Multiple buckets (3 – 10) with same ENDPOINT_VALUE (8) compressed
into a single bucket with the highest endpoint number (bucket 10).
• A popular value (8) occurs as an endpoint value of multiple buckets.
• Any value that is not popular is a non-popular value.
Sangam 14 Anju Garg 11
Height-Balanced Histogram
Popular values
• Cardinality of popular value = (Bucket size)*(no. of buckets with value as
endpoint)
• ID = 8 is an end point of 8 buckets
• Estimated cardinality = (6 * 8) = 48 (Actual = 50)
DB11g> explain plan for select * from hr.hist where id =8;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
Plan hash value: 538080257
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 48 | 48192 | 7 (0)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| HIST | 48 | 48192 | 7 (0)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=8)
Sangam 14 Anju Garg 12
Height-Balanced Histogram
Non-Popular values (Endpoint)
• Cardinality of non-popular value = (number of rows in table) * density
• ID = 15 is an end point of one bucket
• Estimated cardinality = 3 (Actual = 5)
DB11g>explain plan for select * from hr.hist where id = 15;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 4058847011
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 3012 | 2 (0)| 00:00:01
| 1 | TABLE ACCESS BY INDEX ROWID| HIST | 3 | 3012 | 2 (0)| 00:00:01
|* 2 | INDEX RANGE SCAN | HIST_IDX | 3 | | 1 (0)| 00:00:01
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=15)
Sangam 14 Anju Garg 13
Height-Balanced Histogram
Non-Popular values (Non-endpoint)
• Cardinality of non-popular value = (num of rows in table) * density
• ID = 3 is not an endpoint
• Estimated cardinality = 3 (Actual = 1)
DB11g>explain plan for select * from hr.hist where id = 3;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 4058847011
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 3012 | 2 (0)| 00:00:01
| 1 | TABLE ACCESS BY INDEX ROWID| HIST | 3 | 3012 | 2 (0)| 00:00:01
|* 2 | INDEX RANGE SCAN | HIST_IDX | 3 | | 1 (0)| 00:00:01
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access"ID"=3)
Sangam 14 Anju Garg 14
Drawbacks Of Pre-12c Histograms
• Frequency histograms
– Accurate but can be created only for NDV <= 254.
• Height balanced histograms
– May misestimate cardinality for both popular and non-popular values.
– A value which is endpoint of one bucket and almost fills up another bucket
value might be considered unpopular.
NDV – No. of Distinct Values
Sangam 14 Anju Garg 15
Histograms in 12c
• Frequency histograms can be created for up to 2048 distinct values .
• Introduces two new types of histograms
– Top-n-frequency histograms
– Hybrid histograms
Sangam 14 Anju Garg 16
Top Frequency Histograms
• Useful when a small number of distinct values dominate the data set.
• Captures highly popular values and ignores statistically insignificant unpopular
values.
• Prerequisites
 NDV > Nb
 The percentage of rows occupied by the top Nb frequent values is equal to
or greater than threshold p, where p is (1-(1/Nb))*100.
 The ESTIMATE_PERCENT is set to AUTO_SAMPLE_SIZE in the DBMS_STATS
statistics gathering procedure.
• Features
 One value per bucket
 One value contained entirely in one bucket
 Cardinality of the value = Bucket size
 Variable bucket size
 Precise for all the endpoints captured
 Captures top Nb most popular values
Sangam 14 Anju Garg 17
Nb – Number of buckets
NDV – Number of distinct values
Top Frequency Histogram
• NDV (26) > Nb (20)
• Threshold p for 20 buckets = (1 - (1/Nb))*100 = (1 - (1/20))*100 = 95.0
• Top 20 most popular values occupy more than 95% of rows
• ESTIMATE_PERCENT = AUTO_SAMPLE_SIZE (default)
• Precisely captures cardinality of top Nb (20) most popular values.
• 6 (= NDV –Nb) Values occurring least no. of times are not captured.
DB12c>exec dbms_stats.gather_table_stats -
(ownname => 'HR', tabname => 'HIST', method_opt => 'FOR COLUMNS ID size
20', cascade => true);
select table_name, column_name,histogram,num_distinct, num_buckets
from dba_tab_col_statistics
where table_name = 'HIST' and column_name = 'ID';
TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
---------- --------------- --------------- ------------ -----------
HIST ID TOP-FREQUENCY 26 20
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 18
Top Frequency Histogram
Sangam 14 Anju Garg 19
Top Frequency Histogram
Non-Popular values (Endpoint)
• ID = 15
– Occurs 5 times
– Height-balanced histogram
• Endpoint of one bucket - considered unpopular
• misestimated 3 rows
– Top Frequency histogram
• Is an endpoint - one of the top 20 most frequently occurring values
• accurate cardinality estimate of 5 rows
DB12c>explain plan for select * from hr.hist where id = 15;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3950962134
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows|Bytes|Cost(%CPU)|Time|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 |5020 | 2 (0)| 00:00:01
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HIST | 5 |5020 | 2 (0)| 00:00:01
|* 2 | INDEX RANGE SCAN |HIST_IDX| 5 | | 1 (0)| 00:00:01
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=15)
Sangam 14 Anju Garg
20
Top Frequency Histogram
Non-Popular values (Non-endpoint)
• ID = 3
– Occurs once
– Height-balanced histogram
• Not an endpoint - considered unpopular
• misestimated 3 rows
– Top Frequency histogram
• Not an endpoint – one of the 6 least frequently occurring values
• accurate cardinality estimate of 1 row
DB12c>explain plan for select * from hr.hist where id = 3;
select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
Plan hash value: 3950962134
---------------------------------------------------------------------------------
| Id | Operation | Name |Rows| Bytes|Cost(%CPU)|Time
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |1004 | 2 (0)| 00:00:01
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED|HIST | 1 |1004 | 2 (0)| 00:00:01
|* 2 | INDEX RANGE SCAN |HIST_IDX| 1 | | 1 (0)| 00:00:01
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"=3)
Sangam 14 Anju Garg
21
Top Frequency Histograms: Summary
• The occurrences of popular values are accurately captured at the expense of not
capturing the data for least occurring values.
• Accurately estimates cardinality of top Nb popular values.
• Useful for cases when data set is dominated by a small no. of values.
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 22
Hybrid Histograms
• Combines characteristics of both height-balanced histograms and frequency
histograms.
• Pre-requisites
 Nb < NDV
 The criteria for top frequency histograms do not apply i.e.
Percentage of rows occupied by the top Nb most popular values is less than p
where p = (1-(1/Nb))*100.
 The sampling percentage is AUTO_SAMPLE_SIZE.
• Correctly estimates the frequency of endpoints
 stores the ENDPOINT_REPEAT_ COUNT - number of times the endpoint value is
repeated.
• Captures more endpoints
 stores all occurrences of every value in the same bucket - makes available
buckets to store more endpoints.
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 23
Hybrid Histogram
• Nb (20) < NDV (26)
• Threshold p for 20 buckets = p = (1 - (1/nb))*100 = (1 - (1/20))*100 = 95.0
• On deleting 20 rows with ID = 8 from table HR.HIST , it qualifies for hybrid histogram
creation.
• Top 20 most popular ID's occupy less than 95% of total rows.
DB12c>exec dbms_stats.gather_table_stats -
(ownname => 'HR',tabname => 'HIST', -
method_opt => 'FOR COLUMNS ID size 20', -
cascade => true);
DB12c>select table_name,column_name, histogram,num_distinct,num_buckets
from dba_tab_col_statistics
where table_name = 'HIST' and column_name = 'ID';
TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
---------- --------------- --------------- ------------ -----------
HIST ID HYBRID 26 20
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 24
Hybrid Histogram
Sangam 14 Anju Garg
BUCKET
NUMBER
ENDPOINT
_VALUE
1 1 1 1 1 1
2 2 2 3 3
3 4 4 5 5
4 6 6 7 7 7 7
5 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
6 9 9 9 10 10 10
7 11 11 11 11 11 11 11
8 12 12 12 12 12 12 12
9 13 13 13 13 13 13 13
10 14 14 14 14
11 15 15 15 15 15 15
12 16 16 16 16
13 17 17 17 17
14 18 19 19
15 20 20 20 20 20 20
16 21 21 21
17 22 22
18 23 23
19 24 24 24
20 25 25 25 26 26 26
HYBRIDHISTOGRAM
VALUES
25
Hybrid Histograms: Summary
• Hybrid histograms have features of both frequency and height balanced
histograms.
– Features similar to frequency histograms
 All occurrences of a value are placed in one bucket
 ENDPOINT_NUMBER stores cumulative frequency
 Variable bucket size
– Features similar to Height Balanced histograms
 No. of values captured are less than NDV
 One bucket can have multiple values.
 ENDPOINT_VALUE represents the largest value in a bucket
• Captures more endpoints than height balanced histograms
• An improvement over height balanced histograms when data set is not
dominated by a small number of values.
Sangam 14 Anju Garg 26
Nb – Number of buckets
NDV – No. of Distinct Values
Hybrid Histograms : Corollary
• For data without any gap in values, If a maximum of two values are stored per
bucket, a hybrid histogram might even be able to accurately capture the
frequency of values up to twice the number of buckets.
Sangam 14 Anju Garg
Total no. of entries in the bucket = BUCKET_SIZE
= Difference of 2 consecutive ENDPOINT_NUMBER's
Frequency of endpoint = ENDPOINT_REPEAT_COUNT
Frequency of non-endpoint = (BUCKET_SIZE - ENDPOINT_REPEAT_COUNT)
27
BUCKET_SIZE
ENDPOINT_REPEAT_COUNT
Non-endpoint Endpoint
NON-ENDPOINT FREQUENCY
Hybrid Histograms : Corollary
• Frequency of all the 26 values should be estimated accurately using hybrid histogram having
13 buckets.
• Create a histogram with No. of buckets = NDV/2 = 26/2 = 13
DB12c>exec dbms_stats.gather_table_stats -
(ownname => 'HR', tabname => 'HIST',method_opt => 'FOR COLUMNS ID
size 13', cascade => true);
DB12c> select table_name, column_name, histogram, num_distinct,
num_buckets
from dba_tab_col_statistics
where table_name = 'HIST' and column_name = 'ID';
TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS
---------- --------------- --------------- ------------ -----------
HIST ID HYBRID 26 13
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 28
Hybrid Histograms: Corollary
Sangam 14 Anju Garg
DB12c>select ENDPOINT_VALUE
ENDPOINT,
ENDPOINT_NUMBER
ENDPOINT_NO,
endpoint_repeat_count
rpt_cnt
from dba_histograms
where table_name = 'HIST'
and column_name = 'ID';
ENDPOINT ENDPOINT_NO RPT_CNT
-------- ----------- -------
1 4 4
5 10 1
8 45 30
11 56 6
12 62 6
13 68 6
15 76 5
17 82 3
20 89 5
22 92 1
23 93 1
24 95 2
26 100 2
13 rows selected.
ENDPOINT_VALUE : Largest value in a bucket
ENDPOINT_NUMBER : Cumulative frequency
ENDPOINT_REPEAT_COUNT :Frequency of endpoint
BUCKET
NUMBER
ENDPOINT
_ VALUE
1 1 1 1 1 1
2 2 2 3 4 4 5 5
3 6 6 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
4 9 9 9 10 10 11 11 11 11 11 11 11
5 12 12 12 12 12 12 12
6 13 13 13 13 13 13 13
7 14 14 14 15 15 15 15 15 15
8 16 16 16 17 17 17 17
9 18 19 20 20 20 20 20 20
10 21 21 22 22
11 23 23
12 24 24 24
13 25 25 25 26 26 26
VALUES
ACTUAL HYBRID HISTOGRAM (Nb = NDV/2 )
BUCKET
NUMBER
ENDPOINT
_ VALUE
1 1 1 1 1 2 2 2
2 3 4 4 4
3 5 6 6 6
4 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
5 9 9 9 10 10 10
6 11 11 11 11 11 11 12 12 12 12 12 12 12
7 13 13 13 13 13 13 14 14 14 14
8 15 15 15 15 15 16 16 16 16
9 17 17 17 18 18
10 19 20 20 20 20 20 20
11 21 21 22 22
12 23 24 24 24
13 25 25 25 26 26 26
HYPOTHETICAL HYBRID HISTOGRAM ( Nb = NDV/2 )
VALUES
29
Hybrid Histograms : Corollary
• Actual histogram
– some of the buckets have more than two values
– cardinality of some of the non endpoint values cannot be accurately determined.
• Hypothetical histogram
– All the buckets have two values each.
– Cardinality of all (2*Nb) values can be accurately determined.
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 30
Histograms : Further enhancements?
• Hybrid histogram can be made to store a maximum of two values per bucket.
• For data without any gap in values
– Nb >= NDV/2 and Nb <= 2048
• Hybrid histogram will accurately estimate frequency of all the distinct
values.
• Half the number of buckets required as compared to frequency histogram.
• Can be a substitute for frequency histogram.
• Can accurately estimate the frequency of up to 4096 values.
– Nb < NDV/2 or Nb > 2048
• A newer type of histogram – Top frequency hybrid histogram could be
created.
• Captures (2*Nb) most popular values.
• Better substitute for Top frequency histogram which captures top Nb most
popular values.
Sangam 14 Anju Garg 31
Conclusion
• In 12c, frequency histogram can be created for NDV <= 2048 .
• Top Frequency and Hybrid histograms are designed to overcome flaws of
Height-Balanced histograms.
• Top frequency and Hybrid histograms are created only if ESTIMATE_PERCENT =
AUTO_SAMPLE_SIZE .
• Top frequency histogram accurately estimates the frequencies for only top
occurring values if a small number of values dominate the data set.
• Hybrid histogram has features of both frequency and height balanced
histograms
• Hybrid histogram captures more endpoints as compared to height-balanced
histograms and estimates their frequency accurately.
Nb – Number of buckets
NDV – No. of Distinct Values
Sangam 14 Anju Garg 32
References
• http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm#TGSQL366
• http://jimczuprynski.files.wordpress.com/2014/04/czuprynski-select-q2-2014.pdf
• http://jonathanlewis.wordpress.com/2013/09/01/histograms/
Sangam 14 Anju Garg 33
34
ANJU GARG
Email:anjugarg66@gmail.com
Blog:http://oracleinaction.com

Más contenido relacionado

La actualidad más candente

12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan DirectivesFranck Pachot
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace WalkthroughSergey Petrunya
 
Basic Query Tuning Primer - Pg West 2009
Basic Query Tuning Primer - Pg West 2009Basic Query Tuning Primer - Pg West 2009
Basic Query Tuning Primer - Pg West 2009mattsmiley
 
New Tuning Features in Oracle 11g - How to make your database as boring as po...
New Tuning Features in Oracle 11g - How to make your database as boring as po...New Tuning Features in Oracle 11g - How to make your database as boring as po...
New Tuning Features in Oracle 11g - How to make your database as boring as po...Sage Computing Services
 
A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013Connor McDonald
 
Oracle statistics by example
Oracle statistics by exampleOracle statistics by example
Oracle statistics by exampleMauro Pagano
 
Full Table Scan: friend or foe
Full Table Scan: friend or foeFull Table Scan: friend or foe
Full Table Scan: friend or foeMauro Pagano
 
Oracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansOracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansFranck Pachot
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizerMauro Pagano
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
Is your SQL Exadata-aware?
Is your SQL Exadata-aware?Is your SQL Exadata-aware?
Is your SQL Exadata-aware?Mauro Pagano
 
Histograms in 12c era
Histograms in 12c eraHistograms in 12c era
Histograms in 12c eraMauro Pagano
 
Explaining the Postgres Query Optimizer
Explaining the Postgres Query OptimizerExplaining the Postgres Query Optimizer
Explaining the Postgres Query OptimizerEDB
 
Advanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & moreAdvanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & moreLukas Fittl
 
Same plan different performance
Same plan different performanceSame plan different performance
Same plan different performanceMauro Pagano
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performanceSergey Petrunya
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Sergey Petrunya
 

La actualidad más candente (19)

12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan Directives
 
Optimizer Trace Walkthrough
Optimizer Trace WalkthroughOptimizer Trace Walkthrough
Optimizer Trace Walkthrough
 
Basic Query Tuning Primer - Pg West 2009
Basic Query Tuning Primer - Pg West 2009Basic Query Tuning Primer - Pg West 2009
Basic Query Tuning Primer - Pg West 2009
 
New Tuning Features in Oracle 11g - How to make your database as boring as po...
New Tuning Features in Oracle 11g - How to make your database as boring as po...New Tuning Features in Oracle 11g - How to make your database as boring as po...
New Tuning Features in Oracle 11g - How to make your database as boring as po...
 
Oracle 12c SPM
Oracle 12c SPMOracle 12c SPM
Oracle 12c SPM
 
A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013A few things about the Oracle optimizer - 2013
A few things about the Oracle optimizer - 2013
 
Oracle statistics by example
Oracle statistics by exampleOracle statistics by example
Oracle statistics by example
 
Full Table Scan: friend or foe
Full Table Scan: friend or foeFull Table Scan: friend or foe
Full Table Scan: friend or foe
 
Oracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansOracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive Plans
 
Chasing the optimizer
Chasing the optimizerChasing the optimizer
Chasing the optimizer
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
Is your SQL Exadata-aware?
Is your SQL Exadata-aware?Is your SQL Exadata-aware?
Is your SQL Exadata-aware?
 
Histograms in 12c era
Histograms in 12c eraHistograms in 12c era
Histograms in 12c era
 
Explaining the Postgres Query Optimizer
Explaining the Postgres Query OptimizerExplaining the Postgres Query Optimizer
Explaining the Postgres Query Optimizer
 
Advanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & moreAdvanced pg_stat_statements: Filtering, Regression Testing & more
Advanced pg_stat_statements: Filtering, Regression Testing & more
 
Same plan different performance
Same plan different performanceSame plan different performance
Same plan different performance
 
Using histograms to get better performance
Using histograms to get better performanceUsing histograms to get better performance
Using histograms to get better performance
 
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
Common Table Expressions in MariaDB 10.2 (Percona Live Amsterdam 2016)
 
The PostgreSQL Query Planner
The PostgreSQL Query PlannerThe PostgreSQL Query Planner
The PostgreSQL Query Planner
 

Similar a Histograms : Pre-12c and Now

Managing Statistics for Optimal Query Performance
Managing Statistics for Optimal Query PerformanceManaging Statistics for Optimal Query Performance
Managing Statistics for Optimal Query PerformanceKaren Morton
 
Understanding Optimizer-Statistics-for-Developers
Understanding Optimizer-Statistics-for-DevelopersUnderstanding Optimizer-Statistics-for-Developers
Understanding Optimizer-Statistics-for-DevelopersEnkitec
 
Query optimizer vivek sharma
Query optimizer vivek sharmaQuery optimizer vivek sharma
Query optimizer vivek sharmaaioughydchapter
 
Oracle dbms_xplan.display_cursor format
Oracle dbms_xplan.display_cursor formatOracle dbms_xplan.display_cursor format
Oracle dbms_xplan.display_cursor formatFranck Pachot
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sqlj9soto
 
Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleUnderstanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleGuatemala User Group
 
Sydney Oracle Meetup - access paths
Sydney Oracle Meetup - access pathsSydney Oracle Meetup - access paths
Sydney Oracle Meetup - access pathspaulguerin
 
Riyaj: why optimizer_hates_my_sql_2010
Riyaj: why optimizer_hates_my_sql_2010Riyaj: why optimizer_hates_my_sql_2010
Riyaj: why optimizer_hates_my_sql_2010Riyaj Shamsudeen
 
Demystifying cost based optimization
Demystifying cost based optimizationDemystifying cost based optimization
Demystifying cost based optimizationRiyaj Shamsudeen
 
Informix Warehouse Accelerator (IWA) features in version 12.1
Informix Warehouse Accelerator (IWA) features in version 12.1Informix Warehouse Accelerator (IWA) features in version 12.1
Informix Warehouse Accelerator (IWA) features in version 12.1Keshav Murthy
 
SQL Macros - Game Changing Feature for SQL Developers?
SQL Macros - Game Changing Feature for SQL Developers?SQL Macros - Game Changing Feature for SQL Developers?
SQL Macros - Game Changing Feature for SQL Developers?Andrej Pashchenko
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
EvolveExecutionPlans.pdf
EvolveExecutionPlans.pdfEvolveExecutionPlans.pdf
EvolveExecutionPlans.pdfPraveenPolu1
 
SQLチューニング総合診療Oracle CloudWorld出張所
SQLチューニング総合診療Oracle CloudWorld出張所SQLチューニング総合診療Oracle CloudWorld出張所
SQLチューニング総合診療Oracle CloudWorld出張所Hiroshi Sekiguchi
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...EDB
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internalsAlexey Ermakov
 
query-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdfquery-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdfgaros1
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321maclean liu
 

Similar a Histograms : Pre-12c and Now (20)

Managing Statistics for Optimal Query Performance
Managing Statistics for Optimal Query PerformanceManaging Statistics for Optimal Query Performance
Managing Statistics for Optimal Query Performance
 
Understanding Optimizer-Statistics-for-Developers
Understanding Optimizer-Statistics-for-DevelopersUnderstanding Optimizer-Statistics-for-Developers
Understanding Optimizer-Statistics-for-Developers
 
Query optimizer vivek sharma
Query optimizer vivek sharmaQuery optimizer vivek sharma
Query optimizer vivek sharma
 
Oracle dbms_xplan.display_cursor format
Oracle dbms_xplan.display_cursor formatOracle dbms_xplan.display_cursor format
Oracle dbms_xplan.display_cursor format
 
Basic Query Tuning Primer
Basic Query Tuning PrimerBasic Query Tuning Primer
Basic Query Tuning Primer
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sql
 
Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ OracleUnderstanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
Understanding Query Optimization with ‘regular’ and ‘Exadata’ Oracle
 
Sydney Oracle Meetup - access paths
Sydney Oracle Meetup - access pathsSydney Oracle Meetup - access paths
Sydney Oracle Meetup - access paths
 
Riyaj: why optimizer_hates_my_sql_2010
Riyaj: why optimizer_hates_my_sql_2010Riyaj: why optimizer_hates_my_sql_2010
Riyaj: why optimizer_hates_my_sql_2010
 
Demystifying cost based optimization
Demystifying cost based optimizationDemystifying cost based optimization
Demystifying cost based optimization
 
Informix Warehouse Accelerator (IWA) features in version 12.1
Informix Warehouse Accelerator (IWA) features in version 12.1Informix Warehouse Accelerator (IWA) features in version 12.1
Informix Warehouse Accelerator (IWA) features in version 12.1
 
SQL Macros - Game Changing Feature for SQL Developers?
SQL Macros - Game Changing Feature for SQL Developers?SQL Macros - Game Changing Feature for SQL Developers?
SQL Macros - Game Changing Feature for SQL Developers?
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
EvolveExecutionPlans.pdf
EvolveExecutionPlans.pdfEvolveExecutionPlans.pdf
EvolveExecutionPlans.pdf
 
SQLチューニング総合診療Oracle CloudWorld出張所
SQLチューニング総合診療Oracle CloudWorld出張所SQLチューニング総合診療Oracle CloudWorld出張所
SQLチューニング総合診療Oracle CloudWorld出張所
 
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN AN...
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
query-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdfquery-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdf
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
【Maclean liu技术分享】拨开oracle cbo优化器迷雾,探究histogram直方图之秘 0321
 

Último

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Histograms : Pre-12c and Now

  • 2. About me Sangam 14 Anju Garg • More than 11 years of experience in IT Industry • Sr. Corporate Trainer (Oracle DBA) with Koenig Solutions Ltd. • Oracle blog : http://oracleinaction.com/ • Email : anjugarg66@gmail.com • Oracle Certified Expert 2
  • 3. Agenda • Need For Histograms • Pre-12c Histograms – Frequency Histograms – Height Balanced Histograms • Issues With Histograms In 11g • Histograms in 12c – Top Frequency Histograms – Hybrid Histograms – Hybrid Histograms - Corollary • Conclusion • References • Q & A Sangam 14 Anju Garg 3
  • 4. Need For Histograms Skewed Data Distribution Sangam 14 Anju Garg • 26 distinct values in the column ID • Out of 120 rows, more than 40% (50 rows) have ID = 8 • If optimizer assumes uniform distribution – will result in a bad execution plan • Histogram desirable 4
  • 5. Pre-12c Histograms • Two types of histograms: – Frequency histograms – Height balanced histograms Sangam 14 Anju Garg 5
  • 6. Frequency Histogram • NDV <= 254 and Nb >= NDV • Records each different value and its exact cardinality. • One bucket contains exactly one value. • One value resides entirely in one bucket. • Bucket size = cardinality of the value • Precise Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg SQL>exec dbms_stats.gather_table_stats - (ownname => 'HR', - tabname => 'HIST', - method_opt => 'FOR COLUMNS ID', - cascade => true); SQL>select table_name, column_name, histogram,num_distinct,num_buckets Where table_name = 'HIST' And column_name = 'ID'; TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS ---------- --------------- --------------- ------------ ----------- HIST ID FREQUENCY 26 26 6
  • 8. Frequency Histogram • Optimizer uses frequency histogram to estimate cardinality accurately • Uses FTS access path as desired even though column ID is indexed Sangam 14 Anju Garg SQL>explain plan for select * from hr.hist where id = 8; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------- Plan hash value: 538080257 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50 | 50200 | 7 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| HIST | 50 | 50200 | 7 (0)| 00:00:01 | -------------------------------------------------------------------------- 8
  • 9. Height-Balanced Histogram • NDV > 254 or Nb < NDV • Distributes the count of all rows evenly across all histogram buckets • All buckets have almost the same number of rows • Less precise • No. of buckets = 20 (< NDV (=26) ) Nb – Number of buckets NDV – No. of Distinct Values DB11g>exec dbms_stats.gather_table_stats - (ownname => 'HR', - tabname => 'HIST', - method_opt => 'FOR COLUMNS ID size 20', - cascade => true); DB11g>select table_name,column_name,histogram,num_distinct,num_buckets from dba_tab_col_statistics where table_name = 'HIST' and column_name = 'ID'; TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS ---------- --------------- --------------- ------------ ----------- HIST ID HEIGHT BALANCED 26 20 Sangam 14 Anju Garg 9
  • 11. Height-Balanced Histogram • One bucket can have multiple values. • One value can span multiple buckets. • Multiple buckets (3 – 10) with same ENDPOINT_VALUE (8) compressed into a single bucket with the highest endpoint number (bucket 10). • A popular value (8) occurs as an endpoint value of multiple buckets. • Any value that is not popular is a non-popular value. Sangam 14 Anju Garg 11
  • 12. Height-Balanced Histogram Popular values • Cardinality of popular value = (Bucket size)*(no. of buckets with value as endpoint) • ID = 8 is an end point of 8 buckets • Estimated cardinality = (6 * 8) = 48 (Actual = 50) DB11g> explain plan for select * from hr.hist where id =8; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------- Plan hash value: 538080257 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 48 | 48192 | 7 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| HIST | 48 | 48192 | 7 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=8) Sangam 14 Anju Garg 12
  • 13. Height-Balanced Histogram Non-Popular values (Endpoint) • Cardinality of non-popular value = (number of rows in table) * density • ID = 15 is an end point of one bucket • Estimated cardinality = 3 (Actual = 5) DB11g>explain plan for select * from hr.hist where id = 15; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------- Plan hash value: 4058847011 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|Time --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 3012 | 2 (0)| 00:00:01 | 1 | TABLE ACCESS BY INDEX ROWID| HIST | 3 | 3012 | 2 (0)| 00:00:01 |* 2 | INDEX RANGE SCAN | HIST_IDX | 3 | | 1 (0)| 00:00:01 --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=15) Sangam 14 Anju Garg 13
  • 14. Height-Balanced Histogram Non-Popular values (Non-endpoint) • Cardinality of non-popular value = (num of rows in table) * density • ID = 3 is not an endpoint • Estimated cardinality = 3 (Actual = 1) DB11g>explain plan for select * from hr.hist where id = 3; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------- Plan hash value: 4058847011 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |Cost (%CPU)| Time --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 3 | 3012 | 2 (0)| 00:00:01 | 1 | TABLE ACCESS BY INDEX ROWID| HIST | 3 | 3012 | 2 (0)| 00:00:01 |* 2 | INDEX RANGE SCAN | HIST_IDX | 3 | | 1 (0)| 00:00:01 --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access"ID"=3) Sangam 14 Anju Garg 14
  • 15. Drawbacks Of Pre-12c Histograms • Frequency histograms – Accurate but can be created only for NDV <= 254. • Height balanced histograms – May misestimate cardinality for both popular and non-popular values. – A value which is endpoint of one bucket and almost fills up another bucket value might be considered unpopular. NDV – No. of Distinct Values Sangam 14 Anju Garg 15
  • 16. Histograms in 12c • Frequency histograms can be created for up to 2048 distinct values . • Introduces two new types of histograms – Top-n-frequency histograms – Hybrid histograms Sangam 14 Anju Garg 16
  • 17. Top Frequency Histograms • Useful when a small number of distinct values dominate the data set. • Captures highly popular values and ignores statistically insignificant unpopular values. • Prerequisites  NDV > Nb  The percentage of rows occupied by the top Nb frequent values is equal to or greater than threshold p, where p is (1-(1/Nb))*100.  The ESTIMATE_PERCENT is set to AUTO_SAMPLE_SIZE in the DBMS_STATS statistics gathering procedure. • Features  One value per bucket  One value contained entirely in one bucket  Cardinality of the value = Bucket size  Variable bucket size  Precise for all the endpoints captured  Captures top Nb most popular values Sangam 14 Anju Garg 17 Nb – Number of buckets NDV – Number of distinct values
  • 18. Top Frequency Histogram • NDV (26) > Nb (20) • Threshold p for 20 buckets = (1 - (1/Nb))*100 = (1 - (1/20))*100 = 95.0 • Top 20 most popular values occupy more than 95% of rows • ESTIMATE_PERCENT = AUTO_SAMPLE_SIZE (default) • Precisely captures cardinality of top Nb (20) most popular values. • 6 (= NDV –Nb) Values occurring least no. of times are not captured. DB12c>exec dbms_stats.gather_table_stats - (ownname => 'HR', tabname => 'HIST', method_opt => 'FOR COLUMNS ID size 20', cascade => true); select table_name, column_name,histogram,num_distinct, num_buckets from dba_tab_col_statistics where table_name = 'HIST' and column_name = 'ID'; TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS ---------- --------------- --------------- ------------ ----------- HIST ID TOP-FREQUENCY 26 20 Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 18
  • 20. Top Frequency Histogram Non-Popular values (Endpoint) • ID = 15 – Occurs 5 times – Height-balanced histogram • Endpoint of one bucket - considered unpopular • misestimated 3 rows – Top Frequency histogram • Is an endpoint - one of the top 20 most frequently occurring values • accurate cardinality estimate of 5 rows DB12c>explain plan for select * from hr.hist where id = 15; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------- Plan hash value: 3950962134 --------------------------------------------------------------------------------- | Id | Operation | Name | Rows|Bytes|Cost(%CPU)|Time| --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5 |5020 | 2 (0)| 00:00:01 | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| HIST | 5 |5020 | 2 (0)| 00:00:01 |* 2 | INDEX RANGE SCAN |HIST_IDX| 5 | | 1 (0)| 00:00:01 --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=15) Sangam 14 Anju Garg 20
  • 21. Top Frequency Histogram Non-Popular values (Non-endpoint) • ID = 3 – Occurs once – Height-balanced histogram • Not an endpoint - considered unpopular • misestimated 3 rows – Top Frequency histogram • Not an endpoint – one of the 6 least frequently occurring values • accurate cardinality estimate of 1 row DB12c>explain plan for select * from hr.hist where id = 3; select * from table(dbms_xplan.display); PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------- Plan hash value: 3950962134 --------------------------------------------------------------------------------- | Id | Operation | Name |Rows| Bytes|Cost(%CPU)|Time --------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 |1004 | 2 (0)| 00:00:01 | 1 | TABLE ACCESS BY INDEX ROWID BATCHED|HIST | 1 |1004 | 2 (0)| 00:00:01 |* 2 | INDEX RANGE SCAN |HIST_IDX| 1 | | 1 (0)| 00:00:01 --------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("ID"=3) Sangam 14 Anju Garg 21
  • 22. Top Frequency Histograms: Summary • The occurrences of popular values are accurately captured at the expense of not capturing the data for least occurring values. • Accurately estimates cardinality of top Nb popular values. • Useful for cases when data set is dominated by a small no. of values. Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 22
  • 23. Hybrid Histograms • Combines characteristics of both height-balanced histograms and frequency histograms. • Pre-requisites  Nb < NDV  The criteria for top frequency histograms do not apply i.e. Percentage of rows occupied by the top Nb most popular values is less than p where p = (1-(1/Nb))*100.  The sampling percentage is AUTO_SAMPLE_SIZE. • Correctly estimates the frequency of endpoints  stores the ENDPOINT_REPEAT_ COUNT - number of times the endpoint value is repeated. • Captures more endpoints  stores all occurrences of every value in the same bucket - makes available buckets to store more endpoints. Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 23
  • 24. Hybrid Histogram • Nb (20) < NDV (26) • Threshold p for 20 buckets = p = (1 - (1/nb))*100 = (1 - (1/20))*100 = 95.0 • On deleting 20 rows with ID = 8 from table HR.HIST , it qualifies for hybrid histogram creation. • Top 20 most popular ID's occupy less than 95% of total rows. DB12c>exec dbms_stats.gather_table_stats - (ownname => 'HR',tabname => 'HIST', - method_opt => 'FOR COLUMNS ID size 20', - cascade => true); DB12c>select table_name,column_name, histogram,num_distinct,num_buckets from dba_tab_col_statistics where table_name = 'HIST' and column_name = 'ID'; TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS ---------- --------------- --------------- ------------ ----------- HIST ID HYBRID 26 20 Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 24
  • 25. Hybrid Histogram Sangam 14 Anju Garg BUCKET NUMBER ENDPOINT _VALUE 1 1 1 1 1 1 2 2 2 3 3 3 4 4 5 5 4 6 6 7 7 7 7 5 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 6 9 9 9 10 10 10 7 11 11 11 11 11 11 11 8 12 12 12 12 12 12 12 9 13 13 13 13 13 13 13 10 14 14 14 14 11 15 15 15 15 15 15 12 16 16 16 16 13 17 17 17 17 14 18 19 19 15 20 20 20 20 20 20 16 21 21 21 17 22 22 18 23 23 19 24 24 24 20 25 25 25 26 26 26 HYBRIDHISTOGRAM VALUES 25
  • 26. Hybrid Histograms: Summary • Hybrid histograms have features of both frequency and height balanced histograms. – Features similar to frequency histograms  All occurrences of a value are placed in one bucket  ENDPOINT_NUMBER stores cumulative frequency  Variable bucket size – Features similar to Height Balanced histograms  No. of values captured are less than NDV  One bucket can have multiple values.  ENDPOINT_VALUE represents the largest value in a bucket • Captures more endpoints than height balanced histograms • An improvement over height balanced histograms when data set is not dominated by a small number of values. Sangam 14 Anju Garg 26 Nb – Number of buckets NDV – No. of Distinct Values
  • 27. Hybrid Histograms : Corollary • For data without any gap in values, If a maximum of two values are stored per bucket, a hybrid histogram might even be able to accurately capture the frequency of values up to twice the number of buckets. Sangam 14 Anju Garg Total no. of entries in the bucket = BUCKET_SIZE = Difference of 2 consecutive ENDPOINT_NUMBER's Frequency of endpoint = ENDPOINT_REPEAT_COUNT Frequency of non-endpoint = (BUCKET_SIZE - ENDPOINT_REPEAT_COUNT) 27 BUCKET_SIZE ENDPOINT_REPEAT_COUNT Non-endpoint Endpoint NON-ENDPOINT FREQUENCY
  • 28. Hybrid Histograms : Corollary • Frequency of all the 26 values should be estimated accurately using hybrid histogram having 13 buckets. • Create a histogram with No. of buckets = NDV/2 = 26/2 = 13 DB12c>exec dbms_stats.gather_table_stats - (ownname => 'HR', tabname => 'HIST',method_opt => 'FOR COLUMNS ID size 13', cascade => true); DB12c> select table_name, column_name, histogram, num_distinct, num_buckets from dba_tab_col_statistics where table_name = 'HIST' and column_name = 'ID'; TABLE_NAME COLUMN_NAME HISTOGRAM NUM_DISTINCT NUM_BUCKETS ---------- --------------- --------------- ------------ ----------- HIST ID HYBRID 26 13 Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 28
  • 29. Hybrid Histograms: Corollary Sangam 14 Anju Garg DB12c>select ENDPOINT_VALUE ENDPOINT, ENDPOINT_NUMBER ENDPOINT_NO, endpoint_repeat_count rpt_cnt from dba_histograms where table_name = 'HIST' and column_name = 'ID'; ENDPOINT ENDPOINT_NO RPT_CNT -------- ----------- ------- 1 4 4 5 10 1 8 45 30 11 56 6 12 62 6 13 68 6 15 76 5 17 82 3 20 89 5 22 92 1 23 93 1 24 95 2 26 100 2 13 rows selected. ENDPOINT_VALUE : Largest value in a bucket ENDPOINT_NUMBER : Cumulative frequency ENDPOINT_REPEAT_COUNT :Frequency of endpoint BUCKET NUMBER ENDPOINT _ VALUE 1 1 1 1 1 1 2 2 2 3 4 4 5 5 3 6 6 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 4 9 9 9 10 10 11 11 11 11 11 11 11 5 12 12 12 12 12 12 12 6 13 13 13 13 13 13 13 7 14 14 14 15 15 15 15 15 15 8 16 16 16 17 17 17 17 9 18 19 20 20 20 20 20 20 10 21 21 22 22 11 23 23 12 24 24 24 13 25 25 25 26 26 26 VALUES ACTUAL HYBRID HISTOGRAM (Nb = NDV/2 ) BUCKET NUMBER ENDPOINT _ VALUE 1 1 1 1 1 2 2 2 2 3 4 4 4 3 5 6 6 6 4 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 5 9 9 9 10 10 10 6 11 11 11 11 11 11 12 12 12 12 12 12 12 7 13 13 13 13 13 13 14 14 14 14 8 15 15 15 15 15 16 16 16 16 9 17 17 17 18 18 10 19 20 20 20 20 20 20 11 21 21 22 22 12 23 24 24 24 13 25 25 25 26 26 26 HYPOTHETICAL HYBRID HISTOGRAM ( Nb = NDV/2 ) VALUES 29
  • 30. Hybrid Histograms : Corollary • Actual histogram – some of the buckets have more than two values – cardinality of some of the non endpoint values cannot be accurately determined. • Hypothetical histogram – All the buckets have two values each. – Cardinality of all (2*Nb) values can be accurately determined. Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 30
  • 31. Histograms : Further enhancements? • Hybrid histogram can be made to store a maximum of two values per bucket. • For data without any gap in values – Nb >= NDV/2 and Nb <= 2048 • Hybrid histogram will accurately estimate frequency of all the distinct values. • Half the number of buckets required as compared to frequency histogram. • Can be a substitute for frequency histogram. • Can accurately estimate the frequency of up to 4096 values. – Nb < NDV/2 or Nb > 2048 • A newer type of histogram – Top frequency hybrid histogram could be created. • Captures (2*Nb) most popular values. • Better substitute for Top frequency histogram which captures top Nb most popular values. Sangam 14 Anju Garg 31
  • 32. Conclusion • In 12c, frequency histogram can be created for NDV <= 2048 . • Top Frequency and Hybrid histograms are designed to overcome flaws of Height-Balanced histograms. • Top frequency and Hybrid histograms are created only if ESTIMATE_PERCENT = AUTO_SAMPLE_SIZE . • Top frequency histogram accurately estimates the frequencies for only top occurring values if a small number of values dominate the data set. • Hybrid histogram has features of both frequency and height balanced histograms • Hybrid histogram captures more endpoints as compared to height-balanced histograms and estimates their frequency accurately. Nb – Number of buckets NDV – No. of Distinct Values Sangam 14 Anju Garg 32
  • 34. 34