- 1. Understanding Histograms in the
Oracle Database
Robert Gaydos
R Wave Solutions
Robert.Gaydos@rwavesolutions.com
Copyright © R Wave Solutions 2008 All rights reserved
- 2. Agenda
What is a Histogram?
Types of Histograms
How to create Histograms
How to identify existing Histograms
Rules and tips when creating Histograms
Q&A
- 3. What is a Histogram?
A histogram holds the data distribution of
values within a column of a table.
– Number of occurrences for a specific value/range
– Used by CBO to optimize a query.
It is collected by using DBMS_STATS.
– By default DBMS_STATS does not collect histogram stats, only the
min/max value and # of distinct values. (This is the 9i default;
10g defaults to SIZE AUTO, which may create histograms.)
NOTE: DBMS_STATS is also used to delete histogram data.
- 4. Types of Histograms
Two types of Histograms
– frequency histograms
– height-balanced histograms
– Type of histogram is stored in the HISTOGRAM
column of the *tab_col_statistics views (USER/DBA)
– Value = 'HEIGHT BALANCED', 'FREQUENCY', or 'NONE'
- 5. Histogram and Buckets
When Histograms are created the number of
buckets can be specified.
It is this number that controls the type of
histogram created.
# Buckets = # of Rows of information.
When Distinct Values <= Buckets
– Then Frequency Histogram is created
– Else Height-Balanced Histogram is created
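The rule above can be written as a one-line decision. This is a minimal sketch of the slide's rule only; the function name `choose_histogram_type` is made up for illustration and is not an Oracle API.

```python
def choose_histogram_type(distinct_values: int, buckets: int) -> str:
    """Apply the slide's rule: enough buckets for every value -> frequency."""
    if distinct_values <= buckets:
        return "FREQUENCY"
    return "HEIGHT BALANCED"


# Five distinct values (A to E), as in the examples on the following slides.
print(choose_histogram_type(distinct_values=5, buckets=5))  # FREQUENCY
print(choose_histogram_type(distinct_values=5, buckets=4))  # HEIGHT BALANCED
```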
- 6. Frequency Histograms
Each value of the column corresponds to a
single bucket of the histogram.
Each bucket contains the number of
occurrences of that single value.
- 7. Frequency Histograms Example
Data:    A B C B C C C C C B D E
Sorted:  A B B B C C C C C C D E

Results
Bucket  Value  Count
1       A      1
2       B      3
3       C      6
4       D      1
5       E      1
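The same counts can be reproduced in a few lines. This is a sketch of the idea, not of how the database computes it; `frequency_histogram` is a hypothetical helper name.

```python
from collections import Counter

# The twelve column values from the slide, in their original order.
data = list("ABCBCCCCCBDE")


def frequency_histogram(values):
    """Return one (value, occurrences) pair per distinct value, in sorted order."""
    counts = Counter(values)
    return [(value, counts[value]) for value in sorted(counts)]


histogram = frequency_histogram(data)
for bucket, (value, count) in enumerate(histogram, start=1):
    print(f"Bucket {bucket}: {value} = {count}")
```

Each distinct value gets its own bucket, so five distinct values need five buckets.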
- 8. Height-Balanced Histograms
In a height-balanced histogram, the column
values are divided into bands so that each
band contains approximately the same
number of rows.
The useful information that the histogram
provides is where in the range of values the
endpoints fall.
- 9. Height-Balanced Example 1
Data:    A B C B C C C C C B D E
Sorted:  A B B B C C C C C C D E
Buckets = 4

Results (EPN = endpoint number, EPV = endpoint value)
EPN  EPV
0    B
1    C
2    C
3    E
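The endpoints in this example can be derived by sorting the values and reading the last value of each equal-sized band. The sketch below follows the slide's simplified numbering (EPN starting at 0 for the first band) and assumes the row count divides evenly by the bucket count; `height_balanced_endpoints` is a hypothetical name, and the database's internal numbering may differ.

```python
data = list("ABCBCCCCCBDE")  # the twelve values from the slide


def height_balanced_endpoints(values, buckets):
    """Return (EPN, EPV) pairs: the last value in each equal-sized band."""
    ordered = sorted(values)
    rows_per_bucket = len(ordered) // buckets  # assumes an even split
    return [
        (epn, ordered[(epn + 1) * rows_per_bucket - 1])
        for epn in range(buckets)
    ]


endpoints = height_balanced_endpoints(data, buckets=4)
print(endpoints)  # [(0, 'B'), (1, 'C'), (2, 'C'), (3, 'E')]
```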
- 10. Height-Balanced Example 1
Data:    A B C B C C C C C B D E
Sorted:  A B B B C C C C C C D E
Buckets = 4

Results (EPN = endpoint number, EPV = endpoint value)
EPN  EPV
0    B
1    C
2    C
3    E

The DB only stores 0-B, 2-C, 3-E.
Notice that C crosses buckets. This is how the database knows
which values are popular.
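A short sketch of the compression described on this slide: repeated endpoint values are collapsed to the row with the highest EPN, and a gap in the stored EPNs marks a popular value. The helper names are hypothetical and the numbering follows the slide, not necessarily the database's internal layout.

```python
# Endpoints from the example: (EPN, EPV) for 4 buckets.
endpoints = [(0, "B"), (1, "C"), (2, "C"), (3, "E")]


def compress_endpoints(pairs):
    """Keep one row per endpoint value, holding the highest EPN seen for it."""
    stored = []
    for epn, value in pairs:
        if stored and stored[-1][1] == value:
            stored[-1] = (epn, value)  # same value again: overwrite with later EPN
        else:
            stored.append((epn, value))
    return stored


def popular_values(stored):
    """A value is popular when its EPN jumps by more than 1 from the previous row."""
    popular = []
    previous_epn = -1
    for epn, value in stored:
        if epn - previous_epn > 1:
            popular.append(value)
        previous_epn = epn
    return popular


stored = compress_endpoints(endpoints)
print(stored)                  # [(0, 'B'), (2, 'C'), (3, 'E')]
print(popular_values(stored))  # ['C']
```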
- 11. How to Create a Histogram
Created by using DBMS_STATS.GATHER_TABLE_STATS
METHOD_OPT => 'FOR COLUMNS SIZE <# of buckets> <Column Name>'
execute dbms_stats.gather_table_stats(ownname => 'oe', tabname => 'inventories', method_opt => 'FOR COLUMNS SIZE 10 quantity_on_hand');
- 12. Identify Existing Histograms
In 10g
– HISTOGRAM column of the *tab_col_statistics views (USER/DBA)
In 9i
select owner, table_name, column_name, count(*) buckets
from dba_histograms
where endpoint_number not in (0,1)
group by owner, table_name, column_name;
(A column without a histogram has only endpoint numbers 0 and 1,
so it returns no rows here.)
- 13. Rules On When To Create Histograms
First, there are no rules of thumb that are
always correct.
When creating histograms using dbms_stats, the use of
DEGREE (for parallel) can skew the
value of density.
Running dbms_stats can drop histograms.
– Use method_opt => 'FOR ALL COLUMNS SIZE REPEAT'
- 14. Rules On When To Create Histograms
Index maintenance overhead is performed
during update, insert and delete operations.
Histogram maintenance overhead is
performed during the analyze/dbms_stats
process.
Histograms are NOT just for indexed
columns.
– Adding a histogram to an un-indexed column that is used in
a where clause can improve performance.
- 15. Histogram Example
select sum(t1.d2*t2.d3*t3.d3)
from t1, t2, t3
where t1.fk1 = t2.pk1
and t3.pk1 = t2.fk1
and t3.d2 = 35
and t1.d3 = 0; -- this column has 2 values in it
In this example, column T1.D3 has 2 values and is our
most selective criterion, but without a histogram the CBO
assumes an even distribution of 2 distinct values
(density = 0.5).
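To see why the even-distribution assumption hurts, compare the two row estimates for a skewed two-value column. The table size and value counts below are invented for illustration (the source does not give T1's row count), and the functions only sketch the general idea, not the optimizer's exact arithmetic.

```python
def estimate_without_histogram(total_rows, distinct_values):
    """Even-distribution assumption: every value gets total_rows / NDV rows."""
    density = 1 / distinct_values
    return total_rows * density


def estimate_with_frequency_histogram(value_counts, value):
    """With a frequency histogram, the count per value is known directly."""
    return value_counts.get(value, 0)


# Hypothetical skewed column: 2 distinct values, one of them very rare.
total_rows = 10_000
value_counts = {0: 4, 1: 9_996}

print(estimate_without_histogram(total_rows, distinct_values=2))  # 5000.0
print(estimate_with_frequency_histogram(value_counts, 0))         # 4
```

With an estimate of thousands of rows the optimizer has little reason to drive the join from T1; with an estimate of a handful it does.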
- 16. SQL*Plan Without Histogram
Rows     Row Source Operation
-------  ---------------------------------
      1  SORT AGGREGATE
   2088   HASH JOIN
    601    TABLE ACCESS BY INDEX ROWID
    601     INDEX RANGE SCAN OBJ#(57612)
   1000    HASH JOIN
      4     TABLE ACCESS FULL OBJ#(57604)
             filter("T1"."D3"=0)
  62500     TABLE ACCESS FULL OBJ#(57605)
- 17. Timing Without Histogram
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch        2      1.04       4.91      19013      20231          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      1.04       4.91      19013      20231          0          1

Query = 20231
- 18. Use DBMS_STATS to Add Histogram
execute dbms_stats.gather_table_stats(NULL, 'T1', estimate_percent => null, method_opt => 'FOR COLUMNS SIZE AUTO d3');
- 19. Timing With Histogram
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.01       0.00          0          0          0          0
Fetch        2      0.56       2.84       9841      10666          0          1
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        4      0.57       2.84       9841      10666          0          1

Now Query = 10666
It was Query = 20231
½ the IO
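The "½ the IO" claim can be checked directly from the two timing tables. This small sketch just divides the "total" rows; the dictionary names are illustrative.

```python
# "total" rows from the two timing tables.
without_histogram = {"elapsed": 4.91, "disk": 19013, "query": 20231}
with_histogram = {"elapsed": 2.84, "disk": 9841, "query": 10666}

# Ratio of the histogram run to the original run, per metric.
ratios = {
    metric: round(with_histogram[metric] / without_histogram[metric], 2)
    for metric in without_histogram
}
print(ratios)  # {'elapsed': 0.58, 'disk': 0.52, 'query': 0.53}
```

Disk reads and buffer gets (query) both drop to a little over half, and elapsed time to about 58%.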
- 20. Plan With Histogram
Rows     Row Source Operation
-------  ---------------------------------------
      1  SORT AGGREGATE
   2088   HASH JOIN
    601    TABLE ACCESS BY INDEX ROWID OBJ#(57606)
    601     INDEX RANGE SCAN OBJ#(57612)
   1000    TABLE ACCESS BY INDEX ROWID OBJ#(57605)
   1005     NESTED LOOPS
      4      TABLE ACCESS FULL OBJ#(57604)
              filter("T1"."D3"=0)
   1000      INDEX RANGE SCAN OBJ#(57609)
              access("T1"."FK1"="T2"."PK1")
- 21. Histogram Opportunities
Any column with skewed data that is used in a
where clause
Columns that are not queried all the time
Reduced overhead for insert, update, delete
(unlike an index, a histogram is maintained only when stats are gathered)
- 22. When Not To Create A Histogram
No rule of thumb is ever perfect.
Do not create histograms on:
– Columns with evenly distributed data
– Columns that are not queried
– Every column of every table
– The PK of a table
- 23. Creating / Maintaining Histograms
Table maintenance procedures are
important.
– Accidentally deleting histogram data is possible.
– Histogram data is ONLY Collected/Maintained
when stats are collected (DBMS_STATS).
– Changing data in the table does not change data
stored in the histogram.
- 24. Maintaining Histograms
Always check if histogram exists on table
before DBMS_STATS is run.
– Do not forget to note the # of buckets.
– Create tables to hold information.
Use METHOD_OPT => 'FOR ALL COLUMNS SIZE REPEAT'
to prevent deletion of histogram data.
If the distribution ratio changes, recollect the stats.
- 25. Creating Histograms
What is the best way to start out creating
histograms? Popular ways to create:
– Salt and Pepper - Create them as you need them.
– Little Hammer - DBMS_STATS using
METHOD_OPT SIZE AUTO
(should NOT be used on a freshly restarted database)
– Big Hammer - DBMS_STATS using METHOD_OPT SIZE SKEWONLY
– Over Kill Hammer - FOR ALL COLUMNS
- 26. Other Comments
It is the Density of a column that the
optimizer considers when accessing a table.
Changing the number of buckets changes
the density.
More Buckets DOES NOT mean better
density.
Using degree != 1 can affect density.
- 27. Review
What is a Histogram?
Two types of Histograms in the database
How to create Histograms
Data is collected by running DBMS_STATS.
Different methods to create histograms
Bucket size affects density of column.
Density of the column affects CBO.
- 28. Thank You for Your Time