This session introduces a less familiar audience to the concept of Oracle database statistics: why statistics are necessary and how the Oracle Cost-Based Optimizer (CBO) uses them.
10. What did we just do?
• Gathered:
– table statistics on table T1
– column statistics for every column
– index statistics on every index defined on T1
– (sub)partition statistics
– histograms on subset of columns (*)
• Next, we’ll cover the stats that matter to the CBO
1/29/17 10
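For reference, a call along these lines would gather everything listed above in one pass. This is a minimal sketch, not necessarily the exact call used in the session: it assumes T1 lives in the current schema and relies on the DBMS_STATS defaults for method_opt and granularity, which let Oracle decide which columns get histograms and which partition levels are covered.

```sql
-- Sketch only: gathers table, column, index and (sub)partition statistics on T1.
-- Assumption: T1 is owned by the current user; method_opt and granularity
-- are left at their defaults.
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'T1',
    cascade => true);   -- also gather statistics on every index defined on T1
end;
/
```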
12. Table statistics – FTS cost
select table_name,num_rows,blocks from user_tables where table_name='T1';
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 16378
explain plan for select * from t1;
select * from table(dbms_xplan.display);
Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 920K| 100M| 4463 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 920K| 100M| 4463 (1)| 00:00:01 |
----------------------------------------------------------------------------------
13. Table statistics – FTS cost
select table_name,num_rows,blocks from user_tables where table_name='T1';
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 30000
explain plan for select * from t1;
select * from table(dbms_xplan.display);
Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 920K| 100M| 8156 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 920K| 100M| 8156 (1)| 00:00:01 |
----------------------------------------------------------------------------------
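The only difference between the two FTS slides is BLOCKS (16378 vs. 30000), and the full-scan cost moves with it (4463 vs. 8156). One way to reproduce the experiment is to overwrite the statistic by hand. The following is a sketch, assuming the current schema owns T1 and that the value was set manually rather than by actually growing the table:

```sql
-- Assumption: BLOCKS is changed by setting the statistic, not by loading data.
begin
  dbms_stats.set_table_stats(
    ownname => user,
    tabname => 'T1',
    numblks => 30000);   -- only BLOCKS is overwritten; NUM_ROWS stays as gathered
end;
/

-- Re-check the dictionary value and the resulting full table scan cost
select table_name, num_rows, blocks from user_tables where table_name = 'T1';
explain plan for select * from t1;
select * from table(dbms_xplan.display);
```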
14. Table statistics – Cardinality
select table_name,num_rows,blocks from user_tables where table_name='T1';
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 16378
explain plan for select * from t1;
select * from table(dbms_xplan.display);
Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 920K| 100M| 4463 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 920K| 100M| 4463 (1)| 00:00:01 |
----------------------------------------------------------------------------------
15. Table statistics – Cardinality
select table_name,num_rows,blocks from user_tables where table_name='T1';
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 1 16378
explain plan for select * from t1;
select * from table(dbms_xplan.display);
Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1| 115| 4442 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 1| 115| 4442 (1)| 00:00:01 |
----------------------------------------------------------------------------------
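On the two Cardinality slides only NUM_ROWS changes (920560 vs. 1): the Rows estimate follows it, while the cost barely moves (4463 vs. 4442) because BLOCKS is still 16378. A sketch of this kind of experiment, assuming T1 is owned by the current user and that the statistic was set by hand, followed by a regather to put real statistics back:

```sql
-- Assumption: NUM_ROWS is faked via DBMS_STATS rather than by deleting rows.
begin
  dbms_stats.set_table_stats(
    ownname => user,
    tabname => 'T1',
    numrows => 1);       -- only NUM_ROWS is overwritten; BLOCKS is untouched
end;
/

explain plan for select * from t1;
select * from table(dbms_xplan.display);

-- Restore realistic statistics once the experiment is done
begin
  dbms_stats.gather_table_stats(ownname => user, tabname => 'T1');
end;
/
```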
16. Column Statistics
• Optimizer uses
– Number of distinct values (NDV)
• [ALL|DBA|USER]_TAB_COLS.NUM_DISTINCT
• Used to determine selectivity (no histogram present)
– Number of NULLs
• [ALL|DBA|USER]_TAB_COLS.NUM_NULLS
• Used to estimate how many rows we are dealing with
– Min/Max value
• [ALL|DBA|USER]_TAB_COLS.[LOW|HIGH]_VALUE
• Used to determine whether a predicate value is in or out of range
17. Column statistics – NoHgrm
select column_name, num_distinct, num_nulls, histogram from user_tab_cols
where table_name = 'T1' and column_name like '%OBJECT_ID';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OBJECT_ID 93192 0 NONE
DATA_OBJECT_ID 8426 835930 NONE
explain plan for select * from t1 where object_id = 1234;
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1150 | 4453 (1)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| T1 | 10 | 1150 | 4453 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("OBJECT_ID"=1234)
filter("OBJECT_ID"=1234)
Let’s do the math!
Total rows: 920560
NDV: 93192
920560 * 1/93192 ~= 10
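The same arithmetic can be checked directly in SQL. This sketch hard-codes the figures from the slide and uses the basic rule for an equality predicate on a column with no histogram and no NULLs, where selectivity is taken as 1/NDV:

```sql
-- Estimated rows = NUM_ROWS * (1 / NUM_DISTINCT), values copied from the slide
select round(920560 * (1 / 93192)) as est_rows
from   dual;
-- est_rows = 10, matching the Rows column in the plan
```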
18. Column statistics – NoHgrm
select column_name, num_distinct, num_nulls, histogram from user_tab_cols
where table_name = 'T1' and column_name like '%OBJECT_ID';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OBJECT_ID 93192 0 NONE
DATA_OBJECT_ID 8426 835930 NONE
explain plan for select * from t1 where data_object_id = 1234;
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1150 | 4454 (1)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| T1 | 10 | 1150 | 4454 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("DATA_OBJECT_ID"=1234)
filter("DATA_OBJECT_ID"=1234)
Let’s do the math!
Total rows: 920560
Total NULLs: 835930
NDV: 8426
(920560 - 835930)/8426 ~= 10
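Rather than typing the numbers in, the estimate can be derived from the dictionary for both columns at once. This is a sketch of the simple no-histogram rule, (NUM_ROWS - NUM_NULLS) / NUM_DISTINCT; the optimizer's real calculation has more special cases than this query covers:

```sql
-- Equality-predicate estimate per column, NULLs excluded, no histogram.
-- nullif() avoids a divide-by-zero on columns without gathered statistics.
select c.column_name,
       t.num_rows,
       c.num_nulls,
       c.num_distinct,
       round((t.num_rows - c.num_nulls) / nullif(c.num_distinct, 0)) as est_rows
from   user_tables   t
join   user_tab_cols c on c.table_name = t.table_name
where  t.table_name = 'T1'
and    c.column_name like '%OBJECT_ID'
and    c.histogram = 'NONE';
```

With the statistics shown on the slides, both OBJECT_ID and DATA_OBJECT_ID come out at about 10 rows.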
19. Column statistics – Min/Max
cook_raw(low_value,'NUMBER') low_v,cook_raw(high_value, 'NUMBER') high_v
COLUMN_NAME NUM_DISTINCT LOW_VALU HIGH_VAL
------------------------------ ------------ -------- --------
OBJECT_ID 93192 2 99953
DATA_OBJECT_ID 8426 0 99953
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
explain plan for select * from t1 where object_id = 99953;
|* 1 | TABLE ACCESS STORAGE FULL| T1 | 10 | 1150 | 4453 (1)| 00:00:01 |
explain plan for select * from t1 where object_id = 150000;
|* 1 | TABLE ACCESS STORAGE FULL| T1 | 5 | 575 | 4453 (1)| 00:00:01 |
The farther the predicate value lies outside the known range, the lower the estimate
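The drop from 10 to 5 is consistent with the commonly described linear decay, where the in-range estimate is prorated by how far the value sits past HIGH_VALUE relative to the width of the range (HIGH_VALUE - LOW_VALUE). The formula below is that approximation, stated here as an assumption rather than something the slide spells out, with the slide's numbers plugged in:

```sql
select round(
         (920560 / 93192)                          -- in-range estimate, about 9.88
         * (1 - (150000 - 99953) / (99953 - 2))    -- prorated by distance past HIGH_VALUE
       ) as est_rows
from   dual;
-- est_rows = 5, matching the plan for object_id = 150000
```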
20. Column Statistics
• Optimizer also uses
– Density
• Not stored in the dictionary (the old density was, the new one is not)
• Used for unpopular value selectivity
– Histogram
• [ALL|DBA|USER]_TAB_COLS.LOW_VALUE
• [ALL|DBA|USER]_TAB_COLS.HIGH_VALUE
• [ALL|DBA|USER]_TAB_HISTOGRAMS
• Used for popular value selectivity
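To see what the dictionary holds for these items, queries like the following can be used. This is only a sketch: OBJECT_ID is picked as an illustrative column, and the DENSITY shown is the stored (old) value, not the one the optimizer may compute at parse time.

```sql
-- Per-column summary: is there a histogram, how many buckets, stored density
select column_name, num_distinct, density, histogram, num_buckets
from   user_tab_cols
where  table_name = 'T1'
order  by column_name;

-- Histogram endpoints for one column (illustrative choice of column)
select column_name, endpoint_number, endpoint_value
from   user_tab_histograms
where  table_name  = 'T1'
and    column_name = 'OBJECT_ID'
order  by endpoint_number;
```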
25. Index Statistics
• Optimizer uses
– Blevel
• [ALL|DBA|USER]_INDEXES.BLEVEL
• Used to estimate how expensive it is to locate the first leaf block
– Number of leaf blocks (LB)
• [ALL|DBA|USER]_INDEXES.LEAF_BLOCKS
• Used to estimate how many index leaf blocks to read
– Clustering Factor (CLUF)
• [ALL|DBA|USER]_INDEXES.CLUSTERING_FACTOR
• Used to estimate how many table blocks to read
– Distinct Keys (DK)
• [ALL|DBA|USER]_INDEXES.DISTINCT_KEYS
• Used to help with data correlation
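The clustering factor is easiest to judge next to the table's own statistics: a value close to the table's BLOCKS generally means the index order matches the table order well, while a value close to NUM_ROWS means nearly every row visit hits a different block. A sketch that puts the four index statistics side by side with those two table figures, assuming the indexes of interest are on T1 in the current schema:

```sql
select i.index_name,
       i.blevel,
       i.leaf_blocks,
       i.distinct_keys,
       i.clustering_factor,
       t.blocks   as table_blocks,   -- a good CLUF is close to this
       t.num_rows as table_rows      -- a bad CLUF is close to this
from   user_indexes i
join   user_tables  t on t.table_name = i.table_name
where  i.table_name = 'T1';
```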
27. Index Statistics
select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
from user_indexes where index_name = 'T1_IDX';
INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
----------- ---------- ----------- ------------- -----------------
T1_IDX 2 2039 92056 920530
explain plan for select * from t1 where object_id = 1234;
-----------------------------------------------------------------------------
| Id | Operation |Name |Rows | Bytes|Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10| 1150| 13 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED|T1 | 10| 1150| 13 (0)|
|* 2 | INDEX RANGE SCAN |T1_IDX| 10| | 3 (0)|
-----------------------------------------------------------------------------
2 - access("OBJECT_ID"=1234)
DISTINCT_KEYS is 100% accurate; NUM_DISTINCT is approximated
If CLUF ~= the number of rows in the table, the index is inefficient
Cost jumps by 10 for 10 rows (from 3 to 13) as a consequence of the bad CLUF
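The costs in this plan line up with the widely cited approximation for an index range scan: BLEVEL + ceil(LEAF_BLOCKS * selectivity) for the index part, plus ceil(CLUSTERING_FACTOR * selectivity) for the table part. The sketch below plugs in the numbers from the slide; using 1/NUM_DISTINCT (93192) as the selectivity is an assumption, and the real optimizer calculation includes further adjustments.

```sql
select 2 + ceil(2039 * (1 / 93192))     as index_cost,   -- blevel + leaf blocks visited
       ceil(920530 * (1 / 93192))       as table_cost,   -- table blocks visited via CLUF
       2 + ceil(2039 * (1 / 93192))
         + ceil(920530 * (1 / 93192))   as total_cost
from   dual;
-- index_cost = 3, table_cost = 10, total_cost = 13
```

Because CLUF (920530) is almost equal to NUM_ROWS (920560), the table part costs about one block visit per row returned, which is where the extra 10 comes from.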