SlideShare una empresa de Scribd logo
1 de 107
Master Product Manager
Oracle Database
February 2020
Maria Colgan
Best Practices For Managing
Optimizer Statistics
@SQLMaria
Part 2
Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes
only, and may not be incorporated into any contract. It is not a commitment to deliver any material,
code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, timing, and pricing of any features or functionality described for Oracle’s
products may change and remains at the sole discretion of Oracle Corporation.
2
Confidential – © 2020 Oracle
3
• What statistics do I actual need?
• How should I gather statistics?
• What sample size should I use?
• When should I gather statistics?
• Should I use the auto stats job?
• What statistics don’t I need?
• How do I speed up statistics gathering?
Common Statistics Questions and Fears
4
• Best practice recommendations work for ~80% of systems 90% of the time
• There are always exceptions
• Your goal should be to provide the optimizer a representative set of stats
Warning on Best Practices
• Indicates a recommendation
• Indicates a warning of potential size effects
• To help you navigate this presentation:
DATA DICTIONARY
OPTIMIZER STATISTICS
Index Table Column System
PROMO_PK Index
PROMOTIONS Table
Execution
plan
Optimizer
CPU & IO
Optimizer Statistics
PROMO_I
D
PROMO_NAME … PROMO_DATE
1 Promo_1 … 15-NOV-98
2 Promo_1 … 31-DEC-98
6
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 Other Types of Statistics
• Oracle automatically collect statistics for all database objects, which are missing
statistics or have stale statistics
• The stats gathering AutoTask runs during a predefined maintenance window
• Internally prioritizes the database objects
• Both user schema and dictionary tables
• Objects that need updated statistics most are processed first
• Controlled by DBMS_AUTO_TASK_ADMIN package or Enterprise Manager
• Monitor using DBA_AUTOTASK_* views
Automatic Statistics Gathering
How to gather statistics
and fixed object stats from 12c onwards
• Starting in 12c there is a new Multitenant architecture
• Each PDB has its own stats gathering autotask &
its own maintenance window (default all same time)
• From 12.2 onwards use AUTOTASK_MAX_ACTIVE_PDBS to control how many
PDBs can run autotasks concurrently (Default 2)
• Use ENABLE_AUTOMATIC_MAINTENANCE_PDB to disable the maintenance
window at the PDB level (Default TRUE)
Automatic Statistics Gathering in a PDB environment
How to gather statistics
CRMPricing Retail
PDB
CDB
NOTE: From 20c onward Multitenant architecture is the only supported architecture
ANALYZE command is deprecated
• DO NOT USE
• Only good for row chaining
The GATHER_*_STATS procedures take 13 parameters
• Ideally you should only set the first 2-3 parameters
- SCHEMA NAME
- TABLE NAME
- PARTITION NAME
Use DBMS_STATS Package
How to gather statistics
Your gather statistics commands should be this simple
BEGIN
dbms_stats.Gather_database_stats();
END; /
BEGIN
dbms_stats.Gather_schema_stats('SH');
END; /
BEGIN
dbms_stats.Gather_table_stats('SH', 'SALES');
END; /
Use DBMS_STATS Package
How to gather statistics
Occasionally default parameter values may need to change
For example - features not automatically on by default
• Incremental Statistics
- Ability to accurate generate global statistics from partition level statistics
- Controlled by the parameter INCREMENTAL (default is FALSE)
• Concurrent Statistics Gathering
- Ability to gather statistics on multiple objects concurrently under a GATHER_SCHEMA_STATS
command
- Controlled by the parameter CONCURRENT (default is FALSE)
Changing default parameter values for gathering statistics
How to gather statistics
Can change the default value at the global level
• DBMS_STATS.SET_GLOBAL_PREFS
• This changes the value for all existing objects and any new objects
BEGIN
dbms_stats.Set_global_prefs('INCREMENTAL', 'TRUE');
END;
/
Can change the default value at the table level
• DBMS_STATS.SET_TABLE_PREFS
Changing default parameter values for gathering statistics
How to gather statistics
Can change the default value at the schema level
• DBMS_STATS.SET_SCHEMA_PREFS
• Current objects in the schema only
• New objects pick up global preferences
Can change the default value at the database level
• DBMS_STATS.SET_DATABASE_PREFS
• Current objects in the Database only
• New objects pick up global preferences
Changing default parameter values for gathering statistics
How to gather statistics
The following parameter defaults can be changed:
• APPROXIMATE_NDV_ALGORITHM
• AUTO_STATS_EXTENSIONS
• AUTOSTATS_TARGET*
• CASCADE
• CONCURRENT*
• DEGREE
• ESTIMATE_PERCENT
• GLOBAL_TEMP_TABLE_STATS*
• GRANULARITY
• INCREMENTAL
• INCREMENTAL_LEVEL**
• INCREMENTAL_STALENESS
• METHOD_OPT
• NO_INVALIDATE
• OPTIONS
• PERFERENCE_OVERRIDES_PARAMETER
• PUBLISH
• STALE_PERCENT
• STAT_CATEGORY*
• TABLE_CACHED_BLOCKS
• WAIT_TIME_TO_UPDATE_STATS*
* Parameters that can only be set at a Global level **Parameter that can only be set at a table level
Changing default parameter values for gathering statistics
How to gather statistics
Hierarchy for parameter values
How to gather statistics
DBMS_STATS.GATHER_*_STATS
Parameter hierarchy
Parameter values explicitly specified in the
gather statistics command
Table level preferences
Global preferences
# 1 most commonly asked question
“What sample size should I use?”
Controlled by ESTIMATE_PRECENT parameter
From 11g onwards use default value AUTO_SAMPLE_SIZE
• New hash-based algorithm
• Speed of a 10% sample
• Accuracy of 100% sample
Sample size
How to gather statistics
More info in the following paper http://dl.acm.org/citation.cfm?id=1376721
HowAUTO_SAMPLE_SIZE works
How to gather statistics
TABLE
In-memory hash table
space for ~ 16K distinct values
Apply hash function to
each value
In-memory hash table
space for ~ 16K distinct values
Split Level 0
Apply hash function
to each value
Split Level 1
TABLE
How AUTO_SAMPLE_SIZE works
How to gather statistics
Split Level 2
NDV = 2^split_level
TABLE
How AUTO_SAMPLE_SIZE works
How to gather statistics
Speed of a 10% sample
Accuracy of 100% sample
HowAUTO_SAMPLE_SIZE performs
How to gather statistics
Run Num AUTO_SAMPLE_SIZE 10% SAMPLE 100% SAMPLE
1 00:02:21.86 00:02:31.56 00:08:24.10
2 00:02:38.11 00:02:49.49 00:07:38.25
3 00:02:39.31 00:02:38.55 00:07:37.83
Column
Name
NDV with
AUTO_SAMPLE_SIZE
NDV with
10% SAMPLE
NDV with
100% SAMPLE
C1 59852 31464 60351
C2 1270912 608544 1289760
C3 768384 359424 777942
21
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 Other Types of Statistics
By default the following basic table & column statistic are collected
either during a CTAS or IAS command* or via a
DBMS_STATS.GATHER_*_STATS command :
• Number of Rows++
• Number of blocks++
• Average row length
• Min / Max values of a column++
• Number of distinct values
• Number of nulls in column
What basic statistics to collect
*Online statistics gather possible from Oracle Database 12cR1
++ Real-time statistics gather possible from 19c onwards on Exadata & Cloud
Table Statistics
Index statistics are automatically gathered during creation and maintained
by DBMS_STATS.GATHER_*_STATS commands
Index statistics include:
• Number of leaf blocks
• Branch Levels
• Clustering factor
Basic index statistics collected by Oracle
Root Block
Branch Block
Leaf Block
Index Statistics
Basic index statistics collected by Oracle
Root Block
Branch Block
Leaf Block
Table
Database
Block
Database
Block
Database
Block
Low Clustering
Factor
Index statistics are automatically gathered during creation and maintained
by DBMS_STATS.GATHER_*_STATS commands
Index statistics include:
• Number of leaf blocks
• Branch Levels
• Clustering factor
Index Statistics
Basic index statistics collected by Oracle
Root Block
Branch Block
Leaf Block
Table
Database
Block
Database
Block
Database
Block
High Clustering
Factor
Index statistics are automatically gathered during creation and maintained
by DBMS_STATS.GATHER_*_STATS commands
Index statistics include:
• Number of leaf blocks
• Branch Levels
• Clustering factor
Index Statistics
• Histograms tell the Optimizer about the data distribution in a Column
• Creation controlled by METHOD_OPT parameter
• Default create histogram on any column that has been used in the WHERE
clause or GROUP BY of a statement and has a data skew
• Relies on column usage information gathered at compilation time and stored
in SYS.COL_USAGE$
• Four types of histograms
Histograms
What basic statistics to collect
• Height balanced
• Hybrid
• Frequency
• Top-Frequency
• A frequency histogram is only created if the number of distinct values in a column
(NDV) is less than 254 values
• Each distinct value will have it own bucket
Frequency histograms (FREQUENCY)
Histograms
Frequency histogram
Step 1: SELECT job_id FROM employees ORDER BY job_id;
Creating a frequency histogram
JOB_ID
==========
AC_ACCOUNT
AC_MGR
AD_ASST
AD_PRES
AD_VP
AD_VP
FI_ACCOUNT
FI_ACCOUNT
:
Step2 : Allocate a histogram buckets for each pf the job_ids listed
Creating a frequency histogram
Bucket 1 has end point AC_ACCOUNT
Bucket 2 has end point AC_MGR
Bucket 3 has end point AD_ASST
Bucket 4 has end point AD_PRES
Bucket 5 has end point AD_VP
Bucket 6 has end point AD _VP
Bucket 7 has end point FI_ACCOUNT
Bucket 8 has end point FI_ACCOUNT
JOB_ID
==========
AC_ACCOUNT
AC_MGR
AD_ASST
AD_PRES
AD_VP
AD_VP
FI_ACCOUNT
FI_ACCOUNT
:
Take Note of the Bucket # that
have duplicated end point
values
Step3 : Compress duplicate buckets to reduce the bucket count to 254 or less
Creating a frequency histogram
Bucket 1 has end point AC_ACCOUNT
Bucket 2 has end point AC_MGR
Bucket 3 has end point AD_ASST
Bucket 4 has end point AD_PRES
Bucket 6 has end point AD _VP
Bucket 11 has end point FI_ACCOUNT
JOB_ID
==========
AC_ACCOUNT
AC_MGR
AD_ASST
AD_PRES
AD_VP
FI_ACCOUNT
:
NOTE: the column named endpoint_number in USER_HISTOGRAMS is the bucket number
How Optimizer uses frequency histograms
Histograms
SELECT *
FROM employee
WHERE job_id = 'AD_VP';
Cardinality estimate = #rows X # buckets value is in
total # of buckets
107 X 2 2
107
= =
EmployeeTable
LAST_NAM
E
EMP_ID JOB_ID
GEITZ 206 AC_ACCT
HIGGINS 205 AC__MGR
WHALEN 200 AD_ASST
KING 100 AD_PRES
KOCHHAR 101 AD_VP
DE HAAN 102 AD_VP
Number of buckets calculated by finding the bucket number for value and subtracting the previous bucket number (6-4 = 2)
ATop-Frequency histogram created if less than 254 distinct values account for
99.9% of the entries in a column & AUTO_SAMPLE_SIZE is used
Top-Frequency Histograms - Starting in 12c
Histograms
Frequency histogramTop-Frequency histogram
1
• Traditionally a frequency histogram is only created if NDV < 254
• But if a small number of values occupies most of the rows (>99% rows)
• Creating a frequency histograms on that small set of values is very useful even
though NDV is greater than 254
• Ignores the unpopular values to create a better-quality histogram for popular
values
• Built using the same technique used for frequency histograms
• Only created with AUTO_SAMPLE_SIZE
Top Frequency (TOP-FREQUENCY)
Histograms
• Imagine we have store that sells Christmas decorations
• Majority of our sales take place from Nov – Jan
• But some small number of sales are made throughout the year
Top Frequency Histogram Example
TIME_ID column
perfect candidate for
top-frequency
histogram
• A height balanced histogram is created if the number of distinct values
in a column (NDV) is greater than 254 values
• Goal is to put an even number of rows in each of the 254 buckets
Height balanced histograms (HEIGHT BALANCED)
Histograms
Height balanced histogram
Step 1: SELECT cust_city_id FROM customers ORDER BY cust_city_id;
Creating a height-balance histogram
Row count CUST_CITY_ID
1 51040
2 51040
: :
219 51043
: :
5256 51166
: :
5475 51166
: :
55500 52531
Step 2: Assign an equal number of rows per bucket
Creating a height-balance histogram
Row count CUST_CITY_ID
1 51040
2 51040
: :
219 51043 Bucket 1 has end point 51043
: :
5256 51166 Bucket 24 has end point 51166
: :
5475 51166 Bucket 25 has end point 51166
: :
55500 52531 Bucket 254 has end point 52531
Number of rows per
bucket:
55,500 / 254 = 218.5
Step 3: If endpoint of 1st bucket is not min value add 0 bucket
Creating a height-balance histogram
Row count CUST_CITY_ID
1 51040 Bucket 0 has end point 51040
2 51040
: :
219 51043 Bucket 1 has end point 51043
: :
5256 51166 Bucket 24 has end point 51166
: :
5475 51166 Bucket 25 has end point 51166
: :
55500 52531 Bucket 254 has end point 52531
But now we have 255
buckets instead of
254
How do we reduce it?
Step 4: Compress duplicate buckets
Creating a height-balance histogram
Row count CUST_CITY_ID
1 51040 Bucket 0 has end point 51040
2 51040
: :
219 51043 Bucket 1 has end point 51043
: :
: :
5475 51166 Bucket 25 has end point 51166
: :
55500 52531 Bucket 254 has end point 52531
Information on histograms found in USER_HISTOGRAMS
Monitoring histograms
Not all 254 buckets used due
to compression step
Bucket 16 is missing because
buckets 16 & 17 had the same
endpoint value
A hybrid balanced histogram is created if the number of distinct values in a
column (NDV) is greater than 254 values & AUTO_SAMPLE_SIZE is used
Hybrid Histograms (HYBRID) - Starting in 12c
Histograms
Hybrid histogram
1
• Like height balanced histogram hybrids are created if the NDV >254
• Store the actual frequencies of bucket endpoints in histograms
• No value allowed to spill over multiple buckets
• More endpoint values can be squeezed in a histogram
• Achieves the same effect as increasing the # of buckets
• Only created with AUTO_SAMPLE_SIZE
Hybrid Histograms (HYBRID) - Starting in 12c
Histograms
Step 1: SELECT row_num, time_id FROM sales ORDER BY 2;
Hybrid Histogram Example
ROWNUM TIME_ID
1 02-JAN-98
2 03-JAN-98
3 05-JAN-98
4 05-JAN-98
5 06-JAN-98
6 09-JAN-98
7 09-JAN-98
8 09-JAN-98
9 10-JAN-98
10 10-JAN-98
: :
There are 960 rows in the sales table
Automatically created histograms have 254
buckets
That is 3.77 rows per bucket
ROWNUM TIME_ID BUCKET INFORMATION FREQUENCY OF ENDPOINT
1 02-JAN-98 Bucket 1 has end point 02-JAN-98 1
2 03-JAN-98
3 05-JAN-98
4 05-JAN-98
5 06-JAN-98
6 09-JAN-98
7 09-JAN-98
8 09-JAN-98
9 10-JAN-98
10 10-JAN-98
: :
Step 2: Assign roughly an equal number of rows per bucket
Hybrid Histogram Example
Bucket 5 has end point 06-JAN-98 1
Bucket 10 has end point 10-JAN-98 2
• No two buckets have the same
endpoint
• Frequency of endpoint values
recorded in new column called
Endpoint Repeat Count
Monitoring
Hybrid Histogram Example
:
Number of
occurrences of
the endpoint
value in the
sample
• Popular values in height-balance are the endpoint for two or more buckets
• Popular values in hybrid are the endpoint of a bucket
• Formulas used are:
Height-balanced: Number of endpoint buckets X #rows in table
Total number of buckets
Hybrid: Endpoint Frequency X #rows in table
Max buckets
Optimizer used two different formulas depend on the popularity of the value
How the Optimizer uses Height-balanced & Hybrid histograms
• Non-popular value means values that are the endpoint for only one bucket on a
height-balance histogram or are not an endpoint at all
• Formula used is:
DENSITY X number of rows in the table
NOTE: Density from 10.2.0.4 is calculated on the fly based on histogram information
and is not the value show in USER_HISTOGRAMS
Optimizer used two different formulas depend on the popularity of the value
How the Optimizer uses Height-balanced & Hybrid histograms
Popular values use Height-Balance histogram information
How the Optimizer uses histograms
SELECT Count(*)
FROM sales
WHERE cust_id = 22300;
COUNT(*)
-------------
5
Cardinality estimate:
Number of endpoint buckets X number of rows in table
total number of buckets
2 X 960 = 7.71 = 8
249
Popular values use Hybrid histogram information
How the Optimizer uses histograms
SELECT Count(*)
FROM sales
WHERE cust_id = 22810;
COUNT(*)
-------------
5
Cardinality estimate:
Endpoint Frequency X number of rows in the table
Max buckets
5 X 960 = 5
960
Non-popular values use density
How the Optimizer uses histograms
SELECT Count(*)
FROM sales
WHERE cust_id = 200;
COUNT(CUST_ID)
--------------
2
Cardinality estimate:
DENSITY X number of rows
0.00104 X 960
Histograms have a bad reputation
Why people hate histograms
How to gather statistics
WHY?
They can cause instability
They can increase the time
to gather statistics
Two main cause of instability with Histograms
1. Bind peeking interacts with histograms
2. Nearly popular values
Why people hate histograms
How to gather statistics
Adaptive Cursor Sharing introduced in 11gR2 helps relieve a lot
of the problems caused by bind peeking and histograms
• Share the plan when binds values are “equivalent”
• Plans are marked with selectivity range
• If current bind values fall within range, they use the same plan
• Create a new plan if binds are not equivalent
• Generating a new plan with a different selectivity range
• Controlled by init.ora parameter _optim_peek_user_binds
• Monitoring - V$SQL has 2 new columns
• IS_BIND_SENSITIVE - Optimizer believes the plan may depend on the value of bind
• IS_BIND_AWARE - Multiple execution plans exist for this statement
• Can force bind awareness via the BIND_AWARE hint
Adaptive Cursor Sharing
Solutions for bind peeking and histograms
Applications that only have statements with binds
• Drop histogram using DBMS_STATS.DELETE_COL_STATS
• Use DBMS_STATS.SET_PARM to change default setting for method_opt parameter to prevent
histogram from being created
• Re-gather statistics on the table without histogram
Applications that have statements with bind and literals
• Switch off bind peeking _optim_peek_user_binds = false
Prior to 11gR2
Solutions for bind peeking and histograms
Two main cause of instability with Histograms
1. Bind peeking interacts with histograms
2. Nearly popular values
Why people hate histograms
How to gather statistics
Nearly popular values are classified as non-popular values that means
their cardinality estimates is based on a density calculation, which is
not always accurate enough for them
• To get an accurate cardinality estimate use dynamic sampling
• OR …
Nearly popular value means the value is classified as non-popular but the density
calculation is not accurate for them
Cardinality estimates is based on a density calculation
Nearly popular values
Same estimate used as for non-popular. Here
density is not good enough to get accurate
cardinality estimate
To get an accurate cardinality estimate for nearly popular values use dynamic
sampling
Solution 1 dynamic sampling
Nearly popular values
If hints aren’t possible and stability is a priority drop the histogram
BEGIN
dbms_stats.Delete_column_stats(ownname=>'SH', tabname=>'CUSTOMERS', -
colname=>'CUST_CITY_ID', col_stat_type=>'HISTOGRAM');
END;
/
BEGIN
dbms_stats.Set_table_prefs('SH', 'CUSTOMERS', 'METHOD_OPT', -
'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 CUST_CITY_ID');
END;
/
Solution 2 Drop the Histogram
Nearly popular values
59
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 OtherTypes of Statistics
Two types of Extended Statistics
• Column groups statistics
- Useful when multiple column from the same table are used in where clause predicates
• Expression statistics
- Useful when a column is used as part of a complex expression in where clause predicate
Can be manually or automatically created
Automatically maintained when statistics are gathered on the table
When table and column statistics are not enough
Additional statistics
61
Extended Statistics – column group statistics
SELECT *
FROM vehicles
WHERE model = '530xi'
AND color = 'RED’;
MAKE MODEL COLOR
------ ------ ------
BMW 530xi RED
Vehicles Table
(12 ROWS)
MAKE MODEL COLOR
BMW 530xi RED
BMW 530xi BLACK
BMW 530xi SILVER
TESLA P85D GRAY
: : :
PORSCHE 911 RED
Cardinality #ROWS * 1 * 1 12 * 1 * 1 1
NDV c1 NDV c2 4 3
= = =
62
Extended Statistics – column group statistics
SELECT *
FROM vehicles
WHERE model = '530xi'
AND make = ’BMW’;
MAKE MODEL COLOR
------ ------ ------
BMW 530xi RED
Vehicles Table
(12 ROWS)
MAKE MODEL COLOR
BMW 530xi RED
BMW 530xi BLACK
BMW 530xi SILVER
TESLA P85D GRAY
: : :
PORSCHE 911 RED
Cardinality #ROWS * 1 * 1 12 * 1 * 1 1
NDV c1 NDV c2 4 3
= = =
Create extended statistics on the MODEL & MAKE columns using the
CREATE_EXTENDED_STATS function in the DBMS_STATS package
New Column with
system generated
name
Extended Statistics – column group statistics
64
Extended Statistics – column group statistics
SELECT *
FROM vehicles
WHERE model = '530xi'
AND make = ’BMW’;
MAKE MODEL COLOR
------ ------ ------
BMW 530xi RED
Vehicles Table
(12 ROWS)
MAKE MODEL COLOR
BMW 530xi RED
BMW 530xi BLACK
BMW 530xi SILVER
TESLA P85D GRAY
: : :
PORSCHE 911 RED
Cardinality calculated using column group statistics
65
Extended Statistics – expression statistics solution
SELECT *
FROM Customers
WHERE UPPER(CUST_NAME) = 'SMITH';
Customers Table
(55,500 ROWS)
CUST_ID NAME EMAIL
00001 Smith Smith@...
00002 Jones Jones@....
00003 Colgan Colgan@...
00004 Lewis Lewis@...
: : :
55500 Baer Baer@...
• Optimizer doesn’t know how function affects
values in the column
• Optimizer guesses the cardinality to be 1% of
rows
Cardinality estimate is
1% of the rows
Extended Statistics – expression statistics solution
New Column with
system generated name
1. Start column group usage capture
BEGIN
dbms_stats.Seed_col_usage(NULL, NULL, 300);
END;
/
Automatic column group detection
Extended Statistics
Switches on monitoring for 300
seconds or the next 5 minutes. Any
statement executed will be
monitored for columns used in the
where and group by clauses
Directives are additional information used during optimization to generate a plan
• For example, when tableT1 is joined toT2 use dynamic statistics to get accurate cardinality
estimate
• Directives are collected on query expressions not at a statement level
- Allows for directives to be used for multiple statements
• Persisted on disk in the SYSAUX tablespace
• Managed using the new package DBMS_SPD
• Column group statistics can be automatically created via directives
Automatic Creation via SQL Plan Directives in 12c
Extended Statistics
By default in 12.1 directives were always created & used
• Use of directive and the automatic creation of column groups is controlled by the parameter
OPTIMIZER_ADAPTIVE_FEATURES (default value is TRUE)
From 12.2 onward directives are always created but not always used
• Use of SQL Plan Directives by the Optimizer is controlled by the parameter
OPTIMIZER_ADAPTIVE_STATISTICS (default value is FALSE)
• To have column group statistics automatically created based on directives set the DBMS_STATS
preference AUTO_STAT_EXTENSIONS to ON (default value is OFF)
Automatic Creation via SQL Plan Directives in 12c and Beyond
Extended Statistics
70
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 OtherTypes of Statistics
• Oracle automatically collect statistics for all database objects, which are missing
statistics or have stale statistics
• AutoTask run during a predefined maintenance window
• Internally prioritizes the database objects
• Both user schema and dictionary tables and fixed objects from 12c onwards
• Objects that need updated statistics most are processed first
• Controlled by DBMS_AUTO_TASK_ADMINpackage or via Enterprise Manager
Automatic Statistics Gathering
When to gather statistics
Automatic Statistics Gathering
If you want to disable auto job for application schema leaving it on forOracle dictionary
tables
The scope of the auto job is controlled by the global preference AUTOSTATS_TARGET
Possible values are
• AUTO Oracle decides what tables need statistics (Default)
• All Statistics gathered for all tables in the system
• ORACLE Statistics gathered for only the dictionary tables
Automatic Statistics Gathering
Let Oracle take care of itself
• Remember use ENABLE_AUTOMATIC_MAINTENANCE_PDB to disable the maintenance
window at the PDB level
• Starting in 12c statistics gathered as part of the direct path load operations
• CreateTable As Select or Insert As Select commands
• Most essential statistics available directly after load
• No additional table scan required to gather statistics
• All internal maintenance operations that use CTAS benefit from this
• Note only occurs on IAS if table is empty
• Remaining statistics gathering is deferred until Auto StatsTask
Online Statistics Gathering
Note: Histograms are not created as they require an additional scan of the data
Use _optimizer_gather_stats_on_load_hist to enable histogram collection
75
• Gather statistics as part of conventional DML (Insert/update/merge)
• Statistics gathered during DML needs to be fast with negligible overhead
• Only most essential stats are gathered to avoid catastrophic SQL execution plan
performance regressions (e.g. avoiding out-of-range conditions)
• Min, Max, num_rows, etc.
• Gathering of remaining statistics is deferred
• Automatic statistics gathering job still kicks in as needed
Real-Time Statistics
After a large data load
• Include gathering statistics as part of the ETL or ELT process
If trickle loading into a partition table
• Used dbms_stats.Copy_table_stats()
- Copies stats from source partition
- Adjust min & max values for partition column
◦ Both partition & global statistics*
- Copies statistics of the dependent objects
◦ Columns, local (partitioned) indexes* etc.
If the Auto Statistics Gather Job is not suitable
When to gather statistics
Note*: By default global statistics are not adjusted. You must set the FLAGS parameter of the
DBMS_STATS.COPY_TABLE_STATS to 8 for global stats to be adjusted.
Global indexes statistics are never adjusted
Feb 1st 2020
…
Feb 12th 2020
Feb 13th 2020
Partitioned Table
77
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 OtherTypes of Statistics
Four options to speed up statistics gathering
• Intra object using parallel execution
• Inter object using concurrency
• The combination of Inter and Intra object
• Incremental statistics gathering for partitioned tables
How to speed up statistics gathering
Statistics gathering performance
• Controlled by GATHER_*_STATS parameter DEGREE
• Default is to use parallel degree specified on object
• If set to AUTO Oracle decide parallel degree used
• Works on one object at a time
Intra object using parallel execution
Statistics gathering performance
Customers table has a degree of parallelism of 4
4 parallel server processes will be used to gather stats
Intra object using parallel execution
Statistics gathering performance
P4
P3
P2
P1
Customers Table
dbms_stats.Gather_table_stats('SH', 'SALES');
Intra object using parallel execution
Statistics gathering performance
The statistics gather procedure on each
individual partition operates in parallel
BUT the statistics gathering procedures
won’t happen concurrently
Each individual partition will have
statistics gathered one after the other
P4
P3
P2
P1Feb 11th 2020
Feb 12th 2020
Feb 13th 2020
Sales Table
• Gather statistics on multiple objects at the same time
• Controlled by DBMS_STATS preference, CONCURRENT
• Uses Database Scheduler and Advanced Queuing
• Number of concurrent gather operations controlled by JOB_QUEUE_PROCESSES
parameter
• Each gather operation can still operate in parallel
Inter object
Statistics gathering performance
Intra object statistics gathering for SH schema
Statistics gathering performance
A separate statistics gathering job is
created for each table and each
partition in the schema
Level 1 contain statistics gathering jobs
for all non-partitioned tables and a
coordinating job for each partitioned
table
Level 2 contain statistics gathering jobs
for each partition in the partitioned
tables
dbms_stats.Gather_schema_stats(’SH');
dbms_stats.Gather_schema_stats('SH');
Inter and intra working together for partitioned objects
Statistics gathering performance
The number of concurrent gathers is controlled
by the parameter JOB_QUEUE_PROCESSES
In this example it is set to 3
Remember each concurrent gather operates in
parallel
In this example the parallel degree is 4
Feb 10th 2020
Feb 11th 2020
Feb 12th 2020
Sales Table
Typically gathering statistics after a bulk load into one partition would causes a full
scan of all partitions to gather global table statistics
• Extremely time consuming
With Incremental Statistic gather statistics for touched partition(s) ONLY
• Table (global) statistics are accurately built from partition statistics
• Reduce statistics gathering time considerably
• Controlled by INCREMENTAL preference
Incremental statistics gathering for partitioned tables
Statistics gathering performance
Incremental Statistics Gathering
1. Partition level stats are gathered & synopsis created
2. Global stats generated by aggregating
partition level statistics and synopsis
Sysaux Tablespace
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
Incremental Statistics Gathering
Sysaux Tablespace
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
Feb 14th 2020
3. A new partition is added to the table & Data is Loaded
4. Gather partition statistics for new
partition
5. Retrieve synopsis for each
of the other partitions from
Sysaux
6. Global stats generated by aggregating the
original partition synopsis with the new one
Incremental statistics have a bad reputation
Why some people hate incremental statistics
Incremental Statistics Gathering
WHY?
Originally, they were
bloated and slow
How expensive is the creation of a synopsis?
Incremental Statistics Gathering
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
feb 14th 2020
Synopsis can now be created in two ways
• Adaptive Sampling
- Database uses an adaptive sampling algorithm which
provide very accurate results, but the resulting synopsis can
be large
• Hyperloglog
- Database uses the hyperloglog algorithm which offers
similar accuracy of adaptive sampling but requires
significantly less memory as the synopsis size is greatly
reduce
• It is possible to have both types on one table
Control by the new DBMS_STATS preference
APPROXIMATE_NDV_ALGORITHM
• Number of rows is a factor of the # of columns, partitions & NDV per column
• One entry for each column, for each partition
How expensive is the creation Adaptive Statistics synopsis?
Incremental Statistics Gathering
• No rows are store in the WRI$_OPTSTAT_SYNOPSIS$
• One entry for each column, for each partition
How expensive is the creation Hyperloglog synopsis?
Incremental Statistics Gathering
How expensive is the creation Hyperloglog synopsis?
Incremental Statistics Gathering
Only binary representation of NDV stored for each column in each
partition
93
• During data load some rows often going into the older partitions
• In 11g any DML on older partitions triggered partition statistics to be
re-gathered
• New DBMS_STATS preference INCREMENTAL_STALENESS
• When set to USE_STALE_PERCENT, DML on less than 10% of rows
in older partitions will not trigger re-gather
• When set to USE_LOCKED_STATS, locked partitions stats are never
considered stale, regardless of DML changes
• When set to ALLOW_MIXED_FORMAT, adaptive statistics &
hyperloglog synopsis can exist on the same table
What happens if data changes in one of the existing partitions?
Incremental Statistics Gathering
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
feb 14th 2020
94
What happens if you do a partition exchange load?
Incremental Statistics Gathering
1. Create external table for flat files
2. Use CTAS command to create
non-partitioned table
Feb 14th 2020
4. Gather Statistics
Sept 9th 2017
Sept 10th 2017
Sept 11th 2017
Sept 12th 2017
Sept 13th 2017
Sept 14th 2017
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
3. Alter table Sales exchange partition Sept_14_2020
with table tmp_sales
feb 14th 2020
95
What happens if you do a partition exchange load?
Incremental Statistics Gathering
1. Create external table for flat files
2. Use CTAS command to create
non-partitioned table
Feb 14th 2020
Sept 9th 2017
Sept 10th 2017
Sept 11th 2017
Sept 12th 2017
Sept 13th 2017
Sept 14th 2017
Feb 9th 2020
Feb 10th 2020
Feb 11th 2020
Sales Table
Feb 12th 2020
Feb 13th 2020
5. Alter table Sales exchange partition Sept_14_2020
with table tmp_sales
feb 14th 2020
3. Set INCREMENTAL to true &
INCREMENTAL_LEVEL to TABLE
4. Gather Statistics
96
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 OtherTypes of Statistics
Volatile Tables
• Volume of data changes dramatically over a period of time
• For example orders queue table
- Starts empty, orders come in, order get processed, ends day empty
GlobalTempTables*
• Application code stores intermediate result
• Some session have a lot of data, some have very little
Intermediate work tables
• Written once, read once and then truncated or deleted
• For example part of an ETL process
When not to gather statistics
* Not the case in Oracle Database 12c
• Data volume changes dramatically over time
• When is a good time to gather statistics?
• Gather statistics when the table has a representative data volume
• Lock statistics to ensure statistics gathering job does not over write representative
statistics
Volatile tables
When not to gather statistics
Often seen as part of an ETL process
Written once, read once, and then truncated or deleted
When do you gather statistics?
Don’t gather statistics it will only increase ETL time
Use Dynamic sampling
• Add dynamic sampling hint to SQL statements querying the intermediate table
Intermediate work tables
When not to gather statistics
Add dynamic sampling hint or set it at the session or system level
SELECT /*+ dynamic_sampling(c 2) */ c.*
FROM customers_staging_tab c
WHERE cust_address_change = ‘Y’;
Intermediate work tables
When not to gather statistics
• Traditionally statistics gathered on GTT were shared by all sessions
• Share statistics are not always optimal
• Now each session can have its own version of statistics forGTT
• Controlled by new preference GLOBAL_TEMP_TABLE_STATS
• Default value is SESSION (non shared)
• To force sharing (as in 11g) set table preference to SHARED
Overview
Session private statistics for GTT
102
5
4
3
2
1
Statistics Gathering Performance
When to Gather Statistics
Additional Statistics
What Basic Statistics to Gather
How to Gather Statistics
AGENDA
6 When Not to Gather
7 OtherTypes of Statistics
• Dictionary Statistics
• Fixed Object Statistics
• System Statistics
• Dynamic Sampling
• Cloning databases
Other types of statistics
• Statistics on dictionary tables
• Maintained as part of the automatic statistics job
- Set preference AUTOSTATS_TARGET to ORACLE to only gather Dictionary stats
• Can be manually gather using
- Dbms_stats.Gather_dictionary_stats
- Set options to AUTO
Dictionary statistics
• Fixed objects are the x$ tables and their indexes
- v$ performance views built on top of them
• NOT maintained as part of the automatic statistics task in 11g but in 12c*
• Dynamic sampling not used on X$ tables when stats are missing or stale
• Can be manually gathering using
- dbms_stats.Gather_fixed_object_stats
• X$ tables are transient so gather when representative workload is active
• May need to be re-gathered after converting to a PDB from non-PDB
Fixed object statistics
* In 12c fixed object statistics are gathered as part of auto statistics gathering if they have not been previously collected!
• Information about hardware database is running on
• NOT maintained as part of the automatic statistics job
• Manually gathered using
- dbms_stats.Gather_system_stats
• Can be gathered with or without a workload
• Default setting are recommended for most systems
• For Exadata systems new “EXADATA” mode available
System statistics
Not recommend to mess with system statistics
– defaults are generally best
Join the Conversation
107
https://twitter.com/SQLMaria
https://blogs.oracle.com/optimizer/
https://sqlmaria.com
https://www.facebook.com/SQLMaria

Más contenido relacionado

La actualidad más candente

What to Expect From Oracle database 19c
What to Expect From Oracle database 19cWhat to Expect From Oracle database 19c
What to Expect From Oracle database 19cMaria Colgan
 
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISP
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISPOptimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISP
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISPSecure-24
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder
 
OOUG: Oracle transaction locking
OOUG: Oracle transaction lockingOOUG: Oracle transaction locking
OOUG: Oracle transaction lockingKyle Hailey
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And Whatudaymoogala
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerMongoDB
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it Sandesh Rao
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cZohar Elkayam
 
SQL Plan Directives explained
SQL Plan Directives explainedSQL Plan Directives explained
SQL Plan Directives explainedMauro Pagano
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performanceoysteing
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsFrederic Descamps
 
Tanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022InfluxData
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxData
 
Rman 12c new_features
Rman 12c new_featuresRman 12c new_features
Rman 12c new_featuresNabi Abdul
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsGokhan Atil
 

La actualidad más candente (20)

What to Expect From Oracle database 19c
What to Expect From Oracle database 19cWhat to Expect From Oracle database 19c
What to Expect From Oracle database 19c
 
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISP
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISPOptimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISP
Optimize and Simplify Oracle 12C RAC using dNFS, ZFS and OISP
 
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
Tanel Poder - Troubleshooting Complex Oracle Performance Issues - Part 2
 
OOUG: Oracle transaction locking
OOUG: Oracle transaction lockingOOUG: Oracle transaction locking
OOUG: Oracle transaction locking
 
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And WhatPerformance Tuning With Oracle ASH and AWR. Part 1 How And What
Performance Tuning With Oracle ASH and AWR. Part 1 How And What
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
TFA Collector - what can one do with it
TFA Collector - what can one do with it TFA Collector - what can one do with it
TFA Collector - what can one do with it
 
MongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To TransactionsMongoDB WiredTiger Internals: Journey To Transactions
MongoDB WiredTiger Internals: Journey To Transactions
 
Exploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12cExploring Oracle Multitenant in Oracle Database 12c
Exploring Oracle Multitenant in Oracle Database 12c
 
SQL Plan Directives explained
SQL Plan Directives explainedSQL Plan Directives explained
SQL Plan Directives explained
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performance
 
Open Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and HistogramsOpen Source 101 2022 - MySQL Indexes and Histograms
Open Source 101 2022 - MySQL Indexes and Histograms
 
Tanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools shortTanel Poder - Scripts and Tools short
Tanel Poder - Scripts and Tools short
 
Enterprise manager 13c
Enterprise manager 13cEnterprise manager 13c
Enterprise manager 13c
 
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
Paul Dix [InfluxData] The Journey of InfluxDB | InfluxDays 2022
 
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...
 
Rman 12c new_features
Rman 12c new_featuresRman 12c new_features
Rman 12c new_features
 
153 Oracle dba interview questions
153 Oracle dba interview questions153 Oracle dba interview questions
153 Oracle dba interview questions
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAs
 

Similar a Part2 Best Practices for Managing Optimizer Statistics

Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerEdgar Alejandro Villegas
 
Improve speed &amp; performance of informix 11.xx part 1
Improve speed &amp; performance of informix 11.xx   part 1Improve speed &amp; performance of informix 11.xx   part 1
Improve speed &amp; performance of informix 11.xx part 1am_prasanna
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableKeshav Murthy
 
Oracle Query Optimizer - An Introduction
Oracle Query Optimizer - An IntroductionOracle Query Optimizer - An Introduction
Oracle Query Optimizer - An Introductionadryanbub
 
05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptxKareemBullard1
 
Presentation top tips for getting optimal sql execution
Presentation    top tips for getting optimal sql executionPresentation    top tips for getting optimal sql execution
Presentation top tips for getting optimal sql executionxKinAnx
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesSperasoft
 
Axibase Time Series Database
Axibase Time Series DatabaseAxibase Time Series Database
Axibase Time Series Databaseheinrichvk
 
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems gamemaker762
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMahesh Salaria
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingAmir Reza Hashemi
 
Ebs stats
Ebs statsEbs stats
Ebs statsitshezz
 
Oracle data capture c dc
Oracle data capture c dcOracle data capture c dc
Oracle data capture c dcAmit Sharma
 
Oracle Data Redaction
Oracle Data RedactionOracle Data Redaction
Oracle Data RedactionAlex Zaballa
 
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)SolarWinds
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenancenspyrenet
 
InfiniFlux Minmax Cache
InfiniFlux Minmax CacheInfiniFlux Minmax Cache
InfiniFlux Minmax CacheInfiniFlux
 
Cost Based Optimizer - Part 1 of 2
Cost Based Optimizer - Part 1 of 2Cost Based Optimizer - Part 1 of 2
Cost Based Optimizer - Part 1 of 2Mahesh Vallampati
 

Similar a Part2 Best Practices for Managing Optimizer Statistics (20)

Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle Optimizer
 
Improve speed &amp; performance of informix 11.xx part 1
Improve speed &amp; performance of informix 11.xx   part 1Improve speed &amp; performance of informix 11.xx   part 1
Improve speed &amp; performance of informix 11.xx part 1
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
Informix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_tableInformix partitioning interval_rolling_window_table
Informix partitioning interval_rolling_window_table
 
Oracle Query Optimizer - An Introduction
Oracle Query Optimizer - An IntroductionOracle Query Optimizer - An Introduction
Oracle Query Optimizer - An Introduction
 
05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx
 
Presentation top tips for getting optimal sql execution
Presentation    top tips for getting optimal sql executionPresentation    top tips for getting optimal sql execution
Presentation top tips for getting optimal sql execution
 
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data TablesPostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
PostgreSQL Performance Tables Partitioning vs. Aggregated Data Tables
 
Axibase Time Series Database
Axibase Time Series DatabaseAxibase Time Series Database
Axibase Time Series Database
 
Amazon Redshift Masterclass
Amazon Redshift MasterclassAmazon Redshift Masterclass
Amazon Redshift Masterclass
 
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems Processes in Query Optimization in (ABMS) Advanced Database Management Systems
Processes in Query Optimization in (ABMS) Advanced Database Management Systems
 
MySQL: Know more about open Source Database
MySQL: Know more about open Source DatabaseMySQL: Know more about open Source Database
MySQL: Know more about open Source Database
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 
Ebs stats
Ebs statsEbs stats
Ebs stats
 
Oracle data capture c dc
Oracle data capture c dcOracle data capture c dc
Oracle data capture c dc
 
Oracle Data Redaction
Oracle Data RedactionOracle Data Redaction
Oracle Data Redaction
 
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
Getting the most out of your Oracle 12.2 Optimizer (i.e. The Brain)
 
Sql query analyzer & maintenance
Sql query analyzer & maintenanceSql query analyzer & maintenance
Sql query analyzer & maintenance
 
InfiniFlux Minmax Cache
InfiniFlux Minmax CacheInfiniFlux Minmax Cache
InfiniFlux Minmax Cache
 
Cost Based Optimizer - Part 1 of 2
Cost Based Optimizer - Part 1 of 2Cost Based Optimizer - Part 1 of 2
Cost Based Optimizer - Part 1 of 2
 

Más de Maria Colgan

Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptx
Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptxFive_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptx
Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptxMaria Colgan
 
Part4 Influencing Execution Plans with Optimizer Hints
Part4 Influencing Execution Plans with Optimizer HintsPart4 Influencing Execution Plans with Optimizer Hints
Part4 Influencing Execution Plans with Optimizer HintsMaria Colgan
 
Part3 Explain the Explain Plan
Part3 Explain the Explain PlanPart3 Explain the Explain Plan
Part3 Explain the Explain PlanMaria Colgan
 
Ground Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous DatabaseGround Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous DatabaseMaria Colgan
 
Ground Breakers Romania: Explain the explain_plan
Ground Breakers Romania: Explain the explain_planGround Breakers Romania: Explain the explain_plan
Ground Breakers Romania: Explain the explain_planMaria Colgan
 
Explain the explain_plan
Explain the explain_planExplain the explain_plan
Explain the explain_planMaria Colgan
 
Beginners guide to_optimizer
Beginners guide to_optimizerBeginners guide to_optimizer
Beginners guide to_optimizerMaria Colgan
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldMaria Colgan
 
Oracle Database in-Memory Overivew
Oracle Database in-Memory OverivewOracle Database in-Memory Overivew
Oracle Database in-Memory OverivewMaria Colgan
 
Useful PL/SQL Supplied Packages
Useful PL/SQL Supplied PackagesUseful PL/SQL Supplied Packages
Useful PL/SQL Supplied PackagesMaria Colgan
 
JSON and the Oracle Database
JSON and the Oracle DatabaseJSON and the Oracle Database
JSON and the Oracle DatabaseMaria Colgan
 
Five Tips to Get the Most Out of Your Indexing
Five Tips to Get the Most Out of Your IndexingFive Tips to Get the Most Out of Your Indexing
Five Tips to Get the Most Out of Your IndexingMaria Colgan
 
Harnessing the Power of Optimizer Hints
Harnessing the Power of Optimizer HintsHarnessing the Power of Optimizer Hints
Harnessing the Power of Optimizer HintsMaria Colgan
 
Oracle optimizer bootcamp
Oracle optimizer bootcampOracle optimizer bootcamp
Oracle optimizer bootcampMaria Colgan
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cMaria Colgan
 
Oracle database 12c_and_DevOps
Oracle database 12c_and_DevOpsOracle database 12c_and_DevOps
Oracle database 12c_and_DevOpsMaria Colgan
 

Más de Maria Colgan (17)

Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptx
Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptxFive_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptx
Five_Things_You_Might_Not_Know_About_Oracle_Database_v2.pptx
 
Part5 sql tune
Part5 sql tunePart5 sql tune
Part5 sql tune
 
Part4 Influencing Execution Plans with Optimizer Hints
Part4 Influencing Execution Plans with Optimizer HintsPart4 Influencing Execution Plans with Optimizer Hints
Part4 Influencing Execution Plans with Optimizer Hints
 
Part3 Explain the Explain Plan
Part3 Explain the Explain PlanPart3 Explain the Explain Plan
Part3 Explain the Explain Plan
 
Ground Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous DatabaseGround Breakers Romania: Oracle Autonomous Database
Ground Breakers Romania: Oracle Autonomous Database
 
Ground Breakers Romania: Explain the explain_plan
Ground Breakers Romania: Explain the explain_planGround Breakers Romania: Explain the explain_plan
Ground Breakers Romania: Explain the explain_plan
 
Explain the explain_plan
Explain the explain_planExplain the explain_plan
Explain the explain_plan
 
Beginners guide to_optimizer
Beginners guide to_optimizerBeginners guide to_optimizer
Beginners guide to_optimizer
 
The Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous WorldThe Changing Role of a DBA in an Autonomous World
The Changing Role of a DBA in an Autonomous World
 
Oracle Database in-Memory Overivew
Oracle Database in-Memory OverivewOracle Database in-Memory Overivew
Oracle Database in-Memory Overivew
 
Useful PL/SQL Supplied Packages
Useful PL/SQL Supplied PackagesUseful PL/SQL Supplied Packages
Useful PL/SQL Supplied Packages
 
JSON and the Oracle Database
JSON and the Oracle DatabaseJSON and the Oracle Database
JSON and the Oracle Database
 
Five Tips to Get the Most Out of Your Indexing
Five Tips to Get the Most Out of Your IndexingFive Tips to Get the Most Out of Your Indexing
Five Tips to Get the Most Out of Your Indexing
 
Harnessing the Power of Optimizer Hints
Harnessing the Power of Optimizer HintsHarnessing the Power of Optimizer Hints
Harnessing the Power of Optimizer Hints
 
Oracle optimizer bootcamp
Oracle optimizer bootcampOracle optimizer bootcamp
Oracle optimizer bootcamp
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
 
Oracle database 12c_and_DevOps
Oracle database 12c_and_DevOpsOracle database 12c_and_DevOps
Oracle database 12c_and_DevOps
 

Último

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 

Último (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

Part2 Best Practices for Managing Optimizer Statistics

  • 1. Master Product Manager Oracle Database February 2020 Maria Colgan Best Practices For Managing Optimizer Statistics @SQLMaria Part 2
  • 2. Safe harbor statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation. 2 Confidential – © 2020 Oracle
  • 3. 3 • What statistics do I actual need? • How should I gather statistics? • What sample size should I use? • When should I gather statistics? • Should I use the auto stats job? • What statistics don’t I need? • How do I speed up statistics gathering? Common Statistics Questions and Fears
  • 4. 4 • Best practice recommendations work for ~80% of systems 90% of the time • There are always exceptions • Your goal should be to provide the optimizer a representative set of stats Warning on Best Practices • Indicates a recommendation • Indicates a warning of potential size effects • To help you navigate this presentation:
  • 5. DATA DICTIONARY OPTIMIZER STATISTICS Index Table Column System PROMO_PK Index PROMOTIONS Table Execution plan Optimizer CPU & IO Optimizer Statistics PROMO_I D PROMO_NAME … PROMO_DATE 1 Promo_1 … 15-NOV-98 2 Promo_1 … 31-DEC-98
  • 6. 6 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 Other Types of Statistics
  • 7. • Oracle automatically collect statistics for all database objects, which are missing statistics or have stale statistics • The stats gathering AutoTask runs during a predefined maintenance window • Internally prioritizes the database objects • Both user schema and dictionary tables • Objects that need updated statistics most are processed first • Controlled by DBMS_AUTO_TASK_ADMIN package or Enterprise Manager • Monitor using DBA_AUTOTASK_* views Automatic Statistics Gathering How to gather statistics and fixed object stats from 12c onwards
  • 8. • Starting in 12c there is a new Multitenant architecture • Each PDB has its own stats gathering autotask & its own maintenance window (default all same time) • From 12.2 onwards use AUTOTASK_MAX_ACTIVE_PDBS to control how many PDBs can run autotasks concurrently (Default 2) • Use ENABLE_AUTOMATIC_MAINTENANCE_PDB to disable the maintenance window at the PDB level (Default TRUE) Automatic Statistics Gathering in a PDB environment How to gather statistics CRMPricing Retail PDB CDB NOTE: From 20c onward Multitenant architecture is the only supported architecture
  • 9. ANALYZE command is deprecated • DO NOT USE • Only good for row chaining The GATHER_*_STATS procedures take 13 parameters • Ideally you should only set the first 2-3 parameters - SCHEMA NAME - TABLE NAME - PARTITION NAME Use DBMS_STATS Package How to gather statistics
  • 10. Your gather statistics commands should be this simple BEGIN dbms_stats.Gather_database_stats(); END; / BEGIN dbms_stats.Gather_schema_stats('SH'); END; / BEGIN dbms_stats.Gather_table_stats('SH', 'SALES'); END; / Use DBMS_STATS Package How to gather statistics
  • 11. Occasionally default parameter values may need to change For example - features not automatically on by default • Incremental Statistics - Ability to accurate generate global statistics from partition level statistics - Controlled by the parameter INCREMENTAL (default is FALSE) • Concurrent Statistics Gathering - Ability to gather statistics on multiple objects concurrently under a GATHER_SCHEMA_STATS command - Controlled by the parameter CONCURRENT (default is FALSE) Changing default parameter values for gathering statistics How to gather statistics
  • 12. Can change the default value at the global level • DBMS_STATS.SET_GLOBAL_PREFS • This changes the value for all existing objects and any new objects BEGIN dbms_stats.Set_global_prefs('INCREMENTAL', 'TRUE'); END; / Can change the default value at the table level • DBMS_STATS.SET_TABLE_PREFS Changing default parameter values for gathering statistics How to gather statistics
  • 13. Can change the default value at the schema level • DBMS_STATS.SET_SCHEMA_PREFS • Current objects in the schema only • New objects pick up global preferences Can change the default value at the database level • DBMS_STATS.SET_DATABASE_PREFS • Current objects in the Database only • New objects pick up global preferences Changing default parameter values for gathering statistics How to gather statistics
  • 14. The following parameter defaults can be changed: • APPROXIMATE_NDV_ALGORITHM • AUTO_STATS_EXTENSIONS • AUTOSTATS_TARGET* • CASCADE • CONCURRENT* • DEGREE • ESTIMATE_PERCENT • GLOBAL_TEMP_TABLE_STATS* • GRANULARITY • INCREMENTAL • INCREMENTAL_LEVEL** • INCREMENTAL_STALENESS • METHOD_OPT • NO_INVALIDATE • OPTIONS • PERFERENCE_OVERRIDES_PARAMETER • PUBLISH • STALE_PERCENT • STAT_CATEGORY* • TABLE_CACHED_BLOCKS • WAIT_TIME_TO_UPDATE_STATS* * Parameters that can only be set at a Global level **Parameter that can only be set at a table level Changing default parameter values for gathering statistics How to gather statistics
  • 15. Hierarchy for parameter values How to gather statistics DBMS_STATS.GATHER_*_STATS Parameter hierarchy Parameter values explicitly specified in the gather statistics command Table level preferences Global preferences
  • 16. # 1 most commonly asked question “What sample size should I use?” Controlled by ESTIMATE_PRECENT parameter From 11g onwards use default value AUTO_SAMPLE_SIZE • New hash-based algorithm • Speed of a 10% sample • Accuracy of 100% sample Sample size How to gather statistics More info in the following paper http://dl.acm.org/citation.cfm?id=1376721
  • 17. HowAUTO_SAMPLE_SIZE works How to gather statistics TABLE In-memory hash table space for ~ 16K distinct values Apply hash function to each value
  • 18. In-memory hash table space for ~ 16K distinct values Split Level 0 Apply hash function to each value Split Level 1 TABLE How AUTO_SAMPLE_SIZE works How to gather statistics
  • 19. Split Level 2 NDV = 2^split_level TABLE How AUTO_SAMPLE_SIZE works How to gather statistics
  • 20. Speed of a 10% sample Accuracy of 100% sample HowAUTO_SAMPLE_SIZE performs How to gather statistics Run Num AUTO_SAMPLE_SIZE 10% SAMPLE 100% SAMPLE 1 00:02:21.86 00:02:31.56 00:08:24.10 2 00:02:38.11 00:02:49.49 00:07:38.25 3 00:02:39.31 00:02:38.55 00:07:37.83 Column Name NDV with AUTO_SAMPLE_SIZE NDV with 10% SAMPLE NDV with 100% SAMPLE C1 59852 31464 60351 C2 1270912 608544 1289760 C3 768384 359424 777942
  • 21. 21 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 Other Types of Statistics
  • 22. By default the following basic table & column statistic are collected either during a CTAS or IAS command* or via a DBMS_STATS.GATHER_*_STATS command : • Number of Rows++ • Number of blocks++ • Average row length • Min / Max values of a column++ • Number of distinct values • Number of nulls in column What basic statistics to collect *Online statistics gather possible from Oracle Database 12cR1 ++ Real-time statistics gather possible from 19c onwards on Exadata & Cloud Table Statistics
  • 23. Index statistics are automatically gathered during creation and maintained by DBMS_STATS.GATHER_*_STATS commands Index statistics include: • Number of leaf blocks • Branch Levels • Clustering factor Basic index statistics collected by Oracle Root Block Branch Block Leaf Block Index Statistics
  • 24. Basic index statistics collected by Oracle Root Block Branch Block Leaf Block Table Database Block Database Block Database Block Low Clustering Factor Index statistics are automatically gathered during creation and maintained by DBMS_STATS.GATHER_*_STATS commands Index statistics include: • Number of leaf blocks • Branch Levels • Clustering factor Index Statistics
  • 25. Basic index statistics collected by Oracle Root Block Branch Block Leaf Block Table Database Block Database Block Database Block High Clustering Factor Index statistics are automatically gathered during creation and maintained by DBMS_STATS.GATHER_*_STATS commands Index statistics include: • Number of leaf blocks • Branch Levels • Clustering factor Index Statistics
  • 26. • Histograms tell the Optimizer about the data distribution in a Column • Creation controlled by METHOD_OPT parameter • Default create histogram on any column that has been used in the WHERE clause or GROUP BY of a statement and has a data skew • Relies on column usage information gathered at compilation time and stored in SYS.COL_USAGE$ • Four types of histograms Histograms What basic statistics to collect • Height balanced • Hybrid • Frequency • Top-Frequency
  • 27. • A frequency histogram is only created if the number of distinct values in a column (NDV) is less than 254 values • Each distinct value will have it own bucket Frequency histograms (FREQUENCY) Histograms Frequency histogram
  • 28. Step 1: SELECT job_id FROM employees ORDER BY job_id; Creating a frequency histogram JOB_ID ========== AC_ACCOUNT AC_MGR AD_ASST AD_PRES AD_VP AD_VP FI_ACCOUNT FI_ACCOUNT :
  • 29. Step2 : Allocate a histogram buckets for each pf the job_ids listed Creating a frequency histogram Bucket 1 has end point AC_ACCOUNT Bucket 2 has end point AC_MGR Bucket 3 has end point AD_ASST Bucket 4 has end point AD_PRES Bucket 5 has end point AD_VP Bucket 6 has end point AD _VP Bucket 7 has end point FI_ACCOUNT Bucket 8 has end point FI_ACCOUNT JOB_ID ========== AC_ACCOUNT AC_MGR AD_ASST AD_PRES AD_VP AD_VP FI_ACCOUNT FI_ACCOUNT : Take Note of the Bucket # that have duplicated end point values
  • 30. Step3 : Compress duplicate buckets to reduce the bucket count to 254 or less Creating a frequency histogram Bucket 1 has end point AC_ACCOUNT Bucket 2 has end point AC_MGR Bucket 3 has end point AD_ASST Bucket 4 has end point AD_PRES Bucket 6 has end point AD _VP Bucket 11 has end point FI_ACCOUNT JOB_ID ========== AC_ACCOUNT AC_MGR AD_ASST AD_PRES AD_VP FI_ACCOUNT : NOTE: the column named endpoint_number in USER_HISTOGRAMS is the bucket number
  • 31. How Optimizer uses frequency histograms Histograms SELECT * FROM employee WHERE job_id = 'AD_VP'; Cardinality estimate = #rows X # buckets value is in total # of buckets 107 X 2 2 107 = = EmployeeTable LAST_NAM E EMP_ID JOB_ID GEITZ 206 AC_ACCT HIGGINS 205 AC__MGR WHALEN 200 AD_ASST KING 100 AD_PRES KOCHHAR 101 AD_VP DE HAAN 102 AD_VP Number of buckets calculated by finding the bucket number for value and subtracting the previous bucket number (6-4 = 2)
  • 32. ATop-Frequency histogram created if less than 254 distinct values account for 99.9% of the entries in a column & AUTO_SAMPLE_SIZE is used Top-Frequency Histograms - Starting in 12c Histograms Frequency histogramTop-Frequency histogram 1
  • 33. • Traditionally a frequency histogram is only created if NDV < 254 • But if a small number of values occupies most of the rows (>99% rows) • Creating a frequency histograms on that small set of values is very useful even though NDV is greater than 254 • Ignores the unpopular values to create a better-quality histogram for popular values • Built using the same technique used for frequency histograms • Only created with AUTO_SAMPLE_SIZE Top Frequency (TOP-FREQUENCY) Histograms
  • 34. • Imagine we have store that sells Christmas decorations • Majority of our sales take place from Nov – Jan • But some small number of sales are made throughout the year Top Frequency Histogram Example TIME_ID column perfect candidate for top-frequency histogram
  • 35. • A height balanced histogram is created if the number of distinct values in a column (NDV) is greater than 254 values • Goal is to put an even number of rows in each of the 254 buckets Height balanced histograms (HEIGHT BALANCED) Histograms Height balanced histogram
  • 36. Step 1: SELECT cust_city_id FROM customers ORDER BY cust_city_id; Creating a height-balance histogram Row count CUST_CITY_ID 1 51040 2 51040 : : 219 51043 : : 5256 51166 : : 5475 51166 : : 55500 52531
  • 37. Step 2: Assign an equal number of rows per bucket Creating a height-balance histogram Row count CUST_CITY_ID 1 51040 2 51040 : : 219 51043 Bucket 1 has end point 51043 : : 5256 51166 Bucket 24 has end point 51166 : : 5475 51166 Bucket 25 has end point 51166 : : 55500 52531 Bucket 254 has end point 52531 Number of rows per bucket: 55,500 / 254 = 218.5
  • 38. Step 3: If endpoint of 1st bucket is not min value add 0 bucket Creating a height-balance histogram Row count CUST_CITY_ID 1 51040 Bucket 0 has end point 51040 2 51040 : : 219 51043 Bucket 1 has end point 51043 : : 5256 51166 Bucket 24 has end point 51166 : : 5475 51166 Bucket 25 has end point 51166 : : 55500 52531 Bucket 254 has end point 52531 But now we have 255 buckets instead of 254 How do we reduce it?
  • 39. Step 4: Compress duplicate buckets Creating a height-balance histogram Row count CUST_CITY_ID 1 51040 Bucket 0 has end point 51040 2 51040 : : 219 51043 Bucket 1 has end point 51043 : : : : 5475 51166 Bucket 25 has end point 51166 : : 55500 52531 Bucket 254 has end point 52531
  • 40. Information on histograms found in USER_HISTOGRAMS Monitoring histograms Not all 254 buckets used due to compression step Bucket 16 is missing because buckets 16 & 17 had the same endpoint value
  • 41. A hybrid balanced histogram is created if the number of distinct values in a column (NDV) is greater than 254 values & AUTO_SAMPLE_SIZE is used Hybrid Histograms (HYBRID) - Starting in 12c Histograms Hybrid histogram 1
  • 42. • Like height balanced histogram hybrids are created if the NDV >254 • Store the actual frequencies of bucket endpoints in histograms • No value allowed to spill over multiple buckets • More endpoint values can be squeezed in a histogram • Achieves the same effect as increasing the # of buckets • Only created with AUTO_SAMPLE_SIZE Hybrid Histograms (HYBRID) - Starting in 12c Histograms
  • 43. Step 1: SELECT row_num, time_id FROM sales ORDER BY 2; Hybrid Histogram Example ROWNUM TIME_ID 1 02-JAN-98 2 03-JAN-98 3 05-JAN-98 4 05-JAN-98 5 06-JAN-98 6 09-JAN-98 7 09-JAN-98 8 09-JAN-98 9 10-JAN-98 10 10-JAN-98 : : There are 960 rows in the sales table Automatically created histograms have 254 buckets That is 3.77 rows per bucket
  • 44. ROWNUM TIME_ID BUCKET INFORMATION FREQUENCY OF ENDPOINT 1 02-JAN-98 Bucket 1 has end point 02-JAN-98 1 2 03-JAN-98 3 05-JAN-98 4 05-JAN-98 5 06-JAN-98 6 09-JAN-98 7 09-JAN-98 8 09-JAN-98 9 10-JAN-98 10 10-JAN-98 : : Step 2: Assign roughly an equal number of rows per bucket Hybrid Histogram Example Bucket 5 has end point 06-JAN-98 1 Bucket 10 has end point 10-JAN-98 2
  • 45. • No two buckets have the same endpoint • Frequency of endpoint values recorded in new column called Endpoint Repeat Count Monitoring Hybrid Histogram Example : Number of occurrences of the endpoint value in the sample
  • 46. • Popular values in height-balance are the endpoint for two or more buckets • Popular values in hybrid are the endpoint of a bucket • Formulas used are: Height-balanced: Number of endpoint buckets X #rows in table Total number of buckets Hybrid: Endpoint Frequency X #rows in table Max buckets Optimizer used two different formulas depend on the popularity of the value How the Optimizer uses Height-balanced & Hybrid histograms
  • 47. • Non-popular value means values that are the endpoint for only one bucket on a height-balance histogram or are not an endpoint at all • Formula used is: DENSITY X number of rows in the table NOTE: Density from 10.2.0.4 is calculated on the fly based on histogram information and is not the value show in USER_HISTOGRAMS Optimizer used two different formulas depend on the popularity of the value How the Optimizer uses Height-balanced & Hybrid histograms
  • 48. Popular values use Height-Balance histogram information How the Optimizer uses histograms SELECT Count(*) FROM sales WHERE cust_id = 22300; COUNT(*) ------------- 5 Cardinality estimate: Number of endpoint buckets X number of rows in table total number of buckets 2 X 960 = 7.71 = 8 249
  • 49. Popular values use Hybrid histogram information How the Optimizer uses histograms SELECT Count(*) FROM sales WHERE cust_id = 22810; COUNT(*) ------------- 5 Cardinality estimate: Endpoint Frequency X number of rows in the table Max buckets 5 X 960 = 5 960
  • 50. Non-popular values use density How the Optimizer uses histograms SELECT Count(*) FROM sales WHERE cust_id = 200; COUNT(CUST_ID) -------------- 2 Cardinality estimate: DENSITY X number of rows 0.00104 X 960
  • 51. Histograms have a bad reputation Why people hate histograms How to gather statistics WHY? They can cause instability They can increase the time to gather statistics
  • 52. Two main cause of instability with Histograms 1. Bind peeking interacts with histograms 2. Nearly popular values Why people hate histograms How to gather statistics Adaptive Cursor Sharing introduced in 11gR2 helps relieve a lot of the problems caused by bind peeking and histograms
  • 53. • Share the plan when binds values are “equivalent” • Plans are marked with selectivity range • If current bind values fall within range, they use the same plan • Create a new plan if binds are not equivalent • Generating a new plan with a different selectivity range • Controlled by init.ora parameter _optim_peek_user_binds • Monitoring - V$SQL has 2 new columns • IS_BIND_SENSITIVE - Optimizer believes the plan may depend on the value of bind • IS_BIND_AWARE - Multiple execution plans exist for this statement • Can force bind awareness via the BIND_AWARE hint Adaptive Cursor Sharing Solutions for bind peeking and histograms
  • 54. Applications that only have statements with binds • Drop histogram using DBMS_STATS.DELETE_COL_STATS • Use DBMS_STATS.SET_PARM to change default setting for method_opt parameter to prevent histogram from being created • Re-gather statistics on the table without histogram Applications that have statements with bind and literals • Switch off bind peeking _optim_peek_user_binds = false Prior to 11gR2 Solutions for bind peeking and histograms
  • 55. Two main cause of instability with Histograms 1. Bind peeking interacts with histograms 2. Nearly popular values Why people hate histograms How to gather statistics Nearly popular values are classified as non-popular values that means their cardinality estimates is based on a density calculation, which is not always accurate enough for them • To get an accurate cardinality estimate use dynamic sampling • OR …
  • 56. Nearly popular value means the value is classified as non-popular but the density calculation is not accurate for them Cardinality estimates is based on a density calculation Nearly popular values Same estimate used as for non-popular. Here density is not good enough to get accurate cardinality estimate
  • 57. To get an accurate cardinality estimate for nearly popular values use dynamic sampling Solution 1 dynamic sampling Nearly popular values
  • 58. If hints aren’t possible and stability is a priority drop the histogram BEGIN dbms_stats.Delete_column_stats(ownname=>'SH', tabname=>'CUSTOMERS', - colname=>'CUST_CITY_ID', col_stat_type=>'HISTOGRAM'); END; / BEGIN dbms_stats.Set_table_prefs('SH', 'CUSTOMERS', 'METHOD_OPT', - 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 CUST_CITY_ID'); END; / Solution 2 Drop the Histogram Nearly popular values
  • 59. 59 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 OtherTypes of Statistics
  • 60. Two types of Extended Statistics • Column groups statistics - Useful when multiple column from the same table are used in where clause predicates • Expression statistics - Useful when a column is used as part of a complex expression in where clause predicate Can be manually or automatically created Automatically maintained when statistics are gathered on the table When table and column statistics are not enough Additional statistics
  • 61. 61 Extended Statistics – column group statistics SELECT * FROM vehicles WHERE model = '530xi' AND color = 'RED’; MAKE MODEL COLOR ------ ------ ------ BMW 530xi RED Vehicles Table (12 ROWS) MAKE MODEL COLOR BMW 530xi RED BMW 530xi BLACK BMW 530xi SILVER TESLA P85D GRAY : : : PORSCHE 911 RED Cardinality #ROWS * 1 * 1 12 * 1 * 1 1 NDV c1 NDV c2 4 3 = = =
  • 62. 62 Extended Statistics – column group statistics SELECT * FROM vehicles WHERE model = '530xi' AND make = ’BMW’; MAKE MODEL COLOR ------ ------ ------ BMW 530xi RED Vehicles Table (12 ROWS) MAKE MODEL COLOR BMW 530xi RED BMW 530xi BLACK BMW 530xi SILVER TESLA P85D GRAY : : : PORSCHE 911 RED Cardinality #ROWS * 1 * 1 12 * 1 * 1 1 NDV c1 NDV c2 4 3 = = =
  • 63. Create extended statistics on the MODEL & MAKE columns using the CREATE_EXTENDED_STATS function in the DBMS_STATS package New Column with system generated name Extended Statistics – column group statistics
  • 64. 64 Extended Statistics – column group statistics SELECT * FROM vehicles WHERE model = '530xi' AND make = ’BMW’; MAKE MODEL COLOR ------ ------ ------ BMW 530xi RED Vehicles Table (12 ROWS) MAKE MODEL COLOR BMW 530xi RED BMW 530xi BLACK BMW 530xi SILVER TESLA P85D GRAY : : : PORSCHE 911 RED Cardinality calculated using column group statistics
  • 65. 65 Extended Statistics – expression statistics solution SELECT * FROM Customers WHERE UPPER(CUST_NAME) = 'SMITH'; Customers Table (55,500 ROWS) CUST_ID NAME EMAIL 00001 Smith Smith@... 00002 Jones Jones@.... 00003 Colgan Colgan@... 00004 Lewis Lewis@... : : : 55500 Baer Baer@... • Optimizer doesn’t know how function affects values in the column • Optimizer guesses the cardinality to be 1% of rows Cardinality estimate is 1% of the rows
  • 66. Extended Statistics – expression statistics solution New Column with system generated name
  • 67. 1. Start column group usage capture BEGIN dbms_stats.Seed_col_usage(NULL, NULL, 300); END; / Automatic column group detection Extended Statistics Switches on monitoring for 300 seconds or the next 5 minutes. Any statement executed will be monitored for columns used in the where and group by clauses
  • 68. Directives are additional information used during optimization to generate a plan • For example, when tableT1 is joined toT2 use dynamic statistics to get accurate cardinality estimate • Directives are collected on query expressions not at a statement level - Allows for directives to be used for multiple statements • Persisted on disk in the SYSAUX tablespace • Managed using the new package DBMS_SPD • Column group statistics can be automatically created via directives Automatic Creation via SQL Plan Directives in 12c Extended Statistics
  • 69. By default in 12.1 directives were always created & used • Use of directive and the automatic creation of column groups is controlled by the parameter OPTIMIZER_ADAPTIVE_FEATURES (default value is TRUE) From 12.2 onward directives are always created but not always used • Use of SQL Plan Directives by the Optimizer is controlled by the parameter OPTIMIZER_ADAPTIVE_STATISTICS (default value is FALSE) • To have column group statistics automatically created based on directives set the DBMS_STATS preference AUTO_STAT_EXTENSIONS to ON (default value is OFF) Automatic Creation via SQL Plan Directives in 12c and Beyond Extended Statistics
  • 70. 70 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 OtherTypes of Statistics
  • 71. • Oracle automatically collect statistics for all database objects, which are missing statistics or have stale statistics • AutoTask run during a predefined maintenance window • Internally prioritizes the database objects • Both user schema and dictionary tables and fixed objects from 12c onwards • Objects that need updated statistics most are processed first • Controlled by DBMS_AUTO_TASK_ADMINpackage or via Enterprise Manager Automatic Statistics Gathering When to gather statistics
  • 73. If you want to disable auto job for application schema leaving it on forOracle dictionary tables The scope of the auto job is controlled by the global preference AUTOSTATS_TARGET Possible values are • AUTO Oracle decides what tables need statistics (Default) • All Statistics gathered for all tables in the system • ORACLE Statistics gathered for only the dictionary tables Automatic Statistics Gathering Let Oracle take care of itself • Remember use ENABLE_AUTOMATIC_MAINTENANCE_PDB to disable the maintenance window at the PDB level
  • 74. • Starting in 12c statistics gathered as part of the direct path load operations • CreateTable As Select or Insert As Select commands • Most essential statistics available directly after load • No additional table scan required to gather statistics • All internal maintenance operations that use CTAS benefit from this • Note only occurs on IAS if table is empty • Remaining statistics gathering is deferred until Auto StatsTask Online Statistics Gathering Note: Histograms are not created as they require an additional scan of the data Use _optimizer_gather_stats_on_load_hist to enable histogram collection
  • 75. 75 • Gather statistics as part of conventional DML (Insert/update/merge) • Statistics gathered during DML needs to be fast with negligible overhead • Only most essential stats are gathered to avoid catastrophic SQL execution plan performance regressions (e.g. avoiding out-of-range conditions) • Min, Max, num_rows, etc. • Gathering of remaining statistics is deferred • Automatic statistics gathering job still kicks in as needed Real-Time Statistics
  • 76. After a large data load • Include gathering statistics as part of the ETL or ELT process If trickle loading into a partition table • Used dbms_stats.Copy_table_stats() - Copies stats from source partition - Adjust min & max values for partition column ◦ Both partition & global statistics* - Copies statistics of the dependent objects ◦ Columns, local (partitioned) indexes* etc. If the Auto Statistics Gather Job is not suitable When to gather statistics Note*: By default global statistics are not adjusted. You must set the FLAGS parameter of the DBMS_STATS.COPY_TABLE_STATS to 8 for global stats to be adjusted. Global indexes statistics are never adjusted Feb 1st 2020 … Feb 12th 2020 Feb 13th 2020 Partitioned Table
  • 77. 77 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 OtherTypes of Statistics
  • 78. Four options to speed up statistics gathering • Intra object using parallel execution • Inter object using concurrency • The combination of Inter and Intra object • Incremental statistics gathering for partitioned tables How to speed up statistics gathering Statistics gathering performance
  • 79. • Controlled by GATHER_*_STATS parameter DEGREE • Default is to use parallel degree specified on object • If set to AUTO Oracle decide parallel degree used • Works on one object at a time Intra object using parallel execution Statistics gathering performance
  • 80. Customers table has a degree of parallelism of 4 4 parallel server processes will be used to gather stats Intra object using parallel execution Statistics gathering performance P4 P3 P2 P1 Customers Table
  • 81. dbms_stats.Gather_table_stats('SH', 'SALES'); Intra object using parallel execution Statistics gathering performance The statistics gather procedure on each individual partition operates in parallel BUT the statistics gathering procedures won’t happen concurrently Each individual partition will have statistics gathered one after the other P4 P3 P2 P1Feb 11th 2020 Feb 12th 2020 Feb 13th 2020 Sales Table
  • 82. • Gather statistics on multiple objects at the same time • Controlled by DBMS_STATS preference, CONCURRENT • Uses Database Scheduler and Advanced Queuing • Number of concurrent gather operations controlled by JOB_QUEUE_PROCESSES parameter • Each gather operation can still operate in parallel Inter object Statistics gathering performance
  • 83. Intra object statistics gathering for SH schema Statistics gathering performance A separate statistics gathering job is created for each table and each partition in the schema Level 1 contain statistics gathering jobs for all non-partitioned tables and a coordinating job for each partitioned table Level 2 contain statistics gathering jobs for each partition in the partitioned tables dbms_stats.Gather_schema_stats(’SH');
  • 84. dbms_stats.Gather_schema_stats('SH'); Inter and intra working together for partitioned objects Statistics gathering performance The number of concurrent gathers is controlled by the parameter JOB_QUEUE_PROCESSES In this example it is set to 3 Remember each concurrent gather operates in parallel In this example the parallel degree is 4 Feb 10th 2020 Feb 11th 2020 Feb 12th 2020 Sales Table
  • 85. Typically gathering statistics after a bulk load into one partition would causes a full scan of all partitions to gather global table statistics • Extremely time consuming With Incremental Statistic gather statistics for touched partition(s) ONLY • Table (global) statistics are accurately built from partition statistics • Reduce statistics gathering time considerably • Controlled by INCREMENTAL preference Incremental statistics gathering for partitioned tables Statistics gathering performance
  • 86. Incremental Statistics Gathering 1. Partition level stats are gathered & synopsis created 2. Global stats generated by aggregating partition level statistics and synopsis Sysaux Tablespace Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020
  • 87. Incremental Statistics Gathering Sysaux Tablespace Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020 Feb 14th 2020 3. A new partition is added to the table & Data is Loaded 4. Gather partition statistics for new partition 5. Retrieve synopsis for each of the other partitions from Sysaux 6. Global stats generated by aggregating the original partition synopsis with the new one
  • 88. Incremental statistics have a bad reputation Why some people hate incremental statistics Incremental Statistics Gathering WHY? Originally, they were bloated and slow
  • 89. How expensive is the creation of a synopsis? Incremental Statistics Gathering Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020 feb 14th 2020 Synopsis can now be created in two ways • Adaptive Sampling - Database uses an adaptive sampling algorithm which provide very accurate results, but the resulting synopsis can be large • Hyperloglog - Database uses the hyperloglog algorithm which offers similar accuracy of adaptive sampling but requires significantly less memory as the synopsis size is greatly reduce • It is possible to have both types on one table Control by the new DBMS_STATS preference APPROXIMATE_NDV_ALGORITHM
  • 90. • Number of rows is a factor of the # of columns, partitions & NDV per column • One entry for each column, for each partition How expensive is the creation Adaptive Statistics synopsis? Incremental Statistics Gathering
  • 91. • No rows are store in the WRI$_OPTSTAT_SYNOPSIS$ • One entry for each column, for each partition How expensive is the creation Hyperloglog synopsis? Incremental Statistics Gathering
  • 92. How expensive is the creation Hyperloglog synopsis? Incremental Statistics Gathering Only binary representation of NDV stored for each column in each partition
  • 93. 93 • During data load some rows often going into the older partitions • In 11g any DML on older partitions triggered partition statistics to be re-gathered • New DBMS_STATS preference INCREMENTAL_STALENESS • When set to USE_STALE_PERCENT, DML on less than 10% of rows in older partitions will not trigger re-gather • When set to USE_LOCKED_STATS, locked partitions stats are never considered stale, regardless of DML changes • When set to ALLOW_MIXED_FORMAT, adaptive statistics & hyperloglog synopsis can exist on the same table What happens if data changes in one of the existing partitions? Incremental Statistics Gathering Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020 feb 14th 2020
  • 94. 94 What happens if you do a partition exchange load? Incremental Statistics Gathering 1. Create external table for flat files 2. Use CTAS command to create non-partitioned table Feb 14th 2020 4. Gather Statistics Sept 9th 2017 Sept 10th 2017 Sept 11th 2017 Sept 12th 2017 Sept 13th 2017 Sept 14th 2017 Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020 3. Alter table Sales exchange partition Sept_14_2020 with table tmp_sales feb 14th 2020
  • 95. 95 What happens if you do a partition exchange load? Incremental Statistics Gathering 1. Create external table for flat files 2. Use CTAS command to create non-partitioned table Feb 14th 2020 Sept 9th 2017 Sept 10th 2017 Sept 11th 2017 Sept 12th 2017 Sept 13th 2017 Sept 14th 2017 Feb 9th 2020 Feb 10th 2020 Feb 11th 2020 Sales Table Feb 12th 2020 Feb 13th 2020 5. Alter table Sales exchange partition Sept_14_2020 with table tmp_sales feb 14th 2020 3. Set INCREMENTAL to true & INCREMENTAL_LEVEL to TABLE 4. Gather Statistics
  • 96. 96 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 OtherTypes of Statistics
  • 97. Volatile Tables • Volume of data changes dramatically over a period of time • For example orders queue table - Starts empty, orders come in, order get processed, ends day empty GlobalTempTables* • Application code stores intermediate result • Some session have a lot of data, some have very little Intermediate work tables • Written once, read once and then truncated or deleted • For example part of an ETL process When not to gather statistics * Not the case in Oracle Database 12c
  • 98. • Data volume changes dramatically over time • When is a good time to gather statistics? • Gather statistics when the table has a representative data volume • Lock statistics to ensure statistics gathering job does not over write representative statistics Volatile tables When not to gather statistics
  • 99. Often seen as part of an ETL process Written once, read once, and then truncated or deleted When do you gather statistics? Don’t gather statistics it will only increase ETL time Use Dynamic sampling • Add dynamic sampling hint to SQL statements querying the intermediate table Intermediate work tables When not to gather statistics
  • 100. Add dynamic sampling hint or set it at the session or system level SELECT /*+ dynamic_sampling(c 2) */ c.* FROM customers_staging_tab c WHERE cust_address_change = ‘Y’; Intermediate work tables When not to gather statistics
  • 101. • Traditionally statistics gathered on GTT were shared by all sessions • Share statistics are not always optimal • Now each session can have its own version of statistics forGTT • Controlled by new preference GLOBAL_TEMP_TABLE_STATS • Default value is SESSION (non shared) • To force sharing (as in 11g) set table preference to SHARED Overview Session private statistics for GTT
  • 102. 102 5 4 3 2 1 Statistics Gathering Performance When to Gather Statistics Additional Statistics What Basic Statistics to Gather How to Gather Statistics AGENDA 6 When Not to Gather 7 OtherTypes of Statistics
  • 103. • Dictionary Statistics • Fixed Object Statistics • System Statistics • Dynamic Sampling • Cloning databases Other types of statistics
  • 104. • Statistics on dictionary tables • Maintained as part of the automatic statistics job - Set preference AUTOSTATS_TARGET to ORACLE to only gather Dictionary stats • Can be manually gather using - Dbms_stats.Gather_dictionary_stats - Set options to AUTO Dictionary statistics
  • 105. • Fixed objects are the x$ tables and their indexes - v$ performance views built on top of them • NOT maintained as part of the automatic statistics task in 11g but in 12c* • Dynamic sampling not used on X$ tables when stats are missing or stale • Can be manually gathering using - dbms_stats.Gather_fixed_object_stats • X$ tables are transient so gather when representative workload is active • May need to be re-gathered after converting to a PDB from non-PDB Fixed object statistics * In 12c fixed object statistics are gathered as part of auto statistics gathering if they have not been previously collected!
  • 106. • Information about hardware database is running on • NOT maintained as part of the automatic statistics job • Manually gathered using - dbms_stats.Gather_system_stats • Can be gathered with or without a workload • Default setting are recommended for most systems • For Exadata systems new “EXADATA” mode available System statistics Not recommend to mess with system statistics – defaults are generally best

Notas del editor

  1. Copyright: <a href='https://www.123rf.com/profile_gstockstudio'>gstockstudio / 123RF Stock Photo</a>
  2. Copyright: <a href='https://www.123rf.com/profile_Orla'>Orla / 123RF Stock Photo</a>
  3. In order for the Cost Based Optimizer to accurately determine the cost for an execution plan it must have information about all of the objects (tables and indexes) accessed in the SQL statement, and information about the system on which the SQL statement will be run. This necessary information is commonly referred to as Optimizer statistics. Understanding and managing Optimizer statistics is key to optimal SQL execution
  4. ORA 12012 Error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_<NN>
  5. ORA 12012 Error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_<NN>
  6. Approx_ndv algorithm - Specifies the synopsis generation algorithm Auto_stats_extensions - Controls the automatic creation of extensions when database statistics are gathered Global temp tables stats – Whether global temp tables will have shared or session level stats default is session level Incremental_level – Controls which synopses to collect when INCREMENTAL preference is set to TRUE. It takes the following values: Partition or table (for partition exchange) on a table perf Incremental_staleness - Specifies when a partition or subpartition is considered stale. This parameter takes an enumeration of values, such as 'USE_STALE_PERCENT' and 'USE_LOCKED_STATS'. Options - Determines the options parameter used in the GATHER_TABLE_STATS procedure which allows you to say GATHER_AUTO after a direct path load gather basic column stats Perference_overrides_parameter – Lets you override one or more values specified in a gather stats command Stat _category – Specifies which statistics to import or export, accepting multiple values separated by a comma. Values supported: object_stats or synopses Table cached blocks – estimates of the number of blocks cached in the buffer cache when calculating the indexing clustering factor Wait Time To Update stats - Specifies the number of minutes before timing out for locks and pins required for updating statistics. It accepts values in the range 0 to 65535. The default value is 15 minutes. The value 0 gets the locks and pins in no-wait mode.
  7. We create hash values for each value seen in a column. A hash value is a sequence of 1s and 0s, i.e., a binary value with length 64. -- Then we place values into the hash table H of size ~16k. -- Whenever the hash table is full, we split H into two. We do this by removing all values from H which has 1 at position split_level+1. We then increment split_level. -- We continue this process until we consume all values. Then, at the end, we approximate the number of distinct values as 2^split_level.
  8. We create hash values for each value seen in a column. A hash value is a sequence of 1s and 0s, i.e., a binary value with length 64. -- Then we place values into the hash table H of size ~16k. -- Whenever the hash table is full, we split H into two. We do this by removing all values from H which has 1 at position split_level+1. We then increment split_level. -- We continue this process until we consume all values. Then, at the end, we approximate the number of distinct values as 2^split_level.
  9. We create hash values for each value seen in a column. A hash value is a sequence of 1s and 0s, i.e., a binary value with length 64. -- Then we place values into the hash table H of size ~16k. -- Whenever the hash table is full, we split H into two. We do this by removing all values from H which has 1 at position split_level+1. We then increment split_level. -- We continue this process until we consume all values. Then, at the end, we approximate the number of distinct values as 2^split_level.
  10. Copyright: <a href='https://www.123rf.com/profile_ribah'>ribah / 123RF Stock Photo</a>
  11. the index clustering factor is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order. If the rows of a table on disk are sorted in about the same order as the index keys, the database will perform a minimum number of I/Os on the table to read the entire table via the index. That is because the next row needed from an index key would likely be the next row in the table block as well. The query would not be skipping all over the table to find row after row—they are naturally next to each other on the block. Conversely, if the rows in the table are not in the same order on disk as the index keys—if the data is scattered—the query will tend to perform the maximum number of I/Os on the table, as many as one I/O for every row in the table. 
  12. the index clustering factor is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order. If the rows of a table on disk are sorted in about the same order as the index keys, the database will perform a minimum number of I/Os on the table to read the entire table via the index. That is because the next row needed from an index key would likely be the next row in the table block as well. The query would not be skipping all over the table to find row after row—they are naturally next to each other on the block. Conversely, if the rows in the table are not in the same order on disk as the index keys—if the data is scattered—the query will tend to perform the maximum number of I/Os on the table, as many as one I/O for every row in the table. 
  13. the index clustering factor is a measure of how many I/Os the database would perform if it were to read every row in that table via the index in index order. If the rows of a table on disk are sorted in about the same order as the index keys, the database will perform a minimum number of I/Os on the table to read the entire table via the index. That is because the next row needed from an index key would likely be the next row in the table block as well. The query would not be skipping all over the table to find row after row—they are naturally next to each other on the block. Conversely, if the rows in the table are not in the same order on disk as the index keys—if the data is scattered—the query will tend to perform the maximum number of I/Os on the table, as many as one I/O for every row in the table. 
  14. Real-Time Statistics Oracle Database automatically gathers online statistics during conventional DML operations. Statistics can go stale between execution of DBMS_STATS statistics gathering jobs. By gathering some statistics automatically during DML operations, the database augments the statistics gathered by DBMS_STATS. Fresh statistics enable the optimizer to produce more optimal plans.
  15. Adaptive sampling algorithm stores the actual hash value for every distinct value in every column Hyperloglog stores only the hash value of the NDV for each column for each partititon
  16. Adaptive sampling algorithm stores the actual hash value for every distinct value in every column Hyperloglog stores only the hash value of the NDV for each column for each partition
  17. Adaptive sampling algorithm stores the actual hash value for every distinct value in every column Hyperloglog stores only the hash value of the NDV for each column for each partition
  18. Automatically gathers fixed object statistics as part of automated statistics gathering if they have not been previously collected.
  19. Here are some additional resources