The optimizer is the component of the DB2 SQL compiler responsible for selecting an optimal access plan for an SQL statement. It works by calculating the execution cost of many alternative access plans and choosing the one with the minimal estimated cost. Understanding how the optimizer works, and knowing how to influence its behaviour, can lead to improved query performance and better resource usage.
This presentation was created for the workshop delivered at the CASCON 2011 conference. Its aim is to introduce basic optimizer and related concepts, and to serve as a starting point for further study of the optimizer techniques.
2. Speakers
Anthony Reina
• Accelerated Value Specialist at IBM Toronto Lab.
• Over 10 years of experience in various roles in DB2 Support.
• Works with IBM customers on resolving defects, developing solutions, and
conducting education courses.
• Specializes in DB2 engine, particularly recovery and optimizer areas.
• Contact: aereina@ca.ibm.com
Zoran Kulina
• Software engineer and technical lead at DB2 Advanced Support team at IBM
Toronto Lab.
• Specialties include database internal architecture, SQL performance tuning,
deep debugging and advanced problem determination.
• Contact: zoran@ca.ibm.com
2
3. Agenda
1. Introduction and high-level overview
2. Phases of the optimizer
3. Explain facility and DB2EXFMT Tool
4. Operators
5. Catalog statistics
6. Predicates and joins
7. Cardinality estimates
8. Basic tuning hints
9. Statistical views
10. Optimizer profiles and guidelines
3
4. Introduction
• What is this workshop about?
– This half-day workshop provides hands-on introduction to the DB2
LUW optimizer.
• Why learn about the optimizer?
– Understanding how the optimizer works and knowing how to
influence its behaviour can lead to improved query performance and
better resource usage.
• What are the workshop objectives?
– To provide participants with basic knowledge of the optimizer and
related concepts, and to serve as a starting point for further study of
the optimizer techniques.
4
5. Introduction
• What are the prerequisites?
– Basic knowledge of SQL and relational database concepts is assumed.
• What is the workshop format?
– This workshop introduces the optimizer through a series of interactive
hands-on exercises. Each unit is first discussed and then accompanied
by an exercise to demonstrate the unit concepts.
5
6. Introduction
• Sample database
Database name: DVDSR
Schema name: EXAMPLE
Tables: CUSTOMER
BRANCH
MOVIE
TRANSACTION_MASTER
TRANSACTION_DETAIL
6
7. High-Level Overview
Unit objectives:
• Define what the optimizer is and how it works from a
high-level perspective.
• Introduce the concept of timeron.
• Discuss major factors influencing the optimizer.
• List various statistics used by the optimizer.
7
8. High-Level Overview
Optimizer purpose:
• Chooses access plan for data manipulation language (DML) SQL statements.
Means of fulfilling the purpose:
• Calculates the execution cost of many alternative access plans, and then selects
the one with the minimal estimated cost.
• Bases calculations on catalog statistics.
• Measures cost in timerons:
– An abstract unit of measure.
– Not directly convertible to actual elapsed time.
– A rough relative estimate of the resources (cost) required by the database manager to
execute an access plan.
– Access plan cost is cumulative: each operator's cost is added to that of the operator above it.
8
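The cumulative nature of plan cost can be sketched with a few lines of Python. The operator names and timeron values below are hypothetical, not taken from a real plan; the point is only that the cost reported at the top operator includes the cost of everything beneath it.

```python
# Illustrative sketch (hypothetical operators and timeron values): the cost
# of an operator is its own cost plus the cumulative cost of its inputs, so
# the top operator (RETURN) carries the estimated cost of the whole plan.
def plan_cost(operator):
    """Total cost = operator's own cost + cost of all input operators."""
    return operator["cost"] + sum(plan_cost(c) for c in operator["inputs"])

# A tiny plan: RETURN <- FETCH <- IXSCAN
plan = {"op": "RETURN", "cost": 0.5, "inputs": [
    {"op": "FETCH", "cost": 20.0, "inputs": [
        {"op": "IXSCAN", "cost": 7.5, "inputs": []}]}]}

print(plan_cost(plan))  # 28.0 timerons
```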
9. High-Level Overview
Factors influencing the optimizer:
• Database design
– Indexes
– Tablespace configuration
• Table and index statistics
– Basis of plan costs
– Generated and collected by the RUNSTATS command
• Operating environment
– Influences plan costs
– Various configuration parameters
• Optimization class (OPTLEVEL)
– Controls the level of modeling considered during compilation
– Influences SQL compilation time
9
10. High-Level Overview
Statistics used by the optimizer:
• Basic statistics
– Number of rows in a table
– Number of pages for a table
• Non-uniform distribution statistics
– NUM_FREQVALUES (equality predicates)
– NUM_QUANTILES (range predicates)
• Index clustering (DETAILED index statistics)
– Improves the estimates of data page fetches
• Statistical views (STATVIEWS)
– Views whose collected result-set statistics improve estimates when the
view definition overlaps the query being compiled
• Column group statistics
– Statistics collected for more than one column
10
11. Phases of the Optimizer
Unit objectives:
• Describe the stages of the compilation process.
• Explain what happens from the time a query is
submitted to the point where its result set is returned.
11
13. Phases of the Optimizer
• Parse Query: analyzes the query to validate the syntax.
• Check Semantics: ensures that there are no inconsistencies
among parts of the statement.
• Rewrite Query: uses the global semantics that are stored in
the Query Graph Model (QGM) to transform the query into a
form that can be optimized more easily. The result is stored in
the QGM, as well.
• Pushdown Analysis (Federated DB only): recommends to the
optimizer whether an operation can be remotely evaluated or
pushed down at a data source.
13
14. Phases of the Optimizer
• Optimize Access Plan: using the QGM as input, it generates
many alternative access plans and, at the end, selects the most
cost-effective plan for satisfying the query.
• Remote SQL Generation (Federated DB only): creates an
efficient SQL statement for operations that are performed by
each remote data source based on the SQL dialect at that data
source.
• Generate Executable Code: uses the access plan and the
QGM to create an executable access plan or section for the
query.
14
16. Explain Facility and DB2EXFMT Tool
Unit objectives:
• Define what an access plan is.
• Introduce the tools used to generate the explain
snapshot (access plan and related data).
• Discuss the components of the db2exfmt report and how
to interpret its output.
16
17. Explain Facility and DB2EXFMT Tool
Explain purpose:
• Captures information about the access plan, optimizer inputs,
and environment of an SQL query.
• Helps to understand how the SQL query is compiled and
executed.
• Shows how configuration parameter changes impact query
performance.
• An indispensable tool for query problem determination.
17
18. Explain Facility and DB2EXFMT Tool
Explain information includes:
• Sequence of operations to process the query
• Cost information
• Predicates and selectivity estimates for each predicate
• Statistics for all objects referenced in the SQL query at
the time the explain information is captured
18
19. Explain Facility and DB2EXFMT Tool
Explain tables:
• Must be created before the Explain can be invoked:
– SYSINSTALLOBJECTS() procedure
– EXPLAIN.DDL
• Must be created per schema/user or under SYSTOOLS schema
• Capture access plan data when the Explain facility is activated
19
21. Explain Facility and DB2EXFMT Tool
Explain tools:
db2expln
• Provides access plan information for one or more SQL query packages
• Shows the actual implementation of the chosen access plan
• Does not show detailed optimizer information
db2exfmt
• Formats the contents of the explain tables
• Provides more detailed explain information than db2expln
• Shows clues as to why the optimizer has made particular decisions
21
22. Explain Facility and DB2EXFMT Tool
db2exfmt output sections:
1. Explain instance
2. Database context
3. Package context
4. Original statement
5. Optimized statement
6. Access plan
7. Extended diagnostic information
8. Plan details
9. Objects used in the access plan
22
23. Explain Facility and DB2EXFMT Tool
db2exfmt output sections:
• Explain instance: general information (e.g., version, time,
schema, etc).
• Database context: configuration parameters used during the
query compilation.
• Package context: query type and optimization level.
• Original statement: actual query text.
• Optimized statement: rewritten query text according to the
applied optimization techniques.
23
24. Explain Facility and DB2EXFMT Tool
Explain output sections (continued):
• Access plan: graphical representation of the query execution; displays the
order of operations along with rows returned, cost and I/O for each
operator.
• Extended diagnostic information: warnings, informational or error
messages returned by the optimizer.
• Plan details: detailed information for each operator including arguments,
predicates, estimates (filter factor), and input/output streams.
• Objects used in the access plan: lists every object (table or index) used
by the query or referenced in the access plan, together with its
statistical information.
24
26. Explain Facility and DB2EXFMT Tool
SET CURRENT EXPLAIN MODE statement
• Changes the value of the CURRENT EXPLAIN MODE special register
• Compiler returns SQL0217W for a successful compilation of SQL run
under the EXPLAIN mode
• Syntax:
>>-SET CURRENT EXPLAIN MODE -+---+------------------------------>
>--+-NO----------------------+---------------------------------><
+-YES---------------------+
+-EXPLAIN-----------------+
+-REOPT-------------------+
+-RECOMMEND INDEXES-------+
+-EVALUATE INDEXES--------+
+-RECOMMEND PARTITIONINGS-+
+-EVALUATE PARTITIONINGS--+
'-host-variable-----------'
26
27. Explain Facility and DB2EXFMT Tool
Exercise:
• Objective: create explain tables, determine if explain tables
exist, and generate db2exfmt output.
• Use exercise3-1.bat script
• Create the explain tables:
db2 -tvf …\sqllib\misc\EXPLAIN.DDL
• Verify the explain tables were created:
list tables for all
27
28. Explain Facility and DB2EXFMT Tool
Exercise (continued):
• Use exercise3-2.bat script
set current explain mode explain
select * from customer
db2 set current explain mode no
db2exfmt -d <dbname> -1 -o exercise3-2.exfmt
or
db2exfmt -d <dbname> -g TIC -w -1 -n -s % -# 0 -o
exercise3-2.exfmt
• Review exercise3-2.exfmt.
28
29. Operators
Unit objectives:
• Describe what an operator is, and how the choice of
operators influences query execution.
• Discuss the most common operators that the optimizer
uses to compose an access plan.
29
30. Operators
What is an operator?
• Action that is performed on data, or the output from a table
or an index, when the access plan for an SQL statement is
executed.
• Unit of query execution.
• Specific operation that occurs in the query execution control
flow.
30
32. Operators
• TBSCAN: Retrieves rows directly from data pages.
• IXSCAN: Scans an index of a table with optional start/stop
conditions, producing an ordered stream of rows.
• DELETE: Deletes rows from a table.
• FETCH: Fetches columns from a table using a specific record
identifier.
• FILTER: Filters data by applying one or more predicates to it.
• GENROW: Generates a table of rows.
32
33. Operators
• GRPBY: Groups rows by common values of designated
columns or functions, and evaluates set functions.
• HSJOIN: Represents a hash join, where two or more tables are
hashed on the join columns.
• INSERT: Inserts rows into a table.
• IXAND: ANDs together the row identifiers (RIDs) from two or
more index scans.
• MSJOIN: Represents a merge join, where both the outer and
inner tables must be in join-predicate order.
33
34. Operators
• NLJOIN: Represents a nested loop join that accesses an inner
table once for each row of the outer table.
• RETURN: Represents the return of data from the query to the
user.
• RIDSCN: Scans a list of row identifiers (RIDs) obtained from
one or more indexes.
• SORT: Sorts rows in the order of specified columns, and
optionally eliminates duplicate entries.
34
35. Operators
• TEMP: Stores data in a temporary table to be read back out.
• TQ: Transfers table data between database agents.
• UNION: Concatenates streams of rows from multiple tables.
• UNIQUE: Eliminates rows with duplicate values for the specified
columns.
• UPDATE: Updates rows in a table.
35
36. Operators
Exercise:
• Review the explain output (exercise3-2.exfmt) from the
previous exercise and identify operators in the access plan
section.
• “Explain” the following queries and identify the new
operators introduced:
set current explain mode explain
<SQL query>
db2exfmt -d wsdb -1 -o sect6ex[1,2].out
36
37. Operators
Exercise (continued):
Use the following SQL queries for step #2:
SELECT cust_branch_id, count(cust_id)
FROM customer
GROUP BY cust_branch_id
SELECT mtran_id, cust_ln, cust_fn, branch_location
FROM transaction_master a, customer b, branch c
WHERE a.mtran_cust_id = b.cust_id
AND a.mtran_branch_id = c.branch_id
37
38. Catalog statistics
Unit objectives:
• Explain how the optimizer uses statistical information to
calculate execution cost of access plan alternatives.
• Introduce various methods of gathering and maintaining
current statistics on tables and indexes.
38
39. Catalog statistics
• The optimizer is heavily influenced by statistical information
about the size of tables, indexes and statistical views.
• The optimizer also uses statistics about the distribution of
data in columns of tables, indexes and statistical views, if
these columns are used to select rows or to join tables.
• The optimizer uses distribution statistics in its calculations to
estimate the costs of alternative access plans for each query.
• Statistical information is stored in system catalog tables and is
accessible via the SYSCAT schema views.
39
40. Catalog statistics
• Statistical information used by the optimizer:
– Cluster ratio of indexes (higher is better)
– Number of leaf pages in indexes
– Number of table rows that overflow their original pages
– Number of filled and empty pages in a table (may indicate if table or
index reorganization is needed)
• Catalog statistics are collected and updated by the
RUNSTATS utility.
40
41. Catalog statistics
RUNSTATS utility:
• Updates statistics about the characteristics of a table and/or
associated indexes, or statistical views.
• For a table, should be run after frequent updates / inserts /
deletes are made to the table or after reorganizing the table.
• For a statistical view, RUNSTATS should be run when changes
to underlying tables have substantially affected the rows
returned by the view.
41
42. Catalog statistics
RUNSTATS best practices:
• Use RUNSTATS on all tables and indexes frequently accessed
by queries.
• Use RUNSTATS with FREQUENT and QUANTILE value statistics
for complex queries (only collected when using WITH
DISTRIBUTION clause).
• Rerun RUNSTATS after a significant change in the table data.
42
43. Catalog statistics
RUNSTATS best practices (continued):
Use RUNSTATS in the following situations:
• After loading data and creating indexes.
• After creating a new index on a table.
• After running a REORG utility.
• After significant insert, update or delete activity.
• Before binding application programs.
• After executing REDISTRIBUTE DB PARTITION GROUP
command.
43
44. Catalog statistics
RUNSTATS examples (basic statistics):
• Collect statistics for table only:
RUNSTATS ON TABLE <schema>.<tablename>
• Collect statistics for indexes only on a given table:
RUNSTATS ON TABLE <schema>.<tablename> FOR INDEXES ALL
• Collect statistics for both table and its indexes:
RUNSTATS ON TABLE <schema>.<tablename> AND INDEXES ALL
44
45. Catalog statistics
RUNSTATS examples (enhanced statistics):
• Collect statistics for table including distribution statistics:
RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION
• Collect statistics for indexes only on a given table including
extended index statistics:
RUNSTATS ON TABLE <schema>.<tablename> FOR DETAILED INDEXES ALL
• Collect statistics for table and its indexes including extended
index and distribution statistics:
RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION AND
DETAILED INDEXES ALL
45
46. Catalog statistics
DB2LOOK utility:
• Mimic option (-m) extracts catalog statistics for all or given
tables.
• Helps recreate optimizer behaviour in another database.
• DB2LOOK for a specific table:
db2look -d <dbname> -a -e -m -z <schema> -t <table1
table2 …> -o <outfile>
• DB2LOOK for all tables in the database:
db2look -d <dbname> -f -m -l -a -e -o <outfile>
• Alternative method: query the catalog tables directly using
SYSCAT schema views.
46
47. Catalog statistics
Catalog views

Catalog views                                            Description
SYSCAT.TABLES / SYSSTAT.TABLES                           Table statistics
SYSCAT.COLUMNS / SYSSTAT.COLUMNS                         Column statistics
SYSCAT.COLGROUPS / SYSSTAT.COLGROUPS                     Multicolumn statistics
SYSCAT.COLGROUPDISTCOUNTS / SYSSTAT.COLGROUPDISTCOUNTS   Multicolumn distribution statistics

• A value of -1 in a statistics column indicates that no statistics have been collected.
• SYSCAT views are read-only; SYSSTAT views are updatable.
47
48. Catalog statistics
Exercise:
• Run exercise5-1.bat to generate DB2LOOK and EXPLAIN output for the
following query:
SELECT a.mtran_id, b.dtran_num, c.cust_ln, c.cust_fn, d.branch_location,
b.dtran_tran_type, substr(e.movie_title,1,25)
FROM transaction_master a, transaction_detail b, customer c,
branch d, movie e
WHERE a.mtran_id = b.dtran_id AND a.mtran_cust_id = c.cust_id AND
a.mtran_branch_id = d.branch_id AND b.dtran_movie_id = e.movie_id
ORDER BY 1, 2, 6, 7
• Review the EXPLAIN output in exercise5-1.exfmt, and check the indicators
showing whether statistics were collected.
• Review the DB2LOOK output in exercise5-1.db2look, and examine the
statistical data returned for the tables used in the query.
48
49. Catalog statistics
Exercise (continued):
• Run exercise5-2.bat to collect statistics for CUSTOMER table
and generate the DB2LOOK and EXPLAIN output.
• Review the output files exercise5-2.db2look and exercise5-2.exfmt.
• Observe the type of data collected and how it is reflected in
the EXPLAIN output.
49
50. Catalog statistics
Exercise (continued):
• Run exercise5-3.bat to collect index statistics for BRANCH table only
and generate the DB2LOOK and EXPLAIN output.
• Review the output files exercise5-3.db2look and exercise5-3.exfmt.
• Observe the type of data collected and how it is reflected in the
EXPLAIN output.
• Optional: Run exercise5-4.bat to collect RUNSTATS using WITH
DISTRIBUTION AND DETAILED INDEXES ALL for all tables. Review the
data.
50
51. Predicates and Joins
Unit objectives:
• Define the terms predicate, join and sargable.
• Describe predicate and join types, and their effect on query
execution.
• Discuss the pros and cons of each join method.
51
52. Predicates and Joins
• A predicate is an element of a search condition that
expresses or implies a comparison operation.
• Predicates are included in clauses beginning with WHERE
or HAVING.
• Predicates limit or filter the query result set.
• When designing predicates, aim for the highest possible
selectivity so that the fewest rows are returned.
52
53. Predicates and Joins
Example:
SELECT *
FROM movie
WHERE movie_genre = 'A'
AND movie_cost_price = 21.99
AND movie_rent_day > 1
• The following parts are the predicates:
– movie_genre = ‘A’
– movie_cost_price = 21.99
– movie_rent_day > 1
53
54. Predicates and Joins
Predicate types:
• Range-delimiting predicates: limit the scope of index scan
using start and stop conditions.
• Index-sargable predicates: evaluated from the index because
the columns in the predicate are part of the index key.
• Data-sargable predicates: cannot be evaluated from the index
but can be evaluated while rows remain in the buffer.
• Residual predicates: involve subqueries (e.g., ANY, ALL, SOME
or IN clauses), or those that read LONG VARCHAR or LOB data
stored in files separate from the table.
54
55. Predicates and Joins
What does “sargable” mean?
• Contraction of Search ARGument-able.
• A SARGable predicate is a term that can be used as a
search argument.
• Unsargable predicates, or join predicates, require one or
more joins to be performed before the predicate can be
evaluated.
55
56. Predicates and Joins
Range-Delimiting Predicates:
• Limit the scope of an index scan.
• Use start and stop key values for the index search.
• The optimizer uses index data instead of reading the base
table when it evaluates these predicates, e.g.:
SELECT movie_title FROM movie WHERE movie_id='D0010'
– Since movie_id column is defined in the primary key index, the
optimizer scans the index within the range of the predicate value.
• See sect6-range delimiting example.exfmt.
56
58. Predicates and Joins
Index-Sargable Predicates:
• Cannot limit the scope of a search.
• Can be evaluated from the index, because the columns that
are referenced in the predicate are part of the index key, e.g.:
SELECT movie_title FROM movie
WHERE movie_id='D0010' AND movie_type > null
– Index is defined on columns (movie_id, movie_genre, movie_type).
– Movie_id would be applied as a start-stop key since it is the leading
column in the index, while movie_type would be applied as index-sargable
to fulfill the “>” operator.
• See sect6-index sargable.exfmt.
58
60. Predicates and Joins
Data-Sargable Predicates:
• Cannot be evaluated from an index.
• Can be evaluated by accessing individual rows in a table, e.g.:
SELECT movie_title FROM movie WHERE movie_type = 'A'
– Since movie_type column is not part of any index, it will be
treated as data-sargable predicate.
• See sect6-data-sargable example.exfmt.
60
62. Predicates and Joins
                                Predicate type
Characteristic                  Range-delimiting  Index-sargable  Data-sargable  Residual
Reduce index I/O                Yes               No              No             No
Reduce data page I/O            Yes               Yes             No             No
Reduce the number of rows
that are passed internally      Yes               Yes             Yes            No
Reduce the number of
qualifying rows                 Yes               Yes             Yes            Yes
62
63. Predicates and Joins
• A join is the process of combining two streams of rows
(result sets).
• Left side of the join is referred to as the OUTER.
• Right side of the join is referred to as the INNER.
• Three explicit join operators exist:
– Nested loop join (NLJOIN)
– Merge join (MSJOIN)
– Hash join (HSJOIN)
63
64. Predicates and Joins
Nested loop join (NLJOIN):
• Takes the top row from the outer stream and compares it to all
rows from the inner stream.
• Performance heavily influenced by the row count in the inner
stream (i.e., the larger the inner, the slower the performance).
• Inner table is usually an IXSCAN or FETCH-IXSCAN combination.
• To obtain better filtering, the NLJOIN usually pushes the join
predicates down into the inner operator (e.g., IXSCAN).
• The only join type available at optimization level 0.
64
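The NLJOIN described above can be sketched in a few lines of Python. The data and column names are hypothetical (loosely modelled on the sample database), and the inner side is a plain list scan; a real NLJOIN would normally probe an index on the inner table instead of re-reading every row.

```python
# Minimal nested-loop join sketch (hypothetical data): for each row of the
# outer stream, the inner stream is scanned again looking for matches.
# This is why a large inner stream makes NLJOIN slow.
def nljoin(outer, inner, outer_key, inner_key):
    result = []
    for o in outer:                      # one pass over the outer stream
        for i in inner:                  # inner is re-scanned per outer row
            if o[outer_key] == i[inner_key]:
                result.append({**o, **i})
    return result

customers = [{"cust_id": 1, "cust_ln": "Smith"}, {"cust_id": 2, "cust_ln": "Lee"}]
trans = [{"mtran_cust_id": 1, "mtran_id": 10}, {"mtran_cust_id": 1, "mtran_id": 11}]

print(nljoin(trans, customers, "mtran_cust_id", "cust_id"))
```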
65. Predicates and Joins
NLJOIN considerations:
• Poor performance when a large number of rows exists on the inner
stream.
• Slow TEMP operation on inner because it is a blocking operation
and requires subsequent TBSCAN.
• Check if cardinality estimates match the actual row counts. Even
though the optimizer estimated one row, it may not be the case at
execution time.
• Check if an index-sargable predicate is used. This can be expensive
when the optimizer scans to match a single row across the entire
index.
65
67. Predicates and Joins
Merge join (MSJOIN):
• Requires inputs sorted on the join column, which must be neither
LONG nor LOB.
• Only the primary join predicate is applied at the MSJOIN, and it must be
an equality join predicate.
• Remaining join predicates are applied as RESIDUAL predicates, not in the
MSJOIN but at a FILTER.
• MSJOIN scans both tables concurrently (unlike NLJOIN), and it only
scans each table once, unless the outer has duplicates.
• The outer table is preferably the one with fewer duplicates.
67
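The MSJOIN behaviour above, including the rescan of the inner for duplicate outer values, can be sketched as follows. The rows and key names are hypothetical; both inputs are assumed already sorted on the join column, as the operator requires.

```python
# Merge join sketch: both inputs sorted on the join column. The inner
# cursor only moves forward, but the matching run of inner rows is
# re-read for every duplicate join value in the outer (hypothetical data).
def msjoin(outer, inner, key):
    result = []
    i = 0
    for o in outer:                                   # outer sorted on key
        while i < len(inner) and inner[i][key] < o[key]:
            i += 1                                    # skip smaller inner keys
        j = i                                         # rescan the matching run
        while j < len(inner) and inner[j][key] == o[key]:
            result.append({**o, **inner[j]})
            j += 1
    return result

outer = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}, {"k": 2, "a": "z"}]
inner = [{"k": 1, "b": "p"}, {"k": 2, "b": "q"}]
print(msjoin(outer, inner, "k"))
```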
68. Predicates and Joins
MSJOIN considerations:
• SORT is a blocking operator and may be expensive.
• SORT can spill to buffer pools or disk (TEMPSPACE).
• Applies residual predicates after joining.
• If outer is indexed on join column, even without a SORT, the index
scan is typically a full index scan (sargable predicate).
• Duplicates hurt all joins, but with other joins one can take advantage
of prefetching or keep the inner buffered in memory. With
MSJOIN, the inner is rescanned for every duplicate value in the
outer row's join column.
68
70. Predicates and Joins
Hash join (HSJOIN):
• First, the inner table (build table) is scanned and placed into
memory buffers partitioned by hash code.
• Next, the outer table (probe table) is scanned row by row through
the same hash code on the same join column(s).
– If the hash codes match, then the join columns are compared for their
rows to be matched.
– If the matching row is in the memory buffers, it is compared right away.
– If it is not in the memory buffers, the probe row is written to a
temporary table on disk.
• Finally, temporary tables containing rows from the same partitions
are processed.
70
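The build and probe phases described above can be sketched in Python. This is an in-memory-only illustration with hypothetical data; the spill-to-disk path for rows that do not fit in the memory buffers is omitted.

```python
# Hash join sketch: build a hash table on the inner (build) table, then
# probe it with each outer row. On a hash-code hit, the join columns are
# still compared to rule out hash collisions. (In-memory only; no spill.)
from collections import defaultdict

def hsjoin(outer, inner, outer_key, inner_key):
    # Build phase: hash the inner table on the join column.
    buckets = defaultdict(list)
    for row in inner:
        buckets[hash(row[inner_key])].append(row)
    # Probe phase: hash each outer row and compare join columns on a hit.
    result = []
    for o in outer:
        for i in buckets.get(hash(o[outer_key]), []):
            if o[outer_key] == i[inner_key]:   # resolve hash collisions
                result.append({**o, **i})
    return result

branches = [{"branch_id": 7, "branch_location": "Toronto"}]
trans = [{"mtran_branch_id": 7, "mtran_id": 10},
         {"mtran_branch_id": 8, "mtran_id": 11}]
print(hsjoin(trans, branches, "mtran_branch_id", "branch_id"))
```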
71. Predicates and Joins
HSJOIN considerations:
• Requires significant sort heap space which may cause contention
with other queries.
• May spill to disk if there is insufficient sort heap or memory for the
hash table.
• Requires join columns to be of the same type.
• Does not guarantee ordering, and if outer table’s ordering is
needed, not an ideal choice.
• If intra-partition parallelism is on (INTRA_PARALLEL enabled,
DFT_DEGREE > 1), the HSJOIN runs in parallel (i.e., build table
partitioned in parallel and loaded into memory).
71
73. Cardinality Estimates
Unit objectives:
• Define the concepts of cardinality, filter factor and cardinality
estimates.
• Describe how the optimizer calculates cardinality estimates.
• Show how different statistics affect cardinality estimate
calculations.
73
74. Cardinality Estimates
• Cardinality (CARD)
– Table statistics: total number of rows in a table.
– Column statistics: the number of unique values in a column.
• Filter factor (FF)
– The percentage of rows estimated by the optimizer to pass
through a given predicate.
– Expressed as probability: 0 – 1.0 (0 to 100%).
• Cardinality estimate
– The number of rows estimated by the optimizer to be returned for a
given predicate based on its filter factor.
– Calculated as Filter Factor x Cardinality.
74
75. Cardinality Estimates
• Filtering can occur with a local or join predicate
– Movie_title = ‘RANGO’ (local predicate)
– a.c1 = b.c2 (join predicate)
• COLCARD
– Number of distinct values in the column
– SYSCAT.COLUMNS
• VALCOUNT
– Frequency with which the data value occurs in column
– SYSCAT.COLDIST
75
76. Cardinality Estimates
Calculating FF for equality predicates (c1=value)
• No statistics available/collected
– FF = a built-in default value
– typically FF = 0.04 (in some cases 0.25)
• Only basic statistics collected
– FF = 1/COLCARD
– COLCARD is for the column of the predicate being calculated
• With basic and distribution statistics collected
– FF = VALCOUNT / CARD
76
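The three filter-factor rules above, and the cardinality estimate FF x CARD from the previous slide, can be sketched as follows. The statistics values (CARD, COLCARD, VALCOUNT) are hypothetical, and the 0.04 default is taken from the slide, not from the product documentation.

```python
# Sketch of the equality-predicate filter-factor rules (hypothetical
# statistics; the 0.04 default follows the slide above).
def filter_factor(colcard=None, valcount=None, card=None, default_ff=0.04):
    if valcount is not None and card:   # basic + distribution statistics
        return valcount / card          # FF = VALCOUNT / CARD
    if colcard and colcard > 0:         # basic statistics only
        return 1.0 / colcard            # FF = 1 / COLCARD
    return default_ff                   # no statistics collected

CARD = 10000                            # table cardinality (hypothetical)
# Cardinality estimate = FF x CARD
print(filter_factor() * CARD)                          # no stats: 0.04 * 10000 = 400 rows
print(filter_factor(colcard=50) * CARD)                # 1/50 * 10000 = 200 rows
print(filter_factor(valcount=3000, card=CARD) * CARD)  # 3000/10000 * 10000 = 3000 rows
```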
77. Cardinality Estimates
No statistics collected:
Query:
SELECT movie_id
FROM movie
WHERE movie_genre=‘A’
Statistics:
– SYSSTAT.TABLES CARD = -1
– SYSSTAT.COLUMNS COLCARD = -1
– SYSSTAT.COLDIST no data
Filter Factor: 0.04
Explain snippet: sect7-nostat.exfmt
77
83. Basic Tuning Hints
Unit objectives:
• Introduce basic access plan tuning techniques.
• Describe the common tuning elements (configuration
parameters, registry variables and CURRENT QUERY
OPTIMIZATION special register).
83
84. Basic Tuning Hints
• Tuning is usually done by making changes to one of the
configuration categories that influence the optimizer
behaviour:
– Database manager configuration
– Database configuration
– Query compiler registry variables
– CURRENT QUERY OPTIMIZATION special register
84
85. Basic Tuning Hints
Database manager configuration:
• Parallelism (INTRA_PARALLEL): indicates whether intra-partition
parallelism is enabled. When YES, some parts of query execution (e.g.,
index sort) can run in parallel within a single database partition.
• CPU Speed (CPUSPEED): the optimizer uses the CPU speed (milliseconds
per instruction) to estimate the cost of performing certain operations.
• Communications speed (COMM_BANDWIDTH): the optimizer uses the
value (megabytes per second) in this parameter to estimate the cost of
performing certain operations between the database partition servers in a
partitioned database environment.
85
86. Basic Tuning Hints
Database configuration:
• Buffer pools (BUFFPAGE):
– Determined by the BUFFPAGE parameter if a buffer pool uses BUFFPAGE as
its default size, or otherwise by a calculation based on the contents of
SYSCAT.BUFFERPOOLS.
– The number shown is the total number of bufferpool pages allocated
for the database. The table below shows an example of how we arrive
at the total of 5000 pages:
Bufferpool name Size
IBMDEFAULTBP 1000
BPLONG 4000
BPTEMP 1000
Total 5000
86
87. Basic Tuning Hints
Database configuration (continued):
• Sort heap size (SORTHEAP): Defines the maximum number of private
memory pages to be used for private sorts or the maximum number of
shared memory pages to be used for shared sorts.
• Database heap size (DBHEAP): Contains metadata for tables, indexes,
table spaces and bufferpools. There is one database heap per database.
• Lock list size (LOCKLIST): Indicates the amount of storage allocated to the
lock list. There is one lock list per database and it contains the locks held
by all applications concurrently connected to the database.
• Maximum lock list (MAXLOCKS): Defines a percentage of the lock list held
by any one application that must be filled before the database manager
performs lock escalation.
87
88. Basic Tuning Hints
Database configuration (continued):
• Average applications (AVG_APPLS): Used by the SQL optimizer to help
estimate how much buffer pool space will be available at run time for the
chosen access plan (since the buffer pool is shared by all active connections).
• Optimization class (DFT_QUERYOPT): Tells the optimizer which level of
optimization to use when compiling an SQL query (default = 5). The higher
the level, the more complex the algorithms that are considered.
• Query degree (DFT_DEGREE): Represents the degree of intra-partition
parallelism for an SQL statement. If set to ANY, the optimizer is sensitive to
the actual number of CPUs that are online. If set to ANY and intra_parallel
is enabled, then the number of CPUs on test and production should be
configured the same.
88
89. Basic Tuning Hints
Database configuration (continued):
• Number of frequent values retained (NUM_FREQVALUES): Allows you to
specify the number of "most frequent values" that will be collected when
the WITH DISTRIBUTION option is specified with the RUNSTATS command.
• Number of quantiles retained (NUM_QUANTILES): Controls the number
of quantiles that will be collected when the WITH DISTRIBUTION option is
specified on the RUNSTATS command.
89
90. Basic Tuning Hints
Query compiler variables (common):
• DB2_EXTENDED_OPTIMIZATION: Specifies whether or not the query
optimizer uses optimization extensions to improve query performance.
• DB2_SORT_AFTER_TQ: Determines how the optimizer works with directed
table queues in a partitioned database environment when the receiving
end requires the data to be sorted, and the number of receiving nodes is
equal to the number of sending nodes.
• DB2_INLIST_TO_NLJN: Causes the optimizer to favour nested loop joins to
join the list of values, using the table that contributes the IN list as the
inner table in the join.
* See the DB2 V9.7 Info Center for additional variables
(search for “Query compiler variables”)
90
91. Basic Tuning Hints
CURRENT QUERY OPTIMIZATION special register:
• Controls the query optimization class for dynamic SQL statements.
• Overrides the dft_queryopt database configuration parameter.
• Affects the succeeding SQL statements only for the current session.
• Assigned by the SET CURRENT QUERY OPTIMIZATION statement.
>>-SET--CURRENT--QUERY--OPTIMIZATION--+---+----------------->
>--+-0-------------+---------------------------------------><
+-1-------------+
+-2-------------+
+-3-------------+
+-5-------------+
+-7-------------+
+-9-------------+
'-host-variable-'
91
92. Statistical Views
Unit objectives:
• Define the purpose and benefits of statistical views.
• Provide a basic example and steps for creating a
statistical view.
92
93. Statistical Views
• Views with statistics that can be used to improve
cardinality estimates for queries in which the view
definition overlaps with the query definition.
• A powerful feature as it provides the optimizer with
accurate statistics for determining cardinality estimates
for queries with complex sets of (possibly correlated)
predicates involving one or more tables.
93
94. Statistical Views
• Mechanism to provide the optimizer with more sophisticated
statistics that represent more complex relationships:
– Comparisons involving expressions:
price > MSRP + Dealer_markup
– Relationships spanning multiple tables:
product.name = 'Alloy wheels' and product.key = sales.product_key
– Anything other than predicates involving independent
attributes and simple comparison operations.
94
95. Statistical Views
• Statistical views are able to represent these types of complex
relationships, because statistics are collected on the result set
returned by the view, rather than the base tables referenced by
the view.
• When a SQL query is compiled, the optimizer matches the SQL
query to the available statistical views. When the optimizer
computes cardinality estimates for intermediate result sets, it
uses the statistics from the view to compute a better
estimate.
95
96. Statistical Views
Usage:
• Create a view:
CREATE VIEW <view name> AS (<select clause>)
• Enable the view for optimization using ALTER VIEW statement:
ALTER VIEW <view name> ENABLE QUERY OPTIMIZATION
• Issue the RUNSTATS command to populate system catalog
tables with statistics for the view:
RUNSTATS ON <schema>.<view name> <runstats clause>
96
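Putting the three steps together, a minimal sketch; the PRODUCT and SALES tables and the MYSCHEMA schema are hypothetical placeholders:

```sql
-- View overlapping a common join between two hypothetical tables
CREATE VIEW MYSCHEMA.SV_PROD_SALES AS
  (SELECT S.* FROM MYSCHEMA.PRODUCT P, MYSCHEMA.SALES S
   WHERE P.PRODUCT_KEY = S.PRODUCT_KEY);

-- Mark the view as a statistical view
ALTER VIEW MYSCHEMA.SV_PROD_SALES ENABLE QUERY OPTIMIZATION;

-- Collect statistics on the view's result set
RUNSTATS ON TABLE MYSCHEMA.SV_PROD_SALES WITH DISTRIBUTION;
```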
97. Optimizer Profiles
Unit objectives:
• Examine the technique for overriding the default access
plan.
• Discuss the structure of an optimizer profile.
• Introduce the common optimizer guidelines.
• Provide instructions for directing the optimizer on how to
perform table access or joins in specific ways.
97
98. Optimizer Profiles
• Mechanism to alter default access plan
– Overrides the default access plan selected by the optimizer.
– Instructs the optimizer how to perform table access or join.
– Allows users to control specific parts of access plan.
• Can be employed without changing the application code
– Compose optimization profile, add to db, rebind targeted packages.
• Should only be used after all other tuning options are exhausted
– Query improvement, RUNSTATS, indexes, optimization class, db and
dbm configs, etc.
– Should not be employed to permanently mitigate the effect of
inefficient queries.
98
99. Optimizer Profiles
Anatomy of an optimizer profile:
• XML document
– Elements and attributes used to define optimization guidelines.
– Must conform to a specific optimization profile schema.
• Profile Header (exactly one)
– Meta data and processing directives.
– Example: schema version.
• Global optimization guidelines (at most one)
– Applies to all statements for which profile is in effect.
– Example: eligible MQTs guideline defining MQTs to be considered for routing.
• Statement optimization guidelines (zero or more)
– Applies to a specific statement for which profile is in effect.
– Specifies directives for desired execution plan.
99
100. Optimizer Profiles
Anatomy of an optimizer profile (continued):
<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="9.7.0.0">
<OPTGUIDELINES>
<MQT NAME="Test.AvgSales"/>
<MQT NAME="Test.SumSales"/>
</OPTGUIDELINES>
<STMTPROFILE ID="Guidelines for TPCD">
<STMTKEY SCHEMA="TPCD">
<![CDATA[SELECT * FROM TAB1]]>
</STMTKEY>
<OPTGUIDELINES>
<IXSCAN TABLE="TAB1" INDEX="I_SUPPKEY"/>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
Profile Header
Global optimization
guidelines section. Optional
but at most one.
Statement guidelines
section. Zero or more.
100
101. Optimizer Profiles
• Each statement profile contains a statement key (STMTKEY) and one or
more statement-level optimization guidelines (OPTGUIDELINES).
• The statement key identifies the statement or query to which the
statement-level optimization guidelines apply.
<STMTKEY SCHEMA="TPCD">
<![CDATA[SELECT * FROM TAB1]]>
</STMTKEY>
• The statement-level optimization guidelines identify one or more
directives, e.g. access or join requests, which specify methods for
accessing or joining tables in the statement.
<OPTGUIDELINES>
<IXSCAN TABID="TAB1" INDEX="I_SUPPKEY"/>
</OPTGUIDELINES>
101
102. Optimizer Profiles
Optimizer guidelines:
• Access plan guidelines:
– Base access request (method to access a table, e.g. TBSCAN, IXSCAN)
– Join request (method and sequence for performing a join, e.g. HSJOIN, NLJOIN)
• Query rewrite guidelines:
– INLIST2JOIN (convert IN-list to join)
– SUBQ2JOIN (convert subquery to join)
– NOTEX2AJ (convert NOT EXISTS subquery to anti-join)
– NOTIN2AJ (convert NOT IN subquery to anti-join)
• General optimization guidelines:
– REOPT (ONCE/ALWAYS/NONE)
– MQT choices
– QRYOPT (set the query optimization class)
– RTS (limit the amount of time taken by real-time statistics collection)
102
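As an illustration, several of the general guidelines can be combined in a single statement-level OPTGUIDELINES element (a sketch; the values shown are illustrative, not recommendations):

```xml
<OPTGUIDELINES>
  <REOPT VALUE="ONCE"/>
  <QRYOPT VALUE="5"/>
  <INLIST2JOIN OPTION="ENABLE"/>
</OPTGUIDELINES>
```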
103. Optimizer Profiles
How to create and use an optimizer profile:
1. Create the SYSTOOLS.OPT_PROFILE table in the SYSTOOLS schema:
– Using the CREATE TABLE statement:
CREATE TABLE SYSTOOLS.OPT_PROFILE (
  SCHEMA VARCHAR(128) NOT NULL,  -- Schema name
  NAME VARCHAR(128) NOT NULL,    -- Profile name
  PROFILE BLOB (2M) NOT NULL,    -- Profile XML
  PRIMARY KEY ( SCHEMA, NAME ));
– Using the SYSINSTALLOBJECTS procedure:
db2 "call sysinstallobjects('opt_profiles', 'c', '', '')"
2. Compose optimization profile in XML file prof1.xml.
103
104. Optimizer Profiles
How to create and use an optimizer profile (continued):
3. Create file prof1.del as follows:
"EXAMPLE","PROF1","prof1.xml"
4. Import prof1.del into SYSTOOLS.OPT_PROFILE table as follows:
IMPORT FROM prof1.del OF DEL
MODIFIED BY LOBSINFILE
INSERT INTO SYSTOOLS.OPT_PROFILE;
5. Enable the PROF1 profile for the current session:
db2 "set current optimization profile = 'EXAMPLE.PROF1'"
104
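To confirm the profile is in place, you can query the SYSTOOLS.OPT_PROFILE table directly; after re-importing a modified profile, flush the profile cache so the new copy is picked up (a sketch):

```sql
-- List the stored profiles
SELECT SCHEMA, NAME FROM SYSTOOLS.OPT_PROFILE;

-- Required after re-importing a modified profile
FLUSH OPTIMIZATION PROFILE CACHE EXAMPLE.PROF1;
```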
105. Optimizer Profiles
Checking if the guidelines are applied:
• SQL0437W reason code 13 is issued if there were problems with applying
the guidelines (usually due to syntax error)
• Db2exfmt output contains a section with details on any problems
encountered while attempting to apply the guidelines (the example below
shows a clean run with no problems):
Extended Diagnostic Information:
--------------------------------
No extended Diagnostic Information for this statement.
• Db2exfmt Profile Information section informs on which guidelines were
used:
Profile Information:
--------------------
OPT_PROF: (Optimization Profile Name)
EXAMPLE.PROF1
STMTPROF: (Statement Profile Name)
Guidelines for the items out query
105
106. Optimizer Profiles
Exercise:
• Examine the optimizer profile in exercise9_1.xml:
– Global guideline forces OPTLEVEL 7 for all queries in the current
session.
– Statement guideline forces nested loop join method (NLJOIN) for
tables MOVIE and TRANSACTION_DETAIL for a given query.
• Run exercise9_1.bat – this will create and apply an optimizer
profile named PROF1. It will also collect db2exfmt output
before and after using the profile.
• Compare the before and after .exfmt files – how can you tell if
the guidelines were applied?
106
What are we trying to Explain?
We want to explain how our query goes about fulfilling our request.
Why is this important?
Because there are many ways of going about fulfilling the request.
Some are faster or more efficient than the others (in terms of resource usage).
Think about a GPS device… Its sole purpose is to get you from point A to point B in the fastest possible time. The optimizer is the same.
This “way” of fulfilling the request is called an access plan.
Namely, an access plan specifies a type and order of operations for accessing the data, and producing a result set.
The optimizer produces an access plan whenever you compile a SQL statement.
We also refer to the access plan as: explain = db2exfmt output = access path
What is the purpose of the Explain tools?
The explain tools produce visual representation of access plan, and they dump certain information that the access plan is based upon.
Explain output lets you view statistics for selected tables, indexes, or columns; properties for operators; global information such as table space and function statistics; and configuration parameters relevant to optimization.
It lets you evaluate your query performance tuning options.
Perform "what-if" query performance analysis.
Once you become familiar with the explain tool, it will become your first choice to evaluate a query performance problem.
Sequence of operations to process the query:
How did we get from point A to point B? What operations were done from the time the query was submitted to the time the result set was returned.
Cost information:
Everything that the access plan does with data along the way has certain cost in terms of CPU cycles, input and output performed on memory or disk, or both. This cost is expressed in timerons.
Predicates and selectivity estimates for each predicate:
These are the conditions that we used to limit the search scope of our query, along with estimates of how much data the condition will filter through.
Statistics for all objects referenced in the SQL query at the time the explain information is captured
As we shall see, statistics are critical for the optimizer decision making process. Hence they are contained in the explain output.
The Explain tables capture access plans when the Explain facility is activated.
This is where the access plan information goes for explain purposes: to tables, not to a file or to memory.
Tables are listed here for your reference.
Db2expln:
Provides basic, relatively compact and English-like overview of access plan and the related data.
Unlike db2exfmt, does not make recommendations on what to do to improve performance.
Use it when you want a quick and dirty explanation of what kind of access plan was used (such as table access and join information).
Db2exfmt:
Provides more detailed information, including the catalog statistics (such as column distribution stats) which are crucial to the optimizer’s decision making process.
We will be using db2exfmt as the primary explain tool in this workshop.
Recommendation: strive to use it always over db2expln. Since you are going through the trouble of generating explain output, you might as well run db2exfmt for the sake of completeness.
These are the sections of the db2exfmt output.
We will briefly discuss each section.
Explain instance:
Contains general information about the environment, such as: DB2 version, time, schema name, etc.
Database context:
Lists the instance and database configuration parameters that were in effect during the query compilation.
These parameters are important because they influence the access plan generation.
Package context:
Lists the query type (static or dynamic) and the optimization level.
The optimization level is important because it determines how hard the optimizer will work to produce an optimal access plan.
Original statement:
Actual query text.
Optimized statement:
Rewritten query text according to the applied optimization techniques.
Access plan:
This is the big one… You may want to have a full HD monitor.
Graphical representation of the query execution.
Displays the order of operations along with rows returned, cost and I/O for each operation.
Execution starts at the bottom, and it goes up.
Next, we are onto the advanced sections.
Extended diagnostic information:
Warnings, informational or error messages returned by the optimizer.
Very important section as it may warn you about missing stats.
For example, this section may tell you that your query may perform poorly because you have not collected distribution stats on a column B.
Plan details:
Detailed information for each operator including arguments, predicates, estimates (filter factor), and input/output streams.
This section tells us what the optimizer based its calculations on.
Objects used in the access plan:
Any objects (tables or indexes) used by the query and the access plan are displayed in this area, which also includes their statistical information.
Syntax diagram for db2exfmt for the participants' reference.
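A typical invocation sequence, as a sketch (MYDB and query.sql are placeholder names):

```shell
db2 connect to MYDB
db2 "SET CURRENT EXPLAIN MODE EXPLAIN"   # capture plan without executing
db2 -tvf query.sql
db2 "SET CURRENT EXPLAIN MODE NO"
db2exfmt -d MYDB -1 -o query.exfmt       # -1 uses default options for the remaining parameters
```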
The most important options for our purposes:
NO
Disables the Explain facility. No Explain information is captured. NO is the initial value of the special register.
YES
Enables the Explain facility and causes Explain information to be inserted into the Explain tables for eligible dynamic SQL statements. All dynamic SQL statements are compiled and executed normally.
EXPLAIN
Enables the Explain facility and causes Explain information to be captured for any eligible dynamic SQL statement that is prepared. However, dynamic SQL statements are not executed.
An operator typically takes one or more inputs, does something with it, and passes the output to the next operator.
An operator typically takes one or two input streams.
In this example, we see three operators: two table scans and a join.
Two TBSCAN operators each perform a table scan and feed the results into the third operator – the HSJOIN.
The third operator combines the two data streams using the join type called hash join, and it passes the resulting data stream on to the next operator, and so on.
Remember that the figures (rows returned and cost) are estimates.
TBSCAN and IXSCAN will typically be placed at the bottom of the access plan graph.
These are the operations that retrieve data from the sources (tables and indexes).
Recall that graphical representation of execution starts at the bottom, and it goes up.
Some of the common operators are the join operators: HSJOIN, MSJOIN and NLJOIN.
These operators merge two streams of data.
At least one of them will be found in any typical query that references multiple tables.
SORT is another common operator
Sorting is required when no index exists that satisfies the requested ordering, or when sorting would be less expensive than an index scan.
Sorting is usually performed as a final operation once the required rows are fetched, or to sort data prior to a join or a group by.
Another common operator, especially in more complex queries: TEMP
The action of storing data in a temporary table, to be read back out by another operator (possibly multiple times).
This operator is typically required to evaluate subqueries or to store intermediate results.
We have learned what the optimizer is.
Now, we are getting into details about how it works.
Why is the optimizer heavily influenced by the statistical information?
Because the optimizer uses this information to estimate the costs of alternative access plans for each query and choose the one with the lowest estimated cost.
As we shall see, there are other factors that influence the optimizer, but the statistics form the basis for access plan calculations.
Going back to the GPS example… To compare the alternatives, a GPS device needs to know the length of each road, their speed limits, and current traffic conditions.
In DB2 context we refer to these figures as statistics.
Statistics describe characteristics of database objects (e.g., tables and indexes).
Distribution statistics are used to estimate how many rows will be returned. This is important because the optimizer chooses join type or sort method based on the distribution stats.
Examples of statistical information used by the optimizer:
table and index cardinality
the distribution of data in specific table columns
the cluster ratio of indexes
the number of leaf pages in indexes
the number of table rows that overflow their original pages
the number of filled and empty pages in a table and so forth.
In a nutshell, the RUNSTATS utility collects statistics about tables, indexes and statistical views, and stores them into catalog tables for use by the optimizer.
We want to run RUNSTATS after a significant change to data (either volume or cardinality) because the stats get out of date. The main risk of having outdated stats is that the compiler may select a suboptimal plan for a given query. For instance, the optimizer may estimate, based on stale statistics, that a query would return 100 rows from table T1. As a result it chooses a nested loop join to join the tables. However, in reality the query returns one million rows from T1, for which a merge join may have been better suited. As a result, the query that used to run fast now grinds to a halt.
Statistical view: a special type of view which combines the columns referenced by commonly used queries, and which is enabled for use in query optimization.
NUM_FREQVALUES
Defines the maximum number of frequency values to collect.
NUM_QUANTILES
Defines the maximum number of distribution quantile values to collect.
You might also want to collect statistics on system catalog tables, if an application queries these tables directly and if there is significant catalog update activity, such as that resulting from the execution of data definition language (DDL) statements.
Because these operations add a significant amount of new data (such as LOAD and insert/update/delete), change the data layout on disk (REORG, REDISTRIBUTE), or change the way data is accessed (index creation).
Distribution statistics make the optimizer aware of data skew. Namely, how many times a given value appears in the column.
If distribution stats are so good for the optimizer, why not collect them all the time for every column, and do it five times a week?
Because RUNSTATS is not free. It consumes system resources and contends with queries for CPU cycles and buffer pools. It also places locks on database objects, which may reduce concurrency (although RUNSTATS can be throttled to have low priority and yield to applications when there are locking conflicts).
Detailed index statistics provide more details about the I/O required to fetch data pages when the table is accessed using a particular index. However, collecting detailed index statistics consumes considerable processing time and memory for large tables.
As a result, you should schedule RUNSTATS execution during maintenance windows, typically after REORG.
There are also joins based on a cluster of operators (e.g., star join, ordered NLJOIN).
There are joins specific to DPF (e.g., collocated joins, directed joins, broadcast joins, repartitioned joins).
Database manager configuration:
These options affect all databases in a given instance.
May require instance restart after setting a new parameter value.
Database configuration:
These options affect only the database for which they are defined.
May require database restart after setting a new parameter value.
Query compiler registry variables:
Affect one or more instances on the system.
Typically require instance restart.
CURRENT QUERY OPTIMIZATION special register:
Affect only the current session.
Parallelism INTRA_PARALLEL:
Tells DB2 to take advantage of symmetric multiprocessing (SMP) on machines that have two or more CPUs or CPU cores.
Not everything will run in parallel.
It enables some parts of query execution to run in parallel within a single database partition.
For example, index sort will run in parallel if you have this parameter enabled. Index keys will be fetched, then multiple sort agents would be started, one for each CPU/core.
When all the sort passes are complete, these sorted subsets are merged into a single sorted set of index keys.
Also, HSJOIN runs in parallel (i.e., build table partitioned in parallel and loaded into memory).
Why is parallelism important to the optimizer? Because it may favour certain operators over the others when parallelism is on (i.e., HSJOIN over NLJOIN).
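Enabling intra-partition parallelism, as a sketch; the change requires an instance restart:

```shell
db2 UPDATE DBM CFG USING INTRA_PARALLEL YES
db2stop
db2start
```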
CPU Speed (CPUSPEED):
The optimizer uses the CPU speed to estimate the cost of performing certain operations.
This configuration parameter is automatically set to an appropriate value when the database is installed or upgraded.
Recommendation: do not adjust this parameter unless you are modelling a production environment on a test system or assessing the impact of a hardware change.
Using this parameter to model a different hardware environment allows you to find out the access plans that might be chosen for that environment.
To have the database manager recompute the value of this automatic configuration parameter, set it to -1.
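For example, to have the value recomputed:

```shell
db2 UPDATE DBM CFG USING CPUSPEED -1
```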
Communications speed (COMM_BANDWIDTH):
The value calculated for the communications bandwidth, in megabytes per second, is used by the optimizer to estimate the cost of performing certain operations between the database partition servers of a partitioned database system.
The optimizer does not model the cost of communications between a client and a server, so this parameter should reflect only the nominal bandwidth between the database partition servers, if any.
Recommendation: You should only adjust this parameter if you wish to model a production environment on your test system or to assess the impact of upgrading hardware.
Buffer pools (BUFFPAGE):
Represents the total size of the buffer pools that you specified when you created or altered them.
Why is this parameter important? Because when the optimizer chooses the access plan, it considers the I/O cost of fetching pages from disk to the buffer pool and estimates the number of I/Os required to satisfy a query. The estimate includes a prediction of buffer-pool usage, because additional physical I/Os are not required to read rows in a page that is already in the buffer pool.
The I/O costs of reading the tables can have an impact on:
How two tables are joined
Whether an unclustered index will be used to read the data
What this tells us is that bigger is better when it comes to the buffer pool size.
However, there is a point of diminishing returns, so you should be careful not to increase your buffer pools to 75% of your system memory because it may affect other processes.
Sort Heap Size (SORTHEAP):
Defines the maximum number of memory pages to be used for sort operations.
Why is this important?
Because it helps the optimizer determine whether a sort will be performed in memory or be “spilled” to disk (i.e., to a system temporary table) due to running out of sort heap space.
A sort that occurs in memory versus a system temporary table always results in better performance, and is used if possible.
When choosing an access plan, the optimizer estimates the cost of the sort operations by:
Estimating the amount of data to be sorted
Looking at the sortheap parameter to determine if there is enough space to read a sort in a single, sequential access.
Accurate costing of the sort operation helps the optimizer decide whether to use sort or index scan in certain cases.
Database heap size (DBHEAP):
There is one database heap per database, and the database manager uses it on behalf of all applications connected to the database.
It contains control block information for tables, indexes, table spaces, and buffer pools.
It is important to the optimizer because it helps estimate the amount of memory available to certain operations.
Lock list size (LOCKLIST):
Indicates the amount of storage allocated to the lock list.
Maximum lock list (MAXLOCKS):
Represents the maximum percentage of the lock list that can be held by any one application before lock escalation is performed.
Lock escalation, by the way, is the process of replacing a large number of row-level locks with a single table lock. It is done to reduce the amount of memory used to keep track of locks.
Why are these parameters important?
Because, when the isolation level is repeatable read (RR), the optimizer considers the values of the locklist and maxlocks parameters to determine whether row locks might be escalated to a table lock. If the optimizer estimates that lock escalation will occur for a table access, then it chooses a table lock for the access plan.
This is done to prevent incurring the overhead of lock escalation during the query execution.
Downside: while we avoid the overhead of escalation, table access will lock the entire table, thereby increasing the contention and possibly reducing database access concurrency.
Recommendation: do some benchmarking to determine the minimum value for each parameter necessary to completely eliminate lock escalation.
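Both parameters can be adjusted in one step, as a sketch (MYDB is a placeholder and the values are illustrative, not recommendations):

```shell
db2 UPDATE DB CFG FOR MYDB USING LOCKLIST 8192 MAXLOCKS 60
```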
Average applications (AVG_APPLS):
The optimizer uses the avg_appls parameter to help estimate how much of the buffer pool might be available at run-time for the access plan chosen.
Higher values for this parameter can influence the optimizer to choose access plans that are more conservative in buffer pool usage.
If you specify a value of 1, the optimizer considers that the entire buffer pool will be available to the application.
Optimization Level (DFT_QUERYOPT):
Tells the optimizer which level of optimization to use by default when compiling queries (the default is 3).
The higher the level, the more complex algorithms are considered.
This means the optimizer will try more access plan combinations before choosing the best one.
In general, a higher degree results in longer compile time.
Query Degree (DFT_DEGREE):
Specifies the degree of parallelism to be used by default within the single database partition.
A value of one (1) means no intra-partition parallelism.
A value of minus one (-1) or ANY means the optimizer determines the degree of intra-partition parallelism based on the number of processors and the type of query.
Note: Intra-parallel processing does not occur unless you enable it by setting the intra_parallel database manager configuration parameter.
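For example, with intra_parallel already enabled at the instance level, the default degree can be set per database (MYDB is a placeholder):

```shell
db2 UPDATE DB CFG FOR MYDB USING DFT_DEGREE ANY
```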
Number of frequent values retained (NUM_FREQVALUES):
Allows you to specify the number of "most frequent values" that will be collected when the WITH DISTRIBUTION option is specified with the RUNSTATS command.
The "most frequent value" statistics help the optimizer understand the distribution of data values within a column.
A higher value results in more information being available to the query optimizer but requires additional catalog space and incurs higher heap space usage during RUNSTATS.
When 0 is specified, no frequent-value statistics are retained, even if you request that distribution statistics be collected.
Number of quantiles retained (NUM_QUANTILES):
Controls the number of quantiles that will be collected when the WITH DISTRIBUTION option is specified on the RUNSTATS command.
Each quantile represents the subset of the data distribution containing an equal fraction of the total sample (for example, with five quantiles each covers one fifth, or 20%).
The "quantile" statistics help the optimizer understand the distribution of data values within a column.
A higher value results in more information being available to the query optimizer but requires additional catalog space.
When 0 or 1 is specified, no quantile statistics are retained, even if you request that distribution statistics be collected.
DB2_EXTENDED_OPTIMIZATION:
This variable specifies whether or not the query optimizer uses optimization extensions to improve query performance.
It affects one or more instances on the system, along with all the databases within them.
Use this if you wish to try specific optimization algorithms on your query (refer to the documentation for the description of algorithms).
These optimization extensions may increase compilation time, and may not improve query performance in all situations.
Testing should be done to determine individual query performance improvements.
DB2_SORT_AFTER_TQ:
Specifies how the optimizer works with directed table queues in a partitioned database environment when the receiving end requires the data to be sorted and the number of receiving nodes is equal to the number of sending nodes.
When DB2_SORT_AFTER_TQ=NO, the optimizer tends to sort at the sending end and merge the rows at the receiving end.
When DB2_SORT_AFTER_TQ=YES, the optimizer tends to transmit the rows unsorted, not merge at the receiving end, and sort the rows at the receiving end after receiving all the rows.
DB2_INLIST_TO_NLJN:
Sometimes the optimizer does not have accurate information to determine the best join method for the rewritten version of the query.
This can occur if the IN list contains parameter markers or host variables which prevent the optimizer from using catalog statistics to determine the selectivity.
This registry variable causes the optimizer to favor nested loop joins to join the list of values, using the table that contributes the IN list as the inner table in the join.
Use this statement when you want to test or apply a certain query optimization class to one or more queries in the current session, without affecting other database users.
Default optimization class will be used for all other sessions.
What is optimization profile?
Optimization profiles allow certain optimization guidelines to be provided to the optimizer without application or database configuration changes.
Advanced method of query performance tuning.
Why would we want to override the optimizer?
After all, isn’t it smarter than we are, calculating hundreds of combinations in a split second and figuring out which one forms the best access plan?
Yes and no…
Going back to our GPS example:
Imagine wanting to get from one end of Toronto to the other.
There are countless possibilities.
The shortest route is not always the fastest.
Also, what is normally the fastest route may not be available to you that day due to road closure.
In this case, you would want to override the GPS device and tell it to take a particular detour.
The same logic applies to the optimizer.
It may have the computational ability, but unlike a human brain it does not have a learning capability to make heuristic decisions on the fly.
When you would want to use the optimizer profiles:
You know your statistics are out of date and can’t afford to collect them now, but you want to tell the optimizer to use a particular index even though the stats are telling it otherwise.
You want to test various join combinations or table access methods (e.g., MQT, index) for your query.
You want to apply different optimization techniques to a handful of queries without affecting all the other queries.
You have exhausted all other tuning options.
XML document
Why XML?
XML is handy for describing data, so it was a logical choice for implementing optimizer profiles.
It must conform to a certain XML schema, otherwise the optimizer returns a warning at compile time and it ignores the profile.
Profile Header:
Typically specifies schema version.
Does not change much between releases, but you should always use the schema version that corresponds to your DB2 level (e.g., V9.7.0.0).
Global optimization guidelines:
This section contains optimization guidelines that are to be used for all statements for which the profile is currently in effect.
Statement optimization guidelines
This section contains optimization guidelines that are to be used for a particular SQL statement only, e.g.:
“For app1.0, only consider routing to MQTs: Newt.AvgSales and Newt.SumSales”
“Use index ISUPPKEY to access SUPPLIERS in the subquery of query 9”
Optimization guidelines do not need to be comprehensive, but they should be targeted to a desired set of queries and a particular execution plan.
The DB2 optimizer still considers other possible access plans using the existing cost-based methods.
In this example:
In the global guidelines section we are telling the optimizer to first consider MQT tables for all queries against the base tables Test.AvgSales and Test.SumSales for which the profile is currently in effect.
In the statement guidelines section we are telling the optimizer to access table TAB1 through the I_SUPPKEY index, but only for the query specified in the STMTKEY tag.
More details on the statement section, as this is where you would place most of your guidelines.
Each statement profile contains a statement key (STMTKEY) and one or more statement-level optimization guidelines (OPTGUIDELINES).
The original statement and the statement in the profile must match exactly (case sensitive).
Otherwise, the optimizer ignores the profile.
You may use the EXPLAIN statement to verify whether the optimizer applied the guideline (more on this later).
These are some examples of guidelines.
There are no specific recommendations as to when you should use these guidelines.
It is decided on a case-by-case basis.
It often comes down to a trial-and-error approach.
You may refer to the DB2 documentation for a more detailed description of query rewrite guidelines.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0024581.html
First, you compose a simple XML document.
Then, you insert it into the database.
Finally, you specify the name of the optimization profile when you decide to use it.
Limitation:
The original statement and the statement in the profile must be exactly the same (case sensitive).
Review the file prof1.xml
The optimizer ignores invalid or inapplicable optimization guidelines.
If any optimization guidelines are ignored, an execution plan is created and SQL0437W with reason code 13 is returned.
You can then use the EXPLAIN statement to get detailed diagnostic information regarding optimization guidelines processing.
The Profile Information section contains the names of profile and guidelines in effect, as well as any warnings (e.g., SQL0437W)