Understanding DB2 LUW 
Optimizer 
Anthony Reina 
Zoran Kulina
Speakers 
Anthony Reina 
• Accelerated Value Specialist at IBM Toronto Lab. 
• Over 10 years of experience in various roles in DB2 Support. 
• Works with IBM customers on resolving defects, developing solutions, and 
conducting education courses. 
• Specializes in DB2 engine, particularly recovery and optimizer areas. 
• Contact: aereina@ca.ibm.com 
Zoran Kulina 
• Software engineer and technical lead at DB2 Advanced Support team at IBM 
Toronto Lab. 
• Specialties include database internal architecture, SQL performance tuning, 
deep debugging and advanced problem determination. 
• Contact: zoran@ca.ibm.com 
2
Agenda 
1. Introduction and high-level overview 
2. Phases of the optimizer 
3. Explain facility and DB2EXFMT Tool 
4. Operators 
5. Catalog statistics 
6. Predicates and joins 
7. Cardinality estimates 
8. Basic tuning hints 
9. Statistical views 
10. Optimizer profiles and guidelines 
3
Introduction 
• What is this workshop about? 
– This half-day workshop provides a hands-on introduction to the DB2 
LUW optimizer. 
• Why learn about the optimizer? 
– Understanding how the optimizer works and knowing how to 
influence its behaviour can lead to improved query performance and 
better resource usage. 
• What are the workshop objectives? 
– To provide participants with basic knowledge of the optimizer and 
related concepts, and to serve as a starting point for further study of 
the optimizer techniques. 
4
Introduction 
• What are the prerequisites? 
– Basic knowledge of SQL and relational database concepts is assumed. 
• What is the workshop format? 
– This workshop introduces the optimizer through a series of interactive 
hands-on exercises. Each unit is first discussed and then accompanied 
by an exercise to demonstrate the unit concepts. 
5
Introduction 
• Sample database 
Database name: DVDSR 
Schema name: EXAMPLE 
Tables: CUSTOMER 
BRANCH 
MOVIE 
TRANSACTION_MASTER 
TRANSACTION_DETAIL 
6
High-Level Overview 
Unit objectives: 
• Define what the optimizer is and how it works from a 
high-level perspective. 
• Introduce the concept of timeron. 
• Discuss major factors influencing the optimizer. 
• List various statistics used by the optimizer. 
7
High-Level Overview 
Optimizer purpose: 
• Chooses access plan for data manipulation language (DML) SQL statements. 
Means of fulfilling the purpose: 
• Calculates the execution cost of many alternative access plans, and then selects 
the one with the minimal estimated cost. 
• Bases calculations on catalog statistics. 
• Measures cost in timerons: 
– An abstract unit of measure. 
– Not directly reciprocal to actual elapsed time. 
– Rough relative estimate of the resources (cost) required by the database manager to 
execute an access plan. 
– Access plan cost is cumulative: each operator cost is added to the next one. 
8
High-Level Overview 
Factors influencing the optimizer: 
• Database design 
– Indexes 
– Tablespace configuration 
• Table and index statistics 
– Basis of plan costs 
– Generated and collected by the RUNSTATS command 
• Operating environment 
– Influences plan costs 
– Various configuration parameters 
• Optimization class (OPTLEVEL) 
– Controls the level of modeling considered during compilation 
– Influences SQL compilation time 
9
High-Level Overview 
Statistics used by the optimizer: 
• Basic statistics 
– Number of rows in a table 
– Number of pages for a table 
• Non-uniform distribution statistics 
– NUM_FREQVALUES (equality predicates) 
– NUM_QUANTILES (range predicates) 
• Index clustering (DETAILED index statistics) 
– Improves the estimates of data page fetches 
• Statistical views (STATVIEWS) 
– Statistics on views whose definitions (including predicates) overlap the query 
• Column group statistics 
– Statistics collected for more than one column 
10
Phases of the Optimizer 
Unit objectives: 
• Describe the stages of the compilation process. 
• Explain what happens from the time a query is 
submitted to the point where its result set is returned. 
11
Phases of the Optimizer 
[Diagram: stages of the SQL compiler; each stage is described on the following slides.] 
12
Phases of the Optimizer 
• Parse Query: analyzes the query to validate the syntax. 
• Check Semantics: ensures that there are no inconsistencies 
among parts of the statement. 
• Rewrite Query: uses the global semantics that are stored in 
the Query Graph Model (QGM) to transform the query into a 
form that can be optimized more easily. The result is stored in 
the QGM, as well. 
• Pushdown Analysis (Federated DB only): recommends to the 
optimizer whether an operation can be evaluated remotely (pushed 
down) at a data source. 
13
Phases of the Optimizer 
• Optimize Access Plan: using the QGM as input, generates 
many alternative access plans and, at the end, selects the most 
cost-effective plan for satisfying the query. 
• Remote SQL Generation (Federated DB only): creates an 
efficient SQL statement for operations that are performed by 
each remote data source based on the SQL dialect at that data 
source. 
• Generate Executable Code: uses the access plan and the 
QGM to create an executable access plan or section for the 
query. 
14
Phases of the Optimizer 
[Diagram slide; image not transcribed.] 
15
Explain Facility and DB2EXFMT Tool 
Unit objectives: 
• Define what an access plan is. 
• Introduce the tools used to generate the explain 
snapshot (access plan and related data). 
• Discuss the components of the db2exfmt report and how 
to interpret its output. 
16
Explain Facility and DB2EXFMT Tool 
Explain purpose: 
• Captures information about the access plan, optimizer inputs, 
and environment of an SQL query. 
• Helps to understand how the SQL query is compiled and 
executed. 
• Shows how configuration parameter changes impact query 
performance. 
• Indispensable tool for query problem determination. 
17
Explain Facility and DB2EXFMT Tool 
Explain information includes: 
• Sequence of operations to process the query 
• Cost information 
• Predicates and selectivity estimates for each predicate 
• Statistics for all objects referenced in the SQL query at 
the time the explain information is captured 
18
Explain Facility and DB2EXFMT Tool 
Explain tables: 
• Must be created before the Explain can be invoked: 
– SYSINSTALLOBJECTS() procedure 
– EXPLAIN.DDL 
• Must be created per schema/user or under SYSTOOLS schema 
• Capture access plan data when the Explain facility is activated 
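For instance, a minimal sketch of creating the explain tables with the stored 
procedure (the third and fourth arguments are the table space and schema 
names; passing NULL takes the defaults): 
  CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', 
       CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128))) 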
19
Explain Facility and DB2EXFMT Tool 
Explain tables: 
ADVICE_INDEX 
ADVICE_INSTANCE 
ADVICE_MQT 
ADVICE_PARTITION 
ADVICE_TABLE 
ADVICE_WORKLOAD 
EXPLAIN_ACTUALS 
EXPLAIN_ARGUMENT 
EXPLAIN_INSTANCE 
EXPLAIN_OBJECT 
EXPLAIN_OPERATOR 
EXPLAIN_PREDICATE 
EXPLAIN_STATEMENT 
EXPLAIN_STREAM 
20
Explain Facility and DB2EXFMT Tool 
Explain tools: 
db2expln 
• Provides access plan information for one or more SQL query packages 
• Shows the actual implementation of the chosen access plan 
• Does not show detailed optimizer information 
db2exfmt 
• Formats the contents of the explain tables 
• Provides more detailed explain information than db2expln 
• Shows clues as to why the optimizer has made particular decisions 
21
Explain Facility and DB2EXFMT Tool 
db2exfmt output sections: 
1. Explain instance 
2. Database context 
3. Package context 
4. Original statement 
5. Optimized statement 
6. Access plan 
7. Extended diagnostic information 
8. Plan details 
9. Objects used in the access plan 
22
Explain Facility and DB2EXFMT Tool 
db2exfmt output sections: 
• Explain instance: general information (e.g., version, time, 
schema, etc). 
• Database context: configuration parameters used during the 
query compilation. 
• Package context: query type and optimization level. 
• Original statement: actual query text. 
• Optimized statement: rewritten query text according to the 
applied optimization techniques. 
23
Explain Facility and DB2EXFMT Tool 
Explain output sections (continued): 
• Access plan: graphical representation of the query execution; displays the 
order of operations along with rows returned, cost and I/O for each 
operator. 
• Extended diagnostic information: warnings, informational or error 
messages returned by the optimizer. 
• Plan details: detailed information for each operator including arguments, 
predicates, estimates (filter factor), and input/output streams. 
• Objects used in the access plan: lists any objects (tables/indexes) 
referenced by the query and its access plan; this section also includes 
their statistical information. 
24
Explain Facility and DB2EXFMT Tool 
>>-db2exfmt--+-----+--+-------------+--+-------------+----------> 
'- -1-' '- -d--dbname-' '- -e--schema-' 
>--+--------+--+---------------------------+--+-----+-----------> 
'- -f--O-' | .-----------. | '- -l-' 
| V | | 
'- -g--+---+----+-------+-+-' 
'-x-' +-O-----+ 
+-I-----+ 
+-C-----+ 
'-+-T-+-' 
'-F-' 
.- -t-. 
>--+-----------+--+--------------+--+-------------+--+-----+----> 
'- -n--name-' '- -o--outfile-' '- -s--schema-' 
>--+-----------------------+--+----------------+----------------> 
'- -u--userID--password-' '- -w--timestamp-' 
>--+--------------+--+--------------+--+-----+----------------->< 
'- -#--sectnbr-' '- -v--srcvers-' '- -h-' 
25
Explain Facility and DB2EXFMT Tool 
SET CURRENT EXPLAIN MODE statement 
• Changes the value of the CURRENT EXPLAIN MODE special register 
• Compiler returns SQL0217W for a successful compilation of SQL run 
under the EXPLAIN mode 
• Syntax: 
>>-SET CURRENT EXPLAIN MODE -+---+------------------------------> 
>--+-NO----------------------+--------------------------------->< 
+-YES---------------------+ 
+-EXPLAIN-----------------+ 
+-REOPT-------------------+ 
+-RECOMMEND INDEXES-------+ 
+-EVALUATE INDEXES--------+ 
+-RECOMMEND PARTITIONINGS-+ 
+-EVALUATE PARTITIONINGS--+ 
'-host-variable-----------' 
26
Explain Facility and DB2EXFMT Tool 
Exercise: 
• Objective: create explain tables, determine if explain tables 
exist, and generate db2exfmt output. 
• Use exercise3-1.bat script 
• Create the explain tables: 
db2 –tvf …\sqllib\misc\EXPLAIN.DDL 
• Verify the explain tables were created: 
list tables for all 
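Alternatively, a sketch of checking the catalog directly (assumes the 
tables were created under the current schema): 
  db2 "SELECT TABNAME FROM SYSCAT.TABLES WHERE TABNAME LIKE 'EXPLAIN%'" 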
27
Explain Facility and DB2EXFMT Tool 
Exercise (continued): 
• Use exercise3-2.bat script 
set current explain mode explain 
select * from customer 
db2 set current explain mode no 
db2exfmt –d <dbname> -1 –o exercise3-2.exfmt 
or 
db2exfmt –d <dbname> -g TIC –w -1 –n –s % -# 0 –o 
exercise3-2.exfmt 
• Review exercise3-2.exfmt. 
28
Operators 
Unit objectives: 
• Describe what an operator is, and how the choice of 
operators influences query execution. 
• Discuss the most common operators that the optimizer 
uses to compose an access plan. 
29
Operators 
What is an operator? 
• Action that is performed on data, or the output from a table 
or an index, when the access plan for an SQL statement is 
executed. 
• Unit of query execution. 
• Specific operation that occurs in the query execution control 
flow. 
30
Operators 
29       ← Rows returned 
HSJOIN   ← Operator 
( 5)     ← Sequence 
30.3298  ← Cost (timerons) 
4        ← Cost (I/O) 
/------+------- 
168 29 
TBSCAN TBSCAN 
( 6) ( 7) 
22.7451 7.57811 
3 1 
| | 
168 29 
TABLE: EXAMPLE TABLE: EXAMPLE 
MOVIE TRANSACTION_DETAIL 
Q1 Q4 
31
Operators 
• TBSCAN: Retrieves rows directly from data pages. 
• IXSCAN: Scans an index of a table with optional start/stop 
conditions, producing an ordered stream of rows. 
• DELETE: Deletes rows from a table. 
• FETCH: Fetches columns from a table using a specific record 
identifier. 
• FILTER: Filters data by applying one or more predicates to it. 
• GENROW: Generates a table of rows. 
32
Operators 
• GRPBY: Groups rows by common values of designated 
columns or functions, and evaluates set functions. 
• HSJOIN: Represents a hash join, where two or more tables are 
hashed on the join columns. 
• INSERT: Inserts rows into a table. 
• IXAND: ANDs together the row identifiers (RIDs) from two or 
more index scans. 
• MSJOIN: Represents a merge join, where both outer and 
inner table must be in join-predicate order. 
33
Operators 
• NLJOIN: Represents a nested loop join that accesses an inner 
table once for each row of the outer table. 
• RETURN: Represents the return of data from the query to the 
user. 
• RIDSCN: Scans a list of row identifiers (RIDs) obtained from 
one or more indexes. 
• SORT: Sorts rows in the order of specified columns, and 
optionally eliminates duplicate entries. 
34
Operators 
• TEMP: Stores data in a temporary table to be read back out. 
• TQ: Transfers table data between database agents. 
• UNION: Concatenates streams of rows from multiple tables. 
• UNIQUE: Eliminates rows with duplicate values, for specified 
columns. 
• UPDATE: Updates rows in a table. 
35
Operators 
Exercise: 
• Review the explain output (exercise3-2.exfmt) from the 
previous exercise and identify operators in the access plan 
section. 
• “Explain” the following queries and identify the new 
operators introduced: 
set current explain mode explain 
<SQL query> 
db2exfmt –d <dbname> -1 –o sect6ex[1,2].out 
36
Operators 
Exercise (continued): 
Use the following SQL queries for step #2: 
SELECT cust_branch_id, count(cust_id) 
FROM customer 
GROUP BY cust_branch_id 
SELECT mtran_id, cust_ln, cust_fn, branch_location 
FROM transaction_master a, customer b, branch c 
WHERE a.mtran_cust_id = b.cust_id 
AND a.mtran_branch_id = c.branch_id 
37
Catalog statistics 
Unit objectives: 
• Explain how the optimizer uses statistical information to 
calculate execution cost of access plan alternatives. 
• Introduce various methods of gathering and maintaining 
current statistics on tables and indexes. 
38
Catalog statistics 
• The optimizer is heavily influenced by statistical information 
about the size of tables, indexes and statistical views. 
• The optimizer also uses statistics about the distribution of 
data in columns of tables, indexes and statistical views, if 
these columns are used to select rows or to join tables. 
• The optimizer uses distribution statistics in its calculations to 
estimate the costs of alternative access plans for each query. 
• Statistical information is stored in system catalog tables and is 
accessible via the SYSCAT schema views. 
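For example, a sketch of inspecting basic table statistics through the 
catalog views (CARD is the row count, NPAGES/FPAGES are page counts; 
-1 means statistics were not collected): 
  SELECT TABNAME, CARD, NPAGES, FPAGES, STATS_TIME 
  FROM SYSCAT.TABLES 
  WHERE TABSCHEMA = 'EXAMPLE' 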
39
Catalog statistics 
• Statistical information used by the optimizer: 
– Cluster ratio of indexes (higher is better) 
– Number of leaf pages in indexes 
– Number of table rows that overflow their original pages 
– Number of filled and empty pages in a table (may indicate if table or 
index reorganization is needed) 
• Catalog statistics are collected and updated by the 
RUNSTATS utility. 
40
Catalog statistics 
RUNSTATS utility: 
• Updates statistics about the characteristics of a table and/or 
associated indexes, or statistical views. 
• For a table, should be run after frequent updates / inserts / 
deletes are made to the table or after reorganizing the table. 
• For a statistical view, RUNSTATS should be run when changes 
to underlying tables have substantially affected the rows 
returned by the view. 
41
Catalog statistics 
RUNSTATS best practices: 
• Use RUNSTATS on all tables and indexes frequently accessed 
by queries. 
• Use RUNSTATS with FREQUENT and QUANTILE value statistics 
for complex queries (only collected when using WITH 
DISTRIBUTION clause). 
• Rerun RUNSTATS after a significant change in the table data. 
42
Catalog statistics 
RUNSTATS best practices (continued): 
Use RUNSTATS in the following situations: 
• After loading data and indexes have been created. 
• After creating new index on a table. 
• After running a REORG utility. 
• After significant insert, update or delete activity. 
• Before binding application programs. 
• After executing REDISTRIBUTE DB PARTITION GROUP 
command. 
43
Catalog statistics 
RUNSTATS examples (basic statistics): 
• Collect statistics for table only: 
RUNSTATS ON TABLE <schema>.<tablename> 
• Collect statistics for indexes only on a given table: 
RUNSTATS ON TABLE <schema>.<tablename> FOR INDEXES ALL 
• Collect statistics for both table and its indexes: 
RUNSTATS ON TABLE <schema>.<tablename> AND INDEXES ALL 
44
Catalog statistics 
RUNSTATS examples (enhanced statistics): 
• Collect statistics for table including distribution statistics: 
RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION 
• Collect statistics for indexes only on a given table including 
extended index statistics: 
RUNSTATS ON TABLE <schema>.<tablename> FOR DETAILED INDEXES ALL 
• Collect statistics for table and its indexes including extended 
index and distribution statistics: 
RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION AND 
DETAILED INDEXES ALL 
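Column group statistics (mentioned in the overview unit) can also be 
collected by listing the column group in double parentheses; a hedged 
sketch with placeholder column names: 
  RUNSTATS ON TABLE <schema>.<tablename> ON ALL COLUMNS AND COLUMNS ((<col1>, <col2>)) 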
45
Catalog statistics 
DB2LOOK utility: 
• Mimic option (-m) extracts catalog statistics for all or given 
tables. 
• Helps recreate optimizer behaviour in another database. 
• DB2LOOK for a specific table: 
db2look –d <dbname> -a –e –m –z <schema> -t <table1 
table2 …> -o <outfile> 
• DB2LOOK for all tables in the database: 
db2look –d <dbname> -f –m –l –a –e –o <outfile> 
• Alternative method: query the catalog tables directly using 
SYSCAT schema views. 
46
Catalog statistics 
Catalog views 
Catalog views                                            Description 
SYSCAT.TABLES, SYSSTAT.TABLES                            Table statistics 
SYSCAT.COLUMNS, SYSSTAT.COLUMNS                          Column statistics 
SYSCAT.COLGROUPS, SYSSTAT.COLGROUPS                      Multicolumn statistics 
SYSCAT.COLGROUPDISTCOUNTS, SYSSTAT.COLGROUPDISTCOUNTS    Multicolumn distribution statistics 
• Value of -1 in the columns indicates no statistics collected. 
• SYSCAT views are read-only; SYSSTAT views are updateable. 
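Because the SYSSTAT views are updateable, statistics can also be altered by 
hand for what-if plan analysis; a hedged sketch (the value is illustrative): 
  UPDATE SYSSTAT.COLUMNS SET COLCARD = 6 
  WHERE TABSCHEMA = 'EXAMPLE' AND TABNAME = 'MOVIE' 
    AND COLNAME = 'MOVIE_GENRE' 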
47
Catalog statistics 
Exercise: 
• Run exercise5-1.bat to generate DB2LOOK and EXPLAIN output for the 
following query: 
SELECT a.mtran_id, b.dtran_num, c.cust_ln, c.cust_fn, d.branch_location, 
b.dtran_tran_type, substr(e.movie_title,1,25) 
FROM transaction_master a, transaction_detail b, customer c, 
branch d, movie e 
WHERE a.mtran_id = b.dtran_id AND a.mtran_cust_id = c.cust_id AND 
a.mtran_branch_id = d.branch_id AND b.dtran_movie_id = e.movie_id 
ORDER BY 1, 2, 6, 7 
• Review the EXPLAIN output in exercise5-1.exfmt, and check for an indicator of 
whether statistics were collected. 
• Review the DB2LOOK output in exercise5-1.db2look, and examine the 
statistical data returned for the tables used in the query. 
48
Catalog statistics 
Exercise (continued): 
• Run exercise5-2.bat to collect statistics for CUSTOMER table 
and generate the DB2LOOK and EXPLAIN output. 
• Review the output files exercise5-2.db2look and exercise5-2.exfmt. 
• Observe the type of data collected and how it is reflected in 
the EXPLAIN output. 
49
Catalog statistics 
Exercise (continued): 
• Run exercise5-3.bat to collect index statistics for BRANCH table only 
and generate the DB2LOOK and EXPLAIN output. 
• Review the output files exercise5-3.db2look and exercise5-3.exfmt, and 
observe the type of data collected and how it is reflected in the 
EXPLAIN output. 
• Optional: Run exercise5-4.bat to collect RUNSTATS using WITH 
DISTRIBUTION AND DETAILED INDEXES ALL for all tables. Review the 
data. 
50
Predicates and Joins 
Unit objectives: 
• Define the terms predicate, join and sargable. 
• Describe predicate and join types, and their effect on query 
execution. 
• Discuss the pros and cons of each join method. 
51
Predicates and Joins 
• A predicate is an element of a search condition that 
expresses or implies a comparison operation. 
• Predicates are included in clauses beginning with WHERE 
or HAVING. 
• Predicates limit or filter the query result set. 
• When designing predicates, aim for the highest possible 
selectivity so that the fewest rows are returned. 
52
Predicates and Joins 
Example: 
SELECT * 
FROM movie 
WHERE movie_genre = 'A' 
AND movie_cost_price = 21.99 
AND movie_rent_day > 1 
• The following parts are the predicates: 
– movie_genre = ‘A’ 
– movie_cost_price = 21.99 
– movie_rent_day > 1 
53
Predicates and Joins 
Predicate types: 
• Range-delimiting predicates: limit the scope of index scan 
using start and stop conditions. 
• Index-sargable predicates: evaluated from the index because 
the columns in the predicate are part of the index key. 
• Data-sargable predicates: cannot be evaluated from the index 
but can be evaluated while rows remain in the buffer. 
• Residual predicates: involve subqueries (e.g., ANY, ALL, SOME 
or IN clauses), or those that read LONG VARCHAR or LOB data 
stored in files separate from the table. 
54
Predicates and Joins 
What does “sargable” mean? 
• Contraction of Search ARGument-able. 
• A SARGable predicate is a term that can be used as a 
search argument. 
• Unsargable predicates, or join predicates, require one or 
more joins to be performed before the predicate can be 
evaluated. 
55
Predicates and Joins 
Range-Delimiting Predicates: 
• Limit the scope of an index scan. 
• Use start and stop key values for the index search. 
• The optimizer uses index data instead of reading the base 
table when it evaluates these predicates, e.g.: 
SELECT movie_title FROM movie WHERE movie_id='D0010' 
– Since movie_id column is defined in the primary key index, the 
optimizer scans the index within the range of the predicate value. 
• See sect6-range delimiting example.exfmt. 
56
Predicates and Joins 
3) IXSCAN: (Index Scan) 
Predicates: 
---------- 
2) Start Key Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.00595238 
Predicate Text: 
-------------- 
(Q1.MOVIE_ID = 'D0010') 
2) Stop Key Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.00595238 
Predicate Text: 
-------------- 
(Q1.MOVIE_ID = 'D0010') 
57
Predicates and Joins 
Index-Sargable Predicates: 
• Cannot limit the scope of a search. 
• Can be evaluated from the index, because the columns that 
are referenced in the predicate are part of the index key, e.g.: 
SELECT movie_title FROM movie 
WHERE movie_id='D0010' AND movie_type > null 
– Index is defined on columns (movie_id, movie_genre, movie_type). 
– Movie_id would be applied as a start-stop key since it is the leading 
column in the index, while movie_type would be applied as index-sargable 
to fulfill the “>” operator. 
• See sect6-index sargable.exfmt. 
58
Predicates and Joins 
3) IXSCAN: (Index Scan) 
Predicates: 
---------- 
2) Sargable Predicate, 
Comparison Operator: Less Than (<) 
Subquery Input Required: No 
Filter Factor: 0.333333 
Predicate Text: 
-------------- 
(NULL < Q1.MOVIE_TYPE) 
3) Start Key Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.00595238 
Predicate Text: 
-------------- 
(Q1.MOVIE_ID = 'D0010') 
59
Predicates and Joins 
Data-Sargable Predicates: 
• Cannot be evaluated from an index. 
• Can be evaluated by accessing individual rows in a table, e.g.: 
SELECT movie_title FROM movie WHERE movie_type = 'A' 
– Since movie_type column is not part of any index, it will be 
treated as data-sargable predicate. 
• See sect6-data-sargable example.exfmt. 
60
Predicates and Joins 
2) TBSCAN: (Table Scan) 
Predicates: 
---------- 
2) Sargable Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.04 
Predicate Text: 
-------------- 
(Q1.MOVIE_TYPE = 'A') 
61
Predicates and Joins 
Characteristic                               Range-delimiting  Index-sargable  Data-sargable  Residual 
Reduce index I/O                             Yes               No              No             No 
Reduce data page I/O                         Yes               Yes             No             No 
Reduce the number of rows passed internally  Yes               Yes             Yes            No 
Reduce the number of qualifying rows         Yes               Yes             Yes            Yes 
62
Predicates and Joins 
• A join is the process of combining two streams of rows 
(result sets). 
• Left side of the join is referred to as the OUTER. 
• Right side of the join is referred to as the INNER. 
• Three explicit join operators exist: 
– Nested loop join (NLJOIN) 
– Merge join (MSJOIN) 
– Hash join (HSJOIN) 
63
Predicates and Joins 
Nested loop join (NLJOIN): 
• Takes the top row from the outer stream and compares it to all 
rows from the inner stream. 
• Performance heavily influenced by the row count in the inner 
stream (i.e., the larger the inner, the slower the performance). 
• Inner table is usually an IXSCAN or FETCH-IXSCAN combination. 
• To obtain better filtering, the NLJOIN usually pushes the join 
predicates down into the inner operator (e.g., IXSCAN). 
• The only join type available at optimization level 0. 
64
Predicates and Joins 
NLJOIN considerations: 
• Poor performance when a large number of rows exists on the inner 
stream. 
• Slow TEMP operation on inner because it is a blocking operation 
and requires subsequent TBSCAN. 
• Check if cardinality estimates match the actual row counts. Even 
though the optimizer estimated one row, it may not be the case at 
execution time. 
• Check if an index-sargable predicate is used. This can be expensive 
when the optimizer scans to match a single row across the entire 
index. 
65
Predicates and Joins 
NLJOIN example: 
[Access plan diagram not transcribed.] 
66
Predicates and Joins 
Merge join (MSJOIN): 
• Requires inputs sorted on the join column which must be neither 
LONG nor LOB. 
• Only the primary join predicate is applied at the MSJOIN, and it must be an 
equality join predicate. 
• Remaining join predicates applied as RESIDUAL predicates not in 
MSJOIN but at FILTER. 
• MSJOIN scans both tables concurrently (unlike NLJOIN), and it only 
scans each table once, unless the outer has duplicates. 
• Outer table is preferably the one with fewer duplicates. 
67
Predicates and Joins 
MSJOIN considerations: 
• SORT is a blocking operator and may be expensive. 
• SORT can spill to buffer pools or disk (TEMPSPACE). 
• Applies residual predicates after joining. 
• If outer is indexed on join column, even without a SORT, the index 
scan is typically a full index scan (sargable predicate). 
• Duplicates hurt all joins but with other joins one can take advantage 
of prefetching or having the inner buffered in memory. With 
MSJOIN, the inner is rescanned for every duplicate value in the 
outer row join column. 
68
Predicates and Joins 
MSJOIN example: 
[Access plan diagram not transcribed.] 
69
Predicates and Joins 
Hash join (HSJOIN): 
• First, the inner table (build table) is scanned and placed into 
memory buffers partitioned by hash code. 
• Next, the outer table (probe table) is scanned row by row through 
the same hash code on the same join column(s). 
– If the hash codes match, then the join columns are compared for the 
rows to be matched. 
– If the corresponding build rows are in the memory buffers, the 
comparison happens right away. 
– If not, the probe row is also written into a temporary table on disk. 
• Finally, temporary tables containing rows from the same partitions 
are processed. 
70
Predicates and Joins 
HSJOIN considerations: 
• Requires significant sort heap space which may cause contention 
with other queries. 
• May spill to disk if insufficient sort heap or memory buffers for hash 
table. 
• Requires join columns to be of the same type. 
• Does not guarantee ordering, and if outer table’s ordering is 
needed, not an ideal choice. 
• If intra-partition parallelism is on (INTRA_PARALLEL enabled, 
DFT_DEGREE > 1), the HSJOIN runs in parallel (i.e., build table 
partitioned in parallel and loaded into memory). 
71
Predicates and Joins 
HSJOIN example: 
[Access plan diagram not transcribed.] 
72
Cardinality Estimates 
Unit objectives: 
• Define the concepts of cardinality, filter factor and cardinality 
estimates. 
• Describe how the optimizer calculates cardinality estimates. 
• Show how different statistics affect cardinality estimate 
calculations. 
73
Cardinality Estimates 
• Cardinality (CARD) 
– Table statistics: total number of rows in a table. 
– Column statistics: the number of unique values in a column. 
• Filter factor (FF) 
– The percentage of rows estimated by the optimizer to be allowed to 
filter through a given predicate. 
– Expressed as probability: 0 – 1.0 (0 to 100%). 
• Cardinality estimate 
– The number of rows estimated by the optimizer to be returned for a 
given predicate based on its filter factor. 
– Calculated as Filter Factor x Cardinality. 
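For example, with CARD = 146 and FF = 0.493151 (the values from the 
distribution-statistics example later in this unit), the cardinality 
estimate is 0.493151 × 146 ≈ 72 rows. 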
74
Cardinality Estimates 
• Filtering can occur with a local or join predicate 
– Movie_title = ‘RANGO’ (local predicate) 
– a.c1 = b.c2 (join predicate) 
• COLCARD 
– Number of distinct values in the column 
– SYSCAT.COLUMNS 
• VALCOUNT 
– Frequency with which the data value occurs in column 
– SYSCAT.COLDIST 
75
Cardinality Estimates 
Calculating FF for equality predicates (c1=value) 
• No statistics available/collected 
– FF = default value, typically 1/25 
– FF = 0.04 (or 0.25 in some cases) 
• Only basic statistics collected 
– FF = 1/COLCARD 
– COLCARD is for the column of the predicate being calculated 
• With basic and distribution statistics collected 
– FF = VALCOUNT / CARD 
76
Cardinality Estimates 
No statistics collected: 
Query: 
SELECT movie_id 
FROM movie 
WHERE movie_genre='A' 
Statistics: 
– SYSSTAT.TABLES → CARD = -1 
– SYSSTAT.COLUMNS → COLCARD = -1 
– SYSSTAT.COLDIST → no data 
Filter Factor: 0.04 
Explain snippet: sect7-nostat.exfmt 
77
Cardinality Estimates 
2) TBSCAN: (Table Scan) 
Predicates: 
---------- 
2) Sargable Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.04 
Predicate Text: 
-------------- 
(Q1.MOVIE_GENRE = 'A') 
78
Cardinality Estimates 
Only basic statistics collected: 
Query: 
SELECT movie_id 
FROM movie 
WHERE movie_genre='A' 
Statistics: 
– RUNSTATS ON TABLE example.movie 
– SYSSTAT.TABLES → CARD = 146 
– SYSSTAT.COLUMNS → COLCARD = 6 (for movie_genre col) 
– SYSSTAT.COLDIST → no data 
Filter Factor: 1/COLCARD = 1/6 = 0.166667 
Explain snippet: sect7-basicstat.exfmt 
79
Cardinality Estimates 
2) TBSCAN: (Table Scan) 
Predicates: 
---------- 
2) Sargable Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.166667 
Predicate Text: 
-------------- 
(Q1.MOVIE_GENRE = 'A') 
80
Cardinality Estimates 
Basic and distribution statistics collected: 
Query: 
SELECT movie_id 
FROM movie 
WHERE movie_genre='A' 
Statistics: 
– RUNSTATS ON TABLE example.movie WITH DISTRIBUTION 
– SYSSTAT.TABLES → CARD = 146 
– SYSSTAT.COLUMNS → COLCARD = 6 (for movie_genre col) 
– SYSSTAT.COLDIST → VALCOUNT = 72 (movie_genre='A') 
Filter Factor: VALCOUNT / CARD = 72/146 = 0.493151 
Explain snippet: sect7-diststat.exfmt 
81
Cardinality Estimates 
2) TBSCAN: (Table Scan) 
Predicates: 
---------- 
2) Sargable Predicate, 
Comparison Operator: Equal (=) 
Subquery Input Required: No 
Filter Factor: 0.493151 
Predicate Text: 
-------------- 
(Q1.MOVIE_GENRE = 'A') 
82
Basic Tuning Hints 
Unit objectives: 
• Introduce basic access plan tuning techniques. 
• Describe the common tuning elements (configuration 
parameters, registry variables and CURRENT QUERY 
OPTIMIZATION special register). 
83
Basic Tuning Hints 
• Tuning is usually done by making changes to one of the 
configuration categories that influence the optimizer 
behaviour: 
– Database manager configuration 
– Database configuration 
– Query compiler registry variables 
– CURRENT QUERY OPTIMIZATION special register 
84
Basic Tuning Hints 
Database manager configuration: 
• Parallelism (INTRA_PARALLEL): indicates whether intra-partition 
parallelism is enabled. When YES, some parts of query execution (e.g., 
index sort) can run in parallel within a single database partition. 
• CPU Speed (CPUSPEED): the optimizer uses the CPU speed (milliseconds 
per instruction) to estimate the cost of performing certain operations. 
• Communications speed (COMM_BANDWIDTH): the optimizer uses the 
value (megabytes per second) in this parameter to estimate the cost of 
performing certain operations between the database partition servers in a 
partitioned database environment. 
85
Basic Tuning Hints 
Database configuration: 
• Buffer pools (BUFFPAGE): 
– Determined by the BUFFPAGE parameter (when a buffer pool uses 
BUFFPAGE as its default size), or by a calculation based on the 
contents of SYSCAT.BUFFERPOOLS. 
– The number shown is the total number of bufferpool pages allocated 
for the database. The table below shows an example of how we arrive 
at the total of 5000 pages: 

Bufferpool name    Size 
IBMDEFAULTBP       1000 
BPLONG             4000 
BPTEMP             1000 
Total              5000 
86
Basic Tuning Hints 
Database configuration (continued): 
• Sort heap size (SORTHEAP): Defines the maximum number of private 
memory pages to be used for private sorts or the maximum number of 
shared memory pages to be used for shared sorts. 
• Database heap size (DBHEAP): Contains metadata for tables, indexes, 
table spaces and bufferpools. There is one database heap per database. 
• Lock list size (LOCKLIST): Indicates the amount of storage allocated to the 
lock list. There is one lock list per database and it contains the locks held 
by all applications concurrently connected to the database. 
• Maximum lock list (MAXLOCKS): Defines a percentage of the lock list held 
by any one application that must be filled before the database manager 
performs lock escalation. 
87
Basic Tuning Hints 
Database configuration (continued): 
• Average applications (AVG_APPLS): Used by the SQL optimizer to help 
estimate how much bufferpool will be available at run-time for the access 
plan chosen (since the bufferpool is shared by all active connections). 
• Optimization class (DFT_QUERYOPT): Tells the optimizer which level of 
optimization to use when compiling SQL query (default=5). The higher the 
level, the more complex algorithms are considered. 
• Query degree (DFT_DEGREE): Represents the degree of intra-partition 
parallelism for an SQL statement. If set to ANY, the optimizer is sensitive to 
the actual number of CPUs that are online. If set to ANY and intra_parallel 
is enabled, then the number of CPUs on test and production should be 
configured the same. 
88
Basic Tuning Hints 
Database configuration (continued): 
• Number of frequent values retained (NUM_FREQVALUES): Allows you to 
specify the number of "most frequent values" that will be collected when 
the WITH DISTRIBUTION option is specified with the RUNSTATS command. 
• Number of quantiles retained (NUM_QUANTILES): Controls the number 
of quantiles that will be collected when the WITH DISTRIBUTION option is 
specified on the RUNSTATS command. 
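Both are database configuration parameters; a sketch of changing them 
(the values are illustrative): 
  UPDATE DB CFG FOR <dbname> USING NUM_FREQVALUES 50 NUM_QUANTILES 100 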
89
Basic Tuning Hints 
Query compiler variables (common): 
• DB2_EXTENDED_OPTIMIZATION: Specifies whether or not the query 
optimizer uses optimization extensions to improve query performance. 
• DB2_SORT_AFTER_TQ: Determines how the optimizer works with directed 
table queues in a partitioned database environment when the receiving 
end requires the data to be sorted, and the number of receiving nodes is 
equal to the number of sending nodes. 
• DB2_INLIST_TO_NLJN: Causes the optimizer to favour nested loop joins to 
join the list of values, using the table that contributes the IN list as the 
inner table in the join. 
* See the DB2 V9.7 Info Center for additional variables 
(search for “Query compiler variables”) 
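Registry variables are set at the instance level with db2set and generally 
require an instance restart to take effect; a sketch: 
  db2set DB2_INLIST_TO_NLJN=YES 
  db2stop 
  db2start 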
90
Basic Tuning Hints 
CURRENT QUERY OPTIMIZATION special register: 
• Controls the query optimization class for dynamic SQL statements. 
• Overrides the dft_queryopt database configuration parameter. 
• Affects the succeeding SQL statements only for the current session. 
• Assigned by the SET CURRENT QUERY OPTIMIZATION statement. 
>>-SET--CURRENT--QUERY--OPTIMIZATION--+---+-----------------> 
>--+-0-------------+--------------------------------------->< 
+-1-------------+ 
+-2-------------+ 
+-3-------------+ 
+-5-------------+ 
+-7-------------+ 
+-9-------------+ 
'-host-variable-' 
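For example, to lower the optimization class for the current session: 
  SET CURRENT QUERY OPTIMIZATION 3 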
91
Statistical Views 
Unit objectives: 
• Define the purpose and benefits of statistical views. 
• Provide a basic example and steps for creating a 
statistical view. 
92
Statistical Views 
• Views with statistics that can be used to improve 
cardinality estimates for queries in which the view 
definition overlaps with the query definition. 
• A powerful feature as it provides the optimizer with 
accurate statistics for determining cardinality estimates 
for queries with complex sets of (possibly correlated) 
predicates involving one or more tables. 
93
Statistical Views 
• Mechanism to provide the optimizer with more sophisticated 
statistics that represent more complex relationships: 
– Comparisons involving expressions: 
price > MSRP + Dealer_markup 
– Relationships spanning multiple tables: 
product.name = 'Alloy wheels' and product.key = sales.product_key 
– Anything other than predicates involving independent 
attributes and simple comparison operations. 
94
Statistical Views 
• Statistical views are able to represent these types of complex 
relationships, because statistics are collected on the result set 
returned by the view, rather than on the base tables referenced by 
the view. 
• When an SQL query is compiled, the optimizer matches the SQL 
query to the available statistical views. When the optimizer 
computes cardinality estimates for intermediate result sets, it 
uses the statistics from the view to compute a better 
estimate. 
95
Statistical Views 
Usage: 
• Create a view: 
CREATE VIEW <view name> AS (<select clause>) 
• Enable the view for optimization using ALTER VIEW statement: 
ALTER VIEW <view name> ENABLE QUERY OPTIMIZATION 
• Issue the RUNSTATS command to populate system catalog 
tables with statistics for the view: 
RUNSTATS ON <schema>.<view name> <runstats clause> 
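Putting the three steps together, a hedged end-to-end sketch against the 
sample schema (the view name and select clause are illustrative): 
  CREATE VIEW example.sv_movie_tran AS 
    (SELECT m.movie_genre, m.movie_type 
     FROM example.movie m, example.transaction_detail d 
     WHERE m.movie_id = d.dtran_movie_id); 
  ALTER VIEW example.sv_movie_tran ENABLE QUERY OPTIMIZATION; 
  RUNSTATS ON TABLE example.sv_movie_tran WITH DISTRIBUTION; 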
96
Optimizer Profiles 
Unit objectives: 
• Examine the technique for overriding the default access 
plan. 
• Discuss the structure of an optimizer profile. 
• Introduce the common optimizer guidelines. 
• Provide instructions for directing the optimizer on how to 
perform table access or joins in specific ways. 
97
Optimizer Profiles 
• Mechanism to alter default access plan 
– Overrides the default access plan selected by the optimizer. 
– Instructs the optimizer how to perform table access or join. 
– Allows users to control specific parts of access plan. 
• Can be employed without changing the application code 
– Compose optimization profile, add to db, rebind targeted packages. 
• Should only be used after all other tuning options exhausted 
– Query improvement, RUNSTATS, indexes, optimization class, db and 
dbm configs, etc. 
– Should not be employed to permanently mitigate the effect of 
inefficient queries. 
98
Optimizer Profiles 
Anatomy of an optimizer profile: 
• XML document 
– Elements and attributes used to define optimization guidelines. 
– Must conform to a specific optimization profile schema. 
• Profile Header (exactly one) 
– Meta data and processing directives. 
– Example: schema version. 
• Global optimization guidelines (at most one) 
– Applies to all statements for which profile is in effect. 
– Example: eligible MQTs guideline defining MQTs to be considered for routing. 
• Statement optimization guidelines (zero or more) 
– Applies to a specific statement for which profile is in effect. 
– Specifies directives for desired execution plan. 
99
Optimizer Profiles 
Anatomy of an optimizer profile (continued): 
<?xml version="1.0" encoding="UTF-8"?> 
<OPTPROFILE VERSION="9.7.0.0"> 
<OPTGUIDELINES> 
<MQT NAME="Test.AvgSales"/> 
<MQT NAME="Test.SumSales"/> 
</OPTGUIDELINES> 
<STMTPROFILE ID="Guidelines for TPCD"> 
<STMTKEY SCHEMA="TPCD"> 
<![CDATA[SELECT * FROM TAB1]]> 
</STMTKEY> 
<OPTGUIDELINES> 
<IXSCAN TABLE="TAB1" INDEX="I_SUPPKEY"/> 
</OPTGUIDELINES> 
</STMTPROFILE> 
</OPTPROFILE> 
← Profile Header 
← Global optimization guidelines section. 
  Optional but at most one. 
← Statement guidelines section. Zero or more. 
100
Optimizer Profiles 
• Each statement profile contains a statement key (STMTKEY) and one or 
more statement-level optimization guidelines (OPTGUIDELINES). 
• The statement key identifies the statement or query to which the 
statement-level optimization guidelines apply. 
<STMTKEY SCHEMA="TPCD"> 
<![CDATA[SELECT * FROM TAB1]]> 
</STMTKEY> 
• The statement-level optimization guidelines identify one or more 
directives, e.g. access or join requests, which specify methods for 
accessing or joining tables in the statement. 
<OPTGUIDELINES> 
<IXSCAN TABID="TAB1" INDEX="I_SUPPKEY"/> 
</OPTGUIDELINES> 
101
Optimizer Profiles 
Optimizer guidelines: 
• Access plan guidelines: 
– Base access request (method to access a table, e.g. TBSCAN, IXSCAN) 
– Join request (method and sequence for performing a join, e.g. HSJOIN, NLJOIN) 
• Query rewrite guidelines: 
– INLIST2JOIN (convert IN-list to join) 
– SUBQ2JOIN (convert subquery to join) 
– NOTEX2AJ (convert NOT EXISTS subquery to anti-join) 
– NOTIN2AJ (convert NOT IN subquery to anti-join) 
• General optimization guidelines: 
– REOPT (ONCE/ALWAYS/NONE) 
– MQT choices 
– QRYOPT (set the query optimization class) 
– RTS (limit the amount of time taken by real-time statistics collection) 
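As a sketch, a statement-level guidelines section combining a query rewrite 
guideline with two general optimization guidelines might look like this 
(element and attribute names follow the optimization profile schema; the 
values are illustrative): 
  <OPTGUIDELINES> 
    <INLIST2JOIN TABLE="TAB1"/> 
    <QRYOPT VALUE="5"/> 
    <REOPT VALUE="ONCE"/> 
  </OPTGUIDELINES> 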
102
Optimizer Profiles 
How to create and use an optimizer profile: 
1. Create the SYSTOOLS.OPT_PROFILE table in the SYSTOOLS schema: 
– Using the CREATE TABLE statement: 
CREATE TABLE SYSTOOLS.OPT_PROFILE ( 
SCHEMA VARCHAR(128) NOT NULL,   ← Schema name 
NAME VARCHAR(128) NOT NULL,     ← Profile name 
PROFILE BLOB (2M) NOT NULL,     ← Profile XML 
PRIMARY KEY ( SCHEMA, NAME )); 
– Using the SYSINSTALLOBJECTS procedure: 
db2 "call sysinstallobjects('opt_profiles', 'c', '', '')" 
2. Compose optimization profile in XML file prof1.xml. 
103
Optimizer Profiles 
How to create and use an optimizer profile (continued): 
3. Create file prof1.del as follows: 
"EXAMPLE","PROF1","prof1.xml" 
4. Import prof1.del into SYSTOOLS.OPT_PROFILE table as follows: 
IMPORT FROM prof1.del OF DEL 
MODIFIED BY LOBSINFILE 
INSERT INTO SYSTOOLS.OPT_PROFILE; 
5. Enable the PROF1 profile for the current session: 
db2 "set current optimization profile = 'EXAMPLE.PROF1'" 
104
Optimizer Profiles 
Checking if the guidelines are applied: 
• SQL0437W with reason code 13 is issued if there were problems applying 
the guidelines (usually due to a syntax error) 
• db2exfmt output contains a section with details on the problems 
encountered while attempting to apply the guidelines: 
Extended Diagnostic Information: 
-------------------------------- 
No extended Diagnostic Information for this statement. 
• The db2exfmt Profile Information section shows which guidelines were 
used: 
Profile Information: 
-------------------- 
OPT_PROF: (Optimization Profile Name) 
EXAMPLE.PROF1 
STMTPROF: (Statement Profile Name) 
Guidelines for the items out query 
105
Optimizer Profiles 
Exercise: 
• Examine the optimizer profile in exercise9_1.xml: 
– Global guideline forces OPTLEVEL 7 for all queries in the current 
session. 
– Statement guideline forces nested loop join method (NLJOIN) for 
tables MOVIE and TRANSACTION_DETAIL for a given query. 
• Run exercise9_1.bat – this will create and apply an optimizer 
profile named PROF1. It will also collect db2exfmt output 
before and after using the profile. 
• Compare the before and after .exfmt files – how can you tell if 
the guidelines were applied? 
106
Q&A

Más contenido relacionado

La actualidad más candente

Understanding my database through SQL*Plus using the free tool eDB360
Understanding my database through SQL*Plus using the free tool eDB360Understanding my database through SQL*Plus using the free tool eDB360
Understanding my database through SQL*Plus using the free tool eDB360
Carlos Sierra
 

La actualidad más candente (20)

Db2 Important questions to read
Db2 Important questions to readDb2 Important questions to read
Db2 Important questions to read
 
Solving the DB2 LUW Administration Dilemma
Solving the DB2 LUW Administration DilemmaSolving the DB2 LUW Administration Dilemma
Solving the DB2 LUW Administration Dilemma
 
Db2 tutorial
Db2 tutorialDb2 tutorial
Db2 tutorial
 
Including Constraints -Oracle Data base
Including Constraints -Oracle Data base Including Constraints -Oracle Data base
Including Constraints -Oracle Data base
 
IBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guruIBM DB2 LUW UDB DBA Training by www.etraining.guru
IBM DB2 LUW UDB DBA Training by www.etraining.guru
 
Oracle data guard for beginners
Oracle data guard for beginnersOracle data guard for beginners
Oracle data guard for beginners
 
2 db2 instance creation
2 db2 instance creation2 db2 instance creation
2 db2 instance creation
 
Understanding my database through SQL*Plus using the free tool eDB360
Understanding my database through SQL*Plus using the free tool eDB360Understanding my database through SQL*Plus using the free tool eDB360
Understanding my database through SQL*Plus using the free tool eDB360
 
SKILLWISE-DB2 DBA
SKILLWISE-DB2 DBASKILLWISE-DB2 DBA
SKILLWISE-DB2 DBA
 
Partitioning tables and indexing them
Partitioning tables and indexing them Partitioning tables and indexing them
Partitioning tables and indexing them
 
Less16 Recovery
Less16 RecoveryLess16 Recovery
Less16 Recovery
 
DB2 TABLESPACES
DB2 TABLESPACESDB2 TABLESPACES
DB2 TABLESPACES
 
All of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL DeveloperAll of the Performance Tuning Features in Oracle SQL Developer
All of the Performance Tuning Features in Oracle SQL Developer
 
Db2 and storage management (mullins)
Db2 and storage management (mullins)Db2 and storage management (mullins)
Db2 and storage management (mullins)
 
DB2 utilities
DB2 utilitiesDB2 utilities
DB2 utilities
 
An Hour of DB2 Tips
An Hour of DB2 TipsAn Hour of DB2 Tips
An Hour of DB2 Tips
 
Backup and recovery in sql server database
Backup and recovery in sql server databaseBackup and recovery in sql server database
Backup and recovery in sql server database
 
오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝오라클 DB 아키텍처와 튜닝
오라클 DB 아키텍처와 튜닝
 
DB2 Basic Commands - UDB
DB2 Basic Commands - UDBDB2 Basic Commands - UDB
DB2 Basic Commands - UDB
 
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OSPractical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
Practical Recipes for Daily DBA Activities using DB2 9 and 10 for z/OS
 

Destacado

An Intro to Tuning Your SQL on DB2 for z/OS
An Intro to Tuning Your SQL on DB2 for z/OSAn Intro to Tuning Your SQL on DB2 for z/OS
An Intro to Tuning Your SQL on DB2 for z/OS
Willie Favero
 
CMW Cyber Liability Presentation
CMW Cyber Liability PresentationCMW Cyber Liability Presentation
CMW Cyber Liability Presentation
Sean Graham
 
Dibucaine number
Dibucaine numberDibucaine number
Dibucaine number
Dr Sandeep
 
Antiemeticos..farma
Antiemeticos..farmaAntiemeticos..farma
Antiemeticos..farma
google
 
X ray tube
X ray tubeX ray tube
X ray tube
Rad Tech
 
급대출//BU797。СΟΜ//법인신용대출 제3금융기관
급대출//BU797。СΟΜ//법인신용대출 제3금융기관급대출//BU797。СΟΜ//법인신용대출 제3금융기관
급대출//BU797。СΟΜ//법인신용대출 제3금융기관
hsldfsod
 
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt namPhát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
https://www.facebook.com/garmentspace
 

Destacado (20)

An Intro to Tuning Your SQL on DB2 for z/OS
An Intro to Tuning Your SQL on DB2 for z/OSAn Intro to Tuning Your SQL on DB2 for z/OS
An Intro to Tuning Your SQL on DB2 for z/OS
 
CMW Cyber Liability Presentation
CMW Cyber Liability PresentationCMW Cyber Liability Presentation
CMW Cyber Liability Presentation
 
What Is CustomerCentric Selling®
What Is CustomerCentric Selling®What Is CustomerCentric Selling®
What Is CustomerCentric Selling®
 
CRM assignment
CRM assignmentCRM assignment
CRM assignment
 
Data Management Strategies
Data Management StrategiesData Management Strategies
Data Management Strategies
 
B2B Branding
B2B BrandingB2B Branding
B2B Branding
 
Customer experience architecture
Customer experience architectureCustomer experience architecture
Customer experience architecture
 
Dibucaine number
Dibucaine numberDibucaine number
Dibucaine number
 
Desalter Desalting
Desalter  DesaltingDesalter  Desalting
Desalter Desalting
 
12 Deductive Thinking Puzzles
12 Deductive Thinking Puzzles12 Deductive Thinking Puzzles
12 Deductive Thinking Puzzles
 
Project Requirements, What Are They And How Do You Know You
Project Requirements, What Are They And How Do You Know YouProject Requirements, What Are They And How Do You Know You
Project Requirements, What Are They And How Do You Know You
 
Drug dilution
Drug dilutionDrug dilution
Drug dilution
 
IBM DataPower Gateway - Common Use Cases
IBM DataPower Gateway - Common Use CasesIBM DataPower Gateway - Common Use Cases
IBM DataPower Gateway - Common Use Cases
 
How to Build a DevOps Toolchain
How to Build a DevOps ToolchainHow to Build a DevOps Toolchain
How to Build a DevOps Toolchain
 
Antiemeticos..farma
Antiemeticos..farmaAntiemeticos..farma
Antiemeticos..farma
 
X ray tube
X ray tubeX ray tube
X ray tube
 
급대출//BU797。СΟΜ//법인신용대출 제3금융기관
급대출//BU797。СΟΜ//법인신용대출 제3금융기관급대출//BU797。СΟΜ//법인신용대출 제3금융기관
급대출//BU797。СΟΜ//법인신용대출 제3금융기관
 
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt namPhát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
Phát triển dịch vụ phi tín dụng của các ngân hàng thương mại nhà nước việt nam
 
Accounts Payable (AP) Process Flow
Accounts Payable (AP) Process FlowAccounts Payable (AP) Process Flow
Accounts Payable (AP) Process Flow
 
DENTAL PLASTER
DENTAL PLASTERDENTAL PLASTER
DENTAL PLASTER
 

Similar a Understanding DB2 Optimizer

NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdfNOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
cookie1969
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
Shehap Elnagar
 

Similar a Understanding DB2 Optimizer (20)

Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdfNOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
NOCOUG_201311_Fine_Tuning_Execution_Plans.pdf
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
T sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powersT sql performance guidelines for better db stress powers
T sql performance guidelines for better db stress powers
 
Presentation interpreting execution plans for sql statements
Presentation    interpreting execution plans for sql statementsPresentation    interpreting execution plans for sql statements
Presentation interpreting execution plans for sql statements
 
Part1 of SQL Tuning Workshop - Understanding the Optimizer
Part1 of SQL Tuning Workshop - Understanding the OptimizerPart1 of SQL Tuning Workshop - Understanding the Optimizer
Part1 of SQL Tuning Workshop - Understanding the Optimizer
 
Informatica course content
Informatica course contentInformatica course content
Informatica course content
 
Informatica course content
Informatica course contentInformatica course content
Informatica course content
 
Informatica course content
Informatica course contentInformatica course content
Informatica course content
 
Informatica course content
Informatica course contentInformatica course content
Informatica course content
 
Informatica Training in Chennai
Informatica Training in ChennaiInformatica Training in Chennai
Informatica Training in Chennai
 
Sql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices ISql and PL/SQL Best Practices I
Sql and PL/SQL Best Practices I
 
05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx05_DP_300T00A_Optimize.pptx
05_DP_300T00A_Optimize.pptx
 
SAP BO and Teradata best practices
SAP BO and Teradata best practicesSAP BO and Teradata best practices
SAP BO and Teradata best practices
 
SQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query PerformanceSQL Server Query Optimization, Execution and Debugging Query Performance
SQL Server Query Optimization, Execution and Debugging Query Performance
 
Oracle Query Optimizer - An Introduction
Oracle Query Optimizer - An IntroductionOracle Query Optimizer - An Introduction
Oracle Query Optimizer - An Introduction
 
Explaining the explain_plan
Explaining the explain_planExplaining the explain_plan
Explaining the explain_plan
 
Access Data from XPages with the Relational Controls
Access Data from XPages with the Relational ControlsAccess Data from XPages with the Relational Controls
Understanding DB2 Optimizer

  • 1. Understanding DB2 LUW Optimizer Anthony Reina Zoran Kulina
  • 2. Speakers Anthony Reina • Accelerated Value Specialist at IBM Toronto Lab. • Over 10 years of experience in various roles in DB2 Support. • Works with IBM customers on resolving defects, developing solutions, and conducting education courses. • Specializes in DB2 engine, particularly recovery and optimizer areas. • Contact: aereina@ca.ibm.com Zoran Kulina • Software engineer and technical lead on the DB2 Advanced Support team at IBM Toronto Lab. • Specialties include database internal architecture, SQL performance tuning, deep debugging and advanced problem determination. • Contact: zoran@ca.ibm.com 2
  • 3. Agenda 1. Introduction and high-level overview 2. Phases of the optimizer 3. Explain facility and DB2EXFMT Tool 4. Operators 5. Catalog statistics 6. Predicates and joins 7. Cardinality estimates 8. Basic tuning hints 9. Statistical views 10. Optimizer profiles and guidelines 3
  • 4. Introduction • What is this workshop about? – This half-day workshop provides hands-on introduction to the DB2 LUW optimizer. • Why learn about the optimizer? – Understanding how the optimizer works and knowing how to influence its behaviour can lead to improved query performance and better resource usage. • What are the workshop objectives? – To provide participants with basic knowledge of the optimizer and related concepts, and to serve as a starting point for further study of the optimizer techniques. 4
  • 5. Introduction • What are the prerequisites? – Basic knowledge of SQL and relational database concepts is assumed. • What is the workshop format? – This workshop introduces the optimizer through a series of interactive hands-on exercises. Each unit is first discussed and then accompanied by an exercise to demonstrate the unit concepts. 5
  • 6. Introduction • Sample database Database name: DVDSR Schema name: EXAMPLE Tables: CUSTOMER BRANCH MOVIE TRANSACTION_MASTER TRANSACTION_DETAIL 6
  • 7. High-Level Overview Unit objectives: • Define what the optimizer is and how it works from a high-level perspective. • Introduce the concept of timeron. • Discuss major factors influencing the optimizer. • List various statistics used by the optimizer. 7
  • 8. High-Level Overview Optimizer purpose: • Chooses access plan for data manipulation language (DML) SQL statements. Means of fulfilling the purpose: • Calculates the execution cost of many alternative access plans, and then selects the one with the minimal estimated cost. • Bases calculations on catalog statistics. • Measures cost in timerons: – An abstract unit of measure. – Does not directly correspond to actual elapsed time. – Rough relative estimate of the resources (cost) required by the database manager to execute an access plan. – Access plan cost is cumulative: each operator cost is added to the next one. 8
  • 9. High-Level Overview Factors influencing the optimizer: • Database design – Indexes – Tablespace configuration • Table and index statistics – Basis of plan costs – Generated and collected by the RUNSTATS command • Operating environment – Influences plan costs – Various configuration parameters • Optimization class (OPTLEVEL) – Controls the level of modeling considered during compilation – Influences SQL compilation time 9
  • 10. High-Level Overview Statistics used by the optimizer: • Basic statistics – Number of rows in a table – Number of pages for a table • Non-uniform distribution statistics – NUM_FREQVALUES (equality) – NUM_QUANTILES (range predicates) • Index clustering (DETAILED index statistics) – Improves the estimates of data page fetches • Statistical views (STATVIEWS) – Statistics collected on views whose definitions overlap the predicates of a query • Column group statistics – Statistics collected for more than one column 10
  • 11. Phases of the Optimizer Unit objectives: • Describe the stages of the compilation process. • Explain what happens from the time a query is submitted to point where its result set is returned. 11
  • 12. Phases of the Optimizer 12
  • 13. Phases of the Optimizer • Parse Query: analyzes the query to validate the syntax. • Check Semantics: ensures that there are no inconsistencies among parts of the statement. • Rewrite Query: uses the global semantics that are stored in the Query Graph Model (QGM) to transform the query into a form that can be optimized more easily. The result is stored in the QGM, as well. • Pushdown Analysis (Federated DB only): recommends to the optimizer whether an operation can be remotely evaluated or pushed down to a data source. 13
  • 14. Phases of the Optimizer • Optimize Access Plan: using the QGM as input, it generates many alternative access plans and, at the end, selects the most cost-effective plan for satisfying the query. • Remote SQL Generation (Federated DB only): creates an efficient SQL statement for operations that are performed by each remote data source based on the SQL dialect at that data source. • Generate Executable Code: uses the access plan and the QGM to create an executable access plan or section for the query. 14
  • 15. Phases of the Optimizer 15
  • 16. Explain Facility and DB2EXFMT Tool Unit objectives: • Define what an access plan is. • Introduce the tools used to generate the explain snapshot (access plan and related data). • Discuss the components of the db2exfmt report and how to interpret its output. 16
  • 17. Explain Facility and DB2EXFMT Tool Explain purpose: • Captures information about the access plan, optimizer inputs, and environment of an SQL query. • Helps to understand how the SQL query is compiled and executed. • Shows how configuration parameter changes impact query performance. • Indispensable tool for query problem determination. 17
  • 18. Explain Facility and DB2EXFMT Tool Explain information includes: • Sequence of operations to process the query • Cost information • Predicates and selectivity estimates for each predicate • Statistics for all objects referenced in the SQL query at the time the explain information is captured 18
  • 19. Explain Facility and DB2EXFMT Tool Explain tables: • Must be created before the Explain can be invoked: – SYSINSTALLOBJECTS() procedure – EXPLAIN.DDL • Must be created per schema/user or under the SYSTOOLS schema • Capture access plan data when the Explain facility is activated 19
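  For reference, a minimal sketch of the two creation methods; the NULL arguments in the procedure call request the defaults, which place the tables under the SYSTOOLS schema, while the DDL file creates them under the current schema:

     -- create the explain tables under SYSTOOLS via the stored procedure
     CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C',
          CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)));

     -- or create them under the current schema from the shipped DDL file:
     --   db2 -tvf <instance home>/sqllib/misc/EXPLAIN.DDL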
  • 20. Explain Facility and DB2EXFMT Tool Explain tables: ADVICE_INDEX ADVICE_INSTANCE ADVICE_MQT ADVICE_PARTITION ADVICE_TABLE ADVICE_WORKLOAD EXPLAIN_ACTUALS EXPLAIN_ARGUMENT EXPLAIN_INSTANCE EXPLAIN_OBJECT EXPLAIN_OPERATOR EXPLAIN_PREDICATE EXPLAIN_STATEMENT EXPLAIN_STREAM 20
  • 21. Explain Facility and DB2EXFMT Tool Explain tools: db2expln • Provides access plan information for one or more SQL query packages • Shows the actual implementation of the chosen access plan • Does not show detailed optimizer information db2exfmt • Formats the contents of the explain tables • Provides more detailed explain information than db2expln • Shows clues as to why the optimizer has made particular decisions 21
  • 22. Explain Facility and DB2EXFMT Tool db2exfmt output sections: 1. Explain instance 2. Database context 3. Package context 4. Original statement 5. Optimized statement 6. Access plan 7. Extended diagnostic information 8. Plan details 9. Objects used in the access plan 22
  • 23. Explain Facility and DB2EXFMT Tool db2exfmt output sections: • Explain instance: general information (e.g., version, time, schema, etc.). • Database context: configuration parameters used during the query compilation. • Package context: query type and optimization level. • Original statement: actual query text. • Optimized statement: rewritten query text according to the applied optimization techniques. 23
  • 24. Explain Facility and DB2EXFMT Tool Explain output sections (continued): • Access plan: graphical representation of the query execution; displays the order of operations along with rows returned, cost and I/O for each operator. • Extended diagnostic information: warnings, informational or error messages returned by the optimizer. • Plan details: detailed information for each operator including arguments, predicates, estimates (filter factor), and input/output streams. • Objects used in the access plan: any objects (tables/indexes) referenced by the query or the access plan are displayed in this section, along with their statistical information. 24
  • 25. Explain Facility and DB2EXFMT Tool db2exfmt command syntax (summary): db2exfmt [-1] [-d dbname] [-e schema] [-f O] [-l] [-g [x] {O|I|C|T|F}...] [-n name] [-o outfile] [-s schema] [-t] [-u userID password] [-w timestamp] [-# sectnbr] [-v srcvers] [-h] 25
  • 26. Explain Facility and DB2EXFMT Tool SET CURRENT EXPLAIN MODE statement • Changes the value of the CURRENT EXPLAIN MODE special register • Compiler returns SQL0217W for a successful compilation of SQL run under the EXPLAIN mode • Syntax: SET CURRENT EXPLAIN MODE { NO | YES | EXPLAIN | REOPT | RECOMMEND INDEXES | EVALUATE INDEXES | RECOMMEND PARTITIONINGS | EVALUATE PARTITIONINGS | host-variable } 26
  • 27. Explain Facility and DB2EXFMT Tool Exercise: • Objective: create explain tables, determine if explain tables exist, and generate db2exfmt output. • Use exercise3-1.bat script • Create the explain tables: db2 –tvf …\sqllib\misc\EXPLAIN.DDL • Verify the explain tables were created: list tables for all 27
  • 28. Explain Facility and DB2EXFMT Tool Exercise (continued): • Use exercise3-2.bat script: db2 set current explain mode explain; db2 select * from customer; db2 set current explain mode no; then db2exfmt –d <dbname> -1 –o exercise3-2.exfmt or db2exfmt –d <dbname> -g TIC –w -1 –n % -s % -# 0 –o exercise3-2.exfmt • Review exercise3-2.exfmt. 28
  • 29. Operators Unit objectives: • Describe what an operator is, and how the choice of operators influences query execution. • Discuss the most common operators that the optimizer uses to compose access plan. 29
  • 30. Operators What is an operator? • Action that is performed on data, or the output from a table or an index, when the access plan for an SQL statement is executed. • Unit of query execution. • Specific operation that occurs in the query execution control flow. 30
  • 31. Operators
            29          <- rows returned
          HSJOIN        <- operator
          (   5)        <- sequence number
          30.3298       <- cost (timerons)
             4          <- cost (I/O)
        /------+-------
      168             29
    TBSCAN          TBSCAN
    (   6)          (   7)
    22.7451         7.57811
       3               1
      |               |
     168              29
    TABLE: EXAMPLE  TABLE: EXAMPLE
     MOVIE          TRANSACTION_DETAIL
     Q1             Q4
  31
  • 32. Operators • TBSCAN: Retrieves rows directly from data pages. • IXSCAN: Scans an index of a table with optional start/stop conditions, producing an ordered stream of rows. • DELETE: Deletes rows from a table. • FETCH: Fetches columns from a table using a specific record identifier. • FILTER: Filters data by applying one or more predicates to it. • GENROW: Generates a table of rows. 32
  • 33. Operators • GRPBY: Groups rows by common values of designated columns or functions, and evaluates set functions. • HSJOIN: Represents a hash join, where two or more tables are hashed on the join columns. • INSERT: Inserts rows into a table. • IXAND: ANDs together the row identifiers (RIDs) from two or more index scans. • MSJOIN: Represents a merge join, where both outer and inner table must be in join-predicate order. 33
  • 34. Operators • NLJOIN: Represents a nested loop join that accesses an inner table once for each row of the outer table. • RETURN: Represents the return of data from the query to the user. • RIDSCN: Scans a list of row identifiers (RIDs) obtained from one or more indexes. • SORT: Sorts rows in the order of specified columns, and optionally eliminates duplicate entries. 34
  • 35. Operators • TEMP: Stores data in a temporary table to be read back out. • TQ: Transfers table data between database agents. • UNION: Concatenates streams of rows from multiple tables. • UNIQUE: Eliminates rows with duplicate values, for specified columns. • UPDATE: Updates rows in a table. 35
  • 36. Operators Exercise: • Review the explain output (exercise3-2.exfmt) from the previous exercise and identify operators in the access plan section. • “Explain” the following queries and identify the new operators introduced: set current explain mode explain <SQL query> db2exfmt –d wsdb -1 –o sect6ex[1,2].out 36
  • 37. Operators Exercise (continued): Use the following SQL queries for step #2: SELECT cust_branch_id, count(cust_id) FROM customer GROUP BY cust_branch_id SELECT mtran_id, cust_ln, cust_fn, branch_location FROM transaction_master a, customer b, branch c WHERE a.mtran_cust_id = b.cust_id AND a.mtran_branch_id = c.branch_id 37
  • 38. Catalog statistics Unit objectives: • Explain how the optimizer uses statistical information to calculate execution cost of access plan alternatives. • Introduce various methods of gathering and maintaining current statistics on tables and indexes. 38
  • 39. Catalog statistics • The optimizer is heavily influenced by statistical information about the size of tables, indexes and statistical views. • The optimizer also uses statistics about the distribution of data in columns of tables, indexes and statistical views, if these columns are used to select rows or to join tables. • The optimizer uses distribution statistics in its calculations to estimate the costs of alternative access plans for each query. • Statistical information is stored in system catalog tables and is accessible via the SYSCAT schema views. 39
  • 40. Catalog statistics • Statistical information used by the optimizer: – Cluster ratio of indexes (higher is better) – Number of leaf pages in indexes – Number of table rows that overflow their original pages – Number of filled and empty pages in a table (may indicate if table or index reorganization is needed) • Catalog statistics are collected and updated by the RUNSTATS utility. 40
  • 41. Catalog statistics RUNSTATS utility: • Updates statistics about the characteristics of a table and/or associated indexes, or statistical views. • For a table, should be run after frequent updates / inserts / deletes are made to the table or after reorganizing the table. • For a statistical view, RUNSTATS should be run when changes to underlying tables have substantially affected the rows returned by the view. 41
  • 42. Catalog statistics RUNSTATS best practices: • Use RUNSTATS on all tables and indexes frequently accessed by queries. • Use RUNSTATS with frequent-value and quantile statistics for complex queries (only collected when using the WITH DISTRIBUTION clause). • Rerun RUNSTATS after a significant change in the table data. 42
  • 43. Catalog statistics RUNSTATS best practices (continued): Use RUNSTATS in the following situations: • After loading data, once indexes have been created. • After creating a new index on a table. • After running the REORG utility. • After significant insert, update or delete activity. • Before binding application programs. • After executing the REDISTRIBUTE DB PARTITION GROUP command. 43
  • 44. Catalog statistics RUNSTATS examples (basic statistics): • Collect statistics for table only: RUNSTATS ON TABLE <schema>.<tablename> • Collect statistics for indexes only on a given table: RUNSTATS ON TABLE <schema>.<tablename> FOR INDEXES ALL • Collect statistics for both table and its indexes: RUNSTATS ON TABLE <schema>.<tablename> AND INDEXES ALL 44
  • 45. Catalog statistics RUNSTATS examples (enhanced statistics): • Collect statistics for table including distribution statistics: RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION • Collect statistics for indexes only on a given table including extended index statistics: RUNSTATS ON TABLE <schema>.<tablename> FOR DETAILED INDEXES ALL • Collect statistics for table and its indexes including extended index and distribution statistics: RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION AND DETAILED INDEXES ALL 45
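  As a concrete illustration against the workshop's sample schema (the table choice here is arbitrary), the last form could be run as:

     -- table, distribution and detailed index statistics in one pass
     RUNSTATS ON TABLE EXAMPLE.MOVIE WITH DISTRIBUTION AND DETAILED INDEXES ALL;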
  • 46. Catalog statistics DB2LOOK utility: • Mimic option (-m) extracts catalog statistics for all or given tables. • Helps recreate optimizer behaviour in another database. • DB2LOOK for a specific table: db2look –d <dbname> -a –e –m –z <schema> -t <table1 table2 …> -o <outfile> • DB2LOOK for all tables in the database: db2look –d <dbname> -f –m –l –a –e –o <outfile> • Alternative method: query the catalog tables directly using SYSCAT schema views. 46
  • 47. Catalog statistics
  Read-only catalog view       Updateable catalog view      Description
  SYSCAT.TABLES                SYSSTAT.TABLES               Table statistics
  SYSCAT.COLUMNS               SYSSTAT.COLUMNS              Column statistics
  SYSCAT.COLGROUPS             SYSSTAT.COLGROUPS            Multicolumn statistics
  SYSCAT.COLGROUPDISTCOUNTS    SYSSTAT.COLGROUPDISTCOUNTS   Multicolumn distribution statistics
  • A value of -1 in the columns indicates no statistics collected. • SYSCAT views are read-only; SYSSTAT views are updateable. 47
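  One way to inspect these views directly is with catalog queries along these lines (schema and table names are from the sample database); a value of -1 in CARD or COLCARD confirms that statistics have not been collected:

     -- table-level statistics: CARD = row count, NPAGES = pages with rows
     SELECT CARD, NPAGES FROM SYSCAT.TABLES
     WHERE TABSCHEMA = 'EXAMPLE' AND TABNAME = 'MOVIE';

     -- column-level statistics: COLCARD = number of distinct values
     SELECT COLNAME, COLCARD FROM SYSCAT.COLUMNS
     WHERE TABSCHEMA = 'EXAMPLE' AND TABNAME = 'MOVIE';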
  • 48. Catalog statistics Exercise: • Run exercise5-1.bat to generate DB2LOOK and EXPLAIN output for the following query: SELECT a.mtran_id, b.dtran_num, c.cust_ln, c.cust_fn, d.branch_location, b.dtran_tran_type, substr(e.movie_title,1,25) FROM transaction_master a, transaction_detail b, customer c, branch d, movie e WHERE a.mtran_id = b.dtran_id AND a.mtran_cust_id = c.cust_id AND a.mtran_branch_id = d.branch_id AND b.dtran_movie_id = e.movie_id ORDER BY 1, 2, 6, 7 • Review the EXPLAIN output in exercise5-1.exfmt, and check for the indicator showing whether statistics were collected. • Review the DB2LOOK output in exercise5-1.db2look, and examine the statistical data returned for the tables used in the query. 48
  • 49. Catalog statistics Exercise (continued): • Run exercise5-2.bat to collect statistics for the CUSTOMER table and generate the DB2LOOK and EXPLAIN output. • Review the output files exercise5-2.db2look and exercise5-2.exfmt • Observe the type of data collected and how it is reflected in the EXPLAIN output. 49
  • 50. Catalog statistics Exercise (continued): • Run exercise5-3.bat to collect index statistics for the BRANCH table only and generate the DB2LOOK and EXPLAIN output. • Review the output files exercise5-3.db2look and exercise5-3.exfmt, and observe the type of data collected and how it is reflected in the EXPLAIN output. • Optional: Run exercise5-4.bat to collect RUNSTATS using WITH DISTRIBUTION AND DETAILED INDEXES ALL for all tables. Review the data. 50
  • 51. Predicates and Joins Unit objectives: • Define the terms predicate, join and sargable. • Describe predicate and join types, and their effect on query execution. • Discuss the pros and cons of each join method. 51
  • 52. Predicates and Joins • A predicate is an element of a search condition that expresses or implies a comparison operation. • Predicates are included in clauses beginning with WHERE or HAVING. • Predicates limit or filter the query result set. • When designing predicates, aim for the highest possible selectivity so that the fewest rows are returned. 52
  • 53. Predicates and Joins Example: SELECT * FROM movie WHERE movie_genre = 'A' AND movie_cost_price = 21.99 AND movie_rent_day > 1 • The following parts are the predicates: – movie_genre = 'A' – movie_cost_price = 21.99 – movie_rent_day > 1 53
  • 54. Predicates and Joins Predicate types: • Range-delimiting predicates: limit the scope of index scan using start and stop conditions. • Index-sargable predicates: evaluated from the index because the columns in the predicate are part of the index key. • Data-sargable predicates: cannot be evaluated from the index but can be evaluated while rows remain in the buffer. • Residual predicates: involve subqueries (e.g., ANY, ALL, SOME or IN clauses), or those that read LONG VARCHAR or LOB data stored in files separate from the table. 54
  • 55. Predicates and Joins What does “sargable” mean? • Contraction of Search ARGument-able. • A SARGable predicate is a term that can be used as a search argument. • Unsargable predicates, or join predicates, require one or more joins to be performed before the predicate can be evaluated. 55
  • 56. Predicates and Joins Range-Delimiting Predicates: • Limit the scope of an index scan. • Use start and stop key values for the index search. • The optimizer uses index data instead of reading the base table when it evaluates these predicates, e.g.: SELECT movie_title FROM movie WHERE movie_id='D0010' – Since movie_id column is defined in the primary key index, the optimizer scans the index within the range of the predicate value. • See sect6-range delimiting example.exfmt. 56
  • 57. Predicates and Joins
  3) IXSCAN: (Index Scan)
     Predicates:
     ----------
     2) Start Key Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.00595238
        Predicate Text:
        --------------
        (Q1.MOVIE_ID = 'D0010')
     2) Stop Key Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.00595238
        Predicate Text:
        --------------
        (Q1.MOVIE_ID = 'D0010')
  57
  • 58. Predicates and Joins Index-Sargable Predicates: • Cannot limit the scope of a search. • Can be evaluated from the index, because the columns that are referenced in the predicate are part of the index key, e.g.: SELECT movie_title FROM movie WHERE movie_id='D0010' AND movie_type > null – Index is defined on columns (movie_id, movie_genre, movie_type). – Movie_id would be applied as a start-stop key since it is the leading column in the index, while movie_type would be applied as index-sargable to fulfill the “>” operator. • See sect6-index sargable.exfmt. 58
  • 59. Predicates and Joins
  3) IXSCAN: (Index Scan)
     Predicates:
     ----------
     2) Sargable Predicate,
        Comparison Operator:     Less Than (<)
        Subquery Input Required: No
        Filter Factor:           0.333333
        Predicate Text:
        --------------
        (NULL < Q1.MOVIE_TYPE)
     3) Start Key Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.00595238
        Predicate Text:
        --------------
        (Q1.MOVIE_ID = 'D0010')
  59
  • 60. Predicates and Joins Data-Sargable Predicates: • Cannot be evaluated from an index. • Can be evaluated by accessing individual rows in a table, e.g.: SELECT movie_title FROM movie WHERE movie_type = 'A' – Since movie_type column is not part of any index, it will be treated as data-sargable predicate. • See sect6-data-sargable example.exfmt. 60
  • 61. Predicates and Joins
  2) TBSCAN: (Table Scan)
     Predicates:
     ----------
     2) Sargable Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.04
        Predicate Text:
        --------------
        (Q1.MOVIE_TYPE = 'A')
  61
  • 62. Predicates and Joins
  Characteristic                                Range-delimiting  Index-sargable  Data-sargable  Residual
  Reduce index I/O                              Yes               No              No             No
  Reduce data page I/O                          Yes               Yes             No             No
  Reduce the number of rows passed internally   Yes               Yes             Yes            No
  Reduce the number of qualifying rows          Yes               Yes             Yes            Yes
  62
  • 63. Predicates and Joins • A join is the process of combining two streams of rows (result sets). • Left side of the join is referred to as the OUTER. • Right side of the join is referred to as the INNER. • Three explicit join operators exist: – Nested loop join (NLJOIN) – Merge join (MSJOIN) – Hash join (HSJOIN) 63
  • 64. Predicates and Joins Nested loop join (NLJOIN): • Takes the top row from the outer stream and compares it to all rows from the inner stream. • Performance heavily influenced by the row count in the inner stream (i.e., the larger the inner, the slower the performance). • Inner table is usually an IXSCAN or FETCH-IXSCAN combination. • To obtain better filtering, the NLJOIN usually pushes the join predicates down into the inner operator (e.g., IXSCAN). • The only join type available at optimization level 0. 64
  • 65. Predicates and Joins NLJOIN considerations: • Poor performance when a large number of rows exists on the inner stream. • Slow TEMP operation on inner because it is a blocking operation and requires subsequent TBSCAN. • Check if cardinality estimates match the actual row counts. Even though the optimizer estimated one row, it may not be the case at execution time. • Check if an index-sargable predicate is used. This can be expensive when the optimizer scans to match a single row across the entire index. 65
  • 66. Predicates and Joins NLJOIN example: 66
  • 67. Predicates and Joins Merge join (MSJOIN): • Requires inputs sorted on the join column, which must be neither LONG nor LOB. • Only the primary join predicate is applied at MSJOIN, and it must be an equality join predicate. • Remaining join predicates are applied as RESIDUAL predicates, not in the MSJOIN but at a FILTER. • MSJOIN scans both tables concurrently (unlike NLJOIN), and it only scans each table once, unless the outer has duplicates. • Outer table is preferably the one with fewer duplicates. 67
  • 68. Predicates and Joins MSJOIN considerations: • SORT is a blocking operator and may be expensive. • SORT can spill to buffer pools or disk (TEMPSPACE). • Applies residual predicates after joining. • If outer is indexed on join column, even without a SORT, the index scan is typically a full index scan (sargable predicate). • Duplicates hurt all joins but with other joins one can take advantage of prefetching or having the inner buffered in memory. With MSJOIN, the inner is rescanned for every duplicate value in the outer row join column. 68
  • 69. Predicates and Joins MSJOIN example: 69
  • 70. Predicates and Joins Hash join (HSJOIN): • First, the inner table (build table) is scanned and placed into memory buffers partitioned by hash code. • Next, the outer table (probe table) is scanned row by row through the same hash code on the same join column(s). – If the hash codes match, then the join columns are compared for their rows to be matched. – If the matching row is in the memory buffers, then it is compared right away. – If not in the memory buffers, then the probe row is written into a temporary table on disk. • Finally, temporary tables containing rows from the same partitions are processed. 70
  • 71. Predicates and Joins HSJOIN considerations: • Requires significant sort heap space which may cause contention with other queries. • May spill to disk if insufficient sort heap or memory buffers for hash table. • Requires join columns to be of the same type. • Does not guarantee ordering, and if outer table’s ordering is needed, not an ideal choice. • If intra-partition parallelism is on (INTRA_PARALLEL enabled, DFT_DEGREE > 1), the HSJOIN runs in parallel (i.e., build table partitioned in parallel and loaded into memory). 71
  • 72. Predicates and Joins HSJOIN example: 72
  • 73. Cardinality Estimates Unit objectives: • Define the concepts of cardinality, filter factor and cardinality estimates. • Describe how the optimizer calculates cardinality estimates. • Show how different statistics affect cardinality estimate calculations. 73
  • 74. Cardinality Estimates • Cardinality (CARD) – Table statistics: total number of rows in a table. – Column statistics: the number of unique values in a column. • Filter factor (FF) – The percentage of rows estimated by the optimizer to be allowed to filter through a given predicate. – Expressed as probability: 0 – 1.0 (0 to 100%). • Cardinality estimate – The number of rows estimated by the optimizer to be returned for a given predicate based on its filter factor. – Calculated as Filter Factor x Cardinality. 74
  • 75. Cardinality Estimates • Filtering can occur with a local or join predicate – movie_title = 'RANGO' (local predicate) – a.c1 = b.c2 (join predicate) • COLCARD – Number of distinct values in the column – SYSCAT.COLUMNS • VALCOUNT – Frequency with which the data value occurs in the column – SYSCAT.COLDIST 75
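  Both figures can be read straight from the catalog; for instance, the frequent-value statistics behind the MOVIE_GENRE examples later in this unit would come from a query like this (a sketch against the sample schema):

     -- frequent-value distribution statistics (TYPE = 'F');
     -- VALCOUNT is the number of rows holding COLVALUE
     SELECT COLVALUE, VALCOUNT FROM SYSCAT.COLDIST
     WHERE TABSCHEMA = 'EXAMPLE' AND TABNAME = 'MOVIE'
       AND COLNAME = 'MOVIE_GENRE' AND TYPE = 'F';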
  • 76. Cardinality Estimates Calculating FF for equality predicates (c1=value) • No statistics available/collected – FF = a default value, typically 1/25 – FF = 0.04 (or 0.25 for some predicate types) • Only basic statistics collected – FF = 1/COLCARD – COLCARD is for the column of the predicate being calculated • With basic and distribution statistics collected – FF = VALCOUNT / CARD 76
  • 77. Cardinality Estimates No statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre='A' Statistics: – SYSSTAT.TABLES → CARD = -1 – SYSSTAT.COLUMNS → COLCARD = -1 – SYSSTAT.COLDIST → no data Filter Factor: 0.04 Explain snippet: sect7-nostat.exfmt 77
  • 78. Cardinality Estimates
  2) TBSCAN: (Table Scan)
     Predicates:
     ----------
     2) Sargable Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.04
        Predicate Text:
        --------------
        (Q1.MOVIE_GENRE = 'A')
  78
  • 79. Cardinality Estimates Only basic statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre='A' Statistics: – RUNSTATS ON TABLE example.movie – SYSSTAT.TABLES → CARD = 146 – SYSSTAT.COLUMNS → COLCARD = 6 (for movie_genre col) – SYSSTAT.COLDIST → no data Filter Factor: 1/COLCARD = 1/6 = 0.166667 Explain snippet: sect7-basicstat.exfmt 79
  • 80. Cardinality Estimates
  2) TBSCAN: (Table Scan)
     Predicates:
     ----------
     2) Sargable Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.166667
        Predicate Text:
        --------------
        (Q1.MOVIE_GENRE = 'A')
  80
  • 81. Cardinality Estimates Basic and distribution statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre='A' Statistics: – RUNSTATS ON TABLE example.movie WITH DISTRIBUTION – SYSSTAT.TABLES → CARD = 146 – SYSSTAT.COLUMNS → COLCARD = 6 (for movie_genre col) – SYSSTAT.COLDIST → VALCOUNT = 72 (movie_genre='A') Filter Factor: VALCOUNT / CARD = 72/146 = 0.493151 Explain snippet: sect7-diststat.exfmt 81
  • 82. Cardinality Estimates
  2) TBSCAN: (Table Scan)
     Predicates:
     ----------
     2) Sargable Predicate,
        Comparison Operator:     Equal (=)
        Subquery Input Required: No
        Filter Factor:           0.493151
        Predicate Text:
        --------------
        (Q1.MOVIE_GENRE = 'A')
  82
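  Putting the pieces of this last case together: cardinality estimate = FF x CARD = (VALCOUNT / CARD) x CARD = (72 / 146) x 146 = 0.493151 x 146 ≈ 72 rows; with distribution statistics the optimizer's estimate matches the actual number of genre 'A' rows, so the scan can be costed accurately.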
  • 83. Basic Tuning Hints Unit objectives: • Introduce basic access plan tuning techniques. • Describe the common tuning elements (configuration parameters, registry variables and CURRENT QUERY OPTIMIZATION special register). 83
  • 84. Basic Tuning Hints • Tuning is usually done by making changes to one of the configuration categories that influence the optimizer behaviour: – Database manager configuration – Database configuration – Query compiler registry variables – CURRENT QUERY OPTIMIZATION special register 84
  • 85. Basic Tuning Hints Database manager configuration: • Parallelism (INTRA_PARALLEL): indicates whether intra-partition parallelism is enabled. When YES, some parts of query execution (e.g., index sort) can run in parallel within a single database partition. • CPU Speed (CPUSPEED): the optimizer uses the CPU speed (milliseconds per instruction) to estimate the cost of performing certain operations. • Communications speed (COMM_BANDWIDTH): the optimizer uses the value (megabytes per second) in this parameter to estimate the cost of performing certain operations between the database partition servers in a partitioned database environment. 85
  • 86. Basic Tuning Hints Database configuration: • Buffer pools (BUFFPAGE): – Determined by the buffpage parameter, if using buffpage as the default for one bufferpool, or a calculation based on the contents of SYSCAT.BUFFERPOOLS. – The number shown is the total number of bufferpool pages allocated for the database. The table below shows an example of how we arrive at the total of 5000 pages:
  Bufferpool name   Size
  IBMDEFAULTBP      1000
  BPLONG            4000
  BPTEMP            1000
  Total             5000
  86
  • 87. Basic Tuning Hints Database configuration (continued): • Sort heap size (SORTHEAP): Defines the maximum number of private memory pages to be used for private sorts or the maximum number of shared memory pages to be used for shared sorts. • Database heap size (DBHEAP): Contains metadata for tables, indexes, table spaces and bufferpools. There is one database heap per database. • Lock list size (LOCKLIST): Indicates the amount of storage allocated to the lock list. There is one lock list per database and it contains the locks held by all applications concurrently connected to the database. • Maximum lock list (MAXLOCKS): Defines a percentage of the lock list held by any one application that must be filled before the database manager performs lock escalation. 87
  • 88. Basic Tuning Hints Database configuration (continued): • Average applications (AVG_APPLS): Used by the SQL optimizer to help estimate how much of the bufferpool will be available at run-time for the access plan chosen (since the bufferpool is shared by all active connections). • Optimization class (DFT_QUERYOPT): Tells the optimizer which level of optimization to use when compiling an SQL query (default=5). The higher the level, the more complex algorithms are considered. • Query degree (DFT_DEGREE): Represents the degree of intra-partition parallelism for an SQL statement. If set to ANY, the optimizer is sensitive to the actual number of CPUs that are online. If set to ANY and intra_parallel is enabled, then the number of CPUs on test and production should be configured the same. 88
  • 89. Basic Tuning Hints Database configuration (continued): • Number of frequent values retained (NUM_FREQVALUES): Allows you to specify the number of "most frequent values" that will be collected when the WITH DISTRIBUTION option is specified with the RUNSTATS command. • Number of quantiles retained (NUM_QUANTILES): Controls the number of quantiles that will be collected when the WITH DISTRIBUTION option is specified on the RUNSTATS command. 89
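  Both limits can be raised at the database level when a workload's predicates need finer-grained distribution statistics; the values in this CLP sketch are illustrative only, and RUNSTATS with the WITH DISTRIBUTION clause must be rerun before the new limits have any effect:

     UPDATE DB CFG FOR DVDSR USING NUM_FREQVALUES 20 NUM_QUANTILES 40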
  • 90. Basic Tuning Hints Query compiler variables (common): • DB2_EXTENDED_OPTIMIZATION: Specifies whether or not the query optimizer uses optimization extensions to improve query performance. • DB2_SORT_AFTER_TQ: Determines how the optimizer works with directed table queues in a partitioned database environment when the receiving end requires the data to be sorted, and the number of receiving nodes is equal to the number of sending nodes. • DB2_INLIST_TO_NLJN: Causes the optimizer to favour nested loop joins to join the list of values, using the table that contributes the IN list as the inner table in the join. * See the DB2 V9.7 Info Center for additional variables (search for “Query compiler variables”) 90
  • 91. Basic Tuning Hints CURRENT QUERY OPTIMIZATION special register: • Controls the query optimization class for dynamic SQL statements. • Overrides the dft_queryopt database configuration parameter. • Affects the succeeding SQL statements only for the current session. • Assigned by the SET CURRENT QUERY OPTIMIZATION statement. • Syntax: SET CURRENT QUERY OPTIMIZATION { 0 | 1 | 2 | 3 | 5 | 7 | 9 | host-variable } 91
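  A quick sketch of the special register in action for the current session:

     -- raise the optimization class for subsequent dynamic SQL
     SET CURRENT QUERY OPTIMIZATION 7;

     -- verify the value currently in effect (returns 7 here)
     VALUES CURRENT QUERY OPTIMIZATION;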
  • 92. Statistical Views Unit objectives: • Define the purpose and benefits of statistical views. • Provide a basic example and steps for creating a statistical view. 92
  • 93. Statistical Views • Views with statistics that can be used to improve cardinality estimates for queries in which the view definition overlaps with the query definition. • A powerful feature as it provides the optimizer with accurate statistics for determining cardinality estimates for queries with complex sets of (possibly correlated) predicates involving one or more tables. 93
  • 94. Statistical Views • Mechanism to provide the optimizer with more sophisticated statistics that represent more complex relationships: – Comparisons involving expressions: price > MSRP + Dealer_markup – Relationships spanning multiple tables: product.name = 'Alloy wheels' and product.key = sales.product_key – Anything other than predicates involving independent attributes and simple comparison operations. 94
  • 95. Statistical Views • Statistical views are able to represent these types of complex relationships, because statistics are collected on the result set returned by the view, rather than the base tables referenced by the view. • When an SQL query is compiled, the optimizer matches the SQL query to the available statistical views. When the optimizer computes cardinality estimates for intermediate result sets, it uses the statistics from the view to compute a better estimate. 95
  • 96. Statistical Views Usage: • Create a view: CREATE VIEW <view name> AS (<select clause>) • Enable the view for optimization using the ALTER VIEW statement: ALTER VIEW <view name> ENABLE QUERY OPTIMIZATION • Issue the RUNSTATS command to populate system catalog tables with statistics for the view: RUNSTATS ON TABLE <schema>.<view name> <runstats clause> 96
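  As a sketch against the sample schema (the view name SV_CUST_TRAN and the column choice are invented for illustration), the three steps could look like:

     -- 1. create a view capturing the join relationship of interest
     CREATE VIEW EXAMPLE.SV_CUST_TRAN AS
       (SELECT b.cust_branch_id, a.mtran_branch_id
          FROM transaction_master a, customer b
         WHERE a.mtran_cust_id = b.cust_id);

     -- 2. enable the view for query optimization
     ALTER VIEW EXAMPLE.SV_CUST_TRAN ENABLE QUERY OPTIMIZATION;

     -- 3. collect statistics on the result set returned by the view
     RUNSTATS ON TABLE EXAMPLE.SV_CUST_TRAN WITH DISTRIBUTION;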
  • 97. Optimizer Profiles Unit objectives: • Examine the technique for overriding the default access plan. • Discuss the structure of optimizer profile. • Introduce the common optimizer guidelines. • Provide instructions for directing the optimizer on how to perform table access or joins in specific ways. 97
  • 98. Optimizer Profiles • Mechanism to alter default access plan – Overrides the default access plan selected by the optimizer. – Instructs the optimizer how to perform table access or join. – Allows users to control specific parts of access plan. • Can be employed without changing the application code – Compose optimization profile, add to db, rebind targeted packages. • Should only be used after all other tuning options exhausted – Query improvement, RUNSTATS, indexes, optimization class, db and dbm configs, etc. – Should not be employed to permanently mitigate the effect of inefficient queries. 98
  • 99. Optimizer Profiles Anatomy of an optimizer profile: • XML document – Elements and attributes used to define optimization guidelines. – Must conform to a specific optimization profile schema. • Profile Header (exactly one) – Meta data and processing directives. – Example: schema version. • Global optimization guidelines (at most one) – Applies to all statements for which profile is in effect. – Example: eligible MQTs guideline defining MQTs to be considered for routing. • Statement optimization guidelines (zero or more) – Applies to a specific statement for which profile is in effect. – Specifies directives for desired execution plan. 99
  • 100. Optimizer Profiles Anatomy of an optimizer profile (continued):
  <?xml version="1.0" encoding="UTF-8"?>
  <OPTPROFILE VERSION="9.7.0.0">             <- Profile header
    <OPTGUIDELINES>                          <- Global optimization guidelines section. Optional but at most one.
      <MQT NAME="Test.AvgSales"/>
      <MQT NAME="Test.SumSales"/>
    </OPTGUIDELINES>
    <STMTPROFILE ID="Guidelines for TPCD">   <- Statement guidelines section. Zero or more.
      <STMTKEY SCHEMA="TPCD">
        <![CDATA[SELECT * FROM TAB1]]>
      </STMTKEY>
      <OPTGUIDELINES>
        <IXSCAN TABLE="TAB1" INDEX="I_SUPPKEY"/>
      </OPTGUIDELINES>
    </STMTPROFILE>
  </OPTPROFILE>
  100
  • 101. Optimizer Profiles • Each statement profile contains a statement key (STMTKEY) and one or more statement-level optimization guidelines (OPTGUIDELINES). • The statement key identifies the statement or query to which the statement-level optimization guidelines apply. <STMTKEY SCHEMA="TPCD"> <![CDATA[SELECT * FROM TAB1]]> </STMTKEY> • The statement-level optimization guidelines identify one or more directives, e.g. access or join requests, which specify methods for accessing or joining tables in the statement. <OPTGUIDELINES> <IXSCAN TABLE="TAB1" INDEX="I_SUPPKEY"/> </OPTGUIDELINES> 101
  • 102. Optimizer Profiles Optimizer guidelines: • Access plan guidelines: – Base access request (method to access a table, e.g. TBSCAN, IXSCAN) – Join request (method and sequence for performing a join, e.g. HSJOIN, NLJOIN) • Query rewrite guidelines: – INLIST2JOIN (convert IN-list to join) – SUBQ2JOIN (convert subquery to join) – NOTEX2AJ (convert NOT EXISTS subquery to anti-join) – NOTIN2AJ (convert NOT IN subquery to anti-join) • General optimization guidelines: – REOPT (ONCE/ALWAYS/NONE) – MQT choices – QRYOPT (set the query optimization class) – RTS (limit the amount of time taken by real-time statistics collection) 102
  • 103. Optimizer Profiles How to create and use an optimizer profile: 1. Create the SYSTOOLS.OPT_PROFILE table: – Using the CREATE TABLE statement:
  CREATE TABLE SYSTOOLS.OPT_PROFILE (
    SCHEMA  VARCHAR(128) NOT NULL,   <- schema name
    NAME    VARCHAR(128) NOT NULL,   <- profile name
    PROFILE BLOB (2M)    NOT NULL,   <- profile XML
    PRIMARY KEY ( SCHEMA, NAME ));
  – Using the SYSINSTALLOBJECTS procedure: db2 "call sysinstallobjects('opt_profiles', 'c', '', '')" 2. Compose the optimization profile in XML file prof1.xml. 103
  • 104. Optimizer Profiles How to create and use an optimizer profile (continued): 3. Create file prof1.del as follows: "EXAMPLE","PROF1","prof1.xml" 4. Import prof1.del into SYSTOOLS.OPT_PROFILE table as follows: IMPORT FROM prof1.del OF DEL MODIFIED BY LOBSINFILE INSERT INTO SYSTOOLS.OPT_PROFILE; 5. Enable the PROF1 profile for the current session: db2 "set current optimization profile = 'EXAMPLE.PROF1'" 104
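  One usage note (not part of the original steps): the profile only affects statements compiled after it is set, so if earlier copies of the dynamic statements are still in the package cache, flushing the cache forces them to be recompiled under the profile:

     -- force cached dynamic SQL to be recompiled so the profile is applied
     FLUSH PACKAGE CACHE DYNAMIC;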
  • 105. Optimizer Profiles Checking if the guidelines are applied: • SQL0437W reason code 13 is issued if there were problems with applying the guidelines (usually due to a syntax error) • db2exfmt output contains a section with details on any problems encountered while attempting to apply the guidelines:
  Extended Diagnostic Information:
  --------------------------------
  No extended Diagnostic Information for this statement.
  • The db2exfmt Profile Information section shows which guidelines were used:
  Profile Information:
  --------------------
  OPT_PROF: (Optimization Profile Name)
      EXAMPLE.PROF1
  STMTPROF: (Statement Profile Name)
      Guidelines for the items out query
  105
  • 106. Optimizer Profiles Exercise: • Examine the optimizer profile in exercise9_1.xml: – Global guideline forces OPTLEVEL 7 for all queries in the current session. – Statement guideline forces the nested loop join method (NLJOIN) for tables MOVIE and TRANSACTION_DETAIL for a given query. • Run exercise9_1.bat – this will create and apply an optimizer profile named PROF1. It will also collect db2exfmt output before and after using the profile. • Compare the before and after .exfmt files – how can you tell if the guidelines were applied? 106
  • 107. Q&A

Editor's notes

  1. What are we trying to Explain? We want to explain how our query goes about fulfilling our request. Why is this important? Because there are many ways of going about fulfilling the request. Some are faster or more efficient than the others (in terms of resource usage). Think about a GPS device… Its sole purpose is to get you from point A to point B in the fastest possible time. The optimizer is the same. This “way” of fulfilling the request is called an access plan. Namely, an access plan specifies a type and order of operations for accessing the data and producing a result set. The optimizer produces an access plan whenever you compile an SQL statement. We also refer to the access plan as: explain = db2exfmt = access path. What is the purpose of the Explain tools? The explain tools produce a visual representation of the access plan, and they dump certain information that the access plan is based upon. Explain output lets you view statistics for selected tables, indexes, or columns; properties for operators; global information such as table space and function statistics; and configuration parameters relevant to optimization. It lets you evaluate your query performance tuning options and perform “what-if” query performance analysis. Once you become familiar with the explain tool, it will become your first choice for evaluating a query performance problem.
  2. Sequence of operations to process the query: How did we get from point A to point B. What operations were done from the time the query was submitted to the time the result set is returned. Cost information: Everything that the access plan does with data along the way has certain cost in terms of CPU cycles, input and output performed on memory or disk, or both. This cost is expressed in timerons. Predicates and selectivity estimates for each predicate: These are the conditions that we used to limit the search scope of our query, along with estimates of how much data the condition will filter through. Statistics for all objects referenced in the SQL query at the time the explain information is captured As we shall see, statistics are critical for the optimizer decision making process. Hence they are contained in the explain output.
  3. The Explain tables capture access plans when the Explain facility is activated. This is where the access plan information goes for explain purposes, not to a file or memory.
  4. Tables are listed here for your reference.
  5. Db2expln: Provides a basic, relatively compact and English-like overview of the access plan and the related data. Unlike db2exfmt, it does not make recommendations on what to do to improve performance. Use it when you want a quick and dirty explanation of what kind of access plan was used (such as table access and join information). Db2exfmt: Provides more detailed information, including the catalog statistics (such as column distribution stats) which are crucial to the optimizer’s decision making process. We will be using db2exfmt as the primary explain tool in this workshop. Recommendation: strive to always use it over db2expln. Since you are going through the trouble of generating explain output, you might as well run db2exfmt for the sake of completeness.
  6. These are the sections of the db2exfmt output. We will briefly discuss each section.
  7. Explain instance: Contains general information about the environment, such as: DB2 version, time, schema name, etc. Database context: Lists instance and database configuration parameters that were in effect during the query compilation. These parameters are important because they influence the access plan generation. Package context: Lists the query type (static or dynamic) and optimization level. The optimizer level is important because it determines how hard the optimizer will work to produce an optimal access plan. Original statement: Actual query text. Optimized statement: Rewritten query text according to the applied optimization techniques.
  8. Access plan: This is the big one… You may want to have a full HD monitor. Graphical representation of the query execution. Displays the order of operations along with rows returned, cost and I/O for each operation. Execution starts at the bottom, and it goes up. Next, we are onto the advanced sections. Extended diagnostic information: Warnings, informational or error messages returned by the optimizer. Very important section as it may warn you about missing stats. For example, this section may tell you that your query may perform poorly because you have not collected distribution stats on a column B. Plan details: Detailed information for each operator including arguments, predicates, estimates (filter factor), and input/output streams. This section tells us what the optimizer based its calculations on. Objects used in the access plan: Any objects (tables/indexes) used by the query or the access plan are displayed in this section, which includes statistical information.
  9. Syntax diagram for db2exfmt for the participants’ reference.
  10. The most important options for our purposes: NO Disables the Explain facility. No Explain information is captured. NO is the initial value of the special register. YES Enables the Explain facility and causes Explain information to be inserted into the Explain tables for eligible dynamic SQL statements. All dynamic SQL statements are compiled and executed normally. EXPLAIN Enables the Explain facility and causes Explain information to be captured for any eligible dynamic SQL statement that is prepared. However, the statements are not executed.
  11. An operator typically takes one or more inputs, does something with them, and passes the output to the next operator.
  12. An operator typically takes one or two input streams. In this example, we see three operators: two table scans and a join. Two TBSCAN operators each perform a table scan and feed the results into the third operator – the HSJOIN. The third operator combines the two data streams using the join method called hash join, and it passes the resulting data stream into the next operator, and so on. Remember that the figures (rows returned and cost) are estimates.
  13. TBSCAN and IXSCAN will typically be placed at the bottom of the access plan graph. These are the operations that retrieve data from the sources (tables and indexes). Recall that the graphical representation of execution starts at the bottom, and it goes up.
  14. Some of the common operators are the join operators: HSJOIN, MSJOIN and NLJOIN. These operators merge two streams of data. At least one of them will be found in any typical query which queries multiple tables.
  15. SORT is another common operator. Sorting is required when no index exists that satisfies the requested ordering, or when sorting would be less expensive than an index scan. Sorting is usually performed as a final operation once the required rows are fetched, or to sort data prior to a join or a group by.
  16. Another common operator, especially in more complex queries: TEMP. The action of storing data in a temporary table, to be read back out by another operator (possibly multiple times). This operator is typically required to evaluate subqueries or to store intermediate results.
  17. We have learned what the optimizer is. Now, we are getting into details about how it works.
  18. Why is the optimizer heavily influenced by the statistical information? Because the optimizer uses this information to estimate the costs of alternative access plans for each query and choose the best one. As we shall see, there are other factors that influence the optimizer, but the statistics form the basis for access plan calculations. Going back to the GPS example… To compare the alternatives, a GPS device needs to know the length of each road, their speed limits, and current traffic conditions. In DB2 context we refer to these figures as statistics. Statistics describe characteristics of database objects (e.g., tables and indexes). Distribution statistics are used to estimate how many rows will be returned. This is important because the optimizer chooses the join type or sort method based on the distribution stats.
  19. Examples of statistical information used by the optimizer: table and index cardinality; the distribution of data in specific table columns; the cluster ratio of indexes; the number of leaf pages in indexes; the number of table rows that overflow their original pages; the number of filled and empty pages in a table; and so forth.
  20. In a nutshell, the RUNSTATS utility collects statistics about tables, indexes and statistical views, and stores them in catalog tables for use by the optimizer. We want to run RUNSTATS after a significant change to data (either volume or cardinality) because the stats get out of date. The main risk of having outdated stats is that the compiler may select a suboptimal plan for a given query. For instance, the optimizer may estimate, based on stale statistics, that a query would return 100 rows from table T1. As a result it chooses a nested loop join to join the tables. However, in reality the query returns one million rows from T1, for which a merge join may have been better suited. As a result, the query that used to run fast now grinds to a halt. Statistical view: a special type of view which combines the columns referenced by commonly used queries, and which is enabled for use in query optimization.
  21. NUM_FREQVALUES Defines the maximum number of frequency values to collect. NUM_QUANTILES Defines the maximum number of distribution quantile values to collect. You might also want to collect statistics on system catalog tables, if an application queries these tables directly and if there is significant catalog update activity, such as that resulting from the execution of data definition language (DDL) statements.
  22. Because these operations add a significant amount of new data (such as LOAD and IUD), or they change the data layout on disk (REORG, REDISTRIBUTE), or the way data is accessed (INDEX creation).
  23. Distribution statistics make the optimizer aware of data skew. Namely, how many times a given value appears in the column. If distribution stats are so good for the optimizer, why not collect them all the time for every column, and do it five times a week? Because RUNSTATS is not free. It consumes system resources and contends with queries for CPU cycles and buffer pools. It also places locks on database objects, which may reduce concurrency (although RUNSTATS can be throttled to have low priority and yield to applications when there are locking conflicts). Detailed index statistics provide more details about the I/O required to fetch data pages when the table is accessed using a particular index. However, collecting detailed index statistics consumes considerable processing time and memory for large tables. As a result, you should schedule RUNSTATS execution during maintenance windows, typically after REORG.
  24. There are also joins based on a cluster of operators (e.g., star join, ordered NLJOIN). There are joins specific to DPF (e.g., collocated joins, directed joins, broadcast joins, repartitioned joins).
  25. Database manager configuration: These options affect all databases in a given instance. May require an instance restart after setting a new parameter value. Database configuration: These options affect only the database for which they are defined. May require a database restart after setting a new parameter value. Query compiler registry variables: Affect one or more instances on the system. Typically require an instance restart. CURRENT QUERY OPTIMIZATION special register: Affects only the current session.
  26. Parallelism INTRA_PARALLEL: Tells DB2 to take advantage of symmetric multiprocessing (SMP) on machines that have two or more CPUs or CPU cores. Not everything will run in parallel. It enables some parts of query execution to run in parallel within a single database partition. For example, index sort will run in parallel if you have this parameter enabled. Index keys will be fetched, then multiple sort agents would be started, one for each CPU/core. When all the sort passes are complete, these sorted subsets are merged into a single sorted set of index keys. Also, HSJOIN runs in parallel (i.e., build table partitioned in parallel and loaded into memory). Why is parallelism important to the optimizer? Because it may favour certain operators over the others when parallelism is on (i.e., HSJOIN over NLJOIN). CPU Speed (CPUSPEED): The optimizer uses the CPU speed to estimate the cost of performing certain operations. This configuration parameter is automatically set to an appropriate value when the database is installed or upgraded. Recommendation: do not adjust this parameter unless you are modelling a production environment on a test system or assessing the impact of a hardware change. Using this parameter to model a different hardware environment allows you to find out the access plans that might be chosen for that environment. To have the database manager recompute the value of this automatic configuration parameter, set it to -1. Communications speed (COMM_BANDWIDTH): The value calculated for the communications bandwidth, in megabytes per second, is used by the optimizer to estimate the cost of performing certain operations between the database partition servers of a partitioned database system. The optimizer does not model the cost of communications between a client and a server, so this parameter should reflect only the nominal bandwidth between the database partition servers, if any. Recommendation: You should only adjust this parameter if you wish to model a production environment on your test system or to assess the impact of upgrading hardware.
27. Buffer pools (BUFFPAGE): represents the total size of the buffer pools you specified when you created or altered them. Why is this parameter important? Because when the optimizer chooses an access plan, it considers the I/O cost of fetching pages from disk to the buffer pool and estimates the number of I/Os required to satisfy the query. The estimate includes a prediction of buffer-pool usage, because no additional physical I/O is required to read rows in a page that is already in the buffer pool. The I/O cost of reading the tables can have an impact on how two tables are joined and on whether an unclustered index will be used to read the data. What this tells us is that bigger is generally better when it comes to buffer pool size. However, there is a point of diminishing returns, so be careful not to grow your buffer pools to, say, 75% of your system memory, because that may starve other processes.
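A sketch of resizing a buffer pool (the pool name is the default one and the page count is purely illustrative; note that BUFFPAGE itself only takes effect for buffer pools created or altered with SIZE -1):
db2 "ALTER BUFFERPOOL IBMDEFAULTBP IMMEDIATE SIZE 100000"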
28. Sort heap size (SORTHEAP): defines the maximum number of memory pages to be used for sort operations. Why is this important? Because it helps the optimizer determine whether a sort will be performed in memory or be "spilled" to disk (i.e., to a system temporary table) after running out of sort heap space. A sort performed in memory is always faster than one that spills to a system temporary table, so it is used whenever possible. When choosing an access plan, the optimizer estimates the cost of sort operations by estimating the amount of data to be sorted and checking the sortheap parameter to determine whether there is enough space for the sort to be performed in memory. Accurate costing of the sort operation helps the optimizer decide between a sort and an index scan in certain cases.
Database heap size (DBHEAP): there is one database heap per database, and the database manager uses it on behalf of all applications connected to the database. It contains control block information for tables, indexes, table spaces, and buffer pools. It matters to the optimizer because it helps estimate the amount of memory available to certain operations.
Lock list size (LOCKLIST): indicates the amount of storage allocated to the lock list. Maximum lock list (MAXLOCKS): the maximum percentage of the lock list that can be held by any one application before lock escalation is performed. Lock escalation, by the way, is the process of replacing a large number of row-level locks with a single table lock; it is done to reduce the amount of memory used to keep track of locks. Why are these parameters important? Because, when the isolation level is repeatable read (RR), the optimizer considers the values of the locklist and maxlocks parameters to determine whether row locks are likely to be escalated to a table lock. If the optimizer estimates that lock escalation will occur for a table access, it chooses a table lock for the access plan, to avoid incurring the overhead of lock escalation during query execution. Downside: while we avoid the overhead of escalation, the table access locks the entire table, increasing contention and possibly reducing concurrency. Recommendation: do some benchmarking to determine the minimum value for each parameter that completely eliminates lock escalation.
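A sketch of setting these database-level parameters (the values are illustrative only; benchmark before committing to any of them):
db2 UPDATE DB CFG FOR DVDSR USING SORTHEAP 4096
db2 UPDATE DB CFG FOR DVDSR USING LOCKLIST 8192 MAXLOCKS 60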
29. Average applications (AVG_APPLS): the optimizer uses the avg_appls parameter to help estimate how much of the buffer pool might be available at run time for the chosen access plan. Higher values for this parameter can influence the optimizer to choose access plans that are more conservative in buffer pool usage. If you specify a value of 1, the optimizer assumes the entire buffer pool will be available to the application.
Optimization level (DFT_QUERYOPT): tells the optimizer which optimization class to use by default when compiling queries (the default is 5). The higher the class, the more complex the algorithms considered, which means the optimizer tries more access plan combinations before choosing the best one. In general, a higher class results in longer compile time.
Query degree (DFT_DEGREE): specifies the default degree of parallelism to be used within a single database partition. A value of one (1) means no intra-partition parallelism. A value of minus one (-1) or ANY means the optimizer determines the degree of intra-partition parallelism based on the number of processors and the type of query. Note: intra-parallel processing does not occur unless you enable it by setting the intra_parallel database manager configuration parameter.
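Before changing any of these, it can help to inspect the current and deferred values (a sketch; SHOW DETAIL requires an active connection, and the output can be filtered to the parameters of interest):
db2 CONNECT TO DVDSR
db2 GET DB CFG FOR DVDSR SHOW DETAIL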
30. Number of frequent values retained (NUM_FREQVALUES): specifies how many "most frequent values" are collected when the WITH DISTRIBUTION option is specified on the RUNSTATS command. The most-frequent-value statistics help the optimizer understand the distribution of data values within a column. A higher value makes more information available to the query optimizer, but requires additional catalog space and incurs higher heap space usage during RUNSTATS. When 0 is specified, no frequent-value statistics are retained, even if you request that distribution statistics be collected.
Number of quantiles retained (NUM_QUANTILES): controls how many quantiles are collected when the WITH DISTRIBUTION option is specified on the RUNSTATS command. Each quantile describes a slice of the column's data distribution; with N quantiles, each slice covers roughly 1/N of the rows (for example, 20% each when five quantiles are kept). Quantile statistics help the optimizer understand the distribution of data values within a column. A higher value makes more information available to the query optimizer, but requires additional catalog space. When 0 or 1 is specified, no quantile statistics are retained, even if you request that distribution statistics be collected.
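Both limits can also be overridden per column at RUNSTATS time rather than database-wide (a sketch; the GENRE column is a hypothetical column of the MOVIE table, and the counts are illustrative):
db2 "RUNSTATS ON TABLE EXAMPLE.MOVIE WITH DISTRIBUTION ON COLUMNS (GENRE NUM_FREQVALUES 20 NUM_QUANTILES 40)"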
31. DB2_EXTENDED_OPTIMIZATION: specifies whether the query optimizer uses optimization extensions to improve query performance. It affects one or more instances on the system, along with all the databases within them. Use it if you wish to try specific optimization algorithms on your query (refer to the documentation for a description of the algorithms). These optimization extensions may increase compilation time and may not improve query performance in all situations; test to determine the improvement for individual queries.
DB2_SORT_AFTER_TQ: specifies how the optimizer works with directed table queues in a partitioned database environment when the receiving end requires the data to be sorted and the number of receiving nodes equals the number of sending nodes. When DB2_SORT_AFTER_TQ=NO, the optimizer tends to sort at the sending end and merge the rows at the receiving end. When DB2_SORT_AFTER_TQ=YES, the optimizer tends to transmit the rows unsorted, not merge them at the receiving end, and sort them at the receiving end only after all the rows have been received.
DB2_INLIST_TO_NLJN: sometimes the optimizer does not have accurate information to determine the best join method for the rewritten version of a query. This can occur when an IN list contains parameter markers or host variables, which prevent the optimizer from using catalog statistics to determine selectivity. This registry variable causes the optimizer to favour nested loop joins for joining the list of values, using the table that contributes the IN list as the inner table of the join.
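These variables are set with db2set and generally need an instance restart to take effect (the values shown are illustrative, not recommendations):
db2set DB2_INLIST_TO_NLJN=YES
db2set DB2_SORT_AFTER_TQ=YES
db2stop
db2start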
32. Use this statement when you want to test or apply a particular query optimization class to one or more queries in the current session, without affecting other database users. The default optimization class is still used for all other sessions.
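For example, to raise the optimization class for the current session only (the class value is illustrative):
db2 "SET CURRENT QUERY OPTIMIZATION = 7"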
33. What is an optimization profile? Optimization profiles allow specific optimization guidelines to be given to the optimizer without application or database configuration changes. They are an advanced method of query performance tuning.
Why would we want to override the optimizer? After all, isn't it smarter than we are, calculating hundreds of combinations in a split second and figuring out which one forms the best access plan? Yes and no. Going back to our GPS example: imagine wanting to get from one end of Toronto to the other. There are countless possibilities; the shortest route is not always the fastest, and what is normally the fastest route may not be available that day due to a road closure. In that case, you would override the GPS device and tell it to take a particular detour. The same logic applies to the optimizer: it may have the computational ability, but unlike a human brain it cannot learn and make heuristic decisions on the fly.
When you would want to use optimization profiles: you know your statistics are out of date and cannot afford to collect them now, but you want to tell the optimizer to use a particular index even though the stats suggest otherwise; you want to test various join combinations or table access methods (e.g., MQT, index) for your query; you want to apply different optimization techniques to a handful of queries without affecting all the other queries; or you have exhausted all other tuning options.
34. XML document. Why XML? XML is handy for describing data, so it was a logical choice for implementing optimizer profiles. The document must conform to a certain XML schema; otherwise the optimizer returns a warning at compile time and ignores the profile.
Profile header: typically specifies the schema version. It does not change much between releases, but you should always use the version that corresponds to your DB2 level (e.g., V9.7.0.0).
Global optimization guidelines: this section contains optimization guidelines that are used for all statements for which the profile is currently in effect.
Statement optimization guidelines: this section contains optimization guidelines that are used for a particular SQL statement only, e.g.: "For app1.0, only consider routing to MQTs Newt.AvgSales and Newt.SumSales"; "Use index ISUPPKEY to access SUPPLIERS in the subquery of query 9".
35. Optimization guidelines do not need to be comprehensive, but they should be targeted at a desired set of queries and a particular execution plan. The DB2 optimizer still considers other possible access plans using the existing cost-based methods. In this example: in the global guidelines section we are telling the optimizer to consider only the MQTs Test.AvgSales and Test.SumSales when routing queries to MQTs, for all statements for which the profile is currently in effect; in the statement guidelines section we are telling the optimizer to access table TAB1 through the I_SUPPKEY index, but only for the query specified in the STMTKEY tag. A minimal sketch of such a profile follows.
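A minimal sketch modeled on the example just described (the statement text inside STMTKEY is a placeholder to be replaced with the exact query, and the statement profile ID is arbitrary):
<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="9.7.0.0">
  <!-- Global guidelines: apply to all statements while the profile is in effect -->
  <OPTGUIDELINES>
    <MQT NAME="Test.AvgSales"/>
    <MQT NAME="Test.SumSales"/>
  </OPTGUIDELINES>
  <!-- Statement guidelines: apply only to the statement matching STMTKEY -->
  <STMTPROFILE ID="Guideline for the TAB1 query">
    <STMTKEY SCHEMA="EXAMPLE"><![CDATA[SELECT ...]]></STMTKEY>
    <OPTGUIDELINES>
      <IXSCAN TABLE="TAB1" INDEX="I_SUPPKEY"/>
    </OPTGUIDELINES>
  </STMTPROFILE>
</OPTPROFILE>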
36. More details on the statement section, as this is where you will place most of your guidelines. Each statement profile contains a statement key (STMTKEY) and one or more statement-level optimization guidelines (OPTGUIDELINES). The original statement and the statement in the profile must match exactly (the comparison is case-sensitive); otherwise, the optimizer ignores the profile. You can use the EXPLAIN statement to verify whether the optimizer applied the guideline (more on this later).
37. These are some examples of guidelines. There are no specific recommendations as to when you should use each of them; it depends on the particular case and often comes down to a trial-and-error approach. Refer to the DB2 documentation for a more detailed description of query rewrite guidelines: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0024581.html
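To give a flavour of what individual guidelines look like (a sketch only; T1 is a hypothetical table reference):
<OPTGUIDELINES>
  <!-- Query rewrite guideline: transform the IN list predicate on T1 into a join -->
  <INLIST2JOIN TABLE='T1'/>
  <!-- General guideline: re-optimize once, using the first set of host variable values -->
  <REOPT VALUE='ONCE'/>
</OPTGUIDELINES>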
38. First, you compose a simple XML document. Then, you insert it into the database. Finally, you specify the name of the optimization profile when you want to use it. Limitation: the original statement and the statement in the profile must be exactly the same (case-sensitive). Review the file prof1.xml.
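A sketch of the mechanics (the schema and profile names are illustrative; prof1.xml is the profile document):
db2 "CALL SYSINSTALLOBJECTS('OPT_PROFILES', 'C', CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)))"   (creates the SYSTOOLS.OPT_PROFILE table if it does not yet exist)
db2 "IMPORT FROM profiledata OF DEL MODIFIED BY LOBSINFILE INSERT_UPDATE INTO SYSTOOLS.OPT_PROFILE"   (here profiledata is a DEL file containing the single line: "EXAMPLE","PROF1","prof1.xml")
db2 "SET CURRENT OPTIMIZATION PROFILE = 'EXAMPLE.PROF1'"
db2 "FLUSH OPTIMIZATION PROFILE CACHE"   (needed if you later modify a profile that is already in use)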
39. The optimizer ignores invalid or inapplicable optimization guidelines. If any optimization guidelines are ignored, an execution plan is still created and SQL0437W with reason code 13 is returned. You can then use the EXPLAIN statement to get detailed diagnostic information about optimization guideline processing. The Profile Information section contains the names of the profile and guidelines in effect, as well as any warnings (e.g., SQL0437W).
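A sketch of the verification flow (the database and output file names are illustrative; in EXPLAIN mode the statement is compiled and captured, not executed):
db2 SET CURRENT EXPLAIN MODE EXPLAIN
db2 "<your query here>"
db2 SET CURRENT EXPLAIN MODE NO
db2exfmt -d DVDSR -1 -o query1.exfmt
Then check the Profile Information section of query1.exfmt for the profile and guideline names and any SQL0437W warnings.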