1. Update Statistics
Olivier Bourdin
olivier.bourdin@fr.ibm.com
Wednesday 3 October 2012
User Group Informix France
2. Overview
Brief Review and History
What’s changed?
– 11.10, 11.50
– 11.70 – “Smart Statistics”
11.70 FAQs
– Do I need to do anything different?
– Did the update statistics update any stats?
– Update statistics and reoptimization
3. Why are statistics important?
Choosing the right QUERY PATH determines how fast you get
your results.
Choosing the Wrong Path can be like going around the world to
get to your neighbor’s.
• Expensive to go around the world.
• Takes too long.
4. Query Optimization Process
Examine all tables (table A, table B, table C)
– Examine selectivity of every filter (where clauses)
– Determine if indexes can be used for filters, order by, group by
– Find the best way to scan a table -- sequentially or by an index
Identify Join Pairs (AB, AC, BA, BC, CA, CB)
– Find best join method (nested loop, hash, or sort merge)
– Decide which indexes are best for the join
– Calculate the cost of the join
Repeat for each additional table (ABC, ACB, BAC, ...)
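The enumeration steps above can be sketched in Python. This is a minimal illustration, assuming the optimizer lists every ordered join pair and every complete join order; `join_pairs` and `join_orders` are hypothetical helper names, and a real optimizer uses cost estimates to drop redundant or expensive candidates instead of evaluating all of them.

```python
from itertools import permutations


def join_pairs(tables):
    """Ordered join pairs considered first (AB, AC, BA, ...)."""
    return ["".join(p) for p in permutations(tables, 2)]


def join_orders(tables):
    """Every complete join order (ABC, ACB, BAC, ...): n! orders for n tables."""
    return ["".join(p) for p in permutations(tables)]


tables = ["A", "B", "C"]
print(join_pairs(tables))   # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(join_orders(tables))  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
```

The factorial growth (24 orders for 4 tables, 120 for 5) is one reason the optimizer depends on good cost estimates to prune early.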
5. Estimating costs: the optimizer needs data
Find the cheapest/lowest cost path.
– Cost = I/O cost + Weight * (CPU cost)
– I/O -- disk access
– CPU -- Rows processed
Estimate costs
– Filters -- Which indexes to use?
– Joins -- Nested Loop, Hash, or Sort Merge?
– Eliminate redundant pairs?
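The cost formula can be sketched as a comparison between candidate plans. This is a hedged illustration: the `Plan` type, the plan names, the cost numbers, and the weight value are all hypothetical, since the slide does not give the engine's actual weight; only the shape Cost = I/O cost + Weight * (CPU cost) comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    io_cost: float   # estimated disk accesses
    cpu_cost: float  # estimated rows processed


def plan_cost(plan: Plan, weight: float) -> float:
    # Cost = I/O cost + Weight * (CPU cost)
    return plan.io_cost + weight * plan.cpu_cost


def cheapest(plans: list[Plan], weight: float) -> Plan:
    # The optimizer keeps the lowest-cost path.
    return min(plans, key=lambda p: plan_cost(p, weight))


# Hypothetical numbers and weight, for illustration only.
plans = [
    Plan("sequential scan", io_cost=2000, cpu_cost=28000),
    Plan("index path", io_cost=800, cpu_cost=1000),
]
best = cheapest(plans, weight=0.25)
print(best.name, plan_cost(best, 0.25))  # index path 1050.0
```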
6. Filter selectivity
Selectivity is the fraction of rows selected by a filter (a number between 0 and 1).
Expression                      Filter selectivity (F)
indexed_col = literal value     F = 1 / (number of distinct keys in index)
indexed_col > literal value     F = (2nd max - literal value) / (2nd max - 2nd min)
NOT expression                  F = 1 - F(expression)
expr1 AND expr2                 F = F(expr1) x F(expr2)
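The selectivity rules can be written as small Python functions. This is a sketch under stated assumptions: the function names are hypothetical, the ">" case computes the fraction of the key range above the literal, (2nd max - literal) / (2nd max - 2nd min), and clamping it to [0, 1] is an assumption the slide does not state; the AND rule multiplies selectivities, which treats the two filters as independent.

```python
def eq_selectivity(distinct_keys: int) -> float:
    """indexed_col = literal: F = 1 / (number of distinct keys in index)."""
    return 1.0 / distinct_keys


def gt_selectivity(literal: float, second_min: float, second_max: float) -> float:
    """indexed_col > literal: share of the [2nd min, 2nd max] range above the literal.

    Assumes second_max > second_min; clamping to [0, 1] is an assumption of this sketch.
    """
    f = (second_max - literal) / (second_max - second_min)
    return min(1.0, max(0.0, f))


def not_selectivity(f: float) -> float:
    """NOT expression: F = 1 - F(expression)."""
    return 1.0 - f


def and_selectivity(f1: float, f2: float) -> float:
    """expr1 AND expr2: F = F(expr1) x F(expr2), i.e. filters assumed independent."""
    return f1 * f2


# Hypothetical column: 40 distinct keys, key range 0..100, filter "col > 75".
print(and_selectivity(eq_selectivity(40), gt_selectivity(75, 0, 100)))  # 0.00625
```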
7. How do we influence query optimization?
OPTCOMPIND
Optimizer directives, Optimization Goals
Update Statistics
– Collect information for the optimizer
– Table nrows, npused; Index Statistics -- LOW
– Data Distributions -- MEDIUM & HIGH
– Compile Stored Procedures
8. Where are the statistics stored?
systables (Low)
– nrows, npused
sysindices (Low)
– leaves, levels, nunique, clust
syscolumns (Low)
– colmin, colmax
sysfragments (Low)
– nrows, npused
– For index partitions: levels, clust
sysdistrib (Medium or High)
– Can be viewed with dbschema -hd
9. View Query Path
SET EXPLAIN ON
– Can be set for the session
EXPLAIN directive
– Can be embedded in the query, for example in an SPL FOREACH loop:
    FOREACH SELECT {+EXPLAIN} order_num INTO p_num
        FROM orders
        WHERE customer_num = 104
        ORDER BY order_num
        -- process p_num here
    END FOREACH;
xtrace Debug
– Support may ask you to turn this on
10. Debugging with xtrace
To “see” the statistics information being used for query
optimization
Example:
xtrace heavy -c XTF_OPTMZR -f XTF_DEBUG
xtrace size 10000
xtrace on
Use “xtrace view” or “xtrace fview” to view traces; “xtrace fview” includes timestamps.
Use “xtrace info” to display current xtrace settings.
Use “xtrace --” for xtrace usage info.
11. Xtrace: example
Before update statistics:
f1 31310 16 get_distrib(): distrib not found for table c col zipcode
f1 7401 16 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0434783 …
…
f2 1207 16 oprowspages(tab = c, nrows = 28, npages = 2)
f2 13217 16 opmix_iscancost(numrows=1.21739,npages=2,pagesread=1.13988)
f2 13225 16 opmix_iscancost(scancost=1.1764,indexcost=1.08, …, iscancost=2.2564)
After update statistics:
f1 31310 18 get_distrib(): distrib found for table c col zipcode
f1 7401 18 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0357143 …
…
f2 1207 18 oprowspages(tab = c, nrows = 28672, npages = 2048)
…
f2 2237 18 dpages = 24576 lpages = 84 nlevels = 2
f2 1871 18 dcost = 33.72 seek 0 keyonly = TRUE
f2 1896 18 iscancost(c, zip_ix) cost = 35.72
f2 13217 18 opmix_iscancost(numrows=1024,npages=2048,pagesread=805.977)
f2 13225 18 opmix_iscancost(scancost=836.697,indexcost=35.72, …, iscancost=872.417)
13. sqexplain.out (before)
select c.city, c.state, o.ship_date from customer c, orders o
where c.customer_num = o.customer_num and c.state = ? and
c.zipcode = ?
Estimated Cost: 3
Estimated # of Rows Returned: 1
1) informix.c: INDEX PATH
Filters: informix.c.state = 'AZ'
(1) Index Name: informix.zip_ix
Index Keys: zipcode (Serial, fragments: ALL)
Lower Index Filter: informix.c.zipcode = '85016'
2) informix.o: INDEX PATH
(1) Index Name: informix. 102_4
Index Keys: customer_num (Serial, fragments: ALL)
Lower Index Filter: informix.c.customer_num =
informix.o.customer_num
NESTED LOOP JOIN
14. sqexplain.out (after)
select c.city, c.state, o.ship_date from customer c, orders o
where c.customer_num = o.customer_num and c.state = ? and
c.zipcode = ?
Estimated Cost: 19
Estimated # of Rows Returned: 1
Note: customer has 28672 rows; orders has 23 rows.
1) informix.o: SEQUENTIAL SCAN
2) informix.c: INDEX PATH
Filters: (informix.c.zipcode = '85016' AND
informix.c.state = 'AZ' )
(1) Index Name: informix. 100_1
Index Keys: customer_num (Serial, fragments: ALL)
Lower Index Filter: informix.c.customer_num =
informix.o.customer_num
NESTED LOOP JOIN
15. Before 11.x
Before 11.x
– Update statistics low,
– Update statistics medium, high
• Resolution, Confidence
Scripts
– Update statistics distributions only
Cron jobs
– Update statistics drop distributions
– Update statistics for table, for procedure
– Lots of guidelines
• What to run update statistics on
• Which update statistics to run
• How to run update statistics
16. Guidelines
Update statistics medium distributions only for all columns
that do not have an index
Update statistics high for columns that are the first key in an
index
Update statistics low for all columns in multicolumn indexes
Run with PDQ for better performance (for table ONLY)
Do not run with PDQ for update statistics for procedure
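The guidelines above can be turned into a small command generator. A sketch under assumptions: `Index`, `guideline_commands`, and the index names cust_ix and name_ix are hypothetical, the column and index lists are supplied by hand rather than read from the catalog, and one HIGH command is emitted per leading column. It is not the IBM technote script or Art Kagel's dostats.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    columns: list[str]  # key columns, in index order


def guideline_commands(table: str, columns: list[str], indexes: list[Index]) -> list[str]:
    """Hypothetical helper: UPDATE STATISTICS commands following the pre-11.x guidelines."""
    indexed = {c for idx in indexes for c in idx.columns}
    # Leading key of each index, in order, without duplicates.
    lead_keys = list(dict.fromkeys(idx.columns[0] for idx in indexes))
    # Every column that appears in a multicolumn index.
    multi_cols = list(dict.fromkeys(
        c for idx in indexes if len(idx.columns) > 1 for c in idx.columns))
    non_indexed = [c for c in columns if c not in indexed]

    cmds = []
    # MEDIUM runs before HIGH: running HIGH first and then MEDIUM could overwrite statistics.
    if non_indexed:
        cmds.append(f"UPDATE STATISTICS MEDIUM FOR TABLE {table} "
                    f"({', '.join(non_indexed)}) DISTRIBUTIONS ONLY;")
    for col in lead_keys:
        cmds.append(f"UPDATE STATISTICS HIGH FOR TABLE {table} ({col});")
    if multi_cols:
        cmds.append(f"UPDATE STATISTICS LOW FOR TABLE {table} ({', '.join(multi_cols)});")
    return cmds


cmds = guideline_commands(
    "customer",
    ["customer_num", "fname", "lname", "zipcode", "state"],
    [Index("cust_ix", ["customer_num"]),      # hypothetical index
     Index("zip_ix", ["zipcode"]),
     Index("name_ix", ["lname", "fname"])],   # hypothetical index
)
for cmd in cmds:
    print(cmd)
```

Per the guidelines, these commands would run with PDQ for tables, while UPDATE STATISTICS FOR PROCEDURE would not.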
17. Issues (before 11.x)
Difficult to know when update statistics was run last
Guidelines weren’t always well-understood
People weren’t sure how to run update statistics
– Accidentally overwrote statistics by running HIGH first, then MEDIUM
– Accidentally compiled stored procedures with PDQ
– Ran Update Stats LOW twice (performance issue)
Update statistics LOW for table tab1;
Update statistics HIGH for table tab1 (col1, col2);
– What might be considered “missing” here?
18. 11.10 Features
11.10 Enhancements
– Create index creates initial stats and distribution
information for the leading column of the index
– Enhance catalog information
• What time was update statistics Low run?
• What time were the distributions created?
• How many rows were sampled for the distributions?
– New “Sampling Size” option
– Update statistics drop distributions ONLY
– Auto Update Statistics Scheduler tasks
19. Help with Guidelines
Use scheduler task “Auto Update Statistics
Evaluation”
– Scheduler task can be run “on-demand” using exectask()
Execute function exectask('Auto Update Statistics Evaluation');
Use script in Informix Technote (swg21137764)
– UPDATE STATISTICS commands to allow the optimizer
to work its best
http://www-01.ibm.com/support/docview.wss?uid=swg21137764
Use Art Kagel’s dostats (from IIUG)
20. AUS History
First introduced in 11.10
– Scheduler task “Auto Update Statistics Evaluation”
– Scheduler task “Auto Update Statistics Refresh”
– Uses the guidelines to determine the update statistics
commands to run
Enhancement to work with non-English Locales in
11.50.xC6
21. AUS Scheduler Tasks
Runs Update Statistics FOR TABLE commands
UPDATE STATISTICS LOW FOR TABLE stores7:customer
UPDATE STATISTICS HIGH FOR TABLE stores7:customer (
customer_num, zipcode ) RESOLUTION 0.500 DISTRIBUTIONS ONLY
Runs with PDQ set to AUS_PDQ in sysadmin:ph_threshold
> select * from ph_threshold where name = "AUS_PDQ";
id 30
name AUS_PDQ
task_name Auto Update Statistics Refresh
value 10
value_type NUMERIC
description Update statistics executes with this PDQ priority.
22. AUS Parameters
AUS_AGE (aus_evaluator)
The statistics are rebuilt after the specified number of days.
AUS_CHANGE (aus_evaluator)
The statistics are rebuilt after the specified percentage of data has changed.
AUS_AUTO_RULES (aus_evaluator)
1 or 0; if 0 (off), only tables that already have statistics are evaluated.
AUS_SMALL_TABLES (aus_evaluator)
Tables containing fewer than this number of rows always have their statistics rebuilt.
AUS_PDQ (aus_refresh_stats)
Update statistics runs with this PDQ priority.
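How the evaluator parameters interact can be sketched as a single decision function. This is a simplified reading of the parameter descriptions, not the actual sysadmin task logic: `AusSettings`, `TableState`, and `needs_refresh` are hypothetical, the ">=" comparisons for AUS_AGE and AUS_CHANGE are assumptions, and the sample settings are illustrative rather than the shipped defaults.

```python
from dataclasses import dataclass


@dataclass
class AusSettings:
    aus_age: int           # days before statistics are considered stale
    aus_change: float      # percent of data changed that triggers a rebuild
    aus_auto_rules: bool   # AUS_AUTO_RULES 1 (True) or 0 (False)
    aus_small_tables: int  # tables with fewer rows are always rebuilt


@dataclass
class TableState:
    has_stats: bool
    age_days: int
    change_pct: float
    nrows: int


def needs_refresh(t: TableState, s: AusSettings) -> bool:
    """Hypothetical, simplified evaluator decision."""
    if not t.has_stats:
        # With AUS_AUTO_RULES off, only tables that already have statistics are evaluated.
        return s.aus_auto_rules
    if t.nrows < s.aus_small_tables:
        return True  # small tables always get their statistics rebuilt
    # Assumed ">=" comparisons for age and change thresholds.
    return t.age_days >= s.aus_age or t.change_pct >= s.aus_change


settings = AusSettings(aus_age=30, aus_change=10, aus_auto_rules=False, aus_small_tables=100)
print(needs_refresh(TableState(True, 1, 12.5, 5000), settings))  # True
```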
23. 11.70 Features
Smart Statistics
– Default: AUTO_STAT_MODE 1
– Default: STATCHANGE 10
– When an update statistics command runs, index statistics and column distributions are not rebuilt if the STATCHANGE threshold has not been met
Fragment-level Statistics
– Not on by default
– Not discussed in this presentation
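What one UPDATE STATISTICS LOW FOR TABLE run refreshes under smart statistics can be sketched as follows, following the customer example later in the deck (systables always refreshed; each index refreshed only when its own change crosses the threshold). Assumptions: `refresh_plan` is hypothetical, the comparison is a strict "greater than" because STATCHANGE 0 is described as running only on a non-zero change, and FORCE bypasses the check.

```python
def refresh_plan(index_change_pct: dict[str, float], statchange: float,
                 force: bool = False) -> list[str]:
    """Hypothetical sketch of which statistics an update statistics low run refreshes.

    systables (nrows, npused, ustlowts) is always refreshed; an index is refreshed
    only if FORCE is given or its % change is strictly greater than STATCHANGE.
    """
    refreshed = ["systables"]
    for idx, change in index_change_pct.items():
        if force or change > statchange:
            refreshed.append(idx)
    return refreshed


# Percent changes loosely modeled on the customer example (zip_ix changed, state_ix not).
print(refresh_plan({"zip_ix": 713796.43, "state_ix": 0.0}, statchange=10))
# ['systables', 'zip_ix']
```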
24. 11.70: Were statistics updated?
Update Statistics info in database catalog
tables
– Look at ustlowts in systables
• Updated when systables' nrows and npused are updated
– This happens whenever an update statistics command is run
– The STATCHANGE threshold is not checked
– Look at ustlowts in sysindices
• Updated when index statistics are rebuilt/updated
– Look at constr_time in sysdistrib
• Updated when distribution statistics are rebuilt/updated
25. Example
$ dbaccessdemo7 stores7 -nots
> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts, ustlowts
> from sysindices where tabid = 100 and idxname = "zip_ix";
idxname zip_ix
levels 1
leaves 1.000000000000
nrows 28.00000000000
nupdates 0.00
ndeletes 0.00
ninserts 28.00000000000
ustlowts 2012-04-03 22:54:56.00000
– zip_ix is the index on customer(zipcode).
– nupdates, ndeletes, and ninserts are the UDI counters for this index at the time update statistics low was run.
> select * from sysdistrib where tabid = 100;
No rows found.
– dbaccessdemo7 did not create table distributions for the customer table.
26. Example (cont’d)
> load from customer.unl insert into customer;
199863 row(s) loaded.
> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts,
> ustlowts from sysindices where tabid = 100 and idxname = "zip_ix";
idxname zip_ix
levels 1
leaves 1.000000000000
nrows 28.00000000000
nupdates 0.00
ndeletes 0.00
ninserts 28.00000000000
ustlowts 2012-04-03 22:54:56.00000
– Index statistics for zip_ix are unchanged after 199,863 rows were inserted into the customer table.
– No update statistics command has been run.
27. Example (cont’d)
> create index state_ix on customer(state);
idxname      zip_ix                       state_ix
levels       1                            3
leaves       1.000000000000               556.0000000000
nrows        28.00000000000
nupdates     0.00                         0.00
ndeletes     0.00                         0.00
ninserts     28.00000000000               0.00
ustlowts     2012-04-03 22:54:56.00000    2012-04-03 23:04:33.00000
After inserting 199,863 rows into the customer table, create index
state_ix on customer(state).
-- No update statistics command has been run.
28. Example (cont’d)
> select tabid, colno, mode, smplsize, rowssmpld, constr_time,
> ustnrows, ustbuildduration, nupdates, ndeletes, ninserts
> from sysdistrib where tabid = 100;
tabid 100
colno 8 (column state)
mode H
smplsize 199891.0000000
rowssmpld 199891.0000000
constr_time 2012-04-03 23:04:33.00000
ustnrows 199891.0000000
ustbuildduration 0:00:00.00000
nupdates 0.00
ndeletes 0.00
ninserts 199891.0000000
– Distribution information for column state in the customer table.
29. Example (cont’d)
> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr
> where partnum in (select partn from sysfragments
> where fragtype = "I" and indexname in ('state_ix', 'zip_ix'));
partnum   nupdates  ndeletes  ninserts
1049092   0         0         199891     (zip_ix)
1049100   0         0         0          (state_ix)
> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr
> where partnum = (select partnum from systables where tabid = 100);
partnum   nupdates  ndeletes  ninserts
1049069   0         0         199891     (customer)
These are the actual partition page values: the UDI counters for each partition since it was created. They are not the same as the UDI values in the catalog tables, which are updated only when statistics are updated.
30. OAT view of Statistics
31. OAT view (cont’d)
For customer table --
• Index zip_ix has exceeded STATCHANGE.
• Index state_ix has not.
32. Example (cont’d)
> update statistics low for table customer;
             BEFORE                       AFTER
idxname      zip_ix                       zip_ix
levels       1                            3
leaves       1.000000000000               505.0000000000
nrows        28.00000000000               199891.0000000
nupdates     0.00                         0.00
ndeletes     0.00                         0.00
ninserts     28.00000000000               199891.0000000
ustlowts     2012-04-03 22:54:56.00000    2012-04-04 00:36:53.00000
zip_ix index:
• Index statistics updated.
• Catalog UDI values updated.
• sysindices ustlowts updated.
33. Example (cont’d)
> update statistics low for table customer;
             BEFORE                       AFTER
idxname      state_ix                     state_ix
levels       3                            3
leaves       556.0000000000               556.0000000000
nrows                                     199891.0000000
nupdates     0.00                         0.00
ndeletes     0.00                         0.00
ninserts     0.00                         0.00
ustlowts     2012-04-03 23:04:33.00000    2012-04-03 23:04:33.00000
state_ix index:
• Index statistics unchanged.
• Catalog UDI values unchanged.
• sysindices ustlowts unchanged.
34. Example (cont’d)
> select tabname, tabid, nrows, created, ustlowts
> from systables where tabid = 100;
tabname customer
tabid 100
nrows 199891.0000000
created 04/03/2012
ustlowts 2012-04-04 00:36:53.00000
The systables information is always updated when update statistics for table is run, regardless of STATCHANGE.
35. Example
Update Statistics LOW for table tab1;
Update Statistics HIGH for table tab1 (col1, col2);
Before 11.70
– You should put “Distributions Only” in the Update
Statistics HIGH command to avoid collecting index
statistics again
After 11.70
– Doesn’t matter since index statistics will only be
updated if STATCHANGE has been met for the
index
36. Sysmaster query for %change
SELECT colname as name, 'Column' as type, constr_time::datetime year to second as build_date,
rowssmpld::bigint as sample, d.ustnrows::bigint as nrows,
case when d.mode = 'M' then 'Medium' when d.mode = 'H' then 'High' end as mode,
resolution, confidence, ustbuildduration as build_duration,
(table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) as udi_counter,
CASE WHEN d.ustnrows=0 and
(table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) = 0 THEN 0.00
WHEN d.ustnrows=0 and
(table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) != 0 THEN -1
ELSE ROUND((table_counter.udi_counter - d.ninserts - d.nupdates -
d.ndeletes)/d.ustnrows * 100,2)
END as change
FROM sysdistrib d, syscolumns c,
( select SUM(nupdates + ndeletes + ninserts) as udi_counter from sysmaster:sysptnhdr
where partnum in (select partn from sysfragments where tabid = 100 and fragtype='T'
union select partnum as partn from systables where tabid = 100) ) as table_counter
WHERE d.tabid=100 and c.tabid=100 and d.colno = c.colno and d.seqno = 1
UNION
37. Sysmaster query for %change
-- Continuing query started on previous slide
SELECT idxname as name, MIN('Index') as type, MIN(ustlowts)::datetime year to second as
build_date, MIN(0) as sample, SUM(f.nrows)::bigint as nrows, MIN('Low') as mode,
MIN(0) as resolution, MIN(0) as confidence, SUM(i.ustbuildduration) as build_duration,
SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0)) -
SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)) as udi_counter,
CASE WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0)
+ NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) = 0
THEN 0.00
WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0)
+ NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) != 0
THEN -1
ELSE ROUND((SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0))
- SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)))/SUM(f.nrows) * 100,2)
END as change
FROM sysindices i, sysmaster:sysptnhdr p, sysfragments f
WHERE i.idxname = f.indexname
AND i.tabid = 100 AND i.tabid = f.tabid AND f.partn = p.partnum
GROUP BY i.idxname ORDER BY change DESC
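The CASE logic of this query can be mirrored in Python to make the arithmetic explicit. A minimal sketch: `change_pct` is a hypothetical name, and its three inputs are assumed to be fetched already (the partition UDI total from sysmaster:sysptnhdr, the catalog UDI total and row count recorded when statistics were built). The sample values are the zip_ix numbers from the customer example: 199,891 inserts in the partition, 28 recorded in the catalog, 28 rows.

```python
def change_pct(partition_udi: int, catalog_udi: int, stats_nrows: float) -> float:
    """Percent change since statistics were built, following the query's CASE expression.

    partition_udi: inserts + updates + deletes counted in the partition since creation
    catalog_udi:   ninserts + nupdates + ndeletes recorded in the catalog at stats time
    stats_nrows:   row count recorded with the statistics (ustnrows or nrows)
    """
    udi = partition_udi - catalog_udi
    if stats_nrows == 0:
        # No rows when stats were built: 0.00 if nothing changed, otherwise -1.
        return 0.0 if udi == 0 else -1.0
    return round(udi / stats_nrows * 100, 2)


print(change_pct(199891, 28, 28))  # 713796.43
```

Comparing this value with the effective STATCHANGE gives the "exceeded / not exceeded" status that OAT displays.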
38. Table STATCHANGE value
Default STATCHANGE applies if not set for table
Can be set at session level using set
environment
– Set environment statchange '5';
Can set STATCHANGE when creating table
Can alter table to set STATCHANGE
– Alter table customer statchange 5;
select tabname, NVL ( statchange, (select cf_effective from
sysmaster:sysconfig where cf_name = 'STATCHANGE') ) as statchange
from systables where tabname = "customer";
39. FORCE option
Can add “FORCE” to any update statistics
command to ignore STATCHANGE
When you upgrade to 11.70
– Existing partition pages will have UDI counters added
(UDI values are 0)
– Catalog tables sysfragments (for indexes) and
sysdistrib (for table column data distributions) will
have UDI counters added (values are 0)
– What does this mean for Update Statistics?
• FORCE: execute even if there is NO change
• STATCHANGE 0: execute if there is any (non-zero) amount of change
40. FORCE option (cont’d)
Add “FORCE” to end of update statistics
command to get legacy behavior (ignore
STATCHANGE)
FORCE
– Execute even if NO change
– Resets sysdistrib nupdates, ndeletes, ninserts to 0; sysfragments nupdates, ndeletes, ninserts do not show the same behavior
STATCHANGE 0
– Execute if non-zero amount of change
– Set environment STATCHANGE '0';
41. Stored Procedures
Not affected by STATCHANGE -- Update
statistics FOR PROCEDURE
SQL statements in SPL are optimized
– When SPL is created or on first execution
– When dependent table or indexes are altered
– When statistics of dependent tables change
In 11.70, this means that every time update statistics is run for a table, the systables npused, nrows, and ustlowts values are updated (even if index statistics or distribution statistics are not updated because the STATCHANGE threshold has not been met), so the statistics of that dependent table are considered changed.
42. Update Statistics Low - Summary
The update statistics low performance improvement takes effect when:
• USTLOW_SAMPLE is set to 1
• the index has 100,000 or more leaf pages
• Detached index
USTLOW_SAMPLE
• New ONCONFIG parameter, documented in 11.70.xC4
• Controls use of sampling (new feature) to collect index statistics during
update statistics
• 0 (off) or 1 (on); default value is 0 (off)
• Can be updated with onmode -wm/wf
• Can be set at session-level using SET ENVIRONMENT
– Set Environment USTLOW_SAMPLE '0' / '1' / 'on' / 'off'
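The three conditions can be expressed as one predicate, together with a parser for the session values listed above. A sketch: `parse_ustlow_sample` and `ustlow_sampling_applies` are hypothetical names, and the predicate only restates the conditions on this slide; it does not query the server.

```python
_USTLOW_VALUES = {"0": 0, "off": 0, "1": 1, "on": 1}


def parse_ustlow_sample(value: str) -> int:
    """Normalize a SET ENVIRONMENT USTLOW_SAMPLE value ('0', '1', 'on', 'off') to 0 or 1."""
    try:
        return _USTLOW_VALUES[value.strip().lower()]
    except KeyError:
        raise ValueError(f"invalid USTLOW_SAMPLE value: {value!r}") from None


def ustlow_sampling_applies(ustlow_sample: int, leaf_pages: int, detached: bool) -> bool:
    """All conditions from the slide: sampling on, >= 100,000 leaf pages, detached index."""
    return ustlow_sample == 1 and leaf_pages >= 100_000 and detached


print(ustlow_sampling_applies(parse_ustlow_sample("on"), 150_000, True))  # True
```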
43. Update Statistics Low – Why?
Update Statistics LOW takes too long when gathering statistics for large
indexes
• Entire index is read in sequence
• Each leaf page of an index must be read individually (separate I/O)
• Some customers do not run the command because it does not fit in the
maintenance window
• On a single large table (billions of rows and many indexes), the command can take over 3 days
New Feature Solution: USTLOW_SAMPLE
• Use sampling to reduce time required to gather index statistics
• Many samples are taken, and index statistics are calculated from the statistics of the samples
44. Update Statistics Low - Details
Update statistics low gathers the following index statistics
• number of index levels
• number of index leaf pages
• number of unique values for index lead key
• clustering factor
• 2nd lowest and 2nd highest value for index lead key
Index statistics saved in database catalog
• Sysindices (levels, leaves, nunique, clust)
• Syscolumns (colmin, colmax)
• Sysfragments (levels, clust) for fragtype = "I"
When Update Statistics MEDIUM or HIGH is run, index statistics are also collected, unless “Distributions Only” is used
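The lead-key statistics from the list above can be sketched from a plain list of key values. Assumptions: `lead_key_stats` is hypothetical, the 2nd lowest and 2nd highest values are taken over distinct values, and with fewer than three distinct keys the sketch falls back to the true minimum and maximum (the slide does not say how the engine handles that case). The resulting nunique is the denominator of the equality selectivity rule F = 1 / (number of distinct keys).

```python
def lead_key_stats(values):
    """Return (nunique, colmin, colmax) for an index lead key.

    colmin/colmax are the 2nd lowest and 2nd highest distinct values (assumption).
    """
    distinct = sorted(set(values))
    if not distinct:
        raise ValueError("no key values")
    if len(distinct) < 3:
        # Assumption: too few distinct keys for a 2nd lowest/highest; use the extremes.
        return len(distinct), distinct[0], distinct[-1]
    return len(distinct), distinct[1], distinct[-2]


print(lead_key_stats(["85016", "85016", "94022", "10001", "60601"]))
# (4, '60601', '85016')
```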
45. Update Statistics Low – Details (cont’d)
Instead of reading the entire index in sequence, the new feature:
• Uses sampling
• Each sample will go from index root page to index leaf page,
reading one or more index leaf pages
• Sampling is “dynamic”: the number of samples is not predetermined
• The number of samples depends on the quality of the samples
– Fewer samples are needed if data is evenly distributed
– More samples are needed if the data distribution is skewed
– The standard deviation among the samples is used as the measure of “quality”
• The time update statistics will take cannot be predicted upfront
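Dynamic sampling can be illustrated with a stopping rule driven by the standard deviation of the samples. This is not the engine's algorithm: the per-sample measurement is abstracted as a `draw` callable, and `adaptive_sample` with its thresholds (`min_samples`, `max_samples`, `rel_tol`) is hypothetical. It shows only the behavior described above: evenly distributed data stops early, skewed data needs more samples, so the run time is not known in advance.

```python
import itertools
import math
import statistics
from typing import Callable, Tuple


def adaptive_sample(draw: Callable[[], float], min_samples: int = 5,
                    max_samples: int = 1000, rel_tol: float = 0.05) -> Tuple[float, int]:
    """Draw samples until the standard error of their mean is within rel_tol of the mean.

    Returns (estimate, number_of_samples_taken).
    """
    if min_samples < 2:
        raise ValueError("need at least 2 samples to compute a standard deviation")
    samples = [draw() for _ in range(min_samples)]
    while True:
        mean = statistics.fmean(samples)
        std_err = statistics.stdev(samples) / math.sqrt(len(samples))
        if std_err <= rel_tol * abs(mean) or len(samples) >= max_samples:
            return mean, len(samples)
        samples.append(draw())


even = itertools.cycle([10.0])          # evenly distributed: identical samples
skewed = itertools.cycle([1.0, 100.0])  # skewed: samples vary widely
print(adaptive_sample(lambda: next(even)))       # (10.0, 5)
print(adaptive_sample(lambda: next(skewed))[1] > 5)  # True
```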
46. Update Statistics Low - Example
Example based on internal traces
47. Update Statistics Low - Example
Example based on internal traces
48. Update Statistics Low - Notes
Review of Update statistics feature
– 11.70.xC1 “Smart Statistics” Feature Review
• Default: AUTO_STAT_MODE 1
• Default: STATCHANGE 10
• When an update statistics command runs, index statistics and column distributions are not rebuilt if the STATCHANGE threshold has not been met
– Update Statistics info in database catalog tables
• Look at ustlowts in systables
– Updated when systables' nrows and npused are updated; this happens whenever an update statistics command is run, and the STATCHANGE threshold is not checked
• Look at ustlowts in sysindices
– Updated when index statistics are rebuilt/updated
• Look at constr_time in sysdistrib
– Updated when distribution statistics are rebuilt/updated
Remember the 11.10 feature: statistics are collected when an index is created
49. Catalog for smarter Statistics
systables: statchange, statlevel, ustlowts
sysfragments: nupdates, ndeletes, ninserts
sysindices: nupdates, ndeletes, ninserts, ustbuildduration, ustlowts
sysdistrib: nupdates, ndeletes, ninserts, ustbuildduration, constr_time
sysfragdist: nupdates, ndeletes, ninserts, ustbuildduration, constr_time