DB2 for z/OS Buffer Pool Tuning: Win by Divide and Conquer or Lose by Multiply and Surrender
1. #IDUG
DB2 for z/OS Buffer Pool Tuning: Win by divide and
conquer or lose by multiply and surrender
John Campbell
DB2 for z/OS Development
Session Code: 7008
5. #IDUG
Page size selection
• 4K page size usually optimizes the buffer hit ratio
• But the page needs to be large enough to store the max row size
• DB2 can store at most 255 rows on a page
• A large page size (>8K) provides better sequential performance, but can
degrade the buffer pool hit ratio for random/mixed access
• When row size is large, a large page size helps minimize DASD space
consumption
• On average, each page wastes a half-row of space
• e.g., If you average 10 rows per page, you waste 5% of the space
• Index considerations
• A large page size is necessary for index compression
• A large page size minimizes index splits
• A large page size can reduce the number of index levels
• A large page size may increase the frequency of deferred write I/Os
• With DB2 10, a large page size helps enable inline LOBs, which may help
improve I/O and CPU performance significantly
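The half-row rule above can be sketched as a quick calculation (the helper name is hypothetical, not anything DB2 provides):

```python
def wasted_space_pct(avg_rows_per_page: float) -> float:
    # On average each data page wastes half a row of space,
    # so the wasted fraction is 0.5 rows out of avg_rows_per_page rows.
    return 0.5 / avg_rows_per_page * 100

print(wasted_space_pct(10))  # 5.0 -> matches the 10-rows-per-page example above
```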
6. #IDUG
LOB table space page size considerations
• A page in a LOB table space contains only one LOB (i.e., one row)
• A small page size always provides the most space efficiency, especially when
LOBs are small
• If a LOB fits in one page, then it only takes one I/O to read the LOB
• Otherwise it takes a minimum of two I/Os and the second I/O will read up to 128KB
• DB2 10: Special tuning considerations for inline LOBs
• See “DB2 for z/OS Best Practices” web site
8. #IDUG
Multiple buffer pools
• Multiple buffer pools are recommended
• Ease of monitoring and tuning
• Online monitoring via Omegamon, -DISPLAY BPOOL, -DISPLAY GBPOOL
• Post processing via DB2PM, RMF CF Activity post processing
• Useful for access path monitoring
• Potential reduction of DB2 latch contention, CFCC latch contention
• Moderation required as too many buffer pools can
• Fragment your total buffer pool space too much
• Create administrative overhead
• Dynamic tuning
• Full exploitation of buffer pool tuning parameters for customized tuning
• -ALTER BPOOL is synchronous and takes effect immediately, except for buffer pool
contraction, which must wait for updated pages to be written out
9. #IDUG
Multiple buffer pools
• Catalog/Directory is in BP0, BP8K0, BP16K0 and BP32K
• Recommendation to not use these buffer pools for user objects
• Can enlarge buffer pool size for release migration and mass data recovery
• Best Insert performance practice (see Info APAR II14743)
• Recommendation to isolate LOB/XML table spaces into their own separate
buffer pools, away from other objects, to avoid read LSN impact during insert
• Recommendation to separate into different buffer pools: table spaces without
LOB columns vs. UTS table spaces with LOB columns
• Avoids the impact of read claim tracking for UTS table spaces with LOB
columns, because V10 NFM added tracking by read Log Sequence Number (LSN)
for UTS table spaces that have LOB or XML columns
• Separate large objects with potentially unstable access paths into their
own separate buffer pools, away from other objects
• Avoid CF performance impact when table space scan is accidentally chosen
• z/OS APAR OA41203 (fair queuing support) to mitigate the CF performance risk
10. #IDUG
Example of buffer pool separation for 4K page size pool
• BP0 DB2 Catalog/Directory
• BP1 Table spaces
• BP2 Index
• BP3 In-memory heavy read (data and index)
• VPSEQT=0
• Optionally PGSTEAL(FIFO or NONE)
• BP4 In-memory heavy update (data and index)
• VPSEQT=0
• DWQT=VDWQT=90%
• Optionally PGSTEAL(FIFO or NONE)
• BP5 (optionally) Large table spaces with unstable access path
• BP6 LOB/XML
• BP7 Work files
• VPSEQT 90%
12. #IDUG
LRU Processing
• DB2 has a mechanism to prevent sequentially accessed data from
monopolising a buffer pool and pushing out useful random pages
[Figure: an LRU chain of buffers marked R (random) and S (sequential), with pointers to the oldest sequential buffer and the oldest buffer overall]
• If the length of the Sequential LRU chain (SLRU) exceeds VPSEQT, the buffer manager
steals the oldest sequential buffer
13. #IDUG
GetPage and Buffer classification
• Buffers in a BP are classified as either random or sequential
• Classification is done at getpage level
• A buffer that is allocated for prefetch always goes on the Sequential LRU chain
• Prefetched pages are always classified as sequential, but then may be reclassified
by the random getpage and removed from the SLRU chain
• A buffer is never reclassified from random to sequential
14. #IDUG
VPSEQT recommendations
• General recommendations
• Use the default value of 80% (or range of 50-80%) for most buffer pools
• Should consider reducing VPSEQT when increasing VPSIZE to reduce sync I/Os
• However, ensure that VPSEQT x VPSIZE is greater than 320 MB in order to
ensure that the SQL/utilities use the best prefetch quantity and format write
quantity
• Set VPSEQT to 99% for the sort only work file buffer pool, and 90% for other
workfile usage
• If mixed, use 90%
• Set VPSEQT to 0% for in-memory buffer pool to avoid the unnecessary overhead of
wasteful scheduling of prefetch engines when data is already in buffer pool
• With DB2 10, use PGSTEAL=NONE
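The "VPSEQT x VPSIZE greater than 320 MB" rule above can be turned into a quick sizing check (hypothetical helper, not a DB2 interface):

```python
import math

def min_vpsize_buffers(vpseqt_pct: float, page_size_kb: int = 4,
                       target_mb: int = 320) -> int:
    # Smallest VPSIZE (in buffers) such that VPSEQT% of the pool
    # amounts to at least target_mb of sequential buffer storage.
    target_kb = target_mb * 1024
    return math.ceil(target_kb / (page_size_kb * vpseqt_pct / 100))

print(min_vpsize_buffers(80))  # 102400 4K buffers = a 400 MB pool
```

For example, at the default VPSEQT of 80%, a 4K pool needs at least 102,400 buffers before 80% of it reaches 320 MB.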
15. #IDUG
Lowering VPSEQT with DB2 11
• If the buffer pool tuning goal is to protect random pages and avoid synchronous
I/O, as opposed to avoiding prefetch I/O, then ...
• Lower VPSEQT to protect random pages and improve the random Getpage hit ratio
• However, ensure that VPSEQT x VPSIZE is greater than 320 MB in order to ensure
that the utilities use a 512K quantity for prefetch and format write operations
• Doing so minimizes the number of utility I/Os
• But lowering VPSEQT might be a problem if list prefetch I/O is running slow
• Indicated by high Other Read I/O Suspension Time
• In this case caching the data in the buffer pool to avoid I/O is more beneficial
• Use IBM DS8000 storage with the combination of flash storage (HPFE or SSD)
and zHPF to optimize list prefetch I/O
• Use of Easy Tier will automate the storing of "hot" data on flash or SSD, and is very cost effective
16. #IDUG
DISPLAY BUFFERPOOL(bpid) DETAIL
• New statistics that were added in DB2 11, and in APAR PM70981 for DB2 10
• SEQUENTIAL
• The length of the SLRU chain
• DB2 statistics IFCID 2 record contains the minimum and maximum of this statistic
during the statistics interval
• You can use these statistics to track the mixture of random and sequential buffer
pool activity
• VPSEQT HITS
• Number of times that the length of the SLRU changed and became equal to VPSEQT
• RECLASSIFY
• The number of times that a random Getpage found the page was on the SLRU chain
17. #IDUG
PGSTEAL - LRU, FIFO and NONE
• LRU (Least Recently Used) Default
• Maintains an LRU chain so that frequently used pages remain in the pool longer
• FIFO (First In First Out)
• Removes the oldest pages first; recommended when you are seeing a 100% buffer
pool hit ratio for frequently accessed objects
• Less maintenance for buffer pool chain
• NONE (no page stealing) In-memory buffer pool
• Pre-loads the table or index at first access using sequential prefetch
• Assumption is that the objects fit into the buffer pool
• If not, FIFO is used to manage page stealing
• Less maintenance for buffer pool chain
• MRU (Most Recently Used), internal, not a user option
• Removes the newest pages first when DB2 knows that the pages will not be re-used
18. #IDUG
MRU usage
• DB2 uses MRU for all “format write” Getpages
• These buffers are eligible to be stolen immediately after they are written
• Applicable to LOAD, REBUILD INDEX (load phase), REORG (reload phase) and
RECOVER
• Prior to DB2 9, DB2 did not use MRU for sequential reads
• The COPY utility began using MRU in DB2 9
• DB2 11 adds MRU usage for most of the other utilities:
• UNLOAD, RUNSTATS
• REORG TABLESPACE and INDEX (unload phase)
• REBUILD INDEX (unload phase)
• CHECK INDEX and DATA (unload phase)
20. #IDUG
Buffer Pool Allocation
• Both buffer pools and associated control blocks are allocated in 64 bit private
storage (thread storage is allocated in 64 bit shared)
• DB2 10 allocates as needed for both virtual and real
• DB2 11 allocates virtual storage immediately according to VPSIZE, but
allocates real storage as needed
• Exception – Buffer pools with PGFIX(YES) and FRAMESIZE 1MB or 2GB
21. #IDUG
Long-Term Page Fix for BPs with Frequent I/Os
• It has always been strongly recommended that DB2 buffer pools be
backed 100% by real memory to avoid paging to AUX (or Flash Express)
• Given that there is sufficient real memory and no paging, you might as well page
fix each buffer pool just once to avoid the repetitive CPU cost of page fix and
free for each and every I/O
• ALTER BPOOL(name) PGFIX(YES|NO)
• Requires the buffer pool to go through reallocation before it becomes
operative
• A DB2 restart is necessary to change PGFIX for BP0, BP8K0, etc
• Observed 0 to 8% reduction in overall IRWW transaction CPU time
• Recommendation to use PGFIX(YES) for buffer pools and to never over-size
your buffer pools
22. #IDUG
Buffer I/O Intensity
• Obtain (Total Page I/Os) as SUM of
• Pages read from DASD synchronously (sync read I/O)
• Pages read from DASD asynchronously (List, Dynamic, Sequential prefetch read)
• Pages written to DASD (mostly asynchronously)
• Pages read from GBP synchronously
• SYNCREAD (XI) DATA RETURNED + SYNCREAD(NF) DATA RETURNED
• Pages read from GBP asynchronously
• Pages written to GBP (CHANGED pages written to GBP)
• I/O Intensity = (TOTAL Page I/Os) divided by VPSIZE
24. #IDUG
1 MB and 2 GB Size Page Frames
• 1 MB page frames require z10 or above and use of LFAREA, defined in
IEASYSxx
• The advantage of 1 MB pages is to improve TLB (Translation Lookaside Buffers)
performance during getpage
• Use 1MB page frames for buffer pools which have high getpage intensity
• DB2 10 attempts to allocate 1MB frames for buffer pools with PGFIX(YES)
• DB2 11 supports FRAMESIZE parameter (4K, 1M, 2G) for flexibility
• 2 GB page frames require DB2 11, zEC12 or zBC12, and PGFIX(YES)
25. #IDUG
PGFIX (YES) and 1MB Page Frames
     SIZE     #Getpage  #Pages   #Pages    #Pages   Hit    I/O        Getpage    PGFIX  1MB
                        Read     Read      Written  Ratio  Intensity  Intensity
                        Sync     Prefetch
BP0  3K       138       0        0         0.06     100%   0          5
BP1  524.3K   1496.3K   0.03     0         589      100%   0          285        Y      Y
BP2  2097K    160.4K    404      0         402      100%   0          8
BP3  524.3K   93.6K     2101     35300     197      98%    7          18         Y      Y
BP4  2097K    40.9K     9873     2530      433      76%    1          2          Y
• PGFIX (YES): recommended for buffer pools with high I/O intensity, such as BP3 and
BP4
• 1MB page frames: recommended for buffer pools with high getpage intensity, such as
BP1 & BP3
• DB2 10: with PGFIX(YES) the preferred frame size is always 1MB; DB2 11 can use FRAMESIZE
• Option 1 - use PGFIX (YES) for BP3 and BP4 but do not configure 1MB LFAREA
• Option 2 - use PGFIX (YES) and 1MB (LFAREA) for BP1, BP3 and BP4
• DB2 11 - use PGFIX(YES) and 1M frames for BP1 and BP3, and use PGFIX(YES) and 4K
frames for BP4
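The hit ratios in the table can be computed from the counters (a sketch of the usual definition; helper name is hypothetical, and note that "hit ratio" is sometimes defined for random getpages only):

```python
def hit_ratio_pct(getpages: float, pages_read: float) -> float:
    # Fraction of getpages satisfied without reading a page into the
    # pool; pages_read = sync + prefetch pages read from DASD.
    return (getpages - pages_read) / getpages * 100

print(hit_ratio_pct(1000, 20))  # 98.0
```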
26. #IDUG
Evaluating buffer pool performance
• Use the random buffer pool hit ratio, better still use page residency time
• Unlike buffer pool hit ratio, page residency time is not affected by
“extraneous” Getpage buffer hits
• e.g., If a thread does two rapid Getpages and page releases of the same page, the
second Getpage is an extraneous buffer hit, because it has no risk of incurring a
sync I/O
• Random page residency time
• Maximum of the following two formulae:
• Ratio of VPSIZE to total # pages read/sec (including prefetched pages)
• Ratio of VPSIZE * (1-VPSEQT/100) to random synch IO/sec
27. #IDUG
Large memory
• “Memory is cheap. CPUs are expensive”
• For every I/O that you save, you avoid the software charge for the CPU that it
took to otherwise do that I/O
28. #IDUG
CPU Cost Saving by Reducing DB2 Sync I/Os
• Banking (60M accounts) workload with 2-way data sharing:
• 11% response time and 6% CPU reduction when growing the GBP from 52 GB to 398 GB
for both members with the same LBP size (60 GB)
• 40% response time and 11% CPU reduction when growing the LBP from 30 GB to 236 GB
for both members with the same reasonable GBP size (60 GB)
29. #IDUG
Findings Summary
• Sync I/O reduction ties closely with CPU reduction
• This was done in z/OS 2.1, zEC12, DB2 11
• Expect larger saving with z/OS 1.13
• For this particular workload, the higher benefit came from investing in local buffer
pools
• Not all the objects become GBP-dependent
• Large TS defined with Member Cluster
• Most of the access pattern is fairly random
• GBP has to be big enough to support enough directory entries
• The best performer was with both large LBP/GBP
30. #IDUG
IBM Brokerage Workload – Enlarging Local Buffer Pool
• 10 GB -> 24 GB: 14% CPU reduction, 3% throughput improvement (26 -> 13 sync I/Os)
• 24 GB -> 70 GB: 3% CPU reduction, 15% throughput improvement (13 -> 8 sync I/Os)
31. #IDUG
DB2 Buffer Pool Simulation - Why?
• Larger local buffer pools can potentially reduce CPU usage by reducing sync
I/Os
• The benefit depends on the size of active workload and access pattern
• May not see any benefit if the working set is very small and already fits in the
buffer pools today
• May not see any benefit if the working set is too large and the increment is not
large enough
• Pages have to be re-referenced - not one-time sequential reads
• Try-and-validate may not work well with customer workloads that have high
variation
• Existing tooling requires an expensive set of traces and intensive
analysis
32. #IDUG
Buffer Pool Simulation
• Simulation provides an accurate estimate of the benefit of increasing buffer pool
size in the production environment
• -ALTER BUFFERPOOL command will support
• SPSIZE (simulated pool size)
• SPSEQT (sequential threshold for simulated pool)
• -DISPLAY BPOOL DETAIL and Statistics Trace will include
• # Sync and Async DASD I/Os that could have been avoided
• Sync I/O delay that could have been avoided
• Cost of simulation
• CPU cost: approximately 1-2% per buffer pool
• Real storage cost: approximately 2% of the simulated pool size for 4K pages (1% for 8K,
and so on...)
• For example, to simulate SPSIZE(1000K) 4K pools requires approx. 78MB
additional real storage
• V11 APAR PI22091 for Buffer Pool Simulation is scheduled to go out Sept/Oct
2014
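The real storage cost above can be checked with a quick calculation (hypothetical helper; the halving of the percentage per page-size doubling is an assumption drawn from the "1% for 8K, so on" bullet):

```python
def simulation_storage_mb(spsize_buffers: int, page_size_kb: int = 4) -> float:
    # Roughly 2% of the simulated pool size for 4K pages; the
    # percentage halves as the page size doubles (1% for 8K, ...).
    overhead_fraction = 0.02 * 4 / page_size_kb
    return spsize_buffers * page_size_kb * overhead_fraction / 1024

print(round(simulation_storage_mb(1_000_000)))  # 78 -> matches the ~78MB example
```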
34. #IDUG
Deferred Writes
• VDWQT (Vertical Deferred Write Queue Threshold) is applied at the data set level
as a % of VPSIZE or a number of buffers
• DB2 schedules a write of up to 128 pages, sorts them in sequence, and writes
them out in at least 4 I/Os. A page distance of 180 pages is applied to each write
I/O to avoid high page latch contention, since buffers are latched during I/O
• VDWQT is hit when 64 updated pages are queued for a GBP-dependent object
• DWQT (horizontal Deferred Write Queue Threshold) is applied at the buffer pool level
[Figure: buffer pool of VPSIZE buffers with per-data set write queues DS1-DS5; VDWQT applies to each data set's queue, DWQT to the pool as a whole]
35. #IDUG
Deferred Writes …
• Setting VDWQT and DWQT to 90 is good for objects that reside entirely in the
buffer pool and are updated frequently
• Setting VDWQT to 1-2% is good if the probability of re-writing pages is low
• If VDWQT set to 0, DB2 waits for up to 40 changed pages for 4K BP (24 for 8K, 16
for 16K, 12 for 32K) and writes out 32 pages for 4K BP (16 for 8K, 8 for 16K and 4
for 32K)
• In other cases, set VDWQT and DWQT low enough to achieve a “trickle” write
effect in between successive system checkpoints or log switch
• Setting VDWQT and DWQT too low may result in writing the same page out many
times, short deferred write I/Os, increased DBM1 SRB CPU, and LC23
• If you want to set VDWQT in pages, do not specify anything below 128
• Target to hit VDWQT instead of DWQT to increase the number of pages per
I/O and reduce the number of I/Os
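The VDWQT=0 behaviour described above can be captured as a small lookup (a sketch; the values are taken from the bullet above, the helper name is hypothetical):

```python
# When VDWQT=0, DB2 waits for up to N changed pages per data set and
# then writes out M of them, where N and M depend on the pool page size.
VDWQT_ZERO = {            # page size: (changed pages waited for, pages written)
    "4K": (40, 32),
    "8K": (24, 16),
    "16K": (16, 8),
    "32K": (12, 4),
}

def vdwqt_zero_behavior(page_size: str) -> tuple[int, int]:
    return VDWQT_ZERO[page_size]

print(vdwqt_zero_behavior("4K"))  # (40, 32)
```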
36. #IDUG
Critical BP Thresholds
Immediate Write Threshold
Checked at page update
>> After update, synchronous write
Data Management Threshold
Checked at page read or update
>> Getpage can be issued for each row
sequentially scanned on the same page –
potential large CPU time increase
Prefetch Threshold
Checked before and during prefetch
90% of buffers not available for steal, or running
out of sequential buffers (VPSEQT with 80%
default)
>> Disables Prefetch (PREF DISABLED – NO
BUFFER)
97.5%
95%
90%
37. #IDUG
System Checkpoint interval
• Periodically, the system checkpoint flushes 'dirty' pages out of all of the
buffer pools to DASD, causing a burst of I/Os
• DB2 restart has to read the logs since the start of the prior system checkpoint
interval and apply those log records
• Reducing the system checkpoint interval will help avoid the burstiness of the
writes, and reduce the restart time for DB2, but it may cause more write I/O
38. #IDUG
Potential causes of poor write I/O performance
• Problems with remote replication
• Network contention
• A poor DASD control unit implementation can cause reads to queue behind the
writes
• Poor local disk performance
• RAID 5 rank is overloaded
• 10K RPM HDDs
• Poor CF performance
• CF links are overloaded
• Low write buffer residency time
• Caused by VDWQT set too low?
• Insufficient buffer pool size
39. #IDUG
Potential for poor castout performance
• When castout performs poorly, the GBP can fill up
• When the GBP fills up, bad things happen
• DB2 starts pacing at commit - slower response time
• After repetitive write failures, pages will be put on the LPL
• Recovery actions are then necessary
• Problems are often precipitated by batch jobs that generate a surge of
updated pages being written to the GBP
40. #IDUG
Overlap GBP reads with castout I/O (New in V11)
[Figure: pages are read from the GBP into the castout I/O work area in the buffer pool and written out to DASD]
• DB2 10 did not overlap writing to DASD with reading from the GBP
• DB2 11 does overlap
41. #IDUG
GBP write-around (New in DB2 11)
[Figure: castout flows from the GBP through the castout I/O work area in the buffer pool to DASD; GBPOOLT=30%; write-around starts when the GBP is 50% full and stops at 40%; commits never write around]
43. #IDUG
Sequential buffer misses
• Do not use the sequential (or overall) buffer hit ratio, which can legitimately
be negative in the skip-sequential case
• With DB2 11, look at the absolute number of sequential sync I/Os
• A lot of sequential sync I/Os indicates a problem
• Possible causes:
• Prefetch was disabled for one of two reasons
• You ran out of prefetch engines
• The dirty page count exceeded the prefetch threshold of 90% - check for
Prefetch Disabled messages in OMPE
• Prefetch I/O was done, but the prefetched pages were stolen prior to the Getpages
• Detecting that this happened is difficult to do prior to DB2 11
• What should you do if this is happening?
• Reduce the number of prefetch streams
• Increase the number of sequential buffers by increasing VPSEQT and/or VPSIZE
• Schedule the work that is using prefetch to a different time of day
44. #IDUG
Optimizing list prefetch
• http://www.redbooks.ibm.com/abstracts/redp4862.html?Open
• Up to 3 times faster with spinning disks
• Up to 8 times faster with solid state disks
Use IBM DS8700 or DS8800 storage with zHPF, which uses List
Prefetch Optimizer
45. #IDUG
Optimizing format writes and preformatting
• zHPF increases the throughput for loading a tablespace with 4K pages
by 50% if VPSEQT x VPSIZE >=320MB
• zHPF doubles the throughput of preformatting, used by inserts
• Preformatting is often asynchronous, but not the first 16 cylinders of a new
z/OS extent
• zHPF also reduces the number of I/Os per cylinder from 15 to 2, thereby
saving CPU and reducing the load on remote networks
Use IBM DS8700 or DS8800 storage with zHPF in order to optimize the
performance of format writes and preformat writes
46. #IDUG
Solid State Disks
• The cost of SSD continues to drop
• SSD is ideal for DB2 list prefetch
• SSD is also ideal for improving synch I/O response time if the DASD cache hit
ratio is not high
47. #IDUG
Summary
• General recommendation to use multiple buffer pools
• For dynamic monitoring and tuning
• To avoid performance bottlenecks
• Improve overall performance in an environment with buffer pool related stress
• For a workload with no buffer pool related performance problems, e.g., a workload with
negligible I/Os
• Sufficient to provide the minimum of 4 buffer pools, one for each page size
• But you will miss the ability to dynamically monitor buffer pool activity such as Getpages,
which may help to
• Expose inefficient access paths
• Identify heavy catalog access due to repeated rebind
• etc.
• When I/O activity is significant, multiple buffer pools designed to match the specific
characteristics of objects assigned can provide a significant improvement in avoiding
performance bottlenecks and contributing to an overall performance improvement
• 32K versus 4K size work file buffer pools
• For small sort record sizes, a 4K work file buffer pool can cut the number of bytes read and
written by up to 6 times or more, helping to reduce work file I/O time primarily and related CPU
time secondarily
• As discussed in the presentation, in-memory versus not, sequential versus random, and
read-intensive versus update-intensive can also have a significant positive performance
impact by allowing separate buffer pools to provide a customized performance
improvement
49. #IDUG
General Buffer Pool Recommendations
• Recommend simplicity in object-to-buffer pool mapping
• Catalog/Directory only (4K, 8K, 16K, 32K)
• Work files only (4K and 32K)
• Set VPSEQT=90, DWQT= 60, VDWQT=50, PGSTEAL=LRU
• General default for tablespaces (4K, 8K, 16K, 32K)
• General default for indexes (4K, 8K, 16K, 32K)
• General default for LOBs and XML (4K, 8K, 16K, 32K)
• Set VDWQT=DWQT=0 to force dribble write of pages to DASD and/or CF
50. #IDUG
General Buffer Pool Recommendations …
• Recommend simplicity in object-to-buffer pool mapping …
• Additional 'specialised' buffer pools - after proper justification
• In-memory buffer pool, supersized to eliminate read I/Os
• Once achieved, can set PGSTEAL=NONE
• Potential for significant CPU savings
• Journals - Heavy inserts/low reference
• Low deferred write thresholds
• Sequential scan
• Large enough to sustain optimised prefetch quantity (VPSIZE*VPSEQT >= 160MB/320MB)
• Very large objects accessed randomly and unlikely to get page re-reference
• Do not waste an excessive number of buffers
• Small LOBs that are heavily re-referenced (e.g. LOBs for MQ shared queues)
51. #IDUG
General Buffer Pool Recommendations …
• Good value to use a separate set of isolated buffer pools when introducing a
new application for informational value, but it should not persist
• Not a scalable solution - finite number of buffer pools
• Provides some isolation but fragments real storage
• More complex management
• Clean up any pollution in the existing object-to-bufferpool mapping
• Work files should be in their own isolated set of buffer pools (e.g. BP7 and BP32K7)
away from BP0 and BP32K
• Separate out table spaces and indexes in default general set of bufferpools
• Default bufferpools for tablespaces: BP2, BP8K2, BP16K2, BP32K2
• Default bufferpools for indexes: BP3, BP8K3, BP16K3, BP32K3
• Align ZPARM settings with the buffer pool strategy
• Use Statistics Trace Class 8 to collect dataset level I/O statistics to help with object
classification and identify tuning opportunities
53. #IDUG
John Campbell
DB2 for z/OS Development
campbelj@uk.ibm.com
Session 7008
DB2 for z/OS Buffer Pool Tuning: Win by divide and conquer or lose by multiply and surrender