2. 2
Agenda
• Speaker bio
• About Quest Software
• Accolades & Awards
• Surviving the Data Avalanche
• Disk IO Tuning
• Resources
• Q & A
3. 3
• Started in IT in 1986. BS in MIS in 1989 from
University of Alabama.
• Microsoft SQL Server MVP since 2004
• Author of 7 database books
– 1 on Oracle, 2 on SQL Server, 2 on SQL, 1 on DB design, 1 on DB
Benchmarking
• President of PASS (www.sqlpass.org)
– Conference is next Sept 16-18 in Denver, CO
– Over 130 sessions on SQL Server, BI & Dev
• Blogs for SQLMag and SQLBlog
• Monthly columns in SQL Server Magazine and
Database Trends & Applications
• Worked for NASA, US Army, and Deloitte & Touche
before Quest Software.
Speaker Bio – Kevin Kline
4. 4
Accolades & Awards
• TechTarget 2006 Product of the Year with LiteSpeed
• Best of Tech Ed award 2006 with Spotlight on SQL Server
• SQL Server Magazine, Platinum reader’s choice award
• SQL Server Magazine Readers Choice Awards, winner in 11
categories, 2005
• No. 1 vendor in Distributed Data Management Facilities, IDC,
2005
• Microsoft ISV Partner of the Year, 2004
• Microsoft TechEd Best of Show Winner, 2004
• Microsoft TechEd Europe Best of Show Winner, 2005
• No. 1 vendor in Application Management Software, Gartner
Dataquest, 2005
• Jolt Productivity Award Winner
• Network Computing Editor’s Choice Winner
• No. 8 in the “Who’s Who in Enterprise Software,” Investor’s
Business Daily
5. 5
What is a VLDB?
• Used to be a set size:
– >100 MB in the 1980s
– >100 GB in the early 1990s
– >1 TB in the early 2000s
– >10 TB in the mid 2000s
• Now, a more flexible definition for VLDB prevails:
– It’s a VLDB when you can no longer use “common and standard”
techniques for management and performance
6. 6
How big are SQL Server databases today?
• Hundreds of multi-terabyte databases on SQL Server
are known. At least an equal number are unknown.
• An example Microsoft customer has a 270 TB data
warehouse
– Growing 3 TB per day, with 1 TB of deletions per day
– Definitely a VLDB
• An example EMC customer in the DOE has a 200 TB
database (with lots of blob data)
– Possibly not a VLDB
7. 7
Why do apps generate more data today?
• Compliance and Auditing Requirements
– Especially for telecom, finance, and health industries
– Requirements to retain data for 7-10 years
– SOX
– Internal processes in industries like insurance
• Real Time Business Intelligence
– Old ETL/Mainframe-derived data is losing ground
– Live streaming data is producing more accurate, more timely
decision-making
• Data is both more granular and higher in record volume (e.g. drive-through
fast-food vendors in the restaurant industry)
• More sophisticated business processes
• Longer-term data is still kept on-line for better decision making
(e.g. the leap-year birthday phenomenon in the retail industry)
8. 8
Why else is there more data today?
• Backup retention is a HUGE driver
• Recovery from tapes is losing ground to:
– High-availability alternatives (e.g. hot standbys, clustering,
replicated data) are often easier than tapes:
• Old versions of the application aren't around
• Platform software (OS, database, drivers) isn't the right
version
• Tape drives or other important hardware may not be around
– Tiered storage is a growing alternative
• Replace the “sliding window” of data with a tiered set of storage
based on performance and cost
• Recovery is the overlooked component of backup
strategy. TEST!!!
9. 9
What other problems arise with VLDBs?
• Server proliferation becomes a big headache with
VLDBs
– Additional servers are needed for high-availability, replicated, or hot
standby servers
– Often, the architecture divides processing onto several servers
– ETL servers
• KISS principles dictate that when you have
something big and unwieldy, you break it down
into more manageable components
– Partitioning in SQL Server 2005
– Metadata server with partitioned data warehouses in BI VLDBs
• Loading data and, separately, cleansing data are
enormous issues
10. 10
Storage Strategies – Tiered Storage
• Tapes don’t get upgraded when the system gets
upgraded
• Some progressive customers are using tiered storage
– Active application data is on state-of-the-art disk arrays (e.g.
RAID10, high RPM speeds)
– Near-term older data is kept on less expensive disk arrays (e.g.
RAID5, middle RPM speeds)
– Old, long-term data is kept on very inexpensive SATA drives (e.g.
high volume, low RPM speeds) and never deleted
11. 11
Storage Strategies – Disk and IO Design
• The storage admin might give the DBA disk sized to
the volume s/he needs, but not the IO
– Schema design is key; not always a knob to turn
– Schema design (done poorly) often contributes to data bloat
through poor normalization and/or poor choice of data types
– Misaligned block sizes
– LUNs set up for serial IO transaction load, not large block reads
– Exchange best practices are carried forward to SQL Server even
when they don’t apply!
– OLTP and OLAP have conflicting needs (IO/sec versus MB/sec)
• Remedy of first resort is often “throw more hardware
at it”
12. 12
Storage Strategies – Personnel
• Applications need an overall architect!
• Personnel (storage admin, database admin) within a
company often don’t communicate
• Ensure that there’s one version of the “truth”
– Developers aren’t always thinking beyond the deadline
• Coding to business requirements, not performance
requirements
• Often thinking as in row-based mode
– Many companies could benefit from a database programmer role
• DBAs very often spend a LOT of time fixing the bad
code of other people
13. 13
Storage Strategies – SAN
• Other apps may suck up all the cache.
• Other apps may suck up all of the IO on the SAN
• DBAs often get the volume of storage they ask for,
but don't know to also ask for IO!
• Test your SAN to ensure it carries the load effectively
14. 14
Managing VLDBs, part 1
• Backup
– You have to get clever with VLDBs, using something like database
snapshots
– Date range backups; e.g. older data doesn’t change, so you don’t
need to back it up
– Partitioning, especially read-only partitions, can amplify your ability
to manage a SQL Server system – not only backups, but also
indexing and defragmentation
– Serialized backups may be better than parallel backups
– More thought needs to go into multi-database backup and recovery!
– More databases on a single instance means more IO contention
and shorter maintenance windows
• Indexing
– New on-line indexing opens up new capabilities; slower due to
the trickle effect, but on-line throughout the process
15. 15
Managing VLDBs, part 2
• Partitioning
– Multiple IO paths to the SAN
– Allows parallel indexing, backups, data cleanup operations, and
much more
– Be careful of updating statistics and DBCC commands. They run
per table across all partitions.
– Sexy!
• Transaction Processing
– Views can help mitigate transactions that have run amok
– Views work on all versions of SQL Server
– Partitioning is a big help in SQL Server 2005
• ETL
– Smart clustering can facilitate parallel loads, even to a single table
– Load jobs are long-running, single-threaded jobs – thus tied to a
processor – meaning you should run no more load jobs than you have CPUs.
17. 17
The Basics of I/O
1. A single fixed disk is inadequate except for the
simplest needs
2. Database applications require a Redundant Array of
Inexpensive Disks (RAID) for:
a. Fault tolerance
b. Availability
c. Speed
d. Different levels offer different pros/cons
18. 18
RAID Level 5
• Pros
– Highest Read data transaction rate; Medium Write data transaction rate
– Low ratio of parity disks to data disks means high efficiency
– Good aggregate transfer rate
• Cons
– Disk failure has a medium impact on throughput; Most complex controller
design
– Difficult to rebuild in the event of a disk failure (compared to RAID 1)
– Individual block data transfer rate same as single disk
19. 19
RAID Level 1
• Pros
– One Write or two Reads possible per mirrored pair
– 100% redundancy of data
– RAID 1 can (possibly) sustain multiple simultaneous drive failures
– Simplest RAID storage subsystem design
• Cons
– High disk overhead (100%)
– Cost
20. 20
RAID Level 10 (a.k.a. 1 + 0)
• Pros
– RAID 10 is implemented as a striped array whose segments are RAID 1
arrays
– RAID 10 has the same fault tolerance as RAID level 1
– RAID 10 has the same overhead for fault-tolerance as mirroring alone
– High I/O rates are achieved by striping RAID 1 segments
– RAID 10 array can (possibly) sustain multiple simultaneous drive failures
– Excellent solution for sites that would otherwise have gone with RAID 1
but need an additional performance boost
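As a rough illustration of the capacity overhead these RAID levels trade away (a sketch, not from the deck; the drive counts and sizes are invented):

```python
def usable_capacity(raid_level, n_drives, drive_gb):
    """Approximate usable capacity for the RAID levels discussed above."""
    if raid_level == 5:
        return (n_drives - 1) * drive_gb      # one drive's worth of parity
    if raid_level == 1:
        return drive_gb                       # a mirrored pair: 100% overhead
    if raid_level == 10:
        return (n_drives // 2) * drive_gb     # half the drives hold mirrors
    raise ValueError("unsupported RAID level")

# Six 72 GB drives: RAID 5 keeps 360 GB usable, RAID 10 only 216 GB --
# the price paid for mirroring's better write behavior and simpler rebuilds.
print(usable_capacity(5, 6, 72), usable_capacity(10, 6, 72))
```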
21. 21
SAN (Storage Area Network)
• Pros
– Supports multiple systems
– Newest technology matches RAID1 / RAID1+0 performance
• Cons
– Expense and setup
– Must measure for bandwidth requirements of systems, internal RAID,
and I/O requirements
24. 24
Monitoring Raw Disk Physical Performance
Avg. Disk sec/Read and Avg. Disk sec/Write
• Transaction Log Access
– Avg. Disk sec/Write should be <= 1 msec (with array
accelerator enabled)
• Database Access
– Avg. Disk sec/Read should be <= 15-20 msec
– Avg. Disk sec/Write should be <= 1 msec (with array
accelerator enabled)
• Remember checkpointing in your calculations!
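Those latency rules of thumb lend themselves to a quick scripted check (a minimal sketch; the function and variable names are mine, and PerfMon reports these counters in seconds):

```python
def check_latency(avg_sec_per_read, avg_sec_per_write, log_volume=False):
    """Flag volumes whose average latencies exceed the thresholds above."""
    problems = []
    if avg_sec_per_write > 0.001:        # writes should finish in <= 1 msec
        problems.append("write latency high")
    if not log_volume and avg_sec_per_read > 0.020:  # reads <= 15-20 msec
        problems.append("read latency high")
    return problems

print(check_latency(0.012, 0.0005))      # healthy database volume: []
print(check_latency(0.035, 0.004))       # both thresholds blown
```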
25. 25
Monitoring Raw I/O Physical Performance
1. Counters - Disk Transfers/sec, Disk Reads/sec, and
Disk Writes/sec
2. Calculate the number of transfers/sec for a single drive:
a. First divide the number of I/O operations/sec by
the number of disk drives
b. Then factor in appropriate RAID overhead
3. You shouldn't have more I/O requests (disk
transfers) per second per disk drive than:
8KB I/O Requests      10K RPM 9-72 GB    15K RPM 9-18 GB
Sequential Write      ~166               ~250
Random Read/Write     ~90                ~110
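Step 2a and the ceilings above combine into a simple calculation (a sketch; the ceilings come from the table, the workload numbers are invented):

```python
# Per-drive transfer ceilings for 8KB I/O requests, from the table above
MAX_TRANSFERS = {
    ("sequential_write", "10k_rpm"): 166,
    ("sequential_write", "15k_rpm"): 250,
    ("random_read_write", "10k_rpm"): 90,
    ("random_read_write", "15k_rpm"): 110,
}

def transfers_per_drive(disk_transfers_per_sec, n_drives):
    """Step 2a: spread the volume's total transfers across its drives."""
    return disk_transfers_per_sec / n_drives

per_drive = transfers_per_drive(840, 12)   # 70 transfers/sec per drive
print(per_drive, per_drive <= MAX_TRANSFERS[("random_read_write", "10k_rpm")])
```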
26. 26
Estimating Average I/O
1. Collect long-term averages of I/O counters (Disk
Transfers/sec, Disk Reads/sec, and Disk Writes/sec)
2. Use the following equations to calculate I/Os per second
per disk drive:
a. I/Os per sec per drive w/RAID 1 = (Disk Reads/sec + 2*Disk
Writes/sec) / (number of drives in volume)
b. I/Os per sec per drive w/RAID 5 = (Disk Reads/sec + 4*Disk
Writes/sec) / (number of drives in volume)
3. Repeat for each logical volume. (Remember
Checkpoints!)
4. If your values exceed the thresholds on the
previous slide, increase speed by:
a. Adding drives to the volume
b. Getting faster drives
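The two equations in step 2 translate directly to code (a sketch; the counter values are illustrative):

```python
def io_per_drive_raid1(reads_per_sec, writes_per_sec, n_drives):
    # RAID 1: every write lands on both mirrors, so writes count twice
    return (reads_per_sec + 2 * writes_per_sec) / n_drives

def io_per_drive_raid5(reads_per_sec, writes_per_sec, n_drives):
    # RAID 5: each write = read data + read parity + write data + write parity
    return (reads_per_sec + 4 * writes_per_sec) / n_drives

# Same workload on a ten-drive volume: RAID 5's write penalty is visible
print(io_per_drive_raid1(600, 200, 10))   # 100.0 I/Os per sec per drive
print(io_per_drive_raid5(600, 200, 10))   # 140.0 I/Os per sec per drive
```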
27. 27
Queue Lengths
1. Counters - Avg. Disk Queue Length and Current Disk
Queue Length
a. Avg Disk Queue <= 2 per disk drive in volume
b. Calculate by dividing queue length by number of drives in
volume
2. Example:
a. In a 12-drive array, max queued disk request = 22 and
average queued disk requests = 8.25
b. Do the math for max: 22 (max queued requests) divided by
12 (disks in array) = 1.83 queued requests per disk during
peak. We’re ok since we’re <= 2.
c. Do the math for avg: 8.25 (avg queued requests) divided by
12 (disks in array) = 0.69 queued requests per disk on
average. Again, we’re ok since we’re <= 2.
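The arithmetic from the example reproduces directly (a sketch using the slide's own numbers):

```python
def queued_per_disk(queue_length, n_drives):
    """Divide a volume's queue length by the number of drives servicing it."""
    return queue_length / n_drives

peak = queued_per_disk(22, 12)     # ~1.83 queued requests per disk at peak
avg = queued_per_disk(8.25, 12)    # ~0.69 queued requests per disk on average
print(peak <= 2 and avg <= 2)      # both within the ceiling of 2 per drive
```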
28. 28
Disk Time
1. Counters - % Disk Time (%DT), % Disk Read Time
(%DRT), and % Disk Write Time (%DWT)
a. Use %DT with % Processor Time to determine time spent
executing I/O requests and processing non-idle threads.
b. Use %DRT and %DWT to understand types of I/O performed
2. Goal is to have most time spent processing non-idle
threads (i.e. %DT and % Processor Time >= 90).
3. If %DT and % Processor Time are drastically different,
then there’s usually a bottleneck.
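The heuristic in points 2 and 3 can be sketched as a check (the 30-point gap threshold is my assumption, not from the deck):

```python
def io_cpu_balance(pct_disk_time, pct_processor_time, max_gap=30):
    """Both counters high and close together is healthy; a wide gap
    suggests a bottleneck on the lagging side. max_gap is an assumption."""
    if abs(pct_disk_time - pct_processor_time) > max_gap:
        return "bottleneck suspected"
    if pct_disk_time >= 90 and pct_processor_time >= 90:
        return "healthy"
    return "balanced but underutilized"

print(io_cpu_balance(95, 92))   # healthy
print(io_cpu_balance(98, 35))   # bottleneck suspected
```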
29. 29
Database I/O
1. Counters – Page Reads/sec, Page Requests/sec,
Page Writes/sec, and Readahead Pages/sec
2. Page Reads/sec
a. If consistently high, it may indicate low memory allocation or
an insufficient disk drive subsystem. Improve by optimizing
queries, using indexes, and/or redesigning database
b. Related to, but not the same as, the Reads/sec reported by
the Logical Disk or Physical Disk objects
3. Page Writes/sec: the ratio of Page Reads/sec to Page
Writes/sec typically ranges from 2:1 to 5:1 in OLTP
environments.
4. Readahead Pages/Sec
a. Included in Page Reads/sec value
b. Performs full extent reads of eight 8 KB pages (64 KB per read)
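Because Readahead Pages/sec is included in Page Reads/sec (point 4a), the demand reads caused by cache misses fall out by subtraction (a sketch; the counter values are illustrative):

```python
def cache_miss_reads(page_reads_per_sec, readahead_pages_per_sec):
    """Reads issued because the requested page was not in the buffer pool."""
    return page_reads_per_sec - readahead_pages_per_sec

print(cache_miss_reads(500, 320))   # 180 demand reads/sec
```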
30. 30
Tuning I/O
1. When bottlenecking on too much I/O:
a. Tuning queries (reads) or transactions (writes)
b. Tuning or adding indexes
c. Tuning fill factor
d. Placing tables and/or indexes in separate file groups on
separate drives
e. Partitioning tables
2. Hardware solutions include:
a. Adding spindles (reads) or controllers (writes)
b. Adding or upgrading drive speed
c. Adding or upgrading controller cache. (However, beware
write cache without battery backup.)
d. Adding memory or moving to 64-bit memory.
31. 31
Trending and Forecasting
1. Trending and forecasting is hard work!
2. Create a tracking table to store:
a. Number of records in each table
b. Amount of data pages and index pages, or space consumed
c. Track I/O per table using fn_virtualfilestats
d. Run a daily job to capture data
3. Perform analysis:
a. Export tracking data to Excel
b. Forecast and graph off of data in worksheet
4. Go back to step 2d and repeat
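Step 3's forecasting doesn't require Excel; here's a minimal sketch that fits a straight line to daily row counts captured by the tracking job (the data and names are invented):

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for a simple growth trend."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

days = [0, 1, 2, 3, 4]
rows = [100, 130, 160, 190, 220]          # thousands of rows captured daily
slope, intercept = linear_fit(days, rows)
print(slope, intercept + slope * 30)      # growth/day and a 30-day projection
```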
32. 32
Disk Rules of Thumb for Better Performance
1. Put SQL Server data devices on a non-boot disk
2. Put logs and data on separate volumes and, if possible, on
independent SCSI channels
3. Pre-size your data and log files; Don’t rely on AUTOGROW
4. RAID 1 and RAID1+0 are much better than RAID5
5. Tune TEMPDB separately
6. Create 1 data file (per filegroup) per physical CPU on the server
7. Create data files all the same size per database
8. Add spindles for read speed, controllers for write speed
9. Partitioning … for the highly stressed database
10. Monitor, tune, repeat…
33. 33
Quest Management Suite for SQL Server
• LiteSpeed for SQL Server Enterprise – Advanced
compression and encryption
– Efficiently manage SQL Server backup and recovery operations
• Spotlight on SQL Server Enterprise – Real-time
performance diagnostics
– Diagnose and resolve SQL Server performance Issues
• Change Director – 24x7 SQL Server change
tracking
– Track database changes on any SQL Server
• Capacity Manager for SQL Server – Automated
storage and resource planning
– Automate the process of capacity and resource planning
35. 35
Resources, VLDB
• Download my white papers at Quest.com, read my
blog at http://www.sqlmag.com and http://sqlblog.com
• See the full panel discussion from the PASS 2006
Summit at
http://www.quest.com/events/listdetails.aspx?ContentID=4688&site=&prod=&technology=&prodfamily=&loc=
• Project REAL at
http://www.microsoft.com/sql/solutions/bi/projectreal.mspx
• SQL Server Customer Advisory Blog at
http://blogs.msdn.com/sqlcat/
36. 36
Resources, Disk IO Tuning
• See my webcast and read my article in SQL Server
Magazine called ‘Bare Metal Tuning’ to learn about
file placement, RAID comparisons, etc.
• Check out www.baarf.com and
www.SQL-Server-Performance.com
• Storage Top 10 Best Practices at
http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/storage-top-10.mspx
37. 37
Call to Action – Next Steps
• Learn more about Quest Management Suite for SQL
Server: http://www.quest.com/sql_server/
– Download trials
– Read white papers
– Review case studies
• Email us with your questions: sales@quest.com
38. 38
Q & A
• Send questions to me at: kevin.kline@quest.com
• Send broader technical questions to: info@quest.com
• Send Sales questions to: sales@quest.com
• For sales & licensing questions, call:
– UK: +44.1628.518000
THANK YOU!
Editor's notes
A quick note of thanks to the participants in the original Quest discussion panel “Surviving the Data Avalanche” at the PASS 2006 Community Summit – Mark Sousa and Ward Pond of Microsoft, Darren Bennick of EMC, Allan Hirt of Avanade, Scott Himsel of HP, and Patrick O’Keeffe of Quest Software
Industry experts also agree. Quest continues to win significant industry awards from partners, press and market analysts who each underscore the prowess of Quest solutions and our market share leadership.
As an aside, Quest is also ranked #1 in Windows Management by IDC and Gartner
Used to be able to delete backup files. Now, in many cases, you cannot.
Recovery is often overlooked. Are you sure they work? Will they work within the limitations of your SLA? Are they routinely tested?
Many BI designers are normalizing their data warehouse. Bad!
Developers, and many other roles in an application rollout, are not talking with each other. Developers are often thrust into the role of database design. Sys admin has to do hardware layout.
4850 row-level locks is the threshold at which SQL Server escalates to a table lock. Manage to that threshold!
Many applications now have more than one database, not to mention MSDB and Master. You need to thoroughly test backup and especially recovery to ensure that these multi-db applications work properly when restored.
Partitioning – Microsoft is working on partition-only versions of update statistics and DBCC. You can work around this by switching a partition to its own table, running the command, and then switching it back in to the partition.
For ETL, remember 4850 row-level locks is the threshold at which SQL Server escalates to a table lock. Manage to that threshold!
Might use CPU-1 as the total number of load processes.
Each entire data block is written on a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads.
RAID Level 5 requires a minimum of 3 drives to implement
For Highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair.
RAID Level 1 requires a minimum of 2 drives to implement
Other Pros:
Twice the Read transaction rate of single disks, same Write transaction rate as single disks; Transfer rate per block is equal to that of a single disk
100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk
RAID Level 10 requires a minimum of 4 drives to implement.
Cons
Very expensive / High overhead
All drives must move in parallel to properly track, lowering sustained performance
Very limited scalability at a very high inherent cost
Note that RAID 0 + 1 is this diagram turned on its side. RAID 0 + 1 is excellent for logs.
Note that on Windows 2000 Server you must enable Diskperf –y to get physical disk counters. Otherwise, PerfMon will return values for the physical disk counters, but they’ll actually be logical values. This setting is enabled by default on Windows 2003 Server. It incurs a small overhead of perhaps 1-2%.
The Avg. Disk sec/Read and Avg. Disk sec/Write counters monitor the average number of seconds for read or write operations from or to a disk, respectively.
If your values significantly exceed those listed above for database access, you may want to increase the speed of your disk subsystem by either using faster drives or adding more drives to the system.
Checkpointing note: The above values will temporarily increase during periods of heavy I/O activity, such as during the checkpoint. When monitoring these values, take an average over a longer period of time, and/or monitor periods that do not contain a checkpoint.
Monitors the rate of i/o operations handled by the disk subsystem. Remember that with several drives allocated to a logical disk volume, the counters monitor the total number of disk transfers for the entire volume.
Note: With the array accelerator enabled, you may actually see substantially higher I/O per second per drive rates than those suggested in the table above. This is due to the array controller caching some of these I/Os.
This slide details how to estimate the average number of I/O requests per second for each disk drive.
With RAID 1, each write is duplicated onto a mirrored drive. Hence there are two-disk writes/sec.
With RAID 5, each write generates four I/O operations: reading the data block, reading the parity block, writing the data block, and writing the parity block. Hence there are four disk writes/sec.
Repeat these steps for each logical volume. If the values significantly exceed those suggested above, increase the speed of your disk subsystem by adding more or using faster drives. As the equations illustrate, RAID 0 has the lowest impact on performance but offers no data protection. RAID 5, on the other hand, slows performance but offers low-cost data protection. Disk Reads/sec and Disk Writes/sec counters can be used to determine an application’s read-to-write ratio. They can also be used to profile disk I/O at a lower level. The sum of these two counters should equal the Disk Transfers/sec value.
Note: The above values will temporarily increase during periods of heavy I/O activity, such as during the checkpoint. When monitoring these values, take an average over a longer period of time, and/or monitor periods that do not contain a checkpoint.
The Avg. Disk Queue Length and Current Disk Queue Length counters monitor both the average number and the instantaneous number of reads and writes queued for the selected disk.
Disk devices composed of multiple spindles, such as logical volumes configured on Smart Array controllers, will have several active requests at any point in time and several requests waiting for different disk drives. You therefore need to factor in the number of disks in the logical volume that are servicing the I/O requests.
For example, a twelve-drive array is facing a maximum of 22 queued disk requests and an average of 8.25 queued disk requests. Therefore, the array has 22/12=1.83 queued disk requests per disk drive during the peak, and 8.25/12=0.69 queued disk requests per disk drive on average.
You should not average more than two queued disk requests per disk drive. The Avg. Disk Read Queue Length and Avg. Disk Write Queue Length counters provide you with more insight into what type of I/O requests are being queued the most.
Remember that these values will temporarily increase under spikes of heavy I/O, such as during the checkpoint.
The % Disk Time, % Disk Read Time, and % Disk Write Time counters monitor the percentage of time spent servicing particular I/O requests during the sampling interval.
Use the % Disk Time counter in conjunction with the % Processor Time counter to determine the time the system spends executing I/O requests or processing non-idle threads. Use the % Disk Read Time and % Disk Write Time counters to gain a further insight into the type of I/O being performed.
Your goal is to have a high percentage of time being spent executing non-idle threads (high % Processor Time) AND executing I/O (high % Disk Time). On a highly optimized system, these counters consistently measure at over 90 percent.
If one of these counters reads substantially lower than the other, this usually indicates a bottleneck, and further investigation is necessary. With high % Disk Time, use the % Disk Read Time and % Disk Write Time counters to get the I/O breakdown. With high % Processor Time, use the % User Time and % Privileged Time to get further CPU utilization breakdown.
Page Reads/sec
The Page Reads/sec counter monitors the number of pages read from disk per second.
Depending on your environment, this counter may be high before your system reaches a steady state, and then gradually decrease. If your database fits entirely into memory, the counter should be zero. If the counter is consistently high, it may indicate low memory allocation or an insufficient disk drive subsystem. You may be able to reduce the number of Reads/sec by optimizing your queries, using indexes, and/or redesigning your database.
Page Reads/sec is related to, but not the same as, the Reads/sec reported by the Logical Disk or Physical Disk objects. Multiple pages can be read with a single logical or physical disk read. The number of physical reads and Page Reads should be roughly the same in OLTP environments.
Page Requests/sec
A Page Request occurs when SQL Server looks in the buffer pool for a database page. If the page is in the buffer pool, it can be processed immediately. If the page is not in the buffer pool, a Page Read is issued.
Page Writes/sec
Eventually all modified database pages have to be written back to disk. The Page Writes/sec counter reports the rate at which this occurs. The ratio of Page Reads/sec to Page Writes/sec typically ranges from 2:1 to 5:1 in OLTP environments. Most Business Intelligence applications perform few updates and, as a result, few Page Writes. Excessive Page Writes can be caused by insufficient memory or frequent checkpoints.
Readahead Pages/sec
The SQL Server storage architecture supports optimizations that allow SQL Server to determine in advance which database pages will be requested (read-ahead).
A full scan, in which every page of an index or table is read, is the simplest case. Read-ahead occurs when SQL Server issues the read request before the thread that is processing the query or transaction needs the page. Readahead Pages/sec is included in the Page Reads/sec counter. The number of read requests issued due to cache misses (the requested page was not found in the data cache) can be calculated by subtracting Readahead Pages/sec from Page Reads/sec.
SQL Server typically reads entire extents when performing read-aheads. All eight pages of an extent will be read with a single 64KB read. Read-aheads will cause the Avg. Disk Bytes/Read reported by the Logical Disk or Physical Disk object to be larger than 8KB. It is important to note that read-aheads are performed in sequential order, allowing much higher throughput than random accesses.
Do not put SQL Server data devices on the boot disk
Put logs on a RAID 1 on an independent SCSI channel
Put data on a RAID 5 on an independent SCSI channel
If read disk queuing is high on your data device (avg 2 or higher per spindle) put non-clustered indexes in a new filegroup on a RAID 5 on an independent SCSI channel
If tempdb is stressed (consistent blocking in dbid 2 is one common indicator) and you cannot redesign to relieve the stress put tempdb on a RAID 1+0 on an independent SCSI channel and the tempdb log on yet another RAID 1+0 on an independent channel.
If your database holds highly sensitive data consider RAID 1+0 over RAID 5.
Avoid unnecessary complexity (KISS).
With thanks to Bill Wunder and his article on the SIGs at the PASS website (http://sigs.sqlpass.org/Resources/Articles/tabid/35/ctl/ArticleView/mid/349/articleId/58/Default.aspx).
Bare Metal Tuning facts, for example:
The commodity platform was RAID5
Extra spindles added 1.4% each over baseline.
RAID1 & RAID1+0 each provided about a 319% boost over baseline
Extra spindles added 5% each over baseline