June 21, 2009 Hanging By a Thread: Using Capacity Planning to Survive  Session 2240 Surf F 08:00 Wednesday  Paul O’Sullivan
Topics Up for Discussion
Introduction
Current State of Performance Analysis and Capacity Planning
Issues
Issues (continued)
Falling hardware costs
OK anyone can complain….
Capacity Planning: Oracle RAC on Itanium Linux
A Sample Study: Oracle RAC Capacity Planning
RAC Node CPU Utilizations, July-Sept 2008
Selection of Peak Benchmark Load
CPU by Image / Disk I/O Rate
CPU Utilization by Core: reasonable core load balance at heavy loads.
Overall Disk I/O Rates
Overall Disk Data Rate
Disk Response  Times
Memory Allocation
eCAP Workload Definition
Workload Characteristics

Primary Response Time Components (chart): CPU and Disk I/O for oracleNDSPRD1, oracleLockProcs, oracleProcs, asmProcs

Workload Class    Process Count  Multi-Processing Level  Process Creation Rate (/sec)  CPU Utilization  Disk I/O Rate (/sec)
oracleNDSPRD1     1110           547.1                   0.925                         73%              639
oracleLockProcs   8              3.2                     0.007                         5%               277
oracleWorkProcs   46             31.8                    0.038                         1%               14
ASM processes     20             9.7                     0.017                         0.2%             10
daemons           6              2.4                     0.005                         0.05%            4
data collector    1              0.4                     0.001                         0.3%             26
root processes    1161           266.0                   0.968                         3%               233
other processes   774            47.5                    0.645                         2%               311
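As a quick cross-check of the workload characteristics, the per-class figures can be totalled. This is a minimal sketch: the numbers are copied from the slide, the names `WORKLOADS`, `totals` and `cpu_share` are illustrative, and it assumes the per-class CPU utilization percentages are shares of the same total capacity and can simply be added, which the slide does not state.

```python
# Figures copied from the Workload Characteristics slide.
# Tuple layout: (process count, CPU utilization in %, disk I/O rate per second)
WORKLOADS = {
    "oracleNDSPRD1":   (1110, 73.0, 639),
    "oracleLockProcs": (8, 5.0, 277),
    "oracleWorkProcs": (46, 1.0, 14),
    "ASM processes":   (20, 0.2, 10),
    "daemons":         (6, 0.05, 4),
    "data collector":  (1, 0.3, 26),
    "root processes":  (1161, 3.0, 233),
    "other processes": (774, 2.0, 311),
}


def totals(workloads):
    """Return (total CPU %, total disk I/O per second) across all classes.

    Assumption: the per-class CPU percentages are additive.
    """
    total_cpu = sum(cpu for _, cpu, _ in workloads.values())
    total_io = sum(io for _, _, io in workloads.values())
    return total_cpu, total_io


def cpu_share(workloads, name):
    """Fraction of the measured CPU consumed by one workload class."""
    total_cpu, _ = totals(workloads)
    return workloads[name][1] / total_cpu


total_cpu, total_io = totals(WORKLOADS)
print(f"Total CPU: {total_cpu:.2f}%  Total disk I/O: {total_io}/sec")
print(f"oracleNDSPRD1 share of CPU: {cpu_share(WORKLOADS, 'oracleNDSPRD1'):.1%}")
```

Under that additivity assumption the classes come to roughly 85% CPU and about 1,500 disk I/Os per second, with oracleNDSPRD1 accounting for most of the CPU.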
Current System Response Time Curve (headroom: 9%)
Current System Headroom (headroom: 9%; capacity: 100%)
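The slides do not show the model behind the response time curve or the headroom figure. As a rough illustration of why response time climbs steeply near saturation, the sketch below substitutes a textbook single-server M/M/1 approximation, R = S / (1 - U), which is not the model used in the study. The function names, the utilization ceiling and all input values are assumptions made for illustration only.

```python
def response_time(service_time, utilization):
    """M/M/1 approximation: R = S / (1 - U).

    A stand-in for illustration, not the model used in the study.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


def headroom(current_utilization, ceiling_utilization):
    """Fractional workload growth possible before utilization hits the ceiling.

    Assumes utilization scales linearly with workload.
    """
    if current_utilization <= 0.0:
        raise ValueError("current utilization must be positive")
    return ceiling_utilization / current_utilization - 1.0


# Hypothetical inputs, not taken from the study.
for u in (0.50, 0.80, 0.90, 0.95):
    print(f"U={u:.0%}  R={response_time(1.0, u):.1f} x service time")
print(f"Headroom from 80% to a 90% ceiling: {headroom(0.80, 0.90):.1%}")
```

The point of the curve is that a system already running hot has little room left: small increases in load near the knee produce large increases in response time.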
Findings - Current System
Platform Alternatives (3 or 4 nodes)
Configuration must support 200% workload growth
Response Time vs Workload Growth, 3-node RAC
Note: CPU is the primary resource bottleneck; disk and memory will support 200% growth
Response Time  vs  Workload Growth 4-node RAC
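A back-of-the-envelope version of the growth requirement can be sketched as follows. It assumes that "200% growth" means three times the current load, that CPU demand scales linearly with workload, and that added cores are of the same speed as the current ones; none of these is stated in the slides, and real RAC scaling is not linear. The function names and example inputs are hypothetical.

```python
import math


def required_capacity_factor(current_utilization, growth, ceiling_utilization):
    """Capacity multiple needed so the grown load stays at or below the ceiling.

    growth is fractional: 2.0 means 200% growth, i.e. three times the load.
    Assumes CPU demand scales linearly with workload and with capacity.
    """
    if not 0.0 < ceiling_utilization <= 1.0:
        raise ValueError("ceiling utilization must be in (0, 1]")
    grown_demand = current_utilization * (1.0 + growth)
    return grown_demand / ceiling_utilization


def cores_needed(current_cores, current_utilization, growth, ceiling_utilization):
    """Whole same-speed cores needed after growth, rounded up."""
    factor = required_capacity_factor(
        current_utilization, growth, ceiling_utilization
    )
    # Round before ceil so floating-point noise cannot add a spurious core.
    return math.ceil(round(current_cores * factor, 9))


# Hypothetical example: 16 cores at 50% busy, 200% growth, 75% ceiling.
print(cores_needed(16, 0.50, 2.0, 0.75))
```

Because the candidate platforms differ in per-core speed, a sketch like this cannot reproduce the core counts in the study; it only shows the shape of the sizing arithmetic.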
Qualifying Platforms
Response Time  vs  Workload Growth (reduced core, 3-node configurations)
Response Time  vs  Workload Growth (reduced core, 4-node configurations)
Optimized Configurations

Platform                               3-node  4-node
Sun SPARC Enterprise M8000 (2.4 GHz)   32      24
HP rx8640 (1.6 GHz, 25 MB L3 cache)    30      24
IBM p 570 (2.2 GHz, Power 5)           26      20
IBM p 570 (4.7 GHz, Power 6)           12      10

Final choice based on cost and management issues.
Performance Analysis: SQL Server on HP Blades and EVA
Performance Analysis 1
Performance Analysis
Hardware Configuration
Initial Analysis
IO Rates: Not really high IO counts these days….
Disk Response Time: Very high D: drive response time….
IO Sizes: Very high D: drive response time….
Process-based IO Rates: SQL Server process generating all the IO. Obviously, something wrong with the application, right?
SQL Server Memory: 1.7 GB. Excuse me? But the server has 24 GB of memory.
SQL Server paging: Soft paging into the free list.
SQL Server paging: Soft paging into the free list; huge IO load generated as data is moved to and from the SQL Server process.
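The memory observation lends itself to a simple automated check: compare what the database process actually holds with what the server has installed. The sketch below uses the two figures from the slides (1.7 GB held, 24 GB installed); the function names and the 50% threshold are hypothetical choices for illustration, not a rule from the presentation.

```python
def memory_fraction(process_gb, physical_gb):
    """Fraction of physical memory held by a process."""
    if physical_gb <= 0:
        raise ValueError("physical memory must be positive")
    return process_gb / physical_gb


def looks_memory_starved(process_gb, physical_gb, min_fraction=0.5):
    """True when a dedicated database process holds less than min_fraction of RAM.

    min_fraction is an illustrative threshold, not a vendor recommendation.
    """
    return memory_fraction(process_gb, physical_gb) < min_fraction


# Figures from the slides: SQL Server holding 1.7 GB on a 24 GB server.
print(f"{memory_fraction(1.7, 24):.1%} of physical memory in use by the process")
print(looks_memory_starved(1.7, 24))
```

A check like this does not explain why the process is short of memory, but it turns the "Excuse me?" moment into something that can be run routinely against every database server.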
So what happened?
Production: IO Before
Production: IO After
Production: IO Q Before
Production: IO Q After
Production: Disk Busy Q Before
Production: Disk Busy Q After. HUGE reduction in disk busy.
Result
Lessons
So what do we need?
Want to know more?


Editor's notes

  1. Note: I have experience of the other side of the fence, from an (almost adversarial) Compaq/DEC background.
  2. Server numbers peaked 2005-2007. Windows/Blades/Virtualisation. All platforms (worse with Solaris x86). Not seen as a value add.
  3. CP and Performance Specialists are almost extinct. Replaced by ITIL Capacity Management Specialists: not the same thing! In 99% of cases CP sits only under infrastructure budgets, not aligned to the business. Experience with ITIL: good for developing processes, bad for developing budget. It suits management not to have an overall department with responsibility for Infrastructure and Applications.
  4. CP and Performance Specialists are almost extinct. Replaced by ITIL Capacity Management Specialists: not the same thing! In 99% of cases CP sits only under infrastructure budgets, not aligned to the business. Experience with ITIL: good for developing processes, bad for developing budget. It suits management not to have an overall department with responsibility for Infrastructure and Applications.
  5. This was for a 4-way Sybase server, a workload that today could be handled by a single blade server on the end of a SAN. Point here: with a server costing so much you NEED to make sure that it is correctly sized. Today you get better performance for less than ¼ of the price. Is that why many sites have 4x the servers?
  6. This was for a 4-way Sybase server, a workload that today could be handled by a single blade server on the end of a SAN. Point here: with a server costing so much you NEED to make sure that it is correctly sized. Today you get better performance for less than ¼ of the price. Is that why many sites have 4x the servers?
  7. Clearly, something odd is happening here
  8. Clearly, something odd is happening here
  9. Server was a BL460c with 4 Gb FC cards and 24 GB of memory.
  10. Asked the question: what was the EVA configuration? EVA6000, 300 GB 15k drives, 96 disks. Shared. Modelled the EVA to confirm issues….
  11. Ah, first clue, large sizes of IO 80,000kB/sec = 8000MB size, 8Gb xfers !!!!!
  12. All SQL Server, mostly during on-line day.
  13. SQL Server has 1.7 GB, is Enterprise Edition, and SQL Server memory has been set to use all the memory it can.
  14. So, what happens when SQL cannot get enough memory – it will soft fault…
  15. So, what happens when SQL cannot get enough memory – it will soft fault…
  16. ALL SQL servers had this issue. Looks like the customer forgot to implement the feature…. But what happened next?
  17. So, what happens when SQL cannot get enough memory – it will soft fault…
  18. So, what happens when SQL cannot get enough memory – it will soft fault…
  19. So, what happens when SQL cannot get enough memory – it will soft fault…
  20. So, what happens when SQL cannot get enough memory – it will soft fault…
  21. We put the change on a stress test system
  22. We put the change on a stress test system
  23. Since this work, the fix went onto another SQL Server: disk read queue of 34m peak down to 300. The analysis wasn't hard to do; just no-one had done it before.
  24. ALL SQL servers had this issue. But what happened next?
  25. To start with, just getting decent performance data was a problem. Then came the issue of logging into each system and looking at the graphs. Then came the issue of looking at 100s of systems. Then came the issue of modelling.
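The "looking at 100s of systems" problem in the last note is the kind of thing a small script can take over from a person reading graphs. The sketch below assumes the performance data has already been collected into a mapping of system name to a list of disk queue length samples; the data layout, system names, sample values and threshold are all hypothetical.

```python
def peak_by_system(samples):
    """Map each system to its peak sample, skipping systems with no data."""
    return {name: max(values) for name, values in samples.items() if values}


def flag_systems(samples, threshold):
    """Return (system, peak) pairs whose peak exceeds threshold, worst first."""
    peaks = peak_by_system(samples)
    flagged = [(name, peak) for name, peak in peaks.items() if peak > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical disk queue length samples per system.
SAMPLES = {
    "sql01": [12, 340, 95],
    "sql02": [3, 8, 5],
    "sql03": [40, 1200, 610],
    "sql04": [],  # no data collected
}

for name, peak in flag_systems(SAMPLES, threshold=300):
    print(f"{name}: peak disk queue {peak}")
```

Ranking by peak keeps the output short enough to act on: only the systems that cross the threshold are listed, so nobody has to log into each one to find the outliers.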