This document covers optimizing cloud applications with RightScale: monitoring applications using tools like collectd and New Relic RPM, and optimizing database performance on cloud infrastructure by scaling instances vertically and horizontally, using the right indices, and ensuring the working set fits in memory.
Optimizing Your Cloud Applications in RightScale
1. Optimizing Your Cloud Applications in RightScale. Rafael H. Saavedra, VP Engineering, RightScale. June 8th, 2011
2. Agenda
- Introduction
- 3-tier application architecture
- Vertical & horizontal scaling
- RightScale monitoring and cluster graphs
- New Relic RPM
- Support for optimizing DB performance
- Miscellaneous
3. Cloud computing characteristics
- Multi-tenancy
- Shared resource pooling
- Geo-distribution and ubiquitous network access
- Service oriented
- Dynamic resource provisioning
- Self-organizing
- Utility-based pricing
4. Cloud computing advantages
- No upfront investment
- Lower operating costs
- Highly scalable
- Easy access
- Reduced business risk and maintenance costs
9. Server arrays provide horizontal scaling
- The array scales up or down based on performance votes
- Tags allow scaling on an arbitrary decision set
- Decision threshold controls reaction time
- Sleep time allows new resources to have an impact
- Scaling can be time dependent
- Fast response to changes in load conditions using alerts
- Allocation of servers to availability zones based on weights
- Deployment-based, so configuration is consistent
- Arrays can be pre-scaled to support anticipated demand
- Detailed setup instructions: http://bit.ly/c1oLr2
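To make the vote mechanics concrete, here is a minimal sketch of how a vote-based scaling decision could work. The function name, parameters, and thresholds are invented for illustration; RightScale's actual decision logic is more involved.

```python
def scaling_decision(votes, decision_threshold=0.51, min_servers=2,
                     max_servers=20, current=4, cooling_down=False):
    """Decide the next array size from per-server votes.

    votes: list with one entry per server: "grow", "shrink", or None.
    decision_threshold: fraction of servers that must agree (illustrative).
    cooling_down: True while inside the resize "sleep time", so newly
    launched servers get a chance to have an impact before resizing again.
    """
    if cooling_down or not votes:
        return current
    grow = votes.count("grow") / len(votes)
    shrink = votes.count("shrink") / len(votes)
    if grow >= decision_threshold and current < max_servers:
        return current + 1          # scale up one server at a time
    if shrink >= decision_threshold and current > min_servers:
        return current - 1          # scale down, respecting the floor
    return current

# Three of four servers vote to grow, so the array expands.
print(scaling_decision(["grow", "grow", "grow", None], current=4))  # 5
```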
10. Cluster monitoring
- Individual graphs: good for a dozen servers; display all standard graphs with full detail
- Stacked graphs: display the contribution of many servers to a total; great to see the sum and variability of activity in a cluster; difficult to make out individual servers. Examples: requests/sec, CPU busy cycles, I/O bytes/sec
- Heat maps: display a bar for each server; great to see uneven distribution across servers and to quickly spot performance problems across many servers; difficult to read absolute values or see the total cluster activity
16. Cluster monitoring architecture
- Monitoring front-end servers pull data from monitoring storage servers
- Up to 100 servers on one graph (to be increased)
Diagram components: monitoring storage servers, monitoring front-end servers, your servers
23. New Relic RPM: 3 examples
- An expensive query
- The N+1 query problem
- Finding patterns in similar requests
24. Optimizing DB performance
- RightScale MySQL ServerTemplates: configuration files tailored to instance size (innodb_buffer_pool_size, key_buffer_size, thread_cache_size, sort_buffer_size)
- The never-ending task of identifying current bottlenecks: disk seeks, performance of disk operations
- Scale up when the working set cannot fit in memory; avoid active swapping
- Constant monitoring of performance graphs, logs and queries
- Schema considerations
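As an illustration of size-tailored configuration, a my.cnf fragment for a large instance might look like the following. The values are examples chosen for a machine with roughly 7.5 GB of RAM, not RightScale's actual template settings.

```ini
# my.cnf fragment -- illustrative values for an instance with ~7.5 GB RAM
[mysqld]
innodb_buffer_pool_size = 5G    # bulk of RAM goes to the InnoDB working set
key_buffer_size         = 128M  # MyISAM index cache
thread_cache_size       = 16    # reuse threads across connections
sort_buffer_size        = 2M    # per-connection sort buffer; keep it small
```

On a smaller instance every one of these would shrink proportionally, which is why the templates ship one configuration file per instance size.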
25. Schema considerations
- Lookups need to be indexed
- Sorting requires an index
- Joins need to be done on indices; they become slower as tables grow
- Compound indices should be used consistently
- Do not abuse indices: each index requires a disk write
- Compact tables if they become fragmented: deleted rows do not remove the corresponding index entries
26. Monitoring DB performance
- Standard collectd statistics: user vs. wait time (disk operations), performance of disk operations; scale up when the working set cannot fit in memory
- MySQL collectd plugin: monitor INSERT, SELECT, UPDATE operations; the breakdown of read operations can indicate missing indices
- Monitor the /var/log/mysql-slow.log file: identify slow queries; use the MySQL EXPLAIN command to see the query plan
27. MySQL collectd plugin
- Uses the MySQL SHOW STATUS command to collect statistics
- A large set of counters divided into 10 categories: Connections, IO Requests, Select Rates, Read Rates, Key Rates, Command Rates, Query Cache, Tables, Memory, Misc.
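Most SHOW STATUS values are cumulative counters, so a plugin like this has to diff two samples to get per-second rates. A sketch of that arithmetic (the counter names mirror MySQL's; the function itself is invented, and the 20-second interval matches the monitoring system's sampling period):

```python
def counter_rates(prev, curr, interval_s):
    """Turn two SHOW STATUS snapshots (name -> cumulative count)
    into per-second rates over the sampling interval."""
    rates = {}
    for name, value in curr.items():
        delta = value - prev.get(name, 0)
        if delta < 0:          # counter reset, e.g. after a server restart
            delta = value
        rates[name] = delta / interval_s
    return rates

prev = {"Com_select": 1000, "Com_insert": 200}
curr = {"Com_select": 1600, "Com_insert": 260}
print(counter_rates(prev, curr, 20))  # {'Com_select': 30.0, 'Com_insert': 3.0}
```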
29. mysql-slow.log & EXPLAIN command

# Time: 101006 23:30:11
# User@Host: prod[prod] @ domU-12-31-39-0F-D0-C1.compute-1.internal [10.193.211.47]
# Query_time: 7  Lock_time: 0  Rows_sent: 1  Rows_examined: 19785
SELECT * FROM `ec2_elastic_ips` WHERE (`ec2_elastic_ips`.ec2_instance_id = 6810144) LIMIT 1;

mysql> EXPLAIN SELECT * FROM `ec2_elastic_ips` WHERE (`ec2_elastic_ips`.ec2_instance_id = 6810144) LIMIT 1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ec2_elastic_ips
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 33332
        Extra: Using where
1 row in set (0.00 sec)
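Slow-log entries like the one above can be filtered programmatically. A small sketch that pulls Query_time out of the header comments (it assumes the header format shown here; real logs vary across MySQL versions, and the function name is invented):

```python
import re

HEADER = re.compile(r"# Query_time: (\d+(?:\.\d+)?)\s+Lock_time: (\d+(?:\.\d+)?)")

def slow_queries(lines, threshold_s=5.0):
    """Yield (query_time, sql) pairs for entries slower than threshold_s."""
    query_time = None
    for line in lines:
        m = HEADER.match(line)
        if m:
            query_time = float(m.group(1))
        elif query_time is not None and not line.startswith("#"):
            if query_time >= threshold_s:
                yield query_time, line.strip()
            query_time = None  # one SQL line per entry in this sketch

log = [
    "# Time: 101006 23:30:11",
    "# User@Host: prod[prod] @ host [10.193.211.47]",
    "# Query_time: 7 Lock_time: 0 Rows_sent: 1 Rows_examined: 19785",
    "SELECT * FROM `ec2_elastic_ips` WHERE (ec2_instance_id = 6810144) LIMIT 1;",
]
print(list(slow_queries(log)))  # one entry with query_time 7.0
```

Each query flagged this way is then a candidate for EXPLAIN, as in the session above.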
30. MySQL performance depends on locality
- Wait time should be minimal when the working set fits in memory: user time dominates and wait time is insignificant
- Performance degrades once wait time becomes significant
31. MySQL reads graphs
- Read-random-next (the Handler_read_rnd_next counter) represents a table scan
- Read-next (the Handler_read_next counter) represents an index scan
32. Misc: load testing using httperf
- RightScale provides ServerTemplates in the marketplace: https://my.rightscale.com/library/server_templates/Httperf-Load-Tester-11H1/18316
- Tutorial on httperf setup and configuration: http://support.rightscale.com/03-Tutorials/02-AWS/E2E_Examples/E2E_Gaming_Deployment/Adding_Httperf_Load_Tester
The cluster monitoring is very powerful in that it provides different types of views into the operation of large clusters of servers.
Walk-through of how it works: in any deployment, go to the Monitoring tab, select servers, and select a metric to plot. Familiar controls switch the time period and graph size. This displays one graph per server, here core1.rightscale.com through core8.rightscale.com. In this example the graphs show CPU utilization for the past week, where blue is busy time and green is idle.
Individual graphs only work for so many servers, and they don't show what is happening in aggregate. Stacked graphs stack the contribution of each server on top of one another. Walk through what the graph shows.
Stacked graphs are great to see the aggregate, but it is often difficult to see abnormal server behavior. Heat maps show many servers on one graph by plotting one horizontal bar per server. The time axis is the same for all servers and is shown at the bottom of the graph. The color of the bar shows the value of the metric for that server. Walk through the graph: it's easy to see that there are 6 servers sharing the load and two servers that behave differently.
At scale, this is how it all looks and comes together. This example is real: it shows an incident we had with our monitoring cluster a few months ago. This heat map shows 100 servers out of one of our monitoring clusters (we want to be vague here…). When there are more than 100 servers, the heat map shows a sampling of 100. Describe the sampling: most recently launched, longest running, some of each ServerTemplate, rest random. Story: this heat map plots I/O wait for our monitoring servers on a day when we suddenly received a number of alerts for a few servers. The heat map shows these servers clearly as red bands starting between 7am and 8am. So we could clearly see that something was going on with a small number of servers, and that it started at more or less the same time on all of them. To see what happened in aggregate, we can switch graph type…
This shows the same incident as on the previous slide, but with a timescale of a week. It shows the number of servers handled by each monitoring server, i.e. each color band represents one server. It is easy to see that some customer launched a large number of servers right at the time the overload began. Further investigation showed that, due to a bug, these servers were allocated unevenly across the cluster, causing the overload.
The architecture behind the cluster monitoring is rather extensive. Customer (i.e. your) servers send monitoring data every 20 seconds to our servers. The data points are cached in memory on those servers and flushed to disk periodically. Cluster monitoring graphs are produced on separate front-end servers, which pull the data from over 100 monitoring storage servers. The graphs are produced using rrdtool and auto-refresh.