This document discusses setting performance goals to optimize existing applications. It recommends defining goals like 95th percentile response times for different types of requests and measuring these goals over short intervals like every 5 minutes. The goals should focus on important user interactions and prioritize the most critical performance problems first. Instrumenting production systems to collect response time data can help understand where to optimize and ensure the goals are being met for all users.
1. Goal Driven Performance Optimization
Highload++,
October 25-26, 2010
Moscow, Russia
Peter Zaitsev
Percona Inc
2. What is this all about?
• The first step to successful performance optimization is setting the right goals
• In most cases goals are not set (or are unclear) and a lot of resources are wasted on unimportant things
• This presentation is about setting the right goals and using them to optimize the performance of an existing system
Goal Driven Performance Optimization
3. When is it Applicable?
• Optimizing Performance for Existing Applications
• Can be used with load testing for scaling an application and testing new features
• A way to implement monitoring and spot problems before users start to complain
4. Understanding Performance
• Latency/Response Time
– Always Important
– Tolerance can be very different
• 50ms for an Ajax request
• 30 minutes for a report
• Throughput
– Often important for multi-user systems
– e.g. the system can do 1000 transactions/second
5. Throughput/Latency Relation
• Response time tends to increase with throughput
– When the system is overloaded, response time goes to infinity
• Call Center analogy
– Fewer people servicing calls = better utilization
• Same as throughput per person
– More people servicing calls = better response time
• Calls spend less time waiting in the queue
• Classical Performance Optimization Goal
– Maximizing throughput/utilization while keeping response time within guidelines
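The throughput/latency relation can be illustrated with a small sketch. It assumes the textbook M/M/1 queueing model (a single server with random arrivals), which the slides do not name; the 10 ms service time is a made-up value and `mean_response_time` is a hypothetical helper.

```python
def mean_response_time(service_time_s: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - rho).

    Assumption: the system behaves like a single-server queue with
    random arrivals. Real systems differ, but the shape is similar.
    """
    if utilization < 0.0:
        raise ValueError("utilization cannot be negative")
    if utilization >= 1.0:
        # At or beyond saturation the queue grows without bound.
        return float("inf")
    return service_time_s / (1.0 - utilization)


# Hypothetical 10 ms service time at increasing utilization levels.
for rho in (0.10, 0.50, 0.90, 0.99):
    ms = mean_response_time(0.010, rho) * 1000
    print(f"utilization {rho:.0%}: mean response time {ms:.0f} ms")
```

Under this model, pushing utilization from 50% to 99% multiplies response time by 50, which is why the goal is stated as a trade-off rather than as maximum throughput.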
6. Response Time Metrics
• Average/Median Response Time
– Not a good metric for adequate performance
– Same as the average patient temperature in a hospital
– Can be helpful for historical trending
• Maximum Response Time
– Good in theory: we want no requests taking longer than X
– Hard to work with in practice: some requests will always take too long
• Define percentile response time goals
– 95% of requests serviced within 500ms
– 99% of requests serviced within 1000ms
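A percentile goal like the ones above can be checked with a few lines of code. This is a minimal sketch using the nearest-rank definition of a percentile (one of several common definitions); `percentile` and `meets_goal` are hypothetical helper names and the sample data is invented.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_goal(samples, pct, limit_s):
    """True if pct percent of requests were serviced within limit_s."""
    return percentile(samples, pct) <= limit_s


# Invented data: 95 fast requests and 5 slow ones (seconds).
response_times = [0.1] * 95 + [2.0] * 5
print(meets_goal(response_times, 95, 0.5))  # 95% within 500 ms?
print(meets_goal(response_times, 99, 1.0))  # 99% within 1000 ms?
```

The same data passes one goal and fails the other, which is why the slide defines more than one percentile.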
7. Alternative Measurements
• 95th percentile response time is hard/expensive to compute in SQL
– Can use other metrics
• APDEX
– http://en.wikipedia.org/wiki/Apdex
• Portion of requests where response time is within the target
– SUM(response_time<0.5)/COUNT(*)
– Returning 0.95 is the same as a 95% response time of 0.5 sec
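Both alternative metrics are cheap to compute in a single pass. The sketch below mirrors the SQL expression above and the standard Apdex formula (satisfied requests count fully, requests between T and 4T count half); the function names and sample data are illustrative assumptions.

```python
def fraction_within(samples, threshold_s):
    """Share of requests faster than the threshold.
    Mirrors SUM(response_time < 0.5) / COUNT(*)."""
    return sum(1 for t in samples if t < threshold_s) / len(samples)


def apdex(samples, target_s):
    """Apdex score for target T: (satisfied + tolerating / 2) / total,
    where satisfied means t <= T and tolerating means T < t <= 4T."""
    satisfied = sum(1 for t in samples if t <= target_s)
    tolerating = sum(1 for t in samples if target_s < t <= 4 * target_s)
    return (satisfied + tolerating / 2) / len(samples)


# Invented data: 90 fast, 6 tolerable, 4 frustrating requests (seconds).
response_times = [0.1] * 90 + [1.0] * 6 + [5.0] * 4
print(fraction_within(response_times, 0.5))
print(apdex(response_times, 0.5))
```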
8. Even Response Time
• A 95% response time goal measured over a whole day allows your system to be unresponsive for about an hour every day
– i.e. extremely bad performance while taking a backup
• You want to ensure there are no stalls/performance dips.
• If a page loads slowly, the user presses reload, and it then loads quickly, that is OK: there are always network glitches.
• Define your performance goals at short intervals.
– Goals should be met in ALL 5-minute intervals.
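Checking the goal per interval rather than per day can be sketched as follows. The input format (pairs of Unix timestamp and response time) and the function name `failing_intervals` are assumptions made for illustration; the percentile uses the nearest-rank definition.

```python
import math
from collections import defaultdict


def failing_intervals(requests, pct=95, limit_s=0.5, interval_s=300):
    """requests: iterable of (unix_timestamp, response_time_s).
    Returns the sorted start times of intervals that miss the goal."""
    buckets = defaultdict(list)
    for ts, rt in requests:
        buckets[int(ts // interval_s) * interval_s].append(rt)

    failing = []
    for start in sorted(buckets):
        ordered = sorted(buckets[start])
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        if ordered[rank - 1] > limit_s:
            failing.append(start)
    return failing


# Invented hour of traffic: one request per second, 3-minute stall.
hour = [(t, 3.0 if 600 <= t < 780 else 0.1) for t in range(3600)]
print(failing_intervals(hour, interval_s=3600))  # goal checked per hour
print(failing_intervals(hour, interval_s=300))   # goal checked per 5 min
```

In this example the stall covers exactly 5% of the hour, so the hourly check passes while the 5-minute check flags the interval that contains the stall.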
9. Even Response Time math
• If you can only work with long intervals you can define stricter performance goals
– A 99.9% metric over a day means a 2 min stall will affect it
• 86400/1000 ~= 86 (sec), assuming uniform traffic
• The longer the acceptable response time, the larger the intervals you can have
– If a 1 min response time is acceptable in 99% of cases, a 1 hour check interval should be enough
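The arithmetic on this slide generalizes to any interval and percentile. The sketch below computes how long a complete stall could last without violating the goal, under the same uniform-traffic assumption; `hidden_stall_seconds` is a hypothetical name.

```python
def hidden_stall_seconds(interval_s, percentile):
    """Longest complete stall that a percentile goal can hide inside
    one measurement interval, assuming uniform traffic."""
    return interval_s * (100 - percentile) / 100


print(hidden_stall_seconds(86400, 99.9))  # 99.9% over a day
print(hidden_stall_seconds(86400, 95))    # 95% over a day
print(hidden_stall_seconds(3600, 99))     # 99% over an hour
print(hidden_stall_seconds(300, 95))      # 95% over 5 minutes
```

Any stall longer than the returned value pushes the slow requests past the allowed share and shows up as a missed goal.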
10. Response Time and an Object
• Not all pages are created equal
• Complexity and user requirements differ
• Ajax Pop Ups
– 50ms
• Profile Page Generation
– 150ms
• Search
– 300ms
• Site Usage Report
– 1000ms
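Per-object goals like these can be kept in a small table and evaluated per request type. In this sketch the dictionary keys are invented labels for the four examples above, and `share_within_goal` is a hypothetical helper; the resulting shares can then be compared with whatever percentile target applies (for example 0.95).

```python
# Response time goals in milliseconds, taken from the examples above.
# The keys are made-up labels, not names from any real application.
GOALS_MS = {
    "ajax_popup": 50,
    "profile_page": 150,
    "search": 300,
    "site_usage_report": 1000,
}


def share_within_goal(samples_ms_by_type):
    """Map each request type to the fraction of its requests that
    finished at or under that type's own goal."""
    result = {}
    for req_type, samples in samples_ms_by_type.items():
        goal = GOALS_MS[req_type]
        result[req_type] = sum(1 for ms in samples if ms <= goal) / len(samples)
    return result


# Invented measurements in milliseconds.
observed = {"search": [100, 200, 400, 250], "ajax_popup": [10, 20]}
print(share_within_goal(observed))
```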
11. Responses by Type of Client
• Human Being
– Actual Human waiting and being impatient
– Response Time critical
• Bots
– On some systems over 80% of traffic comes from bots
– Bot response time is less critical
• Though it should be good enough for the site to be indexed
• Interactive Web Services
– Can be used to generate pages on other sites
– Low Response time is even more critical
12. Different kinds of Slowness
• System “randomly” responds slowly
– OK as long as rare enough.
– Users will write it off as Internet/computer slowness
• Sustained Slowness is bad
– A search request which is always slow
– A user with many friends for whom the site is “always” slow
• Are these users/cases important ?
– Track them separately, e.g. performance per customer. They may be invisible in the overall 99% metric alone.
– Otherwise consider firing those users/blocking those cases
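Tracking slowness per customer can be sketched as below. The input format (pairs of customer id and response time), the thresholds, and the name `sustained_slow_customers` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def sustained_slow_customers(requests, limit_s=0.5, min_requests=20,
                             max_slow_share=0.05):
    """requests: iterable of (customer_id, response_time_s).
    Returns customers whose own share of slow requests exceeds
    max_slow_share, ignoring customers with too few requests."""
    total = defaultdict(int)
    slow = defaultdict(int)
    for customer_id, rt in requests:
        total[customer_id] += 1
        if rt > limit_s:
            slow[customer_id] += 1
    return sorted(
        cid for cid, n in total.items()
        if n >= min_requests and slow[cid] / n > max_slow_share
    )


# Invented traffic: customer "b" is always slow but is only 1% of load,
# so a global 99% goal would still be met.
traffic = [("a", 0.1)] * 990 + [("b", 2.0)] * 10
print(sustained_slow_customers(traffic, min_requests=5))
```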
13. Where to measure performance
• Client Side (the actual data)
– http://code.google.com/p/jiffy-web/
– Firebug etc (but only for development)
• External Performance Monitoring
– Gomez, Keynote etc
– Selected pages from selected locations
• Web Server Performance Analysis
– Focused on the response time of a single dynamic request
– http://code.google.com/p/instrumentation-for-php/
– mk-query-digest; tcprstat
14. Summary of the Goal
• Define 95%, 99% etc response time goals
• For each user interaction/class, each application instance/user
• Measured/monitored every 5 minutes
• Observed from both the front end and the back end
• Avoiding Performance Holes
– Some actions or users which are rare but often slow
15. Performance Black Swans
• Queries can be intrinsically slow or made slow by side load (queueing)
• You can ignore outliers only if their impact on system performance is limited.
• Discover Such Queries
– mk-query-digest will report outliers by default
– Check SHOW PROCESSLIST for never completing queries
– Optimize; Build protection to kill overly slow queries.
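A protection against overly slow queries could start from the output of SHOW PROCESSLIST. The sketch below only builds the KILL QUERY statements from rows that have already been fetched; how the rows are obtained and how the statements are executed is left out, and the 60-second limit and the name `kill_statements` are assumptions.

```python
def kill_statements(processlist_rows, max_seconds=60):
    """processlist_rows: dicts carrying the Id, Command and Time columns
    of SHOW PROCESSLIST. Returns KILL QUERY statements for queries that
    have been running longer than max_seconds."""
    return [
        f"KILL QUERY {row['Id']}"
        for row in processlist_rows
        if row["Command"] == "Query" and row["Time"] > max_seconds
    ]


# Invented rows: a short query, a runaway query, an idle connection.
rows = [
    {"Id": 1, "Command": "Query", "Time": 5},
    {"Id": 2, "Command": "Query", "Time": 900},
    {"Id": 3, "Command": "Sleep", "Time": 5000},
]
print(kill_statements(rows))
```

Idle connections are skipped on purpose: their Time value measures how long they have been sleeping, not how long a query has run.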
16. Production Instrumentation
• Many People Instrument the Test System
– Option to print out Queries/Web Service Requests
– Great for Debugging/Testing
– Will not show a lot of performance problems
• Cold vs hot requests
• Contention happening in production
• Special User Cases
• Run Instrumented App in Production and Store Data
– Can instrument only one of the web servers if the overhead is large.
– Can log only 1% of user sessions if you can't handle all the data
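Logging only a fraction of user sessions works best when the decision is made per session, so that every request of a sampled session is captured. One possible sketch hashes the session id; the function name `is_sampled` and the choice of SHA-1 are assumptions, not something the slides prescribe.

```python
import hashlib


def is_sampled(session_id: str, percent: float = 1.0) -> bool:
    """Deterministically select roughly `percent` percent of sessions.
    The same session id always gets the same answer, on any server."""
    digest = hashlib.sha1(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10000  # 0..9999
    return bucket < percent * 100


# Invented session ids: count how many fall into the 1% sample.
sampled = sum(is_sampled(f"session-{i}") for i in range(100000))
print(f"{sampled} of 100000 sessions sampled")
```

Hashing rather than calling a random number generator per request keeps the sample consistent across web servers without shared state.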
17. What to Instrument
• Total Response Time
• CPU Time
• “Wait Time”
– Connections/Database Queries
– MemCache
– Web Service Requests
– Other Network Requests
• Additional Information
– Number and Nature of different queries
– Hits/Misses for Queries
– Options which can affect performance
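The measurements listed above can be collected with a small per-request profile object. This is a minimal sketch: `RequestProfile` and its category labels are hypothetical, and the CPU figure comes from `time.process_time()`, which counts the whole process and is therefore only meaningful when one request runs at a time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestProfile:
    """Collects total time, CPU time and wait time per category."""

    def __init__(self):
        self.started = time.perf_counter()
        self.cpu_started = time.process_time()
        self.wait_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def wait(self, category):
        """Time a block that waits on an external resource."""
        begin = time.perf_counter()
        try:
            yield
        finally:
            self.wait_s[category] += time.perf_counter() - begin
            self.calls[category] += 1

    def summary(self):
        return {
            "total_s": time.perf_counter() - self.started,
            "cpu_s": time.process_time() - self.cpu_started,
            "wait_s": dict(self.wait_s),
            "calls": dict(self.calls),
        }


profile = RequestProfile()
with profile.wait("database"):
    time.sleep(0.01)  # stand-in for a real database query
with profile.wait("memcache"):
    time.sleep(0.001)  # stand-in for a real cache lookup
print(profile.summary())
```

The summary dictionary is what would be written to the log for each request, together with the additional information listed above.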
18. Where to Store
• Plain old log files
– Or directly to the database for smaller systems
• Load them to the database
• Or Hadoop on the larger scale
• Generate standard reports
• Provide an ad-hoc way to do deep data analysis
19. Start from what is most important
• Optimize Most important User Interactions first
• Pick which cases to focus on
– Queries which do not meet the response time goal
– But not the worst-case scenario
• Unless outliers kill your system
• There are always going to be outliers
• Do not analyze only the queries above the response time threshold
– It is much easier to reach 95% within 1 second if 50% of the queries are below 500ms.
20. Benefits of Such Approach
• Direct connection to the business goals
• High Priority problems targeted first
• Focus on real stuff
– No guesswork like “is my buffer pool hit ratio bad?” or “am I doing too many full table scans?”
– If these are the issues you will find and fix them anyway.
• Understandable and predictable result
– If MySQL contributes 15% to the response time I can't possibly double performance by focusing on MySQL optimization.
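The 15% example is an instance of Amdahl's law, which the slides do not name explicitly. A short sketch of the arithmetic, with `max_speedup` as a hypothetical helper:

```python
def max_speedup(component_share, component_speedup=float("inf")):
    """Overall speedup when one component that takes `component_share`
    of the response time becomes `component_speedup` times faster.
    The default models the component becoming infinitely fast."""
    remaining = (1.0 - component_share) + component_share / component_speedup
    return 1.0 / remaining


# MySQL at 15% of response time: even a free database gives under 1.18x.
print(f"{max_speedup(0.15):.3f}")
# For comparison, a component at 50% of response time made 2x faster.
print(f"{max_speedup(0.50, 2):.3f}")
```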
21. Final Notes
• Spikes; Special Cases should not be discarded
– They are the most interesting/challenging ones
• Understand what you're trying to achieve
– The method is best for optimizing a system already in production at its current scale.
• Check out the goal driven performance optimization whitepaper
– http://www.percona.com/files/white-papers/goal-driven-performance-optimization.pdf
22. Thanks for Coming
• Questions? Follow-up?
– pz@percona.com
• Yes, we do MySQL and Web Scaling Consulting
– http://www.percona.com
• Check out our book
– Complete rewrite of 1st edition
– Available in Russian Too
• And yes, we're hiring
– http://www.percona.com/contact/careers/