2. Agenda
• Performance & Design
• Why should I care?
• What should I measure?
• References
3. What is performance?
the capabilities of a machine or
product, esp. when observed
under particular conditions : the
hardware is put through tests
which assess the performance of
the processor.
4. What is design?
his design of reaching the top:
intention, aim, purpose, plan,
intent, objective, object, goal,
end, target; hope, desire, wish,
dream, aspiration, ambition.
9. Underlying Operating Systems
• Read IOPS
• Write IOPS
• IO Wait
• Page Faults
• Page in
• Page out
• Run Queue
• Disk Usage
• # users
• # processes
• USER CPU
• SYSTEM CPU
• Resident Size
• Memory Usage
• Network Traffic
• Packet Loss
• Network Collision
• Interrupts
• Buffers
• Kernel Tables
11. Apr 25, 2011 5:44:02 PM org.apache.fop.fo.FOTreeBuilder fatalError
SEVERE: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
Apr 25, 2011 5:44:02 PM org.apache.fop.cli.Main startFOP
SEVERE: Exception
javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:217)
    at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:125)
    at org.apache.fop.cli.Main.startFOP(Main.java:166)
    at org.apache.fop.cli.Main.main(Main.java:197)
Caused by: javax.xml.transform.TransformerException: java.lang.NullPointerException: Parameter alpha must not be null
    at org.apache.xalan.transformer.TransformerImpl.executeChildTemplates(TransformerImpl.java:2416)
    at org.apache.xalan.templates.ElemLiteralResult.execute(ElemLiteralResult.java:1374)
    at org.apache.xalan.templates.ElemApplyTemplates.transformSelectedNodes(ElemApplyTemplates.java:393)
    at org.apache.xalan.templates.ElemApplyTemplates.execute(ElemApplyTemplates.java:176)
14. Why should I care?
Capacity planning is not just about the future
anymore.
Today, there is a serious need to squeeze more
out of your current capital equipment.
The Guerrilla Manual Online
http://www.perfdynamics.com/Manifesto/gcaprules.html
15. Why should I care?
“Our systems are very simple,
there’s no need for such
performance metrics”
16. It goes like this...
The Internet
Web Server
Application Server
Database
19. It goes like this...
The Internet
Web Server
Application Server
Slaves RO
Master RW
20. It goes like this...
The Internet
Web Server
Application Server
Master RW
Slaves RO
21. It goes like this...
The Internet
Web Server
Application Server
Caches
Master RW Slaves RO
Evil Machines Corporation
22. It goes like this...
The Internet
Web Server
Application Server
Caches
Master RW Slaves RO
Evil Machines Corporation
• CPUs will be idle
• Disks will be sleeping
• Network will be underused
• ... and your users will be complaining...
24. Why should I care?
“But we are using the Cloud!”
25. Why should I care?
• So now you’re in a utility computing model
• You’re charged per usage
26. Why should I care?
“Updating performance
counters will make my
code run slower”
27. Why should I care?
• Average datacenter CPU utilization is around 15%
• If updating performance counters is a problem, then you really need them
• Those microseconds will save you hours of troubleshooting!
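The "microseconds" claim can be checked on any given machine. The snippet below is a minimal sketch, not a benchmark methodology: it times a plain in-process counter increment with the standard `timeit` module. The `counters` dictionary and the `update_counter` function are hypothetical names introduced only for this illustration, and the measured cost will vary by interpreter and hardware.

```python
import timeit

# Hypothetical in-process counter store (an assumption for this sketch).
counters = {"fetchDatum": 0}


def update_counter():
    """Increment one performance counter, as instrumented code would."""
    counters["fetchDatum"] += 1


N = 200_000  # number of timed updates

# timeit returns the total wall-clock time in seconds for N calls.
total_seconds = timeit.timeit(update_counter, number=N)
per_update_us = total_seconds / N * 1e6

print(f"{per_update_us:.3f} microseconds per counter update")
```

Comparing the printed figure with the service times of the operations being instrumented gives a rough sense of the relative overhead.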
28. Why should I care?
“These are non-functional
requirements”
29. Why should I care?
Server Delay  Distinct Queries/User  Query Refinement  Revenue/User  Any Clicks  Satisfaction  Time to Click (increase in ms)
50ms          0                      0                 0             0           0             0
200ms         0                      0                 0             -0.30%      -0.40%        500
500ms         0                      -0.60%            -1.20%        -1.00%      -0.90%        1200
1000ms        -0.70%                 -0.90%            -2.80%        -1.90%      -1.60%        1900
2000ms        -1.80%                 -2.10%            -4.30%        -4.40%      -3.80%        3100
The User and Business Impact of Server Delays, Additional Bytes, and HTTP
Chunking in Web Search - Eric Schurman (Amazon), Jake Brutlag (Google)
http://velocityconf.com/velocity2009/public/schedule/detail/8523
30. Why should I care?
“Fast isn’t a feature, fast is a Requirement”
Jesse Robbins - Opscode
32. Queues
The not so typical performance metrics
Agner Krarup Erlang
• Invented the fields of traffic engineering and queuing theory
• 1909 - Published “The Theory of Probabilities and Telephone Conversations”
• 1917 - Published “Solution of some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges”
33. Queues
The not so typical performance metrics
• 1961 - CTSS was first demonstrated at MIT
• 1965 - Allan Scherr used the machine repairman problem to model a time-shared system as part of Project MAC
• Another offspring of Project MAC is Multics
34. Queues
The not so typical performance metrics
• IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972
• Average purchase price: $771,000*
• No disks or peripherals included
• About $4,082,039 in 2011 dollars
• Intel Core i7 Extreme Edition 990X, released in 2011, peaks at 159,000 MIPS @ 3.46 GHz
* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html
35. Queues
The not so typical performance metrics
Computer System
Disks
CPU
36. Queues
The not so typical performance metrics
[Diagram: an open/closed queueing network, annotated with the symbols below]
A  Arrival count
λ  Arrival rate (A/T)
W  Time spent in queue
S  Service time
R  Residence time (W+S)
X  System throughput (C/T)
C  Completed task count
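The symbols above can be computed directly from a request log. The following is a minimal sketch under stated assumptions: `T` is taken to be the length of the observation window in seconds, each request records its arrival, service start and finish times, and the `Request` class, the `queue_metrics` function and the sample data are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float  # time the request entered the system (s)
    start: float    # time service began (s)
    finish: float   # time service completed (s)


def queue_metrics(requests, T):
    """Compute A, C, lambda, X, W, S and R over the window [0, T]."""
    arrived = [r for r in requests if r.arrival <= T]
    completed = [r for r in requests if r.finish <= T]
    A, C = len(arrived), len(completed)
    # Averages are taken over completed requests only.
    W = sum(r.start - r.arrival for r in completed) / C if C else 0.0
    S = sum(r.finish - r.start for r in completed) / C if C else 0.0
    return {
        "A": A,               # arrival count
        "C": C,               # completed task count
        "lambda": A / T,      # arrival rate (A/T)
        "X": C / T,           # system throughput (C/T)
        "W": W,               # mean time spent in queue
        "S": S,               # mean service time
        "R": W + S,           # mean residence time (W+S)
    }


# Made-up sample data: the last request arrives inside the window
# but finishes after it, so A and C differ.
sample = [
    Request(0.0, 0.0, 1.0),
    Request(0.5, 1.0, 2.0),
    Request(1.0, 2.0, 3.0),
    Request(9.5, 9.5, 10.5),
]
m = queue_metrics(sample, T=10.0)
print(m)
```

In this sample, λ and X differ because one request is still in service when the window closes; over a longer window the two converge.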
37. Arrival Rate (λ)
• Pretty straightforward
• Requests per second/hour/day
• Not the same as throughput (X)
• Although in a steady state:
• A = C as T →∞
• λ=X
38. Service Time (S)
• Time spent in processing
• Web server response time
• Total query time
• I/O operation duration
39. What to look for?
• Stretch factor
• Method Count
• Method Service Time
• Geolocation
• Inbound & Outbound Traffic
• Round Trip Delays
40. What should I measure?
Average Hits/s = 65.142
Average Svc time = 0.0159
41. What should I measure?
• A simple tag collection data store
• For each data operation:
• A 64-bit counter for the number of calls
• An average counter for the service time
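One way to keep those two counters per operation is a small timing decorator. This is a sketch under assumptions, not the instrumentation used in the talk: `OpStats`, `STATS`, `measured` and the in-memory `store` are hypothetical names, and a plain Python integer stands in for the 64-bit call counter.

```python
import time
from collections import defaultdict
from functools import wraps


class OpStats:
    """Per-operation counters: number of calls and mean service time."""

    def __init__(self):
        self.calls = 0      # Python ints are unbounded; stands in for a 64-bit counter
        self.mean_ms = 0.0  # running average of the service time in ms

    def record(self, elapsed_ms):
        self.calls += 1
        # Incremental mean: avoids storing every sample.
        self.mean_ms += (elapsed_ms - self.mean_ms) / self.calls


STATS = defaultdict(OpStats)


def measured(func):
    """Time each call to func and record it under the function's name."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            STATS[func.__name__].record(elapsed_ms)
    return wrapper


store = {}  # toy stand-in for the data store


@measured
def postDatum(key, value):
    store[key] = value


@measured
def fetchDatum(key):
    return store.get(key)


postDatum("a", 1)
fetchDatum("a")
fetchDatum("b")

for name, stats in STATS.items():
    print(f"{name}: calls={stats.calls} mean={stats.mean_ms:.4f} ms")
```

Recording in a `finally` clause means failed calls are counted too, which is usually what a service-time metric should reflect.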
42. What should I measure?
Method            Call Count   Service Time (ms)
dbConnect              1,876                11.2
fetchDatum        19,987,182                12.4
postDatum          1,285,765                98.4
deleteDatum          312,873                31.1
fetchKeys         27,334,983               278.3
fetchCollection   34,873,194               211.9
createCollection     118,853               219.4
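Multiplying each method's call count by its service time gives a rough estimate of where total time is spent. The sketch below uses the values transcribed from the table above (call counts as integers, service times in ms); the variable names are assumptions made for this illustration.

```python
# (call count, mean service time in ms), transcribed from the table.
methods = {
    "dbConnect": (1_876, 11.2),
    "fetchDatum": (19_987_182, 12.4),
    "postDatum": (1_285_765, 98.4),
    "deleteDatum": (312_873, 31.1),
    "fetchKeys": (27_334_983, 278.3),
    "fetchCollection": (34_873_194, 211.9),
    "createCollection": (118_853, 219.4),
}

# Total time per method = call count x service time.
total_ms = {name: calls * svc for name, (calls, svc) in methods.items()}
grand_total = sum(total_ms.values())

# Rank methods by total time, largest first.
ranking = sorted(total_ms, key=total_ms.get, reverse=True)

for name in ranking:
    share = 100.0 * total_ms[name] / grand_total
    print(f"{name:<17} {total_ms[name] / 3.6e6:10.1f} h  {share:5.1f}%")
```

By this measure fetchKeys and fetchCollection dominate, while createCollection, despite a high service time, contributes little because it is called rarely.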
43. What should I measure?
[Scatter plot: Call Count x Service Time. Axes: Call Count and Service Time (ms). One point per method: dbConnect, fetchDatum, postDatum, deleteDatum, createCollection, fetchCollection, fetchKeys.]