In this session, we explain how to measure the key performance-impacting metrics of a cloud-based application. With specific examples of good and bad tests, we show how to get reliable measurements of CPU, memory, and disk performance, and how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability.
3. Cloud Benchmarks: Prequel
• The best benchmark
• Absolute vs. relative measures
• Fixed time or fixed work
• What’s different?
• Use a good AMI
[Chart: average CPU result (0–30) and coefficient of variation (0–60%) per AMI — three CentOS 5.4 AMIs, an AWS CentOS 5.4 AMI, and an Ubuntu 12.04 AMI]
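The variability metric in the chart above is straightforward to compute yourself: coefficient of variation is the standard deviation of repeated benchmark runs divided by their mean. A minimal sketch with illustrative (not measured) sample values:

```python
# Coefficient of variation (stddev / mean, as a percent) quantifies
# run-to-run variability of a benchmark result; lower is better.
# The sample numbers below are illustrative, not measured results.
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Return population stddev / mean as a percentage."""
    return pstdev(samples) / mean(samples) * 100

stable = [101, 99, 100, 102, 98]   # tight spread -> low CV
noisy = [60, 140, 100, 80, 120]    # wide spread  -> high CV

print(f"stable CV: {coefficient_of_variation(stable):.1f}%")
print(f"noisy  CV: {coefficient_of_variation(noisy):.1f}%")
```

A high CV tells you the AMI/instance combination is not giving repeatable numbers, regardless of how good its mean looks.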
4. Scenario: CPU-based Instance Selection
• Application runs on premises
• Primary requirement is integer CPU performance
• Application is complex to set up, no benchmark tests exist, limited time
• What instance would work best?
1. Choose a synthetic benchmark
2. Baseline: build, configure, tune, and run it on premises
3. Run the same test (or tests) on a set of instance types
4. Use results from the instance tests to choose the best match
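The four steps above reduce to a relative comparison: normalize each instance's score against the on-premises baseline and pick the best match. A minimal sketch — all instance names and scores here are hypothetical placeholders, not measured data:

```python
# Relative instance selection: higher benchmark score is better.
# Every name and number below is a hypothetical placeholder.
baseline = 1000.0  # on-premises integer CPU score (step 2)

instance_scores = {  # per-instance-type scores (step 3)
    "type-a": 850.0,
    "type-b": 1040.0,
    "type-c": 1310.0,
}

# Ratio > 1.0 means the instance beats the on-prem baseline.
ratios = {name: score / baseline for name, score in instance_scores.items()}

# Step 4: among instances that at least match the baseline, pick the
# one with the smallest adequate ratio (typically the cheapest fit).
candidates = {n: r for n, r in ratios.items() if r >= 1.0}
best = min(candidates, key=candidates.get)
print(best, round(candidates[best], 2))
```

The point of the ratio is that absolute scores from a synthetic benchmark rarely map directly to application throughput, but relative standing against a known baseline usually does.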
5. Testing CPU
• Choose a benchmark
– geekbench, UnixBench, sysbench (cpu), and SPEC CPU2006 Integer
• How do you know when you have a good result?
• Tests run on 9 instance types
– 10 instances of each of the 9 types launched
– Tests run a minimum of 4 times on each instance
– Ubuntu 13.04 base AMI
6. geekbench Overview
• Workloads in 3 categories
– 13 Integer tests
– 10 Floating Point tests
– 4 Memory tests
• Commercial product (64-bit)
• No source code
• Runs single- and multi-CPU
• Fast setup, fast runtime
Integer: AES, Twofish, SHA1, SHA2, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Sobel, LUA, Dijkstra
Floating Point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, SGEMM, DGEMM, SFFT, DFFT, N-Body, Ray trace
Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad
11. UnixBench Overview
• Default: the BYTE Index
– 12 workloads, run 2 times (roughly 29 minutes each time)
• Integer computation
• Floating point computation
• System calls
• File system calls
– Geometric mean of results relative to a baseline produces a system benchmark index score
• Open source – must be built
– Must be patched for > 16 CPUs
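The BYTE-index arithmetic described above can be sketched in a few lines. The baseline and result values below are illustrative placeholders — real UnixBench ships its own baseline table from a reference system:

```python
# UnixBench-style index: geometric mean of (result / baseline) ratios,
# scaled so the reference system scores 10. Baseline and result
# values here are illustrative, not real UnixBench baselines.
from math import prod

def index_score(results, baselines, scale=10.0):
    """Scaled geometric mean of result/baseline ratios."""
    ratios = [r / b for r, b in zip(results, baselines)]
    return scale * prod(ratios) ** (1.0 / len(ratios))

baselines = [100.0, 200.0, 50.0]
results = [400.0, 400.0, 100.0]   # ratios: 4, 2, 2
print(round(index_score(results, baselines), 2))
```

Because it is a geometric mean of ratios, one disproportionately fast or slow workload cannot dominate the index the way it would in an arithmetic mean.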
15. SPEC CPU2006 Overview
• Competitive (reviewed)
• Commercial (site) license required
• Source code provided, must be built
• Highly customizable
• Full “reportable” run 5+ hours
• Published results on www.spec.org
19. Sysbench Overview
• Designed as a quick system test for MySQL servers
• Test categories
– fileio
– cpu
– memory
– threads
– mutex
– oltp
• Source code provided, must be built
• Very simplistic defaults – tuning recommended
23. Scenario: Memory Instance Selection
• Application runs on premises
• Primary requirement: memory throughput of 20K MB/sec
• What instance would work best?
1. Choose a synthetic benchmark
2. Baseline: build, configure, tune, and run it on premises
3. Run the same test (or tests) on a set of instance types
4. Use results from the instance tests to choose the best match
24. Testing Memory
• Choose a benchmark:
– stream, geekbench, sysbench(memory)
• How do you know when you have a good result?
• Tests run on 9 instance types
– Minimum of 10 instances launched
– Tests run a minimum of 3 times on each instance
– Ubuntu 13.04 base AMI
25. Stream* Overview
• Synthetic benchmark measuring sustainable memory bandwidth
– Published results at www.cs.virginia.edu/stream/top20/Bandwidth.html
– Must be built
– By default, runs 1 thread per cpu
– Use stream-scaling to automate array size and thread scaling
• https://github.com/gregs1104/stream-scaling

name    kernel                 bytes/iter  FLOPS/iter
COPY:   a(i) = b(i)            16          0
SCALE:  a(i) = q*b(i)          16          1
SUM:    a(i) = b(i) + c(i)     24          1
TRIAD:  a(i) = b(i) + q*c(i)   24          2
* McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers",
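The four STREAM kernels in the table above are simple element-wise array operations. A plain-Python sketch of the arithmetic (real STREAM runs these over large C/Fortran arrays, timed across many iterations):

```python
# The four STREAM kernels, element by element. With 8-byte doubles,
# COPY and SCALE touch 2 arrays per element (16 bytes) and SUM and
# TRIAD touch 3 (24 bytes), matching the table's bytes/iter column.
def stream_kernels(b, c, q):
    copy = [bi for bi in b]                        # COPY:  a(i) = b(i)
    scale = [q * bi for bi in b]                   # SCALE: a(i) = q*b(i)
    add = [bi + ci for bi, ci in zip(b, c)]        # SUM:   a(i) = b(i)+c(i)
    triad = [bi + q * ci for bi, ci in zip(b, c)]  # TRIAD: a(i) = b(i)+q*c(i)
    return copy, scale, add, triad

copy, scale, add, triad = stream_kernels([1.0, 2.0], [3.0, 4.0], q=2.0)
print(copy, scale, add, triad)
```

TRIAD is the usual headline number because it exercises the most traffic (24 bytes) and the most arithmetic (2 FLOPS) per element.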
31. Summary Disk I/O

Command                                          Seconds   MB/sec
cp f1 f2                                         17.248    59.37
rm -rf f2; cp f1 f2                              0.853     1200.47
cp f1 f3                                         0.880     1164.96
dd if=/dev/zero bs=1048 count=1024000 of=d1      0.722     1419.01
dd if=/dev/urandom bs=1048 count=1024000 of=d2   79.710    12.84
fio simple.cfg                                   NA        61.55
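The MB/sec column is just bytes transferred divided by elapsed seconds. A sketch of the calculation for the `dd if=/dev/zero` row (assuming MiB units; the result lands close to, not exactly on, the table's 1419.01 because dd reports its own elapsed time):

```python
# Throughput = bytes transferred / elapsed seconds.
# Block size, count, and time are taken from the dd row above.
def throughput_mib_s(block_size, count, seconds):
    """MiB/s for a dd-style transfer of count blocks of block_size bytes."""
    return block_size * count / (1024 * 1024) / seconds

# dd if=/dev/zero bs=1048 count=1024000 of=d1 took ~0.722 s
mib_s = throughput_mib_s(1048, 1024000, 0.722)
print(round(mib_s, 1))  # in the neighborhood of the table's ~1419 MB/sec
```

Sanity-checking the reported throughput this way catches cases where page-cache effects (as in the second `cp` row) make a disk look two orders of magnitude faster than it is.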
32. Beyond Simple Disk I/O

Random PIOPS, 16 disks, 1M I/O        MBps
read                                  1006.73
write                                 904.03
r70w30 (70% read / 30% write)         1005.91
33. Summary
If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.
• Choose the best benchmark that represents your application
• Analysis – what does “best” mean?
• Run enough tests to quantify variability
• Baseline – what is a “good result”?
• Samples – keep all of your results – more is better!
34. Please give us your feedback on this presentation
ENT305
As a thank you, we will select prize winners daily for completed surveys!