In this session, we explain how to measure the key performance-impacting metrics of a cloud-based application. With specific examples of good and bad tests, we show how to get reliable measurements of CPU, memory, and disk performance, and how to map benchmark results to your application. We also cover the importance of selecting tests wisely, repeating tests, and measuring variability.
3. Cloud Benchmarks: Prequel
• The best benchmark
• Absolute vs. relative measures
• Fixed time or fixed work
• What’s different?
• Use a good AMI
[Chart: average CPU result (0–30) and coefficient of variation (0–60%) per AMI — three CentOS 5.4 AMIs, an AWS CentOS 5.4 AMI, and an Ubuntu 12.04 AMI]
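The variability metric in the chart above is straightforward to compute yourself: coefficient of variation is the standard deviation of repeated benchmark runs divided by their mean. A minimal sketch with illustrative (not measured) sample values:

```python
# Coefficient of variation (stddev / mean, as a percent) quantifies
# run-to-run variability of a benchmark result; lower is better.
# The sample numbers below are illustrative, not measured results.
from statistics import mean, pstdev

def coefficient_of_variation(samples):
    """Return population stddev / mean as a percentage."""
    return pstdev(samples) / mean(samples) * 100

stable = [101, 99, 100, 102, 98]   # tight spread -> low CV
noisy = [60, 140, 100, 80, 120]    # wide spread  -> high CV

print(f"stable CV: {coefficient_of_variation(stable):.1f}%")
print(f"noisy  CV: {coefficient_of_variation(noisy):.1f}%")
```

A high CV tells you the AMI/instance combination is not giving repeatable numbers, regardless of how good its mean looks.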
4. Scenario: CPU-based Instance Selection
• Application runs on premises
• Primary requirement is integer CPU performance
• Application is complex to set up, no benchmark tests exist, limited time
• What instance would work best?
1. Choose a synthetic benchmark
2. Baseline: build, configure, tune, and run it on premises
3. Run the same test (or tests) on a set of instance types
4. Use results from the instance tests to choose the best match
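The four steps above reduce to a relative comparison: normalize each instance's score against the on-premises baseline and pick the best match. A minimal sketch — all instance names and scores here are hypothetical placeholders, not measured data:

```python
# Relative instance selection: higher benchmark score is better.
# Every name and number below is a hypothetical placeholder.
baseline = 1000.0  # on-premises integer CPU score (step 2)

instance_scores = {  # per-instance-type scores (step 3)
    "type-a": 850.0,
    "type-b": 1040.0,
    "type-c": 1310.0,
}

# Ratio > 1.0 means the instance beats the on-prem baseline.
ratios = {name: score / baseline for name, score in instance_scores.items()}

# Step 4: among instances that at least match the baseline, pick the
# one with the smallest adequate ratio (typically the cheapest fit).
candidates = {n: r for n, r in ratios.items() if r >= 1.0}
best = min(candidates, key=candidates.get)
print(best, round(candidates[best], 2))
```

The point of the ratio is that absolute scores from a synthetic benchmark rarely map directly to application throughput, but relative standing against a known baseline usually does.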
5. Testing CPU
• Choose a benchmark
– geekbench, UnixBench, sysbench (cpu), and SPEC CPU2006 Integer
• How do you know when you have a good result?
• Tests run on 9 instance types
– 10 instances of each of the 9 types launched
– Tests run a minimum of 4 times on each instance
– Ubuntu 13.04 base AMI
6. geekbench Overview
• Workloads in 3 categories
– 13 Integer tests
– 10 Floating Point tests
– 4 Memory tests
• Commercial product (64-bit)
• No source code
• Runs single- and multi-CPU
• Fast setup, fast runtime
Integer: AES, Twofish, SHA1, SHA2, BZip2 compress, BZip2 decompress, JPEG compress, JPEG decompress, PNG compress, PNG decompress, Sobel, LUA, Dijkstra
Floating Point: Black-Scholes, Mandelbrot, Sharpen image, Blur image, SGEMM, DGEMM, SFFT, DFFT, N-Body, Ray trace
Memory: STREAM copy, STREAM scale, STREAM add, STREAM triad
11. UnixBench Overview
• Default: the BYTE Index
– 12 workloads, run 2 times (roughly 29 minutes each time)
• Integer computation
• Floating point computation
• System calls
• File system calls
– Geometric mean of results relative to a baseline produces a system benchmark index score
• Open source – must be built
– Must be patched for > 16 CPUs
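The BYTE-index arithmetic described above can be sketched in a few lines. The baseline and result values below are illustrative placeholders — real UnixBench ships its own baseline table from a reference system:

```python
# UnixBench-style index: geometric mean of (result / baseline) ratios,
# scaled so the reference system scores 10. Baseline and result
# values here are illustrative, not real UnixBench baselines.
from math import prod

def index_score(results, baselines, scale=10.0):
    """Scaled geometric mean of result/baseline ratios."""
    ratios = [r / b for r, b in zip(results, baselines)]
    return scale * prod(ratios) ** (1.0 / len(ratios))

baselines = [100.0, 200.0, 50.0]
results = [400.0, 400.0, 100.0]   # ratios: 4, 2, 2
print(round(index_score(results, baselines), 2))
```

Because it is a geometric mean of ratios, one disproportionately fast or slow workload cannot dominate the index the way it would in an arithmetic mean.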
15. SPEC CPU2006 Overview
• Competitive (reviewed)
• Commercial (site) license required
• Source code provided, must be built
• Highly customizable
• Full “reportable” run 5+ hours
• Published results on www.spec.org
19. Sysbench Overview
• Designed as a quick system test for MySQL servers
• Test categories
– fileio
– cpu
– memory
– threads
– mutex
– oltp
• Source code provided, must be built
• Very simplistic defaults – tuning recommended
23. Scenario: Memory Instance Selection
• Application runs on premises
• Primary requirement: memory throughput of 20K MB/sec
• What instance would work best?
1. Choose a synthetic benchmark
2. Baseline: build, configure, tune, and run it on premises
3. Run the same test (or tests) on a set of instance types
4. Use results from the instance tests to choose the best match
24. Testing Memory
• Choose a benchmark:
– stream, geekbench, sysbench(memory)
• How do you know when you have a good result?
• Tests run on 9 instance types
– Minimum of 10 instances launched
– Tests run a minimum of 3 times on each instance
– Ubuntu 13.04 base AMI
25. Stream* Overview
• Synthetic benchmark measuring sustainable memory bandwidth
– Published results at www.cs.virginia.edu/stream/top20/Bandwidth.html
– Must be built
– By default, runs 1 thread per cpu
– Use stream-scaling to automate array size and thread scaling
• https://github.com/gregs1104/stream-scaling

name    kernel                 bytes/iter  FLOPS/iter
COPY:   a(i) = b(i)            16          0
SCALE:  a(i) = q*b(i)          16          1
SUM:    a(i) = b(i) + c(i)     24          1
TRIAD:  a(i) = b(i) + q*c(i)   24          2
* McCalpin, John D.: "STREAM: Sustainable Memory Bandwidth in High Performance Computers",
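The four STREAM kernels in the table above are simple element-wise array operations. A plain-Python sketch of the arithmetic (real STREAM runs these over large C/Fortran arrays, timed across many iterations):

```python
# The four STREAM kernels, element by element. With 8-byte doubles,
# COPY and SCALE touch 2 arrays per element (16 bytes) and SUM and
# TRIAD touch 3 (24 bytes), matching the table's bytes/iter column.
def stream_kernels(b, c, q):
    copy = [bi for bi in b]                        # COPY:  a(i) = b(i)
    scale = [q * bi for bi in b]                   # SCALE: a(i) = q*b(i)
    add = [bi + ci for bi, ci in zip(b, c)]        # SUM:   a(i) = b(i)+c(i)
    triad = [bi + q * ci for bi, ci in zip(b, c)]  # TRIAD: a(i) = b(i)+q*c(i)
    return copy, scale, add, triad

copy, scale, add, triad = stream_kernels([1.0, 2.0], [3.0, 4.0], q=2.0)
print(copy, scale, add, triad)
```

TRIAD is the usual headline number because it exercises the most traffic (24 bytes) and the most arithmetic (2 FLOPS) per element.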
31. Summary Disk I/O

Command                                          Seconds   MB/sec
cp f1 f2                                         17.248    59.37
rm -rf f2; cp f1 f2                              0.853     1200.47
cp f1 f3                                         0.880     1164.96
dd if=/dev/zero bs=1048 count=1024000 of=d1      0.722     1419.01
dd if=/dev/urandom bs=1048 count=1024000 of=d2   79.710    12.84
fio simple.cfg                                   NA        61.55
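The MB/sec column is just bytes transferred divided by elapsed seconds. A sketch of the calculation for the `dd if=/dev/zero` row (assuming MiB units; the result lands close to, not exactly on, the table's 1419.01 because dd reports its own elapsed time):

```python
# Throughput = bytes transferred / elapsed seconds.
# Block size, count, and time are taken from the dd row above.
def throughput_mib_s(block_size, count, seconds):
    """MiB/s for a dd-style transfer of count blocks of block_size bytes."""
    return block_size * count / (1024 * 1024) / seconds

# dd if=/dev/zero bs=1048 count=1024000 of=d1 took ~0.722 s
mib_s = throughput_mib_s(1048, 1024000, 0.722)
print(round(mib_s, 1))  # in the neighborhood of the table's ~1419 MB/sec
```

Sanity-checking the reported throughput this way catches cases where page-cache effects (as in the second `cp` row) make a disk look two orders of magnitude faster than it is.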
32. Beyond Simple Disk I/O

Random PIOPS, 16 disks, 1M I/O        MBps
read                                  1006.73
write                                 904.03
r70w30 (70% read / 30% write)         1005.91
33. Summary
If benchmarking your application is not practical, synthetic benchmarks can be used if you are careful.
• Choose the best benchmark that represents your application
• Analysis – what does “best” mean?
• Run enough tests to quantify variability
• Baseline – what is a “good result”?
• Samples – keep all of your results – more is better!
34. Please give us your feedback on this presentation
ENT305
As a thank you, we will select prize winners daily for completed surveys!