1. PERFORMANCE ANALYSIS OF HIGH
PERFORMANCE COMPUTING APPLICATIONS ON
THE AMAZON WEB SERVICES CLOUD
Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane
Canon, Shreyas Cholia, Harvey J. Wasserman, Nicholas J. Wright
Lawrence Berkeley National Lab
Presentation by Abhishek Gupta, CS 598 Cloud Computing
2. GOALS
Examine the performance of existing cloud computing
infrastructures and create a mechanism for their
quantitative evaluation
Build upon previous studies by using the NERSC
benchmarking framework to evaluate the performance
of real scientific workloads on EC2
Under the DOE Magellan project: evaluate the ability of
cloud computing to meet DOE’s computing needs
3. CONTRIBUTIONS
Broadest evaluation to date of application performance on
virtualized cloud computing platforms
Experiences with running on Amazon EC2 and the
performance and availability variations encountered.
Analysis of the impact of virtualization based on the
communication characteristics of the application
Impact of virtualization through a simple, well-documented
aggregate measure that expresses the useful potential of
the systems considered
4. METHODS - MACHINES
Carver:
Quad-core, dual-socket Linux / Nehalem / QDR IB
cluster
Medium-sized cluster for jobs scaling to hundreds of
processors; 3,200 total cores
Franklin:
Cray XT4
Linux environment / Quad-core, AMD Opteron / Seastar
interconnect, Lustre parallel filesystem
Integrated HPC system for jobs scaling to tens of
thousands of processors; 38,640 total cores
5. METHODS - MACHINES
Lawrencium
Quad-core, dual-socket Linux / Harpertown / DDR IB
cluster
Designed for jobs scaling to tens to hundreds of
processors; 1,584 total cores
Amazon EC2
m1.large instance type: four EC2 Compute Units, two
virtual cores with two EC2 Compute Units each, and 7.5
GB of memory
Heterogeneous processor types
7. METHODS – APPLICATIONS AND BENCHMARKS
USED
High Performance Computing Challenge (HPCC)
benchmark suite
Consists of seven synthetic benchmarks
Targeted synthetics: DGEMM, STREAM, and two measures
of network latency and bandwidth (see the ping-pong
sketch after this slide).
Complex synthetics: HPL, FFTE, PTRANS, and
RandomAccess.
NERSC 6 Benchmarks
Set of applications representative of the NERSC workload
Covers the science domains, parallelization schemes, and
concurrencies, as well as machine-based characteristics that
influence performance, such as message size, memory
access pattern, and working set sizes
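For concreteness, here is a minimal MPI ping-pong sketch in the spirit of the HPCC latency test. It is illustrative only: HPCC's actual harness also measures bandwidth with large messages and uses randomly-ordered ring tests.

/* Minimal MPI ping-pong latency sketch (compile with mpicc,
   run with mpirun -np 2). Illustrative, not the HPCC code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, reps = 1000;
    char buf[8];                       /* 8-byte messages probe latency */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, 8, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, 8, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)                     /* one-way latency = round trip / 2 */
        printf("latency: %.2f us\n",
               1e6 * (MPI_Wtime() - t0) / (2.0 * reps));
    MPI_Finalize();
    return 0;
}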
8. METHODS – NERSC APPLICATIONS
CAM: The Community Atmospheric Model
Lower computational intensity
Large point-to-point & collective MPI messages
GAMESS: General Atomic and Molecular
Electronic Structure System
Memory access
No collectives, very little communication
GTC: Gyrokinetic Turbulence Code
High computational intensity
Bandwidth-bound nearest-neighbor communication plus
collectives with small data payload
9. METHODS – NERSC APPLICATIONS
IMPACT-T: Integrated Map and Particle Accelerator Tracking
Time
Memory bandwidth & moderate computational intensity
Collective performance with small to moderate message sizes
MAESTRO: A Low Mach Number Stellar Hydrodynamics
Code
Low computational intensity
Irregular communication patterns
MILC: Lattice Quantum Chromodynamics (QCD)
High computational intensity
Global communication with small messages
10. METHODS – NERSC APPLICATIONS
PARATEC: PARAllel Total Energy Code
Global (all-to-all) communication with small messages,
from the data transposes in its 3-D FFTs
11. RESULTS: APPLICATION PERFORMANCE
Franklin and Lawrencium are 1.4× to 2.6× slower than
Carver.
EC2
• Best case (GAMESS): EC2 is only 2.7× slower than Carver.
• Worst case (PARATEC): EC2 is more than 50× slower than Carver.
• Large performance spread caused by the different demands
applications place on the network.
o More detailed analysis required
12. RESULTS: PERFORMANCE ANALYSIS USING IPM
Integrated Performance Monitoring (IPM) framework
• Uses the MPI profiling interface
• Examines the relative amounts of time an application spends
computing vs. communicating, and the types of MPI calls made
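A minimal sketch of how an IPM-style tool uses the MPI profiling interface: the MPI standard exports every MPI_* entry point under a PMPI_* name as well, so a profiler can interpose its own MPI_Send and still call the real implementation. The single-call scope and names here are illustrative, not IPM's actual code.

/* Sketch of interposition via the PMPI profiling interface.
   Link this into an MPI program; IPM wraps many more calls. */
#include <mpi.h>
#include <stdio.h>

static double comm_time = 0.0;         /* time accumulated inside MPI_Send */

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, type, dest, tag, comm);
    comm_time += MPI_Wtime() - t0;     /* charge the call to comm time */
    return rc;
}

int MPI_Finalize(void)
{
    printf("time in MPI_Send: %.3f s\n", comm_time);
    return PMPI_Finalize();
}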
13. RESULTS: SUSTAINED SYSTEM PERFORMANCE
SSP: aggregate measure of the workload-specific,
delivered performance of a computing system
For each code, measure:
• FLOP counts on a reference system
• Wall clock run time on various systems
• N chosen to be 3,200
Problem sets drastically reduced
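As a sketch of the arithmetic, following the usual NERSC SSP formulation (symbols are ours, not the slide's):

p_i = \frac{F_i}{N_i \, t_i}, \qquad
\mathrm{SSP} = N \left( \prod_{i=1}^{M} p_i \right)^{1/M}

Here F_i is code i's flop count measured on the reference system, N_i and t_i are the core count and wall-clock time of its run on the system under evaluation, M is the number of codes, and N = 3,200. The geometric mean keeps one fast code from masking slow ones.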
14. RESULTS: VARIABILITY
Performance Variability across runs
• Non-homogeneous nature of the systems allocated
• Network sharing and contention
• Sharing the un-virtualized hardware
16. CONCLUSIONS
EC2 performance degrades significantly as
applications spend more time communicating
Applications with global (all-to-all) communication
perform worse than those that mostly use point-to-point
communication.
The amount of variability in EC2 performance can be
significant.
17. DISCUSSION QUESTIONS
This paper focused on performance alone. What
are the performance vs. cost tradeoffs for the
different platforms?
How does the above tradeoff differ with application
characteristics such as granularity, communication
sensitivity, etc.?
What is the primary source of performance
variability on Amazon EC2?
Editor's Notes
It has quad-core Intel Nehalem processors running at 2.67 GHz, with dual-socket nodes and a single Quad Data Rate (QDR) IB link per node to a network that is locally a fat-tree with a global 2D mesh. Each XT4 compute node contains a single quad-core 2.3 GHz AMD Opteron "Budapest" processor, which is tightly integrated to the XT4 interconnect via a Cray SeaStar-2 ASIC through a 6.4 GB/s bidirectional HyperTransport interface.
Each compute node is a Dell PowerEdge 1950 server equipped with two Intel Xeon quad-core 64-bit 2.66 GHz Harpertown processors, connected to a Dual Data Rate (DDR) InfiniBand network configured as a fat tree. Amazon EC2 is a virtual computing environment that provides a web services API for launching and managing virtual machine instances. Amazon provides a number of different instance types that have varying performance characteristics. CPU capacity is defined in terms of an abstract Amazon EC2 Compute Unit; one EC2 Compute Unit is approximately equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. For our tests we used the m1.large instance type, which has four EC2 Compute Units (two virtual cores with two EC2 Compute Units each) and 7.5 GB of memory. The nodes are connected with gigabit Ethernet.
There are major differences between the Amazon Web Services environment and that at a typical supercomputing center. For example, almost all HPC applications assume the presence of a shared parallel filesystem between compute nodes and a head node that can submit MPI jobs to all of the worker nodes. In our EC2 setup, the head node could submit MPI jobs to all of the worker nodes, and the file server provided a shared filesystem between the nodes.
Targeted: these are microkernels which quantify basic system parameters that separately characterize computation and communication performance. Complex: proxy apps.
P2P vs. all-to-all; communication vs. computation vs. memory; small messages vs. large messages.
The DGEMM results are as one would expect based on the properties of the CPUs. The STREAM results show that EC2 is significantly faster for this benchmark than Lawrencium. We believe this is because of the particular processor distribution we received for our EC2 nodes for this test. The network latency and bandwidth results clearly show the difference between the interconnects on the tested systems. The ping-pong results show the latency and the bandwidth with no self-induced contention, while the randomly-ordered ring tests show the performance degradation with self-contention. The uncontended latency and bandwidth measurements of the EC2 gigabit Ethernet interconnect are more than 20 times worse than the slowest other machine. However, for EC2 the less capable network clearly inhibits overall HPL performance, by a factor of six or more. The FFTE benchmark measures the floating-point rate of execution of a double-precision complex one-dimensional discrete Fourier transform, and the PTRANS benchmark measures the time to transpose a large matrix. Both of these benchmarks' performance depends upon the memory and network bandwidth, and therefore they show similar trends: EC2 is approximately 20 times slower than Carver and four times slower than Lawrencium in both cases. The RandomAccess benchmark measures the rate of random updates of memory, and its performance depends on memory and network latency. In this case EC2 is approximately 10 times slower than Carver and three times slower than Lawrencium.
GAMESS (2.7×), for this benchmark problem, places relatively little demand upon the network and therefore is hardly slowed down at all on EC2. PARATEC shows the worst performance on EC2, 52× slower than Carver. It performs 3-D FFTs, and the global (i.e., all-to-all) data transposes within these FFT operations can incur a large communications overhead. Qualitatively, it seems that those applications that perform the most collective communication with the most messages are those that perform the worst on EC2.
The relative runtime on EC2 compared to Lawrencium is plotted against the percentage of communication for each application as measured on Lawrencium. The overall trend is clear: the greater the fraction of its runtime an application spends communicating, the worse its performance is on EC2. To determine these characteristics we classified the MPI calls of the applications into 4 categories: small and large messages (latency- vs. bandwidth-limited) and point-to-point vs. collective. (Note: for the purposes of this work we classified all messages < 4 KB as latency-bound; the overall conclusions contain no significant dependence on this choice.) From this analysis it is clear why fvCAM behaves anomalously: it is the only one of the applications that performs most of its communication via large messages, both point-to-point and collective.
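The 4-way classification above is simple enough to state in code. A hedged sketch follows: the 4 KB cutoff is the paper's, but the type and function names are hypothetical.

/* Hypothetical sketch of the paper's 4-way MPI message classification:
   point-to-point vs. collective, latency- vs. bandwidth-bound. */
#include <stddef.h>

enum msg_class { P2P_SMALL, P2P_LARGE, COLL_SMALL, COLL_LARGE };

#define LATENCY_CUTOFF 4096   /* paper: messages < 4 KB are latency-bound */

enum msg_class classify(size_t bytes, int is_collective)
{
    int small = bytes < LATENCY_CUTOFF;
    if (is_collective)
        return small ? COLL_SMALL : COLL_LARGE;
    return small ? P2P_SMALL : P2P_LARGE;
}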