PERFORMANCE ANALYSIS OF HIGH
    PERFORMANCE COMPUTING APPLICATIONS ON
    THE AMAZON WEB SERVICES CLOUD
    Keith R. Jackson, Lavanya Ramakrishnan, Krishna Muriki, Shane
    Canon, Shreyas Cholia, Harvey J. Wasserman, Nicholas J. Wright
    Lawrence Berkeley National Lab




      Presentation by Abhishek Gupta,
      CS 598 Cloud Computing
GOALS
 Examine the performance of existing cloud computing
  infrastructures and create a mechanism for their
  quantitative evaluation
 Build upon previous studies by using the NERSC
  benchmarking framework to evaluate the performance
  of real scientific workloads on EC2
 Under the DOE Magellan project, evaluate the ability of
  cloud computing to meet DOE’s computing needs



CONTRIBUTIONS


 Broadest evaluation to date of application performance on
  virtualized cloud computing platforms
 Experiences from running on Amazon EC2, including the
  performance and availability variations encountered
 Analysis of the impact of virtualization based on the
  communication characteristics of the application
 Quantification of this impact through a simple, well-
  documented aggregate measure that expresses the
  useful potential of the systems considered


METHODS - MACHINES

   Carver:
     Quad-core, dual-socket Linux / Nehalem / QDR IB
      cluster
     Medium-sized cluster for jobs scaling to hundreds of
      processors; 3,200 total cores
   Franklin:
     Cray XT4
      Linux environment / quad-core AMD Opteron / SeaStar
       interconnect, Lustre parallel filesystem
      Integrated HPC system for jobs scaling to tens of
       thousands of processors; 38,640 total cores
METHODS - MACHINES

   Lawrencium
     Quad-core, dual-socket Linux / Harpertown / DDR IB
      cluster
      Designed for jobs scaling to tens to hundreds of
       processors; 1,584 total cores
   Amazon EC2
      m1.large instance type: four EC2 Compute Units, two
       virtual cores with two EC2 Compute Units each, and 7.5
       GB of memory
      Gigabit Ethernet interconnect; one EC2 Compute Unit is
       roughly equivalent to a 1.0-1.2 GHz 2007 Opteron or Xeon
      Heterogeneous processor types
METHODS – VIRTUAL CLUSTER
ARCHITECTURE ON EC2




   [Figure: EC2 virtual cluster architecture - the head node submits MPI
    jobs to the worker nodes, and a file server provides a shared
    filesystem across the nodes]
METHODS – APPLICATIONS AND BENCHMARKS
USED

   High Performance Computing Challenge (HPCC)
    benchmark suite
     Consists of seven synthetic benchmarks
      Targeted synthetics: DGEMM, STREAM, and two measures
       of network latency and bandwidth
      Complex synthetics: HPL, FFTE, PTRANS, and
       RandomAccess
   NERSC 6 Benchmarks
     Set of applications representative of the NERSC workload
      Covers the science domains, parallelization schemes, and
       concurrencies, as well as machine-based characteristics that
       influence performance such as message size, memory
       access pattern, and working set sizes
METHODS – NERSC APPLICATIONS
   CAM: The Community Atmospheric Model
     Lower computational intensity
     Large point-to-point & collective MPI messages

   GAMESS: General Atomic and Molecular
    Electronic Structure System
     Memory access
     No collectives, very little communication

    GTC: Gyrokinetic Turbulence Code
     High computational intensity
     Bandwidth-bound nearest-neighbor communication plus
      collectives with small data payload
METHODS – NERSC APPLICATIONS
   IMPACT-T: Integrated Map and Particle Accelerator Tracking
    Time
     Memory bandwidth & moderate computational intensity
     Collective performance with small to moderate message sizes

   MAESTRO: A Low Mach Number Stellar Hydrodynamics
    Code
     Low computational intensity
     Irregular communication patterns

    MILC: Lattice Quantum Chromodynamics (QCD) code
      High computational intensity
     Global communication with small messages
   PARATEC: PARAllel Total Energy Code
       Global communication with small messages
RESULTS: HPCC PERFORMANCE




  64 cores
  Poor network performance on EC2




RESULTS: APPLICATION PERFORMANCE




 Franklin and Lawrencium are 1.4× to 2.6× slower than
  Carver (wall-clock runtime relative to Carver; see the
  sketch below).
 EC2
     •   Best case (GAMESS): EC2 is only 2.7× slower than Carver.
     •   Worst case (PARATEC): EC2 is more than 50× slower than Carver.
     •   The large performance spread is caused by the different demands
         the applications place on the network.
o   More detailed analysis required
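A minimal sketch of the slowdown arithmetic behind the figures above: each
ratio is the wall-clock runtime on a given platform divided by the Carver
runtime. The runtimes used here are placeholders, not values from the paper.

    # Slowdown relative to the Carver baseline: T_platform / T_carver.
    # Placeholder runtimes in seconds; not measured data from the paper.
    def slowdown(t_platform_s: float, t_carver_s: float) -> float:
        return t_platform_s / t_carver_s

    print(f"{slowdown(540.0, 200.0):.1f}x slower than Carver")  # -> 2.7x slower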
RESULTS: PERFORMANCE ANALYSIS USING IPM




    Integrated Performance Monitoring (IPM) framework
      • Uses the MPI profiling interface
       • Examines the relative amounts of time an application spends
         computing versus communicating, and the types of MPI calls made

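A minimal Python sketch of the kind of breakdown this enables. The per-call
record layout below is an assumption made for illustration; it is not IPM's
actual API or output format.

    # Illustrative only: aggregate per-call MPI timing records into
    # (a) the communication fraction of wall time and (b) time per MPI call type.
    from collections import defaultdict

    def summarize(records, wall_time_s):
        """records: iterable of (mpi_call_name, elapsed_seconds) tuples."""
        time_by_call = defaultdict(float)
        for name, elapsed in records:
            time_by_call[name] += elapsed
        comm_fraction = sum(time_by_call.values()) / wall_time_s
        return comm_fraction, dict(time_by_call)

    # Made-up numbers, purely to show the calculation:
    recs = [("MPI_Allreduce", 1.2), ("MPI_Send", 0.4), ("MPI_Allreduce", 0.9)]
    frac, by_call = summarize(recs, wall_time_s=10.0)
    print(f"communication fraction: {frac:.0%}", by_call)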
RESULTS: SUSTAINED SYSTEM PERFORMANCE




 SSP: aggregate measure of the workload-specific,
  delivered performance of a computing system
 For each code, measure
     •   FLOP count on a reference system
     •   Wall-clock run time on each system under test
     •   N (the core count used in the SSP formula) chosen to be 3,200
    Problem sets drastically reduced
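As typically defined for the NERSC SSP metric, each code's per-core rate is
its reference FLOP count divided by (cores used times measured wall-clock
time), and SSP is N times the geometric mean of those rates. A hedged sketch
of that calculation; the rates below are placeholders, not measured values.

    # Sketch of an SSP-style calculation (not the paper's actual code).
    # per_core_rates: per-core performance of each benchmark, e.g. in GFLOP/s,
    # computed as flops_reference / (cores_used * wall_time_seconds).
    from math import prod

    def ssp(per_core_rates, n_cores=3200):
        m = len(per_core_rates)
        return n_cores * prod(per_core_rates) ** (1.0 / m)

    # Placeholder per-core rates (illustrative only):
    rates = [1.1, 0.8, 2.3, 0.6, 1.7]
    print(f"SSP = {ssp(rates):.0f} GFLOP/s")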
RESULTS: VARIABILITY




   Performance Variability across runs
     •   Non-homogeneous nature of the systems allocated
     •   Network sharing and contention
      •   Sharing of the underlying, non-virtualized hardware

RESULTS: SCALING




   [Figure: scaling results]
CONCLUSIONS
 EC2 performance degrades significantly as
  applications spend more time communicating
 Applications with global, all-to-all communication
  perform worse than those that mostly use point-to-
  point communication.
 The amount of variability in EC2 performance can be
  significant.




DISCUSSION QUESTIONS

 This paper focused on performance alone. What
  are the performance-cost tradeoffs for the different
  platforms?
 How does this tradeoff vary with application
  characteristics such as granularity, communication
  sensitivity, etc.?
 What is the primary source of performance
  variability on Amazon EC2?




Editor's Notes

  1. Carver has quad-core Intel Nehalem processors running at 2.67 GHz, with dual-socket nodes and a single Quad Data Rate (QDR) IB link per node to a network that is locally a fat-tree with a global 2D mesh. Each Franklin XT4 compute node contains a single quad-core 2.3 GHz AMD Opteron "Budapest" processor, which is tightly integrated to the XT4 interconnect via a Cray SeaStar-2 ASIC through a 6.4 GB/s bidirectional HyperTransport interface.
  2. Each Lawrencium compute node is a Dell PowerEdge 1950 server equipped with two Intel Xeon quad-core 64-bit, 2.66 GHz Harpertown processors, connected to a Dual Data Rate (DDR) InfiniBand network configured as a fat tree. Amazon EC2 is a virtual computing environment that provides a web services API for launching and managing virtual machine instances. Amazon provides a number of different instance types that have varying performance characteristics. CPU capacity is defined in terms of an abstract Amazon EC2 Compute Unit; one EC2 Compute Unit is approximately equivalent to a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor. For our tests we used the m1.large instance type, which has four EC2 Compute Units (two virtual cores with two EC2 Compute Units each) and 7.5 GB of memory. The nodes are connected with gigabit Ethernet.
  3. There are major differences between the Amazon Web Services environment and that at a typical supercomputing center. For example, almost all HPC applications assume the presence of a shared parallel filesystem between compute nodes and a head node that can submit MPI jobs to all of the worker nodes. In the virtual cluster built for this work, the head node could submit MPI jobs to all of the worker nodes, and the file server provided a shared filesystem between the nodes.
  4. Targeted: these are microkernels which quantify basic system parameters that separately characterize computation and communication performance. The NERSC-6 applications serve as proxy apps for the workload.
  5. Point-to-point vs. all-to-all; communication vs. computation vs. memory; small messages vs. large messages.
  6. The DGEMM results are as one would expect based on the properties of the CPUs. The STREAM results show that EC2 is significantly faster for this benchmark than Lawrencium; we believe this is because of the particular processor distribution we received for our EC2 nodes for this test. The network latency and bandwidth results clearly show the difference between the interconnects on the tested systems. The ping-pong results show the latency and the bandwidth with no self-induced contention, while the randomly ordered ring tests show the performance degradation with self-contention. The uncontended latency and bandwidth measurements of the EC2 gigabit Ethernet interconnect are more than 20 times worse than the slowest other machine. However, for EC2 the less capable network clearly inhibits overall HPL performance, by a factor of six or more. The FFTE benchmark measures the floating-point rate of execution of a double-precision complex one-dimensional discrete Fourier transform, and the PTRANS benchmark measures the time to transpose a large matrix. The performance of both of these benchmarks depends upon the memory and network bandwidth, and they therefore show similar trends: EC2 is approximately 20 times slower than Carver and four times slower than Lawrencium in both cases. The RandomAccess benchmark measures the rate of random updates of memory, and its performance depends on memory and network latency; in this case EC2 is approximately 10 times slower than Carver and three times slower than Lawrencium.
  7. GAMESS (2.7×), for this benchmark problem, places relatively little demand upon the network, and therefore is hardly slowed down at all on EC2. PARATEC shows the worst performance on EC2, 52× slower than Carver. It performs 3-D FFTs, and the global (i.e., all-to-all) data transposes within these FFT operations can incur a large communications overhead. Qualitatively, it seems that those applications that perform the most collective communication with the most messages are those that perform the worst on EC2.
  8. Relative runtime on EC2 compared to Lawrencium is plotted against the percentage of communication for each application as measured on Lawrencium. The overall trend is clear: the greater the fraction of its runtime an application spends communicating, the worse the performance is on EC2. To determine these characteristics we classified the MPI calls of the applications into 4 categories: small and large messages (latency- vs. bandwidth-limited) and point-to-point vs. collective. (Note that for the purposes of this work we classified all messages < 4 KB to be latency bound. The overall conclusions shown here contain no significant dependence on this choice.) From this analysis it is clear why fvCAM behaves anomalously; it is the only one of the applications that performs most of its communication via large messages, both point-to-point and collectives.
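A small Python sketch of the four-way classification described above, using the same < 4 KB latency-bound cutoff. The record format and the set of collective call names are assumptions made for illustration, not taken from the paper.

    # Classify MPI calls into the four categories used in the analysis:
    # {small, large} message x {point-to-point, collective}.
    # Messages under 4 KB are treated as latency-bound ("small"), as above.
    LATENCY_CUTOFF_BYTES = 4 * 1024
    COLLECTIVES = {"MPI_Allreduce", "MPI_Bcast", "MPI_Alltoall", "MPI_Reduce"}  # partial list

    def classify(call_name: str, message_bytes: int) -> str:
        size = "small" if message_bytes < LATENCY_CUTOFF_BYTES else "large"
        kind = "collective" if call_name in COLLECTIVES else "point-to-point"
        return f"{size} {kind}"

    print(classify("MPI_Send", 512))        # small point-to-point
    print(classify("MPI_Alltoall", 65536))  # large collective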