© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Don’t let your innovation get stuck in a queue!
HPC: Accelerate Innovation with Virtually Unlimited
Infrastructure Capacity
Kevin Jorissen
Technical Business Development Manager,
Research and Technical Computing
jorissen@amazon.com
SRV216
Agenda
Overview of AWS Infrastructure
HPC Solution Overview
HPC Use Cases in Design and Engineering
Customer Success Stories
Best Practices for Performance, Scalability, and Cost
Additional Resources
AWS Global Infrastructure
18 Regions – 55 Availability Zones
The AWS Cloud spans 55 Availability Zones within 18 geographic Regions and 1 Local Region around the world.
Announced plans: 12 more Availability Zones and four more Regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud (US) Region.
# Indicates # of AZs in the Region
Over 100 Global CloudFront PoPs
AWS Global Infrastructure
Regions
Amazon Global Network
• Redundant 100-GbE network
• Redundant private capacity between all Regions except China
HPC Solution Overview
HPC on AWS: Fundamental Rethink of What is Possible
From worrying about capex, capacity, technology, and collaboration
to focusing on science, engineering, and innovation
AWS HPC Solution Components
▪ Compute: Amazon EC2 instances (compute and accelerated), EC2 Spot, Auto Scaling
▪ Storage: Amazon EBS, Amazon EFS, Amazon S3
▪ Networking: Enhanced networking, placement groups
▪ Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame
▪ Visualization: NICE DCV, AppStream 2.0
Amazon EC2 Instances
• Select compute that best fits the workload profile. Match the architecture to the job, not vice versa.
• Optimize price/performance of your HPC workloads with widest range of compute instances.
• Benefit from the AWS pace of innovation.
[Figure: EC2 instance families]
▪ General purpose: M5, M5d
▪ Burstable: T2
▪ Compute optimized: C5, C5d
▪ Compute and memory intensive: z1d, z1d.metal
▪ Memory optimized: R5, R5.metal
▪ Memory intensive: X1e
▪ In-memory: X1
▪ High I/O: I3, I3.metal
▪ Dense storage: D2
▪ Big data optimized: H1
▪ General purpose GPU: P3
▪ Graphics intensive: G3
▪ FPGAs: F1
▪ Virtual private servers: Amazon Lightsail
Two entries on the slide are marked "coming soon".
AWS Storage is a Platform
▪ File: Amazon EFS
▪ Block: Amazon EBS, Amazon EC2 instance store
▪ Object: Amazon S3 / S3-IA, Amazon Glacier
▪ Data transfer: AWS Direct Connect, ISV connectors, Amazon Kinesis Data Firehose, AWS Storage Gateway, Amazon S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
HPC Data Flow on AWS Storage
[Diagram: data moves between a campus data center and AWS (ingress and egress) over AWS Direct Connect, ISV connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, and Amazon CloudFront. Inside AWS, Amazon S3 (with lifecycle to Amazon Glacier) serves EC2 instances that use EBS, instance store, EFS, or another shared file system. Object, block, and file storage are all available.]
▪ 25 Gbps to Amazon S3
▪ HPC data in the cloud: Athena, Glue
High Speed Networking
Elastic Network Adapter (ENA)
▪ Latest generation of enhanced networking
▪ Hardware checksums
▪ Multi-queue support
▪ Receive-side steering
▪ Up to 25 Gbps in a cluster placement group
▪ Open-source Amazon network driver
▪ Compatible with MPI libraries including OpenMPI 3.0
▪ A cluster placement group is a logical grouping of instances within a single Availability Zone
▪ Cluster placement groups are recommended for applications that benefit from low network latency, high network throughput, or both
AWS Batch
▪ AWS Batch is a set of fully managed batch
primitives
▪ Focus on your applications (shell scripts,
Linux executables, Docker images) and their
resource requirements
▪ We take care of the rest!
AWS Batch Concepts
▪ The scheduler evaluates when, where, and how to
run jobs that have been submitted to a job queue.
▪ Jobs run in approximately the order in which they
are submitted as long as all dependencies on
other jobs have been met.
▪ There is no charge for AWS Batch; you only pay
for the underlying resources that you consume!
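The ordering rule above (jobs start roughly in submission order, but only once their dependencies are met) can be sketched as a toy model. This is not the AWS Batch API or its scheduler; the job names are hypothetical and every job is assumed to finish as soon as it starts.

```python
from collections import deque


def run_order(jobs):
    """Return job names in the order a simple FIFO queue would start them.

    jobs: list of (name, [dependency names]) in submission order.
    A job starts only after all of its dependencies have finished.
    Simplification: each job is treated as finishing the moment it starts.
    """
    finished = []
    pending = deque(jobs)
    while pending:
        progressed = False
        # One pass over everything currently waiting, in queue order.
        for _ in range(len(pending)):
            name, deps = pending.popleft()
            if all(dep in finished for dep in deps):
                finished.append(name)
                progressed = True
            else:
                pending.append((name, deps))  # not ready yet, retry next pass
        if not progressed:
            raise ValueError("unsatisfiable or circular dependencies")
    return finished


# Hypothetical submission order: "postprocess" is submitted first but must wait.
submitted = [
    ("postprocess", ["simulate"]),
    ("mesh", []),
    ("simulate", ["mesh"]),
    ("report", []),
]
order = run_order(submitted)
print(order)
```

In this sketch "postprocess" runs last even though it was submitted first, because it waits for "simulate", which in turn waits for "mesh".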
HPC Automation with CfnCluster
▪ CfnCluster simplifies deployment of
HPC in the cloud, including integrating
with popular HPC schedulers (SGE,
Torque, Slurm)
▪ Built on AWS CloudFormation, easy to
modify to meet specific application or
project requirements
▪ CfnCluster will handle the automatic
addition of compute nodes when there
are pending jobs in the queue
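The idea of adding compute nodes when jobs are pending can be illustrated with a small sizing function. This is a minimal sketch under stated assumptions, not CfnCluster's actual scaling logic; the function name, its parameters, and the numbers are hypothetical.

```python
import math


def desired_nodes(pending_jobs, slots_per_job, slots_per_node, max_nodes, min_nodes=0):
    """Nodes needed to run all pending jobs at once, clamped to [min_nodes, max_nodes]."""
    needed = math.ceil(pending_jobs * slots_per_job / slots_per_node)
    return max(min_nodes, min(max_nodes, needed))


# Hypothetical cluster: 36 scheduler slots per node, at most 20 nodes.
some_jobs = desired_nodes(pending_jobs=10, slots_per_job=8, slots_per_node=36, max_nodes=20)
empty_queue = desired_nodes(pending_jobs=0, slots_per_job=8, slots_per_node=36, max_nodes=20)
burst = desired_nodes(pending_jobs=500, slots_per_job=8, slots_per_node=36, max_nodes=20)
print(some_jobs, empty_queue, burst)
```

An empty queue maps to zero nodes, which is the "scale back down" half of the behavior, and the upper clamp stands in for a configured scaling limit.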
Launch a cluster in minutes
▪ Cluster creation usually takes ~15
minutes
▪ Completely managed by AWS
CloudFormation
$ cfncluster create mycluster
Your compute cluster in the cloud
Shared File Storage
Cloud-based, scaling HPC cluster
Cluster head node with
job scheduler
Amazon S3
and
Amazon
Glacier
Thin or Zero Client
- No local data -
CfnCluster Configuration Options
▪ Operating system
• Amazon Linux
• CentOS 6
• CentOS 7
• Ubuntu 14.04
▪ Scheduler
• Sun Grid Engine (SGE)
• PBS/Torque
• Slurm
▪ Storage size & IOPS
▪ EBS & instance store encryption
▪ Scaling speed & limits
▪ Provisioning scripts
Remote Visualization
▪ Using a GPU-optimized instance (G3) and NICE DCV to visualize results
[Diagram labels: corporate data center, Availability Zone, modeling cluster, GPU instance]
Popular scientific applications prepackaged
Deploys in ~5 minutes.
Familiar job schedulers, scientific applications, and shared file system.
Install any software you need.
No job queues – it’s your personal cluster.
Access to the graphical console.
Deploys in minutes.
Scales as large as needed when
you add jobs to the queue, and
scales back down when the jobs
are done.
Alces Flight scientific computing cluster in the cloud
Self-scaling HPC clusters that are instantly ready to compute, are billed by the hour, and
use the AWS Spot market by default, which keeps cost low
Partnerships that enable a seamless migration
AWS Advantages for HPC Workload Types
Workload types:
▪ Tightly coupled parallel computing
▪ Loosely coupled parallel computing
▪ Accelerated computing
▪ Visualization and interpretation
▪ High performance data storage and analytics
AWS advantages:
▪ Scale
▪ EC2 Spot pricing
▪ Early access to technology
▪ Choice
▪ Performance
▪ Derive unique insights with AI/ML
▪ Skip the queue
▪ View results instantly
High Performance Computing on AWS
Faster Time to
Results
Better ROI
▪ Innovate faster with virtually unlimited
infrastructure enabling scaling and agility
not attainable on-premises
▪ Optimize cost with flexible resource
selection and pay per use
▪ Increase collaboration with secure
access to clusters around the world
AWS Directly Benefits Design and Engineering
Accelerated Time to Results
▪ Scale higher and run more simulations to quickly converge on more efficient, safer, cost-effective products, and get to market faster
Scalability and Dynamic Resourcing
▪ No large upfront investments in time, infrastructure, or money
▪ Efficiently scale resources to meet shifting demands throughout the product lifecycle
▪ Encourages more experimentation, more frequent product launches, and higher-quality manufacturing
Secure Environment for Collaboration
▪ Streamlined, repeatable, and secure simulation and development environments
▪ Global Regions with traceability help satisfy regulations and audits, while enabling supply chain collaboration
Global, Fault-Tolerant Infrastructure
▪ 55 AWS Availability Zones in 18 Regions worldwide mean high availability
▪ Allows for seamless, controlled information sharing between global stakeholders, and enables reduced latency for interactive engineering use cases
HPC Use Cases in Design and Engineering
Sample HPC Use Cases
Storage needs:
▪ Data light: minimal requirements for high performance storage
▪ Data heavy: benefits from access to high performance storage
Coupling:
▪ Clustered (tightly coupled)
▪ Distributed / grid (loosely coupled)
Example workloads:
▪ Fluid dynamics
▪ Materials simulations
▪ Structural simulations
▪ Weather forecasting
▪ Electromagnetics
▪ Molecular modeling
▪ Risk simulations
▪ Contextual search
▪ Logistics simulations
▪ Analog & Digital Simulation
▪ Semiconductor verification/DRC
▪ Animation and VFX
▪ Image processing/GIS
▪ Genomics
▪ Deep learning
▪ Seismic processing
▪ Clinical trial simulations
▪ Astrophysics
Graphics for Design and Engineering
Energy
“The exploration and production
models are increasingly complex
with very large datasets, 3D and
dynamic algorithms, security, and
global reach… Amazon EC2 G3
instances enable Landmark to
deliver value to our clients in ways
that were not possible before.”
Chandra Yeleshwarapu
Global Head of Services and Cloud
Landmark, Halliburton
Automotive
“Amazon EC2 G3 instances will
enable us to continue to deliver
our unique, real-time, high-quality
3D experiences for automotive
customers through every
channel.”
Darren Joblin
Chief Executive Officer
ZeroLight
Media
“The advanced graphics features
of Amazon EC2 G3 instances
help our customers edit their work
with the same quality and fidelity
as a local workstation, deliver
results faster, and much more
cost effectively than ever before.”
David Benson
Chief Technology Officer
BeBop Technology
Traditional HPC stack for engineering & science applications
Shared file storage
HPC clusters
License managers and cluster
head nodes with job schedulers
3D graphics remote desktop servers
Remote
graphics workstations
Storage cache
Remote sites
Remote backup
Corporate data center
Traditional HPC
infrastructure is
inflexible, often
poorly utilized, and
must be managed
through a years-
long lifecycle
Migrating HPC to AWS
Virtual private cloud on AWS
3D GRAPHICS VIRTUAL WORKSTATION
LICENSE MANAGERS AND CLUSTER
HEAD NODES WITH JOB SCHEDULERS
CLOUD-BASED, AUTO-SCALING HPC CLUSTERS
SHARED FILE
STORAGE
STORAGE CACHE
On AWS, secure and
well-optimized HPC
clusters can be
automatically created,
operated, and torn down
in just minutes
Amazon S3
and Amazon Glacier
ON-PREMISES
HPC RESOURCES
Corporate data center
AWS SNOWBALL
AWS DIRECT
CONNECT
THIN OR ZERO CLIENT
- NO LOCAL DATA -
Third-party IP providers
and collaborators
Encryption everywhere – with your own keys!
Machine learning and
analytics
Customer Success Stories
Off to a cute start …
Natural Language Processing at Clemson University
550,000 cores using EC2 Spot Instances
https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/
“I am absolutely thrilled with the outcome of this experiment. The graduate students on the project […]
used resources from AWS and Omnibond and developed a new software infrastructure to perform
research at a scale and time-to-completion not possible with only campus resources.” – Prof. Amy Apon,
Co-Director of the Complex Systems, Analytics and Visualization Institute
“Spot market”: low-cost AWS computing, a good fit for research
BNL/Fermilab: High Throughput Computing at Scale
High Energy Physics
• Discovery of the Higgs Boson Particle
• Added 58,000 Spot Cores Elastically
• Monte Carlo Simulations Searching for Particles
• Reduced workload from 6 weeks to 10 days
Source: https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/
Genomics processing on FPGA Accelerators
Children’s Hospital of Philadelphia and Edico Genome Achieve Fastest-Ever Analysis of 1,000
Genomes
Orlando, Fla., Oct 19, 2017 – The
Children’s Hospital of Philadelphia (CHOP)
and Edico Genome today set a new
scientific world standard in rapidly
processing whole human genomes into
data files usable for researchers aiming to
bring precision medicine into mainstream
clinical practice. Utilizing Edico Genome’s
DRAGEN™ Genome Pipeline, deployed
on 1,000 Amazon EC2 F1 instances on the
Amazon Web Services (AWS) Cloud, 1,000
pediatric genomes were processed in two
hours and 25 minutes.
… Available in “AWS App Store” (AWS Marketplace) for ~$24 / genome
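The figures quoted above can be turned into aggregate numbers with a few lines of arithmetic. This is a back-of-the-envelope sketch that uses only the slide's values (1,000 genomes, 1,000 F1 instances, 2 hours 25 minutes, about $24 per genome); it makes no claim about what the per-genome price covers.

```python
from datetime import timedelta

genomes = 1000
instances = 1000
wall_clock = timedelta(hours=2, minutes=25)
price_per_genome_usd = 24  # approximate AWS Marketplace figure quoted on the slide

hours = wall_clock.total_seconds() / 3600
minutes = wall_clock.total_seconds() / 60
genomes_per_hour = genomes / hours      # aggregate throughput of the whole fleet
instance_hours = instances * hours      # total F1 instance time consumed
total_usd = genomes * price_per_genome_usd

print(f"wall clock: {minutes:.0f} minutes")
print(f"aggregate throughput: {genomes_per_hour:.0f} genomes/hour")
print(f"instance-hours consumed: {instance_hours:.0f}")
print(f"approximate total at the quoted price: ${total_usd:,}")
```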
HPC in aerospace
“Rescale’s ScaleX cloud
platform is a game
changer for engineering. It
gives Boom computing
resources comparable to
building a large on-
premises HPC center.
Rescale lets us move fast
with minimal capital
spending and resources
overhead.”
Josh Krall
CTO & Co-Founder
Boom leverages Rescale and AWS to enable supersonic travel
▪ Simulated vortex lift with 200M cell models on 512+ cores
▪ Increased simulation throughput: 100 jobs in parallel with 6x
speed-up per job → 600x speed-up
▪ Eliminated IT overhead, including server capital costs & in-house
IT and software teams
▪ Elastic HPC capacity and pay-as-you-go AWS clusters allow
business agility & ability to scale
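The 600x figure above is the product of job-level parallelism and per-job speed-up. A minimal sketch of that arithmetic follows; it assumes the baseline runs the jobs one after another, and the baseline duration of 12 hours per job is hypothetical.

```python
def campaign_hours(jobs, hours_per_job, concurrent_jobs):
    """Wall-clock hours for a batch of equal jobs, run concurrent_jobs at a time."""
    waves = -(-jobs // concurrent_jobs)  # ceiling division
    return waves * hours_per_job


jobs = 100
baseline_hours_per_job = 12.0  # hypothetical on-premises run time per job
per_job_speedup = 6            # from the slide
parallel_jobs = 100            # from the slide

# Assumed baseline: one job at a time on a fixed cluster.
baseline = campaign_hours(jobs, baseline_hours_per_job, concurrent_jobs=1)
cloud = campaign_hours(jobs, baseline_hours_per_job / per_job_speedup, parallel_jobs)
speedup = baseline / cloud

print(f"baseline: {baseline:.0f} h, cloud: {cloud:.0f} h, speed-up: {speedup:.0f}x")
```

If the baseline could already run several jobs at once, the overall factor would be correspondingly smaller than 600x.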
Big Data Meets HPC
“The IT organization has been
driving a massive digital
transformation and optimization of
business capabilities across the
organization. IT has been leading
these changes by creating rich
environments for the data to thrive,
ensuring improvements in
productivity and collaboration across
the massively global organization.”
- Steve Phillpot, CIO
Western Digital
Big Data Platform (BDP)
▪ Data from across global manufacturing sites are collected into a cloud-based
Big Data Platform
▪ The BDP enables operational/logistics tracking of millions of hard drives
produced each year
▪ Allows analysts to visualize data across JMP, Tableau, IBM SPSS, and SAS
to support all phases of engineering, manufacturing, testing and support
High Performance Computing for Design and Engineering
▪ Cloud-based HPC accelerates product optimization, using clusters of CPUs
and GPUs to perform millions of drive-head and disk interface simulations,
and to improve storage magnetics product capacities
▪ Cloud-based HPC is the foundation for future storage architecture analysis,
materials science explorations and machine learning based investigations for
multiple storage product families
HPC in design and manufacturing
Applications for engineering:
▪ Molecular dynamics, CAD, CAE, EDA
▪ Collaboration tools for engineering
▪ Big data for manufacturing yield analysis
Running drive-head simulations at scale:
Millions of parallel parameter sweeps,
running months of simulations in just hours
Over 85,000 Intel cores running at peak,
using Spot Instances
HPC in aerospace
▪ Running parallel CFD studies using Siemens
STAR-CCM+
▪ Goal: shorten the time between Design
Requirements and Configuration, and
Flight Testing
▪ 1000+ cores per CFD study, multiple studies
required for each workflow iteration
▪ Job-level optimizations:
▪ Enhanced Networking, Placement Groups
▪ Amazon Linux, Hyper-threading disabled
▪ Workflow optimizations:
▪ Spot instances, multiple clusters
▪ Multiple parallel studies for faster
throughput
Semiconductor Design: Cadence Cloud Products
For electronic design automation (EDA)
Cloud Ready (customer managed): cloud-enabled products to run in your cloud environment with Cloud Passport
• Licensed cloud software support
• Freedom to manage your own IaaS account, CAD/IT, and cloud infrastructure environment
HDS Cloud (Cadence managed): cloud-optimized products that run in a fully supported, Cadence-managed, ready-to-go cloud design environment
Cadence HDS Cloud includes:
• Licensed software and support
• Cloud-optimized services
• CAD and IT infrastructure support
• PDK and foundry expertise
• Complete security support
Best Practices for Performance, Scalability, and Cost
Important Factors for Performance and Throughput
▪ Compute performance – CPUs, GPUs, FPGAs
▪ Memory performance – high RAM requirements in many applications
▪ Network performance – throughput, latency, and consistency
▪ Storage performance – including shared filesystems
▪ Automation and cluster/job management
▪ Remote graphics for interactive applications
▪ ISV support – including license management
…and SCALE
Accelerate HPC with
Rapid, Massive Scaling
Scale up when needed, then scale down
▪ In a traditional HPC data center, the only certainty is that
you always have the wrong number of servers – too few,
or too many
▪ Every additional HPC server launched in the cloud can
improve speed of innovation – if there are no other
constraints to scaling
▪ Overnight or over-weekend workloads reduced to an
hour or less
Think BIG
What if you could launch
1 million concurrent jobs?
[Chart: CPU cores over time across the product development cycle]
Cluster
Tightly coupled, latency-
sensitive applications
Use larger EC2 compute
instances, placement
groups, enhanced
networking, HPC job
schedulers to “scale up”
Grid
Loosely coupled, pleasingly
parallel
Use a variety of EC2
instances, multiple AZs,
Spot, Auto Scaling, to
“scale out”
Grids of Clusters
Running parallel cluster
jobs, parameter studies
Use a grid strategy on the
cloud to run a group of
parallel, individually-
clustered HPC jobs
Use a “Grids of Clusters” Strategy for Scale
Using a “grids of clusters”
strategy yields faster,
higher-quality results for
design exploration using
parameter sweeps and
other parallel methods
Expand the solution space
Courtesy of TLG Aerospace
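A “grids of clusters” run can be planned as a parameter sweep in which every case gets its own cluster. The sketch below only does the planning arithmetic; the sweep parameters, cluster size, core limit, and run time per case are hypothetical, and it does not launch anything.

```python
import math
from itertools import product


def plan_sweep(cases, cores_per_cluster, core_limit, hours_per_case):
    """Plan a sweep where each case runs on its own tightly coupled cluster."""
    if core_limit < cores_per_cluster:
        raise ValueError("core limit is too small for even one cluster")
    concurrent = min(len(cases), core_limit // cores_per_cluster)
    waves = math.ceil(len(cases) / concurrent)
    return {
        "cases": len(cases),
        "concurrent_clusters": concurrent,
        "waves": waves,
        "wall_clock_hours": waves * hours_per_case,
    }


# Hypothetical CFD design exploration: 3 Mach numbers x 4 angles of attack.
mach_numbers = [0.6, 0.7, 0.8]
angles_of_attack = [0, 2, 4, 6]
cases = list(product(mach_numbers, angles_of_attack))

plan = plan_sweep(cases, cores_per_cluster=1024, core_limit=8192, hours_per_case=3.0)
print(plan)
```

Raising the core limit to cover all twelve clusters at once would cut the plan to a single wave, which is the "scale out" lever the slide describes.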
Optimize Using AWS Compute Instance Types
“Shape the compute to the work to be done”
▪ General Purpose: M5, M4, T2
▪ Compute Optimized: C5, C4, Z1d
▪ Storage and I/O Optimized: I3, D2, H1
▪ Memory Optimized: X1, R5, R4, Z1d
▪ GPU Graphics: G3, G2, EG
▪ GPU and FPGA Compute: P3, P2, F1
Instance types: Z1d example
Z1d instances are optimized for memory-
intensive, compute-intensive applications
▪ Custom Intel Xeon Scalable processor
▪ Up to 4 GHz sustained, all-Turbo
performance
▪ Up to 384 GiB DDR4 memory
▪ Enhanced networking, up to 25 Gbps throughput
Choose an instance
type and size to meet
the unique needs of
each application
M5: Next Generation General Purpose Instance
▪ Powered by 2.5 GHz Intel Xeon Scalable Processors (Skylake)
▪ New larger instance size: m5.24xlarge with 96 vCPUs and 384 GiB of memory (4:1 memory:vCPU ratio)
▪ Support for Intel AVX-512, offering up to twice the performance for vector and floating point workloads
▪ Use case: FEA Implicit
▪ 14% price/performance improvement with M5 (chart: M4 vs. M5)
Also available: M5d with local NVMe-based SSD storage
C5: Compute-Optimized Instances Based on Intel Skylake
▪ Based on 3.0 GHz Intel Xeon Scalable
Processors (Skylake)
▪ Run up to 3.5 GHz using Intel Turbo Boost
Technology
▪ Up to 72 vCPUs and 144 GiB of memory
(2:1 Memory:vCPU ratio)
▪ 25-Gbps network bandwidth
▪ Support for Intel AVX-512
▪ Use case: CFD, FEA Explicit
25% price/performance
improvement over C4
C4 C5
Also available: C5d with local NVMe-based SSD storage
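The memory:vCPU ratios quoted for M5 and C5 follow directly from the sizes on the slides, and they are a quick way to match an instance family to a solver's memory appetite. The sketch below reproduces the ratios; the per-core figure assumes two vCPUs (hardware threads) per physical core, which is an assumption added here rather than something stated on the slides.

```python
# (vCPUs, memory in GiB) as quoted on the M5 and C5 slides.
instances = {
    "m5.24xlarge": (96, 384),
    "C5 (largest size)": (72, 144),
}


def gib_per_vcpu(vcpus, memory_gib):
    """Memory available per vCPU."""
    return memory_gib / vcpus


def gib_per_core(vcpus, memory_gib, threads_per_core=2):
    """Memory per physical core, assuming threads_per_core vCPUs share one core."""
    return memory_gib / (vcpus / threads_per_core)


ratios = {name: gib_per_vcpu(*spec) for name, spec in instances.items()}
per_core = {name: gib_per_core(*spec) for name, spec in instances.items()}

for name in instances:
    print(f"{name}: {ratios[name]:.0f} GiB/vCPU, {per_core[name]:.0f} GiB/core")
```

When a job runs one rank per physical core, the per-core figure is the one to compare against the solver's memory needs.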
P3: Next Generation of GPU Compute Instances
• Up to eight NVIDIA Tesla V100 GPUs
• 1 petaFLOPS of computational
performance, up to 14x better than
P2
• 300 GB/s GPU-to-GPU
communication (NVLink) – 9X better
than P2
• 16GB GPU memory with 900 GB/sec
peak GPU memory bandwidth
• Use cases: ML/AI, Accelerated
Engineering
[Charts: FP64 performance (TFLOPS) and mixed/FP16 performance (TFLOPS) for K80, P100, and V100. Annotations on the slide: 2.6X; FP32; 14X over K80’s max perf.]
F1 Instances: First Cloud Instance with FPGA
• Use cases:
• Engineering simulations
• Image and video processing
• Big data and ML
• Develop, simulate, debug & compile your code
• Package as FPGA images
• F1 instance with your custom logic
Speed up applications over 30x
FPGA acceleration using F1
PCIe
DDR
controllers DDR-4
attached
memory
EC2
F1
Launch instance and load AFI
Amazon Machine Image (AMI)
CPU
Application
Amazon FPGA Image (AFI)
An F1 instance can have
any number of AFIs
An AFI can be loaded into
the FPGA in seconds
Performance considerations
For tightly-coupled cluster workloads
Test using real-world examples
▪ Use large cases for testing: do not
benchmark scalability using only small
examples
MPI libraries
▪ Test with Intel MPI and OpenMPI 3.0,
and make use of available tunings
Domain decomposition
▪ Choose the number of cells per core for
either per-core efficiency or faster
results
Network
▪ Use a placement group
▪ Enable enhanced networking
[Chart: WRF 2.5 km CONUS benchmark. Run time (s) and scale-up plotted against core count, up to 4,000 cores.]
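When benchmarking scalability as recommended above, run time alone is hard to read; speed-up and parallel efficiency relative to the smallest run make the trade-off between per-core efficiency and faster results explicit. The timings in this sketch are hypothetical and are not taken from the WRF chart.

```python
def scaling_table(timings):
    """timings: {cores: seconds}. Returns (cores, speedup, efficiency) rows
    relative to the run with the fewest cores."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical wall-clock times for one large test case.
timings = {64: 1600.0, 128: 800.0, 256: 500.0, 512: 400.0}
table = scaling_table(timings)

for cores, speedup, efficiency in table:
    print(f"{cores:5d} cores  speed-up {speedup:4.1f}x  efficiency {efficiency:4.0%}")
```

In this made-up data set, 512 cores still gives the fastest result, but half of each core's time is lost to overhead, so 128 cores is the better choice when cost per result matters most.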
Performance considerations
For all HPC workloads
OS version
▪ Use Amazon Linux, RHEL 7.5, or an
updated 3.10+ kernel (4.0+ if using
NVMe on Z1d, R5d, F1, I3, etc.)
Instance types
▪ Always test with the latest EC2
instances – and with earlier generation
instance types as well
Processor states
▪ Use P-states to reduce processor
variability
Hyper-threading and affinity
▪ Test with Hyper-threading (HT) on and
off – usually off is best, but not always
▪ Use CPU affinity to pin threads to CPU
cores when HT is off
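CPU affinity can be set from the job launcher, the MPI library, or the process itself. The sketch below shows the last option on Linux with Python's standard `os.sched_setaffinity`; the helper names and the eight-CPU topology are hypothetical, and on a real system the CPU-to-core mapping would be read from the operating system rather than hard-coded.

```python
import os


def first_cpu_per_core(cpu_to_core):
    """Pick one logical CPU per physical core from a {cpu: core_id} mapping."""
    chosen = {}
    for cpu in sorted(cpu_to_core):
        chosen.setdefault(cpu_to_core[cpu], cpu)  # keep the lowest-numbered sibling
    return sorted(chosen.values())


def pin_current_process(cpus):
    """Pin this process to the given logical CPUs. Linux only.

    Returns the resulting affinity set, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform without affinity support
    target = set(cpus) & os.sched_getaffinity(0)
    if not target:
        return None  # none of the requested CPUs are available to this process
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


# Hypothetical 4-core machine with Hyper-Threading: CPUs 0-3 and 4-7 are siblings.
topology = {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
one_per_core = first_cpu_per_core(topology)
print(one_per_core)
```

Calling `pin_current_process(one_per_core)` would then restrict the process to one hardware thread per core, which approximates running with Hyper-Threading off; whether that helps should be measured, as the slide advises.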
Optimize Using EC2 Purchasing Options
On-Demand
Pay for compute capacity by
the second with no long-term
commitments
Spiky workloads, to define
needs
Reserved
Make a 1 or 3 Year commitment
and receive a significant discount
off On-Demand prices
Committed, steady-state usage
Spot
Spare EC2 capacity at savings of
up to 90% off On-Demand prices
Fault-tolerant, dev/test, time-
flexible, stateless workloads
Per Second Billing for EC2 Linux instances & EBS volumes
Cost optimization
Conservative: Reserved Instances + On-Demand Instances
Optimized: Reserved Instances + On-Demand Instances + Spot
Optimized with scale out (magnify the peak): Reserved Instances + On-Demand Instances + Spot + additional Spot
Reserved
Make a low, one-
time payment and
receive a significant
discount on the
hourly charge
For committed
utilization
On-Demand
Pay for compute
capacity by the
hour, with per-
second billing and
no long-term
commitments
For spiky
workloads,
or to define needs
Spot
Use unused
capacity, charged
at a Spot price that
fluctuates based
on supply and
demand
For high-scale,
time-flexible
workloads
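The effect of mixing purchasing options can be estimated with a blended-cost calculation. This is a minimal sketch: the On-Demand rate, the Reserved and Spot discounts, and the usage mixes are all hypothetical placeholders (the slides only state that Spot savings can reach up to 90%), so substitute real prices before drawing conclusions.

```python
def blended_cost(core_hours, on_demand_rate, mix, discounts):
    """Total cost when usage is split across purchasing options.

    mix: {option: share of core-hours}, shares must sum to 1.
    discounts: {option: fractional discount off the On-Demand rate}.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 1")
    return sum(
        core_hours * share * on_demand_rate * (1 - discounts[option])
        for option, share in mix.items()
    )


core_hours = 100_000
on_demand_rate = 0.05  # hypothetical USD per core-hour
discounts = {"on_demand": 0.0, "reserved": 0.40, "spot": 0.70}  # hypothetical

conservative = blended_cost(
    core_hours, on_demand_rate, {"reserved": 0.6, "on_demand": 0.4}, discounts
)
optimized = blended_cost(
    core_hours, on_demand_rate, {"reserved": 0.4, "on_demand": 0.1, "spot": 0.5}, discounts
)

print(f"conservative: ${conservative:,.0f}")
print(f"optimized:    ${optimized:,.0f}")
```

With these placeholder numbers, shifting the time-flexible half of the workload to Spot lowers the bill by roughly a third; the money saved can also be spent on more Spot capacity to magnify the peak.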
Additional Resources
Further reading
White Papers
• Introduction to HPC on AWS
• Optimizing Electronic Design Automation
(EDA) Workflows on AWS
Reference Architecture
• HPC Lens - Well Architected Framework
Blogs
• Ansys - Getting Faster, Cost-effective
Simulation on the Cloud
• Real World Scalability with AWS
Docs
• Processor State Control for Your
EC2 Instance
• Optimizing CPU Options
• Big data
• EFS performance
• Monitor performance
• AWS Online Tech Talk
https://aws.amazon.com/hpc/
Thank you
Q&A
Learn More:
http://aws.amazon.com/hpc

Introduction of AWS Cloud Computing and its future for Biometric Department
 
Building Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit CanberraBuilding Complex Workloads in Cloud - AWS PS Summit Canberra
Building Complex Workloads in Cloud - AWS PS Summit Canberra
 
What would you do with a million cores - HPC on AWS
What would you do with a million cores - HPC on AWSWhat would you do with a million cores - HPC on AWS
What would you do with a million cores - HPC on AWS
 
Cloud Economics: The Financial Case for Cloud Migration
Cloud Economics: The Financial Case for Cloud MigrationCloud Economics: The Financial Case for Cloud Migration
Cloud Economics: The Financial Case for Cloud Migration
 
Migrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWSMigrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWS
 
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A...
 
AWS Summit Stockholm 2014 – B2 – Migrating enterprise applications to AWS
AWS Summit Stockholm 2014 – B2 – Migrating enterprise applications to AWSAWS Summit Stockholm 2014 – B2 – Migrating enterprise applications to AWS
AWS Summit Stockholm 2014 – B2 – Migrating enterprise applications to AWS
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 

Más de Amazon Web Services

Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 

Más de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

High Performance Computing on AWS: Accelerating Innovation with virtually unlimited infrastructure capacity - SRV216 - Toronto AWS Summit

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Kevin Jorissen, Technical Business Development Manager, Research and Technical Computing. SRV216. HPC: Accelerate Innovation with Virtually Unlimited Infrastructure Capacity
  • 2. Don’t let your innovation get stuck in a queue! HPC: Accelerate Innovation with Virtually Unlimited Infrastructure Capacity. Kevin Jorissen, Technical Business Development Manager, Research and Technical Computing, jorissen@amazon.com. SRV216
  • 3. Agenda: Overview of AWS Infrastructure; HPC Solution Overview; HPC Use Cases in Design and Engineering; Customer Success Stories; Best Practices for Performance, Scalability, and Cost; Additional Resources
  • 4. AWS Global Infrastructure: 18 Regions, 55 Availability Zones. The AWS Cloud spans 55 Availability Zones within 18 geographic Regions and 1 Local Region around the world. Announced plans: 12 more Availability Zones and four more Regions in Bahrain, Hong Kong SAR, Sweden, and a second AWS GovCloud (US) Region. (Map legend: each number indicates the number of AZs in the Region.)
  • 5. AWS Global Infrastructure: Regions, over 100 global CloudFront PoPs, and the Amazon Global Network • Redundant 100-GbE network • Redundant private capacity between all Regions except China
  • 6. Agenda: Overview of AWS Infrastructure; HPC Solution Overview; HPC Use Cases in Design and Engineering; Customer Success Stories; Best Practices for Performance, Scalability, and Cost; Additional Resources
  • 7. HPC on AWS: Fundamental Rethink of What is Possible. From worrying about capex, capacity, technology, and collaboration to focusing on science, engineering, and innovation.
  • 8. AWS HPC Solution Components. Compute: Amazon EC2 instances (compute and accelerated), EC2 Spot, Auto Scaling. Storage: Amazon EBS, Amazon EFS, Amazon S3. Networking: enhanced networking, placement groups. Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame. Visualization: NICE DCV, AppStream 2.0.
  • 9. Amazon EC2 Instances • Select compute that best fits the workload profile. Match the architecture to the job, not vice versa. • Optimize price/performance of your HPC workloads with the widest range of compute instances. • Benefit from the AWS pace of innovation. Instance families shown: M5, M5d, T2, C5, C5d, z1d, z1d.metal, R5, R5.metal, X1, X1e, I3, I3.metal, D2, H1, P3, G3, F1, and Amazon Lightsail (virtual private servers). Categories shown: general purpose, burstable, compute optimized, compute and memory intensive, memory optimized, memory intensive, in-memory, high I/O, dense storage, big data optimized, general purpose GPU, graphics intensive, FPGAs. Two entries are marked "coming soon".
  • 10. AWS Storage is a Platform. File: Amazon EFS. Block: Amazon EBS, Amazon EC2 instance store. Object: Amazon S3 / S3-IA, Amazon Glacier. Data transfer: AWS Direct Connect, ISV connectors, Amazon Kinesis Data Firehose, AWS Storage Gateway, Amazon S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN.
  • 11. HPC Data Flow on AWS. Data transfer between the campus data center and AWS (ingress and egress): AWS Direct Connect, ISV connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, Amazon CloudFront. Storage (object, block, file): Amazon S3 with lifecycle to Amazon Glacier; EC2 instance with EBS and instance store; EFS or another shared file system. 25 Gbps to Amazon S3.
  • 12. HPC Data in the Cloud: Athena, Glue
  • 13. High Speed Networking: Elastic Network Adapter (ENA) ▪ Latest generation of enhanced networking ▪ Hardware checksums ▪ Multi-queue support ▪ Receive-side steering ▪ Up to 25 Gbps in a cluster placement group ▪ Open-source Amazon network driver ▪ Compatible with MPI libraries including OpenMPI 3.0 ▪ A cluster placement group is a logical grouping of instances within a single Availability Zone ▪ Cluster placement groups are recommended for applications that benefit from low network latency, high network throughput, or both
  • 14. AWS Batch ▪ AWS Batch is a set of fully managed batch primitives ▪ Focus on your applications (shell scripts, Linux executables, Docker images) and their resource requirements ▪ We take care of the rest!
  • 15. AWS Batch Concepts ▪ The scheduler evaluates when, where, and how to run jobs that have been submitted to a job queue. ▪ Jobs run in approximately the order in which they are submitted as long as all dependencies on other jobs have been met. ▪ There is no charge for AWS Batch; you only pay for the underlying resources that you consume!
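The ordering rule on this slide (jobs run in roughly submission order once their dependencies are met) can be illustrated with a toy model. This is a minimal sketch in plain Python, not the AWS Batch scheduler or its API; the function name `run_order` and the job names are hypothetical.

```python
def run_order(jobs):
    """Return the order in which jobs would run.

    `jobs` is a list of (job_id, depends_on) tuples in submission order.
    A job is runnable once every job it depends on has finished; among
    runnable jobs, the earliest submitted one runs first.
    """
    pending = list(jobs)
    done, order = set(), []
    while pending:
        for i, (job_id, deps) in enumerate(pending):
            if set(deps) <= done:       # all dependencies finished
                order.append(job_id)
                done.add(job_id)
                del pending[i]
                break                   # rescan from the oldest pending job
        else:
            raise ValueError("unsatisfiable dependencies")
    return order


# Hypothetical three-step workflow, submitted out of dependency order.
submitted = [
    ("postprocess", ["simulate"]),
    ("mesh", []),
    ("simulate", ["mesh"]),
]
print(run_order(submitted))
```

The real service also weighs queue priority and available compute resources, which this sketch ignores.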
  • 16. HPC Automation with CfnCluster ▪ CfnCluster simplifies deployment of HPC in the cloud, including integrating with popular HPC schedulers (SGE, Torque, Slurm) ▪ Built on AWS CloudFormation, easy to modify to meet specific application or project requirements ▪ CfnCluster will handle the automatic addition of compute nodes when there are pending jobs in the queue
  • 17. Launch a cluster in minutes ▪ Cluster creation usually takes ~15 minutes ▪ Completely managed by AWS CloudFormation $ cfncluster create mycluster
  • 18. Your compute cluster in the cloud: thin or zero client (no local data); cluster head node with job scheduler; cloud-based, scaling HPC cluster; shared file storage; Amazon S3 and Amazon Glacier
  • 19. CfnCluster Configuration Options ▪ Operating system: Amazon Linux, CentOS 6, CentOS 7, Ubuntu 14.04 ▪ Scheduler: Sun Grid Engine (SGE), PBS/Torque, Slurm ▪ Storage size & IOPS ▪ EBS & instance store encryption ▪ Scaling speed & limits ▪ Provisioning scripts
  • 20. Remote Visualization ▪ Using a GPU-optimized instance (G3) and NICE DCV to visualize results. (Diagram labels: corporate data center, Availability Zone, modeling cluster, GPU instance.)
  • 21. Alces Flight: scientific computing cluster in the cloud. Self-scaling HPC clusters instantly ready to compute, billed by the hour, and using the AWS Spot market by default, so they’re very low cost. Popular scientific applications prepackaged. Deploys in ~5 minutes. Familiar job schedulers, scientific applications, and shared file system. Install any software you need. No job queues: it’s your personal cluster. Access to the graphical console. Scales as large as needed when you add jobs to the queue, and scales back down when the jobs are done.
  • 22. Partnerships that enable a seamless migration
  • 23. AWS Advantages for HPC Workload Types. Workload types: tightly coupled parallel computing; loosely coupled parallel computing; accelerated computing; visualization and interpretation; high performance data storage and analytics. Advantages: scale; EC2 Spot pricing; early access to technology; choice; performance; derive unique insights with AI/ML; skip the queue; view results instantly.
  • 24. High Performance Computing on AWS: Faster Time to Results, Better ROI ▪ Innovate faster with virtually unlimited infrastructure enabling scaling and agility not attainable on-premises ▪ Optimize cost with flexible resource selection and pay per use ▪ Increase collaboration with secure access to clusters around the world
  • 25. AWS Directly Benefits Design and Engineering. Accelerated Time to Results ▪ Scale higher and run more simulations to quickly converge on more efficient, safer, cost-effective products, and get to market faster ▪ No large upfront investments in time, infrastructure, money. Scalability and Dynamic Resourcing ▪ Efficiently scale resources to meet shifting demands throughout the product lifecycle ▪ Encourages more experimentation, more frequent product launches, higher quality manufacturing. Secure Environment for Collaboration ▪ Streamlined, repeatable, and secure simulation and development environments ▪ Global Regions with traceability help satisfy regulations and audits, while enabling supply chain collaboration. Global, Fault-Tolerant Infrastructure ▪ 55 AWS Availability Zones in 18 Regions worldwide means high availability ▪ Allows for seamless, controlled information-sharing between global stakeholders, and enables reduced latency for interactive engineering use cases
  • 26. Agenda: Overview of AWS Infrastructure; HPC Solution Overview; HPC Use Cases in Design and Engineering; Customer Success Stories; Best Practices for Performance, Scalability, and Cost; Additional Resources
  • 27. Sample HPC Use Cases. Axes of the chart: Data Light (minimal requirements for high performance storage) versus Data Heavy (benefits from access to high performance storage), and Clustered (tightly coupled) versus Distributed / Grid (loosely coupled). Use cases shown: ▪ Fluid dynamics ▪ Materials simulations ▪ Structural simulations ▪ Weather forecasting ▪ Electromagnetics ▪ Molecular modeling ▪ Risk simulations ▪ Contextual search ▪ Logistics simulations ▪ Analog & digital simulation ▪ Semiconductor verification/DRC ▪ Animation and VFX ▪ Image processing/GIS ▪ Genomics ▪ Deep learning ▪ Seismic processing ▪ Clinical trial simulations ▪ Astrophysics
  • 28. Graphics for Design and Engineering. Energy: “The exploration and production models are increasingly complex with very large datasets, 3D and dynamic algorithms, security, and global reach… Amazon EC2 G3 instances enable Landmark to deliver value to our clients in ways that were not possible before.” Chandra Yeleshwarapu, Global Head of Services and Cloud, Landmark, Halliburton. Automotive: “Amazon EC2 G3 instances will enable us to continue to deliver our unique, real-time, high-quality 3D experiences for automotive customers through every channel.” Darren Joblin, Chief Executive Officer, ZeroLight. Media: “The advanced graphics features of Amazon EC2 G3 instances help our customers edit their work with the same quality and fidelity as a local workstation, deliver results faster, and much more cost effectively than ever before.” David Benson, Chief Technology Officer, BeBop Technology
  • 29. Traditional HPC stack for engineering & science applications. Traditional HPC infrastructure is inflexible, often poorly utilized, and must be managed through a years-long lifecycle. (Diagram labels: corporate data center; shared file storage; HPC clusters; license managers and cluster head nodes with job schedulers; 3D graphics remote desktop servers; remote graphics workstations; storage cache; remote sites; remote backup.)
  • 30. Migrating HPC to AWS. On AWS, secure and well-optimized HPC clusters can be automatically created, operated, and torn down in just minutes. Encryption everywhere, with your own keys! (Diagram labels: virtual private cloud on AWS; 3D graphics virtual workstation; license managers and cluster head nodes with job schedulers; cloud-based, auto-scaling HPC clusters; shared file storage; storage cache; Amazon S3 and Amazon Glacier; machine learning and analytics; on-premises HPC resources in the corporate data center; AWS Snowball; AWS Direct Connect; thin or zero client with no local data; third-party IP providers and collaborators.)
  • 31. Agenda: Overview of AWS Infrastructure; HPC Solution Overview; HPC Use Cases in Design and Engineering; Customer Success Stories; Best Practices for Performance, Scalability, and Cost; Additional Resources
  • 32. Off to a cute start …
  • 33. Natural Language Processing at Clemson University: 550,000 cores using EC2 Spot Instances. https://aws.amazon.com/blogs/aws/natural-language-processing-at-clemson-university-1-1-million-vcpus-ec2-spot-instances/ “I am absolutely thrilled with the outcome of this experiment. The graduate students on the project […] used resources from AWS and Omnibond and developed a new software infrastructure to perform research at a scale and time-to-completion not possible with only campus resources.” Prof. Amy Apon, Co-Director of the Complex Systems, Analytics and Visualization Institute. “Spot market”: cheap AWS computing, a good fit for research
  • 34. BNL/Fermilab: High Throughput Computing at Scale. High Energy Physics • Discovery of the Higgs boson particle • Added 58,000 Spot cores elastically • Monte Carlo simulations searching for particles • Reduced workload from 6 weeks to 10 days. Source: https://aws.amazon.com/blogs/aws/experiment-that-discovered-the-higgs-boson-uses-aws-to-probe-nature/
  • 35. Genomics processing on FPGA Accelerators. Children’s Hospital of Philadelphia and Edico Genome Achieve Fastest-Ever Analysis of 1,000 Genomes. Orlando, Fla., Oct 19, 2017: The Children’s Hospital of Philadelphia (CHOP) and Edico Genome today set a new scientific world standard in rapidly processing whole human genomes into data files usable for researchers aiming to bring precision medicine into mainstream clinical practice. Utilizing Edico Genome’s DRAGEN™ Genome Pipeline, deployed on 1,000 Amazon EC2 F1 instances on the Amazon Web Services (AWS) Cloud, 1,000 pediatric genomes were processed in two hours and 25 minutes. … Available in “AWS App Store” (AWS Marketplace) for ~$24 / genome
  • 36. HPC in aerospace: Boom leverages Rescale and AWS to enable supersonic travel ▪ Simulated vortex lift with 200M cell models on 512+ cores ▪ Increased simulation throughput: 100 jobs in parallel with 6x speed-up per job → 600x speed-up ▪ Eliminated IT overhead, including server capital costs & in-house IT and software teams ▪ Elastic HPC capacity and pay-as-you-go AWS clusters allow business agility & ability to scale. “Rescale’s ScaleX cloud platform is a game changer for engineering. It gives Boom computing resources comparable to building a large on-premises HPC center. Rescale lets us move fast with minimal capital spending and resources overhead.” Josh Krall, CTO & Co-Founder
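The 600x figure on this slide is the product of job-level parallelism and per-job speed-up, measured against running the same jobs one at a time. A minimal sketch of that arithmetic follows; the 12-hour baseline job duration is a hypothetical value chosen only for illustration, and the function names are not from the source.

```python
import math


def throughput_gain(parallel_jobs, per_job_speedup):
    """Combined gain versus running the jobs serially at baseline speed."""
    return parallel_jobs * per_job_speedup


def campaign_hours(n_jobs, hours_per_job, parallel_jobs=1, per_job_speedup=1.0):
    """Wall-clock hours to finish n_jobs when they run in waves."""
    waves = math.ceil(n_jobs / parallel_jobs)
    return waves * hours_per_job / per_job_speedup


# Slide figures: 100 jobs in parallel, 6x faster per job.
print(throughput_gain(100, 6))                      # combined factor
# Hypothetical baseline: each job takes 12 hours at baseline speed.
serial = campaign_hours(100, 12)                    # one job at a time
cloud = campaign_hours(100, 12, parallel_jobs=100, per_job_speedup=6)
print(serial, cloud, serial / cloud)
```

The factor only holds when there are at least as many independent jobs as parallel slots; with fewer jobs, the per-job speed-up dominates.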
  • 37. Big Data Meets HPC. “The IT organization has been driving a massive digital transformation and optimization of business capabilities across the organization. IT has been leading these changes by creating rich environments for the data to thrive, ensuring improvements in productivity and collaboration across the massively global organization.” Steve Phillpot, CIO, Western Digital. Big Data Platform (BDP) ▪ Data from across global manufacturing sites are collected into a cloud-based Big Data Platform ▪ The BDP enables operational/logistics tracking of millions of hard drives produced each year ▪ Allows analysts to visualize data across JMP, Tableau, IBM SPSS, and SAS to support all phases of engineering, manufacturing, testing and support. High Performance Computing for Design and Engineering ▪ Cloud-based HPC accelerates product optimization, using clusters of CPUs and GPUs to perform millions of drive-head and disk interface simulations, and to improve storage magnetics product capacities ▪ Cloud-based HPC is the foundation for future storage architecture analysis, materials science explorations and machine learning based investigations for multiple storage product families
  • 38. HPC in design and manufacturing. Applications for engineering: ▪ Molecular dynamics, CAD, CAE, EDA ▪ Collaboration tools for engineering ▪ Big data for manufacturing yield analysis. Running drive-head simulations at scale: millions of parallel parameter sweeps, running months of simulations in just hours. Over 85,000 Intel cores running at peak, using Spot Instances
  • 39. HPC in aerospace ▪ Running parallel CFD studies using Siemens STAR-CCM+ ▪ Goal: shorten the time between Design Requirements and Configuration, and Flight Testing ▪ 1000+ cores per CFD study, multiple studies required for each workflow iteration ▪ Job-level optimizations: Enhanced Networking, Placement Groups; Amazon Linux, Hyper-threading disabled ▪ Workflow optimizations: Spot Instances, multiple clusters; multiple parallel studies for faster throughput
  • 40. Semiconductor Design: Cadence Cloud Products for electronic design automation (EDA). Customer managed (Cloud Ready): cloud-enabled products to run in your cloud environment with Cloud Passport • Licensed cloud software support • Freedom to manage your own IaaS account, CAD/IT, and cloud infrastructure environment. Cadence managed (HDS Cloud): cloud-optimized products that run in a fully supported and Cadence-managed, ready-to-go cloud design environment. Cadence HDS Cloud includes: • Licensed software and support • Cloud-optimized services • CAD and IT infrastructure support • PDK and foundry expertise • Complete security support
  • 41. Agenda: Overview of AWS Infrastructure; HPC Solution Overview; HPC Use Cases in Design and Engineering; Customer Success Stories; Best Practices for Performance, Scalability, and Cost; Additional Resources
  • 42. Important Factors for Performance and Throughput ▪ Compute performance: CPUs, GPUs, FPGAs ▪ Memory performance: high RAM requirements in many applications ▪ Network performance: throughput, latency, and consistency ▪ Storage performance, including shared filesystems ▪ Automation and cluster/job management ▪ Remote graphics for interactive applications ▪ ISV support, including license management …and SCALE
  • 43. Accelerate HPC with Rapid, Massive Scaling: scale up when needed, then scale down ▪ In a traditional HPC data center, the only certainty is that you always have the wrong number of servers: too few, or too many ▪ Every additional HPC server launched in the cloud can improve speed of innovation, if there are no other constraints to scaling ▪ Overnight or over-weekend workloads reduced to an hour or less. Think BIG: what if you could launch 1 million concurrent jobs? (Chart: CPU cores over time across the product development cycle.)
  • 44. Use a “Grids of Clusters” Strategy for Scale. Cluster: tightly coupled, latency-sensitive applications. Use larger EC2 compute instances, placement groups, enhanced networking, and HPC job schedulers to “scale up”. Grid: loosely coupled, pleasingly parallel. Use a variety of EC2 instances, multiple AZs, Spot, and Auto Scaling to “scale out”. Grids of Clusters: running parallel cluster jobs and parameter studies. Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs
  • 45. Expand the solution space: using a “grids of clusters” strategy yields faster, higher-quality results for design exploration using parameter sweeps and other parallel methods. Courtesy of TLG Aerospace
  • 46. Optimize Using AWS Compute Instance Types: “Shape the compute to the work to be done”. Categories: General Purpose; Compute Optimized; Storage and I/O Optimized; Memory Optimized; GPU Graphics; GPU and FPGA Compute. Instance types shown: M5, M4, T2, C5, C4, Z1d, I3, D2, H1, X1, R5, R4, G3, G2, EG, F1, P3, P2
  • 47. Instance types: Z1d example. Z1d instances are optimized for memory-intensive, compute-intensive applications ▪ Custom Intel Xeon Scalable processor ▪ Up to 4 GHz sustained, all-Turbo performance ▪ Up to 384 GiB DDR4 memory ▪ Enhanced Networking, up to 25 Gbps throughput. Choose an instance type and size to meet the unique needs of each application
  • 48. M5: Next Generation General Purpose Instance ▪ Powered by 2.5 GHz Intel Xeon Scalable Processors (Skylake) ▪ New larger instance size: m5.24xlarge with 96 vCPUs and 384 GiB of memory (4:1 Memory:vCPU ratio) ▪ AWS support for Intel AVX-512 offering up to twice the performance for vector and floating point workloads ▪ Use case: FEA Implicit ▪ 14% price/performance improvement with M5 (chart compares M4 and M5) ▪ Also available: M5d with local NVMe-based SSD storage
  • 49. C5: Compute-Optimized Instances Based on Intel Skylake ▪ Based on 3.0 GHz Intel Xeon Scalable Processors (Skylake) ▪ Run up to 3.5 GHz using Intel Turbo Boost Technology ▪ Up to 72 vCPUs and 144 GiB of memory (2:1 Memory:vCPU ratio) ▪ 25-Gbps network bandwidth ▪ Support for Intel AVX-512 ▪ Use case: CFD, FEA Explicit ▪ 25% price/performance improvement over C4 (chart compares C4 and C5) ▪ Also available: C5d with local NVMe-based SSD storage
  • 50. P3: Next Generation of GPU Compute Instances • Up to eight NVIDIA Tesla V100 GPUs • 1 PetaFLOPS of computational performance, up to 14x better than P2 • 300 GB/s GPU-to-GPU communication (NVLink), 9x better than P2 • 16 GB GPU memory with 900 GB/s peak GPU memory bandwidth • Use cases: ML/AI, Accelerated Engineering. (Charts: FP64 performance in TFLOPS for K80, P100, and V100, labeled 2.6X; mixed/FP16 performance in TFLOPS for K80, P100, and V100, labeled 14X over K80’s max FP32 performance.)
  • 51. F1 Instances: First Cloud Instance with FPGA • Use cases: engineering simulations; image and video processing; big data and ML • Develop, simulate, debug & compile your code • Package as FPGA images • Run on an F1 instance with your custom logic • Speed up applications over 30x
  • 52. FPGA acceleration using F1. Launch an EC2 F1 instance from an Amazon Machine Image (AMI) and load an Amazon FPGA Image (AFI). The CPU application communicates with the FPGA over PCIe; the FPGA has DDR controllers and DDR-4 attached memory. An F1 instance can have any number of AFIs. An AFI can be loaded into the FPGA in seconds
  • 53. Performance considerations for tightly-coupled cluster workloads. Test using real-world examples ▪ Use large cases for testing: do not benchmark scalability using only small examples. MPI libraries ▪ Test with Intel MPI and OpenMPI 3.0, and make use of available tunings. Domain decomposition ▪ Choose the number of cells per core for either per-core efficiency or for faster results. Network ▪ Use a placement group ▪ Enable enhanced networking. (Chart: WRF 2.5 km CONUS benchmark, time in seconds and scale-up versus number of cores, up to 4,000 cores.)
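The trade-off between per-core efficiency and faster results can be read off a scaling test by computing speed-up and parallel efficiency at each core count. The sketch below assumes you have measured wall-clock times yourself; the timings shown are hypothetical placeholders, not values from the WRF benchmark on the slide.

```python
def scaling_table(timings):
    """Compute (cores, speedup, efficiency) rows from wall-clock timings.

    `timings` maps core count -> seconds. The smallest core count is the
    baseline. Efficiency is speedup divided by the increase in core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical measurements for one large test case.
measured = {72: 2000.0, 144: 1000.0, 288: 625.0}
for cores, speedup, efficiency in scaling_table(measured):
    print(f"{cores:5d} cores  speedup {speedup:4.2f}  efficiency {efficiency:4.0%}")
```

Running at the highest core count that still keeps efficiency acceptable favors cost per result; running beyond it favors time to result.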
  • 54. Performance considerations: For all HPC workloads • OS version ▪ Use Amazon Linux, RHEL 7.5, or an updated 3.10+ kernel (4.0+ if using NVMe on z1d, R5d, F1, I3, etc.) • Instance types ▪ Always test with the latest EC2 instances, and with earlier-generation instance types as well • Processor states ▪ Use P-states to reduce processor variability • Hyper-threading and affinity ▪ Test with Hyper-threading (HT) on and off: off is usually best, but not always ▪ Use CPU affinity to pin threads to CPU cores when HT is off
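The affinity advice above can be sketched in Python, assuming a Linux host. This sketch does not disable Hyper-threading; it only selects one logical CPU per physical core and pins the process to that set, which is one way to approximate an "HT off" test. The helper names are hypothetical, and the sibling strings mimic the format of the Linux `thread_siblings_list` sysfs files.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,36' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each group of siblings."""
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists})


def pin_to(cpus):
    """Pin the current process to `cpus`; return True if it was applied.

    os.sched_setaffinity is only available on some platforms (e.g. Linux),
    so this falls back to doing nothing elsewhere.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    target = set(cpus) & os.sched_getaffinity(0)
    if not target:
        return False
    os.sched_setaffinity(0, target)
    return True


# Assumed layout: 4 physical cores, 8 logical CPUs (one entry per logical CPU).
siblings = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
physical = one_cpu_per_core(siblings)
print(physical)  # [0, 1, 2, 3]
```

Calling `pin_to(physical)` before starting the worker threads would then restrict the process to those CPUs. MPI launchers usually offer their own binding options, which are typically the better tool for multi-rank jobs.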
  • 55. Optimize Using EC2 Purchasing Options • On-Demand: Pay for compute capacity by the second with no long-term commitments ▪ For spiky workloads, or to define needs • Reserved: Make a 1- or 3-year commitment and receive a significant discount off On-Demand prices ▪ For committed, steady-state usage • Spot: Spare EC2 capacity at savings of up to 90% off On-Demand prices ▪ For fault-tolerant, dev/test, time-flexible, stateless workloads • Per-second billing for EC2 Linux instances & EBS volumes
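The effect of per-second billing on short or oddly sized jobs can be illustrated with a small calculation. This is a rough sketch under stated assumptions: the hourly rate is a made-up number, the function names are hypothetical, and the 60-second minimum reflects how EC2 per-second billing is generally documented rather than anything stated on the slide.

```python
import math


def per_second_charge(runtime_s, hourly_rate, minimum_s=60):
    """Charge for one instance billed per second, with a minimum duration."""
    billed_s = max(runtime_s, minimum_s)
    return billed_s * hourly_rate / 3600


def per_hour_charge(runtime_s, hourly_rate):
    """Charge for one instance when every started hour is billed in full."""
    return math.ceil(runtime_s / 3600) * hourly_rate


rate = 3.60          # assumed On-Demand price in USD per hour (illustrative)
job_s = 65 * 60      # a 65-minute job

by_second = per_second_charge(job_s, rate)
by_hour = per_hour_charge(job_s, rate)
print(f"per-second: ${by_second:.2f}, per-hour: ${by_hour:.2f}")
```

With these numbers, the 65-minute job is billed for 65 minutes under per-second billing, but for two full hours under hourly rounding.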
  • 56. Cost optimization [Diagram: three capacity mixes. Conservative: Reserved Instances plus On-Demand Instances. Optimized: Reserved Instances, On-Demand Instances, and Spot. Optimized with scale out (magnify the peak): Reserved Instances, On-Demand Instances, and more Spot.] • Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge ▪ For committed utilization • On-Demand: Pay for compute capacity by the hour, with per-second billing and no long-term commitments ▪ For spiky workloads, or to define needs • Spot: Use unused capacity, charged at a Spot price that fluctuates based on supply and demand ▪ For high-scale, time-flexible workloads
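The "conservative" and "optimized" mixes on the slide above can be compared with a blended-cost calculation. This is a minimal sketch: the hourly rate, the discount levels, and the instance-hour splits are all assumed for illustration (the slides only state "significant discount" for Reserved and "up to 90%" for Spot), and `blended_cost` is a hypothetical helper.

```python
def blended_cost(instance_hours, on_demand_rate, discounts):
    """Total cost of a purchasing mix.

    instance_hours: {"reserved": h, "on_demand": h, "spot": h}
    discounts: fractional discount off the On-Demand rate per option;
               options missing from `discounts` pay the full rate.
    """
    total = 0.0
    for option, hours in instance_hours.items():
        discount = discounts.get(option, 0.0)
        total += hours * on_demand_rate * (1.0 - discount)
    return total


rate = 1.00  # assumed On-Demand price, USD per instance-hour
discounts = {"reserved": 0.40, "spot": 0.70}  # assumed discount levels

# The same 100,000 instance-hours of work, split two ways.
conservative = {"reserved": 50_000, "on_demand": 50_000, "spot": 0}
optimized = {"reserved": 50_000, "on_demand": 10_000, "spot": 40_000}

cost_conservative = blended_cost(conservative, rate, discounts)
cost_optimized = blended_cost(optimized, rate, discounts)
savings = 1.0 - cost_optimized / cost_conservative
print(f"conservative ${cost_conservative:,.0f}, "
      f"optimized ${cost_optimized:,.0f}, savings {savings:.0%}")
```

The "scale out" variant on the slide corresponds to adding further Spot instance-hours at the peak, so more work finishes sooner for a comparatively small increase in total cost.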
  • 57. Agenda • Overview of AWS Infrastructure • HPC Solution Overview • HPC Use Cases in Design and Engineering • Customer Success Stories • Best Practices for Performance, Scalability, and Cost • Additional Resources
  • 58. Further reading • White Papers: Introduction to HPC on AWS; Optimizing Electronic Design Automation (EDA) Workflows on AWS • Reference Architecture: HPC Lens - Well Architected Framework • Blogs: Ansys - Getting Faster, Cost-effective Simulation on the Cloud; Real World Scalability with AWS • Docs: Processor State Control for Your EC2 Instance; Optimizing CPU Options; Big data; EFS performance; Monitor performance • AWS Online Tech Talk • https://aws.amazon.com/hpc/
  • 60. Numbers are like pictures: 72, 26, 75, 90, 10, 6000, 50000. They are worth a thousand words.
  • 61. Numbers that made a difference
  • 62. Submit session feedback 1. Tap the Schedule icon. 2. Select the session you attended. 3. Tap Session Evaluation to submit your feedback.
  • 63. Thank you!