© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Jhen-Wei Huang (黃振維)
Solutions Architect, Amazon Web Services
High Performance Computing on AWS
Agenda
 Overview of AWS Infrastructure
 Why HPC on AWS
 HPC Solution Components
 Cost Optimization
 Performance Considerations
AWS Global Infrastructure
AWS Global Infrastructure
Regions
Over 100 Global CloudFront PoPs
Amazon Global Network
• Redundant 100 GbE network
• Redundant private capacity between all Regions except China
AWS Global Infrastructure: Regions and Number of Availability Zones
US West: Oregon (3), Northern California (3)
US East: N. Virginia (6), Ohio (3)
Canada: Central (2)
South America: São Paulo (3)
EU: Ireland (3), Frankfurt (3), London (3), Paris (3)
Asia Pacific: Singapore (3), Sydney (3), Tokyo (4), Osaka-Local (1), Seoul (2), Mumbai (2)
AWS GovCloud: US-West (3)
China: Beijing (2), Ningxia (2)
55 Availability Zones within 18 geographic Regions and 1 Local Region around the world
Announced Regions: Bahrain, Hong Kong SAR (China), AWS GovCloud (US-East)
Why HPC on AWS?
Amazon Runs HPC Workloads Every Day
 Logistics
 Machine learning
 Data center, network, and server design
 Consumer product design
 Robotics
 Semiconductor design
 Retail and financial analytics
Why HPC on AWS
Faster Time to Results
Better ROI
• Virtually unlimited infrastructure, enabling scaling and agility not attainable on-premises
• Flexible configuration options to quickly iterate on resource selection and ensure cost optimization
• Increased collaboration with secure access to clusters around the world
Why HPC on AWS – Multiple Clusters
$ qsub -q monolith iwait.sh
$ qsub dev.sh
$ qsub prod.sh
$ qsub critical.sh
$ qsub bigrun.sh
On-Prem
Launch clusters by group, user,
application – no more waiting!
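The "no more waiting" point can be made concrete with a small sketch. It compares finish times when all jobs share one first-in, first-out queue against the case where each job gets its own cluster. The runtimes are hypothetical, each job is assumed to occupy a whole cluster, and cluster start-up time is ignored.

```python
from itertools import accumulate

# Hypothetical job runtimes in hours; the names mirror the scripts on the slide.
jobs = {"dev.sh": 1.0, "prod.sh": 4.0, "critical.sh": 0.5, "bigrun.sh": 12.0}

def shared_queue_finish_times(runtimes):
    """Finish time of each job when jobs run one after another on a single cluster."""
    names = list(runtimes)
    finishes = accumulate(runtimes[name] for name in names)
    return dict(zip(names, finishes))

def per_job_cluster_finish_times(runtimes):
    """Finish time of each job when every job starts at t=0 on its own cluster."""
    return dict(runtimes)

shared = shared_queue_finish_times(jobs)
dedicated = per_job_cluster_finish_times(jobs)
for name in jobs:
    print(f"{name}: shared queue {shared[name]:.1f} h, own cluster {dedicated[name]:.1f} h")
```

In this toy model the short critical job finishes after 5.5 hours in the shared queue, because it waits behind the two jobs ahead of it, but after 0.5 hours on its own cluster.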
Popular HPC Workloads on AWS
• Transcoding and Encoding
• Monte Carlo Simulations
• Computational Chemistry
• Government and Educational Research
• Modeling and Simulation
• Genome Processing
HPC Workload Types
Tightly Coupled Parallel Computing; Loosely Coupled Parallel Computing; Accelerated Computing; Visualization and Interpretation; High Performance Data Storage and Analytics
• Similar instance types, fixed size clusters of EC2 instances
• Network intensive
• Customers price sensitive
• High utilization
• Not typically resilient to interruptions
• Scalable, flexible infrastructure
• Workloads are also easily interruptible
• Data intensive
• Typically massively parallel application
• Need compute optimized GPUs or FPGAs
• Workloads run on graphics-optimized GPUs
• Need additional managed services like Workspaces or AppStream 2.0
• Workloads require moving customer data to AWS
• Value creation based on innovative analytics strategies like AI/ML/DL
Industry Verticals and Common HPC Workloads
Life Sciences: Computational Chemistry; Genomics; Proteomics; Bioinformatics; Neuroimaging; Clinical Trials Simulations; Molecular Dynamics; RNA Sequencing
Financial Services: Risk analysis / modeling; Regulatory compliance; Monte Carlo simulation; Actuarial Grid; High Frequency Trading; Bitcoin / Blockchain
Energy & Geosciences: Weather Simulation; Reservoir Simulation; Geographical Information Systems; Operations, management, and analysis
Manufacturing: Electronic design automation; Computational fluid dynamics; Engineering Simulations; BIM; FEA
Media and Entertainment: Rendering; Content Creation; Post production
Global-scale grids for research
Large Hadron Collider (LHC)
Global-scale grids for research
Best practices using Spot: diversify computing across many instance types, multiple AZs, and multiple Regions, and use stateless architectures
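As a rough illustration of the diversification advice, the sketch below enumerates Spot capacity pools, treating each combination of instance type and Availability Zone as one pool, and spreads a target instance count across them. The instance types, zone names, and target of 100 instances are illustrative assumptions, not values from the talk.

```python
from itertools import product

# Hypothetical choices; substitute the instance types and zones that suit the workload.
instance_types = ["c4.8xlarge", "c5.9xlarge", "m4.10xlarge", "r4.8xlarge"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

def capacity_pools(types, zones):
    """Each (instance type, Availability Zone) pair is treated as a separate pool."""
    return list(product(types, zones))

def spread_evenly(target_instances, pools):
    """Split a target instance count across pools as evenly as possible."""
    base, extra = divmod(target_instances, len(pools))
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}

pools = capacity_pools(instance_types, availability_zones)
allocation = spread_evenly(100, pools)
print(len(pools), "pools; largest share:", max(allocation.values()))
```

With 12 pools, losing any single pool interrupts at most 9 of the 100 instances in this sketch, which is the reason for spreading a stateless workload widely.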
1.1M vCPUs for machine learning
HPC grids in financial services
Using GPU Acceleration
The challenge
Spinning up large numbers of GPUs quickly and inexpensively to meet the financial modeling and reporting needs of ABSI’s customers
ABSI uses proprietary algorithms (Monte Carlo simulations) that run millions of times
The solution
ABSI moved its infrastructure to AWS and deprecated its co-located data center
ABSI built a front end on AWS for its processing solution, automatically running GPU instances on Amazon EC2 using Amazon EBS in an Amazon VPC for security.
The result
Can be as much as 500 times more efficient in terms of performance per dollar for some clients
“Using AWS helps us reduce a 10-day process to 10 minutes. That’s transformative: it broadens our ability to discover.”
–Peter Phillips, Managing Director, Aon Benfield Securities
HPC in design and manufacturing
Applications for engineering:
 Molecular dynamics, CAD, CAE, EDA
 Collaboration tools for engineering
 Big data for manufacturing yield analysis
Running drive-head simulations at scale:
• Millions of parallel parameter sweeps, running months of simulations in just hours
• Over 85,000 Intel cores running at peak, using Spot Instances
Tightly coupled HPC—weather
Fluid dynamics—Ansys Fluent
 C4.8xlarge instance type
 140M cell model
 F1 car CFD benchmark
HPC Solution Components
Important enablers in HPC
 Compute performance—CPUs, GPUs, FPGAs
 Memory performance—high RAM requirements in many applications
 Network performance—throughput, latency, and consistency
 Storage performance—including shared filesystems
 Automation and cluster/job management
 Graphics for pre- and post-processing
…and SCALE
HPC Solutions
Compute: EC2 Instances (Compute and Accelerated), EC2 Spot, Auto Scaling
Storage: EBS, EFS, S3
Networking: Enhanced Networking, Placement Groups
Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame
Visualization: NICE DCV, AppStream 2.0
Amazon EC2 Instances
General purpose: M4, M5
General purpose, burstable: T2, T2 Unlimited
Compute optimized: C4, C5
Memory optimized: R4, X1, X1e
Storage optimized, High I/O: I2, I3
Dense storage: D2, H1
GPU compute: P2, P3
Graphics intensive: G2, G3
FPGA: F1
EC2 Bare Metal: direct access to physical server resources
Optimize the price/performance of your HPC workloads with the widest range of compute instances
C5 Instances: Intel Xeon Scalable Processor
• Intel Skylake @ 3.0 GHz (turbo to 3.5 GHz)
• Supports AVX-512
• C-state controls
• Nitro System, a combination of dedicated hardware and lightweight hypervisor
• Up to 25 Gbps network
C4: 36 vCPUs, “Haswell”, 60 GiB memory, 4 Gbps to EBS
C5: 72 vCPUs, “Skylake”, AVX-512, 144 GiB memory, 12 Gbps to EBS
C5 versus C4: 2X vCPUs, 3X throughput, 2.4X memory
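The C5-versus-C4 multipliers on this slide follow directly from the quoted figures. A quick check, assuming the "3X throughput" claim refers to the EBS bandwidth numbers (12 Gbps versus 4 Gbps):

```python
# Figures as quoted on the slide for the largest C4 and C5 sizes.
c4 = {"vcpus": 36, "memory_gib": 60, "ebs_gbps": 4}
c5 = {"vcpus": 72, "memory_gib": 144, "ebs_gbps": 12}

# Ratio of each C5 figure to the matching C4 figure.
ratios = {key: c5[key] / c4[key] for key in c4}
for key, value in ratios.items():
    print(f"{key}: {value:.1f}x")
```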
GPU and FPGA for Accelerated Computing
P2/P3: GPU-accelerated computing (NVIDIA GPU)
• Enabling a high degree of parallelism: each GPU has thousands of cores
• Consistent, well-documented set of APIs (CUDA, OpenACC, OpenCL)
• Supported by a wide variety of ISVs and open-source frameworks
F1: FPGA-accelerated computing (Xilinx UltraScale+ FPGA)
• Massively parallel: each FPGA includes millions of parallel system logic cells
• Flexible: no fixed instruction set, can implement wide or narrow datapaths
• Programmable using available, cloud-based FPGA development tools
Deep learning on GPU
MXNet training on EC2 P2
instances:
 Training of a popular image
analysis algorithm, Inception v3,
using MXNet and running on P2
instances
 Scaling efficiency of 85%
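For readers unfamiliar with the term, scaling efficiency is commonly computed as the measured speedup divided by the ideal (linear) speedup. The sketch below uses hypothetical throughput numbers chosen to give 85%; the slide does not state the GPU counts or throughputs behind its figure.

```python
def scaling_efficiency(throughput_single, throughput_scaled, units):
    """Measured speedup on `units` accelerators divided by the ideal speedup `units`."""
    if units <= 0 or throughput_single <= 0:
        raise ValueError("units and throughput_single must be positive")
    speedup = throughput_scaled / throughput_single
    return speedup / units

# Hypothetical measurement: 100 images/s on 1 GPU, 1360 images/s on 16 GPUs.
efficiency = scaling_efficiency(100.0, 1360.0, 16)
print(f"scaling efficiency: {efficiency:.0%}")
```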
FPGA use-cases and F1 partners
Network Performance
AWS Proprietary Network, 10 Gbps and 25 Gbps
• Highest performance in largest EC2 instance sizes
• Full bisection bandwidth in Placement Groups, with no network oversubscription
Enhanced Networking
• Over 1M PPS performance, reduced instance-to-instance latencies, more consistent network performance
Amazon EC2 to Amazon S3
• Traffic to and from Amazon S3 can now take advantage of up to 25 Gbps of bandwidth
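To put the bandwidth figures in perspective, the sketch below estimates the best-case time to move a dataset at 10 Gbps and at 25 Gbps. The 1 TiB dataset size and the `efficiency` parameter are assumptions for illustration; real transfers depend on object sizes, parallelism, and instance type.

```python
def transfer_seconds(size_gib, link_gbps, efficiency=1.0):
    """Best-case seconds to move size_gib GiB over a link_gbps link.

    efficiency is the assumed fraction of line rate actually achieved (0 to 1].
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gib * 2**30 * 8
    return bits / (link_gbps * 1e9 * efficiency)

# Hypothetical 1 TiB dataset.
for gbps in (10, 25):
    print(f"{gbps} Gbps: {transfer_seconds(1024, gbps) / 60:.1f} minutes")
```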
Instance sizes: R4 example
R4 instances are optimized for memory-intensive applications
• Xeon E5-2686 v4 processors
• DDR4 Memory
• Enhanced Networking, up to 25 Gbps throughput
AWS Storage is a Platform
File: Amazon EFS
Block: Amazon EBS, Amazon EC2 Instance Store
Object: Amazon S3 / S3-IA, Amazon Glacier
Data Transfer: AWS Direct Connect, ISV Connectors, Amazon Kinesis Data Firehose, AWS Storage Gateway, S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
Storage Classes and Tiering on Amazon S3
Standard
• Primary data
• Big data analytics
• Small objects
• Temporary scratch space
Standard - Infrequent Access
• File sync and share
• Active archive
• Enterprise backup
• Media transcoding
• Georedundancy/DR
Amazon Glacier
• Archive data
• Deep/offline archives
• Tape vaulting replacement
• WORM-compliant data
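Tiering between these classes is usually expressed as an S3 lifecycle configuration. The sketch below only builds the configuration as a plain dictionary, in the shape accepted by the S3 lifecycle API. The prefix, rule ID, and day counts are hypothetical, and applying it (for example through boto3's `put_bucket_lifecycle_configuration`) is left out because it needs credentials and a real bucket.

```python
def tiering_rule(prefix, days_to_ia=30, days_to_glacier=90):
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier.

    The 30-day floor for the first transition reflects the usual S3 minimum
    for moving objects to Standard-IA; treat it as an assumption to verify.
    """
    if not 30 <= days_to_ia < days_to_glacier:
        raise ValueError("expected 30 <= days_to_ia < days_to_glacier")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",   # hypothetical naming scheme
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
        ],
    }

# Hypothetical prefix holding finished simulation results.
lifecycle = {"Rules": [tiering_rule("results/")]}
print(lifecycle)
```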
Block Storage
Two block storage options for EC2 instances: Amazon EBS and Instance Store
EC2 Instance block device mapping: /dev/xvda, /dev/xvdb, /dev/xvdc, /dev/xvdd
Instance Store: ephemeral0, ephemeral1
EBS Volumes: vol-xxxxxxxx, vol-xxxxxxxx
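A block device mapping of this kind can be written as a list of entries in the shape used by the EC2 launch APIs. In the sketch below, which device names map to instance store and which to EBS is an assumption (the slide's diagram does not survive in text form), as are the volume size and type.

```python
def block_device_mappings(instance_store_devices, ebs_devices):
    """Build launch-time block device mappings.

    instance_store_devices: device names mapped in order to ephemeral0, ephemeral1, ...
    ebs_devices: mapping of device name -> EBS volume size in GiB.
    """
    mappings = []
    for index, device in enumerate(instance_store_devices):
        mappings.append({"DeviceName": device, "VirtualName": f"ephemeral{index}"})
    for device, size_gib in ebs_devices.items():
        mappings.append({
            "DeviceName": device,
            "Ebs": {"VolumeSize": size_gib, "VolumeType": "gp2", "DeleteOnTermination": True},
        })
    return mappings

# Hypothetical assignment of the device names shown on the slide.
mappings = block_device_mappings(["/dev/xvdb", "/dev/xvdc"], {"/dev/xvdd": 500})
for entry in mappings:
    print(entry)
```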
File Systems on AWS
• Amazon Elastic File System (Amazon EFS)
  - Distributed across multiple AZs
  - Petabyte-scale
  - Easy to bring up, no management
• Build your own NFS
  - Use for a POC
  - Ephemeral data (i3.*)
• Parallel file systems
  - Build your own or use APN solutions
High-performance NFS on AWS
 EC2+EBS is the most performant method of
creating scale-up file servers on AWS
 Build your own NFS or CIFS implementation
or use a partner solution
 EC2 instances as fileservers, using EBS for
block storage—tuned for application needs
 Single fileserver performance up to 25 Gb/s
over the network
Optimize HPC storage
Amazon S3
Secure, durable, highly scalable object storage. Fast access, low cost.
For long-term durable storage of data, in a readily accessible get/put access format.
Primary durable and scalable storage for critical data
Amazon Glacier
Secure, durable, long-term, highly cost-effective object storage.
For long-term storage and archival of data that is infrequently accessed.
Use for long-term, lower-cost archival of critical data
EBS+EC2
Create a single-AZ, shared file system using EC2 and EBS, with third-party or open source software (ZFS, Weka.io, Avere, Intel Lustre, etc.).
For near-line storage of files optimized for high IOPS.
Use for high-IOPS, temporary working storage
Amazon EFS
Highly available, multi-AZ, fully managed network-attached elastic file system.
For near-line, highly available storage of files in a traditional NFS format (NFSv4).
Use for read-often, temporary working storage
HPC Data Flow on AWS Storage
Corporate data center
Data transfer (ingress/egress): AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, Amazon CloudFront
Object, block, file storage: Amazon S3, Amazon Glacier (lifecycle), EC2 Instance, EBS, Instance Store, EFS, Other Shared File System
25 Gbps to S3
Automation and Batch Processing
Traditional Job Schedulers Integrate Easily
Bring your scheduler to AWS, or build your own
 IBM Platform LSF
 Univa Grid Engine
 Altair PBS Pro
 SLURM
 Design your own using AWS services
 Do you actually need a scheduler?
HPC Automation and Orchestration
Choose from several options to adapt your workloads
 CfnCluster
 AWS Batch
 AWS-NICE DCV and EnginFrame
 Build your own AWS CloudFormation templates
 ISV offerings on AWS Marketplace or use an SI
HPC automation with CfnCluster
CfnCluster simplifies
deployment of HPC in the cloud,
including integrating with
popular HPC schedulers
Built on AWS CloudFormation,
easy to modify to meet specific
application or project
requirements
AWS Batch for HPC workloads
Fully managed
No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
Integrated with AWS
Natively integrated with the AWS Platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Amazon Rekognition.
Cost-optimized resource provisioning
AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot.
Focus on your applications and results!
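For loosely coupled work such as parameter sweeps, an AWS Batch array job is one natural fit. The sketch below only assembles the request as a dictionary using field names from the Batch SubmitJob API. The job name, queue, job definition, and command are hypothetical, and the actual submission call (for example boto3's `submit_job`) is omitted because it requires an AWS account.

```python
def array_job_request(name, queue, definition, tasks, command):
    """Build keyword arguments for submitting an AWS Batch array job.

    Array sizes are assumed to be limited to 2..10000, per the Batch documentation.
    """
    if not 2 <= tasks <= 10000:
        raise ValueError("array size must be between 2 and 10000")
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "arrayProperties": {"size": tasks},
        "containerOverrides": {"command": list(command)},
    }

# Hypothetical names; each child job can read its index from the
# AWS_BATCH_JOB_ARRAY_INDEX environment variable to pick its parameters.
request = array_job_request(
    name="param-sweep",
    queue="hpc-spot-queue",
    definition="sweep-job:1",
    tasks=1000,
    command=["python", "sweep.py"],
)
print(request)
```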
HPC Architecture on AWS
HPC Architecture on AWS
Diagram labels: Corporate data center, data ingress/egress, S3, Availability Zone, Auto Scaling group, Master instance ($ qsub job.sh), EBS, Parallel FS, Local NFS, Amazon EFS
• Three file systems: Amazon EFS, Local NFS, and Parallel FS
• Snapshot of Amazon EBS to Amazon S3
• Data tiering FS to Amazon S3
• Auto Scaling allows for scaling when needed
Hybrid HPC
Using Amazon EC2 Systems Manager
Capabilities: Run Command, State Manager, Inventory, Maintenance Window, Patch Manager, Automation, Parameter Store, Documents
Diagram labels: AWS cloud, corporate data center, IT admin, DevOps engineer, role-based access control
• Manage thousands of Windows and Linux nodes running on Amazon EC2 or on premises
• Control user actions and scope with secure, granular access control
• Safely execute changes with rate control to reduce blast radius
• Audit every user action with change tracking
Graphics for HPC applications
Desktop application streaming with Amazon AppStream 2.0
Stream desktop applications securely to any web browser
• Pay as you go
• Scale globally
• Secure apps and data
• Run desktop apps in a web browser
AppStream 2.0 graphics support
 Multiple Instance types—including General Purpose,
Compute Optimized, Memory Optimized, Graphics
Design, Graphics Pro, and Graphics Desktop
 Always-On or On-Demand pricing models
 Support for OpenGL, DirectX, OpenCL and CUDA
Cost Optimization
EC2 Purchasing Options
On-Demand
Pay for compute capacity by the second with no long-term commitments
Use for: spiky workloads, or to define needs
Reserved
Make a 1- or 3-year commitment and receive a significant discount off On-Demand prices
Use for: committed, steady-state usage
Spot
Spare EC2 capacity at savings of up to 90% off On-Demand prices
Use for: fault-tolerant, dev/test, time-flexible, stateless workloads
Per-second billing for EC2 Linux instances and EBS volumes
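A short worked example shows why per-second billing and Spot pricing matter for short, wide runs. All prices and the 70% Spot discount below are hypothetical placeholders (the slide only states "up to 90%"), and the 60-second minimum charge is an assumption about how per-second billing is applied.

```python
import math

def per_second_cost(instances, seconds, hourly_price):
    """Cost when billed per second, with an assumed 60-second minimum per instance."""
    billed_seconds = max(seconds, 60)
    return instances * billed_seconds * hourly_price / 3600

def per_hour_cost(instances, seconds, hourly_price):
    """Cost when every started hour is billed in full."""
    return instances * math.ceil(seconds / 3600) * hourly_price

# Hypothetical run: 100 instances for 1 h 30 min 30 s at $3.00 per instance-hour.
run_seconds = 90 * 60 + 30
on_demand = per_second_cost(100, run_seconds, 3.00)
hourly_rounded = per_hour_cost(100, run_seconds, 3.00)
spot = on_demand * (1 - 0.70)   # assumed 70% Spot discount

print(f"per-second On-Demand: ${on_demand:.2f}")
print(f"hour-rounded billing: ${hourly_rounded:.2f}")
print(f"Spot at 70% discount: ${spot:.2f}")
```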
Cost Optimization
Weather Forecasting and Modeling
[Figure: mix of Reserved Instances, Spot, and On-Demand capacity across weather workloads: daily forecasts (00z, 06z, 12z, 18z), climate modeling, and weather events (hurricane)]
Performance considerations
Performance considerations for tightly coupled cluster workloads
Test using real-world examples
• Use large cases for testing: do not benchmark scalability using only small examples
Domain decomposition
• Choose number of cells per core for either per-core efficiency or for faster results
MPI libraries
• Test with Intel MPI and Open MPI 3.0, and make use of available tunings
Network
• Use a placement group
• Enable enhanced networking
[Figure: WRF 2.5 km CONUS Benchmark. Scale-up and time (s) plotted against number of cores, 0 to 4000.]
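The trade-off between per-core efficiency and faster results can be quantified from a scaling test. The sketch below computes cells per core, speedup, and parallel efficiency relative to the smallest run. The cell count and timings are hypothetical and are not read off the WRF plot.

```python
def cells_per_core(total_cells, cores):
    """Average number of grid cells each core owns after domain decomposition."""
    return total_cells / cores

def parallel_efficiency(base_cores, base_seconds, cores, seconds):
    """Speedup relative to the base run divided by the increase in core count."""
    speedup = base_seconds / seconds
    return speedup / (cores / base_cores)

# Hypothetical scaling test: core count -> wall-clock seconds for a 10M-cell case.
total_cells = 10_000_000
timings = {36: 2000.0, 72: 1050.0, 144: 560.0, 288: 320.0}

base_cores = min(timings)
for cores, seconds in timings.items():
    eff = parallel_efficiency(base_cores, timings[base_cores], cores, seconds)
    print(f"{cores:4d} cores: {cells_per_core(total_cells, cores):9.0f} cells/core, "
          f"{seconds:7.1f} s, efficiency {eff:.2f}")
```

In this made-up data the 288-core run is the fastest but runs at about 78% efficiency, so the choice depends on whether cost per result or time to result matters more.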
Performance considerations for all HPC workloads
OS version
• Use Amazon Linux or an updated 3.10+ kernel (4.0+ if using NVMe on F1 or I3)
Processor states
• Use P-states to reduce processor variability
Instance types
• C5, C4, M4, R4 are the best choices today, but always test with the latest EC2 instances
Hyper-threading and affinity
• Test with Hyper-Threading (HT) on and off; usually off is best, but not always
• Use CPU affinity to pin threads to CPU cores when HT is off
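As a minimal illustration of CPU affinity, the sketch below pins the current process to a single core using Python's standard library on Linux. It is not the method used in the talk: MPI launchers and schedulers typically offer their own pinning options, and the helper name `pin_to_cores` is made up for this example.

```python
import os

def pin_to_cores(cores):
    """Restrict the current process to the given CPU cores and return the result.

    Returns None on platforms without os.sched_setaffinity (it is Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    target = set(cores) & allowed          # ignore cores this process may not use
    if not target:
        raise ValueError("none of the requested cores are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)

if hasattr(os, "sched_getaffinity"):
    first_core = min(os.sched_getaffinity(0))
    pinned = pin_to_cores([first_core])
else:
    first_core, pinned = None, None
print("pinned to:", pinned)
```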
Thank you, and how can I help you
run HPC workloads on AWS?
aws.amazon.com/hpc
Thank you!

Cox Automotive’s Data Center Migration to the AWS Cloud - ENT330 - re:Invent ...
 
透過最新的 AWS 服務在 2019 年為您的業務轉型 (Level 200)
透過最新的 AWS 服務在 2019 年為您的業務轉型 (Level 200)透過最新的 AWS 服務在 2019 年為您的業務轉型 (Level 200)
透過最新的 AWS 服務在 2019 年為您的業務轉型 (Level 200)
 
Better, Faster, Cheaper – Cost Optimizing Compute with Amazon EC2 Fleet #savi...
Better, Faster, Cheaper – Cost Optimizing Compute with Amazon EC2 Fleet #savi...Better, Faster, Cheaper – Cost Optimizing Compute with Amazon EC2 Fleet #savi...
Better, Faster, Cheaper – Cost Optimizing Compute with Amazon EC2 Fleet #savi...
 

The Path to Cost Savings: Accelerating Design Cycles x Running High Performance Computing (HPC) Workloads at Scale (Level: 300)

  • 1. High Performance Computing on AWS. Jhen-Wei Huang (黃振維), Solutions Architect, Amazon Web Services. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 2. Agenda: Overview of AWS Infrastructure; Why HPC on AWS; HPC Solution Components; Cost Optimization; Performance Considerations
  • 3. AWS Global Infrastructure
  • 4. AWS Global Infrastructure: Regions; over 100 global CloudFront PoPs. Amazon Global Network: redundant 100 GbE network; redundant private capacity between all Regions except China
  • 5. AWS Global Infrastructure: Region and Number of Availability Zones. US West: Oregon (3), Northern California (3). US East: N. Virginia (6), Ohio (3). Canada: Central (2). South America: São Paulo (3). EU: Ireland (3), Frankfurt (3), London (3), Paris (3). Asia Pacific: Singapore (3), Sydney (3), Tokyo (4), Osaka-Local (1), Seoul (2), Mumbai (2). China: Beijing (2), Ningxia (2). AWS GovCloud: US-West (3). 55 Availability Zones within 18 geographic Regions and 1 Local Region around the world. Announced Regions: Bahrain, Hong Kong SAR (China), AWS GovCloud (US-East)
  • 6. Why HPC on AWS?
  • 7. Amazon Runs HPC Workloads Every Day: logistics; machine learning; data center, network, and server design; consumer product design; robotics; semiconductor design; retail and financial analytics
  • 8. Why HPC on AWS: Faster Time to Results, Better ROI. Virtually unlimited infrastructure, enabling scaling and agility not attainable on-premises; flexible configuration options to quickly iterate on resource selection and ensure cost optimization; increased collaboration with secure access to clusters around the world
  • 9. Why HPC on AWS: Multiple Clusters. On-prem, every job waits in one shared queue: $ qsub -q monolith iwait.sh. With separate clusters: $ qsub dev.sh; $ qsub prod.sh; $ qsub critical.sh; $ qsub bigrun.sh. Launch clusters by group, user, or application: no more waiting!
  • 10. Popular HPC Workloads on AWS: Transcoding and Encoding; Monte Carlo Simulations; Computational Chemistry; Government and Educational Research; Modeling and Simulation; Genome Processing
  • 11. HPC Workload Types: Tightly Coupled Parallel Computing; Loosely Coupled Parallel Computing; Accelerated Computing; Visualization and Interpretation; High Performance Data Storage and Analytics. Characteristics, in slide order: similar instance types, fixed-size clusters of EC2 instances; network intensive; customers price sensitive; high utilization; not typically resilient to interruptions; scalable, flexible infrastructure; workloads are also easily interruptible; data intensive; typically massively parallel applications; need compute-optimized GPUs or FPGAs; workloads run on graphics-optimized GPUs; need additional managed services like WorkSpaces or AppStream 2.0; workloads require moving customer data to AWS; value creation based on innovative analytics strategies like AI/ML/DL
  • 12. Industry Verticals and Common HPC Workloads. Life Sciences: Computational Chemistry, Genomics, Proteomics, Bioinformatics, Neuroimaging, Clinical Trials Simulations, Molecular Dynamics, RNA Sequencing. Financial Services: Risk analysis / modeling, Regulatory compliance, Monte Carlo simulation, Actuarial Grid, High Frequency Trading, Bitcoin / Blockchain. Energy & Geosciences: Weather Simulation, Reservoir Simulation, Geographical Information Systems, Operations, management, and analysis. Manufacturing: Electronic design automation, Computational fluid dynamics, Engineering Simulations, BIM, FEA. Media and Entertainment: Rendering, Content Creation, Post production
  • 13. Global-scale grids for research: Large Hadron Collider (LHC)
  • 14. Global-scale grids for research. Best practices using Spot: diversify computing across many instance types, multiple AZs, and multiple Regions, and use stateless architectures
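The diversification advice on the Spot slide can be made concrete by counting capacity pools. The sketch below is illustrative only: the instance types, Regions, and zone names are hypothetical choices, and the dictionaries merely resemble the kind of per-pool overrides a fleet request might carry. It does not call any AWS API.

```python
from itertools import product

# Hypothetical choices; pick types and zones that suit the workload.
INSTANCE_TYPES = ["c4.8xlarge", "c5.9xlarge", "m4.10xlarge"]
ZONES_BY_REGION = {
    "us-east-1": ["us-east-1a", "us-east-1b"],
    "us-west-2": ["us-west-2a", "us-west-2b"],
}


def capacity_pools(instance_types, zones_by_region):
    """Return one entry per (Region, zone, instance type) combination."""
    pools = []
    for region, zones in zones_by_region.items():
        for zone, itype in product(zones, instance_types):
            pools.append(
                {"Region": region, "AvailabilityZone": zone, "InstanceType": itype}
            )
    return pools


pools = capacity_pools(INSTANCE_TYPES, ZONES_BY_REGION)
print(len(pools), "distinct pools to draw Spot capacity from")
```

Each extra instance type or zone multiplies the number of pools, so an interruption in one pool affects a smaller share of a stateless workload.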
  • 15. 1.1M vCPUs for machine learning
  • 16. HPC grids in financial services: Using GPU Acceleration. The challenge: spinning up large numbers of GPUs quickly and inexpensively to meet the financial modeling and reporting needs of ABSI’s customers; ABSI uses proprietary algorithms (Monte Carlo simulations) running millions of times. The solution: ABSI moved its infrastructure to AWS and deprecated its co-located data center; ABSI built a front end on AWS for its processing solution, automatically running GPU instances on Amazon EC2 using Amazon EBS in an Amazon VPC for security. The result: can be as much as 500 times more efficient in terms of performance per dollar for some clients. “Using AWS helps us reduce a 10-day process to 10 minutes. That’s transformative: it broadens our ability to discover.” (Peter Phillips, Managing Director, Aon Benfield Securities)
  • 17. HPC in design and manufacturing. Applications for engineering: molecular dynamics, CAD, CAE, EDA; collaboration tools for engineering; big data for manufacturing yield analysis. Running drive-head simulations at scale: millions of parallel parameter sweeps, running months of simulations in just hours; over 85,000 Intel cores running at peak, using Spot Instances
  • 18. Tightly coupled HPC: weather
  • 19. Fluid dynamics: Ansys Fluent. C4.8xlarge instance type; 140M cell model; F1 car CFD benchmark
  • 20. HPC Solution Components
  • 21. Important enablers in HPC. Compute performance: CPUs, GPUs, FPGAs. Memory performance: high RAM requirements in many applications. Network performance: throughput, latency, and consistency. Storage performance: including shared filesystems. Automation and cluster/job management. Graphics for pre- and post-processing. ...and SCALE
  • 22. HPC Solutions. Compute: EC2 Instances (Compute and Accelerated), EC2 Spot, Auto Scaling. Storage: EBS, EFS, S3. Networking: Enhanced Networking, Placement Groups. Automation & Orchestration: AWS Batch, CfnCluster, NICE EnginFrame. Visualization: NICE DCV, AppStream 2.0
  • 23. Amazon EC2 Instances. General purpose: M4, M5. General purpose burstable: T2, T2 Unlimited. Compute optimized: C4, C5. Memory optimized: R4, X1, X1e. Storage optimized, High I/O: I2, I3. Dense storage: D2, H1. GPU compute: P2, P3. Graphics intensive: G2, G3. FPGA: F1. EC2 Bare Metal: direct access to physical server resources. Optimize the price/performance of your HPC workloads with the widest range of compute instances
  • 24. C5 Instances: Intel Xeon Scalable Processor. Intel Skylake @ 3.0 GHz (turbo to 3.5 GHz); supports AVX-512; C-state controls; Nitro System, a combination of dedicated hardware and lightweight hypervisor; up to 25-Gbps network. C4 ("Haswell"): 36 vCPUs, 60-GiB memory, 4 Gbps to EBS. C5 ("Skylake", AVX-512): 72 vCPUs, 144-GiB memory, 12 Gbps to EBS. C5 relative to C4: 2X vCPUs, 2.4X memory, 3X throughput
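The multipliers quoted for C5 over C4 follow directly from the per-instance figures on the slide. A short check, using only those quoted numbers (the dictionary keys are names chosen here for illustration):

```python
# Figures as quoted on the slide for the largest C4 and C5 sizes.
c4 = {"vcpus": 36, "memory_gib": 60, "ebs_gbps": 4}
c5 = {"vcpus": 72, "memory_gib": 144, "ebs_gbps": 12}

# Ratio of each C5 figure to the matching C4 figure.
ratios = {key: c5[key] / c4[key] for key in c4}

for key, ratio in ratios.items():
    print(f"{key}: {ratio:.1f}x")
```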
  • 25. GPU and FPGA for Accelerated Computing. P2/P3: GPU-accelerated computing (NVIDIA GPU): enabling a high degree of parallelism, as each GPU has thousands of cores; consistent, well-documented set of APIs (CUDA, OpenACC, OpenCL); supported by a wide variety of ISVs and open-source frameworks. F1: FPGA-accelerated computing (Xilinx UltraScale+ FPGA): massively parallel, as each FPGA includes millions of parallel system logic cells; flexible, with no fixed instruction set, so it can implement wide or narrow datapaths; programmable using available, cloud-based FPGA development tools
  • 26. Deep learning on GPU. MXNet training on EC2 P2 instances: training of a popular image analysis algorithm, Inception v3, using MXNet and running on P2 instances; scaling efficiency of 85%
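The slide gives a scaling efficiency of 85% without defining it. One common definition, assumed here, is achieved speedup divided by ideal linear speedup. The throughput numbers and the worker count below are hypothetical, chosen only so the result lands on 85%; they are not measurements from the deck.

```python
def scaling_efficiency(throughput_one, throughput_n, n_workers):
    """Fraction of ideal linear speedup achieved with n_workers."""
    speedup = throughput_n / throughput_one
    return speedup / n_workers


# Hypothetical throughputs, not taken from the slide.
single_gpu = 100.0    # images/s on one GPU (assumed)
cluster = 10880.0     # images/s on 128 GPUs (assumed)

efficiency = scaling_efficiency(single_gpu, cluster, 128)
print(f"scaling efficiency: {efficiency:.0%}")
```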
  • 27. FPGA use-cases and F1 partners
  • 28. Network Performance. AWS proprietary network, 10 Gbps and 25 Gbps: highest performance in the largest EC2 instance sizes; full bisection bandwidth in Placement Groups, with no network oversubscription. Enhanced Networking: over 1M PPS performance, reduced instance-to-instance latencies, more consistent network performance. Amazon EC2 to Amazon S3: traffic to and from Amazon S3 can now take advantage of up to 25 Gbps of bandwidth
  • 29. Instance sizes: R4 example. R4 instances are optimized for memory-intensive applications: Xeon E5-2686 v4 processors; DDR4 memory; Enhanced Networking, up to 25 Gbps throughput
  • 30. AWS Storage is a Platform. File: Amazon EFS. Block: Amazon EBS, Amazon EC2 Instance Store. Object: Amazon S3 / S3-IA, Amazon Glacier. Data Transfer: AWS Direct Connect, ISV Connectors, Amazon Kinesis Data Firehose, AWS Storage Gateway, S3 Transfer Acceleration, AWS Snowball, Amazon CloudFront, Internet/VPN
  • 31. Storage Classes and Tiering on Amazon S3. Standard: primary data; big data analytics; small objects; temporary scratch space. Standard - Infrequent Access: file sync and share; active archive; enterprise backup; media transcoding; georedundancy/DR. Amazon Glacier: archive data; deep/offline archives; tape vaulting replacement; WORM-compliant data
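Tiering between these storage classes is typically expressed as an S3 lifecycle rule. The sketch below only builds the rule as a plain dictionary in the shape that boto3's put_bucket_lifecycle_configuration accepts, as far as I know. The prefix, bucket name, and day counts are hypothetical, and the 30-day floor reflects my understanding of the minimum age for Standard-IA transitions rather than anything stated on the slide.

```python
def tiering_rule(prefix, ia_after_days=30, glacier_after_days=90):
    """Build one rule moving objects Standard -> Standard-IA -> Glacier."""
    if not 30 <= ia_after_days < glacier_after_days:
        raise ValueError("expected 30 <= ia_after_days < glacier_after_days")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle = {"Rules": [tiering_rule("results/")]}

# With boto3 installed and credentials configured, this could be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-hpc-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle,
# )
print(lifecycle)
```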
  • 32. Block Storage. Two block storage options for EC2 instances: Amazon EBS and Instance Store. Block device mapping diagram: an EC2 instance with devices /dev/xvda, /dev/xvdb, /dev/xvdc, and /dev/xvdd mapped to Instance Store volumes (ephemeral0, ephemeral1) and EBS volumes (vol-xxxxxxxx, vol-xxxxxxxx)
  • 33. File Systems on AWS. Amazon Elastic File System (Amazon EFS): distributed across multiple AZs; petabyte-scale; easy to bring up, no management. Build your own NFS: use for a POC; ephemeral data (i3.*). Parallel file systems: build your own or use APN solutions
  • 34. High-performance NFS on AWS. EC2+EBS is the most performant method of creating scale-up file servers on AWS; build your own NFS or CIFS implementation or use a partner solution; EC2 instances as fileservers, using EBS for block storage, tuned for application needs; single-fileserver performance up to 25 Gb/s over the network
  • 35. Optimize HPC storage. Amazon S3: secure, durable, highly scalable object storage with fast access and low cost, for long-term durable storage of data in a readily accessible get/put access format; primary durable and scalable storage for critical data. Amazon Glacier: secure, durable, long-term, highly cost-effective object storage, for long-term storage and archival of data that is infrequently accessed; use for long-term, lower-cost archival of critical data. EBS+EC2: create a single-AZ, shared file system using EC2 and EBS, with third-party or open source software (ZFS, Weka.io, Avere, Intel Lustre, etc.), for near-line storage of files optimized for high IOPS; use for high-IOPS, temporary working storage. Amazon EFS: highly available, multi-AZ, fully managed network-attached elastic file system, for near-line, highly available storage of files in a traditional NFS format (NFSv4); use for read-often, temporary working storage
  • 36. HPC Data Flow on AWS. Diagram elements: corporate data center; data transfer (AWS Direct Connect, ISV Connectors, Storage Gateway, AWS Snowball, Internet/VPN, Kinesis Data Firehose, S3 Transfer Acceleration, Amazon CloudFront); ingress and egress; storage (Amazon S3, Amazon Glacier, lifecycle); EC2 instance with EBS and Instance Store; object, block, and file storage; EFS and other shared file systems; 25 Gbps to S3
  • 37. Automation and Batch Processing
  • 38. Traditional Job Schedulers Integrate Easily. Bring your scheduler to AWS, or build your own: IBM Platform LSF; Univa Grid Engine; Altair PBS Pro; SLURM; design your own using AWS services; do you actually need a scheduler?
  • 39. HPC Automation and Orchestration. Choose from several options to adapt your workloads: CfnCluster; AWS Batch; AWS-NICE DCV and EnginFrame; build your own AWS CloudFormation templates; ISV offerings on AWS Marketplace, or use an SI
  • 40. HPC automation with CfnCluster. CfnCluster simplifies deployment of HPC in the cloud, including integrating with popular HPC schedulers. Built on AWS CloudFormation, it is easy to modify to meet specific application or project requirements
  • 41. AWS Batch for HPC workloads: focus on your applications and results! Fully managed: no software to install or servers to manage; AWS Batch provisions, manages, and scales your infrastructure. Integrated with AWS: natively integrated with the AWS Platform, AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Amazon Rekognition. Cost-optimized resource provisioning: AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon EC2 and EC2 Spot
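For a loosely coupled sweep, one way to use AWS Batch is an array job, where a single submission fans out into many child tasks. The sketch below only assembles the keyword arguments that I believe boto3's Batch submit_job call accepts. The job name, queue, job definition, and command are hypothetical, and the 2 to 10,000 size range is my understanding of the array-size limits, so check it against current documentation.

```python
def array_job_request(name, queue, definition, n_tasks, command):
    """Build keyword arguments for an AWS Batch array job submission."""
    if not 2 <= n_tasks <= 10000:
        raise ValueError("array size is expected to be between 2 and 10000")
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "arrayProperties": {"size": n_tasks},
        "containerOverrides": {"command": list(command)},
    }


request = array_job_request(
    name="monte-carlo-sweep",           # hypothetical job name
    queue="hpc-spot-queue",             # hypothetical job queue
    definition="simulate:1",            # hypothetical job definition
    n_tasks=1000,
    command=["python", "simulate.py"],  # hypothetical entry point
)

# With boto3 installed and credentials configured:
# boto3.client("batch").submit_job(**request)
print(request["arrayProperties"])
```

Each child task can read its index from the environment to select its own slice of the parameter space, so no separate scheduler is needed for this pattern.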
  • 42. HPC Architecture on AWS
  • 43. HPC Architecture on AWS. Diagram elements: corporate data center; data ingress/egress; S3; Availability Zone; master instance with EBS, accepting $ qsub job.sh; Auto Scaling group; Amazon EFS, Local NFS, and Parallel FS. Notes: three file systems (Amazon EFS, Local NFS, and Parallel FS); snapshot of Amazon EBS to Amazon S3; data tiering from the file systems to Amazon S3; Auto Scaling allows for scaling when needed
  • 44. Hybrid HPC: Using Amazon EC2 Systems Manager. Capabilities: Run Command, State Manager, Inventory, Maintenance Window, Patch Manager, Automation, Parameter Store, Documents. Diagram elements: IT admin or DevOps engineer; role-based access control; AWS cloud; corporate data center. Manage thousands of Windows and Linux nodes running on Amazon EC2 or on premises; control user actions and scope with secure, granular access control; safely execute changes with rate control to reduce blast radius; audit every user action with change tracking
  • 45. Graphics for HPC applications
  • 46. Desktop application streaming with Amazon AppStream 2.0: stream desktop applications securely to any web browser. Run desktop apps in a web browser; secure apps and data; scale globally; pay as you go
  • 47. AppStream 2.0 graphics support. Multiple instance types, including General Purpose, Compute Optimized, Memory Optimized, Graphics Design, Graphics Pro, and Graphics Desktop; Always-On or On-Demand pricing models; support for OpenGL, DirectX, OpenCL, and CUDA
  • 48. Cost Optimization
  • 49. EC2 Purchasing Options. On-Demand: pay for compute capacity by the second with no long-term commitments; suited to spiky workloads and to defining needs. Reserved: make a 1- or 3-year commitment and receive a significant discount off On-Demand prices; suited to committed, steady-state usage. Spot: spare EC2 capacity at savings of up to 90% off On-Demand prices; suited to fault-tolerant, dev/test, time-flexible, stateless workloads. Per-second billing for EC2 Linux instances and EBS volumes
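A rough cost comparison shows how per-second billing and the Spot discount combine for short jobs. This is a sketch under stated assumptions: the hourly price is hypothetical, the 90% discount is the upper bound quoted on the slide rather than a typical value, and the 60-second minimum charge is my understanding of how per-second billing works, not something the slide states.

```python
import math


def billed_cost(runtime_seconds, hourly_price, minimum_seconds=60):
    """Cost of one instance run billed per second, with a minimum charge."""
    billable = max(math.ceil(runtime_seconds), minimum_seconds)
    return billable * hourly_price / 3600.0


ON_DEMAND_HOURLY = 3.60   # hypothetical price in USD per hour
SPOT_DISCOUNT = 0.90      # "up to 90%" per the slide; actual savings vary
spot_hourly = ON_DEMAND_HOURLY * (1 - SPOT_DISCOUNT)

for seconds in (10, 90, 3600):
    on_demand = billed_cost(seconds, ON_DEMAND_HOURLY)
    spot = billed_cost(seconds, spot_hourly)
    print(f"{seconds:>5} s  on-demand ${on_demand:.4f}  spot ${spot:.4f}")
```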
  • 50. Cost Optimization: Weather Forecasting and Modeling. Diagram relating purchasing options (On-Demand, Spot, Reserved Instances) to workloads: forecasting runs at 00z, 06z, 12z, and 18z; daily forecasts; climate modeling; weather events; hurricane
  • 51. Performance considerations
  • 52. Performance considerations for tightly coupled cluster workloads. Test using real-world examples: use large cases for testing; do not benchmark scalability using only small examples. Domain decomposition: choose the number of cells per core for either per-core efficiency or faster results. MPI libraries: test with Intel MPI and OpenMPI 3.0, and make use of available tunings. Network: use a placement group; enable enhanced networking. [Chart: WRF 2.5 km CONUS Benchmark, time (s) and scale-up versus cores]
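The domain decomposition trade-off can be sketched as simple sizing arithmetic: fewer cells per core means more cores and faster results, at lower per-core efficiency. The 140M cell count is borrowed from the Fluent benchmark slide purely for illustration. The cells-per-core targets and the 18 physical cores per instance (a 36-vCPU instance with Hyper-Threading off) are assumptions.

```python
import math


def cluster_size(total_cells, cells_per_core, cores_per_instance):
    """Return (cores, instances) needed for a target cells-per-core load."""
    cores = math.ceil(total_cells / cells_per_core)
    instances = math.ceil(cores / cores_per_instance)
    return cores, instances


TOTAL_CELLS = 140_000_000    # model size, borrowed for illustration
CORES_PER_INSTANCE = 18      # assumed: 36 vCPUs with Hyper-Threading off

# Hypothetical targets: a larger load per core favors efficiency,
# a smaller one favors time to result.
for target in (100_000, 50_000):
    cores, instances = cluster_size(TOTAL_CELLS, target, CORES_PER_INSTANCE)
    print(f"{target} cells/core -> {cores} cores on {instances} instances")
```

Timing a real case at two or three such sizes is the only reliable way to find where added cores stop paying off.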
  • 53. Performance considerations for all HPC workloads. OS version: use Amazon Linux or an updated 3.10+ kernel (4.0+ if using NVMe on F1 or I3). Processor states: use P-states to reduce processor variability. Instance types: C5, C4, M4, and R4 are the best choices today, but always test with the latest EC2 instances. Hyper-threading and affinity: test with Hyper-Threading (HT) on and off; off is usually best, but not always; use CPU affinity to pin threads to CPU cores when HT is off
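CPU affinity is usually set by the MPI launcher or a tool such as taskset, but the idea can be shown with Python's standard library on Linux. This is a minimal sketch and not how any particular HPC application does it. The helper name pin_to_core is made up for illustration, and the calls are guarded because os.sched_setaffinity is not available on every platform.

```python
import os


def pin_to_core(core_id):
    """Pin the calling process to one CPU and return the resulting CPU set.

    Returns None on platforms without os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core_id})   # 0 means the calling process
    return os.sched_getaffinity(0)


# CPUs this process is currently allowed to run on (empty if unsupported).
allowed = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []

if allowed:
    print("pinned to:", pin_to_core(allowed[0]))
else:
    print("CPU affinity control is not available on this platform")
```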
  • 54. Thank you, and how can I help you run HPC workloads on AWS? aws.amazon.com/hpc
  • 55. Thank you!