4. Amazon Elastic Compute Cloud (EC2):
Virtual servers in the cloud
Physical Servers in
AWS Global Regions
Host server
Hypervisor
Guest 1 Guest 2 Guest n
5. Amazon EC2 12 years ago…
Scale up or
down quickly,
as needed
Pay for what
you use
M1
“One size fits all”
8. Amazon Machine Images (AMIs)
Amazon
maintained
Set of Linux and
Windows images
Kept up-to-date by
Amazon in each
region
Community
maintained
Images published by
other AWS users
Managed and
maintained by
Marketplace partners
Your machine
images
AMIs you have
created from EC2
instances
Can be kept private or
shared with other
accounts
11. M5: Next-Generation General Purpose instance
• Powered by 2.5 GHz Intel Xeon Scalable
Processors (Skylake)
• New larger instance size—m5.24xlarge with
96 vCPUs and 384 GiB of memory
(4:1 Memory:vCPU ratio)
• Improved network and EBS performance on smaller
instance sizes
• Support for Intel AVX-512 offering up to twice the
performance for vector and floating point workloads
14% price/performance improvement over M4
13. T2: General Purpose Burstable instances
T2 Burstable Performance instances
provide a generous baseline level of
CPU performance with the ability to burst
above the baseline
Lowest cost EC2 instance at $0.004
per hour, and available on AWS Free Tier
With T2 Unlimited, burst whenever you
want for as long as you want
Just $0.05 per vCPU-hour over baseline,
averaged over 24 hours
t2.nano: 0.5 GiB, 1 vCPU, 5% baseline performance
…7 sizes…
t2.2xlarge: 32 GiB, 8 vCPUs, 135% baseline performance
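The T2 Unlimited surplus charge mentioned above can be illustrated with a toy calculation. This is a sketch built only on the $0.05 per vCPU-hour figure from the slide; real AWS billing uses CPU credits and is more involved:

```python
def t2_unlimited_surplus_cost(avg_cpu_pct, baseline_pct, vcpus, hours,
                              rate_per_vcpu_hour=0.05):
    """Estimate the T2 Unlimited surplus charge: average CPU usage above
    the instance's baseline is billed per vCPU-hour over baseline."""
    over = max(avg_cpu_pct - baseline_pct, 0) / 100.0
    return over * vcpus * hours * rate_per_vcpu_hour

# A t2.nano (1 vCPU, 5% baseline) averaging 25% CPU for a full day:
cost = t2_unlimited_surplus_cost(25, 5, 1, 24)  # 20% over baseline
```

At 20% sustained usage over baseline, the day of bursting costs about $0.24 on top of the instance price.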
14. R4: Memory Optimised instances
In-memory databases, in-memory analytics, in-memory caches
• 8:1 GiB to vCPU ratio
• Memory-optimised instances with Intel Xeon (Broadwell)
processors
• Up to 25 Gbps NW bandwidth
r4.large: 15.3 GiB, 2 vCPUs
…6 sizes…
r4.16xlarge: 488 GiB, 64 vCPUs
15. X1 and X1e—Large-Scale Memory-Optimised
For memory-intensive workloads and
very large in-memory workloads
32:1 GiB to vCPU ratio
High-performance databases, Large in-
memory databases (e.g. SAP HANA), and DB
workloads with vCPU based licensing
(Oracle, SAP)
For large in-memory workloads
16:1 GiB to vCPU ratio
In-memory databases (e.g., SAP
HANA), big data processing engines
(Apache Spark, Presto), in-memory
analytics
X1 (2 sizes):
x1.16xlarge: 1 TB, 64 vCPUs
x1.32xlarge: 2 TB, 128 vCPUs
X1e (6 sizes):
x1e.16xlarge: 2 TB, 64 vCPUs
x1e.32xlarge: 4 TB, 128 vCPUs
Roadmap through 2018: up to 16 TB memory instances!
16. I3: I/O optimised instances
9x as many IOPS as I2
• Intel Xeon E5 v4 (Broadwell) processors, with up to
15.2TB of locally attached NVMe SSD storage, 64
vCPUs, and 488 GiB memory
• Lowest cost per IOPS ($/IOPS)
• Offers very high Random I/O (up to 3.3 million IOPS)
and disk throughput (up to 16 GB/s)
• Up to 25 Gbps NW bandwidth
High-performance databases, real-time analytics,
NoSQL databases, transactional workloads
17. EC2 Bare Metal
EC2 Bare Metal
Run bare metal workloads on EC2 with
all the elasticity, security, scale, and
services of AWS
i3.metal
36 hyperthreaded cores
15.2 TB SSD-based NVMe storage
512 GiB RAM
Designed for workloads that
are not virtualised, require
specific types of hypervisors,
or have licensing models that
restrict virtualization
Powers the VMware Cloud on AWS
18. Dense Storage workloads—D2 and H1
Data warehousing
Log processing
HDFS
d2.8xlarge: 244 GiB, 36 vCPUs, 48 TB HDD
• Lowest cost per GB of storage ($/GB)
• Supports high sequential disk throughput
h1.16xlarge: 256 GiB, 64 vCPUs, 16 TB HDD
• More vCPUs and memory per terabyte of disk
• Lower costs for big data use cases
Big data, Kafka, MapReduce
20. C5: Compute-optimised instances based on
Intel Skylake
• Based on 3.0 GHz Intel Xeon Scalable Processors
(Skylake)
• Up to 72 vCPUs and 144 GiB of memory
(2:1 Memory:vCPU ratio)
• 25 Gbps NW bandwidth
• Support for Intel AVX-512
25% price/performance improvement over C4
“We saw significant performance improvement on Amazon EC2 C5, with up to a 140% performance improvement in industry-standard CPU benchmarks over C4.”
21. Accelerated computing on AWS
Parallelism increases throughput
CPU: High speed, highly flexible
GPUs and FPGAs can provide massive parallelism and higher efficiency than CPUs
for many categories of applications
22. High-performance graphics with G3
Seismic exploration and
analytics for oil and gas
“The exploration and
production models are
increasingly complex with
very large datasets, 3D and
dynamic algorithms, security,
and global reach... . Amazon
EC2 G3 instances enable
Landmark to deliver value to
our clients in ways that were
not possible before.”
- Chandra Yeleshwarapu,
Global Head of Services and Cloud
Landmark, Halliburton
Ideal for workloads needing massive
parallel processing power
Visualizations
Cloud workstation
3D rendering
Video encoding
Virtual reality
4 GPUs, 64 vCPUs, 488 GiB of host
memory, and 20 Gbps of network
bandwidth
Tesla M60 GPU offers 8 GB of GPU
memory, 2048 parallel processing cores
and a hardware encoder
10 H.265 (HEVC) 1080p30 streams
18 H.264 1080p30 streams
23. Graphics Acceleration: Elastic GPUs
Allows customers to add
low-cost graphics
acceleration to Amazon EC2
instances over the network
Come in a wide range of sizes;
you can attach GPUs to a wide
range of EC2 instances to
achieve optimal performance
OpenGL compliant, giving you
the confidence to run any
graphics-intensive application
GPU memory sizes: 1 GiB, 2 GiB, 4 GiB, or 8 GiB, attached to a current-generation EC2 instance
24. Use Cases for GPU Compute
Machine learning/AI High-performance computing
Natural language
processing
Image and video
recognition
Autonomous vehicle
systems
Recommendation
systems
Computational fluid
dynamics
Financial and data
analytics
Weather
simulation
Computational chemistry
25. Next Generation of GPU Compute
Instances—P3 Instances
• Industry’s most powerful GPU-based platform
• Based on NVIDIA’s latest GPU, the Tesla V100
• 1 PetaFLOP of computational performance in a single instance
• Provides up to 14X performance improvement over P2 for machine learning
use cases
• Up to 2.6X performance improvement over P2 for HPC use cases
27. Amazon EC2 instance store
• Local to instance
• Non-persistent data store
• Data not replicated
(by default)
• No snapshot support
• SSD or HDD
EC2 instances
Physical Host
Instance Store
or
28. Amazon Elastic Block Store (EBS)
EC2
instance
EBS
volume
• Block storage as a service
• Create, attach volumes through an API
• Service accessed over the network
• Select storage and compute based on
your workload
• Volumes persist independent of EC2
• Detach and attach between instances
• Choice of magnetic and SSD-based
volume types
• Supports snapshots: point-in-time backup
of modified volume blocks
EBS SSD-backed volumes: gp2, io1
EBS HDD-backed volumes: st1, sc1
Elastic Volumes let you increase volume size or change volume type
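For the gp2 volume type above, baseline IOPS scale with volume size. A minimal sketch of the commonly documented rule of that era — 3 IOPS per GiB with a 100 IOPS floor and a 10,000 IOPS cap; treat the exact limits as assumptions, since AWS has raised them over time:

```python
def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 IOPS per GiB,
    floored at 100 IOPS and capped at 10,000 IOPS."""
    return min(max(3 * size_gib, 100), 10_000)

small = gp2_baseline_iops(20)     # small volumes get the 100 IOPS floor
mid = gp2_baseline_iops(500)      # 3 IOPS/GiB in the linear range
large = gp2_baseline_iops(5000)   # large volumes hit the cap
```

This is why sizing a gp2 volume larger than strictly needed for capacity was a common way to buy more baseline IOPS.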
30. Amazon Virtual Private Cloud (VPC)
Flow Logs, NAT Gateway, Virtual Private Cloud
Provision a logically isolated
cloud where you can launch
AWS resources into a
virtual network
Security Groups & ACLs
34. Placement Groups
CLUSTER SPREAD
Placement Groups enable you to influence our selection of capacity for
member instances, optimizing the experience for a workload
EC2 places instances closely
together in order to optimise
the performance of
inter-instance network traffic
EC2 places instances on distinct
hardware in order to help
reduce correlated failures
35. Spread Placement Groups
Database
Cluster
When deploying a NoSQL database
cluster in EC2, Spread Placement
ensures the instances in your
cluster run on distinct hardware,
limiting the impact of a single
hardware failure to a single node
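As the speaker notes mention, placement is just a parameter on the launch call. A boto3-style sketch: the parameter names follow the EC2 CreatePlacementGroup/RunInstances APIs, but the AMI ID and group name are hypothetical, and the client calls are left commented out:

```python
# The placement group is created once, with a strategy of "spread"
# (or "cluster" for low-latency grouping):
group_args = {"GroupName": "db-cluster-spread", "Strategy": "spread"}

# Each instance launch then references the group by name:
run_args = {
    "ImageId": "ami-12345678",   # hypothetical AMI ID
    "InstanceType": "m5.large",
    "MinCount": 1,
    "MaxCount": 3,
    "Placement": {"GroupName": "db-cluster-spread"},
}

# import boto3
# ec2 = boto3.client("ec2")
# ec2.create_placement_group(**group_args)
# ec2.run_instances(**run_args)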
36. Elastic Load Balancing
Load balancer used to route
incoming requests to
multiple EC2 instances,
containers, or IP addresses
in your VPC
Elastic Load Balancing
provides high availability
by utilizing multiple
Availability Zones
(Diagram: an ELB routing traffic to three EC2 instances.)
37. Auto Scaling
Auto Scaling group
Dynamic scaling: scale the EC2 instances behind an ELB based on a metric such as CPU utilization
Fleet management: replace unhealthy instances
39. Launching Instances with Launch Templates
Launch parameters captured in a template: AMI ID, instance type, network interface, placement, user data, tags, EBS volume, block device mapping
Launch instances from the template via the Console, CLI, or API
40. Launch Templates
Templatise launch requests in order to streamline and simplify future
launches in Auto Scaling, Spot Fleet, and On-Demand Instances
Benefits: a consistent experience, simple permissions, governance & best practices, increased productivity
41. Amazon CloudWatch
• Monitoring service for AWS cloud resources and the applications you run
on AWS
• You can use Amazon CloudWatch to collect and track metrics, collect and
monitor log files, set alarms, and automatically react to changes in your
AWS resources
42. AWS Systems Manager
• Securely manage Windows and Linux
instances, EC2, or on-premises
• Stay compliant with patching, config drift
management, and software inventory
• Automate daily tasks with delegated
administration and approval
• Centrally manage secrets and config items
AWS cloud
On-premises data center
IT Admin, DevOps Engineer
Granular Role-based Access
Control with Audit Trail
44. EC2 Purchasing Options
On-Demand
Pay for compute capacity by the
second with no long-term
commitments
Spiky workloads, or workloads whose needs are not yet defined
Reserved
Make a 1- or 3-year commitment and
receive a significant discount off of
On-Demand prices
Committed, steady-state usage
Spot
Spare EC2 capacity at a savings of up
to 90% off of On-Demand prices
Fault-tolerant, dev/test,
time-flexible, stateless workloads
Per Second Billing for EC2 Linux instances & EBS volumes
45. EC2 Reserved Pricing
Discount up to 75% off of the On-Demand price
Steady state and
committed usage
1- and 3-year terms
Payment flexibility with
3 upfront payment options (all,
partial, none)
Reserve capacity, or opt for
flexibility across AZs and
instance sizes
Convertible RIs
Change instance family, OS,
tenancy, and payment
1-Year Convertible RIs
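Comparing the three upfront payment options comes down to an effective hourly rate: amortise any upfront payment over the term and add the recurring charge. A sketch with hypothetical prices (the quotes below are not real AWS rates):

```python
HOURS_PER_YEAR = 8760

def effective_hourly(upfront, hourly, years):
    """Effective hourly rate of a Reserved Instance: amortise the
    upfront payment over the term and add the recurring hourly charge."""
    return upfront / (years * HOURS_PER_YEAR) + hourly

# Hypothetical 1-year quotes for the same instance:
all_up = effective_hourly(upfront=438.0, hourly=0.0, years=1)   # all upfront
no_up = effective_hourly(upfront=0.0, hourly=0.057, years=1)    # no upfront
```

With these made-up numbers, all-upfront works out to $0.05/hour versus $0.057/hour for no-upfront — the usual pattern where paying more upfront yields a lower effective rate.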
47. EC2 Spot Pricing
Turbo Boost your results
with Spot Fleet
Spare EC2 capacity that AWS
can reclaim with 2 minutes' notice
Savings up to 90% off of the On-Demand price
Eliminate the bid! No need to learn
new APIs
Pause and resume with
Stop/Start and Hibernate
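The 2-minute reclaim warning is delivered as an interruption notice via the instance metadata service (under a path like `/latest/meta-data/spot/instance-action`). A sketch of parsing such a notice; the JSON shape follows the documented format, but treat the field names and the metadata path as assumptions:

```python
import json
from datetime import datetime

def parse_spot_notice(body):
    """Parse a Spot instance-action notice document and return the
    requested action and the time it takes effect."""
    doc = json.loads(body)
    when = datetime.strptime(doc["time"], "%Y-%m-%dT%H:%M:%SZ")
    return doc["action"], when

# Example notice body as the metadata service might return it:
action, when = parse_spot_notice(
    '{"action": "terminate", "time": "2018-05-04T17:11:44Z"}'
)
```

A worker would poll for this document and, on seeing it, checkpoint or drain within the 2-minute window (or rely on Stop/Hibernate as described above).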
48. To optimise EC2, combine all 3 options
1. Use Reserved Instances for
known/steady-state workloads
2. Scale using Spot, On-Demand or
both
3. AWS services make this easy and
efficient (e.g., Auto Scaling, Spot
fleet, ECS, EMR, Thinkbox
Deadline, AWS Batch)
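The three-step recipe above can be made concrete with a toy cost model: pay for a Reserved baseline around the clock, and serve demand above it with Spot (falling back to On-Demand). All rates here are hypothetical:

```python
def blended_cost(hourly_demand, reserved, od_rate, ri_rate, spot_rate,
                 spot_fraction=1.0):
    """Cost of serving an hourly demand profile: a Reserved baseline is
    always paid for; demand above it is split between Spot and On-Demand."""
    total = ri_rate * reserved * len(hourly_demand)
    for d in hourly_demand:
        extra = max(d - reserved, 0)
        spot = extra * spot_fraction
        total += spot * spot_rate + (extra - spot) * od_rate
    return total

demand = [2, 2, 3, 5, 8, 8, 6, 3]          # instances needed each hour
cost = blended_cost(demand, reserved=2, od_rate=0.10,
                    ri_rate=0.06, spot_rate=0.03)
```

With these made-up rates, the blend costs $1.59 versus $3.50 if the whole profile ran On-Demand — the kind of saving the slide is pointing at.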
(Chart: hourly instance demand over a 24-hour day, served by a Reserved baseline with On-Demand and Spot capacity on top.)
Good morning everybody, Welcome to the inaugural AWS Pop-up Loft in Dublin. Thanks for coming to the first session of the week.
This session is a Deep Dive on EC2
My name is , I’m a solution architect here at Amazon Web Services
If you have any questions, come and find me afterwards; I'll be manning the Ask an Architect booth today and tomorrow, and many of my colleagues are also here to help
3 questions to audience – 1) How many are new to the Cloud?
2) Who has spun up an EC2 server?
3) How many are using EC2 for production workloads and are here to learn more about what's new on EC2?
This graphic shows the topics and the order of what we will cover today
Over the next ¾ of an hour we will discuss EC2 in these terms
Resources: the EC2 instances themselves, the storage they use and networking concepts in effect on AWS – will spend much of the time chatting about this subject as it is the meat of the EC2 story
Availability: We will discuss how you can design your applications on EC2 for HA
Management : As your fleet grows you need tools and services to manage it so we spend some time on that
Purchase Options: How would you like to pay for your instances?
Segue – we will home in on resources to begin… Click… We will come back to this image as we progress to show how the different components relate to each other
Specifically we will talk about EC2 instances
[Segue]
First let's focus on EC2 resources… Click… starting with EC2 instances
Message: This is where EC2 sits relative to physical servers and hypervisors
You can think of the hypervisor as a mechanism to carve up the physical server into virtual machines.
[Transition] VMs on AWS are Amazon EC2 instances
We will take care of security of the Cloud up to the hypervisor level while we provide you with the tools to look after security in the Cloud
In 2006 we had a goal of ‘the on-demand delivery of IT resources via the Internet with pay-as-you-go pricing’
We wanted to create compute as a service – scale up and down, pay for what you use – and assumed that one size would fit all use cases
The first two still hold true today, we still want elasticity, the ability to scale up and down as needed, you still pay only for what you use but customers told us one size does not fit all.
[Segue]… Click… We now have over 100 instance types.
15 instance families, across 6 instance categories.
Explain the term I mentioned previously ..
i3.xlarge is the API name
i is the family, 3 is the generation number
xlarge is the t-shirt size – all the sizes in a family keep the same ratio, they just get bigger
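The naming convention described above (family letter, generation number, optional variant letter, t-shirt size) can be parsed mechanically. A small sketch:

```python
import re

def parse_instance_type(name):
    """Split an EC2 instance type like 'i3.xlarge' into its family letter,
    generation number, optional variant suffix, and t-shirt size."""
    m = re.fullmatch(r"([a-z]+)(\d+)([a-z]*)\.([0-9]*[a-z]+)", name.lower())
    if not m:
        raise ValueError(f"unrecognised instance type: {name}")
    family, gen, variant, size = m.groups()
    return {"family": family, "generation": int(gen),
            "variant": variant, "size": size}

parse_instance_type("i3.xlarge")
# → {'family': 'i', 'generation': 3, 'variant': '', 'size': 'xlarge'}
```

The same pattern handles variants like x1e.32xlarge (variant "e") and sizes like m5.24xlarge or i3.metal.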
[Segue] – what runs on these instances?… Click…
Defines the software and the configuration of these instances.
The Marketplace has over 4,000 products from over 1,200 ISVs.
Will talk about some of these in more detail in a second but this represents the spectrum of the EC2 instance families
Everything with a ‘New’ over it has been released in the last 18 months
Segue – let's start with workloads, because everything starts with workloads – click… and work back to the instances that meet these needs
Good balance of CPU to memory
Segue – we just introduced the latest M5… click
8 mins
M1 only had 1 vCPU and 2 GiB – we've come a long way.
Also have smaller t-shirt sizes in same ratio
We have 25 Gbps of networking
AVX-512 vector extensions, which let you double your performance on certain floating-point workloads
Segue – when we looked at the CPU utilisation of our general purpose instances we noticed that they are not at all busy… Click
9 mins
The left-hand side represents 0% utilisation, the right-hand side 100%
Most usage is in the region of 20–30% CPU
The typical pattern is not busy, then a web request arrives and CPU spikes, then back to not being busy.
Segue – we wanted to come up with a way to make use of the unused CPU capacity and pass that saving on to our customers… Click
9 mins
We came up with a way to share the unused CPU across customers, by overselling instances and passing on the savings to our customers in the form of burst credits
Customers loved these instances so we kept adding instance types to the family
But if they run over the baseline consistently we throttle them down; to address this we introduced T2 Unlimited
Burst whenever you want for as long as you want; you will be charged a small sum if your average CPU usage is above the baseline.
Segue – we talked about general purpose instances having a 4:1 memory (GiB) to vCPU ratio… click… R4 offers 8:1
Mention workload type
Ideal for these workloads
Segue – feedback we got was that customers have larger datasets: SAP HANA, in-memory caches, Apache Spark… Click
For this we released the X1 instance family, which is great for these workloads
Customers shared with us that they wanted us to grow with their needs, so we released the X1e family – 32:1 GiB to vCPU ratio, up to 4 TB
[Message: we are growing ahead of customer requirements – that's why we show the second-last and the last instance type in each family]
We are releasing bigger instance sizes as per the roadmap to 16TB of memory
2 types of storage:
High performance high IO SSD
High-throughput magnetic storage
Let's start with I/O optimised
3.3 million IOPS, 9 times the IOPS offered in the previous generation I2
Segue – the first instance where we have a bare metal option… Click
We have been working on offloading from the hypervisor to hardware offloads that sit inside the host but don't use up the processor and memory
Offloading software-defined networking, packet processing, EBS management, encryption
Running on hardware offloads means you get back all the resources that were previously used for these operations
Now the hypervisor is optional: you can bring your own hypervisor, or bring OSs that don't run on VMs because of licensing
Segue – some workloads are constrained by disk throughput… Click… for these workloads we have D2
D2 – workloads that don't need high IOPS but need high sequential disk throughput
D2 – lowest cost per GB of storage
H1 segue – since we introduced this, new workloads have come to the cloud – big data, Kafka, MapReduce – that don't need that ratio; they need more vCPUs and memory per TB of disk
We shipped the H1 – for big data and Kafka they get lower costs; it has the same memory but twice as many vCPUs – more CPUs and memory per TB of disk
Segue – on to the workhorses of EC2… click
The workloads are…
Segue – for these workloads we introduced the C5
Like M5 they use Skylake, but we worked with them to create a Skylake chip that has extremely high performance for these workloads, 3 GHz frequency
25% price/performance improvement over the previous generation.
AVX-512 – makes inference and vector based processing much faster
Leveraging AVX-512 vector extensions, Netflix saw a 140% performance improvement
Segue – compute is not just about CPUs anymore…
Trend in the industry with ML and HPC to get more done faster
Metaphor: think of a business jet, similar to a CPU – it gets 15–30 people somewhere very quickly
A bullet train is not quite as fast but can carry thousands of passengers – that's your GPU/FPGA
Parallelisation is key
CPUs are constrained by the number of cores you can put on a die, and the number of ALUs (Arithmetic Logic Units) on a core
GPUs can have 5,000 cores, 8 on a single machine
FPGA can have millions of programmable logic cells
Can do things in parallel that would take an age in a CPU
Segue – we are leading the industry in adoption and acceleration of GPUs
Halliburton use G3s to process large datasets in the oil and gas exploration and production industry.
Segue – if you need just a little GPU
Landmark’s DecisionSpace 365, the industry’s first end-to-end Exploration and Production SaaS solution, is empowering oil and gas companies to make the shift to cloud where they’re experiencing breakthroughs in efficiency across the E&P lifecycle
If you need a little GPU, Elastic GPUs lets you attach a GPU over the network
We manage remoting the GPU calls to the Elastic GPU; you use standard graphics drivers – OpenGL compliant
Take your choice of instance, add your GPU according to your size, and pay only for what you need.
Segue – GPUs are not just about graphics workloads… Click… they are also used for compute – more general purpose use cases
They are also used for natural language processing, ML
Segue – for these workloads we have the GPU powerhouse… click… the P3
Provides 14x performance for ML
2.6x for HPC
1 Petaflop of computational performance
We discussed the various instance types but what about the storage these instances run on
Segue – when we first launched EC2 in 2006 we only offered instance store (ephemeral) storage… Click
Instance store is still available today
The management of backups and persistence is handled elsewhere, as is the case for desktops, laptops, and on-premises servers
Segue – but most of our customers use EBS
Volumes are logical; under the covers it's more than one disk
Can choose how much disk you need.
Variety of volume types: gp2 – General Purpose SSD; io1 – Provisioned IOPS SSD; st1 – Throughput Optimized HDD
Can be snapshotted: the first snapshot copies every block; second and subsequent snapshots copy only changed blocks
Elastic volumes, change the volumes without service interruption
Segue – so that's the storage the instances use; what are the networking considerations?
Mention the Ugur session on VPC that is coming up at 11:30 – VPC – Networking Fundamentals and Connectivity.
Logically isolated part of the AWS Cloud
NAT GW: map a public IP to a private one to allow private instances to access resources on the internet
Flow Logs: look at the traffic flow on your network adaptors and see what traffic was accepted and declined
All accessible from on-premises resources using DX or VPN
Segue – let's recap on the EC2 resources we have covered thus far… click
So to bring together what we have discussed so far
Segue –
So we are back to our guiding graphic; here's where we are: we have talked about AWS resources, now let's look at what underpins the high availability that can be achieved using EC2
I wanted to share with you this image, which shows the Amazon global network that interconnects our regions.
It traverses the Atlantic, Pacific, and Indian Oceans as well as the Mediterranean, Red, and South China seas
It's redundant 100 GbE cable that operates without impact through a link cut
Traffic is not passed from one provider to another across interconnection sites; the network is operated by one company, so we can give a better quality of service.
Segue – Placement Groups
Historically used to place instances close together; we have supported this for some time.
Last year we introduced Spread
Segue – so what's a typical use case for Spread?… click
It's just a parameter on the RunInstances call: strategy = cluster, or strategy = spread
Segue – ELB
Instances across AZs, active/active config, no disruption.
Segue –
Transitions – watch out!
Segue –
Immediate segue – as your fleets get bigger, you no longer have just one or a handful of servers to manage but hundreds, thousands, or tens of thousands, so you need to get clever about how you manage your fleet.
In the beginning you need to specify many parameters to get your instances launched
We launched Launch Templates, which encapsulate all these parameters into an EC2 resource – which parameters are optional, which are editable, etc.
You can say that resources can only be created with certain launch templates
Segue –
Segue –
Pro tip: use CloudWatch to identify issues and mitigate them before they are noticed by your customers
Many customers have capabilities on premises for patching, config drift, etc.
They can now accomplish, in a cloud-native way, the management tasks they have done on premises with a variety of tools
There is no additional cost for using AWS Systems Manager; you pay only for the underlying resources managed and created by AWS Systems Manager
Segue – so that covers a sample of our management tools, click; now let me tell you about the purchase options you have available for EC2
Segway -
Will cover the highlights of how you should think about purchasing EC2 capacity
On-Demand – getting started, you don't know what you need
Reserved – when you know your workload
Spot – unused capacity
Segway -
Feedback we got: historically RIs give you 1) a capacity reservation and 2) a discount
Customers were often only interested in the discount, so we said: if you waive the capacity reservation, we will let your RIs float across AZs and instance sizes
Segway -
How do I know how many RIs to buy?
RI recommendations in Cost Explorer
Segue – we will talk about Spot
Lets you take advantage of excess capacity by purchasing at a discount
Customers typically use Spot for workloads that they wouldn't otherwise pay for – things like the 30th or 40th simulation; how much is that worth to you?
The catch is a 2-minute warning when we need the capacity back during peak times
New – eliminate the bid: predictive pricing gives a 70–90% discount off the price; you no longer need to bid, you will get the instance, and you will know it's at a good price.
New – the Spot acquisition experience has been integrated into the RunInstances API; pass a market parameter to use Spot instances, and if they are available you will get them
New – hibernate Spot instances: we intercept the 2-minute warning and freeze the running state of the instance to disk, as you would by shutting a laptop lid; when the capacity is available again we rehydrate it into your Spot instance. This makes Spot much more accessible to a broader set of use cases
Segue –
Which to use? All of the above.
Segue –
So we have completed the loop: we talked about EC2 in terms of its resources, how we implement high availability, management, and purchase options
Segue – I hope you have learned something and, most important of all, that you enjoyed today's session
I'll be available at the Ask an Architect booth today and tomorrow; come and introduce yourself and bring any questions you have on this or anything else AWS