From imagination to impact
10 Things You Didn’t Know About Cloud Platforms: Azure, GAE and AWS
Dr. Anna Liu, Dr. Hiroshi Wada, Kevin Lee
National ICT Australia
The 10 Things are...
1. How long does it take for data in the cloud to become consistent?
2. Limitations and quotas
3. How unpredictable/variable is the cloud?
4. Distributed transaction support in the cloud
5. Pricing variations over time and space
6. Sticky session support
7. The new matrix of roles and responsibilities for cloud providers, consumers and system integrators
8. Secure connections to the cloud
9. Time to get a new instance
10. Auto-scaling is not all magic
The Reality of Eventual Consistency in Amazon SimpleDB
• The probability of reading updated data in SimpleDB in US West
– An application reads data X (ms) after it has written data
• SimpleDB has two read operations
– Eventually consistent read
– Consistent read
• This pattern is consistent regardless of the time of day
[Figure: probability of reading updated data against time since the write; panels: Eventually Consistent Read, Consistent Read]
Consistent vs. Eventually Consistent Read
• SimpleDB’s consistent read guarantees that updated data is read
• What is the cost you need to pay for consistency?
– RTT is the same as that of an eventually consistent read
– Monetary cost (usage fee) is exactly the same as for an eventually consistent read
→ The trade-off is not clear! We suspect consistent read is less scalable and slower under datacenter failures. However, we have not observed any differences
Other Commercial NoSQL Databases
• Google App Engine
– Offers eventually consistent read and consistent read
– Behavior of eventually consistent read is completely different from Amazon’s
– In GAE, both types of reads behave exactly the same unless a data center fails
• Windows Azure
– Offers no read options
– Always consistent
Limitations and Quotas
Amazon Web Services
• Limitations
– Manual setup of all applications
– Maximum 5 GB per file in S3
– Maximum 5 seconds query execution time in SimpleDB
• Quotas
– 20 On-Demand or Reserved Instances and 100 Spot Instances by default
– 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2
Microsoft Windows Azure
• Limitations
– 2 deployments per service (production and staging)
– .NET, PHP or Java programming language
– Up to 50 GB for a SQL Azure database
• Quotas
– 20 concurrent small compute instances or equivalent per month
– 10 TB of total data transfers per month
Google App Engine
• Limitations
– Java or Python programming language
– Maximum 30 seconds for each request
– 1 MB for each Datastore entity
– Maximum 2 GB per file in Blobstore (each API call can manipulate less than 1 MB)
• Quotas
– 10 web applications per user
– 43,200,000 requests per day
– 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
– 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
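A daily quota is easier to reason about as an average rate. The sketch below converts the App Engine figure of 43,200,000 requests per day listed above into requests per second. The function name is made up for illustration, and the conversion is plain arithmetic rather than anything the provider documents as a rate limit.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(daily_quota: int) -> float:
    """Average request rate that would use up a daily quota in exactly one day."""
    return daily_quota / SECONDS_PER_DAY


# Daily request quota quoted on the slide for Google App Engine.
gae_daily_requests = 43_200_000
print(average_rate_per_second(gae_daily_requests))  # 500.0
```

A sustained load above this average would run out of quota before the day ends, so the daily figure matters even when short bursts are allowed.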
Performance Unpredictability in Cloud
• Performance unpredictability is one of the major obstacles
– Performance variance of a MapReduce job for a 50-node EC2 cluster and a 50-node local cluster
– Examples (time as performance metric)
• Repeatability of results for researchers
• Time-critical tasks for enterprises
Benchmark Details
(Metric: measurement; benchmark tool where one is named)
• Instance Startup: elapsed time from the moment a request for an instance is sent to the moment the requested instance is available
• CPU: a single score from executing various concurrent integer and floating-point calculations; Ubench
• Memory Speed: a single score from executing random memory allocations as well as memory-to-memory copying; Ubench
• Disk I/O: sequential reads/writes and random reads (block I/O); Bonnie++
• Network Bandwidth: bandwidth, delay jitter and datagram loss; Iperf
• S3 Access: uploading a 100 MB file from one unused node of the physical cluster at Saarland University to a newly created bucket on S3
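Benchmark Results in EC2
COV per benchmark, in the order CPU / Memory / Sequential Read / Random Read / Network / S3 Access
• Physical cluster: 0.1% / 0.3% / 0.6% / 1.9% / 0.2% (no S3 Access value shown)
• Small EC2: 21% / 8% / 17% / 9%
• Large EC2: 24% / 10% / 20% / 13%
• Network and S3 Access in EC2 (one value each, shown across both EC2 rows): 19% / 54%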
The COV of the large instance is higher than that of the small instance. However, both are at least an order of magnitude less stable than a physical cluster.
The COV of S3 Access may be influenced by other traffic on the network; this experiment is shown just for completeness.
Reference - Schad, Jörg, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. 2010. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases. Vol. 3. 1. Singapore, Singapore: VLDB Endowment.
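The COV (coefficient of variation) used above is the standard deviation of repeated measurements divided by their mean. The following is a minimal sketch of that calculation. The runtimes are invented for illustration, and the use of the sample standard deviation is an assumption, since the slides do not say which variant the cited study used.

```python
import statistics


def coefficient_of_variation(samples):
    """Return the COV (sample standard deviation divided by the mean) as a fraction."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("COV is undefined when the mean is zero")
    return statistics.stdev(samples) / mean


# Hypothetical runtimes in seconds, invented for illustration only.
steady_runs = [100.0, 100.0, 101.0, 99.0]
noisy_runs = [80.0, 120.0, 100.0, 140.0, 60.0]

print(f"steady: {coefficient_of_variation(steady_runs):.1%}")
print(f"noisy:  {coefficient_of_variation(noisy_runs):.1%}")
```

Because COV is normalised by the mean, it allows benchmarks with different units (scores, MB/s, seconds) to be compared on one scale, which is what the results table relies on.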
Distributed Transactions in Cloud
• There is now a range of cloud database types
• NoSQL (Azure Table, GAE Datastore, Amazon SimpleDB...)
– Much more ‘shardable’ architecture; no joins, no full ACID support
• SQL (Azure SQL, Amazon RDS, Oracle on EC2...)
– Variable distributed transaction support compared to their traditional RDBMS counterparts
• Experience with porting PetShop
• Challenge with porting the data access layer
– Some JDO interfaces are not supported by App Engine, e.g. ‘Join query’
– No distributed transaction support in Azure SQL at the moment
Pricing fluctuates over space and time
• On-demand pricing (hourly, per GB, per 1,000 requests)
• Reserved instances (1- or 3-year term + unit cost)
• Spot pricing (typically cheaper in US-East!)
• Similar pricing schemes observed for GAE and Azure
Sticky Session Support
• Autoscaling alone does not guarantee that clients of the same session will always contact the same instance
• Without that guarantee, clients cannot perform a series of connected operations
• Amazon ELB supports session affinity
– Session affinity allows a mapping to be created at the ELB
– Limitations
• Session affinity cannot handle HTTPS
• Autoscaling down an instance with a live session
• MS Azure advocates stateless sessions
– If you must keep state, store session state in e.g. table storage
• Design issue: should the server remember the conversation context, or should the client remind it every time? How long should it ‘stick’? Too long compromises the server’s ability to distribute load
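The stateless alternative to sticky sessions can be sketched as follows: the handler keeps nothing in instance memory and reloads the conversation context from a shared store on every request. `SessionStore` and `handle_request` are hypothetical names, and the in-memory dictionary only stands in for an external service such as table storage.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external session store (assumption for illustration)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


def handle_request(store, session_id, item):
    """Stateless handler: all conversation context comes from the store."""
    if session_id is None:
        session_id = uuid.uuid4().hex  # new session for a first-time client
    state = store.load(session_id)
    cart = list(state.get("cart", []))  # copy so the stored value is not mutated in place
    cart.append(item)
    state["cart"] = cart
    store.save(session_id, state)
    return session_id, cart


store = SessionStore()
# The two calls could be served by different instances; they share only the store.
sid, _ = handle_request(store, None, "book")
_, cart = handle_request(store, sid, "pen")
print(cart)  # ['book', 'pen']
```

With this shape, scaling down an instance that holds a live session loses nothing, at the price of one store round trip per request.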
Customers’ Responsibility in IaaS Cloud
[Figure: matrix of roles and responsibilities in an IaaS cloud. Labels recovered from the diagram:]
• Customers’ responsibility: Infrastructure Configuration (VPN, VMs, Disk, …); OS/Application Security (e.g., Active Directory); OS/Middleware Installation/Configuration; OS Patching; Application Installation/Configuration; Application Patching; Billing (Cost Center Charging); Antivirus; OS Backup; OS Monitoring; App Data Backup; Application Monitoring
• Amazon EC2 (IaaS providers): Infrastructure Monitoring (CPU, Disk, Net, …); Usage Report and Basic Billing; Access Control to IaaS
Secure Connection to the Cloud
Performance Implications
• Low security option: max throughput is 5.6 MB/sec
• High security option: connection throughput is 4 MB/sec
– Performance hit due to encryption, decryption and firewall
• Other interesting observations:
– VPC is only available in US-East-1 and EU-West-1
– In a single availability zone only
– S3 is not working well with VPC yet (very slow); EBS is a workaround
– MS Azure VPN support is expected next year
– Google Secure Connector
Time to Get a New Instance
• It typically takes minutes to create an instance from its image on EC2
• Trick to “create” instances quicker
– Create a pool of instances in advance, and stop (hibernate) them all
• Pay no instance cost, but pay the storage cost for stopped instances
– Revive stopped instances when new instances are needed
Operating System   Method                    Time
Windows            Create from image         10-15 minutes
Linux              Create from image         5-10 minutes
Windows            Revive stopped instance   30 seconds
Linux              Revive stopped instance   30 seconds
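The pool trick above can be sketched without any cloud calls. `WarmPool` is a hypothetical class that only tracks instance IDs, and the two timing constants are rough upper bounds taken from the Linux rows of the table, so the output is an estimate rather than a measurement.

```python
from collections import deque

# Rough upper-bound timings from the table above (Linux rows).
CREATE_FROM_IMAGE_SECONDS = 10 * 60
REVIVE_STOPPED_SECONDS = 30


class WarmPool:
    """Hypothetical pool of pre-created, stopped instances (IDs only, no cloud API)."""

    def __init__(self, stopped_ids):
        self._stopped = deque(stopped_ids)
        self._created = 0

    def acquire(self):
        """Return (instance_id, estimated_wait_seconds)."""
        if self._stopped:
            # Fast path: revive an instance that was created in advance.
            return self._stopped.popleft(), REVIVE_STOPPED_SECONDS
        # Slow path: the pool is empty, so a new instance is built from its image.
        self._created += 1
        return f"new-{self._created}", CREATE_FROM_IMAGE_SECONDS


pool = WarmPool(["i-stopped-1", "i-stopped-2"])
waits = [pool.acquire()[1] for _ in range(3)]
print(waits)  # [30, 30, 600]
```

The third request falls back to the slow path, which shows why the pool size has to be chosen against the expected burst size and the storage cost of stopped instances.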
Autoscaling is Not All Magic
• Amazon EC2
“… your application can automatically scale itself up and down depending on its
needs.”
• Windows Azure
“Optimized for scale-out applications - designed so that developers can easily build scale-out applications…”
• Google App Engine
“No matter how many users you have or how much data your application stores,
App Engine can scale to meet your needs”
Autoscaling is Not All Magic (contd)
Amazon EC2
• How to scale?
– Load balancing with Elastic Load Balancer (ELB)
– Event processing with Autoscaling API
– Monitoring through CloudWatch
• Limitations
– Load balancer is the bottleneck, hence limited throughput
– Limited load balancing options (e.g., no hardware load balancer)
– Limited rule support (e.g., no conjunctions allowed in rules)
– Limited monitoring support (e.g., limited to minute granularity)
Windows Azure
• How to scale?
– Load balancing with Azure Queue Storage
– Event processing with WF rules engine
– Monitoring through Azure Diagnostics
– Create/delete instances with Management API
• Limitations
– Throughput limited by Azure Queue
– Limited monitoring support (e.g., billing information not monitored)
Google App Engine
• How to scale?
– Built in with App Engine
• Limitations
– No control over how it scales
– Number of simultaneous sessions limited by per-minute (burst) quota (500 requests per sec by default), server request time-out (30 secs), etc.
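To make the rule and monitoring limitations concrete, here is a minimal sketch of a single-metric threshold rule evaluated on one-minute CPU averages. The function name, the 70%/30% thresholds and the instance bounds are assumptions chosen for illustration; this is not any provider’s API.

```python
def desired_instances(current, cpu_per_minute, high=70.0, low=30.0,
                      min_instances=1, max_instances=20):
    """Apply one threshold rule to the latest one-minute CPU average.

    Only a single metric is consulted, mirroring a rule engine that
    cannot combine conditions.
    """
    if not cpu_per_minute:
        return current  # no data yet, leave the fleet unchanged
    latest = cpu_per_minute[-1]
    if latest > high:
        target = current + 1
    elif latest < low:
        target = current - 1
    else:
        target = current
    # Clamp to the allowed fleet size (e.g. an account instance quota).
    return max(min_instances, min(max_instances, target))


print(desired_instances(2, [50.0, 85.0]))  # 3
print(desired_instances(3, [20.0]))        # 2
```

With minute-granularity samples, a decision can lag a load spike by up to a minute, and the new instance still needs its start-up time before it takes traffic.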
Getting Involved
• Linkage with National ICT Australia
– Contract Research, Expert Advisory Services, Architecture Reviews
– Public and In-house Training Courses
– Market Surveys, Case Studies
– Professional in Research Residence
Anna.Liu@nicta.com.au, @annaliu
http://blogs.unsw.edu.au/annaliu/
Virtual Machine ‘Stolen Time’
• Using traditional system resource monitoring tools in the cloud
– Measuring system performance within a virtual instance (using tools such as vmstat and top) can give misleading information
– Example: an EC2 instance (e.g. m1.small with 1 EC2 compute unit) does not go above around 40% CPU load as observed from vmstat
• A certain percentage (around 50-60%) appears in vmstat as ‘st’
“st – Time stolen from a virtual machine” (from the vmstat manpage)
• Does it mean I am not getting what I paid for? No, not really
– Amazon instances are measured in EC2 compute units
– “One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor”
• Monitoring system performance in the cloud
– Use cloud monitoring tools such as CloudWatch and RightScale
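The ‘st’ figure can also be derived from two snapshots of the aggregate `cpu` line in `/proc/stat` on Linux, where steal is the eighth counter. This is a sketch under that assumption (kernels that report the steal field). The sample lines are invented, and `steal_percent` is a made-up helper name.

```python
def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen between two aggregate 'cpu' lines of /proc/stat."""

    def fields(line):
        parts = line.split()
        if parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Counter order: user nice system idle iowait irq softirq steal
        return [int(value) for value in parts[1:9]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return 100.0 * deltas[7] / total


# Invented counter values for illustration only.
before = "cpu 1000 0 500 8000 100 0 0 400 0 0"
after = "cpu 1400 0 600 8000 100 0 0 900 0 0"
print(steal_percent(before, after))  # 50.0
```

In this made-up interval the guest was busy for the other half of the time, so a tool that reports only user and system time would show roughly 50% load on a fully used instance.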
Limitation of Virtual Private Cloud (VPC)
• VPC hosts are logically detached from (but physically attached to) the Amazon network
– No direct connection to and from S3 via the Amazon local network
– Connection via the internet only
• What happens if we need to transfer data from S3 to a VPC host?
– E.g. if we ship removable media to Amazon, it is uploaded to S3. How do we transfer the data to a VPC host?
– Option 1: Direct transfer from S3 to the VPC host
• Traffic routes through the remote side and comes back (high latency)
– Option 2: Transfer to EBS and mount the EBS volume on the VPC host
• Traffic routes through the local network (low latency)
How Long Do You Need to Wait to Read Updated Data with Eventually Consistent Read?
• Result of the “5 minutes run” for one week
• t1: the first time updated data is read
• t2: the first time 100% of reads return updated data
• t3: the last time stale data is read
→ Mostly updated after 600 ms, but no guarantee
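The three milestones can be computed from repeated write-then-read trials. In the sketch below, the observation format (delay in ms mapped to a list of fresh/stale outcomes) and the exact reading of t1, t2 and t3 are assumptions made for illustration, and the data is invented rather than taken from the measurements.

```python
def consistency_milestones(observations):
    """Return (t1, t2, t3) in ms from {delay_ms: [True if the read was up to date, ...]}.

    Every delay is assumed to have at least one observation.
    """
    delays = sorted(observations)
    fraction = {d: sum(observations[d]) / len(observations[d]) for d in delays}
    t1 = next((d for d in delays if fraction[d] > 0), None)      # first fresh read
    t2 = next((d for d in delays if fraction[d] == 1.0), None)   # first delay with 100% fresh
    t3 = max((d for d in delays if fraction[d] < 1.0), default=None)  # last stale read
    return t1, t2, t3


# Invented outcomes of four trials per delay, for illustration only.
observations = {
    0: [False, False, False, False],
    200: [True, False, False, False],
    400: [True, True, True, True],
    600: [True, True, False, True],
    800: [True, True, True, True],
}
print(consistency_milestones(observations))  # (200, 400, 600)
```

In this made-up data t3 is later than t2: a stale read still shows up after the first delay at which every read was fresh, which is the “no guarantee” case.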
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

10 Things you didn't know about Cloud Platforms: AWS, GAE, Azure

  • 2. 10 Things You Didn’t Know About Cloud Platforms: Azure, GAE and AWS. Dr. Anna Liu, Dr. Hiroshi Wada, Kevin Lee. National ICT Australia
  • 3. The 10 Things are... 1. How long does it take for data in cloud to become consistent 2. Limitation and quotas 3. How unpredictable/variable is the cloud? 4. Distributed transaction support in Cloud 5. Pricing variations over time and space 6. Sticky session support 7. The new matrix of roles and responsibilities for cloud providers, consumers and system integrators 8. Secure connections to the cloud 9. Time to getting a new instance 10. Auto-scaling is not all magic
  • 5. The Reality of Eventual Consistency in Amazon SimpleDB • The probability of reading updated data in SimpleDB in US West – An application reads data X ms after it has written the data • SimpleDB has two read operations – Eventual Consistent Read – Consistent Read • This pattern is consistent regardless of the time of day [Figure: two panels, Eventual Consistent Read and Consistent Read]
  • 6. Consistent vs. Eventual Consistent Read • SimpleDB’s consistent read guarantees that updated data is read • What is the cost you need to pay for consistency? – RTT is the same as that of eventual consistent read – Monetary cost (usage fee) is exactly the same as eventual consistent read → The trade-off is not clear! We suspect consistent read is less scalable and slower under datacenter failures. However, we have not observed any differences
  • 7. Other Commercial NoSQL Databases • Google App Engine – Offers eventual consistent read and consistent read – Behavior of eventual consistent read is completely different from Amazon’s – In GAE, both types of reads behave exactly the same unless a data center fails • Windows Azure – Offers no options for read – Always consistent
  • 9. Limitations and Quotas
      Amazon Web Services
        Limitations:
          – Manually set up all applications
          – Maximum 5 GB per file in S3
          – Maximum 5 seconds query execution time in SimpleDB
        Quotas:
          – 20 On-Demand or Reserved Instances and 100 Spot Instances by default
          – 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2
      Microsoft Windows Azure
        Limitations:
          – 2 deployments per service (production and staging)
          – .NET, PHP or Java programming language
          – Up to 50 GB for a SQL Azure database
        Quotas:
          – 20 concurrent small compute instances or equivalent per month
          – 10 TB of total data transfers per month
      Google App Engine
        Limitations:
          – Java or Python programming language
          – Maximum 30 seconds for each request
          – 1 MB for each Datastore entity
          – Maximum 2 GB per file in Blobstore (a single API call can manipulate <1 MB)
        Quotas:
          – 10 web applications per user
          – 43,200,000 requests per day
          – 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
          – 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
  • 11. Performance Unpredictability in Cloud • Performance unpredictability is one of the major obstacles – Performance variance of a MapReduce job for a 50-node EC2 cluster and a 50-node local cluster – Examples (time as performance metric) • Repeatability of results for researchers • Time critical tasks for enterprises
  • 12. Benchmark Details (metric: measurement; benchmark tool where one is listed)
      – Instance Startup: elapsed time from the moment a request for an instance is sent to the moment the requested instance is available
      – CPU: a single score obtained by executing various concurrent integer and floating point calculations; Ubench
      – Memory Speed: a single score obtained by executing random memory allocations as well as memory-to-memory copying; Ubench
      – Disk I/O: sequential reads/writes and random reads, block I/O; Bonnie++
      – Network Bandwidth: bandwidth, delay jitter and datagram loss; Iperf
      – S3 Access: uploading a 100 MB file from one unused node of the physical cluster at Saarland University to a newly created bucket on S3
  • 13. Benchmark Results in EC2 (COV = coefficient of variation)
      – COV in Physical Cluster: CPU 0.1%, Memory 0.3%, Sequential Read 0.6%, Random Read 1.9%, Network 0.2%
      – COV in Small EC2: CPU 21%, Memory 8%, Sequential Read 17%, Random Read 9%, Network 19%, S3 Access 54%
      – COV in Large EC2: CPU 24%, Memory 10%, Sequential Read 20%, Random Read 13%
      The COV of the large instance is higher than that of the small instance. However, both are at least an order of magnitude less stable than the physical cluster. The COV of S3 Access may be influenced by other traffic on the network; this experiment is shown just for completeness.
      Reference: Schad, Jörg, Jens Dittrich, and Jorge-Arnulfo Quiané-Ruiz. 2010. Runtime Measurements in the Cloud: Observing, Analyzing, and Reducing Variance. In Proceedings of the 36th International Conference on Very Large Data Bases, Vol. 3, No. 1. Singapore: VLDB Endowment.
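The COV figures on this slide are coefficients of variation, that is, the standard deviation of repeated measurements divided by their mean. A minimal sketch of that calculation follows. The sample scores are invented purely for illustration, and the use of the population standard deviation is an assumption; the slide does not say which estimator the cited study used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Return the COV (standard deviation divided by mean) of a list of measurements."""
    average = mean(samples)
    if average == 0:
        raise ValueError("COV is undefined when the mean is zero")
    # Assumption: population standard deviation; a sample estimator would use stdev().
    return pstdev(samples) / average


# Hypothetical benchmark scores, invented for illustration only.
steady_scores = [100.0, 100.2, 99.8, 100.1, 99.9]
noisy_scores = [100.0, 130.0, 75.0, 110.0, 85.0]

print(f"steady: {coefficient_of_variation(steady_scores):.1%}")
print(f"noisy:  {coefficient_of_variation(noisy_scores):.1%}")
```

Because the COV is normalised by the mean, it allows metrics with different units (CPU scores, MB/s, seconds) to be compared on one scale, which is presumably why the slide reports it for every benchmark.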
  • 15. Distributed Transactions in Cloud • There is now a range of Cloud Database types • NoSQL (Azure Table, GAE Datastore, Amazon SimpleDB...) – Much more ‘shardable’ architecture; no joins, no full ACID support • SQL (Azure SQL, Amazon RDS, Oracle on EC2...) – Variable distributed transactional support compared to their traditional RDBMS counterparts • Experience with porting PetShop • Challenge with porting the data access layer – Some JDO interfaces are not supported by App Engine, e.g. ‘join query’ – No distributed transaction support in Azure SQL at the moment
  • 17. Pricing fluctuates over space and time • On demand pricing (hourly, per GB, per ‘000 requests) • Reserved instances (1 or 3 year term + unit cost) • Spot pricing (typically cheaper in US-East!) • Similar pricing schemes observed for GAE and Azure
  • 19. Sticky Session Support • Autoscaling alone does not guarantee that clients of the same session will always contact the same instance • Without that guarantee, clients cannot perform a series of connected operations • Amazon ELB supports Session Affinity – Session affinity allows the mapping to be created at the ELB – Limitations • Session affinity cannot handle HTTPS • Autoscaling down an instance with a live session • MS Azure advocates stateless sessions – If you must keep state, store session state in, e.g., table storage • Design issue: should the server remember the conversation context, or should the client remind it every time? How long should a session ‘stick’? Too long compromises the server's ability to distribute load
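The stateless alternative mentioned on this slide can be sketched as follows: every request loads the session state from a shared store and writes it back, so it does not matter which instance the load balancer picks. This is only an illustration. `SessionStore` is a hypothetical in-memory stand-in for an external service such as table storage, and `handle_add_to_cart` is an invented request handler, not part of any provider's API.

```python
class SessionStore:
    """Hypothetical in-memory stand-in for an external store such as table storage."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so handlers cannot mutate stored state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)


def handle_add_to_cart(store, session_id, item):
    """Stateless handler: all conversation context comes from the shared store."""
    state = store.load(session_id)
    cart = state.get("cart", []) + [item]
    state["cart"] = cart
    store.save(session_id, state)
    return cart


shared_store = SessionStore()
# Two calls for the same session could be served by different instances.
handle_add_to_cart(shared_store, "session-1", "book")
print(handle_add_to_cart(shared_store, "session-1", "pen"))
```

The trade-off is an extra store round trip on every request, in exchange for instances that can be added or removed without losing live sessions.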
  • 21. Customers’ Responsibility in IaaS Cloud [diagram]
      Amazon EC2 (IaaS providers):
        – Infrastructure Monitoring (CPU, Disk, Net, …)
        – Usage Report and Basic Billing
      Customers’ Responsibility:
        – Access Control to IaaS
        – Infrastructure Configuration (VPN, VMs, Disk, …)
        – OS/Middleware Installation/Configuration
        – OS Patching
        – Antivirus
        – OS Backup
        – OS Monitoring
        – Application Installation/Configuration
        – Application Patching
        – App Data Backup
        – Application Monitoring
        – OS/Application Security (e.g., Active Directory)
        – Billing (Cost Center Charging)
  • 23. Secure Connection to the Cloud
  • 24. Performance Implications • Low Security Option – max throughput 5.6MB/sec • High Security Option - connection throughput is 4MB/sec – Performance hit due to encryption, decryption and firewall • Other interesting observations: – VPC only available US East-1 and EU-west1 – in single availability zone only – S3 not working well with VPC yet (very slow), EBS is a workaround – MS Azure VPN support next year – Google Secure Connector
  • 26. Time to Getting a New Instance • It typically takes minutes to create an instance from its image on EC2 • Trick to “create” instances quicker – Create a pool of instances in advance, and stop (hibernate) them all • Pay no instance cost, but storage cost still applies (for stopped instances) – Revive stopped instances when new instances are needed
      Operating System, Method: Time
        – Windows, create from image: 10-15 minutes
        – Linux, create from image: 5-10 minutes
        – Windows, revive stopped instance: 30 seconds
        – Linux, revive stopped instance: 30 seconds
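The pooling trick on this slide can be sketched as a small class: hand out a pre-created, stopped instance when one is available, and fall back to creating from the image only when the pool is empty. This is a sketch under assumptions. The `revive` and `create` callables are hypothetical placeholders for whatever start and launch calls the provider's SDK offers, and the instance IDs are made up.

```python
class StoppedInstancePool:
    """Pool of pre-created, stopped instances that can be revived quickly."""

    def __init__(self, stopped_ids, revive, create):
        self._stopped = list(stopped_ids)
        self._revive = revive    # hypothetical: starts a stopped instance by ID
        self._create = create    # hypothetical: launches a new instance, returns its ID

    def acquire(self):
        """Return a running instance ID, preferring the fast revive path."""
        if self._stopped:
            instance_id = self._stopped.pop()
            self._revive(instance_id)   # roughly 30 seconds per the slide
            return instance_id
        return self._create()           # minutes per the slide

    def available(self):
        return len(self._stopped)


# Illustration with fake callables instead of real cloud API calls.
revived_ids = []
pool = StoppedInstancePool(["i-aaa", "i-bbb"], revive=revived_ids.append, create=lambda: "i-new")
print(pool.acquire(), pool.acquire(), pool.acquire())
```

The pool size is a cost decision: every stopped instance still incurs storage charges, so the pool only needs to cover the burst that must be absorbed before freshly created instances come online.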
  • 28. Autoscaling is Not All Magic • Amazon EC2 “… your application can automatically scale itself up and down depending on its needs.” • Windows Azure “Optimized for scale-out applications-designed so that developers can easily build scale-out applications…” • Google App Engine “No matter how many users you have or how much data your application stores, App Engine can scale to meet your needs”
  • 29. Autoscaling is Not All Magic (contd)
      Amazon EC2
        How to scale:
          – Load balancing with Elastic Load Balancer (ELB)
          – Event processing with Autoscaling API
          – Monitoring through CloudWatch
        Limitations:
          – Load balancer is the bottleneck, hence limited throughput
          – Limited load balancing options (e.g., no hardware load balancer)
          – Limited rule support (e.g. no conjunctions allowed in rules)
          – Limited monitoring support (e.g. limited to minute granularity)
      Windows Azure
        How to scale:
          – Load balancing with Azure Queue Storage
          – Event processing with WF rules engine
          – Monitoring through Azure Diagnostics
          – Create/Delete instances with Management API
        Limitations:
          – Throughput limited by Azure Queue
          – Limited monitoring support (e.g. billing information not monitored)
      Google App Engine
        How to scale:
          – Built-in with App Engine
        Limitations:
          – No control over how it scales
          – Number of simultaneous sessions limited by per-minute (burst) quota (500 requests per sec by default), server request time-out (30 secs), etc.
  • 31. Getting Involved • Linkage with National ICT Australia •Contract Research, Expert Advisory Services, Architecture Reviews •Public and In-house Training Courses •Market Surveys, Case Studies •Professional in Research Residence Anna.Liu@nicta.com.au, @annaliu http://blogs.unsw.edu.au/annaliu/
  • 33. Virtual Machine ‘Stolen Time’ • Using traditional system resource monitoring tools in cloud – Measuring system performance within a virtual instance (using tools such as vmstat and top) can give misleading information – Example: An EC2 instance (e.g. m1.small with 1 EC2 compute unit) does not go above around 40% CPU load as observed from vmstat • A certain percentage (around 50-60%) appears in vmstat as ‘st’: “st – Time stolen from a virtual machine” (from the vmstat manpage) • Does it mean I am not getting what I paid for? No, not really – Amazon instances are measured in EC2 compute units – “One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor” • Monitoring system performance in cloud – Use Cloud monitoring tools such as CloudWatch and RightScale
  • 34. Limitation of Virtual Private Cloud (VPC) • VPC hosts are logically detached from (but physically attached to) the Amazon network – No direct connection to and from S3 via the Amazon local network – Connection via internet only • What happens if we need to transfer data from S3 to a VPC host? – E.g. if we ship removable media to Amazon, the data is uploaded to S3. How do we transfer it to a VPC host? – Option 1: Direct transfer from S3 to VPC host • Traffic routes through the remote side and comes back (high latency) – Option 2: Transfer to EBS and mount EBS on the VPC host • Traffic routes through the local network (low latency)
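The ‘st’ column that vmstat reports is derived from the steal counter on the aggregate `cpu` line of `/proc/stat` on Linux. A minimal sketch of computing the steal percentage between two snapshots of that line follows. The two sample lines are invented for illustration, and the function names are hypothetical; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for `/proc/stat`.

```python
def parse_cpu_line(line):
    """Return the first eight counters of an aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu" or len(fields) < 9:
        raise ValueError("expected a 'cpu' line with at least 8 counters")
    # Order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def steal_percent(before, after):
    """Percentage of CPU time stolen by the hypervisor between two snapshots."""
    deltas = [new - old for old, new in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total > 0 else 0.0


# Invented snapshots for illustration; on a real host, read the first line
# of /proc/stat twice with a short sleep in between.
snapshot_before = "cpu  100 0 50 800 10 0 0 40 0 0"
snapshot_after = "cpu  140 0 60 840 10 0 0 100 0 0"
print(f"steal: {steal_percent(snapshot_before, snapshot_after):.1f}%")
```

A high steal value on a small instance is consistent with the slide's point: the hypervisor caps the guest at its purchased compute units, so in-guest utilisation figures understate how much of the entitlement is actually in use.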
  • 35. How Long Do You Need to Wait to Get Updated Data with Eventual Consistent Read? • Result of the “5 minutes run” for one week • t1: the first time updated data is read • t2: the first time 100% of reads return updated data • t3: the last time stale data is read → Mostly updated after 600 ms, but there is no guarantee
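The three markers on this slide can be derived from raw probe results. The sketch below assumes each probe is recorded as a pair of (delay in ms after the write, whether the read returned the updated value), and it reads t2 as the smallest delay at which every read was fresh. Both the data layout and that reading of t2 are assumptions, and the sample data is invented.

```python
from collections import defaultdict


def consistency_markers(samples):
    """Compute (t1, t2, t3) from (delay_ms, is_fresh) probe results.

    t1: smallest delay at which any read returned updated data
    t2: smallest delay at which all reads returned updated data
    t3: largest delay at which some read still returned stale data
    """
    reads_by_delay = defaultdict(list)
    for delay_ms, is_fresh in samples:
        reads_by_delay[delay_ms].append(is_fresh)
    delays = sorted(reads_by_delay)
    t1 = next((d for d in delays if any(reads_by_delay[d])), None)
    t2 = next((d for d in delays if all(reads_by_delay[d])), None)
    t3 = max((d for d in delays if not all(reads_by_delay[d])), default=None)
    return t1, t2, t3


# Invented probe results: two reads at each delay.
probe_results = [
    (0, False), (0, False),
    (100, True), (100, False),
    (200, True), (200, True),
    (300, True), (300, False),
    (400, True), (400, True),
]
print(consistency_markers(probe_results))
```

In the invented data t3 is later than t2, which mirrors the slide's conclusion: reads can look fully consistent at one delay and still return stale data at a longer one.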

Editor's notes

  1. Presentation Abstract: Everyone knows about the eventual consistency properties of the cloud, but do you know how long it will take for a piece of data to become consistent/fresh? Despite the aim of providing infinite scalability, are there any hard limits on some of the leading cloud platform services? We know cloud platforms aim to provide auto-scaling, but is it really all magic? We at the University of NSW and National ICT Australia (NICTA) have been evaluating Cloud platforms over the last 18 months. In this session, we will share with the audience some of these (often surprising) evaluation findings, which should be of interest to application architects and developers looking at designing and building solutions using the cloud.
  2. Reduce cost, reduce complexity
  5. Quotas are resource constraints configured by the vendors. You can probably contact the vendor for resources beyond the quota, but that communication takes time and brings an opportunity cost. Limitations are mostly functional restrictions; you probably cannot get past them by making a phone call.
     Amazon Web Services
       – Manually set up all applications – large maintenance and operation costs, including upgrading systems, installing applications and configuration.
       – Maximum 5 GB per file in S3 – e.g. terabyte-sized files cannot be put into S3 directly. Extra effort is needed: the file has to be divided into small chunks (5 GB each) before storing. The same effort is required during retrieval: all retrieved chunks have to be merged manually.
       – Maximum 5 seconds query execution time in SimpleDB – no long-running queries in SimpleDB. If thousands of items are queried in SimpleDB, the query could fail due to a timeout. Developers need to estimate the query time beforehand, split a large query into small queries, and combine/merge the query results on the client side.
       – 20 On-Demand or Reserved Instances and 100 Spot Instances by default – You can have more instances by contacting Amazon, but that will definitely increase your opportunity cost if you need to scale out immediately.
       – 1 GB free outgoing bandwidth per month in SimpleDB, S3 and EC2 – Yes, you need to pay for extra usage.
     Microsoft Windows Azure
       – 2 deployments per service (production and staging) – The two deployments are used for the production version and the staging version, targeting end users and test users respectively. This is not efficient enough for running multiple test versions at the same time.
       – .NET, PHP or Java programming language – limited to .NET, PHP and Java developers.
       – Up to 50 GB for SQL Azure – The maximum size of a single SQL Azure database is 50 GB. If your data is more than 50 GB, you probably have to consider data partitioning to scale out your database across multiple databases.
       – 20 concurrent small compute instances or equivalent per month – 1 clock hour of an extra large instance equates to 8 small instance hours. Therefore, you can only have ...
       – 10 TB of total data transfers per month – You can probably get more if you send a request to Microsoft.
       – Up to 750 GB SQL Azure databases per month – For SQL Azure, the offer originally states 150 Web Edition databases (not sure whether it is or/and, see http://www.microsoft.com/windowsazure/offers/popup/popup.aspx?lang=en&locale=en-us&offer=MS-AZR-0013P) 15 Business Edition databases. Since the maximum size of each Web Edition database is 5 GB and the maximum size of each Business Edition database is 50 GB, simple math (150*5 or 15*50) gives 750 GB.
     Google App Engine
       – Java or Python programming language – PHP developers can do nothing on Google App Engine.
       – Maximum 30 seconds for each request – Each request has to be answered within 30 seconds; otherwise, an exception is returned instead of a result. Computationally heavy tasks are therefore not suitable for GAE. The alternative is again to split the task. GAE has made an early experimental release of MapReduce for this purpose, but only the Mapper is implemented at this stage.
       – 1 MB for each Datastore entity – Only 1 MB for each data item. You will probably find it hard to store a photo in GAE. Also, due to the 30 seconds limitation, your query has to be processed within 30 seconds.
       – Maximum 2 GB per file in Blobstore – The same reason as AWS. In addition, the maximum size of Blobstore data that can be read by the app with one API call is only 1 MB, so even if you store 2 GB in Blobstore, it is still difficult to manipulate the data in GAE.
       – 10 web applications per user – since the case of bush fire in 2009. I think all the following parameters can be adjusted by Google.
       – 43,200,000 requests per day
       – 1 GB (1,046 GB maximum if billing enabled) incoming/outgoing bandwidth per day
       – 6.5 CPU-hours (1,729 CPU-hours maximum if billing enabled) per day
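The chunking that this note describes for the 5 GB per-file limit in S3 comes down to computing byte ranges. The sketch below shows only that arithmetic; it does not upload anything. The function name is hypothetical and the binary interpretation of "GB" (1024^3 bytes) is an assumption.

```python
GIB = 1024 ** 3  # assumption: "GB" in the note is read as 2**30 bytes


def split_ranges(total_bytes, max_part_bytes=5 * GIB):
    """Return half-open (start, end) byte ranges, each at most max_part_bytes long."""
    if total_bytes < 0 or max_part_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and max_part_bytes must be > 0")
    return [
        (start, min(start + max_part_bytes, total_bytes))
        for start in range(0, total_bytes, max_part_bytes)
    ]


# A 1 TiB file needs 205 parts of at most 5 GiB each; the last part is shorter.
parts = split_ranges(1024 * GIB)
print(len(parts), parts[-1][1] - parts[-1][0])
```

On retrieval the parts are concatenated in range order, which is the manual merge step the note refers to.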
  7. Reference – Saarland paper at VLDB
  12. IaaS provides:
       – Basic infrastructure monitoring: such as CPU/RAM/disk/network usage. Need to convert and feed the data into a dashboard system in AMP.
       – Usage report and basic billing: usually one account = one bill.
      Customers’ responsibility:
       – Access control to IaaS: password and secret key management. Change passwords and keys regularly to tighten security. Access log of the IaaS console (not available in EC2).
       – Infrastructure configuration: establish VPN. Choose appropriate machine images, or upload machine images. Add disks to virtual machines.
       – OS/Middleware installation/configuration: depends on machine images. Pre-configured machine images reduce the workload.
       – OS patching: needs to be performed by customers.
       – Antivirus: needs to be installed by customers if not included in a machine image.
       – OS backup: IaaS usually allows taking snapshots of virtual machines.
       – OS monitoring: if the monitoring facility provided by the IaaS is not enough, you need to run your own. Feed the data into a dashboard system in AMP.
       – Application installation/configuration: as usual.
       – Application patching:
       – App data backup: take snapshots using the IaaS’s functionality, or do it yourself, e.g. by running rsync.
       – Application monitoring: feed the data into a dashboard system in AMP.
       – OS/application security: such as access control by Active Directory.
       – Billing: need to translate the IaaS’s bill into a cost center-based bill.
  14. Figure 1 shows a typical setup of the Amazon VPC. This setup connects a company's infrastructure to the Amazon EC2 infrastructure via a VPN connection. It requires two VPN gateways, one on the local side and one on the remote side, and a secure VPN connection is established between them using the IPsec protocol. EC2 instances on the remote (Amazon) side operate within subnets behind the remote VPN gateway. That is, these EC2 instances are isolated from the rest of the EC2 network, and only these instances can access the hosts on the local side. Similarly, hosts can be added on the local side behind the customer gateway (local VPN gateway), and only these hosts have access to the remote EC2 instances.
      A typical VPC connection meets the following security requirements:
      • Utilise the AES 128-bit encryption function
      • Utilise the SHA-1 hashing function
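The access rule described in this note can be modelled in a few lines. This is only a sketch of the rule as stated, not of how Amazon VPC enforces it; the address ranges and the function name `allowed_over_vpn` are hypothetical and chosen purely for illustration.

```python
import ipaddress

# Hypothetical address ranges (assumptions, not taken from the slides).
VPC_SUBNETS = [ipaddress.ip_network("10.0.1.0/24")]        # EC2 instances behind the remote VPN gateway
LOCAL_SUBNETS = [ipaddress.ip_network("192.168.10.0/24")]  # hosts behind the customer gateway


def _in_any(address, networks):
    """True if the address belongs to one of the given networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


def allowed_over_vpn(source, destination):
    """True only when one endpoint is in a VPC subnet and the other is behind the customer gateway."""
    vpc_to_local = _in_any(source, VPC_SUBNETS) and _in_any(destination, LOCAL_SUBNETS)
    local_to_vpc = _in_any(source, LOCAL_SUBNETS) and _in_any(destination, VPC_SUBNETS)
    return vpc_to_local or local_to_vpc


if __name__ == "__main__":
    print(allowed_over_vpn("10.0.1.5", "192.168.10.7"))   # VPC instance to local host
    print(allowed_over_vpn("54.1.2.3", "192.168.10.7"))   # address outside the VPC subnets
```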
  15. • An example business report query took 16 min 30 sec; it takes less than 1 min in the existing on-premise dev environment
      • Data transfer over SSIS takes 14 min (only 42 KB/sec of throughput)
      • No bottleneck observed on CPU (3-10%), memory (6 GB free), disk (low activity) or network (0.03% usage of 1 Gbps)
       SSIS protocol?
      -----------------
      Done. It works! I did the following:
      1. Start an EC2 micro instance outside the VPC and attach an EBS volume to it
      2. Copy the file from S3 to the EBS volume attached to the micro instance
      3. Detach the EBS volume from the micro instance
      4. Attach the EBS volume to an instance inside the VPC
      Note that we did NOT route through NICTA here at all. The file I used for this experiment is ~700 MB in size. Step 2 took 130 s (i.e. 5.39 MB/s).
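The throughput figures in this note can be reproduced with simple arithmetic. The sketch below uses only numbers from the note; the function names are hypothetical, the file size is the approximate "~700 MB" given there, and the SSIS volume assumes the 42 KB/sec rate held for the full 14 minutes.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average throughput of a transfer in MB/s."""
    return size_mb / seconds


def transferred_kb(rate_kb_per_s, minutes):
    """Data moved at a constant rate over the given number of minutes, in KB."""
    return rate_kb_per_s * minutes * 60


if __name__ == "__main__":
    # S3 -> EBS copy (step 2): ~700 MB in 130 s
    print(f"S3 to EBS: {throughput_mb_per_s(700, 130):.2f} MB/s")
    # SSIS transfer: 42 KB/s for 14 min (assumes a constant rate)
    print(f"SSIS volume: {transferred_kb(42, 14)} KB")
```

With exactly 700 MB the first figure comes out slightly below the 5.39 MB/s quoted in the note, which is consistent with the file size being approximate.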
  18. References: http://aws.amazon.com/ec2/ http://code.google.com/appengine/whyappengine.html#scale http://www.microsoft.com/windowsazure/appliance/
  19. An article (with link to his paper) by Huan Liu discussing limitations of load balancers and autoscaling: http://huanliu.wordpress.com/tag/auto-scaling/ http://codecrafter.wordpress.com/2008/10/03/google-app-engine-scalability-that-doesnt-just-work/ An example on scaling in Azure: http://code.msdn.microsoft.com/azurescale/Release/ProjectReleases.aspx?ReleaseId=4167