Solving the problem of
downtime in the cloud
AWS Cloud Disaster Recovery Plan Checklist
Are You Ready?
About CloudEndure
• Founded: 2012
• Offers Disaster Recovery as a Service for cloud-based applications
• Uses continuous replication of your entire application stack
Source: Forrester
Some of Our Customers
Agenda
• DR 101 – Definitions and Terminology
• Why AWS for DR?
• AWS Global Infrastructure
• 4 Types of Disaster
• 3 Takeaways
• Q&A
Disaster Recovery in 30 Words
Disaster recovery (DR) is the set of processes, policies and procedures for preparing the recovery or continuation of technology infrastructure that is vital to an organization after a natural or human-induced crisis.
DR Key Terminology
• RPO (Recovery Point Objective): the maximum tolerable period in which data might be lost.
• RTO (Recovery Time Objective): the duration of time, and the service level, within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences.
• Data replication: sharing information so as to ensure consistency between redundant resources.
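To make the two objectives concrete, here is a minimal Python sketch of how an incident might be checked against them. The function name, the timestamps and the objective values are hypothetical and are not taken from the presentation.

```python
from datetime import datetime, timedelta


def meets_objectives(last_replica, failure, restored, rpo, rto):
    """Return (rpo_met, rto_met) for one incident.

    Data written between the last replicated point and the failure is lost,
    so that gap is compared with the RPO. The time from the failure until
    the service is restored is compared with the RTO.
    """
    data_loss_window = failure - last_replica
    downtime = restored - failure
    return data_loss_window <= rpo, downtime <= rto


# Hypothetical incident: values chosen only for illustration.
last_replica = datetime(2014, 6, 1, 11, 55)
failure = datetime(2014, 6, 1, 12, 0)
restored = datetime(2014, 6, 1, 12, 45)

rpo_met, rto_met = meets_objectives(
    last_replica, failure, restored,
    rpo=timedelta(minutes=15),
    rto=timedelta(minutes=30),
)
print(rpo_met, rto_met)  # True False
```

In this made-up incident only 5 minutes of data are lost, which is within the 15-minute RPO, but the 45-minute outage exceeds the 30-minute RTO.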
DR – What it’s not
• Unlike backup, which is mostly about preventing data loss, DR is about service availability: low RPO and low RTO.
• DR complements other high-availability activities. Those deal with disaster prevention; DR is for the times when prevention has failed.
Why DR?
• 54% of cloud IT managers experienced an outage in the past 3 months
• Top challenges in meeting availability goals: insufficient IT resources, budget limitations, software bugs
• 79% report a service availability goal of “three nines” (99.9%)
Source: 2014 Cloud Disaster Recovery Survey
Available for download in the “Resources” tab of the webinar
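To put the “three nines” goal in perspective, the short Python sketch below converts an availability percentage into a yearly downtime budget. The function name is illustrative, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in the period at the given availability."""
    return period_hours * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

At 99.9% the budget is about 8.76 hours of downtime per year, so a single long outage can use up the whole allowance.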
Why AWS for DR
Flexible: Define different recovery objectives for different components and change them on the fly. You can grow and shrink your disaster site whenever necessary (even automatically).
Cheap: Pay for hourly usage of resources. Only create your disaster site when it’s needed. Don’t pay for two running sites all the time.
Easy: DR and HA made easier. No need to build your DR solution from scratch. AWS already has many of the building blocks built in: Auto Scaling, snapshots, CloudFormation…
AWS Global Infrastructure (diagram; legend: AWS Region, Availability Zone)
AWS Global Infrastructure
• Regions
  ◦ 8 publicly available regions.
  ◦ Spread all over the world.
  ◦ Completely independent: different teams, different infrastructure.
• Availability Zones (AZs)
  ◦ Each region contains one or more availability zones.
  ◦ Physically separated, but in the same geographical location.
  ◦ Share teams and software infrastructure.
• Dynamic Resource Allocation
  ◦ Pay for resources on an hourly basis.
  ◦ Create and destroy resources quickly on demand using the AWS dashboard, CLI or API.
  ◦ Automation is built into several services (such as Auto Scaling). APIs let you add further automation layers.
Types of downtime
• Single-resource disaster
• Single-AZ disaster
• Single-service disaster
• Whole-region disaster
Disaster Type 1 - Single-resource disaster
What is it?
• A single resource (instance, EBS volume, ELB…) stops functioning.
Frequency
• Very high. For example, instances are sometimes terminated by AWS or just stop working without warning.
How to prepare?
• Make sure that no single resource is a point of failure. Use clusters for stateless instances (Auto Scaling and AMIs can help). Configure RAID for volumes. Use services that are managed by AWS, such as RDS, to store your state and data.
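The Python sketch below gives a rough sense of why a cluster removes the single point of failure. It assumes instance failures are independent, which is a simplification, and the 99% per-instance availability is a made-up figure rather than a number from the presentation.

```python
def cluster_availability(instance_availability, n):
    """Probability that at least one of n independent instances is up."""
    return 1 - (1 - instance_availability) ** n


# Hypothetical per-instance availability of 99%.
for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.99, n):.6f}")
```

Under these assumptions, going from one instance to two moves the tier from roughly two nines to roughly four nines. Correlated failures, such as an AZ outage, break the independence assumption.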
Disaster Type 2 - Single-AZ disaster
What is it?
• A whole AZ goes down, but all the other AZs in the region still function.
Frequency
• More than 10 times a year (may be a different AZ every time).
How to prepare?
• Build your system so that it’s spread across multiple AZs and can survive the failure of any single AZ. Connect subnets in different AZs to your ELB and turn on Multi-AZ for RDS.
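As a sketch of what “survive the failure of any single AZ” means for capacity planning, the Python snippet below spreads instances round-robin across AZs and checks how many remain if any one AZ is lost. The function names, the instance count and the minimum-healthy threshold are illustrative assumptions; the AZ names are only examples.

```python
from collections import Counter


def spread_across_azs(instance_count, azs):
    """Assign instances to AZs round-robin; returns a list of AZ names."""
    return [azs[i % len(azs)] for i in range(instance_count)]


def survives_any_single_az_failure(placement, min_healthy):
    """True if at least min_healthy instances remain after losing any one AZ."""
    per_az = Counter(placement)
    return all(len(placement) - lost >= min_healthy for lost in per_az.values())


azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # example AZ names
placement = spread_across_azs(6, azs)
print(survives_any_single_az_failure(placement, min_healthy=4))  # True
```

With six instances over three AZs, losing any one AZ leaves four running. The same six instances in a single AZ would leave none.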
Disaster Type 3 - Single-service disaster
What is it?
• A specific service goes down across an entire region. The outage is almost always contained within a single region.
Frequency
• Several times a year (a different service every time).
How to prepare?
• Resist the temptation to use AWS services for everything. Choose your services carefully. Be ready to recreate your system in a different region, where the service works well (see next slide).
Disaster Type 4 - Whole-region disaster
What is it?
• An entire region goes down, taking all the applications running in it down too.
Frequency
• Several times a year (a different region every time); see the CloudEndure blog post comparing the uptime of all AWS regions.
How to prepare?
• Implement a cross-region DR methodology. Take snapshots of your instances and copy them to a different region. Use CloudFormation to define your application stack. Copy AMIs to a different region. Use cross-region read replicas for RDS. Use continuous data replication.
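One of the steps above, copying snapshots to a different region, could be sketched in Python as follows. The helper name `copy_snapshots_to_dr_region` is hypothetical. It expects an object that behaves like a boto3 EC2 client created for the destination region and relies on that client's `copy_snapshot` call. The regions and snapshot ID in the comments are placeholders.

```python
def copy_snapshots_to_dr_region(dest_ec2, source_region, snapshot_ids):
    """Copy EBS snapshots into the region that dest_ec2 is bound to.

    dest_ec2 is expected to behave like a boto3 EC2 client created for the
    destination (DR) region; copy_snapshot is issued against that region.
    Returns a mapping of source snapshot ID to the new snapshot ID.
    """
    copied = {}
    for snapshot_id in snapshot_ids:
        response = dest_ec2.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"DR copy of {snapshot_id} from {source_region}",
        )
        copied[snapshot_id] = response["SnapshotId"]
    return copied


# Example wiring (requires boto3 and AWS credentials; the regions and the
# snapshot ID are placeholders):
#   import boto3
#   dest_ec2 = boto3.client("ec2", region_name="us-west-2")
#   copy_snapshots_to_dr_region(dest_ec2, "us-east-1", ["snap-0123456789abcdef0"])
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account. A scheduled job would also need to wait for each copy to complete and clean up old copies, which this sketch leaves out.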
Beyond AWS
• Not all outages are caused by your cloud provider. Downtime of third-party services you use can take your application down too, for example DNS, CDN, third-party login services…
• Pick your third-party services carefully. Check the historical stability of the services you are considering. Don’t rely on third-party services more than you need to.
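One way to limit how much an application relies on a third-party service is to degrade gracefully when that service is down. The Python sketch below shows the idea with a generic fallback wrapper. All function names are hypothetical, and the outage is simulated.

```python
def call_with_fallback(primary, fallback):
    """Call primary(); if it raises, return fallback() instead."""
    try:
        return primary()
    except Exception:
        return fallback()


def fetch_avatar_from_third_party():
    # Simulated outage of an external service.
    raise ConnectionError("third-party service unavailable")


def default_avatar():
    return "default.png"


print(call_with_fallback(fetch_avatar_from_third_party, default_avatar))  # default.png
```

A production version would usually add timeouts and catch narrower exception types, so that a slow dependency fails fast and real bugs are not masked.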
3 Takeaways
1. Design DR into your system. The earlier you implement DR, the easier it is to recover. It’s too late to think about DR after disaster strikes.
2. Take advantage of what AWS offers. AWS provides many building blocks to help you build a DR solution for your application; you don’t need to do everything from scratch.
3. Understand the impact of relying on services. Each service you use can cause downtime. Check the stability of the services you’re using, and design your system to stay up even if some of the services it depends on are down.
Thank You
Leonid Feinberg
VP Products
leonid@cloudendure.com
