Leveraging the Public Cloud for Disaster Recovery
Lahav Savir, Architect & CEO
Emind Systems Ltd.
lahavs@emind.co
About
Lahav Savir
• 15+ years’ experience in the online industry
• Architect and CEO @ Emind Systems

Emind Systems (est. 2006)
• Boutique system integrator
• ~100 AWS customers
• AWS solution provider
Amazon (AWS) Certification
Amazon Solution Provider & Consulting Partner
https://aws.amazon.com/solution-providers/si/emind-systems-ltd
Disaster Recovery in a Nutshell
•   Business continuity
•   Minimize downtime and data loss
•   Recovery Time Objective (RTO)
•   Recovery Point Objective (RPO)
•   Price
DR Approaches
• Complete server mirroring
• Data mirroring / replication
• Configuration replication
Emind’s Best Practice
Why Amazon?
Flexible, Global Infrastructure
•   N. Virginia
•   Oregon
•   N. California
•   Ireland
•   Singapore
•   Tokyo
•   Sydney
•   São Paulo
•   GovCloud
Secure
• VPC: Virtual Private Cloud on AWS's infrastructure
• Specify a private IP address range
• Bridge your onsite IT infrastructure and the VPC with a VPN connection or Direct Connect
• Extend your existing security and management policies to the cloud
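When the VPC is bridged to the onsite network over a VPN or Direct Connect, routing only works cleanly if the private IP range chosen for the VPC does not overlap the ranges already used on site. A minimal sketch of that check, using Python's standard `ipaddress` module; the address ranges are hypothetical and not taken from the talk:

```python
import ipaddress

def overlapping_ranges(vpc_cidr, onsite_cidrs):
    """Return the onsite ranges that overlap the proposed VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return [c for c in onsite_cidrs if vpc.overlaps(ipaddress.ip_network(c))]

# Hypothetical onsite address plan, for illustration only.
onsite = ["10.0.0.0/16", "192.168.10.0/24"]

print(overlapping_ranges("10.0.0.0/20", onsite))   # conflicts with an onsite range
print(overlapping_ranges("10.50.0.0/16", onsite))  # no conflict
```

An empty result means the proposed VPC range can be routed alongside the existing network without renumbering.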
A different cost model
[Chart: Infrastructure Cost over Time, comparing 2nd Site Cost, AWS Cost and Demand across the events Test, Test, Failover and Failback. Callouts: "Cost savings w/ AWS" and "Ability to scale: no arbitrary time limit to failback".]
Zooming in on the techniques
Disaster Recovery Terms
• RTO: Recovery Time Objective
  – Acceptable time period within which normal
    operation (or degraded operation) needs to be
    restored after an event

• RPO: Recovery Point Objective
  – Acceptable data loss measured in time
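Both terms can be measured after a DR test and compared with the objectives. The sketch below is one way to do that; the timestamps and the 24-hour and 4-hour objectives are made-up values for illustration:

```python
from datetime import datetime, timedelta

def achieved_rpo(last_recoverable_point, disaster_time):
    """Data loss window: time between the last recoverable copy and the event."""
    return disaster_time - last_recoverable_point

def achieved_rto(disaster_time, service_restored_time):
    """Downtime: time between the event and restored (or degraded) operation."""
    return service_restored_time - disaster_time

# Hypothetical timeline of a DR test.
last_backup = datetime(2013, 1, 10, 2, 0)
disaster = datetime(2013, 1, 10, 9, 30)
restored = datetime(2013, 1, 10, 12, 0)

rpo = achieved_rpo(last_backup, disaster)
rto = achieved_rto(disaster, restored)

# Hypothetical objectives agreed with the business.
meets_rpo = rpo <= timedelta(hours=24)
meets_rto = rto <= timedelta(hours=4)
print(rpo, rto, meets_rpo, meets_rto)
```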
Backup and Restore
[Diagram: a traditional server in the on-premises infrastructure has its data copied to an S3 bucket with objects on AWS; Amazon Route 53 and AWS Import/Export are also shown.]
Backup and Restore
[Diagram: inside an AWS Region and Availability Zone, an Amazon EC2 instance is quickly provisioned from an AMI pre-bundled with OS and applications; data is copied from objects in the Amazon S3 bucket to the instance's data volume.]
Backup and Restore
• Advantages
  – Simple to get started
  – Extremely cost effective (mostly backup storage)

• Preparation Phase
  – Take backups of current systems
  – Store backups in S3
  – Document the procedure to restore from backup on AWS
     • Know which AMI to use, build your own as needed
     • Know how to restore system from backups
     • Know how to switch to new system
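The "store backups in S3" step could look roughly like the sketch below. It assumes a boto3 S3 client is passed in (its `upload_file(Filename, Bucket, Key)` call does the transfer); the bucket name, prefix, file path and key layout are hypothetical choices, not something the slides prescribe:

```python
from datetime import datetime, timezone
from pathlib import Path

def backup_key(prefix, filename, taken_at):
    """Build a timestamped object key so older backups are never overwritten."""
    stamp = taken_at.strftime("%Y/%m/%d/%H%M%S")
    return f"{prefix}/{stamp}/{filename}"

def store_backup(s3_client, path, bucket, prefix, taken_at=None):
    """Upload one backup file to S3 and return the key it was stored under."""
    taken_at = taken_at or datetime.now(timezone.utc)
    key = backup_key(prefix, Path(path).name, taken_at)
    # boto3 S3 client call: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(str(path), bucket, key)
    return key

# Usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   store_backup(boto3.client("s3"), "/var/backups/db.dump",
#                "example-dr-backups", "db")
```

Keeping the timestamp in the key makes "time since last backup" easy to read off the bucket listing when checking the RPO.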
Backup and Restore
• In Case of Disaster
   – Retrieve backups from S3
   – Bring up required infrastructure
      • EC2 instances with prepared AMIs, Load Balancing, etc.
   – Restore system from backup
   – Switch over to the new system
      • Adjust DNS records to point to AWS


• Objectives
   – RTO: as long as it takes to bring up infrastructure and
     restore system from backups
   – RPO: time since last backup
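The final switch-over step, adjusting DNS records to point to AWS, can be scripted ahead of time. A minimal sketch, assuming the zone is hosted in Amazon Route 53 and a boto3 Route 53 client is passed in; the record name, IP address, zone ID and 60-second TTL are hypothetical:

```python
def dns_cutover_batch(record_name, new_ip, ttl=60):
    """Build a Route 53 change batch that repoints an A record."""
    return {
        "Comment": "DR failover: point service at AWS",
        "Changes": [{
            "Action": "UPSERT",  # create the record or overwrite it
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": new_ip}],
            },
        }],
    }

def switch_dns(route53_client, hosted_zone_id, record_name, new_ip):
    """Apply the cutover through the boto3 Route 53 client."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=dns_cutover_batch(record_name, new_ip),
    )

# Usage (requires boto3 and AWS credentials; values are hypothetical):
#   import boto3
#   switch_dns(boto3.client("route53"), "Z_EXAMPLE",
#              "www.example.com.", "203.0.113.10")
```

A short TTL set before any disaster helps here, since resolvers may otherwise keep serving the old address for as long as the previous TTL allows, which adds to the RTO.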
Pilot Light
[Diagram, normal operation: the user or system reaches the service through Amazon Route 53. The primary site runs a Web Server, Application Server and Database Server with a Data Volume. On the AWS side the Web Server and Application Server are not running, and a smaller Database Server instance with its own Data Volume is kept current by data mirroring/replication.]
[Diagram, after a disaster: on the AWS side the Web Server and Application Server start in minutes, and the Database Server is resized as desired.]
Pilot Light
• Advantages
   – Very cost effective (fewer 24/7 resources)


• Preparation Phase
   – Enable replication of all critical data to AWS
   – Prepare all required resources for automatic start
       • AMIs, Network Settings, Load Balancing, etc.
Pilot Light
• In Case of Disaster
   – Automatically bring up resources around the replicated core data set
   – Scale the system as needed to handle current production traffic
   – Switch over to the new system
       • Adjust DNS records to point to AWS
• Objectives
   – RTO: as long as it takes to detect the need for DR and automatically scale
     up the replacement system
   – RPO: depends on replication type
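"Automatically bring up resources" could be as simple as starting instances that were prepared and left stopped. The sketch below assumes a boto3 EC2 client is passed in and uses its `start_instances` call and `instance_running` waiter; the instance IDs are hypothetical, and resizing the database instance is not covered here:

```python
def start_pilot_light(ec2_client, instance_ids):
    """Start the pre-provisioned (stopped) instances and wait until they run."""
    ec2_client.start_instances(InstanceIds=instance_ids)
    # Blocks until EC2 reports every listed instance as running.
    ec2_client.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    return instance_ids

# Usage (requires boto3 and AWS credentials; IDs are hypothetical):
#   import boto3
#   start_pilot_light(boto3.client("ec2"),
#                     ["i-0123456789abcdef0", "i-0fedcba9876543210"])
```

The time this call takes, plus detection and the DNS switch, is what the RTO for this approach is made of.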
Fully-Working Low-Capacity Standby
[Diagram, normal operation: the user or system reaches the service through Amazon Route 53. The primary site runs a Web Server, Application Server and Database Server with a Data Volume. The AWS side runs a low-capacity Web Server, App Server and DB Server with its own Data Volume, kept current by data mirroring/replication.]
[Diagram, after a disaster: the Web, Application and Database Servers on the AWS side grow capacity.]
Fully-Working Low-Capacity Standby
• Advantages
   – Can take some production traffic at any time
   – Cost savings (IT footprint smaller than full DR)


• Preparation
   – Similar to Pilot Light
   – All necessary components running 24/7, but not scaled for production
     traffic
   – Best practice – continuous testing
       • “Trickle” a statistical subset of production traffic to DR site
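The "trickle" idea amounts to weighted routing: each request goes to the DR site with a small fixed probability. The simulation below is only an illustration of that behaviour; the 5% share is a hypothetical value, and in practice the split would be done by the DNS or load-balancing layer (Route 53 weighted records are one option) rather than in application code:

```python
import random

def pick_site(dr_share, rng):
    """Route one request: 'dr' with probability dr_share, else 'primary'."""
    return "dr" if rng.random() < dr_share else "primary"

def simulate(requests, dr_share, seed=0):
    """Count where a batch of requests would land for a given DR share."""
    rng = random.Random(seed)
    counts = {"primary": 0, "dr": 0}
    for _ in range(requests):
        counts[pick_site(dr_share, rng)] += 1
    return counts

# Hypothetical: send roughly 5% of traffic to the DR site.
counts = simulate(10_000, dr_share=0.05)
print(counts)
```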
Fully-Working Low-Capacity Standby
• In Case of Disaster
   – Immediately fail over most critical production load
       • Adjust DNS records to point to AWS
   – (Auto) Scale the system further to handle all production load


• Objectives
   – RTO: for critical load: as long as it takes to fail over; for all other
     load, as long as it takes to scale further
   – RPO: depends on replication type
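How far to scale after failover can be estimated from the production load. The sketch below sizes the standby and then asks an Auto Scaling group for that capacity through the boto3 Auto Scaling client's `set_desired_capacity` call; the load figures, the 25% headroom and the group name are hypothetical, and the group's maximum size must already allow the requested capacity:

```python
import math

def instances_needed(peak_requests_per_sec, per_instance_capacity, headroom=0.25):
    """Instances required to serve the peak load plus a safety margin."""
    return math.ceil(peak_requests_per_sec * (1 + headroom) / per_instance_capacity)

def grow_capacity(autoscaling_client, group_name, desired):
    """Ask the Auto Scaling group to run `desired` instances right away."""
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,  # do not wait for a cooldown during a failover
    )
    return desired

# Hypothetical load: 1200 req/s at peak, 150 req/s per instance.
print(instances_needed(1200, 150))

# Usage (requires boto3 and AWS credentials; group name is hypothetical):
#   import boto3
#   grow_capacity(boto3.client("autoscaling"), "dr-web-tier",
#                 instances_needed(1200, 150))
```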
Multi-Site Hot Standby
[Diagram: the user or system reaches the service through Amazon Route 53. Both the primary site and the AWS side run Web, Application and Database Servers with Data Volumes; the AWS side runs at full capacity, with data mirroring/replication between the database servers.]
Multi-Site Hot Standby
• Advantages
   – At any moment can take all production load
• Preparation
   – Similar to Low-Capacity Standby
   – Scales in and out fully with production load
• In Case of Disaster
   – Immediately fail over all production load
       • Adjust DNS records to point to AWS
• Objectives
   – RTO: as long as it takes to fail over
   – RPO: depends on replication type
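With a hot standby, the time to fail over is dominated by how quickly the failure is detected. One common rule is to fail over only after several consecutive failed health checks, so a single blip does not trigger a switch. A minimal sketch of that rule; the threshold of three checks is a hypothetical setting:

```python
def should_fail_over(check_results, threshold=3):
    """True when the most recent `threshold` health checks all failed.

    check_results is a list of booleans, oldest first (True = healthy).
    """
    recent = check_results[-threshold:]
    return len(recent) == threshold and not any(recent)

print(should_fail_over([True, False, False, False]))  # three failures in a row
print(should_fail_over([False, False, True]))         # latest check recovered
```

A lower threshold shortens detection time but raises the chance of failing over on a transient error.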
Summary
• Plan
   – Analyze your existing applications and services
   – Find the right approach per case
• Adapt
   – Match your plan to RTO, RPO and Budget
• POC
   – Validate your plan
• Test
   – Periodic testing
• Monitor
   – Ensure continuous operation of all
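Matching a plan to RTO, RPO and budget can be framed as picking the cheapest approach that satisfies all three. The sketch below shows the selection logic only; every RTO, RPO and cost figure in it is invented for illustration and would have to be replaced with numbers measured for the application in question:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Approach:
    name: str
    rto_hours: float
    rpo_hours: float
    monthly_cost: float

# Hypothetical figures, NOT from the talk: measure your own.
CANDIDATES = [
    Approach("Backup and Restore", 24, 24, 100),
    Approach("Pilot Light", 4, 0.25, 500),
    Approach("Low-Capacity Standby", 1, 0.25, 1500),
    Approach("Multi-Site Hot Standby", 0.1, 0.1, 5000),
]

def choose(candidates, max_rto, max_rpo, budget):
    """Cheapest approach meeting the RTO, RPO and budget limits, or None."""
    fits = [c for c in candidates
            if c.rto_hours <= max_rto
            and c.rpo_hours <= max_rpo
            and c.monthly_cost <= budget]
    return min(fits, key=lambda c: c.monthly_cost) if fits else None

print(choose(CANDIDATES, max_rto=4, max_rpo=1, budget=2000))
print(choose(CANDIDATES, max_rto=0.5, max_rpo=1, budget=2000))  # nothing fits
```

A `None` result means the objectives and the budget are in conflict, which is the point at which one of them has to be renegotiated.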
• goCloud – Emind’s optimal road to the cloud
  – Secure cloud architecture
  – Scalable & high-availability design
  – Customized system deployment
  – Orchestrating cloud and software
  – Cloud operation team
  – Monitoring and alerting
  – 24x7 SLA
Contact me
lahavs@emind.co @lahavsavir
       054-4321688

