Leveraging the Public Cloud for Disaster Recovery

Lahav Savir, Architect & CEO
Emind Systems Ltd.
lahavs@emind.co

About
Lahav Savir
• 15+ years’ experience in the online industry
• Architect and CEO @ Emind Systems

Emind Systems (est. 2006)
• Boutique system integrator
• ~100 AWS customers
• AWS solution provider

Amazon (AWS) Certification
Amazon Solution Provider & Consulting Partner

https://aws.amazon.com/solution-providers/si/emind-systems-ltd

Disaster Recovery in a Nutshell
• Business continuity
• Minimize downtime and data loss
• Recovery Time Objective (RTO)
• Recovery Point Objective (RPO)
• Price

DR Approaches
• Complete server mirroring
• Data mirroring / replication
• Configuration replication

Why Amazon?
Flexible, Global Infrastructure
• N. Virginia
• Oregon
• N. California
• Ireland
• Singapore
• Tokyo
• Sydney
• São Paulo
• GovCloud

Secure
• VPC (Virtual Private Cloud) on AWS's infrastructure
• Specify your own private IP address range
• Bridge your on-site IT infrastructure and the VPC with a VPN connection or AWS Direct Connect
• Extend your existing security and management policies to the cloud

A different cost model
[Chart: infrastructure cost and demand over time, comparing the cost of a second site with the cost on AWS across test, test, failover and failback events.]
• Cost savings with AWS
• Ability to scale, with no arbitrary time limit to failback

Disaster Recovery Terms
• RTO: Recovery Time Objective
– Acceptable time period within which normal (or degraded) operation needs to be restored after an event

• RPO: Recovery Point Objective
– Acceptable data loss measured in time
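As a minimal sketch of how the two terms are measured after an incident, the helpers below compute the achieved RPO and RTO from three timestamps and compare them with the objectives. The function names and the example timeline are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, event_time: datetime) -> timedelta:
    """Data written between the last backup/replicated write and the event is lost."""
    return event_time - last_recovery_point


def achieved_rto(event_time: datetime, restored_time: datetime) -> timedelta:
    """Time from the event until normal (or degraded) operation is restored."""
    return restored_time - event_time


def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_objective: timedelta, rto_objective: timedelta) -> bool:
    """True only if both the achieved RPO and RTO are within their objectives."""
    return rpo <= rpo_objective and rto <= rto_objective


if __name__ == "__main__":
    # Hypothetical incident timeline.
    last_backup = datetime(2024, 3, 1, 2, 0)
    event = datetime(2024, 3, 1, 9, 30)
    restored = datetime(2024, 3, 1, 13, 0)
    rpo = achieved_rpo(last_backup, event)   # 7:30:00
    rto = achieved_rto(event, restored)      # 3:30:00
    print(rpo, rto, meets_objectives(rpo, rto, timedelta(hours=24), timedelta(hours=4)))
```

Measuring both values during every DR test, not only during real incidents, is what makes the objectives meaningful.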

Backup and Restore

[Diagram: data on a traditional server in the on-premises infrastructure is copied to objects in an S3 bucket on AWS, directly or via AWS Import/Export; Amazon Route 53 is also shown.]

Backup and Restore

[Diagram: inside an Availability Zone of an AWS Region, an Amazon EC2 instance is quickly provisioned from an AMI pre-bundled with the OS and applications, and its data volume is populated with data copied from objects in the S3 bucket.]

Backup and Restore
• Advantages
– Simple to get started
– Extremely cost-effective (you pay mostly for backup storage)

• Preparation Phase
– Take backups of current systems
– Store backups in S3
– Describe procedure to restore from backup on AWS
• Know which AMI to use, build your own as needed
• Know how to restore system from backups
• Know how to switch to new system
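"Know how to restore system from backups" starts with picking the right backup. A minimal sketch, assuming a hypothetical key naming convention in the S3 bucket (`backups/<system>-<UTC timestamp>.tar.gz`, which is not an S3 rule): given the object keys, return the newest backup taken at or before the disaster.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical key convention (an assumption for this sketch, not an S3 rule):
#   backups/<system>-<UTC timestamp YYYYmmddTHHMMSSZ>.tar.gz
SUFFIX = ".tar.gz"
STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def latest_backup_before(keys: Iterable[str], system: str,
                         cutoff: datetime) -> Optional[str]:
    """Return the newest backup key for `system` taken at or before `cutoff`."""
    prefix = f"backups/{system}-"
    best_key: Optional[str] = None
    best_time: Optional[datetime] = None
    for key in keys:
        if not (key.startswith(prefix) and key.endswith(SUFFIX)):
            continue
        stamp = key[len(prefix):-len(SUFFIX)]
        try:
            taken = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # skip objects that don't follow the convention
        if taken <= cutoff and (best_time is None or taken > best_time):
            best_key, best_time = key, taken
    return best_key
```

The keys could come from listing the bucket, for example with boto3's `list_objects_v2` and a `Prefix`, reading the `Key` of each entry in `Contents` (the listing is paginated for large buckets).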

Backup and Restore
• In Case of Disaster
– Retrieve backups from S3
– Bring up required infrastructure
• EC2 instances with prepared AMIs, Load Balancing, etc.
– Restore system from backup
– Switch over to the new system
• Adjust DNS records to point to AWS

• Objectives
– RTO: as long as it takes to bring up infrastructure and
restore system from backups
– RPO: time since last backup
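The "Adjust DNS records to point to AWS" step can be scripted against the Route 53 API. The sketch below only builds the `ChangeBatch` structure that Route 53's `ChangeResourceRecordSets` call accepts, using an `UPSERT` of a simple A record; the record name, IP address and hosted zone ID are hypothetical placeholders.

```python
def switchover_change_batch(record_name: str, dr_ip: str, ttl: int = 60) -> dict:
    """Build a Route 53 ChangeBatch that repoints an A record at the DR site."""
    return {
        "Comment": f"DR switchover for {record_name}",
        "Changes": [
            {
                # UPSERT creates the record, or replaces it if it already exists.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    # Resolvers may cache the answer for up to TTL seconds.
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": dr_ip}],
                },
            }
        ],
    }


batch = switchover_change_batch("app.example.com.", "203.0.113.10")
# With credentials configured, the batch could be applied with boto3:
#   import boto3
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z_EXAMPLE_ZONE_ID", ChangeBatch=batch)
```

The TTL that limits RTO is the one resolvers cached before the switch, so it needs to be kept low in advance rather than lowered during the disaster.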

Pilot Light
[Diagram: users or systems reach the on-premises web, application and database servers through Amazon Route 53. The database is mirrored or replicated to a smaller database instance on AWS, while the web and application servers on AWS are not running.]

Pilot Light
[Diagram: in a disaster, the web and application servers on AWS start in minutes, and the database instance on AWS is resized as desired; data mirroring/replication provided the data volume.]

Pilot Light
• Advantages
– Very cost-effective (fewer 24/7 resources)

• Preparation Phase
– Enable replication of all critical data to AWS
– Prepare all required resources for automatic start
• AMIs, Network Settings, Load Balancing, etc.

Pilot Light
• In Case of Disaster
– Automatically bring up resources around the replicated core data set
– Scale the system as needed to handle current production traffic
– Switch over to the new system
• Objectives
– RTO: as long as it takes to detect the need for DR and automatically scale
up the replacement system
– RPO: depends on replication type
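The automatic start-up around the replicated core can be sketched with the EC2 API. This sketch assumes the web and application servers were pre-created on AWS and left stopped, and that the standby database runs on a small EBS-backed instance; the function name, instance IDs and target instance type are hypothetical. The `ec2` argument is expected to be a boto3 EC2 client (`boto3.client("ec2")`), passed in so the logic can be exercised without AWS credentials.

```python
from typing import List, Sequence


def ignite_pilot_light(ec2, app_instance_ids: Sequence[str], db_instance_id: str,
                       db_instance_type: str) -> List[str]:
    """Hypothetical helper: enlarge the standby database and start the idle tier.

    `ec2` is assumed to be boto3.client("ec2") or an object with the same methods.
    """
    # EC2 only changes the instance type of a stopped instance, so the small
    # replica is stopped first. This pauses replication, which is acceptable
    # only once the primary site is confirmed lost.
    ec2.stop_instances(InstanceIds=[db_instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[db_instance_id])
    ec2.modify_instance_attribute(InstanceId=db_instance_id,
                                  InstanceType={"Value": db_instance_type})

    # Start the resized database together with the pre-built web/app instances,
    # which were kept stopped ("not running") until now.
    all_ids = [db_instance_id, *app_instance_ids]
    ec2.start_instances(InstanceIds=all_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=all_ids)
    return all_ids
```

Launching the web and application tiers from prepared AMIs (for example through an Auto Scaling group) is an equally valid design; stopped instances simply make the "start in minutes" step a single API call.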

Fully-Working Low-Capacity Standby
[Diagram: users or systems reach the on-premises web, application and database servers through Amazon Route 53. On AWS, a low-capacity web server and application server are running, and the database is mirrored or replicated to a database server on AWS.]


[Diagram: in a disaster, Amazon Route 53 directs users or systems to AWS, where capacity grows to several web and application servers on top of the replicated database.]

Fully-Working Low-Capacity Standby

• Advantages
– Can take some production traffic at any time
– Cost savings (IT footprint smaller than full DR)

• Preparation
– Similar to Pilot Light
– All necessary components running 24/7, but not scaled for production traffic
– Best practice – continuous testing
• “Trickle” a statistical subset of production traffic to DR site
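One way to "trickle" a subset of production traffic to the DR site is Route 53 weighted routing: records with the same name and type get a `SetIdentifier` and a `Weight`, and each is returned in proportion to its weight divided by the sum of weights. A minimal sketch that builds such a `ChangeBatch`; the record name, IP addresses and the percentage are hypothetical.

```python
def weighted_split_change_batch(record_name: str, primary_ip: str, dr_ip: str,
                                dr_percent: int, ttl: int = 60) -> dict:
    """Route 53 weighted records sending roughly `dr_percent`% of DNS answers to DR."""
    if not 0 <= dr_percent <= 100:
        raise ValueError("dr_percent must be between 0 and 100")

    def weighted_record(set_id: str, ip: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,  # distinguishes records sharing one name
                "Weight": weight,  # share of answers = weight / sum of weights
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }

    return {"Changes": [
        weighted_record("primary-site", primary_ip, 100 - dr_percent),
        weighted_record("dr-site", dr_ip, dr_percent),
    ]}


batch = weighted_split_change_batch("app.example.com.", "198.51.100.7", "203.0.113.10", 5)
```

Because the split happens per DNS answer and resolvers cache answers, the share of actual requests reaching the DR site is only approximately the configured percentage.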

• In Case of Disaster
– Immediately fail over most critical production load
– (Auto) Scale the system further to handle all production load

• Objectives
– RTO: for critical load, as long as it takes to fail over; for all other
load, as long as it takes to scale further
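Scaling "further to handle all production load" comes down to choosing a target size and applying it. A minimal sketch, assuming the DR web tier is an EC2 Auto Scaling group and that load is measured in some unit per instance (for example requests per second); the group name, the 20% headroom and the doubling of `MaxSize` are hypothetical choices. The `autoscaling` argument is expected to be `boto3.client("autoscaling")` or a stand-in with the same method.

```python
def desired_capacity(current_load: int, load_per_instance: int,
                     headroom_percent: int = 20, minimum: int = 1) -> int:
    """Instances needed for `current_load` plus headroom, rounded up (integer math)."""
    required = current_load * (100 + headroom_percent)
    per_instance = load_per_instance * 100
    return max(minimum, -(-required // per_instance))  # ceiling division


def scale_dr_group(autoscaling, group_name: str, current_load: int,
                   load_per_instance: int) -> int:
    """Grow a (hypothetical) DR Auto Scaling group to full production size."""
    capacity = desired_capacity(current_load, load_per_instance)
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=capacity,      # keep the group from scaling back in mid-failover
        MaxSize=capacity * 2,  # leave room for further automatic scale-out
        DesiredCapacity=capacity,
    )
    return capacity
```

Raising `MinSize` as well as `DesiredCapacity` is a deliberate choice here: otherwise a scale-in policy tuned for the low-capacity standby could shrink the group again while production traffic is still arriving.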

Multi-Site Hot Standby
[Diagram: Amazon Route 53 routes users or systems to web, application and database servers running at full capacity both on premises and on AWS, with data replication between the two database servers.]

Multi-Site Hot Standby
• Advantages
– At any moment can take all production load
• Preparation
– Similar to Low-Capacity Standby
– Fully scaling in/out with production load
• In Case of Disaster
– Immediately fail over all production load
• Objectives
– RTO: as long as it takes to fail over
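With both sites at full capacity, the failover itself can be left to DNS. One way, assuming an active/passive setup, is Route 53 failover routing: a `PRIMARY` record tied to a health check and a `SECONDARY` record that Route 53 answers with when the primary is unhealthy. A minimal sketch that builds the `ChangeBatch`; the record name, IP addresses and health check ID are hypothetical inputs.

```python
from typing import Optional


def failover_change_batch(record_name: str, primary_ip: str, dr_ip: str,
                          primary_health_check_id: str, ttl: int = 60) -> dict:
    """Route 53 failover records: answer with the DR site when the primary is unhealthy."""

    def failover_record(role: str, ip: str,
                        health_check_id: Optional[str] = None) -> dict:
        record_set = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-site",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id is not None:
            record_set["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return {"Changes": [
        failover_record("PRIMARY", primary_ip, primary_health_check_id),
        failover_record("SECONDARY", dr_ip),
    ]}


batch = failover_change_batch("app.example.com.", "198.51.100.7", "203.0.113.10",
                              "hypothetical-health-check-id")
```

If both sites should serve traffic all the time, weighted records (as in the trickle approach) with health checks are the active/active alternative.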

Summary
• Plan
– Analyze your existing applications and services
– Find the right approach per case
• Adapt
– Match your plan to RTO, RPO and Budget
• POC
– Validate your plan
• Test
– Periodic testing
• Monitor
– Ensure continuous operation of all systems
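The "Adapt" step, matching the plan to RTO, RPO and budget, can be expressed as a small selection rule: among the approaches whose expected RTO and RPO meet the objectives, pick the cheapest. A minimal sketch; the `DrOption` type is hypothetical, and the expected values and costs are inputs you would measure in your own POC and tests, not figures from this presentation.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class DrOption:
    name: str
    expected_rto: timedelta   # measured in your own POC/tests, not assumed
    expected_rpo: timedelta
    monthly_cost: float


def cheapest_option(options: Iterable[DrOption], rto_objective: timedelta,
                    rpo_objective: timedelta) -> Optional[DrOption]:
    """Cheapest approach whose expected RTO and RPO both meet the objectives."""
    eligible = [o for o in options
                if o.expected_rto <= rto_objective and o.expected_rpo <= rpo_objective]
    return min(eligible, key=lambda o: o.monthly_cost, default=None)
```

A `None` result means no approach fits: either the objectives or the budget need to be revisited, which is exactly the trade-off the plan should surface.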

• goCloud – Emind’s optimal road to the cloud
– Secure cloud architecture
– Scalable & high-availability design
– Customized system deployment
– Orchestrating cloud and software
– Cloud operation team
– Monitoring and alerting
– 24x7 SLA

Contact me
lahavs@emind.co @lahavsavir
054-4321688
