VMware's Site Recovery Manager (SRM) provides disaster recovery and site migration capabilities for virtualized applications. It supports replication using vSphere Replication or third-party storage-based replication. SRM simplifies management of recovery plans, automates failover processes, and enables non-disruptive testing. No single product can guarantee disaster recovery or business continuity; SRM helps automate restoration of IT infrastructure and must be combined with effective planning. SRM supports various replication options, flexible topologies, and application coverage across multiple tiers, and it simplifies processes like failback and migration between sites.
Presentation v center site recovery manager
1. 1
Business Continuity & Disaster Recovery: Protecting Your Customers' Mission-Critical Environment
Sin Cheong Wong
Senior Systems Consultant
VMware ASEAN
2. 2
Disasters Happen. Do You Need Protection?
• 43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)
• 93% of businesses that lost their data center for 10 days went bankrupt within one year (National Archives & Records Administration)
• 40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours (Gartner)
• Top executives say 10 hours to recovery; IT managers say up to 30 hours (Harris Interactive)
3. 3
Business-Critical Applications Require Business Continuity
Availability Expectations on vSphere Continue to Increase
RTOs decreasing from >24 hours to <12 hours
% of Application Instances Running on VMware in Customer Base

Workload            2010   2011
MS Exchange         38%    42%
MS SQL              43%    47%
MS SharePoint       53%    67%
Oracle Middleware   25%    34%
Oracle DB           25%    28%
SAP                 18%    28%

Source: VMware customer survey, Jan 2010 and April 2011 interim results
Data: Total number of instances of that workload deployed in your organization and the percentage of those instances that are virtualized
4. 4
Tradeoffs Of Traditional Business Continuity Solutions
Application-level availability silos: Complex and expensive
• Middleware / Java
• Oracle RAC
• Oracle DataGuard
• DB Mirroring
• MS Clustering
• DB Access Groups
• CCR / SCR
• App Server Cluster
• Session State Replication
Data protection services: Longer RTOs and RPOs
• Backup
• Data replication
5. 5
Challenges of Traditional Disaster Recovery
Expensive / Dependencies
Complex Recovery Plans
Unreliable Failovers
[Diagram labels: Apps, Hosts, Storage, Network; Software, Hosts, Storage, Facilities]
>$10K per app
Failure to meet business requirements
• Long RTOs – days to weeks
• Too much time and resources consumed
7. 7
vSphere Provides The Best Foundation For Disaster Recovery
Flexible Infrastructure
• Eliminate need for identical hardware across sites
• Enable waterfalling of equipment to recovery site
Simple Application Protection
• Entire system, including application, OS, and data, is stored as virtual machine files
• Entire system can be protected with data protection tools
Cost-Efficient Infrastructure
• Reduced hardware requirements at recovery site
• Use recovery hardware to run low-priority apps
[Diagram labels: Encapsulation, Consolidation, Hardware Independence]
Automation is needed to lower risk, increase confidence
8. 8
vCenter Site Recovery Manager Ensures Simple, Reliable DR
Provide cost-efficient replication of applications to failover site
• Built-in vSphere Replication
• Broad support for storage-based replication
Simplify management of recovery and migration plans
• Replace manual runbooks with centralized recovery plans
• From weeks to minutes to set up new plan
Automate failover and migration processes for reliable recovery
• Enable frequent non-disruptive testing
• Ensure fast, automated failover
• Automate failback processes
Site Recovery Manager complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications
[Diagram: Site A (Primary) and Site B (Recovery), each with servers running VMware vSphere, VMware vCenter Server, and Site Recovery Manager]
9. 9
Simple Setup And Management of Recovery And Migration Plans
From Complex Runbooks…
§ Weeks or months to set up
§ Error-prone
§ Quickly falls out of sync with apps and infrastructure changes
…to Simple Recovery Plans
§ Simple recovery plan set up in minutes
§ Fewer steps means far less room for errors
§ Simple to keep in sync with changes
10. 10
Risk With Infrequent DR Plan Testing
[Chart: recovery risk over time for an IT environment without virtualization and DR automation. Labels: DR Test, Changes to Applications & Infrastructure Configuration, Testing Gap, Unproven Recoverability]
Infrequent DR testing = high risk, low confidence
11. 11
Frequent DR Testing Reduces Risk
SRM facilitates frequent testing of recovery plans
Virtualization & DR automation reduces recovery risk
[Chart: recovery risk over time with virtualization + DR automation. Labels: DR Test, Frequent DR Testing]
12. 12
Non-Disruptive DR Testing
SRM provides non-disruptive testing of disaster recovery plans
[Diagram: Production Site and Recovery Site. Labels: Replication, Copy of Production, Suspended Test/Dev VMs, Isolated test network at recovery site]
13. 13
DR Coverage Often Limited Due To High Protection Costs
Corporate Datacenter
• Tier 1 Apps: Protected
• Tier 2 / 3 Apps: Backup only
Small Business, Remote Office / Branch Office
• Small Sites: Backup only
Need to expand DR protection
• Tier 2 / 3 applications in larger datacenters
• Small and medium businesses
• Remote office / branch offices
14. 14
vSphere Replication For Cost-Efficient, Simple Replication
Cost-efficient
Reduce storage costs by 2X
• Support for heterogeneous storage across sites, including non-replicating storage
• Use lower-end or older storage at failover site
Eliminate replication software costs
• vSphere Replication included with Site Recovery Manager at no additional cost
Simple
Manage replication directly from vCenter
• Eliminate complex interactions with storage teams
Manage replication at the individual VM level
• Eliminate need for complicated VM-to-LUN mapping
Powerful
15 minute RPOs
• Set RPOs between 15 minutes and 24 hours
Efficient network utilization
• Replicate only changed disk areas
Highly scalable
• 500 virtual machines
Limitations
• No automated failback
• File-level consistency only (except planned migration)
• No FT, templates, linked clones, physical RDMs
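The RPO range and the limitations listed above can be read as a pre-check before a VM is configured for vSphere Replication. The following is a minimal sketch in Python, not an SRM or vSphere API: the `VirtualMachine` record, its field names, and `vr_eligibility_issues` are hypothetical names introduced for illustration. It only encodes the rules stated on this slide.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the slide: RPOs between 15 minutes and 24 hours.
MIN_RPO_MINUTES = 15
MAX_RPO_MINUTES = 24 * 60


@dataclass
class VirtualMachine:
    """Hypothetical inventory record; not a real vSphere object."""
    name: str
    fault_tolerant: bool = False
    is_template: bool = False
    is_linked_clone: bool = False
    has_physical_rdm: bool = False


def vr_eligibility_issues(vm: VirtualMachine, rpo_minutes: int) -> List[str]:
    """Return the reasons a VM cannot use vSphere Replication (empty if none)."""
    issues = []
    if not MIN_RPO_MINUTES <= rpo_minutes <= MAX_RPO_MINUTES:
        issues.append(f"RPO of {rpo_minutes} min is outside 15 min to 24 h")
    if vm.fault_tolerant:
        issues.append("FT-protected VMs are not supported")
    if vm.is_template:
        issues.append("templates are not supported")
    if vm.is_linked_clone:
        issues.append("linked clones are not supported")
    if vm.has_physical_rdm:
        issues.append("physical RDMs are not supported")
    return issues


if __name__ == "__main__":
    web = VirtualMachine("web01")
    db = VirtualMachine("db01", has_physical_rdm=True)
    print(vr_eligibility_issues(web, rpo_minutes=60))
    print(vr_eligibility_issues(db, rpo_minutes=5))
```

Returning a list of reasons rather than a boolean makes it easier to report every blocking condition for a VM in one pass.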
15. 15
SRM Provides Broad Choice of Replication Options
vSphere Replication
• Simple, cost-efficient replication for Tier 2 applications and smaller sites
Storage-based Replication
• High-performance replication for business-critical applications in larger sites
[Diagram: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager, and vSphere, linked by vSphere Replication and storage-based replication]
16. 16
Automate DR Failover Processes
Overview
Automatically detect site failures
§ Require user to manually initiate failover
Automate recovery process
§ Stop replication and present replicated LUNs to vSphere
§ Execute user-defined recovery plan
Benefits
Ensure fast and predictable failovers and migrations
§ Consistently meet business requirements
Minimize risk of user errors
DR Failover (Site A to Site B)
1 Raise alert when heartbeat lost
2 User initiates failover
3 Stop replication and present LUNs to vSphere
4 Recover VMs
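The four numbered steps above separate detection (which only raises an alert) from the failover itself (which a person starts). A minimal sketch of that separation follows. It is a hypothetical model written for illustration, not SRM code: `FailoverController`, `SiteState`, and the log messages are invented names.

```python
from enum import Enum
from typing import List


class SiteState(Enum):
    HEALTHY = "healthy"
    ALERT_RAISED = "alert raised"
    FAILED_OVER = "failed over"


class FailoverController:
    """Hypothetical model of the four DR failover steps; not SRM code."""

    def __init__(self) -> None:
        self.state = SiteState.HEALTHY
        self.log: List[str] = []

    def heartbeat_lost(self) -> None:
        # Step 1: detection only raises an alert. No VM is recovered here.
        if self.state is SiteState.HEALTHY:
            self.state = SiteState.ALERT_RAISED
            self.log.append("alert raised: heartbeat lost")

    def initiate_failover(self, operator: str) -> None:
        # Step 2: a person explicitly starts the failover.
        if self.state is SiteState.FAILED_OVER:
            raise RuntimeError("failover has already run")
        self.log.append(f"failover initiated by {operator}")
        # Step 3: stop replication and present replicated LUNs to vSphere.
        self.log.append("replication stopped; replicated LUNs presented to vSphere")
        # Step 4: run the user-defined recovery plan to recover the VMs.
        self.log.append("recovery plan executed; VMs recovered")
        self.state = SiteState.FAILED_OVER


if __name__ == "__main__":
    controller = FailoverController()
    controller.heartbeat_lost()
    controller.initiate_failover(operator="on-call admin")
    print("\n".join(controller.log))
```

Keeping `heartbeat_lost` free of any recovery action mirrors the slide's point that failover is user-initiated, which avoids failing over on a false alarm such as a network partition.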
17. 17
Planned Migrations For App Consistency & No Data Loss
Overview
Two workflows can be applied to recovery plans:
§ DR failover
§ Planned migration
Planned migration ensures application consistency and no data loss during migration
§ Graceful shutdown of production VMs in application-consistent state
§ Data sync to complete replication of VMs
§ Recover fully replicated VMs
Benefits
Better support for planned migrations
§ No loss of data during migration process
§ Recover ‘application-consistent’ VMs at recovery site
Planned Migration (Site A to Site B)
1 Shut down production VMs
2 Sync data, stop replication and present LUNs to vSphere
3 Recover app-consistent VMs
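The difference between the two workflows comes down to two extra steps at the start of a planned migration. The sketch below lists the steps of each workflow as plain data so they can be compared. The function name, constants, and step wording are illustrative assumptions, not SRM identifiers.

```python
from typing import List

# Hypothetical workflow identifiers, chosen for this example only.
DR_FAILOVER = "dr_failover"
PLANNED_MIGRATION = "planned_migration"


def workflow_steps(workflow: str) -> List[str]:
    """Return the ordered steps for a recovery-plan workflow."""
    present = "stop replication and present LUNs to vSphere"
    recover = "recover VMs at the recovery site"
    if workflow == PLANNED_MIGRATION:
        return [
            # Graceful shutdown leaves applications in a consistent state.
            "gracefully shut down production VMs",
            # A final sync completes replication, so no data is lost.
            "sync data to complete replication",
            present,
            recover,
        ]
    if workflow == DR_FAILOVER:
        # The protected site may be unreachable, so there is no shutdown
        # or final sync; recovery starts from the last replicated state.
        return [present, recover]
    raise ValueError(f"unknown workflow: {workflow}")


if __name__ == "__main__":
    for name in (DR_FAILOVER, PLANNED_MIGRATION):
        print(name)
        for number, step in enumerate(workflow_steps(name), start=1):
            print(f"  {number}. {step}")
```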
18. 18
Automated Failback To Streamline Bi-Directional Migrations
Overview
Re-protect VMs from Site B to Site A
§ Reverse replication
§ Apply reverse resource mapping
Automate failover from Site B to Site A
§ Reverse original recovery plan
Restrictions
§ Does not apply if Site A has undergone major changes / been rebuilt
§ Not available with vSphere Replication
Benefits
Simplify failback process
§ Automate replication management
§ Eliminate need to set up new recovery plan
Streamline frequent bi-directional migrations
[Diagram: Automated Failback between Site A and Site B. Labels: Reverse Replication, Reverse original recovery plan]
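Re-protection, as described on this slide, amounts to swapping the roles of the two sites and inverting the resource mappings, subject to the two stated restrictions. The following is a minimal sketch under those assumptions. `ProtectionSetup`, `reprotect`, and the replication labels `"storage"` and `"vsphere_replication"` are hypothetical names, and the mapping is assumed to be one-to-one.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProtectionSetup:
    """Hypothetical description of one protection direction."""
    protected_site: str
    recovery_site: str
    replication: str  # "storage" or "vsphere_replication" (labels assumed)
    resource_mappings: Dict[str, str] = field(default_factory=dict)


def reprotect(setup: ProtectionSetup, original_site_rebuilt: bool = False) -> ProtectionSetup:
    """Return the reversed setup used to fail back to the original site."""
    # Restriction from the slide: not available with vSphere Replication.
    if setup.replication == "vsphere_replication":
        raise ValueError("automated failback is not available with vSphere Replication")
    # Restriction from the slide: original site must not have been rebuilt.
    if original_site_rebuilt:
        raise ValueError("original site was rebuilt; set up a new recovery plan")
    # Invert the mapping; this assumes each source maps to a distinct target.
    reversed_mappings = {dst: src for src, dst in setup.resource_mappings.items()}
    return ProtectionSetup(
        protected_site=setup.recovery_site,
        recovery_site=setup.protected_site,
        replication=setup.replication,
        resource_mappings=reversed_mappings,
    )


if __name__ == "__main__":
    original = ProtectionSetup("Site A", "Site B", "storage", {"pool-a": "pool-b"})
    print(reprotect(original))
```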
21. 21
Beyond DR: Disaster Avoidance And Planned Migrations
3 typical use-cases for SRM
Disaster Failover
Recover from unexpected site failure
• Full or partial site failure
The most critical but least frequent use-case
• Unexpected site failures do not happen often
• When they do, fast recovery is critical to the business
Disaster Avoidance
Anticipate potential datacenter outages
• For example: a forecast hurricane, floods, forced evacuation, etc.
Initiate preventive failover for smooth migration
• Leverage SRM ‘planned migration’ to ensure no data loss
• ‘Automated failback’ enables easy return to original site
Planned Migration
Most frequent SRM use case
• Planned datacenter maintenance
• Global load balancing
Streamline routine migrations across sites
• Test to minimize risk
• Execute partial failovers
• Leverage SRM ‘planned migration’ to ensure no data loss
• ‘Automated failback’ enables bi-directional migrations
22. 22
SRM Provides Broad Application Coverage
[Chart: application coverage by RTO (Continuous, Hours, Days) and RPO (Synchronous, Hours, Days). Labels: App-level geo-clustering / load balancing; Site Recovery Manager; Tier 1 Apps, Tier 2 Apps, Tier 3 Apps]
RTO: 30 minutes to hours
RPO: Flexible based on storage replication
23. 23
SRM Supports Flexible Topologies
Active-Passive Failover
• Most common traditional scenario
• Expensive dedicated resources
Active-Active Failover
• Leverage recovery infrastructure for test, development, training
• Utilize sunk cost of recovery site
Bi-directional Failover
• Production applications at both sites
• Each site acts as the recovery site for the other
Shared Recovery Sites
• Many-to-one failover
• Particularly useful for Remote Office / Branch Office
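These topologies differ mainly in which site protects which. As a rough sketch, the shape can be derived from a set of (protected site, recovery site) pairs. The function name and return labels below are assumptions made for this example. The pairs alone cannot distinguish active-passive from active-active, since that depends on whether the recovery site also runs test or development workloads, so both fall under one label here.

```python
from collections import Counter
from typing import Set, Tuple


def classify_topology(pairs: Set[Tuple[str, str]]) -> str:
    """Classify a set of (protected_site, recovery_site) relationships."""
    if not pairs:
        raise ValueError("at least one protection relationship is required")
    # Bi-directional: some site both protects and is protected by another.
    if any((dst, src) in pairs for src, dst in pairs):
        return "bi-directional"
    # Shared recovery: several protected sites fail over to the same site.
    targets = Counter(dst for _, dst in pairs)
    if any(count > 1 for count in targets.values()):
        return "shared recovery site"
    return "one-way (active-passive or active-active)"


if __name__ == "__main__":
    print(classify_topology({("HQ", "DR")}))
    print(classify_topology({("East", "West"), ("West", "East")}))
    print(classify_topology({("Branch1", "HQ"), ("Branch2", "HQ")}))
```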
24. 24
Points to consider first
§ Distinguish between a service disruption and a disaster: what is a disaster and what is not
§ Availability is not the same as disaster recovery
§ Disaster Recovery Procedures (DRP) and Business Continuity Procedures (BCP): understand the differences
§ Important point to note about DRP and BCP solutions
25. 25
Important point to note about DRP and BCP solution
§ No single product provides disaster recovery or business continuity
§ Companies are dynamic
§ New systems and applications are brought on-line
§ Old systems and applications are retired
§ DRPs and BCPs must be constantly updated to match the current operational reality
§ Disaster recovery and business continuity are not products
§ No one product can give a company “instant disaster recovery protection” or “instant business continuity planning”
§ VMware Site Recovery Manager (SRM) is a product that helps companies quickly restore their IT infrastructure with automation
§ SRM must be combined with other products and technologies, and with effective disaster recovery planning and business continuity planning
26. 26
Key Components Of SRM 5
[Diagram: Protected Site and Recovery Site, each with vCenter Server, Site Recovery Manager, vSphere, and Storage]
Replication Options
• vSphere Replication (bundled with SRM)
• Storage-Based Replication (3rd party)
Site Recovery Manager 5
• 1 per site
vCenter Server 5
• 1 per site
• Standard or Foundation
vSphere 3.5, 4.x or 5
• Standard, Enterprise or Enterprise Plus
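The component list on this slide can be turned into a simple per-site checklist. The sketch below is illustrative only: the dictionary keys, `site_issues`, and `is_supported_vsphere` are hypothetical names, and reading "4.x" as any 4.* release and "5" as `5` or `5.0` is an assumption about the slide's shorthand.

```python
from typing import Dict, List

# Editions as listed on the slide.
VCENTER_EDITIONS = {"Standard", "Foundation"}


def is_supported_vsphere(version: str) -> bool:
    """Check a version string against 'vSphere 3.5, 4.x or 5' (as read above)."""
    return version == "3.5" or version.startswith("4.") or version in {"5", "5.0"}


def site_issues(site: Dict) -> List[str]:
    """Return the problems found in a hypothetical site description."""
    issues = []
    if site.get("srm_servers") != 1:
        issues.append("expected exactly 1 Site Recovery Manager server per site")
    if site.get("vcenter_servers") != 1:
        issues.append("expected exactly 1 vCenter Server per site")
    if site.get("vcenter_edition") not in VCENTER_EDITIONS:
        issues.append("vCenter Server edition should be Standard or Foundation")
    for version in site.get("vsphere_versions", []):
        if not is_supported_vsphere(version):
            issues.append(f"vSphere {version} is not in the listed versions")
    return issues


if __name__ == "__main__":
    protected = {
        "srm_servers": 1,
        "vcenter_servers": 1,
        "vcenter_edition": "Standard",
        "vsphere_versions": ["4.1", "5.0"],
    }
    print(site_issues(protected))
```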
27. 27
SRM Architecture with vSphere Replication (VR)
[Architecture diagram: “Protected” Site and “Recovery” Site. Components shown: vSphere Client with SRM Plug-in, vCenter Server, SRM Server, VRMS, VR Server, ESX hosts with VRA, and VMFS datastores on storage]