2.
Agenda
Why Virtualize
Causes of Downtime and Planning a strategy
Scenario 1 – Baseline High Availability
Scenario 2 – AlwaysOn Availability Groups
Scenario 3 – SQL Server Failover Clustering
Scenario 4 – Rolling Upgrades
Disaster Recovery and Backup
Summary
3.
Setting Expectations
This is NOT a Best Practices Session
• This session will cover Availability and Recovery for SQL Server database
VMs
• This session does NOT cover performance, sizing, scaling, or
consolidation. For more information on these topics, please attend
VAPP1006-GD, SQL/MS Apps with Jeff Szastak (a group discussion)
4.
Summary
Complete Flexibility. Non-Stop Reliability
Quality of Service (QoS)
• Guaranteed performance SLAs through resource controls, dynamic load
balancing, capacity & performance management
• Simplified security SLAs with app protection
Time to Market (TTM)
• Reduced app provisioning times to minutes through use of templates &
intelligent policy management
• Dynamic scaling of apps through scale-up/scale-out capacity on demand
Availability
• Protection against app failures through high availability and fault
tolerance
• Simplified business continuity with automated disaster recovery & backup
5.
Causes of Downtime
Planned Downtime
• Software upgrade (OS patches, SQL Server cumulative updates)
• Hardware/BIOS upgrade
Unplanned Downtime
• Datacenter failure (natural disasters, fire)
• Server failure (failed CPU, bad network card)
• I/O subsystem failure (disk failure, controller failure)
• Software/Data corruption (application bugs, OS binary corruptions)
• User error (shut down a SQL service, dropped a table)
6.
SQL Server Native Availability Features
Failover Clustering
• Local server redundancy
• Instance level failover
• Zero data loss
Database Mirroring
• Local server and storage redundancy
• Disaster recovery
• Database level failover
• Zero data loss with high safety mode
Log Shipping
• Multiple disaster recovery sites for databases
• Manual failover required
• App/user error recovery
AlwaysOn (New in SQL Server 2012)
• AlwaysOn Failover Cluster Instance with shared disk architecture,
native support for multi-site cluster
• AlwaysOn Availability Group with non-shared disk architecture,
support for multiple secondaries, readable secondary
7.
Planning a High Availability Strategy
Requirements
• Recovery Time Objective (RTO)
• What does 99.99% availability really mean?
• Recovery Point Objective (RPO)
• Zero data loss?
• HA vs. DR requirements
Evaluating a technology
• What’s the cost for implementing the technology?
• What’s the complexity of implementing and managing the technology?
• What’s the downtime potential?
• What’s the data loss exposure?
Availability %           Downtime / year   Downtime / month *   Downtime / week
"Two Nines" - 99%        3.65 days         7.2 hours            1.68 hours
"Three Nines" - 99.9%    8.76 hours        43.2 minutes         10.1 minutes
"Four Nines" - 99.99%    52.56 minutes     4.32 minutes         1.01 minutes
"Five Nines" - 99.999%   5.26 minutes      25.9 seconds         6.05 seconds
* Using a 30 day month
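The downtime budgets follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year, a 30-day month, and a 7-day week (the function names are illustrative, not from any tool):

```python
# Downtime budget for an availability target.
# Assumed period lengths: 365-day year, 30-day month, 7-day week.
PERIOD_HOURS = {"year": 365 * 24, "month": 30 * 24, "week": 7 * 24}


def allowed_downtime_seconds(availability_pct: float, period: str) -> float:
    """Return the downtime budget, in seconds, for one period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * PERIOD_HOURS[period] * 3600


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value >= 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        row = [humanize(allowed_downtime_seconds(pct, p)) for p in PERIOD_HOURS]
        print(f"{pct}%: " + " | ".join(row))
```

Working backwards is often more useful when setting an RTO: pick the outage you can tolerate per year and see which "nines" it implies.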
8.
High Availability Options
[Figure: availability options plotted by hardware failure tolerance
(Unprotected, Automated Restart, Continuous) against application coverage
(0%, 10%, 100%). Options shown: VMware FT, VMware HA, vMotion (planned
downtime), DB Mirroring / RAC / AAG, Microsoft Clustering / Data Guard / AAG]
Clustering too complex and expensive for most applications
VMware HA and FT provide simple, cost-effective availability
vMotion provides continuous availability against planned downtime
11.
VMware vSphere High Availability (HA)
Protection against host or operating system failure
• Automatic restart of virtual machines on any available host in cluster
• Provides simple and reliable first line of defense for all databases
• Minutes to restart
• OS and application independent, does not require complex configuration
or expensive licenses
12.
VM Mobility
Server Maintenance
• VMware vSphere® vMotion® and
VMware vSphere Distributed
Resource Scheduler (DRS)
Maintenance Mode
• Migrate running VMs to other servers
in the pool
• Automatically distribute workloads
for optimal performance
Storage Maintenance
• VMware vSphere® Storage vMotion
• Migrate VM disks to other storage
targets without disruption
Key Benefits
• Eliminate downtime for common
maintenance
• No application or end user impact
• Freedom to perform maintenance
whenever desired
13.
App-Aware HA Through Health Monitoring APIs
Leverage third-party solutions that integrate with VMware HA
(for example, Symantec ApplicationHA)
(1) Database Health Monitoring
• Detect database service failures inside VM
(2) Database Service Restart Inside VM
• App start / stop / restart inside VM
• Automatic restart when app problem detected
(3) Integration with VMware HA
• VMware HA automatically initiated when
  • App restart fails inside VM
  • Heartbeat from VM fails
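The escalation order described on this slide (monitor, restart in the guest, then hand off to VMware HA) can be sketched as a small decision function. This is an illustrative model only: the callables and the `max_restarts` limit are hypothetical and do not correspond to the API of any monitoring product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitorResult:
    action: str  # "none", "app_restart", or "escalate_to_ha"
    reason: str


def evaluate(app_healthy: Callable[[], bool],
             restart_app: Callable[[], bool],
             vm_heartbeat_ok: Callable[[], bool],
             max_restarts: int = 3) -> MonitorResult:
    """Decide what to do for one monitoring cycle."""
    # A lost VM heartbeat goes straight to VMware HA.
    if not vm_heartbeat_ok():
        return MonitorResult("escalate_to_ha", "VM heartbeat lost")
    # (1) Health monitoring: nothing to do when the service is up.
    if app_healthy():
        return MonitorResult("none", "database service healthy")
    # (2) Try to restart the service inside the VM (hypothetical limit).
    for attempt in range(1, max_restarts + 1):
        if restart_app() and app_healthy():
            return MonitorResult("app_restart", f"recovered on attempt {attempt}")
    # (3) In-guest restart failed: let VMware HA restart the VM.
    return MonitorResult("escalate_to_ha", "in-guest restart failed")
```

Keeping the in-guest restart ahead of the VM restart means most service failures are handled without rebooting the guest.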
14.
Standalone SQL Server VM with VMware HA, DRS, & vMotion
Highlights:
• Quickly restore service after host
failure
• Simple to configure and easy to
manage
• Can use Standard Windows and
SQL Server editions
Note:
• Protection against hardware failures
only
• Does not provide application-level
protection
16.
What are SQL Server AlwaysOn Availability Groups?
• Database-level replication over IP; no shared storage requirement
• Same advantages as failover clustering (service availability, patching, etc.)
• Two copies of the data; protection from data corruption
• Readable secondary
• Automatic or manual failover through WSFC policies
17.
Scenario 2 – Improving on AlwaysOn High Availability
Technology Chosen
• AlwaysOn AG for HW and SW protection
• VMware HA & vMotion for added protection
• SRM for DR, SRM integration to restore AG on remote site
Benefits
• Quickly restart failed AAG node to bring cluster back to full capabilities
• Migrate nodes off physical hardware (hosts or storage) without downtime
or impact
• Automate Disaster Recovery at remote site with SRM
18.
vSphere HA with AlwaysOn Availability Group (AG)
Protection against HW/SW
failures and DB corruption
Storage flexibility
(FC, iSCSI, NFS)
Compatible w/ vMotion,
DRS, HA
RTO of a few seconds
vSphere HA + AlwaysOn AG
• Seamless integration: the VM rejoins
the AG after vSphere HA recovery
• Can shorten time that database
is in unprotected state
• Reduces synchronization time
after VM recovery
20.
Deploying AlwaysOn Availability Group on vSphere
Step 1: vSphere platform setup
• Ensure each disk is created as Thick Eager Zeroed
• Create a DRS anti-affinity rule so the AG VMs do not run on the same host
Step 2: Create WSFC
• Install Failover Clustering feature
• Create a cluster for the Availability Group
• Add SQL Server VMs as cluster nodes
• Configure the quorum policy to use “Node and File Share Majority”
Step 3: Enable SQL Server for AlwaysOn
• Configure the SQL Server service to enable AlwaysOn Availability
Groups on each SQL Server instance
• Restart the SQL Server service
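Why a file share witness in Step 2? With two nodes alone, neither node holds a majority once the other is lost. A simplified sketch of the vote counting, assuming one vote per node plus one for the witness (it ignores dynamic quorum adjustments and per-node vote weights):

```python
def votes_needed(total_votes: int) -> int:
    """Strict majority of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_up: int, witness_up: bool, total_nodes: int) -> bool:
    """Simplified Node and File Share Majority check."""
    total_votes = total_nodes + 1  # every node plus the file share witness
    votes_present = nodes_up + (1 if witness_up else 0)
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two-node AG cluster: one node lost, witness reachable.
    print(has_quorum(nodes_up=1, witness_up=True, total_nodes=2))
    # Two-node AG cluster: one node lost and witness unreachable.
    print(has_quorum(nodes_up=1, witness_up=False, total_nodes=2))
```

In this model a two-node cluster with a witness tolerates the loss of any single voter, which is what allows automatic failover to proceed.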
21.
Deploying AlwaysOn Availability Group on vSphere – Continued
Step 4: Create AG for AdventureWorks2012 database
• Prerequisite: Set the database to use the full recovery model
• Prerequisite: Take a full backup of the database
• Create a two-node AG with synchronous commit and automatic failover
• Create a listener for the AG
Step 5: Monitor AG from Dashboard
• Dashboard shows the health state of the AG, and status of each replica
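Step 4 can also be scripted rather than run through the wizard. Below is a minimal sketch that builds the primary-side T-SQL from Python so the names live in one place. The AG name, server names, endpoint URLs, backup path, and listener details are hypothetical placeholders; creating the mirroring endpoints and the secondary-side steps (restoring WITH NORECOVERY and joining the AG) are omitted.

```python
from typing import List, Tuple


def ident(name: str) -> str:
    """Bracket-quote a T-SQL identifier."""
    return "[" + name.replace("]", "]]") + "]"


def text(value: str) -> str:
    """Quote a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


def build_ag_script(database: str, ag_name: str,
                    replicas: List[Tuple[str, str]],
                    backup_path: str,
                    listener: str, listener_ip: str, subnet_mask: str,
                    port: int = 1433) -> List[str]:
    """Return the primary-side statements for Step 4, in order."""
    # One clause per replica: synchronous commit with automatic failover.
    replica_clauses = ",\n    ".join(
        f"{text(server)} WITH (ENDPOINT_URL = {text(url)}, "
        "AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, FAILOVER_MODE = AUTOMATIC)"
        for server, url in replicas
    )
    return [
        # Prerequisite: full recovery model.
        f"ALTER DATABASE {ident(database)} SET RECOVERY FULL;",
        # Prerequisite: full backup.
        f"BACKUP DATABASE {ident(database)} TO DISK = {text(backup_path)};",
        # Create the two-node AG.
        f"CREATE AVAILABILITY GROUP {ident(ag_name)}\n"
        f"  FOR DATABASE {ident(database)}\n"
        f"  REPLICA ON\n    {replica_clauses};",
        # Create the listener clients connect to.
        f"ALTER AVAILABILITY GROUP {ident(ag_name)}\n"
        f"  ADD LISTENER {text(listener)} "
        f"(WITH IP (({text(listener_ip)}, {text(subnet_mask)})), PORT = {port});",
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    for stmt in build_ag_script(
        "AdventureWorks2012", "AG1",
        [("SQL1", "TCP://sql1.example.local:5022"),
         ("SQL2", "TCP://sql2.example.local:5022")],
        r"\\fileshare\backup\AdventureWorks2012.bak",
        "AGListener", "10.0.0.50", "255.255.255.0",
    ):
        print(stmt + "\n")
```

Review the generated statements before running them; in particular, the endpoint URLs must match endpoints that already exist on each instance.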
23.
What is Microsoft Failover Clustering?
• Provides application high-availability through a shared-disk architecture
• One copy of the data; relies on storage technology to provide data redundancy
• Automatic failover for any application or user
• Suffers from restrictions in storage and VMware configuration
24.
vSphere HA with Failover Clustering
Highlights:
• RTO of a few seconds
• Protection against HW/SW failures
but not DB corruption
• Legacy application support (those
not mirror-aware)
Note:
• DRS and vMotion not available (only
cold migration)
• No protection from data corruption or
storage failures
• Storage must be FC
• Must use RDMs
25.
VMware Support For Microsoft Clustering On vSphere
Platform feature support
Configuration                            vSphere  VMware HA  vMotion/DRS  Storage vMotion  MSCS node limits
Shared disk:
MSCS with Shared Disk                    Yes      Yes (1)    No           No               2; 5 (5.1 only)
Exchange Single Copy Cluster             Yes      Yes (1)    No           No               2; 5 (5.1 only)
SQL Clustering                           Yes      Yes (1)    No           No               2; 5 (5.1 only)
SQL AlwaysOn Failover Cluster Instance   Yes      Yes (1)    No           No               2; 5 (5.1 only)
Non-shared disk:
Network Load Balance                     Yes      Yes (1)    Yes          Yes              Same as OS/app
Exchange CCR                             Yes      Yes (1)    Yes          Yes              Same as OS/app
Exchange DAG                             Yes      Yes (1)    Yes          Yes              Same as OS/app
SQL AlwaysOn Availability Group          Yes      Yes (1)    Yes          Yes              Same as OS/app
Storage protocol and shared disk support
Configuration                            FC   In-guest OS iSCSI  Native iSCSI  In-guest OS SMB  FCoE     RDM      VMFS
Shared disk:
MSCS with Shared Disk                    Yes  Yes                No            Yes (5)          Yes (4)  Yes (2)  Yes (3)
Exchange Single Copy Cluster             Yes  Yes                No            Yes (5)          Yes (4)  Yes (2)  Yes (3)
SQL Clustering                           Yes  Yes                No            Yes (5)          Yes (4)  Yes (2)  Yes (3)
SQL AlwaysOn Failover Cluster Instance   Yes  Yes                No            Yes (5)          Yes (4)  Yes (2)  Yes (3)
Non-shared disk:
Network Load Balance                     Yes  Yes                Yes           N/A              Yes      N/A      N/A
Exchange CCR                             Yes  Yes                Yes           N/A              Yes      N/A      N/A
Exchange DAG                             Yes  Yes                Yes           N/A              Yes      N/A      N/A
SQL AlwaysOn Availability Group          Yes  Yes                Yes           N/A              Yes      N/A      N/A
Shared Disk Configurations: Supported on
vSphere with additional considerations for storage
protocols and disk configs
Non-Shared Disk Configurations: Supported on
vSphere just like on physical
* Use affinity/anti-affinity rules when using vSphere HA
** RDMs required in “Cluster-across-Box” (CAB) configurations, VMFS required in “Cluster-in-Box” (CIB) configurations
VMware Knowledge Base Article: http://kb.vmware.com/kb/1037959
27.
Patching Non-clustered Databases
Benefits
• No need to deploy an MS cluster
simply for patching / upgrading the
OS and database
• Ability to test in a controlled manner
(multiple times if needed)
• Minimal impact to production site
until OS patching completed
and tested
• Patching of secondary VM
can occur during regular
business hours
Requires you to lay out
VMDKs correctly to
support this scenario
28.
Scripted MS SQL Server Rolling Patch Upgrades
VMware PowerCLI and PowerShell provide a reproducible result
What about…
Audit trail / log of execution?
Which roles participate in managing upgrade and how?
29.
Use vCenter Orchestrator and vCloud Automation Center
to Enhance Rolling Patch Upgrades
Automation Execution and Status
• Workflows provide a powerful means for process flow
and control
• Creates a standard definition of infrastructure processes
• Execution status available in real time
Integrates with Scripting and Systems
• Managed PowerShell execution
Self Service
• Self Service Portal
• Initiated by assigned user Roles
• Delegated Approvals
31.
Rolling Patch Upgrade Using Standby VM
Step 1: Configure Standby VM
• Create VM using SQL Server Sysprep or using OS only clone + SQL install
• Apply any server level configuration changes
• Patch Standby VM to the target service pack level
• Start client app (for demo purpose only)
Step 2: Remove Primary VM from public network
• Disconnect public NIC
• Observe: the client temporarily loses its connection
and loops trying to reconnect
Step 3: Hot remove resource from Primary VM
• Detach database from SQL Server instance using a script
• Take disk offline
• Hot remove VMDK from VM
32.
Rolling Patch Upgrade Using Standby VM – Continued
Step 4: Hot add resource to Standby VM
• Hot add VMDK to Standby VM
• Bring disk online
• Attach database to SQL Server instance
Step 5: Perform final role switch
• Configure Standby VM to take the IP address of the Primary VM public NIC.
Standby is now the new primary.
• Observe: client is automatically reconnected to the new primary with
the updated service pack
The old Primary VM can now be taken down so the service pack
can be applied to it
See blog post on:
http://blogs.vmware.com/apps/2011/11/sql-server-rolling-patch-upgrade-using-standby-vm.html
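Steps 2 through 5 must run in a fixed order and stop at the first failure, and a scripted run should leave an audit trail. A minimal sketch of such a runner follows. The step names mirror the slides, but the action callables are hypothetical stand-ins for whatever PowerCLI or SQL scripts perform each step.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Ordered switch-over plan (Steps 2 to 5 of the procedure).
PLAN = [
    "disconnect primary public NIC",
    "detach database on primary",
    "take data disk offline on primary",
    "hot remove VMDK from primary",
    "hot add VMDK to standby",
    "bring data disk online on standby",
    "attach database on standby",
    "assign primary public IP to standby",
]


def run_plan(actions: Dict[str, Callable[[], None]],
             audit_log: List[str]) -> bool:
    """Run every step in order; stop and return False on the first failure."""
    for name in PLAN:
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            actions[name]()  # hypothetical callable supplied by the caller
        except Exception as exc:
            audit_log.append(f"{stamp} FAILED {name}: {exc}")
            return False
        audit_log.append(f"{stamp} OK {name}")
    return True


if __name__ == "__main__":
    # Dry run with no-op actions, to show the audit trail format.
    log: List[str] = []
    succeeded = run_plan({name: (lambda: None) for name in PLAN}, log)
    print("\n".join(log))
    print("completed" if succeeded else "stopped early")
```

Stopping at the first failure matters here: attaching the database on the standby before the VMDK has been cleanly released by the primary would put the data at risk.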
34.
VMware vCenter Site Recovery Manager™ (SRM)
• Relies on storage or vSphere host replication
• Allows creation, maintenance, and execution of automated process to
facilitate site recovery
• Safe testing without impacting production environment
• Self-documenting
35.
VMware vCenter SRM with SQL Server AAG
• AAG provides local availability
• Storage replication keeps DR facility in sync
• During a site failure, the admin has full control of recovery
• After workflow is initiated, SRM automates the recovery process
• The entire process can be tested without actually failing over services!
36.
In-guest SQL Server-Aware Backup Solution
• Standard method for physical or virtual
• Agent runs in the VM guest and handles database quiescing
• Data is sent over the IP network
• Can affect CPU utilization in the guest OS
37.
Array-based Backup
• Backup vendor software coordinates with VSS to create a supported backup
image of the SQL Server databases
• Snapshotted databases can later be streamed to tape as flat files with no IO
impact to the production SQL Server
38.
Putting It All Together
VMware
Planned downtime avoidance
• vMotion & Storage vMotion
• Rolling SQL Server upgrades with vCO / vCAC
Unplanned downtime recovery
• vSphere HA + AppAware HA
• vSphere FT
Disaster recovery
• Site Recovery Manager
SQL Server 2012
• AlwaysOn Availability Groups
Pre-SQL Server 2012
• Failover Clustering
• Database Mirroring
• Log Shipping
• Replication
39.
Summary
Complete Flexibility. Non-Stop Reliability
Quality of Service (QoS)
• Guaranteed performance SLAs through resource controls, dynamic load
balancing, capacity & performance management
• Simplified security SLAs with app protection
Time to Market (TTM)
• Reduced app provisioning times to minutes through use of templates &
intelligent policy management
• Dynamic scaling of apps through scale-up/scale-out capacity on demand
Availability
• Protection against app failures through high availability and fault
tolerance
• Simplified business continuity with automated disaster recovery & backup
40.
Resources
Visit us on the web to learn more on specific apps
• http://www.vmware.com/solutions/business-critical-apps/
Visit our Business Critical Application blog
• http://blogs.vmware.com/apps/
…and please attend our sessions listed below for more detailed information on virtualizing and
managing Tier 1 Apps on VMware!
VAPP5473 – Automated Management of Tier-1 Applications on VMware
VAPP5613 – Successfully Virtualize Microsoft Exchange Server
VAPP5932 – Virtualizing Highly Available SQL Servers
VAPP6124 – Automating VMware Cloud and Virtualization Deployments with Dell Active Infrastructure
VAPP5618 – Virtualize Active Directory, the Right Way!
VAPP4906 – Architecting Oracle Databases on vSphere 5 with NetApp Storage
VAPP5834 – Virtualizing Mission Critical Oracle RAC with vSphere and vCOPS
BCO4905 – Disaster Recovery Solution with Oracle Data Guard and Site Recovery Manager
VAPP4813 – Real-world Design Examples for Virtualized SAP Environments
VCM4891 – Performance Management of Business Critical Applications using vCenter Operations Management
42.
Other VMware Activities Related to This Session
HOL:
HOL-SDC-1304 and HOL-SDC-1317
vSphere Performance Optimization
vCloud Suite Use Cases - Business Critical Applications
Group Discussions:
VAPP1006-GD
SQL/MS Apps with Jeff Szastak