This document provides an overview of setting up disaster recovery for an application using Azure technologies. It covers deploying the application across two regions with a web front end and a SQL back end. AlwaysOn Availability Groups replicate the SQL database between regions. Azure Site Recovery replicates the web VMs, and Azure Automation runbooks integrated into Site Recovery fail over the SQL database. The steps shown include configuring networking, deploying the application, setting up AlwaysOn and Site Recovery, and testing the failover process, along with the disaster recovery approach and architecture.
Azure BCDR in Action: From Setup to Failover and Back
1. Azure BCDR in Action
From Setup to Failover and Back
Yung Chou
Cloud Solution Architect
US West
2. References
• Microsoft Cloud Workshop
• Building a resilient IaaS architecture
• Selected Readings
• Multi-tier web application built for HA/DR
• Tutorial: AG in multiple subnets - SQL Server on Azure VMs
• Azure Application Architecture Fundamentals
• Microsoft Azure Well-Architected Framework
• Security documentation
• General questions about the Azure Site Recovery service
3/17/2023
3. Agenda
• The app
• Architecture and Deployment
• SQL AlwaysOn
• DR Approach
• DR Settings
• SQL Back End
• Automation and DR Plan
• IIS Front End
• Failover Test
This delivery covers most of Exercise 2 and beyond of the workshop.
4. The following slides are:
• Taken from multiple deployments of the workshop
• Intended as additional information to facilitate your workshop deployment
by providing a reference for the context and expected results
These slides are:
• NOT for replacing the workshop instructions
• Most valuable when referenced alongside the workshop instructions for the
relevant exercises and tasks
Note that while the slides, taken from multiple deployments, may show resource
names that are inconsistent from one section to another, the process flows and
expected resource states remain correctly depicted.
6. Contoso Insurance App
[Diagram: Contoso Front Door in front of ContosoWebLBPrimary (West US 3) and ContosoWebLBSecondary (East US)]
• Front Door pointing to Contoso origin
• External LB
• HTTP only on port 80
• Source and DR sites
• Front Door routes
• Web/IIS Tier
• Zone redundancy
• Backup with CRR
• Internal LB
• Port 1443 from IIS Tier only
• Source and DR sites
• Data/SQL Tier
• Three-node failover cluster
• Cloud Witness
• Zone redundancy
• AlwaysOn with listener on 1443
• One vnet with a DC in each zone
• Vnet peering between westus3 and eastus
• RSV
• Backup RSV in westus3
• Site Recovery RSV in eastus
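The vnet peering listed above can also be scripted. The following is a minimal sketch, assuming the Az.Network module, an authenticated session, the vnet names contoso-vnet-westus3 and contoso-vnet-eastus, and that Contoso.ins and Contoso.ins.DR-Site are the resource groups holding them. The peering names are hypothetical.

```powershell
# Sketch only: peer the source and DR virtual networks in both directions.
# Resource group names are assumptions and peering names are hypothetical.
$west = Get-AzVirtualNetwork -Name 'contoso-vnet-westus3' -ResourceGroupName 'Contoso.ins'
$east = Get-AzVirtualNetwork -Name 'contoso-vnet-eastus'  -ResourceGroupName 'Contoso.ins.DR-Site'

# A peering has to be created on each side before the link reports Connected.
Add-AzVirtualNetworkPeering -Name 'westus3-to-eastus' -VirtualNetwork $west -RemoteVirtualNetworkId $east.Id
Add-AzVirtualNetworkPeering -Name 'eastus-to-westus3' -VirtualNetwork $east -RemoteVirtualNetworkId $west.Id
```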
7. Disaster Recovery Approach
Tier | DR Strategy
Web | Failover using Azure Site Recovery
SQL | Secondary SQL AlwaysOn Availability Group replica with asynchronous replication. Failover steps are integrated into Azure Site Recovery using Azure Automation.
AD | Active-active domain controllers
8. Create a Cloud Witness for SQL Failover Cluster
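A cloud witness is backed by an Azure storage account. The following is a minimal sketch of one way to script it, assuming the Az.Storage and FailoverClusters modules. The storage account name is hypothetical, and the resource group is assumed to be Contoso.ins.

```powershell
# Sketch only. Part 1 runs from a session authenticated with Connect-AzAccount.
# The storage account name is hypothetical; the resource group name is an assumption.
$rg      = 'Contoso.ins'
$account = 'contosocloudwitness'   # must be globally unique, lowercase letters and digits only

New-AzStorageAccount -ResourceGroupName $rg -Name $account -Location 'westus3' `
    -SkuName Standard_LRS -Kind StorageV2
$key = (Get-AzStorageAccountKey -ResourceGroupName $rg -Name $account)[0].Value

# Part 2 runs on a cluster node once the failover cluster exists.
Set-ClusterQuorum -CloudWitness -AccountName $account -AccessKey $key

# Confirm that the quorum resource now points at the cloud witness.
Get-ClusterQuorum
```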
9. Add SQLVMs to Load Balancer Backend Pool
10. Create a SQL Failover Cluster (in SQLVM1)
New-Cluster -Name AOGCLUSTER -Node SQLVM1,SQLVM2 -StaticAddress 10.0.2.99
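The cluster can be validated before creation and checked afterwards. A minimal sketch, assuming an elevated PowerShell session on SQLVM1 with the FailoverClusters module installed:

```powershell
# Sketch only: run in an elevated session on SQLVM1.
Import-Module FailoverClusters

# Optional pre-check before New-Cluster: run cluster validation against both nodes.
Test-Cluster -Node SQLVM1, SQLVM2

# After New-Cluster completes, both nodes should report State = Up.
Get-ClusterNode -Cluster AOGCLUSTER | Format-Table Name, State
```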
27. Contoso Insurance App
• External LB
• HTTP only on port 80
• Web/IIS Tier
• Zone redundancy
• Internal LB
• Port 1443 from IIS Tier only
• Data/SQL Tier
• Two-node failover cluster
• Cloud Witness
• Zone redundancy
• AlwaysOn with listener on 1443
• One vnet with one DC in each zone
28. Contoso Insurance App Deployment Highlights
• Contoso.ins
• Source in westus3
• contoso-vnet-westus3
• Contoso.ins.DR-Site
• DR site in eastus
• contoso-vnet-eastus
• Contoso.ins.RSV
• Backup: contoso-RSV-westus3
• IIS1, IIS2, SQL1 and SQL2 backup
• DR: contoso-RSV-eastus
• Automation account
• Failover runbooks
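The two Recovery Services vaults above can be created from PowerShell as well. A minimal sketch, assuming the Az.Resources and Az.RecoveryServices modules, an authenticated session, and that Contoso.ins.RSV is a resource group. The resource group location is an assumption.

```powershell
# Sketch only: assumes Contoso.ins.RSV is a resource group; its location is an assumption.
$rg = 'Contoso.ins.RSV'
New-AzResourceGroup -Name $rg -Location 'eastus'

# The Backup vault lives in the source region, next to the VMs it protects.
New-AzRecoveryServicesVault -Name 'contoso-RSV-westus3' -ResourceGroupName $rg -Location 'westus3'

# The Site Recovery vault lives in the DR region so it stays reachable during a source-region outage.
New-AzRecoveryServicesVault -Name 'contoso-RSV-eastus' -ResourceGroupName $rg -Location 'eastus'
```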
48. Extend the SQL AlwaysOn Created Earlier
to add SQLVM3 to the Always On group as an asynchronous replica
1. In the Azure portal, add SQLVM3 to the load-balancer backend pool in the DR site.
2. In SQLVM1, add SQLVM3 to the existing Windows Server Failover Cluster.
3. In SQLVM3, enable AlwaysOn and set the domain login credentials.
4. In SQLVM1,
• update the Availability Group Listener to include the SQLVM3 IP address,
• add SQLVM3 as an asynchronous replica in the existing Always On Availability Group.
5. In SQLVM1, run a PowerShell script to update the failover cluster with the Listener IP addresses.
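Steps 3 and 5 above can be sketched in PowerShell as follows. This is not the workshop's script. It assumes the SqlServer and FailoverClusters modules, and the angle-bracket values are placeholders to replace with the names shown in Failover Cluster Manager and the DR load balancer. The probe port 59999 is an assumption and must match the load balancer's health probe.

```powershell
# Sketch only: placeholders in angle brackets are not values from the deck.

# Step 3 (on SQLVM3): enable AlwaysOn on the default instance.
# -Force suppresses the confirmation prompt; the SQL Server service is restarted.
Import-Module SqlServer
Enable-SqlAlwaysOn -ServerInstance 'SQLVM3' -Force

# Step 5 (on SQLVM1): point the listener's DR-side IP resource at the DR internal load balancer.
Import-Module FailoverClusters
$ipResourceName = '<listener-ip-resource-name-for-DR-subnet>'
$networkName    = '<cluster-network-name-for-DR-subnet>'
$ilbIp          = '<DR-internal-load-balancer-IP>'

Get-ClusterResource $ipResourceName | Set-ClusterParameter -Multiple @{
    Address    = $ilbIp
    ProbePort  = 59999                 # assumption: must match the load balancer health probe
    SubnetMask = '255.255.255.255'
    Network    = $networkName
    EnableDhcp = 0
}
```

The new parameters take effect after the IP address resource is taken offline and brought back online.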
50. Add SQLVM3 to the Failover Cluster
• Restart ADs as needed to ensure DNS entries are current
• RDP into SQLVM1 and run:
Add-ClusterNode -Name SQLVM3
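A quick check before and after the command can confirm that DNS is current and the node has joined. A minimal sketch, assuming an elevated session on SQLVM1:

```powershell
# Sketch only: run on SQLVM1.
# Before Add-ClusterNode: confirm that SQLVM3 resolves, which is why the DNS entries need to be current.
Resolve-DnsName -Name SQLVM3

# After Add-ClusterNode: SQLVM3 should be listed alongside the existing nodes with State = Up.
Get-ClusterNode | Format-Table Name, State
```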
160. Failover/Failback Routine
Step | Description | Documentation Reference
Test | Test the ASR configuration routinely and often for failing over from source site to DR one to ensure it works as expected | Run a test failover (disaster recovery drill) to ASR
Failover | Initiate failover to switch over to the replicated environment for DR or planned maintenance | About failover and failback in ASR - Modernized - ASR
Commit | Commit the changes made during the failover process to the replicated environment to ensure it's up-to-date | Run a failover during disaster recovery with ASR
Re-protect | Re-protect the production environment to ensure it's ready for the next failover | Reprotect Azure VMs to the primary region with ASR
Re-test | Re-test the ASR configuration to ensure it works as expected after re-protecting |
Fall back | Fall back to the production environment if the failover was initiated for planned maintenance or testing |
Re-commit | Re-commit the changes made during the failover process to the production environment to ensure it's up-to-date |
Re-protect | Re-protect the replicated environment to ensure it's ready for the next failover after falling back |
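The Test step of this routine can be driven from PowerShell instead of the portal. The following is a minimal sketch, assuming the Az.RecoveryServices and Az.Network modules, an authenticated session, and that Contoso.ins.RSV and Contoso.ins.DR-Site are resource groups. The recovery plan and test network names are placeholders, not values from the deck.

```powershell
# Sketch only: placeholders in angle brackets are not values from the deck.
$vault = Get-AzRecoveryServicesVault -Name 'contoso-RSV-eastus' -ResourceGroupName 'Contoso.ins.RSV'
Set-AzRecoveryServicesAsrVaultContext -Vault $vault

$plan    = Get-AzRecoveryServicesAsrRecoveryPlan -Name '<recovery-plan-name>'
$testNet = Get-AzVirtualNetwork -Name '<test-vnet-name>' -ResourceGroupName 'Contoso.ins.DR-Site'

# Test: start the drill into an isolated network and wait for the job to finish.
$job = Start-AzRecoveryServicesAsrTestFailoverJob -RecoveryPlan $plan `
    -Direction PrimaryToRecovery -AzureVMNetworkId $testNet.Id
while (($job.State -eq 'InProgress') -or ($job.State -eq 'NotStarted')) {
    Start-Sleep -Seconds 30
    $job = Get-AzRecoveryServicesAsrJob -Job $job
}
$job.State

# After verifying the app in the test network, remove the test VMs.
Start-AzRecoveryServicesAsrTestFailoverCleanupJob -RecoveryPlan $plan -Comment 'Drill complete'
```

The Failover and Commit steps have matching cmdlets in the same module, Start-AzRecoveryServicesAsrUnplannedFailoverJob and Start-AzRecoveryServicesAsrCommitFailoverJob, which also accept a recovery plan.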
Editor's notes
In my experience, many companies viewed implementing Business Continuity and Disaster Recovery (BCDR) as too technically complex and financially unfeasible, resulting in it becoming more of an academic exercise than an attainable, predictable, measurable, and verifiable business process.
With Azure Recovery Services, I have found this perception no longer accurate.
I used the Microsoft Cloud Workshop to showcase Azure BCDR with step-by-step guidance to:
• Configure a DR plan for a database app in the Azure West US 3 region
• Drill/rehearse the plan to fail over the app to the Azure East US region in a DR scenario
• Execute a failover to mimic conducting a DR episode
• Commit the failover upon verifying that the plan executed with expected results
Later:
• Follow a series of steps for reversing and falling back the app to its original region, West US 3
• Re-enable the protection, i.e., the DR plan, and ensure readiness for future DR needs
The slide deck includes screen captures of relevant processes and resource settings and serves as a reference for context and expected results. While the deck is not intended to replace the workshop instructions, and despite inconsistent resource names in some sections, the process flows and expected resource states are accurately depicted. It may be handy for understanding the how and what of executing the workshop exercises and tasks.
Once the application is deployed, ContosoWebLBPrimaryIP holds the public IP address and the DNS name of the app.
The landing page here is slightly different from that provided by the workshop.
Policy page
Customer info page
Policy Holder page
The presented demo infrastructure is not necessarily a recommendation.
For instance, instead of an internal LB, another option may be the one described at:
https://learn.microsoft.com/en-us/azure/architecture/example-scenario/infrastructure/multi-tier-app-disaster-recovery
Enable Disaster Recovery for the Contoso application
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#exercise-2-enable-disaster-recovery-for-the-contoso-application
Configure HA for the SQL Server tier
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#task-3-configure-ha-for-the-sql-server-tier
Select the Subnet of 10.0.2.0/24 and then add IPv4 10.0.2.100 and select OK. This is the IP address of the Internal Load Balancer that is in front of the SQLVM1 and SQLVM2 in the Data subnet running in the Primary Site.
SQLAlwaysOn
10.0.2.100
The automation account and associated runbooks can be placed in any region other than the source/primary region, because in a DR event the source/primary region is expected to be experiencing an outage.
Bastion host names differ due to an unplanned redeployment of Bastion in westus3.
Location: Any region that supports automation except for your primary region.
Repeat the steps to import and publish both runbooks.
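Importing and publishing the runbooks can also be scripted. A minimal sketch, assuming the Az.Automation module, an authenticated session, and that the automation account sits in a resource group named Contoso.ins.RSV. The account name and file paths are placeholders.

```powershell
# Sketch only: placeholders in angle brackets are not values from the deck.
$resourceGroup     = 'Contoso.ins.RSV'             # assumption: resource group of the automation account
$automationAccount = '<automation-account-name>'
$runbookFiles      = @('<path-to-first-runbook>.ps1', '<path-to-second-runbook>.ps1')

foreach ($file in $runbookFiles) {
    # Use the file name without its extension as the runbook name.
    $name = [System.IO.Path]::GetFileNameWithoutExtension($file)

    # -Type must match how the runbook was authored, for example PowerShell or PowerShellWorkflow.
    Import-AzAutomationRunbook -Path $file -Name $name -Type PowerShell `
        -ResourceGroupName $resourceGroup -AutomationAccountName $automationAccount

    # A runbook has to be published before a recovery plan can call it.
    Publish-AzAutomationRunbook -Name $name `
        -ResourceGroupName $resourceGroup -AutomationAccountName $automationAccount
}
```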
Configure DR for the SQL Server tier
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#task-3-configure-dr-for-the-sql-server-tier
Add-ClusterNode -Name SQLVM3
Exercise 3: Enable Backup for the Contoso application
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#exercise-3-enable-backup-for-the-contoso-application
Task 3: Enable Backup for the SQL Server tier
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#task-3-enable-backup-for-the-sql-server-tier
As needed:
Register-AzResourceProvider -ProviderNamespace Microsoft.SqlVirtualMachine
New-AzSqlVM -Name 'ci-sql1' -ResourceGroupName 'ci-w3' -SqlManagementType Full -Location 'westus3' -LicenseType PAYG
New-AzSqlVM -Name 'ci-sql2' -ResourceGroupName 'ci-w3' -SqlManagementType Full -Location 'westus3' -LicenseType PAYG
New-AzSqlVM -Name 'ci-sql3' -ResourceGroupName 'ci-eus' -SqlManagementType Full -Location 'eastus' -LicenseType PAYG
Task 5: Configure a public endpoint using Azure Front Door
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#task-5-configure-a-public-endpoint-using-azure-front-door
Task 2: Validate Disaster Recovery - Failover IaaS region to region
https://github.com/microsoft/MCW-Building-a-resilient-IaaS-architecture/blob/master/Hands-on%20lab/HOL%20step-by%20step%20-%20Building%20a%20resilient%20IaaS%20architecture.md#task-2-validate-disaster-recovery---failover-iaas-region-to-region