3. Background
Most major Cloud Service Providers
guarantee an SLA of 99.9% or higher for the services on
their platforms
But outages do happen for all types of cloud
services at every level
1. IaaS
2. PaaS
3. SaaS
Even the smallest application hosted on a cloud
platform uses 5-10 different services, which
pushes the overall probability of an outage beyond what any single SLA allows
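A rough sketch of the arithmetic behind this point. It assumes the application needs every service to be up, that failures are independent, and that each service delivers exactly 99.9%; none of these assumptions come from the slides. Under them, five services give roughly 99.50% combined availability and ten give roughly 99.00%.

```python
# Combined availability of services that must all be up at the same time.
# Assumptions (illustrative only): independent failures, identical 99.9% SLA.

HOURS_PER_YEAR = 24 * 365


def composite_availability(sla: float, n_services: int) -> float:
    """Availability of n services in series, each meeting the same SLA."""
    return sla ** n_services


for n in (1, 5, 10):
    availability = composite_availability(0.999, n)
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{n:2d} services: {availability:.4%} available, "
          f"about {downtime_hours:.1f} h/year of possible downtime")
```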
Good news – you can gain control
and minimize the effect of a failure
Resiliency is not an add-on. It must be
designed into the system and put into
operational practice
4. Principles of designing resilient apps
Define requirements
1. What does it mean for the application to be
available?
2. How much will potential downtime cost
your business?
3. How much downtime is acceptable?
4. How much data loss is acceptable during a
disaster?
5. Identify the RTO (recovery time objective) and RPO (recovery point objective)
Design
1. Failure mode analysis
a. Identify all of the components in the system
and their points of failure
b. For each component, identify potential failures
that could occur
c. Identify the likelihood of each failure
d. Determine how the application will respond and
recover
e. Consider tradeoffs in cost and application
complexity
2. Design resiliency at each point of failure
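One possible way to record a failure mode analysis, sketched in Python. The 1-5 scales, the likelihood-times-impact score, and the sample entries are illustrative assumptions, not values from the case study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str
    failure: str
    likelihood: int  # 1 (rare) to 5 (frequent); hypothetical scale
    impact: int      # 1 (minor) to 5 (full outage); hypothetical scale
    response: str    # how the application responds and recovers

    @property
    def risk(self) -> int:
        # Simple score used only to order the list; an assumption of this sketch.
        return self.likelihood * self.impact


def rank(modes):
    """Return failure modes ordered from highest to lowest risk score."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical entries for illustration.
modes = [
    FailureMode("Storage", "Transient read error", 4, 2, "Retry with backoff"),
    FailureMode("Web app", "Region outage", 2, 5, "Fail over to standby region"),
    FailureMode("Database", "Primary unavailable", 1, 5, "Fail over to replica"),
]

for m in rank(modes):
    print(f"{m.risk:2d}  {m.component}: {m.failure} -> {m.response}")
```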
Building a reliable application in the cloud is different from building a
reliable application in an enterprise setting
6. Case Study
The client is a major construction company headquartered
in the South Central region with operations
spread across the US
The company recently developed and deployed a system
on the Azure cloud platform which
• Enables it to streamline and optimize its construction
site operations
• Enables it to centralize data and gives the IT team better
control of it
• Provides valuable insight to its leadership team and
helps them in key decision making
During a recent Azure services outage, the system suffered major
unexpected downtime, which disrupted the company's
operations
The IT team reached out to WinWire to help assess the system and
take steps to achieve resiliency
7. Application Details
1. Azure web app – PaaS
2. Azure database – PaaS
3. Virtual Machine (VM) - IaaS
4. 3rd party API hosted on VM
5. Azure storage - PaaS
6. Application Insights - PaaS
7. ADF (Azure Data Factory) - PaaS
[Architecture diagram. Resource group in Azure South-Central US: Application Insights, App Service (Web App, API Apps), VM hosting OCR API, Storage Account, SQL databases, Azure Data Factory with Integration Gateway. Also shown: HTTPS traffic, Azure AD, and the on-premises ERP data source with its Integration Gateway.]
Requirements
1. RPO – 30 mins
2. RTO – 2 Hr.
8. Disaster Recovery Options
Option 1: Active/Passive with hot standby
1. Application level RPO: 15 min
a. Storage RPO: 15 min
b. SQL DB RPO: < 30 sec
2. RTO: < 30 sec
Option 2: Active/Passive with cold standby
1. Application level RPO: 15 min
a. Storage RPO: 15 min
b. SQL DB RPO: < 30 sec
2. RTO: 1 hour
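A small sketch of how the two options compare against the stated requirements (RPO 30 min, RTO 2 hr). It assumes the application-level RPO is the worst (largest) component RPO, and it uses the upper bound wherever the slide says "< 30 sec"; the class and field names are made up for the example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrOption:
    name: str
    component_rpo: dict  # component name -> RPO (upper bound)
    rto: timedelta       # upper bound

    @property
    def application_rpo(self) -> timedelta:
        # Assumption: the application can lose as much data as its worst component.
        return max(self.component_rpo.values())

    def meets(self, rpo_target: timedelta, rto_target: timedelta) -> bool:
        return self.application_rpo <= rpo_target and self.rto <= rto_target


component_rpo = {"storage": timedelta(minutes=15), "sql_db": timedelta(seconds=30)}
hot = DrOption("Option 1: hot standby", component_rpo, timedelta(seconds=30))
cold = DrOption("Option 2: cold standby", component_rpo, timedelta(hours=1))

RPO_TARGET = timedelta(minutes=30)
RTO_TARGET = timedelta(hours=2)

for option in (hot, cold):
    print(option.name, "| RPO", option.application_rpo, "| RTO", option.rto,
          "| meets requirements:", option.meets(RPO_TARGET, RTO_TARGET))
```

Under these assumptions both options satisfy the requirements, so the choice between them comes down to recovery speed and cost.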
9. Option 1: Active/Passive with Hot standby architecture
1. Traffic goes to the Active region, while
the other waits on hot standby
2. Redundancy at each component
level
3. All components of the application
are provisioned and running in both
Active and Standby regions
4. Automatic failover to the standby
region during planned or
unplanned outages
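The automatic failover idea can be illustrated with a simplified model of priority-style routing. This is not the Traffic Manager API, and the slides do not state which routing method was configured; the endpoint names and the `healthy` flag are hypothetical stand-ins for real health probes.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int         # lower number = preferred
    healthy: bool = True  # stand-in for the result of a health probe


def pick_endpoint(endpoints):
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


primary = Endpoint("south-central-us", priority=1)
standby = Endpoint("north-central-us", priority=2)
endpoints = [primary, standby]

print(pick_endpoint(endpoints).name)  # primary serves traffic while healthy
primary.healthy = False               # simulate an outage in the primary region
print(pick_endpoint(endpoints).name)  # standby takes over
primary.healthy = True                # primary recovers
print(pick_endpoint(endpoints).name)  # traffic fails back to the primary
```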
10. Option 1: Active/Passive with Hot standby architecture
[Architecture diagram. Traffic Manager sends primary traffic to Azure South-Central US (Primary) and failover traffic to Azure North-Central US (Standby). Both regions show Application Insights, App Service (Web App, API Apps), a VM hosting the OCR API, and Azure Data Factory; an Integration Gateway links to the on-prem ERP data source. Primary SQL databases are geo-replicated to secondary SQL databases in a SQL failover group, reached through an auto-failover SQL connection string. Storage labels: primary storage account, geo-replicated storage with read-only access (read-only secondary storage), sync job, standby storage account.]
11. Option 2: Active/Passive with Cold standby architecture
1. Traffic goes to the Active region, while
the other waits on cold standby
2. Redundancy at each component
level across Active and Standby
regions
3. Scripted provisioning: components
in the Standby region are
provisioned in the event of an outage
4. Scripted failover
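A minimal sketch of what a scripted failover runner could look like. The step names, their order, and the placeholder actions are hypothetical; real steps would call provisioning and failover scripts that the slides do not describe. The 1 hour target is the RTO quoted for Option 2.

```python
import time
from datetime import timedelta


def run_failover(steps, rto_target):
    """Run steps in order and stop at the first failure.

    Returns (completed step names, elapsed time, finished within RTO).
    """
    start = time.monotonic()
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    elapsed = timedelta(seconds=time.monotonic() - start)
    finished = len(completed) == len(steps)
    return completed, elapsed, finished and elapsed <= rto_target


# Placeholder actions: each would wrap a real script and return True on success.
steps = [
    ("provision standby App Service and VM", lambda: True),
    ("fail over SQL failover group", lambda: True),
    ("point application at standby storage", lambda: True),
    ("switch traffic to standby region", lambda: True),
]

completed, elapsed, within_rto = run_failover(steps, timedelta(hours=1))
print(f"{len(completed)}/{len(steps)} steps done in {elapsed}, within RTO: {within_rto}")
```

Stopping at the first failed step keeps a half-provisioned standby region from receiving traffic.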
12. Option 2: Active/Passive with Cold standby architecture
[Architecture diagram. Primary traffic goes to Azure South-Central US (Primary) and failover traffic to Azure North-Central US (Standby). The primary region shows Application Insights, App Service (Web App, API Apps), a VM hosting the OCR API, Azure Data Factory, and an Integration Gateway to the on-prem ERP data source. The standby region shows the same application components (Application Insights, App Service with Web App and API App, VM hosting OCR API, Azure Data Factory). Primary SQL databases are geo-replicated to secondary SQL databases in a SQL failover group, reached through an auto-failover SQL connection string. Storage labels: primary account, geo-replicated storage with read-only access (read-only secondary storage), utility, standby storage account.]
13. Option 1: Azure Resource Cost Estimation
Costs are taken from the Azure Pricing Calculator on the pay-as-you-go model and are subject to change.
Actual cost might vary by 10-15% based on utilization. Azure Pricing Calculator link: https://azure.microsoft.com/en-in/pricing/calculator/
Service type | Region (primary / standby) | Description | Estimated cost (primary / secondary)
App Service | South Central US / North Central US | Standard Tier; 1 S1 (1 Core(s), 1.75 GB RAM, 50 GB Storage) x 730 Hours; Windows OS | $73.00 / $0.00
Application Insights | South Central US / East US | 5 GB Logs collected, 0 Multi-step Web Tests | $0.00
Traffic Manager | North Central US / North Central US | 5 million DNS queries/mo, 4 Azure endpoint(s), 0 Fast Azure endpoint(s), 0 External endpoint(s), 0 Fast External endpoint(s), 0 million(s) of user measurements, 1 million(s) of data points processed | $6.14
Storage | South Central US / North Central US | Block Blob Storage, General Purpose V2, RA-GRS Redundancy, Hot Access Tier, 1,000 GB Capacity, 100,000 Write operations, 100,000 List and Create Container Operations, 100,000 Read operations, 1 Other operations, 1,000 GB Data Retrieval, 1,000 GB Data Write, 1,000 GB Geo-replication data transfer | $68.04
Storage | North Central US / North Central US | Block Blob Storage, General Purpose V2, LRS Redundancy, Hot Access Tier, 1,000 GB Capacity, 100,000 Write operations, 100,000 List and Create Container Operations, 100,000 Read operations, 1 Other operations, 1,000 GB Data Retrieval, 1,000 GB Data Write | $21.84
Azure SQL Database | South Central US / North Central US | Single Database, DTU Purchase Model, Standard Tier, S1: 20 DTUs, 250 GB included storage per DB, 2 Database(s) x 730 Hours, 5 GB Retention | $29.43 / $0.00
Virtual Machines | North Central US | 1 D1 (1 vCPU(s), 3.5 GB RAM) x 730 Hours; Windows (OS Only); Pay as you go; 0 managed OS disks – S4, 100,000 transaction units | NA / $0.00
Virtual Machines | South Central US | 2 D1 (1 vCPU(s), 3.5 GB RAM) x 730 Hours; Windows (OS Only); Pay as you go; 0 managed OS disks – S4, 100,000 transaction units | $225.80
Monthly Total | | | $402.41 / $27.98
Annual Total | | | $4,828.92 / $335.76
14. Option 2: Azure Resource Cost Estimation
Costs are taken from the Azure Pricing Calculator on the pay-as-you-go model and are subject to change.
Actual cost might vary by 10-15% based on utilization. Azure Pricing Calculator link: https://azure.microsoft.com/en-in/pricing/calculator/
Service type | Region (primary / standby) | Description | Estimated cost (primary / secondary)
App Service | South Central US / North Central US | Standard Tier; 1 S1 (1 Core(s), 1.75 GB RAM, 50 GB Storage) x 730 Hours; Windows OS | $73.00 / $73.00
Application Insights | South Central US / East US | 5 GB Logs collected, 0 Multi-step Web Tests | $0.00
Traffic Manager | North Central US / North Central US | 5 million DNS queries/mo, 4 Azure endpoint(s), 0 Fast Azure endpoint(s), 0 External endpoint(s), 0 Fast External endpoint(s), 0 million(s) of user measurements, 1 million(s) of data points processed | $6.14
Storage | South Central US / North Central US | Block Blob Storage, General Purpose V2, RA-GRS Redundancy, Hot Access Tier, 1,000 GB Capacity, 100,000 Write operations, 100,000 List and Create Container Operations, 100,000 Read operations, 1 Other operations, 1,000 GB Data Retrieval, 1,000 GB Data Write, 1,000 GB Geo-replication data transfer | $68.04
Storage | North Central US / North Central US | Block Blob Storage, General Purpose V2, LRS Redundancy, Hot Access Tier, 1,000 GB Capacity, 100,000 Write operations, 100,000 List and Create Container Operations, 100,000 Read operations, 1 Other operations, 1,000 GB Data Retrieval, 1,000 GB Data Write | $21.84
Azure SQL Database | South Central US / North Central US | Single Database, DTU Purchase Model, Standard Tier, S1: 20 DTUs, 250 GB included storage per DB, 2 Database(s) x 730 Hours, 5 GB Retention | $29.43 / $29.43
Virtual Machines | North Central US | 1 D1 (1 vCPU(s), 3.5 GB RAM) x 730 Hours; Windows (OS Only); Pay as you go; 0 managed OS disks – S4, 100,000 transaction units | NA / $130.90
Virtual Machines | South Central US | 2 D1 (1 vCPU(s), 3.5 GB RAM) x 730 Hours; Windows (OS Only); Pay as you go; 0 managed OS disks – S4, 100,000 transaction units | $225.80 / NA
Monthly Total | | | $657.58
Annual Total | | | $7,890.96
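The totals from the two cost slides can be put side by side with a short calculation. This sketch assumes the slide 13 total is the sum of its primary and secondary columns and that the single slide 14 total already covers both regions; the variable names are illustrative.

```python
from decimal import Decimal

# Monthly totals as printed on the two cost slides.
slide13_monthly = Decimal("402.41") + Decimal("27.98")  # primary + secondary columns
slide14_monthly = Decimal("657.58")                     # single combined total

diff_monthly = slide14_monthly - slide13_monthly
diff_annual = diff_monthly * 12

print(f"Slide 13 monthly total: ${slide13_monthly}")
print(f"Slide 14 monthly total: ${slide14_monthly}")
print(f"Difference: ${diff_monthly} per month, ${diff_annual} per year")
```

`Decimal` is used so that the currency arithmetic stays exact to the cent.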
15. Post implementation situation
1. Given the insignificant difference in cost between the two options, Option 1
was recommended and implemented
2. During a number of rigorous exercises, it was observed that the system can fail over
and fail back with little to no downtime
3. No manual intervention is needed; failover and failback operations are
completely scripted and automatic
4. The system is now resilient enough to withstand, or minimize the impact of, expected
or unexpected outages of Azure services