Benefits of IaaS
QUICKER MIGRATION TIMES THAN OTHER CLOUD
OFFERINGS
ABILITY TO KEEP SIMILAR ARCHITECTURE
INTRODUCE CLOUD SERVICES AND FEATURES
REMOVE THE DATACENTER
Insanity Is Doing the Same Thing
Over and Over Again and
Expecting Different Results
~Einstein
*Also Infrastructure folks who continually try to lift and
shift the infrastructure for database workloads…
Migrate the Workload, not the
Hardware
Servers may not have been sized appropriately for the workload.
Workload of database may have changed over time.
May cost you more in licensing than what the workload actually
requires.
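A minimal sketch of sizing from the workload rather than from the existing hardware. It assumes you have already measured peak CPU utilization on the current server; the function name, the 20% headroom and the sample numbers are illustrative assumptions, not figures from this deck.

```python
import math

def vcpus_needed(current_cores: int, peak_cpu_pct: float, headroom_pct: float = 20.0) -> int:
    """Estimate vCPUs from observed peak CPU usage instead of the existing core count.

    Simplification (assumption): one on-prem core is treated as one vCPU.
    """
    if not 0 < peak_cpu_pct <= 100:
        raise ValueError("peak_cpu_pct must be in (0, 100]")
    busy_cores = current_cores * peak_cpu_pct / 100.0
    return max(1, math.ceil(busy_cores * (1 + headroom_pct / 100.0)))

# Hypothetical example: a 32-core server that peaks at 35% CPU.
print(vcpus_needed(32, 35.0))
```

Licensing is typically tied to the core count, so sizing to the measured workload is where the licensing savings come from.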
For different databases, there are
different tools to assist:
SQL Server: DMVs, PerfMon, community scripts (Randal, Klee, etc.), Redgate SQL Monitor
Oracle: AWR, OEM, ASH, SASH, Statspack, Tracing
MySQL: Solarwinds DPA, Instrumental, Panopta
Architect for the Cloud
Deploy all tiers to the cloud
Avoid ingress or egress charges
Reduce latency
Remove complexity and centrally locate to
the cloud
Refactor processes that utilize
large percentages of resources
and network. In the cloud, this
has an impactful cost.
A lift and shift does not equal
taking what you have on-prem
and duplicating it. Success
means you take the database
and lift and shift it with the
support of cloud services.
https://azure.microsoft.com/en-us/pricing/details/virtual-machines/series/
Understand
IaaS VM
Series
• A and B-series commonly won’t work for
databases.
• D-series can work for some, but consider using the same series as the production VMs with fewer resources
• L and H-series are outliers for database workloads.
• Identify workload needs
• D-series is for general use
• E-series and M-series are the most common VMs in the
database industry
• E-series for average production databases
• M-series, but verify IO storage/network limits!
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/constrained-vcpu
When One VM Is Too Much: Constrained VMs
• Limits the active vCPU count to what the application licensing requires for database and app workloads, while the rest of the VM specs stay the same as the full-size VM
• Listed alongside the matching full-size VMs of the same series in the Azure Pricing Calculator
• Share storage between databases or apps
• Before choosing, ensure your product licensing supports constrained vCPU VMs
• When combining workloads, carefully match them on IO and memory, not just vCPU usage.
Specialized
Constrained
vCPU VMs
Name vCPU Specs
Standard_M8-2ms 2 Same as M8ms
Standard_M8-4ms 4 Same as M8ms
Standard_M16-4ms 4 Same as M16ms
Standard_M16-8ms 8 Same as M16ms
Standard_M32-8ms 8 Same as M32ms
Standard_M32-16ms 16 Same as M32ms
Standard_M64-32ms 32 Same as M64ms
Standard_M64-16ms 16 Same as M64ms
Standard_M128-64ms 64 Same as M128ms
Standard_M128-32ms 32 Same as M128ms
Standard_E4-2s_v3 2 Same as E4s_v3
Standard_E8-4s_v3 4 Same as E8s_v3
Standard_E8-2s_v3 2 Same as E8s_v3
Standard_E16-8s_v3 8 Same as E16s_v3
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/constrained-vcpu
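The naming pattern in the table above can be read mechanically. The sketch below assumes that the first number in a constrained size name is the vCPU count of the full-size VM and the second is the active vCPU count, as the table suggests. The function name and the regular expression are illustrative assumptions, not part of any Azure tooling.

```python
import re

# Matches names such as "Standard_M8-2ms" or "Standard_E8-4s_v3":
# family letters, base vCPU count, "-", active vCPU count, then the suffix.
_CONSTRAINED = re.compile(r"^Standard_([A-Z]+)(\d+)-(\d+)([a-z]+(?:_v\d+)?)$")

def parse_constrained(name: str) -> dict:
    """Split a constrained vCPU size name into its base size and vCPU counts."""
    m = _CONSTRAINED.match(name)
    if m is None:
        raise ValueError(f"not a constrained vCPU size name: {name}")
    family, base, active, suffix = m.group(1), int(m.group(2)), int(m.group(3)), m.group(4)
    return {
        "base_size": f"Standard_{family}{base}{suffix}",  # the "Same as" column
        "base_vcpus": base,
        "active_vcpus": active,  # the count that per-core licensing would see
    }

print(parse_constrained("Standard_M8-2ms"))
```

For a per-core licensed product, the active vCPU count is the number to compare against the license, while memory and IO should be compared against the base size.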
https://www.oracle.com/database/technologies/high-availability/maa.html
Architect for
the Cloud
• Maximum Availability Architecture
• Different names for different vendors.
• Get a clear understanding of the SLA uptime for the business
and environment.
• Onprem datacenters are not the same as cloud architecture.
• Pivot products and services to cover what you need.
• High Availability
• Identify what HA means to stakeholders.
• Often, it’s specific features, not a product, then marry these to
a cloud product which:
• Matches the IaaS architecture
• Doesn’t introduce overhead
• Has vendor support
• Identify what cloud services may duplicate or simulate
the same feature if unavailable.
Azure Location Concepts
Concept             Description
Region              Multiple datacenters within a specific perimeter, connected through a low-latency network.
Geography           A specific location area. The area may have more than one Azure region.
Availability Zone   Physical locations within a region. Each zone has one or more datacenters equipped with independent power, cooling and network.
Geo-Region          Current region recommended with the appropriate services and redundancy for the database and other workloads.
Secondary Region    Utilized to spread a workload for HA and/or recovery.
Use Availability Zones
• High availability (HA) offering to protect data and apps from datacenter failures.
• Contain multiple locations
within a single Azure region.
• Not all products or services are
available for AZ or in every
region.
• No additional cost to deploy
VMs in an Availability Zone.
https://docs.microsoft.com/en-us/azure/availability-zones/az-overview
Disaster
Recovery
• Along with AZ/AG, etc.
• Use DR products that best support the cloud
• Always On Availability Groups and Oracle Data Guard
• Implement advanced automation features to remove manual intervention
• Clearly identify the RPO (Recovery Point Objective) and RTO (Recovery Time Objective) for your business.
• Ensure that the HA, DR, backup and recovery decisions meet these and have been fully TESTED.
Storage is
SEPARATE
and
Important
• Ensure you know the IO workload for your
database going to the cloud
• Understand both the throughput (MB/s) and the IOPS for the database.
• Oracle has demonstrated, on average,
much higher demands for IO than MSSQL,
MySQL or PostgreSQL.
• Storage is separate to ensure the right
combination in IaaS can be reached.
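Throughput and IOPS are linked by the average I/O size, which is why both numbers matter. A small sketch of that arithmetic follows; the function name and the sample I/O sizes are illustrative assumptions.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Bandwidth (MiB/s) implied by an IOPS rate at a given average I/O size in KiB."""
    return iops * io_size_kib / 1024.0

# Same IOPS, very different bandwidth demand:
print(throughput_mib_s(5000, 8))     # small random I/O
print(throughput_mib_s(5000, 1024))  # large sequential I/O
```

A workload can hit the bandwidth cap long before the IOPS cap, or the reverse, depending on its I/O size.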
Storage
Considerations
What is the storage to be used for?
• Data: OLTP, DSS, OLAP, Big Data?
• Logging
• Backup
Ensure that backups and data
refresh requirements are calculated
into the IO demands for the
database.
Ultra Disk
Ultra Disk Offerings
Disk Size (GiB)                          IOPS Range   Throughput Range (MB/s)
4                                        1,200        300
8                                        2,400        600
16                                       4,800        1,200
32                                       9,600        2,000
64                                       19,200       2,000
128                                      38,400       2,000
256                                      76,800       2,000
512                                      160,000      2,000
1,024-65,536 (in increments of 1 TiB)    160,000      2,000
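One way to use the offerings table above is to look up the smallest disk size that satisfies capacity, IOPS and throughput together. The sketch below copies the tier values from that table; the function name is an illustrative assumption, and the figures should be checked against current Azure documentation before use.

```python
import math

# (size GiB, max IOPS, max MB/s) for the fixed sizes in the offerings table.
ULTRA_TIERS = [
    (4, 1_200, 300), (8, 2_400, 600), (16, 4_800, 1_200), (32, 9_600, 2_000),
    (64, 19_200, 2_000), (128, 38_400, 2_000), (256, 76_800, 2_000),
    (512, 160_000, 2_000),
]

def smallest_ultra_size(capacity_gib: int, iops: int, mbps: int):
    """Return the smallest listed size (GiB) meeting all three needs, or None."""
    for size, max_iops, max_mbps in ULTRA_TIERS:
        if size >= capacity_gib and max_iops >= iops and max_mbps >= mbps:
            return size
    # Above 512 GiB the table lists 1 TiB increments up to 65,536 GiB.
    size = max(1, math.ceil(capacity_gib / 1024)) * 1024
    if size <= 65_536 and iops <= 160_000 and mbps <= 2_000:
        return size
    return None

print(smallest_ultra_size(100, 20_000, 500))
```

Note that a small database with a high IOPS need ends up on a larger disk than its capacity alone would suggest.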
Ultradisks
• Often the first recommendation by Infra
• Be aware of the limitations before
recommending for database workloads:
• Oracle 12.2 and later is supported
• Only supports un-cached reads and un-cached writes
• Doesn't support disk snapshots, VM images, availability
sets, Azure Dedicated Hosts, or Azure disk encryption
• No integration with Azure Backup or Azure Site Recovery
• Offers up to 16 TiB per region per subscription
unless upped via support.
• Isn’t available in all regions.
           Capacity per disk (GiB)   IOPS per disk   Throughput per disk (MB/s)
Minimum    4                         100             1
Maximum    65,536                    160,000         2,000
https://docs.microsoft.com/en-us/azure/virtual-machines/disks-enable-ultra-ssd#ga-scope-and-limitations
GiB * .05, MBPs * 1.01, IOPs * .12, vCPU * 4.83
Types of cache
Settings
• Available with Premium Storage
• A multi-tier caching technology, aka BlobCache
• OS disk: ReadWrite, which is the default, is fine here, but not for datafiles.
• ReadOnly cache is appropriate for datafiles, as it caches reads while letting writes pass through to disk.
• Limit of 4,095 GiB per individual premium disk
• Any disk above a P40 that is allocated in its entirety will silently have read caching disabled.
• Larger disks are preferably used without caching, otherwise the additional space is wasted. For a P50, just allocate 4,095 of the 4,096 GiB.
• Use smaller disks and choose to stripe and mirror.
• M-series available and VM series dependent.
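The 4,095 GiB caching limit above is easy to encode as a pre-deployment check. This is a minimal sketch; the constant and function names are illustrative assumptions.

```python
HOST_CACHE_LIMIT_GIB = 4095  # per-disk host caching limit stated on this slide

def read_cache_active(provisioned_gib: int) -> bool:
    """True if a premium disk of this size can still use ReadOnly host caching."""
    return provisioned_gib <= HOST_CACHE_LIMIT_GIB

def cacheable_allocation_gib(requested_gib: int) -> int:
    """Largest size to provision that still keeps host caching available."""
    return min(requested_gib, HOST_CACHE_LIMIT_GIB)

print(read_cache_active(4096), cacheable_allocation_gib(4096))
```

Running a check like this in the deployment scripts guards against caching being silently disabled by a fully allocated large disk.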
IO Throttling
• Why does it happen?
• No, you can’t have all the resources for yourself.
• What can be involved?
• It’s not just the database.
• How to identify it?
• What to do when it is identified?
https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-memory?toc=/azure/virtual-machines/linux/toc.json&bc=/azure/virtual-machines/linux/breadcrumb/toc.json
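Identifying throttling comes down to comparing the workload's demand with each limit in the path, since both the VM and the attached disks have their own caps. The sketch below is one way to frame that comparison; the class, the function and every number in the example are hypothetical, and it assumes I/O is spread evenly across the disks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Limit:
    iops: float
    mbps: float

def throttle_points(demand: Limit, vm: Limit, disks: List[Limit]) -> List[str]:
    """List which caps the demand exceeds: VM-level, disk-level, or both."""
    disk_total = Limit(sum(d.iops for d in disks), sum(d.mbps for d in disks))
    hits = []
    if demand.iops > vm.iops:
        hits.append("VM IOPS")
    if demand.mbps > vm.mbps:
        hits.append("VM MB/s")
    if demand.iops > disk_total.iops:
        hits.append("disk IOPS")
    if demand.mbps > disk_total.mbps:
        hits.append("disk MB/s")
    return hits

# Hypothetical workload and limits, for illustration only.
print(throttle_points(Limit(30_000, 600), Limit(25_600, 384), [Limit(5_000, 200)] * 4))
```

If only the disk caps are exceeded, more or faster disks help; if the VM caps are exceeded, adding disks changes nothing and the VM size or the workload has to change.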
Bring in
Additional
Solutions
• High IOPS:
• MBPs: Azure NetApp Files
• Higher IO throughput: consider Silk, Flashgrid Storage, Pure Storage or Excelero.
• Consider disk striping of smaller disks and parallel processing at the database level.
• Backups, batch loading and other challenges:
• Offload backups with secondary backup solutions.
• Refactor batch processing with other services (Azure Data Factory, Azure Analysis Services, Databricks, etc.)
Azure NetApp
Files
• Fully managed PaaS storage service in Microsoft Azure
• All-flash bare-metal storage
• Performance depends only on the VM's NIC, not on the VM's disk limits.
• *Available in Standard, Premium (common) and Ultra (optimal)
• ANF is native to Azure
                  Azure Files   Premium Files   Azure NetApp Files             Premium Disk
Performance       1K IOPs       100K IOPs       320K IOPs                      20K IOPs
Capacity Pool     5TB           100TB           500TB                          32TB
AD Integration    Azure AD      N/A             Bring Your Own AD / Azure AD   N/A
Protocol          SMB           SMB             NFS & SMB                      Disk
Data Protection   LRS Only      Snapshots       Back Up Tools                  Snapshots
*Be aware of pricing with scaling to meet IO
FAQs About Azure NetApp Files | Microsoft Docs
When To Go
Old-School
• Depending on the combination of storage, striping
and RAID, performance can vary greatly.
• Verify that disk is striped correctly, (log creation
commands and document.)
• Consider smaller disk size and stripe vs. larger,
single drive to offer better performance.
• In Linux, consider huge pages and use LVM (Logical Volume Manager) or Oracle ASM (Automatic Storage Management) to provide advanced features for diskgroup layout.
• Keep an eye on disk sector size, (there’s a bug
requiring 512 byte sector size in Oracle 12.1)
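The case for striping smaller disks is simple arithmetic: per-disk limits add up across the stripe until the VM-level cap is reached. The sketch below shows that calculation; the function name and all per-disk and VM figures in the example are illustrative assumptions and should be replaced with the published limits for the sizes you actually use.

```python
def striped_capacity(disk_iops: int, disk_mbps: int, count: int,
                     vm_iops: int, vm_mbps: int):
    """Effective (IOPS, MB/s) of `count` striped disks, capped by the VM limits."""
    return min(disk_iops * count, vm_iops), min(disk_mbps * count, vm_mbps)

# Hypothetical figures: four smaller disks vs. one larger disk on the same VM.
print(striped_capacity(5_000, 200, 4, vm_iops=25_600, vm_mbps=384))
print(striped_capacity(7_500, 250, 1, vm_iops=25_600, vm_mbps=384))
```

In this made-up case the stripe more than doubles the IOPS of the single disk, while throughput stops at the VM cap, which is why the VM limits need checking alongside the disk layout.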
Failure Due to
Backups
• Modernize the way the database is backed up and restored if RMAN is 40% of total IO in AWR or the database has a small window to back up.
• Archaic backup and data refresh strategies can
impact a cloud environment heavily in IO and
network latency
• Snapshot technology with database consistency
should be your FIRST choice in backup solutions for
large databases.
• Oracle AWR can demonstrate the impact on the
overall database workload of RMAN and
datapump jobs.
• The Profiler can identify the workload impact in
SQL Server.
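The 40% guideline above can be turned into a quick check once the backup IO and total IO figures have been pulled from AWR or an equivalent report. This is a minimal sketch; the function names and sample values are illustrative assumptions, and both inputs must use the same unit.

```python
def backup_io_share(backup_io: float, total_io: float) -> float:
    """Fraction of total database IO attributable to backup jobs."""
    if total_io <= 0:
        raise ValueError("total_io must be positive")
    return backup_io / total_io

def should_modernize_backups(backup_io: float, total_io: float,
                             threshold: float = 0.40) -> bool:
    """True when backups consume at least the threshold share of total IO."""
    return backup_io_share(backup_io, total_io) >= threshold

# Hypothetical numbers, e.g. GB read and written over the same reporting window.
print(should_modernize_backups(450, 1000))
```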
Simplify the
Shift to the
Cloud
• Migrate your tools that you already use to
monitor and manage the database on-prem into
the cloud whenever possible.
• For Oracle, we implement Oracle Enterprise
Manager, (Cloud Control) to ensure the
cloud environment looks just like their
onprem one.
• Redgate SQL Monitor, Solarwinds SQL
Sentry, Dynatrace, Idera Uptime
Infrastructure Monitor, etc.
• Use features to automate OS patching using
Azure Linux/Windows automated patching
service.
• Incorporate DevOps automation to the cloud
changes FIRST.
It’s Not Just
Infrastructure
• No matter if during the migration or when there are
issues:
• Infrastructure support will be the first line of
defense.
• Database workload will be an afterthought.
• Data support may be a request only option.
• First inclination is to “throw iron” at the problem.
• Demand to look at the code, database design,
etc.
• If you fix the real cause, you fix it once vs.
revisiting it over and over.
• Do have support take advantage of advanced
Azure tools to help identify where the problem
is, (IO, memory, CPU)
Manage with
What You Know
• Use the cloud offerings of the tools you already use on-prem.
• If you can deploy your existing on-prem tool on a VM, consider doing this (Oracle Enterprise Manager, Redgate, Idera, Solarwinds, etc.). If it’s cloud ready, do it!
• Keep backup and replication tools as often as you are able; don’t create larger learning curves than what is required.
Simulate PaaS in
IaaS
• Use Azure SQL Managed Instance for SQL Server
• Use Lifecycle Management Pack with Oracle
Enterprise Manager to automate
monitoring, management and database
patching.
• Use Linux Automated Patching, (preview) to
automate OS patching of VMs.
• Introduce Azure services to simplify the
current products used onprem
• Automate using DevOps, including
deployment builds with Terraform, Ansible,
etc.
Review: Database Workloads on IaaS
• Know the infrastructure.
• Migrate the workload, not the on-prem hardware.
• Know what the cause of the problem is; don’t guess.
• Bring in existing tools that are cloud enabled.
• Know what tools are available in the cloud and, when stuck, bring in Azure support.
References
SQL Server Performance Guidelines on Azure: Checklist: Best practices & guidelines - SQL Server on
Azure VM | Microsoft Docs
Oracle on Azure: Oracle solutions on Microsoft Azure - Azure Virtual Machines | Microsoft Docs
Understanding AZ and AS: Availability options for Azure Virtual Machines - Azure Virtual Machines |
Microsoft Docs
Virtual Machine and Disk Performance: Virtual machine and disk performance - Azure Virtual
Machines | Microsoft Docs
Azure Premium Storage: Azure Premium Storage: Design for high performance - Azure Virtual
Machines | Microsoft Docs
Azure Network Performance for IaaS: Optimize VM network throughput | Microsoft Docs
Infrastructure Automation: Use infrastructure automation tools - Azure Virtual Machines |
Microsoft Docs
Ultradisks for Azure Linux VMs:
• https://docs.microsoft.com/en-us/azure/virtual-machines/linux/disks-enable-ultra-ssd
P10 is my favored OS Disk- try to always use Premium SSD, available in the VM series with the designation of “S” in the name.
P30-P50 are the most common for datafiles, and we turn on ReadOnly host caching to achieve what we need. The P50 is over the limit of 4,095 GiB, so just don’t allocate the last 1 GiB and capture a huge performance benefit!
Azure Premium Storage has a multi-tier caching technology called BlobCache, which uses a combination of the host vRAM and local SSD for caching I/O. By default, this cache setting is set to ReadWrite for OS disks, which is the disk on which the Linux OS resides, and ReadOnly for data disks, which are the disks on which Oracle database files might reside.
As the name suggests, ReadWrite caches both read I/O and write I/O from the VM, and because writes are not persisted directly to storage, this is unsuitable for database applications. Also as the name suggests, ReadOnly caches only read I/O, allowing write I/O to write-through directly to storage, which is appropriate for databases.
No one can have it all. One of the benefits of the cloud is also one of the challenges: how to give everyone a share. Throttling occurs
https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-memory?toc=/azure/virtual-machines/linux/toc.json&bc=/azure/virtual-machines/linux/breadcrumb/toc.json