IaaS for DBAs in Azure

Connecting Data to People, Accelerating Innovation and a Force of Nature at Delphix
Sep 16, 2021


  1. Infrastructure DBA in the Cloud, aka “I Have Control Issues”
  2. About me: Principal Data Engineer, @DBAKevlar, SME for Oracle on Azure at Microsoft
  3. “The best DBAs always have control issues” ~Kellyn Gorman Still a DBA
  5. How Is IaaS Different from PaaS?
  7. Insanity Is Doing the Same Thing Over and Over Again and Expecting Different Results ~Einstein *Also Infrastructure folks who continually try to lift and shift the infrastructure for database workloads…
  8. Migrate the Workload, not the Hardware
     • Servers may not have been sized appropriately for the workload.
     • The workload of the database may have changed over time.
     • Duplicating the hardware may cost you more in licensing than the workload actually requires.
     • For different databases, there are different tools to assist:
       • SQL Server: DMVs, PerfMon, scripting (Randal, Klee, etc.), Redgate SQL Monitor
       • Oracle: AWR, OEM, ASH, SASH, Statspack, tracing
       • MySQL: SolarWinds DPA, Instrumental, Panopta
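As a minimal sketch of sizing from the workload rather than the hardware, the helper below turns observed samples (for example, CPU in use per snapshot from AWR or PerfMon, converted to cores) into vCPU and memory targets. The 95th-percentile choice, the 1.2 headroom factor, and the function names are illustrative assumptions, not a vendor sizing method.

```python
import math
from statistics import quantiles


def p95(samples):
    """95th percentile of the samples (inclusive interpolation)."""
    return quantiles(samples, n=100, method="inclusive")[94]


def size_from_workload(cpu_cores_used, memory_gib_used, headroom=1.2):
    """Hypothetical helper: derive targets from what the database actually used.

    cpu_cores_used:  per-snapshot CPU in use, expressed in cores
    memory_gib_used: per-snapshot memory used by the database, in GiB
    headroom:        assumed safety factor (not a vendor recommendation)
    """
    vcpus = math.ceil(p95(cpu_cores_used) * headroom)
    memory_gib = math.ceil(max(memory_gib_used) * headroom)
    return vcpus, memory_gib


if __name__ == "__main__":
    # Imagine these samples came from a 32-core on-prem server.
    print(size_from_workload([2, 3, 4, 5, 10], [90, 101, 95]))  # (11, 122)
```

In this made-up example the target of 11 vCPUs is far below the 32 cores of the source server, which is exactly the licensing gap the slide warns about.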
  9. Architect for the Cloud
     • Deploy all tiers to the cloud to:
       • Avoid ingress or egress charges
       • Reduce latency
       • Remove complexity and centrally locate in the cloud
     • Refactor processes that use large percentages of resources and network. In the cloud, this has a significant cost.
     • A lift and shift does not mean taking what you have on-prem and duplicating it. Success means lifting and shifting the database with the support of cloud services.
  10. Understand IaaS VM Series
     • A- and B-series commonly won’t work for databases.
     • D-series can work for some; consider matching the series used for production VMs, but with fewer resources.
     • L- and H-series are outliers for database workloads.
     • Identify workload needs:
       • D-series is for general use.
       • E-series and M-series are the most common VMs in the database industry.
       • E-series for average production databases
       • M-series for memory-intensive databases, but verify IO, storage, and network limits!
     (us/pricing/details/virtual-machines/series/)
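One way to make this guidance repeatable is a small rule of thumb keyed on memory per vCPU. This is a hypothetical sketch: the 4 and 8 GiB-per-vCPU cut-offs are assumptions based on general-purpose (D) sizes carrying less memory per vCPU than memory-optimized (E, M) sizes, and a real choice still needs the IO, storage, and network limits of the specific size.

```python
def suggest_series(vcpus, memory_gib, production=True):
    """Hypothetical rule of thumb encoding the slide's guidance.

    A-, B-, L- and H-series are never suggested (poor or outlier fits for
    databases). The GiB-per-vCPU cut-offs are illustrative assumptions.
    """
    gib_per_vcpu = memory_gib / vcpus
    if not production and gib_per_vcpu <= 4:
        return "D"  # general use; can work for some non-production databases
    if gib_per_vcpu <= 8:
        return "E"  # average production databases
    return "M"      # very memory-heavy: verify IO, storage and network limits!


if __name__ == "__main__":
    print(suggest_series(8, 32, production=False))  # D
    print(suggest_series(16, 128))                  # E
    print(suggest_series(64, 1700))                 # M
```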
  11. VM Series H-Series….
  12. When One VM Is Too Much: Constrained VMs
     • Allows the vCPU count to be isolated to match application licensing for database and app workloads
     • Listed with the existing series VMs in the Azure Pricing Calculator
     • Share storage between databases or apps
     • Before choosing, ensure your product licensing supports constrained vCPU VMs.
     • When combining workloads, carefully match them on IO and memory, not just vCPU usage.
     (machines/windows/constrained-vcpu)
  13. Specialized Constrained vCPU VMs
     | Name               | vCPU | Specs           |
     |--------------------|------|-----------------|
     | Standard_M8-2ms    | 2    | Same as M8ms    |
     | Standard_M8-4ms    | 4    | Same as M8ms    |
     | Standard_M16-4ms   | 4    | Same as M16ms   |
     | Standard_M16-8ms   | 8    | Same as M16ms   |
     | Standard_M32-8ms   | 8    | Same as M32ms   |
     | Standard_M32-16ms  | 16   | Same as M32ms   |
     | Standard_M64-32ms  | 32   | Same as M64ms   |
     | Standard_M64-16ms  | 16   | Same as M64ms   |
     | Standard_M128-64ms | 64   | Same as M128ms  |
     | Standard_M128-32ms | 32   | Same as M128ms  |
     | Standard_E4-2s_v3  | 2    | Same as E4s_v3  |
     | Standard_E8-4s_v3  | 4    | Same as E8s_v3  |
     | Standard_E8-2s_v3  | 2    | Same as E8s_v3  |
     | Standard_E16-8s_v3 | 8    | Same as E16s_v3 |
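The naming pattern in the table (parent size, then the active vCPU count after a hyphen) can be parsed to find the parent size whose specs apply. The license math is a hedged sketch: it assumes two vCPUs per processor license, which is how Oracle has treated hyperthreaded vCPUs in authorized clouds; confirm your own licensing terms before relying on it.

```python
import math
import re

# Constrained sizes look like "Standard_M64-16ms" or "Standard_E8-4s_v3":
# family letters, parent vCPU count, "-", active vCPU count, then the suffix
# and optional version shared with the parent size.
_CONSTRAINED = re.compile(r"^Standard_([A-Z]+)(\d+)-(\d+)([a-z]*)(_v\d+)?$")


def parse_constrained(name):
    """Return (parent size name, active vCPUs) for a constrained vCPU size."""
    match = _CONSTRAINED.match(name)
    if match is None:
        raise ValueError(f"not a constrained vCPU size name: {name}")
    family, parent_vcpus, active, suffix, version = match.groups()
    parent = f"Standard_{family}{parent_vcpus}{suffix}{version or ''}"
    return parent, int(active)


def processor_licenses(active_vcpus, vcpus_per_license=2):
    """Assumed policy: one processor license per two (hyperthreaded) vCPUs."""
    return math.ceil(active_vcpus / vcpus_per_license)


if __name__ == "__main__":
    parent, active = parse_constrained("Standard_M64-16ms")
    print(parent, active, processor_licenses(active))  # Standard_M64ms 16 8
```

Under the same assumption, licensing all 64 vCPUs of a full Standard_M64ms would take 32 licenses instead of 8, while memory, storage, and IO stay those of the parent size.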
  14. Architect for the Cloud
     • Maximum Availability Architecture
       • Different names for different vendors.
       • Get a clear understanding of the SLA uptime for the business and environment.
       • On-prem datacenters are not the same as cloud architecture.
       • Pivot products and services to cover what you need.
     • High Availability
       • Identify what HA means to stakeholders.
       • Often, it’s specific features, not a product; then marry these to a cloud product that:
         • Matches the IaaS architecture
         • Doesn’t introduce overhead
         • Has vendor support
       • Identify which cloud services may duplicate or simulate the same feature if it is unavailable.
  15. Azure Location Concepts
     | Concept           | Description |
     |-------------------|-------------|
     | Region            | Multiple datacenters within a specific perimeter, connected through a low-latency network |
     | Geography         | A specific location area; it may contain more than one Azure region |
     | Availability Zone | Physically separate locations within a region. Each zone has one or more datacenters equipped with independent power, cooling, and networking. |
     | Geo-Region        | The current region recommended, with the appropriate services and redundancy, for the database and other workloads |
     | Secondary Region  | Used to spread a workload for HA and/or recovery |
  16. Availability Sets
  17. High Availability Zones (and Regions)
  18. Use Availability Zones
     • A high availability (HA) offering that protects data and apps from datacenter failures.
     • Availability Zones are multiple, physically separate locations within a single Azure region.
     • Not all products or services are available for AZs or in every region.
     • There is no additional cost to deploy VMs in an Availability Zone.
  19. Availability Regions
     • Some SLAs will require both Regions and AZs.
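To see why some SLAs push you to both AZs and regions, it helps to convert an availability figure into allowed downtime and to estimate what redundancy buys. This sketch assumes a 30-day month, independent failures, and instant failover, all optimistic simplifications. The 0.9999 input stands for the published Azure SLA for VMs spread across two or more Availability Zones; verify the current value before using it.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month


def downtime_minutes(availability):
    """Allowed downtime per month for an availability fraction (e.g. 0.9999)."""
    return (1 - availability) * MINUTES_PER_MONTH


def redundant(*availabilities):
    """Availability of redundant copies, assuming independent failures and
    instant failover (both optimistic)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


if __name__ == "__main__":
    zonal = 0.9999  # one region, VMs spread across AZs
    print(round(downtime_minutes(zonal), 2))  # 4.32
    print(downtime_minutes(redundant(zonal, zonal)))  # two regions
```

Real cross-region failover is neither instant nor fully independent, so treat the two-region figure as an upper bound rather than a promise.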
  20. Disaster Recovery
     • Along with AZ/AG, etc.
     • Use DR products that best support the cloud:
       • Always On Availability Groups and Oracle Data Guard
     • Implement advanced automation features to remove manual intervention.
     • Clearly identify the RPO (Recovery Point Objective) and RTO (Recovery Time Objective) for your business.
     • Ensure that the HA, DR, backup, and recovery decisions meet these objectives and have been fully TESTED.
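"Fully TESTED" can be made concrete by comparing measurements from a DR test against the agreed RPO and RTO. In this minimal sketch the field names are hypothetical, the worst observed replication lag stands in for the achieved RPO, and failover plus application reconnect time stands in for the achieved RTO; both are simplifications.

```python
from dataclasses import dataclass


@dataclass
class DrTestResult:
    """Measurements from an actual DR test (hypothetical field names)."""
    max_replication_lag_s: float  # worst standby lag seen (AG / Data Guard)
    failover_s: float             # time for the standby to take over
    app_reconnect_s: float        # time until applications worked again


def check_objectives(result, rpo_s, rto_s):
    """Compare the measured data-loss window and downtime with the objectives."""
    achieved_rpo = result.max_replication_lag_s
    achieved_rto = result.failover_s + result.app_reconnect_s
    return {
        "rpo_met": achieved_rpo <= rpo_s,
        "rto_met": achieved_rto <= rto_s,
        "achieved_rpo_s": achieved_rpo,
        "achieved_rto_s": achieved_rto,
    }


if __name__ == "__main__":
    test = DrTestResult(max_replication_lag_s=30, failover_s=240, app_reconnect_s=90)
    print(check_objectives(test, rpo_s=60, rto_s=300))
```

Here the database fails over in time, but the application reconnect pushes the total to 330 seconds, so the RTO is missed; only a full test exposes that.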
  21. Storage Is SEPARATE and Important
     • Ensure you know the IO workload for your database going to the cloud.
     • Understand both the throughput (MB/s) and the IOPS for the database.
     • Oracle has demonstrated, on average, much higher demands for IO than MSSQL, MySQL, or PostgreSQL.
     • Storage is separate to ensure the right combination in IaaS can be reached.
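IOPS and MB/s are linked by the average IO size, which is why both must be known. This small sketch shows the arithmetic; for simplicity it treats 1 MB as 1024 KiB, and the byte and request rates would come from whatever source you already use (AWR, PerfMon, iostat).

```python
def throughput_mbps(iops, avg_io_kib):
    """MB/s generated by an IOPS rate at a given average IO size."""
    return iops * avg_io_kib / 1024


def avg_io_kib(bytes_per_s, requests_per_s):
    """Average IO size in KiB from byte and request rates."""
    return bytes_per_s / requests_per_s / 1024


if __name__ == "__main__":
    # Same IOPS, very different throughput needs:
    print(throughput_mbps(20_000, 8))   # 156.25 (small OLTP reads)
    print(throughput_mbps(20_000, 64))  # 1250.0 (larger scans or backups)
```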
  22. Storage Considerations
     • What is the storage to be used for?
       • Data: OLTP, DSS, OLAP, Big Data?
       • Logging
       • Backup
     • Ensure that backup and data refresh requirements are calculated into the IO demands for the database.
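Backups are easy to forget when sizing: a full backup must read the whole database inside its window, on top of the normal workload. This is a minimal sketch under stated assumptions: 1 GiB is treated as 1024 MB, compression is ignored (it shrinks what is written, not what is read), and the backup is assumed to overlap the regular workload.

```python
def backup_read_mbps(db_size_gib, window_hours):
    """Sustained MB/s needed to read a full backup within its window."""
    return db_size_gib * 1024 / (window_hours * 3600)


def peak_io_mbps(workload_mbps, backup_mbps, refresh_mbps=0.0):
    """Worst case when backup and refresh jobs overlap the normal workload."""
    return workload_mbps + backup_mbps + refresh_mbps


if __name__ == "__main__":
    backup = backup_read_mbps(db_size_gib=7_200, window_hours=4)
    print(backup)                     # 512.0
    print(peak_io_mbps(300, backup))  # 812.0
```

In this made-up case, a database that needs 300 MB/s on a normal day needs storage and a VM that can sustain more than 800 MB/s during the backup window.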
  23. Storage Accounts
     • Know the difference between storage account types:
       • General Purpose v1 vs. v2
       • Block Blob Storage
       • File Storage: Premium?
       • Blob Storage: use v2 whenever possible.
       • Shared Storage/NFS Storage
     • Most database workloads are going to require Premium SSD storage.
     (us/azure/virtual-machines/premium-storage-performance)
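  24. Premium SSD: Size and Limits
     | Name | Capacity (GiB) | IOPS per disk | Max burstable IOPS | Throughput per disk (MB/s) | Max burstable throughput per disk (MB/s) | Cache limit per disk (GiB) |
     |------|----------------|---------------|--------------------|----------------------------|------------------------------------------|----------------------------|
     | P1   | 4              | 120           | 3,500              | 25                         | 170                                      | 4                          |
     | P2   | 8              | 120           | 3,500              | 25                         | 170                                      | 8                          |
     | P3   | 16             | 120           | 3,500              | 25                         | 170                                      | 16                         |
     | P4   | 32             | 120           | 3,500              | 25                         | 170                                      | 32                         |
     | P6   | 64             | 240           | 3,500              | 50                         | 170                                      | 64                         |
     | P10  | 128            | 500           | 3,500              | 100                        | 170                                      | 128                        |
     | P15  | 256            | 1,100         | 3,500              | 125                        | 170                                      | 256                        |
     | P20  | 512            | 2,300         | 3,500              | 150                        | 170                                      | 512                        |
     | P30  | 1,024          | 5,000         |                    | 200                        |                                          | 1,024                      |
     | P40  | 2,048          | 7,500         |                    | 250                        |                                          | 2,048                      |
     | P50  | 4,096          | 7,500         |                    | 250                        |                                          | 4,095                      |
     | P60  | 8,192          | 16,000        |                    | 500                        |                                          | 4,095                      |
     | P70  | 16,384         | 18,000        |                    | 750                        |                                          | 4,095                      |
     | P80  | 32,767         | 20,000        |                    | 900                        |                                          | 4,095                      |
     Source: Managed disks pricing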
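When striping several smaller disks instead of one large one, the disk count has to cover IOPS, throughput, and capacity at the same time. This sketch uses the non-burst per-disk limits for P30, P40, and P50 from the Premium SSD table; the function name and example targets are hypothetical, and the VM's own IO caps still apply on top of the stripe.

```python
import math

# Per-disk limits from the Premium SSD table (no bursting).
PREMIUM_SSD = {
    "P30": {"gib": 1024, "iops": 5000, "mbps": 200},
    "P40": {"gib": 2048, "iops": 7500, "mbps": 250},
    "P50": {"gib": 4096, "iops": 7500, "mbps": 250},
}


def disks_needed(tier, iops, mbps, gib):
    """Smallest number of striped disks of one tier covering all three needs."""
    d = PREMIUM_SSD[tier]
    return max(math.ceil(iops / d["iops"]),
               math.ceil(mbps / d["mbps"]),
               math.ceil(gib / d["gib"]))


if __name__ == "__main__":
    for tier in PREMIUM_SSD:
        print(tier, disks_needed(tier, iops=20_000, mbps=600, gib=2_000))
    # P30 4, P40 3, P50 3
```

For this made-up target, IOPS rather than capacity drives the disk count, which is the usual reason smaller striped disks beat one large disk.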
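  25. Ultra Disk
     Ultra Disk offerings:
     | Disk size (GiB)                               | IOPS range (up to) | Throughput range (MB/s, up to) |
     |-----------------------------------------------|--------------------|--------------------------------|
     | 4                                             | 1,200              | 300                            |
     | 8                                             | 2,400              | 600                            |
     | 16                                            | 4,800              | 1,200                          |
     | 32                                            | 9,600              | 2,000                          |
     | 64                                            | 19,200             | 2,000                          |
     | 128                                           | 38,400             | 2,000                          |
     | 256                                           | 76,800             | 2,000                          |
     | 512                                           | 160,000            | 2,000                          |
     | 1,024 to 65,536 (in increments of 1 TiB)      | 160,000            | 2,000                          |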
  26. Ultra Disks
     • Often the first recommendation by infrastructure teams
     • Be aware of the limitations before recommending them for database workloads:
       • Oracle 12.2 and later are supported.
       • Only supports uncached reads and uncached writes.
       • Doesn't support disk snapshots, VM images, availability sets, Azure Dedicated Hosts, or Azure Disk Encryption.
       • No integration with Azure Backup or Azure Site Recovery.
       • Offers up to 16 TiB per region per subscription unless raised via support.
       • Isn’t available in all regions.
     |         | Capacity per disk (GiB) | IOPS per disk | Throughput per disk (MB/s) |
     |---------|-------------------------|---------------|----------------------------|
     | Minimum | 4                       | 100           | 1                          |
     | Maximum | 65,536                  | 160,000       | 2,000                      |
     GiB * .05, MBps * 1.01, IOPS * .12, vCPU * 4.83
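Before recommending an Ultra Disk, a requested configuration can be checked against the size-dependent limits in the Ultra Disk tables. This is a hypothetical helper built only from those tables; it does not check the other limitations listed on the slide (no caching, snapshots, availability sets, or region availability).

```python
# Upper limits (IOPS, MB/s) per provisioned size, from the Ultra Disk table.
# Sizes from 1,024 to 65,536 GiB (1 TiB steps) share the last row.
ULTRA_LIMITS = {
    4: (1_200, 300), 8: (2_400, 600), 16: (4_800, 1_200), 32: (9_600, 2_000),
    64: (19_200, 2_000), 128: (38_400, 2_000), 256: (76_800, 2_000),
    512: (160_000, 2_000),
}
MIN_IOPS, MIN_MBPS = 100, 1


def validate_ultra(size_gib, iops, mbps):
    """Return a list of problems with an Ultra Disk configuration (empty = OK)."""
    if size_gib in ULTRA_LIMITS:
        max_iops, max_mbps = ULTRA_LIMITS[size_gib]
    elif 1_024 <= size_gib <= 65_536 and size_gib % 1_024 == 0:
        max_iops, max_mbps = 160_000, 2_000
    else:
        return [f"unsupported size: {size_gib} GiB"]
    problems = []
    if not MIN_IOPS <= iops <= max_iops:
        problems.append(f"IOPS must be {MIN_IOPS}-{max_iops} for {size_gib} GiB")
    if not MIN_MBPS <= mbps <= max_mbps:
        problems.append(f"MB/s must be {MIN_MBPS}-{max_mbps} for {size_gib} GiB")
    return problems


if __name__ == "__main__":
    print(validate_ultra(4, 5_000, 100))  # ['IOPS must be 100-1200 for 4 GiB']
```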
  27. Types of Cache Settings
     • Available to Premium Storage
     • A multi-tier caching technology, aka BlobCache
     • OS disk: ReadWrite, the default, is fine, but not for datafiles.
     • ReadOnly cache suits datafiles, as it caches reads while letting writes pass through to disk.
     • Cache limit of 4,095 GiB per individual premium disk
       • As a result, allocating the entirety of any disk above a P40 silently disables read caching.
       • Larger disks are preferably used without caching; otherwise, additional space is wasted. For a P50, just allocate 4,095 GiB of the 4,096 GiB.
     • Use smaller disks and choose to stripe and mirror.
     • Available on M-series; availability is VM-series dependent.
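The caching guidance can be captured as a per-disk rule. In this hypothetical helper, OS disks keep the ReadWrite default, datafile disks get ReadOnly, and any disk provisioned above the 4,095 GiB cache limit gets no caching. Treating redo or transaction log disks as uncached is an assumption drawn from the write-heavy nature of those files, not something stated on the slide.

```python
CACHE_LIMIT_GIB = 4095  # per-disk host cache limit from the slide


def recommend_host_cache(role, provisioned_gib):
    """Hypothetical helper applying the caching guidance per disk role.

    role: "os", "data" (datafiles) or "log" (redo / transaction log).
    """
    if role == "os":
        return "ReadWrite"  # the OS disk default; fine for the OS
    if provisioned_gib > CACHE_LIMIT_GIB:
        return "None"       # caching is not available above the limit
    if role == "data":
        return "ReadOnly"   # cache reads, write through to storage
    return "None"           # assumption: write-heavy log disks stay uncached


if __name__ == "__main__":
    print(recommend_host_cache("data", 4095))  # ReadOnly (P50 allocated to 4,095)
    print(recommend_host_cache("data", 4096))  # None (full P50)
```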
  28. IO Throttling
     • Why does it happen?
       • No, you can’t have all the resources for yourself.
     • What can be involved?
       • It’s not just the database.
     • How do you identify it?
     • What do you do when it is identified?
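One way to start identifying throttling is to compare observed IO with the lower of two caps: the VM's limit and the summed limits of its disks. In this sketch the 12,800 IOPS VM cap is a hypothetical value (look up the real one for your size), the P30 figure comes from the Premium SSD table, and the 95% margin for "at the cap" is an assumption.

```python
def io_ceiling(vm_max_iops, disk_iops):
    """Effective IOPS cap: the lower of the VM limit and the summed disk limits."""
    disk_total = sum(disk_iops)
    if vm_max_iops < disk_total:
        return vm_max_iops, "vm"
    return disk_total, "disks"


def looks_throttled(observed_iops, vm_max_iops, disk_iops, margin=0.95):
    """Flag sustained IO within `margin` of the cap (margin is an assumption)."""
    ceiling, source = io_ceiling(vm_max_iops, disk_iops)
    return observed_iops >= ceiling * margin, source, ceiling


if __name__ == "__main__":
    # Hypothetical VM cap of 12,800 IOPS with four P30 disks (5,000 IOPS each).
    print(looks_throttled(12_700, 12_800, [5_000] * 4))  # (True, 'vm', 12800)
```

When the VM is the binding limit, adding more disks does not help; a different VM size or less IO (for example, fixing the code) does.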
  29. Bring in Additional Solutions
     • High IOPS and MB/s: Azure NetApp Files
     • Higher IO throughput: consider Silk, FlashGrid Storage, Pure Storage, or Excelero.
     • Consider disk striping of smaller disks and parallel processing at the database level.
     • Backups, batch loading, and other challenges:
       • Offload backups with secondary backup solutions.
       • Refactor batch processing with other services (Azure Data Factory, Azure Analysis Services, Databricks, etc.).
  30. Azure NetApp Files
     • Fully managed, PaaS, Microsoft Azure storage service
     • All-flash, bare-metal storage
     • Only dependent on the NIC, not the VM.
     • *Available in Standard, Premium (common), and Ultra (optimal) service levels
     • ANF is native to Azure.
     |                 | Azure Files | Premium Files | Azure NetApp Files           | Premium Disk |
     |-----------------|-------------|---------------|------------------------------|--------------|
     | Performance     | 1K IOPS     | 100K IOPS     | 320K IOPS                    | 20K IOPS     |
     | Capacity pool   | 5 TB        | 100 TB        | 500 TB                       | 32 TB        |
     | AD integration  | Azure AD    | N/A           | Bring your own AD / Azure AD | N/A          |
     | Protocol        | SMB         | SMB           | NFS & SMB                    | Disk         |
     | Data protection | LRS only    | Snapshots     | Backup tools                 | Snapshots    |
     *Be aware of pricing when scaling to meet IO.
     FAQs About Azure NetApp Files | Microsoft Docs
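With Azure NetApp Files, throughput scales with the provisioned volume quota, which is why the slide warns about pricing when scaling to meet IO. The ratios below (16, 64, and 128 MiB/s per provisioned TiB for Standard, Premium, and Ultra) are the service-level figures Microsoft has documented for ANF; treat them as assumptions to re-check, since they can change.

```python
import math

# Assumed ANF throughput per provisioned TiB (MiB/s), by service level.
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def quota_tib_for_throughput(target_mibps, service_level):
    """Whole TiB of volume quota needed to reach a throughput target."""
    return math.ceil(target_mibps / MIBPS_PER_TIB[service_level])


if __name__ == "__main__":
    for level in MIBPS_PER_TIB:
        print(level, quota_tib_for_throughput(1_000, level), "TiB")
    # Standard 63 TiB, Premium 16 TiB, Ultra 8 TiB
```

If the database itself is only a couple of TiB, the difference between that and the quota above is capacity you pay for purely to get throughput.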
  31. Silk Performance
  32. When to Go Old-School
     • Depending on the combination of storage, striping, and RAID, performance can vary greatly.
     • Verify that disks are striped correctly (log the creation commands and document them).
     • Consider smaller disks and striping vs. a larger, single drive to offer better performance.
     • In Linux, consider huge pages and use LVM (Logical Volume Manager) or Oracle ASM (Automatic Storage Management) to provide advanced features for disk group layout.
     • Keep an eye on disk sector size (there’s a bug requiring a 512-byte sector size in Oracle 12.1).
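To make "log the creation commands and document them" concrete, the helper below builds the LVM commands for one logical volume striped across all given disks and logs them before anything is run. The device names, volume group and logical volume names, and the 256 KiB stripe size are hypothetical; choose the stripe size for your own workload and run the commands yourself, since the sketch only builds and logs them.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("lvm-build")


def lvm_stripe_commands(devices, vg_name, lv_name, stripe_kib=256):
    """Build (but do not run) LVM commands that stripe one LV across all devices."""
    devs = " ".join(devices)
    commands = [
        f"pvcreate {devs}",
        f"vgcreate {vg_name} {devs}",
        # --stripes = number of disks; --stripesize defaults to KiB units.
        f"lvcreate --extents 100%FREE --stripes {len(devices)} "
        f"--stripesize {stripe_kib} --name {lv_name} {vg_name}",
    ]
    for cmd in commands:
        log.info(cmd)  # keep a record of exactly how the stripe was built
    return commands


if __name__ == "__main__":
    lvm_stripe_commands(["/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"],
                        "vg_oradata", "lv_oradata")
```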
  33. Failure Due to Backups
     • Modernize how the database is backed up and restored if RMAN accounts for 40% of total IO in AWR or the database has a small backup window.
     • Archaic backup and data refresh strategies can heavily impact a cloud environment in IO and network latency.
     • Snapshot technology with database consistency should be your FIRST choice in backup solutions for large databases.
     • Oracle AWR can demonstrate the impact of RMAN and Data Pump jobs on the overall database workload.
     • SQL Server Profiler can identify the workload impact in SQL Server.
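The 40% rule of thumb can be checked from per-function IO totals, such as those in the IOStat by Function section of an AWR report. The function names and numbers here are hypothetical inputs, and parsing the report itself is left out of this sketch.

```python
def rman_io_share(io_by_function):
    """Fraction of total IO (any consistent unit) attributed to RMAN."""
    total = sum(io_by_function.values())
    return io_by_function.get("RMAN", 0) / total if total else 0.0


def backup_needs_modernizing(io_by_function, threshold=0.40):
    """Apply the slide's rule of thumb: RMAN at 40% or more of total IO."""
    return rman_io_share(io_by_function) >= threshold


if __name__ == "__main__":
    awr_io_mb = {"RMAN": 400, "DBWR": 300, "LGWR": 100, "Buffer Cache Reads": 200}
    print(rman_io_share(awr_io_mb), backup_needs_modernizing(awr_io_mb))  # 0.4 True
```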
  34. Simplify the Shift to the Cloud
     • Migrate the tools you already use to monitor and manage the database on-prem into the cloud whenever possible.
       • For Oracle, we implement Oracle Enterprise Manager (Cloud Control) to ensure the cloud environment looks just like the on-prem one.
       • Redgate SQL Monitor, SolarWinds SQL Sentry, Dynatrace, Idera Uptime Infrastructure Monitor, etc.
     • Automate OS patching using the Azure Linux/Windows automated patching service.
     • Incorporate DevOps automation into cloud changes FIRST.
  35. It’s Not Just Infrastructure
     • Whether during the migration or when there are issues:
       • Infrastructure support will be the first line of defense.
       • Database workload will be an afterthought.
       • Data support may be a request-only option.
       • The first inclination is to “throw iron” at the problem.
     • Demand a look at the code, database design, etc.
       • If you fix the real cause, you fix it once instead of revisiting it over and over.
     • Have support take advantage of advanced Azure tools to help identify where the problem is (IO, memory, CPU).
  36. Manage with What You Know
     • Use the cloud versions of the services you already use on-prem.
     • If you can deploy your existing on-prem tool on a VM (Oracle Enterprise Manager, Redgate, Idera, SolarWinds, etc.) and it’s cloud-ready, do it!
     • Keep your existing backup and replication tools whenever you are able; don’t create larger learning curves than required.
  37. Simulate PaaS in IaaS
     • Use Azure SQL Managed Instance for SQL Server.
     • Use the Lifecycle Management Pack with Oracle Enterprise Manager to automate monitoring, management, and database patching.
     • Use Linux automated patching (preview) to automate OS patching of VMs.
     • Introduce Azure services to simplify the current products used on-prem.
     • Automate using DevOps, including deployment builds with Terraform, Ansible, etc.
  38. Review: Database Workloads on IaaS
     • Know the infrastructure.
     • Migrate the workload, not the on-prem hardware.
     • Know the cause of the problem; don’t guess.
     • Bring in existing tools that are cloud-enabled.
     • Know what tools are available in the cloud and, when stuck, bring in Azure support.
  39. References
     • SQL Server Performance Guidelines on Azure: Checklist: Best practices & guidelines - SQL Server on Azure VM | Microsoft Docs
     • Oracle on Azure: Oracle solutions on Microsoft Azure - Azure Virtual Machines | Microsoft Docs
     • Understanding AZ and AS: Availability options for Azure Virtual Machines - Azure Virtual Machines | Microsoft Docs
     • Virtual Machine and Disk Performance: Virtual machine and disk performance - Azure Virtual Machines | Microsoft Docs
     • Azure Premium Storage: Azure Premium Storage: Design for high performance - Azure Virtual Machines | Microsoft Docs
     • Azure Network Performance for IaaS: Optimize VM network throughput | Microsoft Docs
     • Infrastructure Automation: Use infrastructure automation tools - Azure Virtual Machines | Microsoft Docs
     • Ultradisks for Azure Linux VMs:
  40. Thank you! Kellyn Gorman Twitter: @DBAKevlar

Speaker notes

  1. P10 is my favored OS disk. Try to always use Premium SSD, available in the VM series with the designation “s” in the name. P30-P50 is the most common for datafiles, and we turn on ReadOnly host caching to achieve what we need. The P50 is over the 4,095 GiB limit, so just don’t allocate the last 1 GiB and capture a huge performance benefit!
  2. Azure Premium Storage has a multi-tier caching technology called BlobCache, which uses a combination of host RAM and local SSD for caching I/O. By default, this cache setting is ReadWrite for OS disks (the disk on which the Linux OS resides) and ReadOnly for data disks (the disks on which Oracle database files might reside). As the name suggests, ReadWrite caches both read I/O and write I/O from the VM, and because writes are not persisted directly to storage, it is unsuitable for database applications. Also as the name suggests, ReadOnly caches only read I/O, allowing write I/O to write through directly to storage, which is appropriate for databases.
  3. No one can have it all. One of the benefits of the cloud is also one of its challenges: how to give everyone a share. Throttling occurs