
STO7535 Virtual SAN Proof of Concept - VMworld 2016


VMworld 2016 - Conducting a Virtual SAN (VSAN) Proof of Concept (POC) - STO7535



  1. 1. Conducting a Successful Virtual SAN 6.2 Proof of Concept Paudie ORiordan, VMware, Inc Cormac Hogan, VMware, Inc STO7535 #STO7535
  2. 2. • This presentation may contain product features that are currently under development. • This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. • Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Technical feasibility and market demand will affect final delivery. • Pricing and packaging for any new technologies or features discussed or presented have not been determined. Disclaimer CONFIDENTIAL 2
  4. 4. Agenda 1 Introduction to Session 2 Introduction to Virtual SAN 3 Tools to conduct a successful Virtual SAN proof of concept (POC) 4 POC validation scenarios 5 Data Services Considerations 6 Measuring Performance CONFIDENTIAL 4
  5. 5. This session… • Virtual SAN has been available since March 2014, almost 2.5 years • To date, we have almost 5,000 VSAN customers • VMware recognises that conducting a Virtual SAN proof of concept can be challenging • Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting Virtual SAN have become available • In this session, we will discuss the tools available to vSphere and Virtual SAN administrators, and how they can help deliver a Virtual SAN proof of concept 5CONFIDENTIAL
  6. 6. Introduction to VMware Virtual SAN • Storage scale out architecture built into the hypervisor • Aggregates locally attached storage from each ESXi host in a cluster • Dynamic capacity and performance scalability • Flash optimized storage solution – Fully integrated with vSphere and interoperable: • vMotion, DRS, HA, VDP, VR … • VM-centric data operations • Many new data services CONFIDENTIAL 6 + + + + + + + … + Datastore Virtual SAN
  7. 7. What I Need to Be Successful Tools to conduct a successful Virtual SAN POC
  8. 8. Before You Begin: Verify Your Components Against the HCL • VMware Virtual SAN Hardware • Server, Controller, SSD, Disk on HCL • Controller Firmware, Driver • Disk Firmware • Enclosure Firmware • SAS/SATA SSD Minimum Firmware is Critical – Rule is minimum or higher • NVMe Firmware – HCL lists absolute version only CONFIDENTIAL 8
  9. 9. Success Tool #1 : Health Plugin – Reactive Health Checks • Introduced with Virtual SAN 6.0 • Incorporated into the vSphere Web Client • Virtual SAN Health Check tool includes: – General Health – Proactive tests – Virtual SAN HCL health – Physical disk health 9 • Especially useful when injecting errors into the cluster and verifying that they have been remediated CONFIDENTIAL
  10. 10. Success Tool #1 : Health Plugin – Proactive Health Checks • Proactive tests that run on the Virtual SAN cluster as pre-production checks – VM Creation test – Storage Performance test – Multicast performance test 10CONFIDENTIAL
  11. 11. Success Tool #2 : Capacity Views • Dedupe and Compression Savings • Group by Object Type – Filesystem overhead – Dedupe overhead – Checksum overhead – Virtual disks – Swap – Home namespace 11CONFIDENTIAL
  12. 12. Success Tool #3 : Performance Service • Enable it once • Integrated with vSphere • Simplified metrics – Backend (VSAN) – Frontend (VM) • Distributed Architecture – No SPOF • Historical data • Status monitored by health checks 12CONFIDENTIAL
  13. 13. Success Tool #4 : HCIbench • Hyperconverged Infrastructure benchmark • Based on Vdbench • Designed to work on distributed architectures like Virtual SAN • UI Driven • Free • Provides results in both text format, and format that can be viewed in VSAN Observer • Now available from 13CONFIDENTIAL
  14. 14. Success Tool #5 : RVC/Virtual SAN Observer • Native tools installed on Linux/Appliance and Windows versions of vCenter Server • Used for Configuration and Status of the Virtual SAN Cluster • For Performance and Activity monitoring on demand – VM level – Host level – VMDK level – HDD/SSD Level • Any anomalies will show up with the metric in question shown in red • Follow the I/O : VM -> VMDK -> Disk Group -> Disk -> Congestion 14CONFIDENTIAL
  15. 15. Success Tool #5 : RVC/Virtual SAN Observer (ctd.) 15 vsan.apply_license_to_cluster vsan.enable_vsan_on_cluster vsan.disable_vsan_on_cluster vsan.clear_disks_cache vsan.cluster_change_autoclaim vsan.cluster_set_default_policy vsan.enter_maintenance_mode vsan.fix_renamed_vms vsan.object_reconfigure vsan.host_wipe_vsan_disks vsan.recover_spbm vsan.reapply_vsan_vmknic_config Cluster vsan.check_limits vsan.check_state vsan.cluster_info vsan.cmmds_find vsan.whatif_host_failures vsan.resync_dashboard Disk vsan.disk_object_info vsan.disks_info vsan.disks_stats Host vsan.host_info vsan.host_consume_disks Networking vsan.lldpnetmap VM vsan.vm_object_info vsan.vm_perf_stats vsan.vmdk_stats vsan.obj_status_report vsan.object_info Troubleshooting vsan.support_information Virtual SAN Operation Virtual SAN Information Virtual SAN Monitoring CONFIDENTIAL
  16. 16. Validation Scenarios Expected outcomes from POC activities
  17. 17. PoC Validation • What are the most important validation tests? 1. Successful VSAN configuration 2. Successful VM deployments on the VSAN datastore 3. VM Availability in the event of failures (host, storage device, network) 4. VSAN Serviceability (maintenance of hosts, disk groups, disks) 5. VM Performance meets expectations 6. VSAN Data Services (Dedupe, Compression, RAID-5/6, Checksum) working as expected 17CONFIDENTIAL
  18. 18. Case #1 – Successful VSAN Deployment – Checklist • Correct vSphere versions • Appropriate licenses – especially if the PoC is expected to take a long time (> 60 days) • Correctly configured network – VSAN requires multicast, so prep the network team • Minimum of three servers – Or 2 servers plus a witness appliance if doing Remote Office/Branch Office (ROBO) 18 Remember, the VSAN Health Check will do most of this work for you. CONFIDENTIAL
  19. 19. Case #1 – Successful VSAN Deployment – Checklist (ctd.) • Minimum of three servers contributing storage: • At least one storage controller – you’ve checked the HCL, and drivers and firmware are valid, right? • At least one flash device (SSD, PCIe) for cache – check the HCL • At least one magnetic disk (hybrid) or flash device (all-flash) for capacity – check the HCL • Or consider VSAN Ready Nodes as an option … 19 Remember, the VSAN Health Check will do most of this work for you. CONFIDENTIAL
  20. 20. Case #1 – Successful VSAN Deployment – Device Claiming • Devices not visible – Some RAID controllers won’t present individual disks without RAID configuration – May need RAID-0 configuration set on storage devices via controller • Devices not being claimed – Some controllers allow devices to be shared, so devices get presented as “non-local” – VSAN will only claim devices that are local • SSD showing up as HDD – Placing devices in RAID-0 will do this • All-Flash using wrong devices for cache/capacity – Set VSAN to “Manual mode” when setting up all-flash – Gives control over which devices are used for cache and which devices are used for capacity 20CONFIDENTIAL
  21. 21. Case #1 – Successful VSAN Deployment – Overall health 21 Run health checks after every test! Clear Alarms! Use it to verify that a problem previously introduced is now fixed! Check the Virtual SAN Health Check regularly CONFIDENTIAL
  22. 22. Case #2 : Successful VM Deployment on VSAN 22 Use the Health Check – Proactive Tests to do an initial VM deployment check Part of the Proactive Tests. This will verify whether virtual machines can be created on the VSAN cluster CONFIDENTIAL
  23. 23. Case #2 : Successful VM Deployment on VSAN 23 Component host location I created a new VM, but where/how is the VM stored? CONFIDENTIAL
  24. 24. Case #3 : VM Availability in the Event of Failures • Various failures may be introduced as part of a typical POC – Host failure – Flash device / Magnetic Disk failure – Cache/Capacity device failures – Network failure • Objective: ensure that the VM continues to be available in the event of a failure. The VM may be restarted on another node in the cluster. • vSphere HA is fully integrated with Virtual SAN so that virtual machines on the failed host are restarted on other hosts in the cluster 24CONFIDENTIAL
  25. 25. Case #3.1 : Host Failures • How many hosts do I really need? • A minimum of 3 hosts is needed to support VSAN. • What about rebuilding after a failure or maintenance mode operations? • If you want virtual machines to remain highly available on VSAN during these scenarios, consider configuring for additional capacity, i.e. a minimum of 4 nodes. 25CONFIDENTIAL
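The host-count rule of thumb above can be sketched as a small helper (hypothetical, for illustration only): RAID-1 mirroring with FTT failures to tolerate needs 2×FTT+1 hosts (replicas plus witness components), and one extra host leaves headroom for rebuilds or maintenance.

```python
def min_hosts(ftt=1, spare_for_rebuild=True):
    """Minimum hosts for a VSAN cluster using RAID-1 mirroring.

    Tolerating `ftt` failures requires 2*ftt + 1 hosts (data replicas
    plus witness components). One additional host leaves room to
    rebuild after a failure or during maintenance mode.
    """
    base = 2 * ftt + 1
    return base + 1 if spare_for_rebuild else base

print(min_hosts(ftt=1, spare_for_rebuild=False))  # 3: bare minimum
print(min_hosts(ftt=1))                           # 4: recommended
```

This matches the slide's guidance: 3 hosts is the supported minimum, 4 nodes keeps VMs highly available while a node is down.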
  26. 26. Case #3.2 : Storage Failures • The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors for the purpose of testing. – A real disk failure results in immediate rebuild activity initiated by VSAN 26 Eject/Offline/Unplug → Absent: wait 60 minutes before remediation. Failure → Degraded: immediate remediation. CONFIDENTIAL
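The two remediation paths above can be expressed as a tiny sketch (the function name is illustrative; the 60-minute figure is the default repair delay quoted in the slide):

```python
# Sketch of the component-state remediation rules described above:
# an "absent" component (device ejected/offline/unplugged) may come
# back, so VSAN waits 60 minutes before rebuilding; a "degraded"
# component (reported device failure) is rebuilt immediately.

def rebuild_delay_minutes(state):
    delays = {
        "absent": 60,    # transient removal: wait before remediation
        "degraded": 0,   # hard failure: immediate remediation
    }
    if state not in delays:
        raise ValueError(f"unknown component state: {state}")
    return delays[state]

print(rebuild_delay_minutes("absent"))    # 60
print(rebuild_delay_minutes("degraded"))  # 0
```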
  27. 27. Case #3.2 : Storage Failures (ctd.) • Additional considerations when dedupe/compression are enabled on VSAN – Deduplication and compression hash tables/metadata are spread across all disks in a disk group – A single device failure in the disk group will render the whole of the disk group unavailable – All data in disk group will be rebuilt elsewhere in the cluster (if resources allow) 27 Rebuild Rebuild Rebuild CONFIDENTIAL
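The impact of that behaviour on rebuild traffic can be estimated with a hypothetical helper (name and parameters are illustrative): with dedupe/compression enabled, one capacity-device failure takes the whole disk group offline, so the data to rebuild is the used capacity of the entire group, not just of the failed device.

```python
def data_to_rebuild_gb(devices_gb, used_fraction, dedupe_enabled=True):
    """Estimate rebuild traffic after a single capacity-device failure.

    With dedupe/compression, hash tables/metadata are striped across
    all disks in the group, so one device failure fails the whole
    group and all of its used data is rebuilt elsewhere. Without
    dedupe, only the failed device's data is affected.
    """
    if dedupe_enabled:
        return sum(devices_gb) * used_fraction
    return devices_gb[0] * used_fraction  # assume the first device failed

group = [1920, 1920, 1920, 1920]  # e.g. four 1.92 TB capacity devices
print(data_to_rebuild_gb(group, 0.5))                        # 3840.0
print(data_to_rebuild_gb(group, 0.5, dedupe_enabled=False))  # 960.0
```

For a PoC, this is worth knowing before injecting a disk failure on a dedupe-enabled cluster: the resync dashboard will show far more data moving than the size of the failed device alone.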
  28. 28. Case #3.3 : Network Failure 28 Part of the Proactive Tests. This will verify whether multicast performance is acceptable for the VSAN cluster. Multicast configuration is the most common issue. Start simple: if you want features like LACP, don’t implement them initially, and turn off QoS/Flow Control; build them in afterwards. CONFIDENTIAL
  29. 29. Case #3.4 : Validating Rebuild Activity After Failure • Virtual SAN might need to move data around in the background: change policy, host failure, long term/permanent component loss, user triggered reconfig, maintenance mode, etc. • UI Resync Dashboard shows the VMs that are resyncing and remaining bytes to sync 29 Remember! Test one thing at a time! CONFIDENTIAL
  30. 30. Case #4 : VSAN Serviceability – Maintenance Mode 30 I want to update one of the ESXi hosts in my VSAN cluster – what do I do? VSAN provides multiple options for maintenance mode CONFIDENTIAL
  31. 31. Case #4 : VSAN Serviceability – Maintenance Mode 31
  – Ensure Accessibility: loss of VM compliance; for short maintenance; short storage preparation; limited free storage space required
  – Full Data Migration: full VM data compliance; for maintenance of more than one hour; long storage preparation; free storage space required on the other nodes
  – No Data Migration: no VM availability ensured; for short maintenance; no preparation impact; no free space impact
  Full migration unavailable in 3-node clusters! CONFIDENTIAL
  32. 32. Case #5 : Management – Disks Serviceability 32 The disk serviceability feature enables identification of magnetic disks and flash devices that need to be replaced CONFIDENTIAL
  33. 33. Case #5 : Management – Disk/Disk Group Evacuation • Allows you to evacuate data from disk groups and individual disks before removing a disk/disk group from a Virtual SAN host • Allows Virtual SAN to ensure all workloads stay fully compliant with their policy! – Supported in the UI, ESXCLI and RVC. – Check box in the “Remove disk/disk group” UI screen. 33CONFIDENTIAL
  34. 34. PoC considerations for New Data Services in VSAN 6.2
  35. 35. New Data Services in VSAN 6.2 • Erasure Coding – RAID-5/RAID-6 Support • Deduplication / Compression • Checksum • IOPS limits / QoS 35 There are performance considerations associated with all of the above. There are also some issues to be aware of! CONFIDENTIAL
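The capacity trade-off behind erasure coding can be sketched numerically. This is a hypothetical helper (names are illustrative) using the standard VSAN 6.2 layouts: RAID-1 writes one full copy per failure tolerated, RAID-5 uses a 3+1 stripe, and RAID-6 a 4+2 stripe.

```python
# Raw capacity consumed per GB of usable capacity for each VSAN 6.2
# protection scheme. RAID-5/6 tolerate the same failures as RAID-1
# FTT=1/FTT=2 respectively, at a lower capacity cost (but with the
# performance considerations noted above).

SCHEMES = {
    "raid1-ftt1": 2.0,   # two full mirrors
    "raid1-ftt2": 3.0,   # three full mirrors
    "raid5": 4 / 3,      # 3 data + 1 parity (~1.33x)
    "raid6": 6 / 4,      # 4 data + 2 parity (1.5x)
}

def raw_needed_gb(usable_gb, scheme):
    return usable_gb * SCHEMES[scheme]

print(raw_needed_gb(1000, "raid1-ftt1"))    # 2000.0
print(round(raw_needed_gb(1000, "raid5")))  # 1333
print(raw_needed_gb(1000, "raid6"))         # 1500.0
```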
  36. 36. Capacity Overhead of the New Data Services • Overheads are all calculated in advance – Deduplication/Compression maintain hash tables • Approx. 5% overhead – Checksum Metadata is stored separately from data • Approx. 1.2 % overhead CONFIDENTIAL 36 Many customers are surprised by the amount of overhead when data services are first enabled
  37. 37. Data Services File System Overheads – Don’t Panic • Deduplication and Compression File System Overhead is 5% (approx.) of Total Virtual SAN Capacity • Checksum Overhead is approx. 1.2% of capacity 37CONFIDENTIAL
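The overhead arithmetic from the two slides above can be sketched as follows (the ~5% and ~1.2% figures are the approximations quoted in the session, not exact values; the function is a hypothetical illustration):

```python
def usable_capacity_gb(raw_gb, dedupe=True, checksum=True):
    """Approximate usable capacity after data-services overheads.

    Dedupe/compression hash tables consume roughly 5% of raw
    capacity; checksum metadata, stored separately from the data,
    roughly another 1.2%.
    """
    overhead = 0.0
    if dedupe:
        overhead += 0.05    # dedupe/compression filesystem overhead
    if checksum:
        overhead += 0.012   # checksum metadata overhead
    return raw_gb * (1 - overhead)

print(usable_capacity_gb(10000))  # roughly 9380 GB usable of 10 TB raw
```

Seeing ~6% of the datastore consumed immediately after enabling these services is expected, which is the "don't panic" point above.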
  38. 38. How to Measure Virtual SAN Performance?
  39. 39. How to Test Performance… • Distributed architecture => best performance when the pooled compute and storage resources in the cluster are well utilized. • This usually means a number of VMs each running the specified workload should be distributed in the cluster and run in a consistent manner to deliver aggregated performance. • This part of an evaluation can be complex and time-consuming • Real application workloads are best, but … – synthetic workloads (IOmeter) might be easier to set up – simplistic workloads don’t really reflect what Virtual SAN can do • Worth a read: Pro Tips For Storage Performance Testing – 39CONFIDENTIAL
  40. 40. Performance Testing Considerations (Primarily for Hybrid) 40 Is the test utilising the distributed storage resources of Virtual SAN? • Multiple VMs across multiple hosts deliver better performance than one VM on one host. Is the working set fully in cache, utilising flash performance? • Read-cache misses will incur latency. Is the workload cache friendly? • Sustained sequential write workloads fill cache, which must then be destaged. Mixed R/W workloads with repeat patterns are best. Is the cache warmed if using VSAN hybrid? • Initial results at the start of a test will not reflect overall performance. Warning: Make sure the dedupe scrubber is disabled. It causes performance issues on hybrid * * KB 2146267 CONFIDENTIAL
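The "is the working set fully in cache?" question above can be reduced to a quick estimate. This sketch assumes hybrid Virtual SAN's default split of the cache device, roughly 70% read cache and 30% write buffer; the helper itself is illustrative, not a VMware tool.

```python
def working_set_fits(cache_device_gb, working_set_gb, read_cache_fraction=0.7):
    """Rough check: does the test's working set fit in the read cache?

    Hybrid VSAN dedicates ~70% of the cache device to read cache;
    a working set larger than that will incur read-cache misses and
    magnetic-disk latency during the test.
    """
    read_cache_gb = cache_device_gb * read_cache_fraction
    return working_set_gb <= read_cache_gb

print(working_set_fits(400, 250))  # True: fits in ~280 GB read cache
print(working_set_fits(400, 300))  # False: expect read-cache misses
```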
  41. 41. Performance Test with HCIbench/vdbench • VMs will be distributed equally across all hosts • Select I/O size • Select R/W ratio • Select random/sequential • Select duration of test • Disks can be zeroed with “dd”* • VMs will be removed (optionally) when the test completes • Produces results per VM – IOPS, Latency, Throughput, etc • Produces results consumable by VSAN Observer 41 * Avoid zeroing disks if deduplication is enabled – it will create a hot spot CONFIDENTIAL
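Since HCIbench reports results per VM, a common post-processing step is to roll them up into cluster-level numbers. The sketch below is hypothetical: the field names are illustrative, not HCIbench's actual output format. It sums IOPS and throughput and computes an IOPS-weighted average latency.

```python
def aggregate(results):
    """Roll per-VM benchmark results up into cluster totals.

    IOPS and throughput are additive across VMs; latency is averaged
    weighted by each VM's IOPS so busy VMs count proportionally more.
    """
    total_iops = sum(r["iops"] for r in results)
    total_tput = sum(r["throughput_mbps"] for r in results)
    avg_latency = sum(r["iops"] * r["latency_ms"] for r in results) / total_iops
    return {"iops": total_iops, "throughput_mbps": total_tput,
            "latency_ms": round(avg_latency, 2)}

vms = [
    {"iops": 12000, "throughput_mbps": 94, "latency_ms": 1.2},
    {"iops": 8000,  "throughput_mbps": 63, "latency_ms": 1.8},
]
print(aggregate(vms))  # {'iops': 20000, 'throughput_mbps': 157, 'latency_ms': 1.44}
```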
  42. 42. 42 Q & A CONFIDENTIAL