EDUCATION

Virsto vDisks: A Compelling
Alternative to Microsoft CSVs

Introduction
In Microsoft Hyper-V R2, Cluster Shared Volumes (CSVs) allow any given volume to be owned by any and
all nodes in a Windows Server Failover Cluster without having to resort to a proprietary shared volume
format. This has significant advantages in failover clustering environments, enabling simpler and more
flexible cluster configuration for more efficient recovery. While CSVs do enable some “high availability”
(HA) capabilities that are necessary in server virtualization environments, they also impose a number of
operational limitations in the areas of performance, backup and restore, provisioning, and management
that are generally not discovered until well into the deployment cycle.

Virsto Software provides most of the advantages of shared volume ownership that CSV addresses with
none of the disadvantages. Virsto offers a simple software plug-in to Hyper-V that creates a storage
virtualization layer that increases the number of IOPS that can be pushed through a given number of
spindles, leverages thin provisioning to significantly reduce storage capacity consumption, and supports
unlimited snapshots without consuming any additional storage capacity or imposing any performance
degradation. Storage objects in this virtualization layer, called vDisks, appear to Hyper-V as standard
Microsoft fixed virtual hard disks (VHDs), except that they are thin provisioned and they outperform
fixed VHDs in terms of both input/output operations per second (IOPS) and provisioning time.

This white paper will examine the CSV issues and explain how Virsto addresses them with a solution that
seamlessly integrates into the suite of System Center tools used to manage Hyper-V environments.
There is only one CSV use case where Virsto is not applicable, and we’ll discuss that as well. Note that
this paper assumes a good familiarity with the Microsoft storage infrastructure layer in Windows Server
2008 R2 Hyper-V, including VHDs, the differences between fixed, dynamic, and differencing VHDs, and a good
understanding of the Windows Volume Shadow Copy Service (VSS) API and its use in data protection
operations.

Quick Review of Native Hyper-V VHD Options

Hyper-V provides three VHD options: fixed, dynamic, and differencing. Each of these three types
behaves differently in terms of performance, storage capacity utilization, and in some cases management
options.

Note that in the industry the terms “snapshot” and “clone” mean different things for different vendors.
In defining the three VHD options, we’ll define “snapshot” as an immutable, disk-based, point-in-time
image, and we’ll define “clone” as a writable, disk-based image whose starting point was a given
snapshot.

Fixed Disks

Fixed disks are “thick” disks, with the size of the data file being equal to the size of the VHD that is
actually created. Within the file itself, all blocks are pre-zeroed. As I/O occurs in the guest, Hyper-V
does not need to zero the blocks prior to the I/O occurring. As a result, fixed disks provide higher
performance than either dynamic or differencing disks, and are recommended by Microsoft for most
production environments.

Virsto Software 1


Fixed disks have several downsides, however. First, because they pre-allocate storage, systems can end
up with a lot of allocated but unused storage capacity. In larger configurations, this can easily lead to
hundreds of GBs or even TBs of wasted space. Second, because a lot of back-end storage operations
must occur up front, it can take a long time to create a fixed disk, and this creation lag time negatively
impacts provisioning times as well.

Dynamic Disks

Dynamic disks are thinly provisioned disks, with the size of the data file reflecting only how much
capacity is actually used within a disk at any given point in time instead of the fully allocated amount.
Initial creation and provisioning times can be much faster than those for fixed disks. As I/O occurs in the
guest, Hyper-V zeroes out the space needed right before the guest I/O is committed and grows the file
similarly. As a result, dynamic disks use space much more efficiently than fixed disks.

Dynamic disks have a large downside, however. New blocks must be allocated and zeroed out on
demand, and each of these metadata updates causes volume-wide changes that can impose significant
additional latency. With comparable storage configurations, fixed disks can easily outperform dynamic
disks by 3x or more because of the metadata update penalty. In long term use, all blocks within a
dynamic disk may eventually get allocated so that the dynamic disk is in fact the same size as the fixed
disk for that allocation would be, negating the space savings. Because of the performance impacts
associated with dynamic disks, Microsoft does not recommend them for production use.
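
The capacity trade-off between fixed and dynamic disks can be illustrated with a little arithmetic. The sketch below is a simplified model, not a description of the VHD file format: it ignores VHD header and allocation-table overhead, and the fleet of 50 VMs with 100 GB provisioned and 30 GB used is a hypothetical example, not a figure from this paper.

```python
def fixed_vhd_file_size(provisioned_gb: float) -> float:
    """A fixed VHD's data file is as large as the provisioned size."""
    return provisioned_gb


def dynamic_vhd_file_size(provisioned_gb: float, used_gb: float) -> float:
    """A dynamic VHD's data file tracks used space, capped at the provisioned size."""
    return min(used_gb, provisioned_gb)


def unused_allocation_gb(disks) -> float:
    """Allocated-but-unused capacity for fixed disks given as (provisioned_gb, used_gb)."""
    return sum(fixed_vhd_file_size(p) - min(u, p) for p, u in disks)


# Hypothetical fleet: 50 VMs, each provisioned 100 GB but using only 30 GB.
fleet = [(100.0, 30.0)] * 50
wasted_gb = unused_allocation_gb(fleet)  # 50 disks * 70 GB unused each
```

The second function also shows why the savings can disappear over time: once the used space reaches the provisioned size, the dynamic disk is as large as the equivalent fixed disk.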

Differencing Disks

Differencing disks can be thought of as “clones”, creating a parent/child relationship with another disk
that you do not want to change. Any changes made to the differencing disk are recorded in the child
disk, leaving the parent disk untouched. The parent disk can be either a fixed or a dynamic disk, but the
differencing disk will always be a dynamic disk that is the same size as the parent disk. For that reason,
differencing disks can be created very quickly, but they have the same performance characteristics as
dynamic disks. Differencing disks are primarily used in test and development environments, and
Microsoft does not recommend their use in production environments.
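
The parent/child relationship can be pictured with a toy model. The classes below are purely illustrative (the names ToyDisk and ToyDifferencingDisk are invented for this sketch and are not Hyper-V APIs): writes land only in the child, and reads fall back to the parent for blocks the child has never changed.

```python
class ToyDisk:
    """Stand-in for a parent disk: a map from block number to data."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})

    def read(self, block_no):
        # Unwritten blocks read back as a zero byte in this toy model.
        return self.blocks.get(block_no, b"\x00")


class ToyDifferencingDisk:
    """Stand-in for a child disk that records changes and leaves the parent untouched."""

    def __init__(self, parent):
        self.parent = parent
        self.changed = {}  # only blocks written through the child live here

    def write(self, block_no, data):
        self.changed[block_no] = data

    def read(self, block_no):
        if block_no in self.changed:
            return self.changed[block_no]
        return self.parent.read(block_no)


parent = ToyDisk({0: b"base-0", 1: b"base-1"})
child = ToyDifferencingDisk(parent)
child.write(1, b"child-1")
```

After the write, the child returns its own data for block 1 and the parent's data for block 0, while the parent still holds its original contents.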

Cluster Shared Volumes
Cluster Shared Volumes are an option that can be turned on only in a Windows Server Failover Cluster
built with Hyper-V R2 hosts. A CSV functions as a distributed-access file system for access to VHDs,
allowing all Hyper-V hosts to have shared, concurrent read/write access to the VHDs of the VMs they are
hosting. CSV is implemented as a network re-director layer above NTFS, and the only NTFS file types
that are supported are VHDs. This means that a CSV can have the characteristics of a fixed,
dynamic, or differencing disk, depending on how it is initially created.


The Implications of CSV Use
The “good” side of CSVs in server virtualization environments is simple: they offer two advantages that
were not available before. First, CSVs provide the ability for each cluster node to have full read/write
access to storage. This not only means that every node can “see” the volume whether it is using it or not,
but it also means that CSVs can support shared concurrent read/write access to disk.

And second, CSVs offer improved granularity in handling VMs. Before CSVs, ownership of shared storage
objects was defined at the LUN level. Each LUN likely hosted a number of VHDs, with each of these
representing a VM. In a failover scenario, all VMs on that LUN had to be failed over together, while in a
backup or recovery scenario, all operations had to be performed at the LUN level. CSVs, on the other hand,
define storage object ownership at the volume, not the LUN, level. This provides more flexibility in
configuring failover, allowing VMs to be operated on individually in failover, backup, restore, and
workload balancing scenarios without regard for which LUNs actually hosted which VMs.

Microsoft made it easy to migrate existing data to CSVs. CSVs used the standard NTFS format that
Windows had used for years, so conversion to CSVs was a simple, GUI-driven process handled at the
VHD level.

But there is a “bad” side to CSVs as well. Consider the two basic types of writes: pure data writes and
metadata updates. A pure data write is a write operation which does not require any changes to
volume-wide data structures (e.g. a write in the middle of an existing allocated VHD would be a pure
data write). Any write which appends data to the end of an existing file requires allocations and
therefore modifications to common volume-wide data structures, and is considered a metadata update.
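
The distinction can be expressed as a small classifier. This is a minimal sketch that follows the paper's definition only: a write that stays within the already allocated part of a file is treated as a pure data write, and a write that extends past it is treated as a metadata update. The function name and parameters are hypothetical, and real NTFS behavior has more cases than this.

```python
def classify_write(offset: int, length: int, allocated_size: int) -> str:
    """Classify a write against a file whose first `allocated_size` bytes are allocated."""
    if offset < 0 or length <= 0 or allocated_size < 0:
        raise ValueError("offset and allocated_size must be >= 0, length must be > 0")
    if offset + length <= allocated_size:
        # Entirely inside existing allocation: no volume-wide structures change.
        return "pure data write"
    # Extends the file: new space must be allocated, so volume-wide structures change.
    return "metadata update"
```

For example, a 4 KB write at offset 8192 into a 1 MB allocated file is a pure data write, while the same write starting at the 1 MB mark is a metadata update.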

As mentioned earlier, CSV is implemented as a network re-director layer above NTFS. NTFS was never
intended to be a shared storage cluster file system, and its transactional mechanism for metadata
updates is not easily adapted for cluster operation. All pure data write operations are passed through
the CSV layer directly to the NTFS instance on the same node, but this does not occur with metadata
updates. Each CSV is owned by a particular coordinator node, and all metadata updates for that volume
which modify volume-wide data structures must pass through that coordinator node. Any node that
“owns” a CSV but is not the coordinator node for it must pass its metadata update requests across the
LAN to the coordinator node, and this puts the CSV into a state known as “re-directed I/O” (RIO) mode.
When a CSV goes into RIO mode, the following message is displayed:


When CSVs go into this mode, performance generally slows down significantly. In fact, in configurations
where CSVs are likely to go into RIO mode often, it slows down so significantly that most Hyper-V
administrators consider CSVs to be unusable.
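
The routing rule described above can be sketched as follows. This is a deliberately simplified per-write model under the assumptions stated in this paper (pure data writes go directly to local NTFS; metadata updates from any node other than the coordinator are sent over the LAN). The function names and node labels are hypothetical, and the model does not capture the fact that a CSV may stay in RIO mode for a period of time rather than per operation.

```python
def route_write(node: str, coordinator: str, is_metadata_update: bool) -> str:
    """Return 'redirected' when a write must travel over the LAN to the coordinator node."""
    if is_metadata_update and node != coordinator:
        return "redirected"
    return "direct"


def redirected_fraction(writes, coordinator: str) -> float:
    """Fraction of writes that are redirected; `writes` holds (node, is_metadata_update) pairs."""
    writes = list(writes)
    if not writes:
        return 0.0
    redirected = sum(
        1 for node, is_meta in writes
        if route_write(node, coordinator, is_meta) == "redirected"
    )
    return redirected / len(writes)


# Hypothetical workload against a CSV coordinated by node1.
workload = [("node2", True), ("node2", False), ("node1", True), ("node3", True)]
fraction = redirected_fraction(workload, coordinator="node1")
```

In this example two of the four writes are metadata updates issued by non-coordinator nodes, so half of the workload takes the slower redirected path.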

And now, the “ugly” side. In regular production usage, there are a lot of cases where CSVs go into RIO
mode. Although this is not always true, generally any metadata operation will put a CSV into RIO mode for
all of the cluster nodes that share access to it except for the coordinator node. Reads, on the other
hand, are handled directly between the requesting node and the CSV and do not require RIO mode. Extremely
common use cases that put CSVs into RIO mode include creating VHDs, hosting VHDs on dynamic or
differencing disks, or performing any Windows Volume Shadow Copy Service (VSS)-based backups. Since
metadata updates and backups are relatively common operations in production environments, using CSVs can
be problematic.

SOME MICROSOFT BEST PRACTICE RECOMMENDATIONS ON HYPER-V

Use fixed disks in production Hyper-V environments where performance matters
CSVs can only be used on Hyper-V and with failover clustering
All backups should be taken using Microsoft DPM and Windows VSS
If you need to use VSS with CSVs, you should move to an enterprise-class storage array

When a fixed disk is created, there are a lot of metadata operations up front to fully allocate and zero
out the requested storage. Provisioning performance for fixed CSVs is poor because the CSV is in RIO mode
during the entire time the volume is being provisioned. If you create a snapshot of a fixed CSV, all the
copy-on-write operations to the relevant differencing disk will be performed in RIO mode. Once fixed
disks are mounted and in use, they generally do not spend an inordinate amount of time doing metadata
updates (file/directory creation/deletion, etc.) although, depending on the workload, they might. And
any time you perform any kind of write other than a pure data write against a fixed CSV, you will also be
operating in RIO mode for any node other than the coordinator node. Many virtualization workloads, both
for virtual servers and for virtual desktops, tend to be very write-intensive, a fact which means that
even fixed CSVs in these environments may spend a good deal of time in RIO mode. However, fixed CSVs are
often more likely to meet performance requirements than dynamic or differencing disks.


Dynamic and differencing disks generally do a lot of metadata updates. As the disk grows in size, storage
must be allocated and zeroed out on the fly, all of which are metadata operations and all of which, for
CSVs, occur in RIO mode. All copy-on-write operations performed on both dynamic and differencing disks
are metadata operations and occur in RIO mode. These performance limitations generally make dynamic CSVs
and snapshots of CSVs (differencing disks) unusable in production environments.

[Figure: Re-directed I/O Mode]

Backing up CSVs can also present challenges. System Center Data Protection Manager (DPM), Microsoft’s
recommended disk-based backup product for Hyper-V
environments, creates snapshots at the LUN level. Each LUN can
potentially have multiple CSVs on it, and if those CSVs are owned
by different nodes during the backup process, there will be a lot
of “coordinator node” overhead. If you are trying to run backups
against multiple CSVs in parallel, which LUNs they physically
reside on can impact backup performance, again because of
coordinator node overhead. Even if you thought about this up front, some of the VMs residing on those
CSVs may have migrated elsewhere due to maintenance, failover, or workload balancing
operations, resulting in a storage layout where CSV access may again go into RIO mode. This suggests
that you may want to think very carefully about how you lay your storage out up front and how you may
be migrating VMs around if you will be backing up CSVs. Even with this up front planning, there are
going to be a lot of cases where you just can’t avoid going into RIO mode when CSVs are in use.
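
One way to reason about this layout problem is to check which LUNs host CSVs that are currently coordinated by more than one node, since those are the LUNs where a LUN-level snapshot is most likely to incur coordinator node overhead. The sketch below works on a hand-built inventory; the function name, the tuple layout, and the example names are all hypothetical, and it does not query a real cluster.

```python
from collections import defaultdict


def luns_with_split_ownership(csvs):
    """Return LUNs whose CSVs are coordinated by more than one node.

    `csvs` is an iterable of (csv_name, lun, coordinator_node) tuples.
    """
    coordinators_by_lun = defaultdict(set)
    for _csv_name, lun, node in csvs:
        coordinators_by_lun[lun].add(node)
    return sorted(lun for lun, nodes in coordinators_by_lun.items() if len(nodes) > 1)


# Hypothetical inventory: two CSVs share lun1 but have different coordinator nodes.
inventory = [
    ("csv1", "lun1", "nodeA"),
    ("csv2", "lun1", "nodeB"),
    ("csv3", "lun2", "nodeA"),
]
split_luns = luns_with_split_ownership(inventory)
```

Because VMs and CSV ownership can move after maintenance, failover, or workload balancing, a check like this would need to be repeated before each backup window rather than done once at design time.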

Microsoft’s recommended workaround to the CSV performance issue when VSS and DPM are involved is to use
enterprise-class arrays that offer snapshot/clone and thin provisioning technologies that, when
configured appropriately, can help to keep CSVs out of RIO mode. These arrays introduce another layer of
virtual storage that can allow metadata operations to reliably occur without causing CSVs hosted on them
to go into RIO mode. While this approach may work fine for larger information technology (IT)
organizations, it can present difficulties for smaller organizations. Many Windows
administrators may not know how to install and configure these types of arrays, assuming that their
companies can even afford to purchase them in the first place.


Virsto vDisks: A Compelling Alternative
When Virsto is deployed, it creates a new virtual storage layer. This layer establishes two storage
spaces, a “live space” and a “log space”. The live space is both a physical and a logical storage object,
visible in storage namespaces, that functions just like primary storage does in native Hyper-V
environments. The log space is only a physical object, not visible in any storage namespaces, where data
is initially written from the VMs before it is de-staged to live space. The log space is shared by all VMs
that reside on a given host, and because data is written sequentially into it, it tends to operate at the
maximum IOPS rate of which its underlying physical disks are capable. The way this log is managed is
part of Virsto’s intellectual property, and the resulting storage performance improvements it achieves
can be significantly greater than the IOPS possible in native Hyper-V without Virsto installed. Internal
tests at Virsto indicate that vDisks will generally perform 15 – 30% faster than fixed disks and at least 3x
– 4x faster than dynamic or differencing disks, given the same underlying storage configuration.
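
The general idea of writing to a shared sequential log and de-staging later can be sketched with a toy model. This is a generic log-then-destage illustration only: the paper states that Virsto's actual log management is proprietary, so nothing below should be read as its implementation. The class and method names are invented for the example.

```python
class ToyLogSpace:
    """Toy model: writes from all VMs are appended to one log, then de-staged to live space."""

    def __init__(self):
        self.log = []   # append-only list of (vm, block_no, data), shared by all VMs
        self.live = {}  # live space: (vm, block_no) -> data

    def write(self, vm, block_no, data):
        # Every write is a sequential append, regardless of which VM or block it targets.
        self.log.append((vm, block_no, data))

    def read(self, vm, block_no):
        # The newest log entry wins; otherwise fall back to live space.
        for log_vm, log_block, data in reversed(self.log):
            if (log_vm, log_block) == (vm, block_no):
                return data
        return self.live.get((vm, block_no))

    def destage(self):
        """Apply log entries to live space in order and empty the log."""
        for vm, block_no, data in self.log:
            self.live[(vm, block_no)] = data
        count = len(self.log)
        self.log.clear()
        return count


space = ToyLogSpace()
space.write("vm1", 0, b"a")
space.write("vm2", 0, b"b")
space.write("vm1", 0, b"c")
latest_before_destage = space.read("vm1", 0)
destaged = space.destage()
```

The point of the sketch is that random writes from many VMs turn into sequential appends on the log device, which is the access pattern the paper credits for the higher IOPS rate.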

Within this virtual storage layer, Virsto defines virtual volumes that interact with Hyper-V just like
natively defined VHDs. These volumes, called vDisks, are always thinly provisioned and inherently
support shared volume ownership, incorporating all of the “good” functionality associated with CSVs
described earlier. They can be owned by all nodes in a Microsoft cluster, define ownership at the
volume level, and enable live migration and failover. But the similarities stop there, because Virsto
vDisks suffer from none of the limitations of CSVs.

vDisks not only significantly outperform CSVs, but they also provide consistently high performance.
The performance of a VHD on a Virsto vDisk is driven by the performance at which the Virsto log space
operates. RIO mode is a totally foreign and unneeded concept for vDisks, against which all metadata
updates are done through a faster, more reliable block-level interface. Not only will they outperform
native CSVs of any type, but they will also outperform native dynamic and differencing disks and in most
cases they will also outperform native fixed disks.

vDisks take full advantage of thin provisioning technology to reduce storage capacity consumption
with no performance degradation – in fact, vDisks are ALWAYS thin provisioned. In native Hyper-V
environments, customers are often faced with a choice between using thin provisioning and meeting
performance requirements. Virsto supports both in storage objects that also support shared ownership.

vDisks allow the use of Windows VSS and Microsoft DPM for backup purposes without requiring high
end storage arrays. Virsto provides all of its functionality against any block-based storage, enabling
customers to often use a lower class of storage (SATA instead of FC) and storage array (low or midrange
storage instead of high end) for significant cost savings while at the same time leveraging VSS and DPM.

vDisks do not require Windows Server Failover Clustering. The use of vDisks does not require failover
clustering, but vDisks are fully compatible with failover clustering implementations that run in the
guest VM, whether from Microsoft or from other vendors. vDisks give customers the benefits of shared
storage objects even in environments where customers may not want to deploy failover clustering.

vDisks enable management advantages not available with CSVs. Virsto enables high performance
snapshot backups regardless of the type of back end storage, allows snapshots to be deleted without
having to power off VMs, and supports VHDs larger than 2TB in size.


vDisks support extremely rapid provisioning. Virsto’s snapshot technology supports an unlimited
number of snapshots, without consuming any additional storage capacity, and without imposing any
performance impacts as the number of snapshots and/or clones increases.

vDisks provide reliable data integrity. Virsto enforces the exclusive open both at the host level and at
the VHD level within a given host, thus ensuring reliable shared access to volumes.

Shared Concurrent Volume Access

The one feature that CSVs support that Virsto vDisks do not is shared concurrent volume access, but
Microsoft supports only a very limited set of uses around this capability. First, CSVs are not intended for
use with general purpose “clustered” applications like Oracle Real Application Clusters (RAC) or
clustered file systems. Second, CSVs can only be used in Hyper-V environments in conjunction with
Windows Server Failover Clusters. And third, all of the caveats about RIO mode still apply so Microsoft
tends to recommend using CSVs to support shared concurrent access only in situations that are very
read-intensive. In practice, this means it has been limited mostly to maintaining system and VHD
configuration files that must be readable by all nodes in a failover cluster. When using failover clusters
in conjunction with Virsto, you will still use a CSV for this data, but it will not impact cluster
performance because the I/O load against it is very light and consists almost entirely of reads.

Conclusion
Virsto vDisks provide a compelling alternative to CSVs. They support all the shared volume ownership
advantages provided by CSVs that allow customers to get the most out of their virtualized environments,
while at the same time supporting the high performance required in production environments
with thinly provisioned disks. Virsto’s snapshot/clone technology provides the same benefits that
enterprise-class, high end storage arrays offer in a software implementation that works with any
heterogeneous storage, fully enabling the use of Windows VSS, Microsoft DPM, and other snapshot
backup solutions in configurations that also require shared volume ownership.

Combining Virsto and Hyper-V provides compelling competitive advantages against other hypervisor
options like Citrix XenServer and VMware vSphere. This combination creates a high performance, thinly
provisioned, highly available, and flexible server and desktop virtualization solution that will solidly
outperform XenServer and vSphere while consuming less storage capacity.

219 Moffett Park Drive, Sunnyvale, CA 94089 | 1.408.899.5694 | info@virsto.com | www.virsto.com
Copyright 2011, Virsto Software Inc. All Rights Reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. The information contained herein is
the proprietary information of Virsto and may not be reproduced or transmitted in any form for any purposes without Virsto’s prior written permission. Virsto is a registered trademark of Virsto Software Corporation.
