Proper configuration of virtual infrastructure environments is key to ensuring the availability and scalability of business-critical applications in the emerging world of software-defined data centers. We offer a game plan for working with iSCSI gateways and other elements in the virtual architecture to handle the abstraction layers and other crucial issues.
Virtualizing Business-Critical Applications:
Foundational Components
Separating "run-the-business" applications from other business applications, and then
identifying the IT infrastructure necessary to ensure their high availability,
scalability and performance, is a must for organizations that seek to
reap the greatest operational benefits from emerging virtual computing
architectures.
Put another way, failing to get the
most complex and compute-intensive workloads
to thrive in virtual infrastructure, such that they
are as easily deployed as any other application, is
one of the greatest barriers to achieving the goal
of the SDDC.
One Size Does Not Fit All
When most organizations first deploy virtual
infrastructure environments, they do so with the
goal of reducing their data center footprint by
consolidating server workloads onto fewer hard-
ware components. This results in immediate and
tangible savings. Then, over time, they begin to
realize that the average virtual infrastructure
environment, when properly tuned and managed,
will provide notably higher levels of availability
for those applications running on them. When
combined with the initial cost savings achieved,
organizations are often drawn to virtualize as
much as they can.
... And then they hit the wall.
Executive Summary
It should come as no surprise that the jour-
ney to the software defined data center (SDDC)
requires fundamental shifts in how applications
are deployed and managed. To fully realize the
vision of SDDC, organizations must first embrace
the fact that the journey includes not only
moving 100% of their servers into the virtual
world, but also 100% of the storage and network
components that support them.
As a practical matter, this becomes a journey
that is far from easy. Getting all applications
migrated into a virtual infrastructure platform
alone requires new skills and ways of managing
capacity. In addition, licensing issues require spe-
cial attention as vendors also stay current with
the idea that compute workloads will no longer
be directly tied to physical hardware components.
But most important to this journey is understand-
ing and successfully migrating the most business-
critical applications onto virtual infrastructure
such that they not only function well, but thrive.
cognizant 20-20 insights | august 2013
The first time a business-critical application
requires higher levels of availability, or far
greater compute resources, than is traditionally
made available on basic virtual infrastructure,
problems quickly arise. At first, the business-critical
application runs slowly and can become
much more unstable. It is then moved back to its
original physical infrastructure at least as quickly
as it was moved onto the virtual infrastructure
environment in the first place. Then the virtual
environment is blamed.
To be fair, the virtual infrastructure environment
actually is to blame when this happens. But that's
usually due to a combination of the way the
virtual infrastructure environment was configured
and how the business-critical application was
then deployed on top of it. Generally speaking, it's
not because virtual infrastructure platforms are
ill equipped to handle these applications.
What Makes an Application Business-Critical?
Of note, the qualities that make an application
business-critical often have little to do with the
technology or platform said application uses. In
the end, business-criticality is best determined
by answering a simple question: Can I run my
business without this application? From there, a
corollary question emerges: How long can I run
my business without this application?
If the answers to these questions are "no," or "not
for very long," then that application is critical to
the business.
Nevertheless, most business-critical applications
share key technological characteristics. They
include:
• High compute loads, either with heavy threading
or heavy math processing.
• High RAM utilization.
• High and specialized I/O, particularly storage.
• High availability configurations, often requiring
OS or application clustering.
• Complex networking configurations: public and
private networks, often to support clustering.
Applications with any of these qualities will need
extra care and attention to configuration and
resource management in order to virtualize them
successfully. Moreover, the majority of applica-
tions that do fall into the business-critical cate-
gory have more than one of these qualities in play.
Because every application has something unique
about the way it runs in any given environment,
it's easy to conclude that every application
will have its own set of best practices
that need to be explicitly defined to make
that application thrive in a virtual infrastructure
environment. In reality, this is not the case.
Virtualization and virtual infrastructure environments do
add a layer of abstraction of resources, and this abstraction
layer changes the way in which applications can
be run. But the way in which virtual infrastructure
environments create this abstraction layer is
exactly the same regardless of the applications running
in that environment.
Thus, there exists a set of common practices that
must be accounted for that will enable every business-critical
application to run successfully on virtual
infrastructure. What's actually different is the way in
which these common elements are expressed. This
expression is indeed as unique as any application.
Virtualization software vendor VMware identifies
the following six key applications that are
considered business-critical:
• Oracle (and Oracle RAC).
• Microsoft SQL Server.
• Microsoft Exchange.
• Microsoft SharePoint.
• SAP.
• Custom Java on Linux.
Most organizations run at least one of these six
applications; all exhibit at least some
of the characteristics listed above. While
they are not the only business-critical applications
in use at most organizations, independent
research commissioned by VMware shows they
are the most common ones. In addition, a second,
less commonly found set of applications exists that
businesses will often identify as business-critical.
These applications also share qualities that
can make virtualization more difficult.
These "honorable mention" business-critical apps include:
• DB2.
• WebSphere.
• WebLogic.
• Hadoop/HBase.
• Cassandra.
• Tomcat.
• Message queue systems such as Tibco, RabbitMQ,
MQ Series, etc.
• Custom, in-house built and maintained "homegrown"
applications.
Again, each of these applications will have
specific, individual ways in which they should
be tuned to thrive on a virtual infrastructure
platform. This is no different than how they
are optimized when running on bare metal
hardware. But compute resources themselves
are very consistent. Therefore, if an organiza-
tion properly accounts for how an application
will make use of its compute resources, common
themes begin to emerge.
The Four Food Groups of Computing
When planning a virtual infrastructure environment,
architects are taught to consider the
following four types of compute resources,
which are sometimes referred to as the "four
food groups" of computing:
• CPU.
• RAM.
• Disk, including both disk space and disk I/O.
• Network, including number of connections
and bandwidth.
All applications (not just business-critical ones)
consume different quantities of these compute
resources at any given point in time depend-
ing on the tasks at hand. The difference is that
most business-critical applications will consume
disproportionate amounts of one or more of
these resources compared with other applications.
They also will have requirements for higher
levels of redundancy, availability and recoverability
compared with other applications. Remember,
we answered "no" and "not for very long" to the
questions about if, and for how long, we could run
the business without these applications.
The following set of general guidelines will help
organizations assemble applications that thrive
on virtual infrastructure:
• Always follow the KISS principle: As the Star
Trek character Mr. Scott (or "Scotty") once
put it, "The more complicated the plumbing,
the easier it is to stop up the drain." There is
elegance in simplicity of design. But more than
that, simple designs are generally more stable,
more scalable and easier to maintain. Business-critical
applications are already inherently more
complex, so adding complexity when virtualizing
them only makes things worse. Examples
of mistakes in this area include:
» Needlessly adding disks and spreading
them across multiple data stores. Just
because your physical server splits out a
separate drive letter for each class of data,
logs, etc., isn't necessarily a reason to do
the same in a virtual world. More than one
disk, and even more than one data store,
is often necessary, but take an eyes-open
approach that stresses less rather
than more.
» Splitting out base files that are part of
a virtual machine's (VM's) core components,
including vswap and others. This is not an
effective way of increasing efficiency, performance
or storage management. Sadly,
it is a good way of introducing complexity,
loss of function and loss of portability into
your environment.
» Duplicating features for high availability
or redundancy through external or homegrown
tools that are already present in
the base systems or architecture. This
often leads to managing or implementing
abstracted features that don't actually do
what they are intended to. They also make
troubleshooting more difficult.
• Architect hardware from a "total performance"
perspective: Your virtual environment
should always be optimized from bottom
to top, not top to bottom or from the middle
out. High school and college students seem
to be the most willing to put $6,000 stereos
into $3,000 cars. This doesn't work nearly
as well with high-compute, business-critical
applications running on general class hardware
with virtual infrastructure on top of it. Even
though vSphere will support the so-called "monster VM"
with 64 vCPUs, 1TB of RAM and a million IOPS, no VM
can truly be bigger or faster than the host hardware on
which it runs. Make sure all hardware components that
are part of the virtual infrastructure environment are
appropriately sized to handle the anticipated workloads
placed on top of them.
Be sure to also optimize resources across all four of
the computing food groups. It's easy today to become
distracted by CPU cores and GHz speeds of the newest
generation processors, and then forget about RAM, the
compute resource that is almost always exhausted
first on a virtual infrastructure environment. From
a storage perspective, make sure to spread I/O
appropriately across your storage area network
(SAN). Take appropriate advantage of solid state
drive (SSD) and cache capabilities to boost performance,
and do so in a way that is easy to replicate.
For IP SAN technologies (iSCSI and NFS), jumbo
frames should be enabled as the norm.
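To make the jumbo frame recommendation concrete, the following minimal Python sketch compares a standard 1500-byte MTU with a 9000-byte jumbo MTU. The overhead figures are assumptions for plain TCP/IPv4 over Ethernet (no VLAN tag, no TCP options, iSCSI PDU headers ignored), and the function names are illustrative only; actual gains depend on the network and storage equipment in use.

```python
# Assumed per-frame overheads for plain TCP/IPv4 over Ethernet.
# These constants are illustrative assumptions, not measured values.
ETHERNET_WIRE_OVERHEAD = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40          # IPv4 header 20 + TCP header 20


def payload_efficiency(mtu: int) -> float:
    """Fraction of bytes on the wire that carry storage payload."""
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + ETHERNET_WIRE_OVERHEAD)


def frames_needed(transfer_bytes: int, mtu: int) -> int:
    """Number of frames required to carry transfer_bytes of payload."""
    payload = mtu - IP_TCP_HEADERS
    return -(-transfer_bytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, round(payload_efficiency(mtu), 4), frames_needed(1_000_000, mtu))
```

Under these assumptions the efficiency gain is modest, but the number of frames (and therefore per-frame processing work) for the same transfer drops by roughly a factor of six.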
From a network perspective, Gig-E connections are
no longer enough. With today's price/performance
advantages, 10GbE should be the minimum standard
for all network connectivity in virtual infrastructure
environments. Reserve Gig-E connectivity for out-of-band
management of hardware only. As standards evolve and prices
recede, plan your network investments wisely to be ready
to take advantage of 40GbE and 100GbE. These standards will likely
creep into your data center faster than anyone
expects.
• Understand specific compute needs:
Remember, each application will use resources
uniquely, but also predictably. The key is
to translate how any application would use
resources when running on native hardware to
the way these would be used when abstracted
into the virtual world.
[Figure 1: Business Critical Application Optimization Methodology. Optimize bottom to top. Virtual infrastructure oriented optimization: Physical Hardware (server, storage, network); Hypervisor (resource pools, HA, DRS, data stores, parameter tuning); Operating System (paravirtual drivers, kernel parameter tuning on Linux); Virtual Machine Hardware (optimize RAM, vCPU, storage, resource limits and reservations). Application oriented optimization: Java Virtual Machine (heap size, threads, ...); Java Application (resource allocation, app tunables); Application (cache, SGA, RAM commitment, app-specific tunables).]
For CPU utilization, assigning more CPU cores
is not necessarily better. In fact, assigning
too many vCPUs will slow performance. If an
application has eight vCPUs but only four
vCPUs' worth of work to do, it will force the
hypervisor to find a way to schedule four cores
on the processor that are servicing those vCPUs
to do nothing. Heavily threaded applications tend
to use more cores while those which crunch numbers
use fewer cores and more cycles.
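The vCPU guidance above can be sketched as a simple right-sizing heuristic. This is an illustration only: the function name, the 25% headroom default and the idea of sizing from observed peak utilization are assumptions introduced here, not a VMware formula, and any result should be validated against real workload measurements.

```python
import math


def suggest_vcpus(assigned_vcpus: int, peak_utilization: float,
                  headroom: float = 0.25) -> int:
    """Suggest a vCPU count from observed peak utilization.

    peak_utilization is the busiest observed fraction (0.0 to 1.0) of the
    currently assigned vCPUs. headroom is an assumed safety margin.
    """
    if assigned_vcpus < 1:
        raise ValueError("assigned_vcpus must be at least 1")
    if not 0.0 <= peak_utilization <= 1.0:
        raise ValueError("peak_utilization must be between 0.0 and 1.0")
    busy_vcpus = assigned_vcpus * peak_utilization
    # Never suggest fewer than one vCPU.
    return max(1, math.ceil(busy_vcpus * (1.0 + headroom)))


if __name__ == "__main__":
    # The example from the text: eight vCPUs with four vCPUs' worth of work.
    print(suggest_vcpus(8, 0.5))
```

A result larger than the current assignment suggests the VM is CPU-bound; a smaller result suggests idle vCPUs that the hypervisor must still schedule.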
When it comes to RAM, allocate based on what
the application will actually use. Also be sure to set
memory reservations for that RAM which will be needed.
For example, an Oracle database server should have a memory
reservation that is equal to the size of the OS
plus the SGA. For Java applications, an appropriate
memory reservation would include the
OS plus the Java heap size as well as a couple
of other smaller items. If necessary, it's preferable
as a good practice to have ever so slightly
more RAM assigned for these, as opposed to
slightly less. However, it's also good practice to
keep memory reservations as small as practical.
Making them too large will interfere with
the ability to vMotion a VM from one host to
another (by extension impeding the workload
balancing capabilities of VMware Distributed
Resource Scheduler), complicate HA admission
control in the event of a host failure
(interfering with HA recovery) or even prevent
the VM from being able to start at all.
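The reservation arithmetic described above can be written down as a short Python sketch. The function names, the 1 GB default for the "other smaller items" and the sample sizes are hypothetical values chosen for illustration; real figures come from measuring the OS footprint and the application's configured SGA or heap.

```python
def oracle_reservation_gb(os_gb: float, sga_gb: float) -> float:
    """Memory reservation for an Oracle database VM: OS plus SGA."""
    return os_gb + sga_gb


def java_reservation_gb(os_gb: float, heap_gb: float,
                        other_gb: float = 1.0) -> float:
    """Memory reservation for a Java VM: OS plus heap plus smaller items.

    other_gb stands in for the "couple of other smaller items" mentioned
    in the text; its default value is an assumption.
    """
    return os_gb + heap_gb + other_gb


def can_power_on(reservation_gb: float, host_unreserved_gb: float) -> bool:
    """A VM cannot start on a host that cannot satisfy its reservation."""
    return reservation_gb <= host_unreserved_gb


if __name__ == "__main__":
    reservation = oracle_reservation_gb(os_gb=4, sga_gb=32)
    print(reservation, can_power_on(reservation, host_unreserved_gb=32))
```

The last check illustrates why oversized reservations are risky: a host with less unreserved memory than the reservation is not a valid target for vMotion, HA restart or initial power-on.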
Storage is arguably the most complex of all
of the resources to manage because it is the
component in virtual infrastructure that itself
is almost always abstracted in multiple layers
and in widely varying ways depending on
the make and model of storage system used.
As a result, it is also the area where application
performance problems tend to arise first
and most frequently. As a general rule, storage
capabilities should be pushed as low in the
hardware stack as practical. That stated, if a
given storage system doesn't have a feature
needed or desired, implement and integrate
these features at other layers while taking
care to not add undue complexity. Make sure
that individual components are not easily
overwhelmed, just as you would when architecting
shared storage for high-capacity I/O
systems and applications. Align these capabilities
so they are easily identified and presented
in standard data stores so your applications
using them remain just as logically configured.
Finally, use raw device mappings (RDMs) as a last
resort only. With today's virtual infrastructure
systems, there is no performance advantage
to using an RDM over a virtual disk located in a
properly configured data store. Further, RDMs
will add natural complexity to your virtual
infrastructure environment from both a configuration
and system management perspective.
Where feasible, use the OS-level storage
systems (such as ASM on Oracle) as recommended
by respective application vendors, but
layered on top of the optimized storage environment
that is created.
Networks should be kept as simple as possible.
In almost every conceivable situation, there's no need
to do things like vNIC teaming and bonding inside a VM.
This is already handled by
the hypervisor. Instead, use one virtual network
interface controller (vNIC) for each distinct network
to which you need to connect. For example, a
typical Oracle RAC node will need two vNICs: one
for the public network and one for the private
network. The SCAN and associated virtual IPs do
not need a vNIC.
• Build VMs to be transparent and simple:
When building virtual machines, less is definitely
more. If you know you will never need a
specific feature, you're probably better off not
installing it. Just as is the norm with any OS
build, turn off unnecessary services and follow
the best practices for hardening the OS in
question. The goal here is to have a "squeaky-clean"
OS on the VM that feels the same to the
application as it would on any other optimized
environment.
• Storage should appear as simple, local disks,
and networks should appear as simple connections,
because all of the optimization of these
components has already been accomplished
within the virtual infrastructure environment
itself.
Then Take Advantage
Only after the virtual environment is optimized
should your organization be truly concerned
about taking full advantage of its unique benefits
and features. At this point, your organization
should be able to do so easily. But for business-
critical applications, there is still more work to do.
High Availability: When (Not) to Cluster
Business-critical applications naturally have
requirements for very high availability and recoverability.
In many cases, the enhanced availability
provided by a well-engineered virtual infrastructure
platform will meet this need. When it does,
certain high-availability configurations (system
clustering in particular) that are a must for physical
infrastructure deployments can be eliminated.
Understanding when and when not to cluster, as
well as how to best accomplish it, can depend
greatly on the capabilities of the application in
question, but there are some common guidelines.
First, a properly engineered vSphere HA/DRS
cluster can be expected to reliably achieve
somewhere between three nines and four nines
of availability for all systems running on it.
By comparison, traditional database clustering
techniques used by the likes of Oracle RAC and
Microsoft SQL Cluster Services are intended to
provide only three nines of availability at best
in and of themselves. To achieve higher levels of
availability requires work at the application layer.
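To put "three nines" and "four nines" in perspective, the following short Python sketch converts an availability level into the downtime it permits per year. It assumes a 365-day year and treats availability as a simple fraction of total time; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Permitted downtime per year for an availability of N nines.

    Three nines means 99.9% availability, four nines means 99.99%.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for nines in (3, 4):
        minutes = downtime_minutes_per_year(nines)
        print(f"{nines} nines: {minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Under these assumptions, three nines allows roughly 8.76 hours of downtime per year, while four nines allows under an hour.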
What this means is that, unless something explicitly
is performed at the application layer to
enhance availability (which is actually not all that
common), a properly optimized vSphere HA/DRS
cluster can provide equal or better levels of availability
than clustering at the OS layer can. This is
an excellent opportunity to consider simplifying
some clustered systems.
... But before running off to destroy every cluster
in the data center, consider that systems are often
clustered for reasons beyond availability. It's not
unusual for clustered systems to be active-active,
or to be clustered to minimize downtime during
patches (also known as rolling upgrades). If these
kinds of operations are part of your organization's
regular maintenance, clustering is still required.
Thus, the key to knowing when to cluster systems
on virtual infrastructure is to fully understand the
specific application requirements, and then validate if
the requirements hold up when migrating to a virtual
infrastructure environment.
When clustering on top of
virtual infrastructure, the
high-availability features of
each layer should be opti-
mized to complement one
another. At the same time,
your organization should
avoid clustering techniques
that might interfere with
infrastructure layers above
and below. Operating system clusters on virtual infrastructure will generally
require that a shared disk is used between the
individual nodes (voting and quorum drives) and
usually involve one of four methods:
• Shared via RDM.
• Shared via iSCSI or NFS on SAN/NAS.
• Shared via multi-write virtual disk.
• Shared via iSCSI or NFS target VM.
While all of these options can be made to
work well, they have distinct advantages and
disadvantages. Sharing via RDM is the oldest
and best known, but provides the fewest
advantages and the greatest limitations. With this
option, VMs in a cluster use an RDM to share data.
This option also introduces
a condition of SCSI bus sharing into the cluster
between the nodes. Migrating VMs via vMotion
is not supported in this configuration, so VMs
are fixed to whichever host they are running on
unless and until restarted on another node. Data
on the shared disk is also kept in a different format,
using the native file system of the OS on a
LUN, as compared to on a virtual disk in a data
store. This can impact how data is protected. Of
all of the options, share via RDM provides the
least amount of flexibility and should be used only
when other methods are not available.
Share via iSCSI or NFS on SAN/NAS resolves the
issue of SCSI bus sharing, thus enabling sup-
port for vMotion on cluster nodes. However, this
option is not available when using FC SAN stor-
age systems. Organizations with an investment
in FC SAN may not wish to change the storage
infrastructure just to enable this method. Finally,
this option has the same issues of data protection
differences that are present with share via RDM.
A very simple way to share
a disk is to share via a
multi-write virtual disk.
This option allows all data
to remain in virtual disk
files on a data store. Here,
the shared virtual disk is
located in a folder where
all cluster nodes can access
it. It is formatted Eager
Zeroed Thick and the multi-
write flag is set, allowing all VMs to write to it at
will. There are distinct advantages to this method.
It is easy to set up, allows for vMotion and makes
data protection consistent. Its primary drawbacks
are that the shared virtual disk is associated with
more than one virtual machine, so data protec-
tion systems must account for this, and that the
host HA/DRS cluster where such a configuration
is running can have no more than eight ESXi host
systems.
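The constraints on the multi-write virtual disk method described above can be captured in a small pre-deployment check. The following Python sketch is hypothetical: the class, field names and string values are invented for illustration and do not correspond to any vSphere API; only the three rules themselves come from the text.

```python
from dataclasses import dataclass
from typing import List

MAX_HOSTS_FOR_MULTI_WRITE = 8  # host limit stated in the text above


@dataclass
class SharedDiskPlan:
    """Hypothetical description of a planned shared virtual disk."""
    provisioning: str         # e.g. "eager_zeroed_thick", "lazy_zeroed_thick", "thin"
    multi_write_flag: bool    # whether the multi-write flag will be set
    cluster_host_count: int   # ESXi hosts in the HA/DRS cluster


def check_plan(plan: SharedDiskPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if plan.provisioning != "eager_zeroed_thick":
        problems.append("shared disk must be Eager Zeroed Thick")
    if not plan.multi_write_flag:
        problems.append("multi-write flag must be set")
    if plan.cluster_host_count > MAX_HOSTS_FOR_MULTI_WRITE:
        problems.append("HA/DRS cluster exceeds eight ESXi hosts")
    return problems


if __name__ == "__main__":
    print(check_plan(SharedDiskPlan("thin", True, 10)))
```

A check like this is most useful as a gate in a build or review process, before the shared disk is attached to the cluster nodes.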
Also, disks that have multi-write flags set on them
can have support issues with certain vStorage
API based backup tools. While it is expected that
future versions of vSphere should address this
issue, be sure that your data protection systems
take this into account.
The iSCSI/NFS Gateway VM method is growing
in popularity because it resolves almost all of
the limitations of the others. Here, an additional
VM is configured as an iSCSI or NFS target to re-
share the SAN storage over a private virtual net-
work. This VM can be a single vCPU, which means
vSphere Fault Tolerance can be used to increase
its availability. The nodes of the OS cluster then
use the iSCSI or NFS share provided by the target
VM for their shared storage (see Figure 2).
[Figure 2: iSCSI Gateway VM Configuration (virtualization schematic, © VMware, Inc.). Database node VMs running on separate ESXi hosts use guest-to-guest iSCSI disk sharing through an iSCSI gateway VM, which is protected by a VMware Fault Tolerance clone. All VM disks, including the gateway's shared disk, reside in a vSphere datastore on the SAN infrastructure.
Highlights:
• All storage is VMDK on SAN.
• The iSCSI gateway virtualizes and re-shares disk over the VM network (virtual SAN on SAN).
• HA, DRS and FT work together.
• All systems can be vMotioned.
• Portable to any vSphere architecture.]
About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process
outsourcing services, dedicated to helping the world's leading companies build stronger businesses. Headquartered
in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep
industry and business process expertise, and a global, collaborative workforce that embodies the future of work.
With over 50 delivery centers worldwide and approximately 164,300 employees as of June 30, 2013, Cognizant is a
member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the
top performing and fastest growing companies in the world.
Visit us online at www.cognizant.com for more information.
World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com
European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 207 297 7600
Fax: +44 (0) 207 121 0102
Email: infouk@cognizant.com
India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com
© Copyright 2013, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.
About the Author
Christopher (Chris) A. Williams is a Director of Cognizant Virtual Solutions, within CBC-ITIS Enterprise
Computing's Infrastructure Technology Management Services Practice. In this role, Chris is responsible
for designing and developing innovative virtual infrastructure, private and hybrid cloud solutions,
optimizing business critical applications and database systems including Oracle RAC, DB2, SQL Server
clusters, MySQL and Sybase. Chris has an M.B.A., information systems emphasis, from the University of
Colorado, and a bachelor of science degree, with aerospace science and management emphasis, from
Metropolitan State University of Denver. He can be reached at Chris.Williams@cognizant.com.
This configuration allows all nodes, and even
the iSCSI Gateway, to be vMotioned, works with
every supported vSphere storage system and can
be used on HA/DRS clusters with more than eight
nodes. It also clearly associates the shared disk
with a specific VM. The primary drawback of this
configuration is that it also is arguably the most
complex to both set up and maintain. Also, when
the iSCSI/NFS target is made fault tolerant, its
disk is marked Eager Zeroed Thick and the multi-write
flag is set. If using a vStorage API based
tool, organizations may need to add a script to
temporarily disable vSphere Fault Tolerance when
backing up this VM.
Regardless of the clustering methodology used,
anti-affinity policies between the various cluster
nodes are a must. These ensure that no two nodes
will run on the same physical host at the same
time, which would defeat one of the high-availability
purposes of clustering. This is true even for share
via RDM configurations because, in the event of
a host failure, VMware HA will follow DRS rules
for placement when deciding where to restart the
failed cluster node.
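The intent of an anti-affinity policy can be expressed as a simple placement check. The following Python sketch is an illustration only: it does not call any vSphere API, and the VM and host names are made up. It reports any physical host that is running more than one node of the same OS cluster.

```python
from collections import defaultdict
from typing import Dict, List


def anti_affinity_violations(placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Find hosts that run more than one node of the same cluster.

    placement maps each cluster node VM name to the host it runs on.
    Returns a mapping of host name to the sorted VM names sharing it.
    """
    vms_by_host = defaultdict(list)
    for vm_name, host_name in placement.items():
        vms_by_host[host_name].append(vm_name)
    return {host: sorted(vms) for host, vms in vms_by_host.items() if len(vms) > 1}


if __name__ == "__main__":
    # Hypothetical placement: two RAC nodes have landed on the same host.
    current = {"rac-node-1": "esxi-01", "rac-node-2": "esxi-01", "rac-node-3": "esxi-02"}
    print(anti_affinity_violations(current))
```

An empty result means every cluster node is on its own host, which is the state the anti-affinity rule is meant to guarantee.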
Looking Forward
Business-critical applications have special com-
pute needs that go well beyond those of other
systems usually found in virtual infrastructure.
When not carefully attended to, this can cause
these applications to perform poorly and deliver
reduced functionality. Fortunately, while each
application expresses how it consumes resources
differently, the four food groups of computing are
always involved. As a result, common methods
and themes arise when abstracting infrastructure
for these applications.
Properly configured, mission-critical applications
can thrive on virtual infrastructure, gaining the
same benefits of performance, consistency, avail-
ability and recoverability as all other systems.
Understanding how each application uses avail-
able compute resources is the key to successfully
virtualizing business-critical applications, and
accelerating the journey to both cloud computing
and the software defined data center.