VMworld 2013
Banit Agrawal, VMware
Warren Ponder, VMware
VMworld 2013: A Technical Deep Dive on VMware Horizon View 5.2 Performance and Best Practices
1. A Technical Deep Dive on VMware Horizon View 5.2
Performance and Best Practices
EUC5706
2. 2
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
Conclusion
3. 3
Virtual Desktop User Segmentation
User segments, from light users (fewer applications) to heavy users (many applications):
• Task Worker: basic data entry/usage is central to work
• Productivity / Knowledge Worker: standard productivity tools are central to work
• Desktop Power User: some compute-intensive apps, require 3D graphics performance
• Workstation Users: workstation-class performance for compute with dedicated graphics
Dimensions compared across segments: Image Quality, Interactivity, Cost/Seat, 2D / 3D
Graphics options:
• Soft 3D: software rendered
• vSGA: virtualized 3D hardware graphics resources
• vDGA: GPU PCI passthrough, native driver, heavy users
Other diagram labels: Accelerated 3D; less VRAM to more VRAM
4. 4
Soft 3D – Basic 3D without GPU
Software renderer provides 3D to productivity apps
Overview
• Basic 3D graphics capabilities for productivity workers
• Targeted at Task and Knowledge Workers who need AERO or applications that require basic 3D graphics
Benefits
• Supports DirectX 9 and OpenGL 2.1 apps
• No physical GPU required
• Lower initial VDI CAPEX
• No client side dependencies
5. 5
vSGA - Shared 3D Graphics Among Multiple Virtual Machines
Run rich 3D applications with higher consolidation
Overview
• Enables shared access to physical graphics cards for 3D and high-performance graphical workloads
• Desktops use the VMware SVGA device for maximum virtual machine compatibility and portability
• Cost effective, with multiple VMs sharing a single graphics card for maximum benefit
Benefits
• Enable workstation class use cases
• Reduce cost, with multiple VMs sharing 3D graphics cards
• Compatible with key platform features such as vMotion and DRS
• Support for mixing physical host clusters with and without physical GPUs
6. 6
vDGA – Direct Passthrough to a Specific Virtual Machine
Full workstation class user experience
Overview
• Enables dedicated access to physical GPU hardware for 3D and high-performance graphical workloads
• Uses native NVIDIA drivers
• CUDA available to the virtual machine
• Best for very high performance needs such as manufacturing and oil & gas
Benefits
• Full capabilities of physical GPUs
• True workstation replacement option
7. 7
Tracking vSGA Performance
On a vSphere host, you can execute the commands below to track
system/GPU performance
• System performance: run “esxtop”
• GPU stats: run “nvidia-smi -l”
*More details can be found in the View 5.2 vSGA performance whitepaper:
http://www.vmware.com/files/pdf/view/vmware-horizon-view-hardware-accelerated-3Dgraphics-performance-study.pdf
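For tracking GPU load over a test run, it can help to reduce the nvidia-smi samples to a single peak figure. The sketch below is illustrative only. It assumes the samples were collected in CSV form, for example with `nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits -l 5`, and that this query mode is available in the driver build installed on the host. The function names and the sample values are hypothetical, not measurements from this session.

```python
import csv
import io


def parse_gpu_samples(csv_text):
    """Parse 'utilization.gpu, memory.used' CSV lines into (util_pct, mem_mib) tuples."""
    samples = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        samples.append((float(row[0].strip()), float(row[1].strip())))
    return samples


def peak_utilization(samples):
    """Return the highest GPU utilization percentage seen, or 0.0 if there are no samples."""
    return max((util for util, _ in samples), default=0.0)


# Hypothetical sample input for illustration, not measured data.
example = "12, 410\n20, 512\n17, 498\n"
print(peak_utilization(parse_gpu_samples(example)))
```

The same approach works for esxtop batch output, although its column layout differs and would need its own parser.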
8. 8
vSGA Configuration Best Practices
Virtual Machine Hardware
• Latest Virtual Machine Hardware Setting
• Configure VMs to use VMXNET3 NICs
In Guest Virtual Machine Settings
• Throttle the application frame rate to match the configured PCoIP frame rate.
• Set the following registry value (REG_DWORD):
HKLM\SOFTWARE\VMware, Inc.\VMware SVGA DevTap\MaxAppFrameRate
• Setting this registry entry has been found to significantly improve performance and
consolidation ratios
• Consider disabling PCoIP’s build-to-lossless mode
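One way to apply the frame-rate throttle across many desktops is to generate a .reg file and import it in the guest image. The sketch below builds the file text only. It assumes the key path is HKLM\SOFTWARE\VMware, Inc.\VMware SVGA DevTap with the value name MaxAppFrameRate. The function name and the 1 to 120 sanity bound are hypothetical, and writing and importing the file is left to the reader.

```python
# Assumed key path for the MaxAppFrameRate value (REG_DWORD).
REG_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware SVGA DevTap"


def build_reg_file(max_app_fps):
    """Return .reg file text that sets MaxAppFrameRate to max_app_fps."""
    if not 1 <= max_app_fps <= 120:  # hypothetical sanity bound, not a documented limit
        raise ValueError("frame rate out of expected range")
    return (
        "Windows Registry Editor Version 5.00\n\n"
        f"[{REG_KEY}]\n"
        f'"MaxAppFrameRate"=dword:{max_app_fps:08x}\n'
    )


# Match the application frame rate to a PCoIP frame rate of 24.
print(build_reg_file(24))
```

Choosing the same number as the configured PCoIP maximum frame rate follows the recommendation above: frames the protocol would drop are never rendered in the first place.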
10. 10
vSGA Workload Testing: Light 3D workload
Composed of common desktop applications
• View Planner: Office 2010, Adobe Reader, 720p video, IE9 displaying a web
album
• Google Earth
Aero Enabled
Screen Resolution: 1600 x 1200
Represents a use-case scenario typical of a knowledge worker
11. 11
vSGA Performance: Light 3D Workload
• The CPU became the bottleneck first, while peak GPU utilization was only around 20%
• 112 VMs ran the light 3D workload with good response time
12. 12
vSGA Workloads: Interactive 3D UE benchmark
Composed of common 3D and Interactive operations
• Some simple 3D rendering operations
• Dragging
• Scrolling
• Windows Maximize and Minimize
Screen Resolution: 1600 x 1200
User Experience or responsiveness metric based on frame arrival
and inter-frame delay
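The slide does not give the benchmark's formula, so the following is only a sketch of how frame arrival times can be turned into inter-frame delay statistics. The function names and the choice of mean and maximum delay as summary values are assumptions for illustration, not the benchmark's actual scoring.

```python
def inter_frame_delays(arrival_times_ms):
    """Return the gap between each pair of consecutive frame arrival times."""
    return [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]


def summarize(arrival_times_ms):
    """Summarize frame arrivals as frame count, mean delay, and worst-case delay."""
    delays = inter_frame_delays(arrival_times_ms)
    if not delays:
        return None  # fewer than two frames: no delay to measure
    return {
        "frames": len(arrival_times_ms),
        "mean_delay_ms": sum(delays) / len(delays),
        "max_delay_ms": max(delays),
    }


# Hypothetical arrival timestamps in milliseconds.
print(summarize([0, 40, 80, 200]))
```

A long maximum delay with a reasonable mean points at stalls during an operation such as dragging or scrolling, which an average frame rate alone would hide.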
13. 13
vSGA Performance: UE Benchmark
• Using hardware accelerated 3D improves responsiveness in comparison with a
software solution, even at lower consolidation ratios, where CPU is not exhausted.
• Adding GPUs to an existing software-renderer solution enables the VM
consolidation ratio to be almost doubled while maintaining user experience.
14. 14
vSGA Workloads: Light CAD Workload
Composed of some common apps and CAD viewer
• View Planner: Office 2010, Adobe Reader, 720p video, IE9 displaying a web
album
• SolidWorks CAD Viewer with these models
Response metric: 95% response time and FPS
15. 15
vSGA Performance: Light CAD Workload
• Could scale to 64 VMs without reaching the threshold
• CPU utilization was below 100% at 64 VMs, which shows that the View Planner threshold can be crossed without the CPU being pegged at 100%
16. 16
vSGA Workloads: Complex CAD Workload
Solid Edge Viewer ran in isolation
• A 3-1 reducer model was used
Response metric: Remoted Frames per sec (FPS)
17. 17
vSGA Performance: Complex CAD workload
• 30 VMs scale well and remain within the 80% threshold of the normalized best-case frame rate
• The range above each bar shows the FPS variation among the VMs. The narrow range suggests that all VMs get a fair share of the CPU/GPU and show little variance.
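The 80% check can be expressed as a small calculation. This is a sketch under stated assumptions: the best-case frame rate is taken as a separately measured reference, every VM is required to meet the threshold, and all names and numbers are hypothetical rather than taken from the test results.

```python
def normalized_fps(per_vm_fps, best_case_fps):
    """Express each VM's frame rate as a fraction of the best-case frame rate."""
    return [fps / best_case_fps for fps in per_vm_fps]


def meets_threshold(per_vm_fps, best_case_fps, threshold=0.80):
    """True if the slowest VM still reaches the threshold fraction of best case."""
    return min(normalized_fps(per_vm_fps, best_case_fps)) >= threshold


def spread(per_vm_fps, best_case_fps):
    """Difference between the fastest and slowest VM, in normalized units."""
    norm = normalized_fps(per_vm_fps, best_case_fps)
    return max(norm) - min(norm)


# Hypothetical per-VM frame rates against a best case of 10 FPS.
fps = [9.0, 8.5, 8.2]
print(meets_threshold(fps, 10.0), spread(fps, 10.0))
```

A small spread corresponds to the narrow range bars described above.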
18. 18
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
Conclusion
19. 19
PCoIP Protocol Performance Improvements
Efficient client-side caching to improve bandwidth usage
Overview
• Improved client-side caching with new compression techniques
• Improved cache handling of progressive build operations
• Caching support for scrolling operations
• Dynamic GPO settings
• Relative mouse support
Benefits
• Better scrolling performance
• Downlink bandwidth reduction
• More users supported on the same network link
20. 20
Experimental Setup: System and Network Configurations
Network conditions (bandwidth and round-trip latency)
• LAN: 100Mbps with 1ms latency
• WAN: 2Mbps connection with 100ms latency
• Extreme WAN: 300kbps connection with 100ms latency
Host configuration
• VMware vSphere 5.1
• Dell T610, 2.53 GHz Nehalem, 48 GB physical RAM
• On local SSD
Desktop guest VMs
• 32-bit Win7 desktop: 1 vCPU, 1GB RAM, 1152x864 resolution
• 32-bit WinXP SP3: 1 vCPU, 768 MB RAM, 1152x864 resolution
Diagram labels: Network link, Display protocol
22. 22
Workload: VMware View Planner
Workload generator and sizing tool
• Platform characterization (CPU, memory, storage)
• Evaluate user experience
• Understand scaling issues and identify bottlenecks
Workload parameters
• All applications selected (PowerPoint, Excel, Word,
Outlook, Web album, Video, Firefox, Adobe, 7Zip, IE9)
• Thinktime of 10 seconds
A newer benchmark version (3.0) was just
released. For more info, send email to
• viewplanner-info@vmware.com
23. 23
Run Configurations
Settings for PCoIP (View 5.2):
• Resolution and color depth: 1152x864 and 32-bit color
• ClearType fonts: Enabled (default)
• Window-maximize transient effect: Disabled
• Busy animated cursor: Changed to default cursor
• Image quality: BTL off, maximum initial image quality 70
• Frame rate: 24
24. 24
PCoIP Caching Improvements
Reducing cache size
• With a 5x smaller cache, View 5.2 provides equivalent or slightly higher
bandwidth savings than View 5.1 with a 250MB cache
• Good for memory constrained thin-clients and tablet devices
25. 25
PCoIP Caching Improvements
• View 5.2 provides about 5% lower bandwidth usage in LAN
and WAN and about 5-10% in extreme WAN conditions
• Lower bandwidth, more caching of display data using new
compression techniques
26. 26
Windows 8 Support
Full support of Windows 8 as desktop and client
Overview
• View 5.2 fully supports Windows 8 as a desktop
• View clients are also supported on Windows 8
Benefits
• Use the latest Windows 8 OS for desktops and clients
27. 27
Windows 8 Performance and Optimizations
• With the optimizations, bandwidth
usage can be reduced up to 60%
28. 28
PCoIP and RDP 8 Performance
• PCoIP on Windows 8 consumes the least bandwidth once all the
optimizations are applied
• PCoIP is 10-20% better than RDP8
29. 29
SE Sparse Disk Utilization
More efficient use of storage capacity
Overview
• Leverages a new vSphere capability: a new disk format for VMs on VMFS
• Reduces grain size and uses every allocated block more efficiently by filling it with real data
• Unused space is reclaimed and View Composer desktops stay small
Benefits
• Reduced storage capacity requirements (lower CAPEX) for persistent desktops, even on lower-tier hardware
• View Composer or Mirage can be used for provisioning simplicity, even if recompose is never used (e.g. knowledge workers)
30. 30
SE Sparse Performance: Experiment Setup
• Host 1: Dell PowerEdge R710 with 16-core Intel Xeon E5-2660 @ 2.2 GHz, 392G RAM, SSD storage, VMware vSphere 5.1
• Guest VMs: 32-bit Win7 desktop (1 vCPU, 1GB RAM) and 32-bit WinXP SP3 (1 vCPU, 768 MB RAM)
• Host 2: Dell PowerEdge R710 with 12-core Intel Xeon E5645 @ 2.4 GHz, 296G RAM, SSD storage, VMware vSphere 5.1
• Display protocol: PCoIP
31. 31
SE Sparse Performance: Workload and Configurations
View Planner workload with custom apps
• Install and Uninstall VI Client and VLC Player
• Download files from web and delete the files
• Copy some files and delete these files
10s think time, 2 iterations, remote mode with PCoIP protocol
Number of VMs tested : 100 VMs
All desktop VMs are placed on SSD disk
Wipe/shrink ran at a rate of 10 VMs every 6 minutes, so 100 VMs
took 60 minutes (1 hour)
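The wipe/shrink duration follows directly from the batch rate, which is useful when sizing a blackout window for a different pool size. A minimal sketch, with a hypothetical function name and the slide's rate of 10 VMs per 6 minutes as defaults:

```python
import math


def wipe_shrink_minutes(num_vms, batch_size=10, batch_interval_min=6):
    """Estimate total wipe/shrink time when VMs are processed in fixed-size batches."""
    batches = math.ceil(num_vms / batch_size)  # a partial batch still takes a full interval
    return batches * batch_interval_min


print(wipe_shrink_minutes(100))  # 60
```

The estimate assumes every batch finishes within its interval. Slower storage or a lower concurrency limit would lengthen it.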
32. 32
SE Sparse Disk Space Reclamation
• Since the wipe/shrink operation can be I/O-intensive for space reclamation, View
administrators are encouraged to use the blackout periods appropriately (available
in the View admin UI) to minimize any perturbation in the user experience.
• Also, depending on the underlying storage, administrators can tune the concurrency
level in LDAP (under OU=Properties, OU=Virtual Center) by editing
pae-SeSparseOperationsLimit for the desired vCenter.
33. 33
View Admin Operations Enhancements
Significant acceleration of the Admin backend by serving requests
from an in-memory cache instead of fetching data from LDAP
• Improvements in backend time (20 pools, 10K simulated VMs):
• 2x for Inventory -> Desktops
• 4x for Inventory -> Pools
Support of cluster with 32 hosts (now with both NFS and VMFS)
Operational time of View management operations such as
provisioning, recomposing, and rebalancing has improved
significantly (by up to 2x) in View 5.2
34. 34
Session Outline
Introduction
View 5.2 Shared Graphics (3D) Feature and Performance
VMware View 5.2 Enhancements and Performance Results
• PCoIP protocol Improvements
• Windows 8 performance
• SE-Sparse Performance
Performance Tuning and Best Practices
• Platform Best Practices
• Guest-level Optimizations
• Protocol and Network Best Practices
Conclusion
35. 35
Platform Best Practices
Config and best practices:
• View Storage Acceleration (CBRC): Always enable CBRC (on by default). It will reduce the bootstorm IOPS requirement by 80% and will also reduce the loginstorm IOPS requirement.
• Space-Efficient Sparse (SE-Sparse) disks: Use SE-Sparse disks to reclaim wasted space. Run the wipe/shrink operations in blackout periods, as the IOPS requirement may be high.
• VDI replica: Keep the desktop replica on SSD.
• Memory over-commitment: Use memory over-commitment as long as the active memory fits in physical memory (the View Planner custom apps feature can give an estimate). Try to avoid ballooning or swapping.
• IOPS requirement: A typical knowledge worker needs about 10-15 IOPS. This varies with your applications.
• CPU requirement: About 200 to 500 MHz per user, depending on application requirements.
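The per-user CPU and IOPS figures above translate into a rough first-pass sizing estimate. The sketch below is a back-of-the-envelope calculation only. It ignores hypervisor overhead, headroom, and memory limits, and the function names and the example host (16 cores at 2200 MHz) are illustrative assumptions.

```python
def users_by_cpu(cores, mhz_per_core, mhz_per_user):
    """Number of users a host's raw CPU capacity could carry at a given per-user budget."""
    return (cores * mhz_per_core) // mhz_per_user


def iops_range(users, low_per_user=10, high_per_user=15):
    """Steady-state IOPS range for a number of knowledge-worker desktops."""
    return users * low_per_user, users * high_per_user


# Example host: 16 cores at 2200 MHz (illustrative).
heavy = users_by_cpu(16, 2200, 500)   # 500 MHz per user
light = users_by_cpu(16, 2200, 200)   # 200 MHz per user
print(heavy, light)                   # 70 176
print(iops_range(heavy))              # (700, 1050)
```

Treat the result as a starting point for a View Planner run rather than a final consolidation ratio.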
36. 36
Guest Level Optimizations
More details in the white paper: http://www.vmware.com/files/pdf/view/vmware-horizon-view-best-practices-performance-study.pdf
Parameters and best practices:
• vCPU: 1 for WinXP/Win7/Win8, 2 for multimedia-intensive apps
• Memory: 512-768 MB for WinXP, 1GB for 32-bit Win7 and Win8, 2GB for 64-bit Win7 and Win8. For memory-intensive apps: 1.5-2GB for WinXP and 32-bit Win7 and Win8, 3GB for 64-bit Win7 and Win8
• Network adapter: VMXNET3, flexible
• Storage adapter: PVSCSI or LSI Logic SAS
• VMware Tools: Latest installed
• Visual settings: “Adjust for best performance”; disable animations for window maximize and minimize operations; use the default cursor for the busy and working cursors
• Disabling services: Windows Update, SuperFetch, Windows indexing
• Group policy settings: Disable hibernation, disable System Restore, set screensaver to None
• Other settings: Turn off ClearType, disable fading effects, disable last access timestamp
37. 37
All Desktop / Network Condition Tuning Recommendations
Build to lossless
• Recommendation: Disable for standard desktops; enable for CAD/CAM and medical imaging
• Benefit: Saves 10-15% bandwidth
• Description: Enables or disables building image quality to fully lossless
Session Audio BW limit
• Recommendation: 50-100Kbps
• Benefit: Reduces bandwidth and CPU usage
• Description: Reduces BW usage of audio with usable quality
Maximum frame rate
• Recommendation: 10 / 15 FPS for standard desktops
• Benefit: Reduces bandwidth and CPU usage
• Description: In WAN conditions, this helps video playback and fast graphics operations
Client side cache size
• Recommendation: 50-100MB, depending on available client RAM
• Benefit: Avg. 30% reduction in bandwidth
• Description: Configures the client-side image cache size
38. 38
Specific Network Condition Tuning Recommendations
Max Session Bandwidth
• Recommendation: Set for LAN / WAN. 1-2Mb for standard desktops, 3-5Mb for 3D desktops. Note: always test with your use cases for the most accurate range
• Benefit: Reduced average bandwidth and fair sharing
• Description: Caps the peak bandwidth per session
Session Audio BW limit
• Recommendation: 50-100Kbps
• Benefit: Reduces bandwidth and CPU usage
• Description: Reduces BW usage of audio with usable quality
Maximum Image Quality
• Recommendation: 60-70%
• Benefit: Reduces bandwidth and CPU usage
• Description: Helps in low bandwidth conditions or with heavy multimedia use cases
Configure Session Floor
• Recommendation: Not lower than 100MB, depending on available client RAM
• Benefit: Improved user experience
• Description: Helps with better bandwidth estimation and improves user experience in high packet loss scenarios or on WiFi and 3G/4G networks
39. 39
3D / Intense Graphics Tuning Recommendations
Max Frame Rate
• Recommendation: Set based on client capability. Zero client: 30 FPS. Atom-based client: 15 FPS. Dual-core ARM client: 20 FPS. Desktop: 30+ FPS
• Benefit: Provides a consistent end-to-end user experience
• Description: Caps the maximum frame rate encoded and sent to the client for decode
Max App Frame Rate
• Recommendation: Set to match the max PCoIP frame rate
• Benefit: Sends only frames that can be encoded from the app to PCoIP
• Description: Limits high frame rate applications from generating excessive FPS
40. 40
Conclusion
• View 5.2 supports hardware-accelerated shared graphics and scales
from 32 to 100 desktop VMs, depending on the intensity of the
3D graphics workload
• PCoIP caching improvements resulted in 5-10% bandwidth
improvements compared to View 5.1
• SE sparse disk can reclaim wasted space and provide significant
space savings
• View admin operations and UI performance enhancements
• With appropriate best practices, user experience can be improved
for different network conditions
41. 41
Other VMware Activities Related to This Session
HOL:
HOL-MBL-1301
Horizon View from A to Z
Group Discussions:
EUC1001-GD, EUC1006-GD
View with Matt Coppinger or Andre Leibovici
46. 46
Performance Metrics We Care About
Measured across the network link and display protocol:
• Desktop CPU usage: lower CPU usage, better host consolidation, lower cost
• Bandwidth usage: lower BW usage, more users supported, better user experience
• User experience: lower response time, better user experience, happy VDI users
48. 48
Real-Time Audio-Video
Improved Microphone and Webcam Experience
Overview
• Webcams and microphones are now generally supported with Horizon View Windows clients
• Broader application support for webcams with Webex, Skype and GoogleTalk
• Compressed audio/video reduces upstream BW to as low as 300kbps
Benefits
• Improved end user experience with broader application support
• Up to 100x bandwidth reduction
• Improves installation and administration of microphone and webcam devices
Diagram labels: View Client, Compressed A/V, Skype, Webex, GoogleTalk
49. 49
“Real-Time Audio-Video” Overview
Before
• Webcams were unsupported with Horizon View desktops, unless specifically
used with optimized UC vendor solutions
• USB redirection of webcams and headsets resulted in bandwidth explosion
• Single webcam stream can result in 60 Mbps upstream to remote desktop
• Some customers redirected anyway, but with poor results
After
• General support for microphones and webcams with Horizon View desktops
• Broader application support for use with webcam video and microphone audio
• Audio/video from microphone/webcam is encoded and compressed on client
endpoint
• Bandwidth reduction to as little as 300-600kbps
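The before and after figures can be checked with simple arithmetic. A minimal sketch, assuming 1 Mbps = 1000 kbps and using a hypothetical helper name:

```python
def reduction_factor(before_kbps, after_kbps):
    """How many times smaller the compressed stream is than the redirected one."""
    return before_kbps / after_kbps


before = 60 * 1000  # 60 Mbps USB-redirected webcam stream, in kbps
print(reduction_factor(before, 600))  # 100.0
print(reduction_factor(before, 300))  # 200.0
```

At the upper end of the 300-600kbps range the reduction is 100x, and at the lower end it is 200x.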
50. 50
How “Real-Time Audio-Video” Works
Diagram labels: Skype, Webex, GoogleTalk, View Client, View Agent, encoded and compressed audio/video
• Audio and video are captured on the client machine
• Audio/video is encoded and compressed
• Compressed audio/video is sent back to the remote desktop
• On the View desktop, audio/video is decoded and presented to the virtual webcam driver and virtual audio driver
51. 51
Flash URL Redirection
Streaming of live video events from Adobe Media Server
Overview and benefits
• Stream live video events optimally to Horizon View desktops
• Support for live video streaming on Adobe Media Server
• Supported with Windows and Linux thin clients
• Stream live video events to virtual desktops without affecting datacenter servers and the network
• Enables new multimedia use cases with virtual desktops
Diagram labels: Adobe Media Server, Multicast stream
52. 52
Tuning and Optimization Strategies
Disable Build-to-lossless
• No-brainer – first and easiest way to shave 10-15% bandwidth
• Only enable when there is a defined requirement for pixel perfect accuracy
(Medical, CAD/CAM, Graphic Design)
Configure the maximum session bandwidth
• For low bandwidth links set the limit at or slightly below (10%) the max link rate
• Even on the LAN it may make sense to set a max limit
Configure the session floor when…
• PCoIP is experiencing packet loss but the network link has plenty of headroom
• May not always improve user experience – YMMV
• Packet loss is seen on WiFi or 3G/4G networks
• Be careful to avoid unintentional oversaturation
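The guidance on maximum session bandwidth for low-bandwidth links (set the limit at or about 10% below the link rate) can be turned into a quick calculation. This is a sketch with a hypothetical function name. Integer percentages are used to avoid floating-point rounding.

```python
def max_session_bandwidth_kbps(link_rate_kbps, headroom_pct=10):
    """Session bandwidth cap that leaves headroom_pct of the link rate unused."""
    return link_rate_kbps * (100 - headroom_pct) // 100


print(max_session_bandwidth_kbps(2000))  # 1800, for a 2Mbps WAN link
print(max_session_bandwidth_kbps(300))   # 270, for a 300kbps link
```

Setting headroom_pct to 0 gives the "at the link rate" variant of the same recommendation.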
53. 53
PCoIP Best Practices Recommendations
Build to lossless
• Default: On
• Recommendation: Turn off
• Description: Enables or disables build to lossless
Session Audio BW limit
• Default: 500Kbps
• Recommendation: 50-100Kbps
• Description: Reduces BW usage of audio with usable quality
Maximum frame rate
• Default: 30
• Recommendation: Change to 10-15 based on network settings
• Description: In WAN conditions, this helps video playback and fast graphics operations
Maximum link rate
• Default: -
• Recommendation: Set per network conditions
• Description: Good for better bandwidth estimation
Client side cache size
• Default: 250MB
• Recommendation: Set per available client-side memory
• Description: Configures the client-side image cache size
54. 54
Tuning and Optimization Strategies
Configure the maximum frame rate
• In almost all cases the maximum frame rate can be reduced to 18-20fps with
little noticeable impact – but also little gain.
• Settings below 15fps may be noticeable in use cases which require rich media
• Task workers without media requirements can often utilize settings as low as
6-8fps without significant visual impact
• Examine the PCoIP Server log files and WMI Image stats to determine
average frame rate for desired use case:
MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 5.57
MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 6.26
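To get an average frame rate from log lines like the two above, the fps field can be extracted with a regular expression. The sketch below assumes every relevant line contains `MGMT_IMG :log:` followed later by `fps <number>`, as in the sample. The function name is hypothetical, and reading the lines from the actual log file is left out.

```python
import re

# Matches the trailing "fps <number>" field on MGMT_IMG log lines.
FPS_PATTERN = re.compile(r"MGMT_IMG\s*:log:.*\bfps\s+([0-9]+(?:\.[0-9]+)?)")


def average_fps(log_lines):
    """Return the mean of the fps values found in the given lines, or None if there are none."""
    values = []
    for line in log_lines:
        match = FPS_PATTERN.search(line)
        if match:
            values.append(float(match.group(1)))
    return sum(values) / len(values) if values else None


sample = [
    "MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 5.57",
    "MGMT_IMG :log: cur_s 0 max_s 30 tbl 2 bwc 0.01 bwt 8.95 fps 6.26",
]
print(average_fps(sample))
```

An average this far below the configured maximum of 30 suggests the frame rate cap could be lowered for this use case without a visible effect.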
Configure the maximum initial image quality
• When on a WAN link with constrained bandwidth reduce this setting to 60-70%
• For use cases that use large amounts of multimedia/video – large impact
• Setting this value too low may result in noticeably “fuzzy” or “blurry” images
55. 55
Tuning and Optimization Strategies
Configure the minimum image quality
• This value must be below the maximum initial image quality setting
• The default value of 50% is acceptable for most cases
Configure the audio bandwidth limit
• For use cases that utilize significant amounts of audio, such as legal/medical
transcription, reducing audio bandwidth may increase user density
• Audio bandwidth limit is a target, not a literal value
• Vary the audio bandwidth limit between 450Kbps and 50Kbps until the desired
mix of bandwidth savings and audio intelligibility is achieved
Configure the Client-side cache size
• On thin client devices with limited RAM, using a larger cache size than
the device can support may lead to dropped sessions
• Reduce the cache size until connections are unaffected, typically 50-100MB
56. 56
Horizon Brokering of View Desktops
Horizon Supports User Entitlement to Desktops and SSO
Overview
• View desktop pools are connected into Horizon after they are provisioned
• Horizon provides a single point of access for end users to desktops, data and apps
• Horizon supports SSO, brokering users to available desktops based on entitlement policy
Benefits
• Enhanced usability: one-stop shopping for end-user access to all their corporate workloads