XenDesktop in the Enterprise: Best Practices and Lessons Learned (SUM209)
1. SUM209: XenDesktop in the Enterprise – Best Practices and Lessons Learned
Nick Rintalan, Senior Architect
Thomas Berger, Architect
Citrix Consulting
May 2011
2. XenDesktop Architecture Review
[Architecture diagram. Components: User, Web Interface, XD Controller, Virtual Desktop, Hypervisor, Provisioning Server, Storage, Active Directory, SQL Server, License Server. Connection labels: HTTP(s), HTTP(s) / WCF, XML, LDAP, ICA, UDP, CIFS, NFS, iSCSI, FC.]
4. Web Interface Scalability
• WI 5.x itself scales to the IIS specification
• The real bottleneck in “Web Interface Scalability” is the XML Service
• Placing WI on either 2003 or 2008 has no real impact on
scalability
• However, using 2008-based XML Brokers reduces scalability by almost 50%!
• We can no longer “exploit” the Log on Locally user right assignment in 2008
• A 2008 R2 WI box with 2 CPU and 2 GB RAM has been
proven to scale to ~31k users/hour or ~9 users/sec
• Almost 60k users/hour with 2003-based XML Brokers
5. Enterprise Considerations
• Always deploy 2 Web Interface servers for redundancy
• Use a hardware load balancer if possible (e.g. NetScaler)
• Intelligent monitoring of WI availability / XML Service
• WI is a good candidate for virtualization
• 2 vCPUs and 4 GB RAM is a good starting spec
• Check if encryption is required (User → WI → XML)
• Otherwise user credentials are transferred as clear text
• Disable Socket Pooling if not using SSL
Citrix Confidential - Do Not Distribute
7. The All New XenDesktop 5 Controller
• Important to understand the new XenDesktop architecture
• No more IMA service or Data Store
• New XD5 Controllers are stateless and the database (SQL) is relational
• The SQL database must be made highly available
• Many of the services (XML, etc.) have been re-written in .NET
• Registry-based discovery & registration is now used by default
• New architecture allows for greater scalability
8. DDC Bottlenecks (XD4 vs. XD5)
• In XenDesktop 4 it was a best practice to dedicate servers to
certain roles
• 2 DDCs for IMA and PMS
• 2 DDCs for XML and Controller
• In XenDesktop 5 it is recommended to not dedicate servers
• 2 DDCs per XD5 “site” (instead of 4 DDCs per XD4 farm)
• Site services are load balanced automatically
• Specify all XD5 Controllers as XML Brokers in Web Interface
9. XD5 Controller Scalability
• Scalability tests using two (physical) 2x4’s and 16 GB RAM:
• Boot Storm (20,000 desktops in 10 minutes): 40-50% CPU utilization during the
virtual desktop registration process
• Logon Storm (20,000 logons in 13 minutes): 50-60% CPU utilization during user
connection process
• User Perception: 99.9% of the brokered connections
responded to launch requests in less than 2.5 seconds
10. XD5 Controller Scalability
• Rough estimate based on scalability tests:
• A single XD5 site can scale to 10,000 desktops with 2 Controllers in most cases
• Need to get more granular?
• 125-180 virtual desktop registrations per minute per dedicated core
• 100-120 user logons per minute per dedicated core
• Assumes the desktops are delivered via PVS and the Controller’s CPUs are not
shared with other components
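The per-core rates above can be turned into a rough core count. The following is a minimal sketch, not a Citrix sizing tool: the function names and the example site (5,000 desktops, 10-minute boot window, 15-minute logon window) are hypothetical, and it conservatively uses the low end of each quoted range.

```python
import math

# Per-core rates quoted on the slide (events per minute per dedicated core).
REGISTRATIONS_PER_MIN_PER_CORE = (125, 180)
LOGONS_PER_MIN_PER_CORE = (100, 120)


def cores_needed(events: int, window_minutes: float, rate_per_core: float) -> int:
    """Cores required to absorb `events` spread evenly over `window_minutes`."""
    events_per_minute = events / window_minutes
    return math.ceil(events_per_minute / rate_per_core)


def controller_cores(desktops: int, boot_minutes: float, logon_minutes: float) -> int:
    """Size for the worse of the two storms, using the low-end rates."""
    boot = cores_needed(desktops, boot_minutes, REGISTRATIONS_PER_MIN_PER_CORE[0])
    logon = cores_needed(desktops, logon_minutes, LOGONS_PER_MIN_PER_CORE[0])
    return max(boot, logon)


if __name__ == "__main__":
    # Hypothetical site: 5,000 desktops, 10-minute boot storm, 15-minute logon storm.
    print(controller_cores(5000, 10, 15))
```

The result is the total number of dedicated cores across all Controllers; it says nothing about redundancy, so the two-Controller minimum still applies.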
11. XD SQL Database Scalability
• XenDesktop 5 uses the SQL database:
• To store all configuration and session information
• As a message bus between the Controllers
• This allows for a more flexible architecture (stateless DDCs!)
• It also puts a much higher load on SQL
12. XD SQL Database Scalability
• Scalability tests using three (physical) 2x4’s with 16 GB RAM:
• Boot Storm (20,000 desktops in 10 minutes):
• 15-25% CPU (SQL principal database)
• 5-10% CPU (SQL mirror)
• SQL witness was essentially idle
• Logon Storm (20,000 logons in 13 minutes):
• 32% CPU (SQL principal database)
• 10% CPU (SQL mirror)
• SQL witness was essentially idle
13. Pay Attention to the Transaction Log!
• 20,000 desktop scenario
• High number of SQL transactions
• About 666 transactions / sec: 20,000 desktops each sending a
heartbeat every 30 seconds
• Can cause transaction log to grow
excessively (gigabytes)
– Check CTX126916
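The heartbeat figure is simple division, sketched below. The function name is hypothetical, and the sketch assumes each heartbeat results in exactly one SQL transaction, which is how the slide's figure reads.

```python
def heartbeat_transactions_per_sec(desktops: int, interval_seconds: float = 30.0) -> float:
    """SQL transactions per second if every desktop sends one heartbeat per interval.

    Assumption: one heartbeat maps to one transaction.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return desktops / interval_seconds


if __name__ == "__main__":
    rate = heartbeat_transactions_per_sec(20_000)
    print(f"{rate:.0f} transactions/sec")
```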
14. Enterprise Considerations – XD SQL Database
• Ensure SQL database is made highly available
• Check Citrix XenDesktop Design Handbook (bit.ly/xdhandbook) for SQL
Database Sizing
• The database itself will be small (MBs), but the transaction
log will be big (GBs)
• Leverage database mirroring, failover clustering or “HA” built
into the hypervisor
16. Hypervisor Scalability
• Most frequently asked question is:
How many VMs / box?
• The most definitive answer is:
It depends!
17. Hypervisor Scalability – Why Does it Depend?
…because all users and apps are different!
• Real world ratios range from…
16 VMs per Core
(Light Task Worker)
4 Cores per VM
(Heavy Trader / CAD)
18. Hypervisor Scalability – Where to Begin?
• You will need to test!
• Formal P&S testing with tools such as ESLT or LoadRunner is preferred
• If P&S Testing cannot be performed, conduct an extended
pilot within your environment
• This will at least allow you to gather some baseline data
• Don’t forget to include a “buffer” in case you’re off
19. What if a Pilot is Not Possible?
• Gather some performance statistics from existing
workstations
• Only works if the workload will not change when going virtual
• 3rd party software may help such as:
• Liquidware Labs Stratusphere
• Novell PlateSpin Recon
• Microsoft Assessment and Planning (MAP) Toolkit for Hyper-V
• Make sure to include an even bigger “buffer”
21. Hypervisor Scalability
• User density for XD is about the same on XenServer,
Hyper-V and vSphere
• Architecture and features can be slightly different
• Processors that support nested paging are highly
recommended
• Extended Page Tables (Intel)
• Rapid Virtualization Indexing (AMD)
• Remember to “save” 1 core* and ~1-3 GB of memory* for
the hypervisor itself
• However, XS 5.6 FP1 now uses 4 CPUs by default instead of 1
22. Hypervisor Scalability
• Certain memory over-commitment features can be helpful in
VDI deployments
• XenServer, vSphere and Hyper-V now all support dynamic memory
management (essentially “ballooning”)
• Transparent Page Sharing (TPS) doesn’t help much with “new” operating
systems (legacy 4KB pages vs. new large 2 MB pages)
• Don’t turn anything on/off unless you know what you’re doing
23. Hypervisors – Network Recommendations
• Dedicate NICs for certain functions
• Ensures highest scalability
• Eases monitoring / trend analysis
• Avoid single points of failure
• Create Bonds/Teams whenever possible
• NIC virtualization (e.g. HP Virtual
Connect Flex-10) can help greatly
here
24. Hypervisor NIC Bonding
NICs: embedded (dual port) and add-on (quad port)
• eth0 + eth4: User and Infrastructure traffic
• eth1 + eth3: Storage traffic
• eth2 + eth5: Host Management and HA traffic
25. Hypervisors – General Recommendations
• Create separate resource pools or clusters for servers and
desktops
• Be aware of the limitations of your hypervisor:
• XenServer Pool Size
• Maximum supported VMs per vCenter / SCVMM instance
• Be aware of the specific requirements of your hypervisor:
• Hyper-V save state file
• Resource requirements for vCenter / SCVMM
• XenServer dom0 vCPU and RAM assignment
27. Provisioning Server Scalability
• CPU and Memory are not typically the bottlenecks
• Disk and Network I/O are!
• With careful design and planning, PVS can be virtualized
• The virtual vs. physical decision depends on several factors:
• Number and type of target devices (50 target devices vs. 5000, XA vs. XD, etc.)
• Networking infrastructure and NICs (1 Gb vs. 10 Gb, SR-IOV compatibility, etc.)
• Hypervisor and LACP support (vSphere vs. XenServer)
28. PVS Scalability – Eliminating the Disk Bottleneck
• PVS cannot cache, but Windows can!
• Block-level storage makes it easy
• FC, iSCSI and even local disk storage
• No caching by default with CIFS or NFS
• It is possible, but need to tweak!
• 64-bit is the key to success
• It’s all about the File/System Cache!
• Read IOPS from the vDisk Store in the Steady State
approach ZERO!
29. PVS Scalability – Enterprise Considerations
• The number of unique vDisks streamed simultaneously
greatly affects scalability
• Total RAM Required = 2 GB + (# vDisks x 2 GB)
• Shared storage optimized for write performance is ideal
• 80-90% writes is common in XD deployments in the steady-state
• RAID 1, 10 are good options (RAID 5 is bad)
• More streams = longer failover process
• 1,000 streams ≈ 5 minutes
• 1,500 streams ≈ 8 minutes
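The RAM rule of thumb on this slide is easy to script. This is a minimal sketch of that formula only; the function name and the optional `buffer_fraction` parameter are hypothetical (the editor's notes at the end of the deck mention adding a 25% buffer).

```python
def pvs_ram_gb(
    unique_vdisks: int,
    base_gb: float = 2.0,       # OS and services, per the slide's formula
    per_vdisk_gb: float = 2.0,  # cache allowance per unique vDisk
    buffer_fraction: float = 0.0,
) -> float:
    """Total RAM Required = 2 GB + (# vDisks x 2 GB), optionally with a buffer."""
    if unique_vdisks < 0:
        raise ValueError("unique_vdisks must not be negative")
    return (base_gb + unique_vdisks * per_vdisk_gb) * (1 + buffer_fraction)


if __name__ == "__main__":
    print(pvs_ram_gb(6))                        # formula only
    print(pvs_ram_gb(6, buffer_fraction=0.25))  # with a 25% buffer
```

In practice the result would be rounded to whatever memory configuration the server actually supports.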
30. PVS Scalability – Enterprise Considerations
• Implement at least two PVS servers
• Use PVS HA Mode to distribute the load
• In case of different hardware, leverage PVS “Power Rating” feature
• Teaming NICs for throughput is also highly recommended to
achieve maximum scalability
• As a rule of thumb, for every 1 Gb NIC, expect ~500 target
devices able to be streamed by PVS
• Citrix has scaled a single PVS box up to 3300 target devices
• Consulting typically scales each PVS box to 1000-1500 target devices
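The rules of thumb above (about 500 target devices per 1 Gb NIC, 1,000-1,500 target devices per PVS box, at least two servers) can be combined into a first-pass estimate. This is a sketch under those assumptions; the function names are hypothetical and it does not add spare capacity for failover.

```python
import math


def pvs_nics_needed(target_devices: int, targets_per_gb_nic: int = 500) -> int:
    """Number of 1 Gb NICs needed in total, using the ~500 targets per NIC rule."""
    return math.ceil(target_devices / targets_per_gb_nic)


def pvs_servers_needed(
    target_devices: int,
    targets_per_server: int = 1000,  # low end of the 1,000-1,500 range
    min_servers: int = 2,            # "implement at least two PVS servers"
) -> int:
    """Number of PVS servers, never fewer than the stated minimum."""
    return max(min_servers, math.ceil(target_devices / targets_per_server))


if __name__ == "__main__":
    # Hypothetical environment with 5,000 target devices.
    print(pvs_nics_needed(5000), pvs_servers_needed(5000))
```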
32. Storage Recommendations
• Quick and Dirty estimates
• 5 simultaneous bootups per spindle
• 12 simultaneous logons per spindle
• 14 simultaneous logoffs per spindle
• 18 simultaneous users per spindle
• Capacity calculations impacted by
• Disk speed
• RAID level
• Read/Write % (20/80)
• User Activity (example values)
Disk Speed (RPM)    Random IOPS
15,000              150
10,000              110
5,400               50

RAID Level          Write Cost
0                   1
1 or 10             2
5                   4

Activity            IOPS
Startup             26
Logon               12.5
Working             8
Logoff              10.7
33. Storage Usage: Write Cache
• It’s all about the Write IOPS
• 90/10 Write to Read Ratio is common in the steady-state
• RAID 1 or 10 is best, RAID 5 or 6 *not* recommended
(unless a huge amount of spindles or write cache on RAID
controller)
• Spread the write cache drives over several LUNs and ensure
the LUNs are sized properly
• For example, with 100 desktops and 5 GB write cache drives, consider using
four or five 100-150 GB RAID 10 LUNs
34. Storage Usage: Write Cache (cont’d)
• Check registry setting (CTX123570 FILE_NO_INTERMEDIATE_BUFFERING)
• When the Write Cache fills up, expect a BSOD or
hang/freeze
• Things that cause Write Cache activity to be high:
• Boot / Shutdown / User logging on or off
• User starting application (streamed or local, hosted should have minimal effect)
• Application behavior and profile solution
• Monitor with Windows Perfmon: PhysicalDisk\Disk Writes/sec
(Disk Transfers/sec gives you the whole picture)
35. Why is “How much disk space do I need for the write
cache” a dangerous question?
Example (very simple):
• 1,000 VM environment (Windows 7, 20 GB vDisks)
• Write Cache max. size est. at 5 GB per VM
= Write Cache space 5 TB
11 x 500 GB disks (RAID 5)
20 x 500 GB disks (RAID 10)
• If we want to be cost-conscious,
we opt for SATA
• IOPS est. to peak at 20 IOPS / user
• Estimated 100 users simultaneous
(R/W ratio 30/70)
• IO load - Logical
2,000 IOPS (600 reads / 1,400 writes)
• IO load - Physical
RAID 5 = 600 + 4*1400 = 6,200 IOPS
over 120 SATA disks required (50 IOPS / disk)
over 40 SCSI disks (150 IOPS / disk)
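The IOPS half of this example can be reproduced with a few lines of code. This is a minimal sketch: the function names are hypothetical, the RAID write costs are the example values from the Storage Recommendations slide, and the per-disk IOPS figures are the ones quoted above.

```python
import math

# Write cost per logical write (example values from the Storage Recommendations slide).
RAID_WRITE_COST = {"0": 1, "1": 2, "10": 2, "5": 4}


def physical_iops(logical_iops: int, read_percent: int, raid_level: str) -> float:
    """Back-end IOPS: reads pass through, each write costs RAID_WRITE_COST I/Os."""
    reads = logical_iops * read_percent / 100
    writes = logical_iops - reads
    return reads + RAID_WRITE_COST[raid_level] * writes


def disks_needed(backend_iops: float, iops_per_disk: int) -> int:
    """Spindles required to serve the back-end load."""
    return math.ceil(backend_iops / iops_per_disk)


if __name__ == "__main__":
    logical = 100 * 20  # 100 simultaneous users at 20 IOPS each
    raid5 = physical_iops(logical, read_percent=30, raid_level="5")
    print(raid5, disks_needed(raid5, 50), disks_needed(raid5, 150))
```

Sizing by capacity alone gives 11 disks for RAID 5, while sizing by IOPS gives more than ten times that on SATA, which is why the question in the slide title is dangerous.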
36. Storage Considerations
• HBAs must support disk throughput
• Especially important for shared HBA scenarios (e.g. Blades)
• Storage controllers must cope with the load
• SAN may not be exclusively for VDI
• Prevent concurrent hard disk intensive tasks
• Active Scan after antivirus pattern update / Scheduled defrag
• Use only fixed-size VHDs for write-cache drives and
Provisioning Services vDisks
• Disk can become fragmented on physical media
38. Virtual Desktop Operating System
• Windows XP
• Requires around 5000 IOPS each for startup
• File system alignment issues need to be dealt with
• Windows 7
• Generates more IOPS than XP for startup / logon (VRC shows +83% for boot)
• However, generates considerably less IOPS than XP when working / idle
• Optimized for virtualized environments (Host Integration Services)
• Windows 7 performed better on Hyper-V (go figure!)
39. Average Resource Allocation
User Group  Operating System  vCPU Allocation  Memory Allocation  Avg IOPS (Steady State)  Estimated Users/Core
Light       Windows XP        1                768 MB-1 GB        4-8                      10-12
Light       Windows 7         1                1-1.5 GB           5-10                     8-10
Normal      Windows XP        1                1-1.5 GB           8-14                     8-10
Normal      Windows 7         1                1.5-2 GB           10-15                    6-8
Power       Windows XP        1                1.5-2 GB           14-25                    6-8
Power       Windows 7         1-2              2-3 GB             15-30                    4-6
Heavy       Windows XP        1                2 GB               25-50                    4-6
Heavy       Windows 7         2                4 GB               30-60                    2-4
Do NOT give virtual desktops more resources than needed
• Windows XP base image
• Uniprocessor HAL: give 2 vCPUs and hypervisor won’t utilize
• Multiprocessor HAL: use 1 vCPU and waste resources while system tries to align processors
40. Desktop Optimizations
• There is a myriad of desktop optimization guides available
• Good guides are:
• Citrix XenDesktop Design Handbook (bit.ly/xdhandbook)
• Windows XP / 7 – Optimization Guides
• CTX124239
• http://www.virtualrealitycheck.net
• http://www.virtualfeller.com
• http://www.citrixtools.net
42. License Server Scalability
• License check-outs
• A standalone Intel Xeon 2.83 GHz quad-core system with 4 GB of RAM can
handle 248 license check-outs per second (446,400 users in 30 minutes)
• A Dell PowerEdge 2650 with a 2.2 GHz processor can handle 170 license check-outs per second (306,000 users in 30 minutes)
• Virtually no CPU resources consumed for check-ins
• Great candidate for virtualization!
43. Before you leave…
• Recommended related breakout sessions:
• SUM210 - Taking user experience to another level: understanding HDX technologies
• SUM211 - It’s all about “me!”—user personalization and profiles
• Session surveys are available online at www.citrixsummit.com
starting Thursday, May 26
• Provide your feedback and pick up a complimentary gift at the registration desk
• Download presentations starting Friday, June 3, from your My
Organizer Tool located in your My Synergy Microsite event account
46. FAQ
• With the previous XD versions, we always had a chance to
specify the connection account (to the DB) with the IMA
service. Is there still such functionality available or is this
relevant at all?
• The XD5 services access the database using their computer
account logins (domain\machine$).
47. FAQ
• How big will the transaction log become?
• Number of Virtual Desktop Agents x 24 hours x
approximately 62 kilobytes of data
• Example: 1,000 Virtual Desktop Agent farm in idle state:
• 1,000 VDAs x 24 x 62 KB ≈ 1,488 megabytes (roughly 1.5 GB)
• Note: This can be substantially higher in active environments.
• Check: CTX126916
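The idle-state estimate can be written as a one-line function. This is a sketch only: the function name is hypothetical, it reads the formula as roughly 62 KB per VDA per hour, and it uses decimal units (1 MB = 1,000 KB). As noted above, active environments can be substantially higher.

```python
def idle_transaction_log_mb(
    vdas: int,
    hours: float = 24,
    kb_per_vda_hour: float = 62,  # approximate figure from the slide
) -> float:
    """Idle-state transaction log growth in MB (decimal units)."""
    return vdas * hours * kb_per_vda_hour / 1000


if __name__ == "__main__":
    print(idle_transaction_log_mb(1_000))   # 1,000 VDA farm
    print(idle_transaction_log_mb(20_000))  # 20,000 desktop scenario
```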
48. FAQ
• How can VDA requests be effectively load balanced?
• Configure all the desktops with the addresses of all brokers.
The Virtual Desktop Agent randomly selects one DDC from
the list and tries to register with that DDC.
• Note: Hardware load balancers / NLB do not work here because the
communication is Kerberos based.
49. FAQ
• How does the VDA discover its XenDesktop site and
controller?
• Registry based discovery (XD5 default)
• HKLM\Software\Citrix\VirtualDesktopAgent\ListOfDDCs
• Active Directory browsing
• Service Connection Point (SCP) / Controllers Group
• Quick deploy discovery (MCS only)
• C:\Personality.ini contains list of Controllers
50. FAQ
• Why does Citrix recommend NFS for MCS, but not iSCSI
and Fibre Channel?
Editor's notes
Users were assigned 100 published applications
Intel EPT – Extended Page Tables
AMD RVI – Rapid Virtualization I…
Ballooning – typically hurts (artificial memory inflation w/ balloon driver)
Transparent Page Sharing – typically helps (but not much w/ new OSes with large pages (4 KB vs. 2 MB))
Host Paging – jury is still somewhat out on this one, but I’d disable until more studies are done (effectively allows hypervisor to control swapping centrally instead of at the local level (VMs))
What is RVI and why do I care?
Here’s why: When a virtual machine is running, inside the VM, the operating system maps out pages of RAM. The hypervisor then addresses and stores those pages in physical RAM. The address in physical RAM typically never matches the address that the VM’s operating system knows. The hypervisor (specifically the virtual machine monitor or VMM) has to "translate" the fetching and updating of pages of RAM. This puts an extra "tax" on the CPU and adds to virtualization overhead. This can specifically be seen in workloads that perform very frequent page table updates.
The solution: AMD and Intel continue to enhance their CPUs to offload some of the functions that virtualization provides in software today. AMD and Intel started with their AMD-V and VT Extensions respectively and now continue on to the second generation of virtualization enhancements. AMD’s is called Rapid Virtualization Indexing (RVI) and Intel calls theirs Extended Page Tables (EPT). What these functions do is manage the mapping of virtual pages to physical pages for fetching and updating memory so that the CPU does not have to do it in software.
Memory Overcommit = performance issues when RAM scarce
Most targets read the same vDisk blocks
Even a 20 GB target vDisk usually only reads 1 – 2 GB
Many files on a target vDisk are rarely, if ever, read
Quick and Dirty Formula
Total RAM Required = 2 GB + (# vDisks X 2 GB)
Consider a server hosting 6 unique vDisks
2 GB (for OS & services) + (6 x 2 GB) = 14 GB RAM
….then add 25% buffer, so put in 16 GB RAM!
All file operations are loaded into System Cache memory
This is true for server and desktop operating systems
This speeds up file access
Files remain in RAM even after they are closed
Provisioning Services Server reads the vDisk just like any Windows application reads a file (e.g. Word, Excel, etc…)
x64 Editions of Windows support more memory
This includes both physical RAM and virtual Kernel Memory
Use an x64 Edition of Windows (preferably 2008 R2)
Allocate enough RAM to cache vDisk contents
All Windows OSes cache data from CIFS shares; however,
Provisioning Services on Windows 2003 does not
Provisioning Services on Windows 2008 does; however,
when a second Provisioning Server connects to the CIFS store, it does not
You will see 2008 caching from CIFS in the following lab exercise
For production environments, do not use a CIFS Store
New Windows 2008 (SMB 2.0) features might fix this
Our engineering is reviewing this (no guarantees it will work)
Use block level storage for Stores (local disk, iSCSI, FC)
More intelligent storage management (less disk activity, larger writes when disk activity is required, pagefile management improvements to reduce the disk IOPS)