Federal VMUG - March - Virtual Machine Sizing Considerations in a NUMA Environment v3
1. Virtual Machine Sizing Considerations in NUMA Architectures Jason Shiplett – NetStar Systems Federal VMUG 3/23/11
2. Non-Uniform Memory Access A computer memory design used in multiprocessors, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.
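The local-versus-remote gap in that definition is easy to observe from a Linux OS. Below is a minimal C sketch using libnuma (build with -lnuma); it assumes a machine with at least two NUMA nodes, and the node numbers, buffer size, and page stride are illustrative choices, not values from the slides:

    /* local_vs_remote.c -- time page touches on local vs. remote memory.
     * Build: gcc local_vs_remote.c -o local_vs_remote -lnuma
     * Assumes a Linux host with >= 2 NUMA nodes; nodes 0/1 are arbitrary. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define BUF (256u * 1024 * 1024)            /* 256 MB, larger than cache */

    static double touch(char *p)
    {
        struct timespec a, b;
        volatile char sink = 0;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (size_t i = 0; i < BUF; i += 4096)  /* one read per 4 KB page */
            sink ^= p[i];
        clock_gettime(CLOCK_MONOTONIC, &b);
        (void)sink;
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
    }

    int main(void)
    {
        if (numa_available() < 0 || numa_max_node() < 1) {
            fprintf(stderr, "need a NUMA host with at least two nodes\n");
            return 1;
        }
        numa_run_on_node(0);                      /* pin this thread to node 0 */
        char *local  = numa_alloc_onnode(BUF, 0); /* same node as the thread   */
        char *remote = numa_alloc_onnode(BUF, 1); /* one interconnect hop away */
        if (!local || !remote) return 1;
        memset(local, 1, BUF);                    /* fault pages in before timing */
        memset(remote, 1, BUF);
        printf("local : %.3f s\n", touch(local));
        printf("remote: %.3f s\n", touch(remote));
        numa_free(local, BUF);
        numa_free(remote, BUF);
        return 0;
    }

On an Opteron- or Nehalem-class box the remote pass typically takes measurably longer, which is exactly the penalty discussed in the following slides.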
3. History and Modern Implementations NUMA was commercially developed in the 1990s by companies such as Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), and Silicon Graphics. Modern implementations of cache-coherent NUMA (ccNUMA) include AMD Opteron multiprocessor systems and Intel's Nehalem (x86) and Tukwila (IA-64). With Nehalem, Intel introduced the QuickPath Interconnect (QPI), a very high bandwidth point-to-point interconnect between CPU sockets, and therefore between NUMA nodes, which reduces the penalty of accessing remote memory. AMD connects CPU sockets with its HyperTransport bus, which competes with Intel's QPI.
4. How NUMA support in vSphere affects you! NUMA ensures memory locality. Memory locality lowers the latency of accessing physical memory pages, and memory access within a NUMA node uses the higher-bandwidth path to memory through the on-die memory controller. Lower latency + higher bandwidth = better performance.
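The locality the ESX NUMA scheduler maintains for a VM can be mimicked by hand on Linux: keep the compute and its pages on one "home" node. The sketch below is only an analogy under that assumption, not ESX code; node 0 and the 16 MB allocation are illustrative:

    /* home_node.c -- co-locate compute and memory, the scheduler's goal.
     * Build: gcc home_node.c -o home_node -lnuma
     * Linux/libnuma analogue of ESX home-node behavior, not ESX itself. */
    #include <numa.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) return 1;
        int home = 0;                        /* stand-in for a VM's home node */
        numa_run_on_node(home);              /* run only on the home node...  */
        size_t size = 16u * 1024 * 1024;
        char *mem = numa_alloc_local(size);  /* ...and allocate from it too   */
        if (!mem) return 1;
        memset(mem, 0, size);                /* first touch places pages locally */
        printf("compute and %zu MB co-located on node %d\n", size >> 20, home);
        numa_free(mem, size);
        return 0;
    }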
6. Transparent Page Sharing and NUMA Transparent Page Sharing (TPS) is restricted to within NUMA nodes, meaning memory pages will only be shared within a single NUMA node. Identical pages in disparate NUMA nodes are duplicated: for example, identical guest-OS pages from VMs homed on two different nodes are stored once per node rather than once per host.
7. Wide-VM NUMA Support in vSphere 4.1 A wide-VM is one that does not fit within a single NUMA node, e.g. an 8-vCPU VM on a 4-socket quad-core server. In vSphere 4.1, a wide-VM is split into smaller NUMA clients, which then occupy the fewest possible NUMA nodes, and its memory is interleaved across the NUMA nodes occupied by those clients. Interleaving the memory in an equitable manner increases the percentage of local memory accesses compared with heuristic placement (a sketch of the interleaving idea follows below). This was implemented to improve the performance of wide-VMs on large, e.g. quad-socket, servers; on two-node NUMA systems, neither CPU nor memory performance differs much.
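On Linux, libnuma exposes the same round-robin placement policy directly, which makes the idea concrete. This is an analogy under that assumption, not VMware's implementation:

    /* interleave.c -- round-robin pages across all nodes, analogous to how
     * wide-VM memory is interleaved across its NUMA clients' nodes.
     * Build: gcc interleave.c -o interleave -lnuma */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) return 1;
        size_t size = 64u * 1024 * 1024;
        /* Pages land round-robin on every allowed node, so with N nodes
         * roughly 1/N of the memory is local to each NUMA client instead
         * of 100% remote in the worst single-node placement. */
        void *buf = numa_alloc_interleaved(size);
        if (!buf) return 1;
        printf("%zu MB interleaved across %d nodes\n",
               size >> 20, numa_num_configured_nodes());
        numa_free(buf, size);
        return 0;
    }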
8. What Wide-VM Support Means for You! Few VMs, typically only very large tier-1 applications, will span multiple NUMA nodes. Wide-VM support only makes a real difference in large, e.g. 4+ socket, systems, where memory interleaving (equitable distribution of memory pages) can deliver a significant performance increase. The increase is most noticeable in memory-intensive applications.
10. Sizing guidelines and considerations As always, it depends on your workload, servers, desired consolidation ratio, etc. As a rule of thumb, keep vCPU counts to a minimum and, where you can, size each VM to fit within a single NUMA node to ensure memory locality (see the sizing sketch below). Wide-VM NUMA support in vSphere 4.1 gives greater flexibility when spanning NUMA nodes is necessary. Memory-intensive workloads benefit the most from NUMA support.
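As a concrete, hypothetical example: a 4-socket quad-core host with 64 GB of RAM splits into four NUMA nodes of 4 cores and 16 GB each, so a VM of up to 4 vCPUs and up to 16 GB (minus overhead) stays within one node. On a bare-metal Linux NUMA machine you can check the per-node budget directly; a minimal libnuma sketch:

    /* node_sizes.c -- report per-node memory to help "fit the VM in a node".
     * Build: gcc node_sizes.c -o node_sizes -lnuma
     * Linux/libnuma sketch; node counts and sizes depend on your hardware. */
    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) return 1;
        int nodes = numa_num_configured_nodes();
        for (int n = 0; n < nodes; n++) {
            long long freeb = 0;
            long long total = numa_node_size64(n, &freeb);  /* bytes */
            printf("node %d: %lld MB total, %lld MB free\n",
                   n, total >> 20, freeb >> 20);
        }
        return 0;
    }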
12. References and links 1. "Non-Uniform Memory Access." Wikipedia, The Free Encyclopedia. Wikimedia Foundation, Inc. 22 Feb 2011. Web. 16 Mar 2011. http://en.wikipedia.org/wiki/Non-Uniform_Memory_Access 2. Frank Denneman. "Sizing VMs and NUMA Nodes." 3 Feb 2010. Web. 16 Mar 2011. http://frankdenneman.nl 3. VMware. "VMware vSphere: The CPU Scheduler in ESX 4.1." 2010. Web. 16 Mar 2011. http://www.vmware.com/files/pdf/techpaper/VMW_vSphere41_cpu_schedule_ESX.pdf 4. Frank Denneman. "ESX 4.1 NUMA Scheduling." 13 Sep 2010. Web. 16 Mar 2011. http://frankdenneman.nl
13. Thanks! My blog – http://blog.shiplett.org Follow me on Twitter - @jshiplett