TRND04

The Lego Cloud
Benoit Hudzia Sr. Researcher; SAP Research CEC Belfast
Aidan Shribman Sr. Researcher; SAP Research Israel
Agenda


Introduction
Hardware Trends
Live Migration
Memory Aggregation
Compute Aggregation
Summary




© 2012 SAP AG. All rights reserved.   2
Introduction
The evolution of the datacenter
Evolution of Virtualization




[Figure: staircase of four stages, from earliest to latest: No virtualization; Basic Consolidation; Flexible Resources Management (Cloud); Resources Disaggregation (True utility Cloud)]




Why Disaggregate Resources?


Better Performance
Replacing slow local devices (e.g. disk) with fast remote devices (e.g. DRAM).
Many remote devices working in parallel (e.g. DRAM, disk, compute)

Superior Scalability
Going beyond boundaries of the single node

Improved Economics
Do more with existing hardware
Reach better hardware utilization levels




The Hecatonchire Project
Hecatonchire: “Hundred-Handed One”

Original idea: provide Distributed Shared Memory (DSM)
capabilities to the cloud

Strategic goal: full resource liberation brought to the cloud by:
• Providing more resource flexibility to the current cloud paradigm by breaking
  down nodes to their basic elements (CPU, Memory, I/O)
• Extending the existing cloud software stack (KVM, Qemu, libvirt, OpenStack)
  without degrading any existing capabilities.
• Using commodity cloud hardware: medium sized hosts (typically 64 GB
  and 8/16 cores), and standard interconnects (such as 1 GbE or 10 GbE).

Initiated by Benoit Hudzia in 2011. Currently developed by two
small teams of researchers from the TI Practice located in
Belfast and Ra’anana

High Level Architecture

No special hardware is required beyond RDMA-enabled NICs, which support the
low-overhead, low-latency communication layer

VMs are no longer bound by host size, as resources such as memory, I/O and
compute can be aggregated

Different sized VMs can share infrastructure, so we can still support the
smaller VMs not requiring dedicated hosts

Application stack runs unmodified

[Figure: guest VMs (App, OS, H/W) of different sizes spanning Server #1, Server #2 ... Server #n, each contributing CPUs, Memory and I/O, connected by Fast RDMA Communication]

The Team - Panoramic View




Hardware Trends
Are hosts getting closer?
CPUs stopped getting faster


Clock speeds rose in step with Moore’s law until 2003, when core
speed hit a practical limit of about 3.4 GHz

In the data center, cores are even slower, running at
2.0 - 2.8 GHz for power conservation
reasons

Since 2000 you do get more cores, but that
does not affect compute cycle and compute
instruction latencies

Effectively, arbitrary sequential algorithms
have not gotten faster since

                                                 Source: http://www.intel.com/pressroom/kits/quickrefyr.htm


DRAM latency has remained constant


CPU clock speed and memory bandwidth
increased steadily (at least until 2000)

But memory latency remained constant – so
local memory has gotten slower from the CPU
perspective




                                              Source: J. Karstens: In-Memory Technology at SAP. DKOM 2010

Disk latency has virtually not improved


1980s standard disk: 3,600 RPM

2010s standard disk: 7,200 RPM

A 2x speedup in 30 years is negligible, so effectively disk has become
slower from the CPU perspective.

Average Latency (ms) by spindle speed

RPM        Average Latency (ms)
3,600      8.3
4,200      7.1
4,500      6.7
4,900      6.1
5,200      5.8
5,400      5.6
7,200      4.2
10,000     3
12,000     2.5
15,000     2

Panda et al. Supercomputing 2009
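
The chart values for disk latency are consistent with average rotational latency, i.e. the time for half a revolution at a given spindle speed. A minimal sketch, assuming that is how the chart was derived (the function name is illustrative and not from the source):

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


for rpm in (3_600, 7_200, 15_000):
    print(f"{rpm:>6} RPM -> {avg_rotational_latency_ms(rpm):.1f} ms")
```

Doubling the spindle speed only halves this figure, which is why three decades of disk development moved it so little.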
But: Networks are Steadily Getting Faster


Since 1979 we went from 0.01 Gbit/s up to 64 Gbit/s, a 6,400x speedup

A competitive marketplace
• 10 and 40 Gbps Ethernet – originated from network interconnects
• 40 Gbps QDR InfiniBand – originated from computer internal bus technology

InfiniBand/Ethernet convergence
• Virtual Protocol Interconnects
• InfiniBand over Ethernet
• RDMA over Converged Enhanced Ethernet (RoCE)
• Using standard semantics defined by OFED

[Chart: Network Performance (Gbit/s), y-axis from 0 to 70]
Panda et al. Supercomputing 2009
And: Communication Stacks Are Becoming Faster


Network stack deficiencies
  Application / OS context switches
  Intermediate buffer copies
  Transport processing

RDMA OFED Verbs API provides
  Zero copy
  Transport offload to the NIC (RoCE, or TCP offload with iWARP)
  Flexibility to use IB, GE or iWARP

Resulting in
  Reduced latency
  Processor offloading
  Operational flexibility

Benchmarking Modern Interconnects


Intel MPI Benchmarks (IMB)
Used typically in HPC and parallel computing

Comparing:
4x DDR IB using Verbs API
10 GE TOE (TCP offload engine) iWARP
1 GE

Measured latencies
IB       2 us
10 GE    8.23 us
1 GE     46.52 us

[Charts: Broadcast latency; Exchange bandwidth]
Source: Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet, IBM

Conclusion: Remote Nodes Have Gotten Closer


Interconnects have become much faster

Fast interconnects have become a commodity
and are moving out of the High Performance
Computing (HPC) niche

IB latency of 2,000 ns is only 20x slower than
RAM and is 100x faster than SSD

Remote page faulting is much faster than
traditional disk backed page swapping!
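
To make the remote page fault comparison concrete, the sketch below estimates the time to fetch one page over each benchmarked interconnect. The latencies are the measured values quoted in this deck; the 4 KiB page size, the nominal data rates and the 10 ms disk access time are assumptions for illustration, and protocol overheads are ignored.

```python
PAGE_BYTES = 4096  # assumption: 4 KiB pages

# name -> (measured latency in microseconds, assumed nominal data rate in Gbit/s)
INTERCONNECTS = {
    "4x DDR IB": (2.0, 16.0),
    "10 GE": (8.23, 10.0),
    "1 GE": (46.52, 1.0),
}
DISK_ACCESS_US = 10_000.0  # assumption: 10 ms for a disk-backed page swap


def page_fetch_us(latency_us: float, gbit_per_s: float, size_bytes: int = PAGE_BYTES) -> float:
    """Latency plus the time to put the page on the wire, in microseconds."""
    wire_us = size_bytes * 8 / (gbit_per_s * 1_000)
    return latency_us + wire_us


for name, (latency_us, rate) in INTERCONNECTS.items():
    fetch = page_fetch_us(latency_us, rate)
    print(f"{name:>10}: {fetch:6.2f} us per page, about {DISK_ACCESS_US / fetch:.0f}x faster than disk")
```

Even the slowest interconnect in this estimate stays two orders of magnitude below the assumed disk access time.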




                                             HANA Performance Analysis, Chaim Bendelac, 2011
Result: Blurring of the physical node boundaries




[Figure: latency scale with markers at 0 ns, 100 ns, 2,000 ns and 10,000,000 ns (the last value appears twice)]



Live Migration
Pretext to Hecatonchire
Enabling Live Migration of SAP Workloads


Business Problem
• Typical SAP workloads (e.g. SAP ERP) are transactional,
  large (possibly 64 GB), with a fast rate of memory writes.
• Classic live migration fails for such workloads, as rapid
  memory writes cause memory pages to be re-sent over
  and over again

Hecatonchire’s Solution
• Enable live migration by reducing both the number of
  pages re-sent and the cost of a page re-send
• Non-intrusive: reduces downtime, service degradation, and
  total migration time




Live Migration Technique


Pre-migration process
• VM active on host A
• Destination host selected (Block devices mirrored)

Reservation process
• Initialize container on target host

Iterative pre-copy
• Copy dirty pages in successive rounds

Stop and copy
• Suspend VM on host A
• Redirect network traffic
• Synch remaining state

Commitment
• Activate on host B
• VM state on host A released
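
The iterative pre-copy stage only converges when pages are sent faster than the guest dirties them. The following is a minimal simulation under simplifying assumptions (constant dirty rate, constant link rate, hypothetical function and parameter names); it is not the project's implementation.

```python
def precopy_rounds(total_pages, dirty_pages_per_sec, link_pages_per_sec,
                   stop_threshold=1_000, max_rounds=30):
    """Return (pages sent in each round, pages left for stop-and-copy)."""
    to_send = total_pages
    sent_per_round = []
    for _ in range(max_rounds):
        sent_per_round.append(to_send)
        # Round time is to_send / link rate; pages dirtied in that time
        # (dirty rate * round time) must be re-sent in the next round.
        to_send = min(total_pages, dirty_pages_per_sec * to_send // link_pages_per_sec)
        if to_send <= stop_threshold:
            break
    return sent_per_round, to_send


rounds, remainder = precopy_rounds(1_000_000, 10_000, 100_000)
print(rounds, remainder)
```

When the dirty rate reaches or exceeds the link rate, the amount to re-send never shrinks and the loop runs until `max_rounds`, which is the failure mode described for write-heavy workloads.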

Pre-copy live migration


Reducing number of page re-sends
Page LRU Reordering such that pages which
have a high chance of being re-dirtied soon are
delayed until later
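
One way to picture the reordering is to sort pages by the time they were last dirtied, so that recently written pages go last. This is only a sketch of the idea with hypothetical names; the actual bookkeeping used by the project is not described in the slides.

```python
def send_order(last_dirtied_at):
    """Return page numbers ordered so the least recently dirtied pages are sent first."""
    return sorted(last_dirtied_at, key=lambda page: last_dirtied_at[page])


# page number -> timestamp (seconds) of the most recent write to that page
last_dirtied_at = {0: 5.0, 1: 0.5, 2: 9.9, 3: 2.0}
print(send_order(last_dirtied_at))  # [1, 3, 0, 2]
```

Page 2 was written most recently, so it is deferred to the end of the round, where it has the best chance of being sent only once.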

Reducing the cost of a page re-send
By using the XBZRLE delta encoder we can represent
page changes much more efficiently
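
XBZRLE stands for XOR Based Zero Run Length Encoding: the previously sent copy of a page is XORed with the new copy, and the long runs of zeros (unchanged bytes) are stored as lengths. The sketch below shows the idea with a simplified in-memory representation; it is an assumption-laden illustration and does not reproduce Qemu's wire format.

```python
def xor_zero_rle_encode(old: bytes, new: bytes):
    """Encode `new` relative to `old` as (zero_run_length, changed_xor_bytes) pairs."""
    if len(old) != len(new):
        raise ValueError("pages must have the same size")
    delta = bytes(a ^ b for a, b in zip(old, new))
    runs, i, n = [], 0, len(delta)
    while i < n:
        start = i
        while i < n and delta[i] == 0:   # unchanged bytes
            i += 1
        zero_run = i - start
        start = i
        while i < n and delta[i] != 0:   # changed bytes
            i += 1
        runs.append((zero_run, delta[start:i]))
    return runs


def xor_zero_rle_decode(old: bytes, runs) -> bytes:
    """Rebuild the new page by applying the encoded XOR runs to the old page."""
    page = bytearray(old)
    pos = 0
    for zero_run, changed in runs:
        pos += zero_run
        for offset, value in enumerate(changed):
            page[pos + offset] ^= value
        pos += len(changed)
    return bytes(page)


old_page = bytes(4096)
new_page = bytearray(old_page)
new_page[100:104] = b"\x01\x02\x03\x04"
runs = xor_zero_rle_encode(old_page, bytes(new_page))
assert xor_zero_rle_decode(old_page, runs) == bytes(new_page)
```

A 4 KiB page with four changed bytes collapses to two short runs, which is why re-sending a lightly modified page becomes cheap.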




More Than One Way to Live Migrate …
Pre-Copy Live-Migration
  Phases:   Pre-migrate; Reservation -> Iterative Pre-copy (X Rounds) -> Stop and Copy -> Commit
  VM state: Live on A -> Downtime -> Live on B

Post-Copy Live-Migration
  Phases:   Pre-migrate; Reservation -> Stop and Copy -> Page Pushing (1 Round) -> Commit
  VM state: Live on A -> Downtime -> Degraded on B -> Live on B

Hybrid Post-Copy Live-Migration
  Phases:   Pre-migrate; Reservation -> Iterative Pre-Copy (X Rounds) -> Stop and Copy -> Page Pushing (1 Round) -> Commit
  VM state: Live on A -> Downtime -> Degraded on B -> Live on B

Each timeline is annotated with its Total Migration Time.




Post-copy live migration using fast interconnects


In Post-copy live migration the state of the VM
is transferred to the destination and activated
before memory is transferred

Post-copy implementation includes
• Handling of remote page faults
• Background transfer of memory pages

Service degradation mitigated by
• RDMA zero-copy interconnects
• Pre-paging – similar in concept to pre-fetching
• Hybrid Post Copy – begins with a pre-copy phase
• MMU integration – eliminating need for VM pause
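
Remote page fault handling with pre-paging can be pictured as follows. This is a toy model with hypothetical names: a Python list stands in for the source host's memory, and the background transfer of the remaining pages is left out.

```python
class PostCopyMemory:
    """Destination-side guest memory that pulls missing pages from the source on demand."""

    def __init__(self, source_pages, prepage_window=2):
        self.source = source_pages      # stands in for memory still on the source host
        self.local = {}                 # pages already present on the destination
        self.prepage_window = prepage_window
        self.remote_faults = 0

    def read(self, page_no):
        if page_no not in self.local:
            self.remote_faults += 1
            # Fetch the faulting page plus the next few pages (pre-paging).
            last = min(page_no + self.prepage_window, len(self.source) - 1)
            for n in range(page_no, last + 1):
                self.local.setdefault(n, self.source[n])
        return self.local[page_no]


source = [f"page-{i}" for i in range(8)]
memory = PostCopyMemory(source, prepage_window=2)
for page_no in range(8):
    memory.read(page_no)
print(memory.remote_faults)  # 3 remote faults instead of 8 for a sequential scan
```

Pre-paging helps most when access is sequential; for random access patterns the extra pages fetched may never be used.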



Demo
Memory Aggregation
In the oven …
The Memory Cloud
Turns memory into a distributed memory service



[Figure: Servers 1 to 3, each running a VM with an App, shown across three layers: Applications, Memory (RAM) and Storage]

• Breaks memory from the bounds of the physical box
• Yields double digit percentage gains in IT economics
• Transparent deployment with performance at scale and reliability
RRAIM : Remote Redundant Array of Inexpensive Memory
Supporting Large Memory Instances On-Demand

Business Problem
• Current instance memory sizes are constrained by the physical host’s
  memory size (Amazon’s biggest VM occupies the whole physical host)
• Heavy swap usage slows execution time for data intensive applications

Hecatonchire Solution
• Access remote DRAM via fast zero-copy RDMA interconnects
• Hide remote DRAM latency by using page pre-pushing
• MMU integration for transparency for applications and VMs
• Reliability by using a RAID-1 (mirroring) like scheme

Hecatonchire Value Proposition
• Provide memory aggregation on-demand
• Totally transparent to workload (no integration needed)
• No hardware investment! No dedicated servers!

[Figure: RAIM Solution. The application's VM swaps from RAM to the Memory Cloud, which offers Compression / De-duplication / N-tiers storage / HR-HA]

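
The RAID-1 like mirroring mentioned for RRAIM can be sketched as a write-to-all, read-from-any policy over two remote memory hosts. The class names are hypothetical and plain dictionaries stand in for remote DRAM reached over RDMA.

```python
class MemoryHost:
    """Hypothetical remote memory provider; a dict stands in for its DRAM."""

    def __init__(self):
        self.pages = {}
        self.alive = True


class MirroredRemoteMemory:
    """RAID-1 style mirroring: every page is kept on each live replica."""

    def __init__(self, primary, mirror):
        self.replicas = [primary, mirror]

    def write(self, page_no, data):
        live = [host for host in self.replicas if host.alive]
        if not live:
            raise RuntimeError("no replica available")
        for host in live:
            host.pages[page_no] = data

    def read(self, page_no):
        # Any live replica holding the page can serve the read.
        for host in self.replicas:
            if host.alive and page_no in host.pages:
                return host.pages[page_no]
        raise KeyError(page_no)


primary, mirror = MemoryHost(), MemoryHost()
memory = MirroredRemoteMemory(primary, mirror)
memory.write(7, b"payload")
primary.alive = False                    # simulate losing one memory host
assert memory.read(7) == b"payload"      # the mirror still serves the page
```

The price of this reliability is that every page consumes memory on two hosts and every write crosses the interconnect twice.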
Hecatonchire / RRAIM: Breakthrough Capability
Breaking the memory box barrier for memory intensive applications

[Figure: Access Speed (1 μsec to 10 msec) versus Capacity (MB to PB) for L1 cache, L2 cache, DRAM, SSD, Local Disk, NAS and SAN, grouped into Embedded Resources, Local Resources and Networked Resources, with a Performance Barrier marked on the chart]

Lego Cloud Architecture (Memory block)

[Figure: many physical nodes hosting a variety of VMs: Memory VM (Memory Host), Compute VM (Memory Guest) and Combination VM (Memory Guest & Host). Their RAM is pooled into the Memory Cloud through RRAIM, coordinated by Memory Cloud Management Services]


Instant Flash Cloning On-Demand


Business Problem
• Burst load / service usage that cannot be satisfied in time

Existing solutions
• Vendors: Amazon / VMware / RightScale
• Start up VM from disk image
• Requires full VM OS startup sequence

Hecatonchire Solution
• Using a paused VM as source for Copy-on-Write (CoW)
• We perform a Post-Copy Live Migration

Hecatonchire Value Proposition
• Just in time (sub-second) provisioning


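
The Copy-on-Write idea behind flash cloning can be sketched as clones that share the paused source VM's pages and keep private copies only of the pages they modify. The class below is a hypothetical illustration, not the project's data structure.

```python
class CowClone:
    """A clone that shares the paused source VM's pages until it writes to them."""

    def __init__(self, base_pages):
        self._base = base_pages   # read-only pages of the paused source VM
        self._private = {}        # pages this clone has modified

    def read(self, page_no):
        return self._private.get(page_no, self._base[page_no])

    def write(self, page_no, data):
        # The write lands in a private copy; the shared base is never modified.
        self._private[page_no] = data


base = (b"kernel", b"heap", b"stack")   # memory image of the paused VM
clone_a, clone_b = CowClone(base), CowClone(base)
clone_a.write(1, b"heap-a")
assert clone_a.read(1) == b"heap-a"
assert clone_b.read(1) == b"heap"       # other clones are unaffected
```

Because a clone starts with no private pages, creating one costs almost nothing, which is what makes sub-second provisioning plausible.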
Instant Flash Cloning On-Demand


We can clone VMs to meet demand much faster
than other solutions

Reducing infrastructure costs while still minimizing
lost opportunities => Just in time provisioning

Requires Application Integration
• We track OS/application metrics in running VMs or in the Load
  Balancer (LB)
• Alerts are defined if metrics pass a predefined threshold
• According to alerts we can scale up, adding more resources,
  or scale down to save on resources not utilized
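
The alerting rule described above amounts to comparing a tracked metric against two thresholds. A minimal sketch, with hypothetical names and threshold values chosen only for illustration:

```python
def scaling_decision(metric: float, scale_up_above: float, scale_down_below: float) -> str:
    """Map a tracked metric (e.g. requests per second at the LB) to a scaling action."""
    if scale_down_below >= scale_up_above:
        raise ValueError("scale_down_below must be lower than scale_up_above")
    if metric > scale_up_above:
        return "scale-up"      # clone another VM to absorb the burst
    if metric < scale_down_below:
        return "scale-down"    # release resources that are not utilized
    return "hold"


for load in (950.0, 400.0, 50.0):
    print(load, scaling_decision(load, scale_up_above=800.0, scale_down_below=100.0))
```

Keeping a gap between the two thresholds avoids flapping between scale-up and scale-down when the metric hovers around a single value.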




                                                              Amazon Web Services - Guide
Compute Aggregation
Our next challenge
Cost Effective “Small” HPC Grid

High Performance Computing (HPC)
• Supercomputers at the front line of processing speed: 10k-100k cores
• Typical benchmark: Top500 (Linear Algebra)
• Small HPC using 10-20 commodity (2 TB / 80 core) nodes


Typical Applications
• Relational Databases
• Analytics tasks (Linear Algebra)
• Simulations


Hecatonchire Value Proposition
• Optimal price / performance by using commodity hardware
• Operational flexibility: node downtime without downing the cluster
• Seamless deployment within existing cloud




Distributed Shared Memory (DSM)
Traditional cluster                   ccNUMA
Distributed memory                    Cache coherent shared memory
Standard interconnects                Fast interconnects
OS instance on each node              One OS instance
Distribution handled by application   Distribution handled by hardware
                                      Vendors: ScaleMP, Numascale, others




Distributed Shared Memory – Inherent Limitations

Linux provides NUMA topology discovery
• Distance between compute cores
• Distance between cores and memory


While the Linux OS is aware of the NUMA
layout, the application may not be aware …

Cache-coherency may get very expensive
• Inter-core: L3 Cache 20 ns
• Inter-socket: Main Memory 100 ns
• Inter-node (IB): Remote Memory 2,000 ns


Thus the ccNUMA architecture may not
“really” be transparent to the application!
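
A short calculation shows why a small share of inter-node accesses matters. The latencies are the three figures quoted on this slide; the access mixes are made-up assumptions for illustration.

```python
LATENCY_NS = {"inter-core": 20, "inter-socket": 100, "inter-node": 2_000}


def average_access_ns(access_mix):
    """Weighted average latency for a mix of access types (shares must sum to 1)."""
    if abs(sum(access_mix.values()) - 1.0) > 1e-9:
        raise ValueError("access shares must sum to 1")
    return sum(LATENCY_NS[level] * share for level, share in access_mix.items())


single_node = {"inter-core": 0.90, "inter-socket": 0.10}
with_remote = {"inter-core": 0.90, "inter-socket": 0.09, "inter-node": 0.01}
print(average_access_ns(single_node))
print(average_access_ns(with_remote))
```

Under these assumed mixes, moving just 1% of accesses to a remote node raises the average from about 28 ns to about 47 ns, so an application that ignores data placement pays for it.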




Summary
Roadmap
2011
• Live Migration
  • Pre-copy XBZRLE Delta Encoding
  • Pre-copy LRU page reordering
  • Post-copy using RDMA interconnects

2012
• Resource Aggregation
  • Cloud Management Integration
  • Memory Aggregation – RAIM (Redundant Array of Inexpensive Memory)
  • I/O Aggregation – vRAID (virtual Redundant Array of Inexpensive Disks)
• Flash cloning

2013
• Lego Landscape
  • CPU Aggregation - ccNUMA
  • Flexible resource management



Key takeaways


Hecatonchire extends the standard Linux stack and requires
only standard commodity hardware

With Hecatonchire, unmodified applications or VMs can
tap into remote resources transparently

To be released as open source under GPLv2 and LGPL
licenses to Qemu and Linux communities

Developed by SAP Research TI




Thank you
Contact information:

Benoit Hudzia; Sr. Researcher;
SAP Research CEC Belfast
benoit.hudzia@sap.com

Aidan Shribman; Sr. Researcher;
SAP Research Israel
aidan.Shribman@sap.com

Hecatonchire Wiki
https://wiki.wdf.sap.corp/wiki/display/cecbelfast/Hecatonchire%2C++Distributed+Shared+Memory+%28DSM%29+And+Datacenter+Resources+disaggregation+for+Cloud
Appendix
Linux Kernel Virtual Machine (KVM)


Released as a Linux Kernel Module (LKM)
under GPLv2 license in 2007 by Qumranet

Full virtualization via Intel VT-x and AMD-V
virtualization extensions to the x86 instruction
set

Uses Qemu for invoking KVM, for handling of
I/O and for advanced capabilities such as VM
live migration

KVM is considered the primary hypervisor on
most major Linux distributions such as
Red Hat and SUSE


Remote Page Faulting Architecture Comparison


Hecatonchire                        Yabusame
No context switches                 Context switches into user mode
Zero-copy                           Uses standard TCP/IP transport
Uses iWARP RDMA

Hudzia and Shribman, SYSTOR 2012    Hirofuchi and Yamahata, KVM Forum 2011
Legal Disclaimer

The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of
SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP
has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or
release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible future
developments, products and or platforms directions and functionality are all subject to change and may be changed by SAP at
any time for any reason without notice. The information on this document is not a commitment, promise or legal obligation to
deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied,
including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This
document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or
omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent.
All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially
from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as
of their dates, and they should not be relied upon in making purchasing decisions.




 
Ibm power7
Ibm power7Ibm power7
Ibm power7
 
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
[.Net Juniors Academy] Introdução ao Cloud Computing e Windows Azure Platform
 
Performance in a virtualized environment
Performance in a virtualized environmentPerformance in a virtualized environment
Performance in a virtualized environment
 
[NHN] 성공적인 소셜게임 런칭과 기술
[NHN] 성공적인 소셜게임 런칭과 기술[NHN] 성공적인 소셜게임 런칭과 기술
[NHN] 성공적인 소셜게임 런칭과 기술
 
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
[AzurePT] Desenvolvimento para o Windows Azure: Diferença para o developer
 
SDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speedSDEC2011 Using Couchbase for social game scaling and speed
SDEC2011 Using Couchbase for social game scaling and speed
 
Next Gen Datacenter
Next Gen DatacenterNext Gen Datacenter
Next Gen Datacenter
 
Dell high density GPU solution
Dell high density GPU solutionDell high density GPU solution
Dell high density GPU solution
 
DB 11g R2 Keynote: Consolidate On Low Cost Server And Storage Grids
DB 11g R2 Keynote: Consolidate On Low Cost Server And Storage GridsDB 11g R2 Keynote: Consolidate On Low Cost Server And Storage Grids
DB 11g R2 Keynote: Consolidate On Low Cost Server And Storage Grids
 
Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012
 
Ibm blade center_foundation_for_cloud_seller_presentation
Ibm blade center_foundation_for_cloud_seller_presentationIbm blade center_foundation_for_cloud_seller_presentation
Ibm blade center_foundation_for_cloud_seller_presentation
 

Similar a Lego Cloud SAP Virtualization Week 2012

SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloudaidanshribman
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopDataWorks Summit
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudCloudera, Inc.
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldRichard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)outstanding59
 
#IBMEdge: Flash Storage Session
#IBMEdge: Flash Storage Session#IBMEdge: Flash Storage Session
#IBMEdge: Flash Storage SessionBrocade
 
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM India Smarter Computing
 
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM India Smarter Computing
 
Converged Data Center: FCoE, iSCSI and the Future of Storage Networking
Converged Data Center: FCoE, iSCSI and the Future of Storage NetworkingConverged Data Center: FCoE, iSCSI and the Future of Storage Networking
Converged Data Center: FCoE, iSCSI and the Future of Storage NetworkingEMC
 
Open stackbrief happylearning
Open stackbrief happylearningOpen stackbrief happylearning
Open stackbrief happylearningLigong Duan
 
Distributed Block-level Storage Management for OpenStack, by Danile lee
Distributed Block-level Storage Management for OpenStack, by Danile leeDistributed Block-level Storage Management for OpenStack, by Danile lee
Distributed Block-level Storage Management for OpenStack, by Danile leeHui Cheng
 
Danile lee -open stackblocklevelstorage
Danile lee -open stackblocklevelstorageDanile lee -open stackblocklevelstorage
Danile lee -open stackblocklevelstorageOpenCity Community
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese天青 王
 
S108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905dS108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905dTony Pearson
 
Accelerating Data Management - Dave Fellinger - RDAP12
Accelerating Data Management - Dave Fellinger - RDAP12 Accelerating Data Management - Dave Fellinger - RDAP12
Accelerating Data Management - Dave Fellinger - RDAP12 ASIS&T
 
IBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsIBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsDavid Spurway
 
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...Andrey Korolyov
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAlluxio, Inc.
 

Similar a Lego Cloud SAP Virtualization Week 2012 (20)

SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloud
 
Hadoop on VMware
Hadoop on VMwareHadoop on VMware
Hadoop on VMware
 
Best Practices for Virtualizing Hadoop
Best Practices for Virtualizing HadoopBest Practices for Virtualizing Hadoop
Best Practices for Virtualizing Hadoop
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in Cloud
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
#IBMEdge: Flash Storage Session
#IBMEdge: Flash Storage Session#IBMEdge: Flash Storage Session
#IBMEdge: Flash Storage Session
 
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
 
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage UtilizationIBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
IBM Upgrades SVC with Solid State Drives — Achieves Better Storage Utilization
 
Converged Data Center: FCoE, iSCSI and the Future of Storage Networking
Converged Data Center: FCoE, iSCSI and the Future of Storage NetworkingConverged Data Center: FCoE, iSCSI and the Future of Storage Networking
Converged Data Center: FCoE, iSCSI and the Future of Storage Networking
 
Open stackbrief happylearning
Open stackbrief happylearningOpen stackbrief happylearning
Open stackbrief happylearning
 
Distributed Block-level Storage Management for OpenStack, by Danile lee
Distributed Block-level Storage Management for OpenStack, by Danile leeDistributed Block-level Storage Management for OpenStack, by Danile lee
Distributed Block-level Storage Management for OpenStack, by Danile lee
 
Danile lee -open stackblocklevelstorage
Danile lee -open stackblocklevelstorageDanile lee -open stackblocklevelstorage
Danile lee -open stackblocklevelstorage
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 
S108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905dS108283 svc-storwize-lagos-v1905d
S108283 svc-storwize-lagos-v1905d
 
Accelerating Data Management - Dave Fellinger - RDAP12
Accelerating Data Management - Dave Fellinger - RDAP12 Accelerating Data Management - Dave Fellinger - RDAP12
Accelerating Data Management - Dave Fellinger - RDAP12
 
IBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutionsIBM Power Systems - enabling cloud solutions
IBM Power Systems - enabling cloud solutions
 
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
CloudOpen 2013: Developing cloud infrastructure: from scratch: the tale of an...
 
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and StorageAccelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
Accelerate and Scale Big Data Analytics with Disaggregated Compute and Storage
 

Lego Cloud SAP Virtualization Week 2012

  • 1. TRND04 The Lego Cloud Benoit Hudzia Sr. Researcher; SAP Research CEC Belfast Aidan Shribman Sr. Researcher; SAP Research Israel
  • 2. Agenda Introduction Hardware Trends Live Migration Memory Aggregation Compute Aggregation Summary © 2012 SAP AG. All rights reserved. 2
  • 4. Evolution of Virtualization: No virtualization → Basic Consolidation → Flexible Resources Management (Cloud) → Resources Disaggregation (True utility Cloud)
  • 5. Why Disaggregate Resources? Better Performance: replacing slow local devices (e.g. disk) with fast remote devices (e.g. DRAM); many remote devices working in parallel (e.g. DRAM, disk, compute). Superior Scalability: going beyond the boundaries of the single node. Improved Economics: do more with existing hardware; reach better hardware utilization levels.
  • 6. The Hecatonchires Project. Hecatonchires: “Hundred Headed One”. Original idea: provide Distributed Shared Memory (DSM) capabilities to the cloud. Strategic goal: full resource liberation brought to the cloud by: (1) providing more resource flexibility to the current cloud paradigm by breaking down nodes to their basic elements (CPU, Memory, I/O); (2) extending the existing cloud software stack (KVM, Qemu, libvirt, OpenStack) without degrading any existing capabilities; (3) using commodity cloud hardware: medium sized hosts (typically 64 GB and 8/16 cores) and standard interconnects (such as 1 Gigabit or 10 GE). Initiated by Benoit Hudzia in 2011. Currently developed by two small teams of researchers from the TI Practice located in Belfast and Ra’anana.
  • 7. High Level Architecture. No special HW required other than RDMA-enabled NICs, which support the low-overhead, low-latency communication layer. VMs are no longer bounded by host size, as resources such as memory, I/O and compute can be aggregated. Different sized VMs can share infrastructure, so we can still support the smaller VMs not requiring dedicated hosts. Application stack runs unmodified. [Diagram: guest VMs (App / OS / H/W) spanning Server #1, Server #2 ... Server #n, each contributing CPUs, Memory and I/O, linked by Fast RDMA Communication.]
  • 8. The Team - Panoramic View
  • 9. Hardware Trends: Are hosts getting closer?
  • 10. CPUs stopped getting faster. Moore’s law prevailed until 2003, when core speeds hit a practical limit of about 3.4 GHz. In the data center cores are even slower, running at 2.0 - 2.8 GHz for power conservation reasons. Since 2000 you do get more cores, but that does not affect compute cycle and compute instruction latencies. Effectively, arbitrary sequential algorithms have not gotten faster since. Source: http://www.intel.com/pressroom/kits/quickrefyr.htm
  • 11. DRAM latency has remained constant. CPU clock speed and memory bandwidth increased steadily (at least until 2000), but memory latency remained constant, so local memory has gotten slower from the CPU perspective. Source: J. Karstens: In-Memory Technology at SAP. DKOM 2010
  • 12. Disk latency has virtually not improved. The 1980s standard disk ran at 3,600 RPM; the 2010s standard disk runs at 7,200 RPM. A 2x speedup in 30 years is negligible; effectively disk has become slower from the CPU perspective. [Chart: Average Latency (ms) by RPM: 3,600: 8.3; 4,200: 7.1; 4,500: 6.7; 4,900: 6.1; 5,200: 5.8; 5,400: 5.6; 7,200: 4.2; 10,000: 3; 12,000: 2.5; 15,000: 2.] Source: Panda et al. Supercomputing 2009
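The latency values on the disk slide are consistent with average rotational latency, i.e. the time for half a platter revolution. A minimal Python sketch of that arithmetic follows; the function name is an illustrative assumption and the formula is the standard half-revolution estimate, not something stated on the slide.

```python
# Average rotational latency: on average the head waits half a revolution
# for the requested sector to arrive. Function name is hypothetical.

def average_rotational_latency_ms(rpm: int) -> float:
    """Return the average rotational latency in milliseconds for a given RPM."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2


if __name__ == "__main__":
    # RPM values taken from the chart on the slide.
    for rpm in (3_600, 4_200, 4_500, 4_900, 5_200, 5_400, 7_200, 10_000, 12_000, 15_000):
        print(f"{rpm:>6} RPM: {average_rotational_latency_ms(rpm):.1f} ms")
```

Doubling the spindle speed from 3,600 to 7,200 RPM only halves this latency, which is the 2x improvement the slide calls negligible.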
  • 13. But: Networks are Steadily Getting Faster. Since 1979 we went from 0.01 Gbit/s up to 64 Gbit/s, a 6,400x speedup. A competitive marketplace: 10 and 40 Gbps Ethernet (originated from network interconnects); 40 Gbps QDR InfiniBand (originated from computer internal bus technology). InfiniBand/Ethernet convergence: Virtual Protocol Interconnects; InfiniBand over Ethernet; RDMA over Converged Enhanced Ethernet; using standard semantics defined by OFED. [Chart: Network Performance (Gbit/s), axis 0 to 70.] Source: Panda et al. Supercomputing 2009
  • 14. And: Communication Stacks Are Becoming Faster. Network stack deficiencies: application / OS context switches; intermediate buffer copies; transport processing. The RDMA OFED Verbs API provides: zero copy; offloading TCP to NIC using RoCE; flexibility to use IB, GE or iWARP. Resulting in: reduced latency; processor offloading; operational flexibility.
  • 15. Benchmarking Modern Interconnects. Intel MPI Benchmarks (IMB), used typically in HPC and parallel computing. Comparing: 4x DDR IB using Verbs API; 10 GE TOE (TCP offloading engine) iWARP; 1 GE. Measured latencies: IB: 2 us; 10 GE: 8.23 us; 1 GE: 46.52 us. [Charts: Broadcast latency; Exchange bandwidth.] Source: Performance of HPC Applications over InfiniBand, 10 Gb and 1 Gb Ethernet, IBM
  • 16. Conclusion: Remote Nodes Have Gotten Closer. Interconnects have become much faster. Fast interconnects have become a commodity and are moving out of the High Performance Computing (HPC) niche. IB latency (2,000 ns) is only 20x slower than RAM and is 100x faster than SSD. Remote page faulting is much faster than traditional disk backed page swapping! Source: HANA Performance Analysis, Chaim Bendelac, 2011
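The ratios quoted in the conclusion can be checked with a few lines of Python. This is only a sketch: the RAM (100 ns), InfiniBand (2,000 ns) and disk (10,000,000 ns) figures appear in the deck, while the SSD figure of 200,000 ns is an assumption back-derived from the "100x faster than SSD" claim, and the dictionary and function names are hypothetical.

```python
# Access latencies in nanoseconds. All but the SSD value are quoted in the
# slides; the SSD value is an assumption implied by "IB is 100x faster than SSD".
LATENCY_NS = {
    "local RAM": 100,
    "remote RAM over IB": 2_000,
    "SSD": 200_000,       # assumed, see lead-in
    "disk": 10_000_000,
}


def slowdown(slower: str, faster: str) -> float:
    """How many times slower one tier is than another."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


if __name__ == "__main__":
    print("IB vs RAM:", slowdown("remote RAM over IB", "local RAM"))
    print("SSD vs IB:", slowdown("SSD", "remote RAM over IB"))
    print("disk vs IB:", slowdown("disk", "remote RAM over IB"))
```

Under these figures a remote page fault over InfiniBand is three orders of magnitude cheaper than a swap-in from a rotating disk, which is the point the slide makes.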
  • 17. Result: Blurring of the physical node boundaries. [Figure: access latencies annotated on a diagram of local and remote nodes: 0 ns, 100 ns, 2,000 ns and 10,000,000 ns.]
  • 19. Enabling Live Migration of SAP Workloads. Business Problem: typical SAP workloads (e.g. SAP ERP) are transactional, large (possibly 64 GB), with a fast rate of memory writes; classic live migration fails for such workloads as rapid memory writes cause memory pages to be re-sent over and over again. Hecatonchire’s Solution: enable live migration by reducing both the number of pages re-sent and the cost of a page re-send; non-intrusive, reducing downtime, service degradation, and total migration time.
  • 20. Live Migration Technique. Pre-migration process: VM active on host A; destination host selected (block devices mirrored). Reservation process: initialize container on target host. Iterative pre-copy: copy dirty pages in successive rounds. Stop and copy: suspend on host A; redirect network traffic; synch remaining state. Commitment: activate on host B; VM state on host A released.
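To illustrate why iterative pre-copy struggles with write-heavy workloads, here is a minimal Python sketch of the rounds described above. It is a toy model under stated assumptions, not the Hecatonchire or Qemu implementation: the function name, the constant dirty rate and the page-based units are all hypothetical.

```python
from typing import List, Tuple


def simulate_precopy(total_pages: int,
                     link_pages_per_s: int,
                     dirty_pages_per_s: int,
                     stop_threshold_pages: int,
                     max_rounds: int = 30) -> Tuple[bool, List[int]]:
    """Toy model of iterative pre-copy.

    Each round sends the pages dirtied during the previous round. While a
    round is in flight the guest keeps writing, so the next round must send
    roughly (round duration * dirty rate) pages. Returns whether the dirty
    set shrank below the stop-and-copy threshold, plus the per-round sizes.
    """
    to_send = total_pages  # first round copies all memory
    history = []
    for _ in range(max_rounds):
        history.append(to_send)
        if to_send <= stop_threshold_pages:
            return True, history  # small enough to stop and copy
        # round duration = to_send / link rate; integer math avoids float drift
        to_send = min(total_pages, to_send * dirty_pages_per_s // link_pages_per_s)
    return False, history


if __name__ == "__main__":
    print(simulate_precopy(1_000_000, 100_000, 10_000, 1_000))
    print(simulate_precopy(1_000_000, 100_000, 120_000, 1_000)[0])
```

In this model the dirty set shrinks geometrically only when the dirty rate is below the link rate; once the guest dirties pages as fast as they can be sent, the rounds never converge, which matches the failure mode described for large transactional workloads.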
  • 21. Pre-copy live migration. Reducing the number of page re-sends: page LRU reordering, such that pages which have a high chance of being re-dirtied soon are delayed until later. Reducing the cost of a page re-send: by using the XBZRLE delta encoder we can represent page changes much more efficiently.
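The idea behind an XOR-based, zero-run-length delta encoder can be sketched in Python as follows. This is an illustration of the general technique only: the run representation and function names are assumptions and do not reproduce the actual XBZRLE wire format used in Qemu.

```python
from typing import List, Tuple

Run = Tuple[int, bytes]  # (count of unchanged bytes to skip, XOR bytes that follow)


def xor_rle_encode(old: bytes, new: bytes) -> List[Run]:
    """XOR the previously sent page with its new contents and run-length
    encode the zero (unchanged) stretches of the result."""
    if len(old) != len(new):
        raise ValueError("pages must have the same size")
    delta = bytes(a ^ b for a, b in zip(old, new))
    runs: List[Run] = []
    i, n = 0, len(delta)
    while i < n:
        start = i
        while i < n and delta[i] == 0:  # unchanged bytes
            i += 1
        zero_len = i - start
        start = i
        while i < n and delta[i] != 0:  # changed bytes
            i += 1
        runs.append((zero_len, delta[start:i]))
    return runs


def xor_rle_decode(old: bytes, runs: List[Run]) -> bytes:
    """Rebuild the new page from the old copy held by the receiver and the runs."""
    page = bytearray(old)
    pos = 0
    for zero_len, xor_bytes in runs:
        pos += zero_len
        for b in xor_bytes:
            page[pos] ^= b
            pos += 1
    return bytes(page)


if __name__ == "__main__":
    old_page = bytes(4096)
    new_page = bytearray(old_page)
    new_page[100:104] = b"\x01\x02\x03\x04"  # a small in-place update
    runs = xor_rle_encode(old_page, bytes(new_page))
    assert xor_rle_decode(old_page, runs) == bytes(new_page)
    print(len(runs), "runs instead of", len(new_page), "bytes")
```

When only a few bytes of a page change between rounds, the delta is dominated by zero runs, so a re-send costs a handful of bytes rather than a full page. The trade-off is that the sender must keep a cached copy of each previously sent page.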
  • 22. More Than One Way to Live Migrate … Pre-Copy Live-Migration: Pre-migrate; Reservation → Iterative Pre-copy (X Rounds) → Stop and Copy → Commit. Phases: Live on A → Downtime → Live on B. Post-Copy Live-Migration: Pre-migrate; Reservation → Stop and Copy → Page Pushing (1 Round) → Commit. Phases: Live on A → Downtime → Degraded on B → Live on B. Hybrid Post-Copy Live-Migration: Pre-migrate; Reservation → Iterative Pre-Copy (X Rounds) → Stop and Copy → Page Pushing (1 Round) → Commit. Phases: Live on A → Downtime → Degraded on B → Live on B. In each case the Total Migration Time spans all stages.
  • 23. Post-copy live migration using fast interconnects. In post-copy live migration the state of the VM is transferred to the destination and activated before memory is transferred. Post-copy implementation includes: handling of remote page faults; background transfer of memory pages. Service degradation mitigated by: RDMA zero-copy interconnects; pre-paging (similar in concept to pre-fetching); Hybrid Post-Copy (begins with a pre-copy phase); MMU integration (eliminating the need for a VM pause).
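The interplay of remote page faults, pre-paging and background transfer can be sketched with a small Python model. It is a simplified illustration under assumptions: a dictionary stands in for the source host's memory, and the class, method and parameter names are hypothetical rather than taken from Hecatonchire.

```python
from typing import Dict


class PostCopyDestination:
    """Toy model of the destination host during post-copy migration."""

    def __init__(self, source_pages: Dict[int, bytes], prepage_window: int = 0):
        self._source = source_pages          # stands in for remote memory on host A
        self._prepage_window = prepage_window
        self.local: Dict[int, bytes] = {}    # pages already resident on host B
        self.remote_faults = 0

    def _fetch(self, page_no: int) -> None:
        self.local[page_no] = self._source[page_no]

    def read(self, page_no: int) -> bytes:
        """Guest access: a missing page triggers a remote page fault."""
        if page_no not in self.local:
            self.remote_faults += 1
            self._fetch(page_no)
            # Pre-paging: also pull the following pages, betting on locality.
            for nxt in range(page_no + 1, page_no + 1 + self._prepage_window):
                if nxt in self._source and nxt not in self.local:
                    self._fetch(nxt)
        return self.local[page_no]

    def background_push(self, max_pages: int) -> int:
        """Background transfer of pages the guest has not touched yet."""
        pushed = 0
        for page_no in sorted(self._source):
            if pushed >= max_pages:
                break
            if page_no not in self.local:
                self._fetch(page_no)
                pushed += 1
        return pushed


if __name__ == "__main__":
    source = {i: bytes([i]) for i in range(8)}
    dest = PostCopyDestination(source, prepage_window=2)
    for page in (0, 1, 2):
        dest.read(page)
    print("remote faults:", dest.remote_faults)
    print("pushed in background:", dest.background_push(10))
```

With a pre-paging window, sequential accesses after the first fault are served locally, which is how pre-paging limits the "degraded on B" phase; the background push eventually removes the dependency on the source host.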
  • 24. Demo
  • 26. The Memory Cloud. Turns memory into a distributed memory service. Breaks memory from the bounds of the physical box. Yields double digit percentage gains in IT economics. Transparent deployment with performance at scale and reliability. [Diagram: Server 1, Server 2 and Server 3, each hosting a VM, with layers for Applications (App), Memory (RAM) and Storage.]
  • 27. RRAIM: Remote Redundant Array of Inexpensive Memory. Supporting Large Memory Instances On-Demand. Business Problem: current instance memory sizes are constrained by the physical hosts’ memory size (Amazon’s biggest VM occupies the whole physical host); heavy swap usage slows execution time for data intensive applications. Hecatonchire Solution: access remote DRAM via fast zero-copy RDMA interconnects; hide remote DRAM latency by using page pre-pushing; MMU integration for transparency for applications and VMs; reliability by using a RAID-1 (mirroring) like schema. Hecatonchire Value Proposition: provide memory aggregation on-demand; totally transparent to workload (no integration needed); no hardware investment! No dedicated servers! [Diagram: RAIM Solution: a VM in the Application Cloud swaps to memory held as RAM in the Memory Cloud; Compression / De-duplication / N-tiers storage / HR-HA.]
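The RAID-1 like mirroring mentioned for RRAIM can be sketched in Python as below. This is a minimal model under assumptions: in-process objects stand in for remote memory hosts, and the class names, replica placement rule and `copies` parameter are hypothetical, not the project's actual design.

```python
from typing import Dict, List


class MemoryNode:
    """Stands in for a remote host donating RAM to the memory cloud."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self._pages: Dict[int, bytes] = {}

    def put(self, page_no: int, data: bytes) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self._pages[page_no] = data

    def get(self, page_no: int) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self._pages[page_no]


class MirroredRemoteMemory:
    """Writes every page to several nodes; reads fall back to a mirror."""

    def __init__(self, nodes: List[MemoryNode], copies: int = 2):
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self._nodes = nodes
        self._copies = copies

    def _replicas(self, page_no: int) -> List[MemoryNode]:
        # Simple placement rule (an assumption): consecutive nodes, round-robin.
        n = len(self._nodes)
        return [self._nodes[(page_no + k) % n] for k in range(self._copies)]

    def write(self, page_no: int, data: bytes) -> None:
        for node in self._replicas(page_no):
            node.put(page_no, data)

    def read(self, page_no: int) -> bytes:
        for node in self._replicas(page_no):
            try:
                return node.get(page_no)
            except ConnectionError:
                continue  # try the mirror
        raise ConnectionError(f"all replicas of page {page_no} are unavailable")


if __name__ == "__main__":
    nodes = [MemoryNode(f"node{i}") for i in range(3)]
    memory = MirroredRemoteMemory(nodes)
    for page in range(6):
        memory.write(page, bytes([page]))
    nodes[0].alive = False  # one memory host fails
    print(all(memory.read(p) == bytes([p]) for p in range(6)))
```

Mirroring doubles the remote memory consumed per page, but it lets a VM survive the loss of any single memory host without falling back to disk.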
  • 28. Hecatonchire / RRAIM: Breakthrough Capability. Breaking the memory box barrier for memory intensive applications. [Figure: access speed (1 μsec, 10 μsec, 100 μsec, 1 msec, 10 msec) versus capacity (MB, GB, TB, PB) for L1 cache, L2 cache, DRAM, SSD, local disk, NAS and SAN, grouped into embedded, local and networked resources, with a performance barrier marked.]
  • 29. Lego Cloud Architecture (Memory block). [Diagram: three VM types: Memory VM (memory host), Compute VM (memory guest) and Combination VM (memory guest and host), connected through the RRAIM Memory Cloud and Memory Cloud Management Services; many physical nodes hosting a variety of VMs.]
  • 30. Instant Flash Cloning On-Demand. Business Problem: burst load / service usage that cannot be satisfied in time. Existing solutions: vendors: Amazon / VMware / RightScale; start up VM from disk image; requires full VM OS startup sequence. Hecatonchire Solution: using a paused VM as source for Copy-on-Write (CoW); we perform a Post-Copy Live Migration. Hecatonchire Value Proposition: just in time (sub-second) provisioning.
  • 31. Instant Flash Cloning On-Demand. We can clone VMs to meet demand much faster than other solutions, reducing infrastructure costs while still minimizing lost opportunities => just in time provisioning. Requires Application Integration: we track OS/application metrics in running VMs or in the Load Balancer (LB); alerts are defined if metrics pass a pre-defined threshold; according to alerts we can scale up by adding more resources or scale down to save on resources not utilized. Amazon Web Services - Guide
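The threshold-driven alerting described above can be sketched in a few lines of Python. The function name, the choice of metric and the threshold values are illustrative assumptions; the slide does not specify them.

```python
def scaling_decision(metric: float,
                     scale_up_threshold: float,
                     scale_down_threshold: float) -> str:
    """Map a tracked OS/application metric to a scaling action.

    Returns "scale-up" when the metric exceeds the upper threshold,
    "scale-down" when it falls below the lower one, otherwise "hold".
    """
    if scale_down_threshold > scale_up_threshold:
        raise ValueError("scale-down threshold must not exceed scale-up threshold")
    if metric > scale_up_threshold:
        return "scale-up"    # e.g. flash-clone another VM
    if metric < scale_down_threshold:
        return "scale-down"  # release resources that are not utilized
    return "hold"


if __name__ == "__main__":
    # Hypothetical CPU utilization samples reported by a load balancer.
    for cpu in (0.95, 0.50, 0.10):
        print(cpu, scaling_decision(cpu, scale_up_threshold=0.80, scale_down_threshold=0.20))
```

Keeping a gap between the two thresholds avoids flapping between scale-up and scale-down when the metric hovers around a single cut-off.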
  • 33. Cost Effective “Small” HPC Grid. High Performance Computing (HPC): supercomputers at the frontline of processing speeds (10k-100k cores); typical benchmark: Grid 500 (Linear Algebra); small HPC using 10-20 commodity (2 TB / 80 core) nodes. Typical Applications: relational databases; analytics tasks (linear algebra); simulations. Hecatonchire Value Proposition: optimal price / performance by using commodity hardware; operational flexibility: node downtime without downing the cluster; seamless deployment within existing cloud.
  • 34. Distributed Shared Memory (DSM). Traditional cluster: distributed memory; standard interconnects; OS instance on each node; distribution handled by application. ccNUMA: cache coherent shared memory; fast interconnects; one OS instance; distribution handled by hardware; vendors: ScaleMP, Numascale, others.
  • 35. Distributed Shared Memory: Inherent Limitations. Linux provides NUMA topology discovery: distance between compute cores; distance between cores and memory. While the Linux OS is aware of the NUMA layout, the application may not be. Cache-coherency may get very expensive: inter-core (L3 cache): 20 ns; inter-socket (main memory): 100 ns; inter-node (IB, remote memory): 2,000 ns. Thus the ccNUMA architecture may not “really” be transparent to the application!
  • 37. Roadmap. 2011: Live Migration: pre-copy XBZRLE delta encoding; pre-copy LRU page reordering; post-copy using RDMA interconnects. 2012: Resource Aggregation: cloud management integration; memory aggregation: RAIM (Redundant Array of Inexpensive Memory); I/O aggregation: vRAID (virtual Redundant Array of Inexpensive Disks); flash cloning. 2013: Lego Landscape: CPU aggregation (ccNUMA); flexible resource management.
  • 38. Key takeaways. Hecatonchire extends the standard Linux stack, requiring only standard commodity hardware. With Hecatonchire, unmodified applications or VMs can tap into remote resources transparently. To be released as open source under GPLv2 and LGPL licenses to the Qemu and Linux communities. Developed by SAP Research TI.
  • 39. Thank you. Contact information: Benoit Hudzia; Sr. Researcher; SAP Research CEC Belfast; benoit.hudzia@sap.com. Aidan Shribman; Sr. Researcher; SAP Research Israel; aidan.Shribman@sap.com. Hecatonchire Wiki: https://wiki.wdf.sap.corp/wiki/display/cecbelfast/Hecatonchire%2C++Distributed+Shared+Memory+%28DSM%29+And+Datacenter+Resources+disaggregation+for+Cloud
  • 41. Linux Kernel Virtual Machine (KVM). Released as a Linux Kernel Module (LKM) under the GPLv2 license in 2007 by Qumranet. Full virtualization via Intel VT-x and AMD-V virtualization extensions to the x86 instruction set. Uses Qemu for invoking KVM, for handling of I/O and for advanced capabilities such as VM live migration. KVM is considered the primary hypervisor on most major Linux distributions such as RedHat and SuSE.
  • 42. Remote Page Faulting Architecture Comparison. Hecatonchire: no context switches; zero-copy; uses iWARP RDMA (Hudzia and Shribman, SYSTOR 2012). Yabusame: context switches into user mode; uses standard TCP/IP transport (Hirofuchi and Yamahata, KVM Forum 2011).
  • 43. Legal Disclaimer The information in this presentation is confidential and proprietary to SAP and may not be disclosed without the permission of SAP. This presentation is not subject to your license agreement or any other service or subscription agreement with SAP. SAP has no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein. This document, or any related presentation and SAP's strategy and possible future developments, products and or platforms directions and functionality are all subject to change and may be changed by SAP at any time for any reason without notice. The information on this document is not a commitment, promise or legal obligation to deliver any material, code or functionality. This document is provided without a warranty of any kind, either express or implied, including but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. This document is for informational purposes and may not be incorporated into a contract. SAP assumes no responsibility for errors or omissions in this document, except if such damages were caused by SAP intentionally or grossly negligent. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions. © 2012 SAP AG. All rights reserved. 43