IBM Research

Application Live Migration
in LAN/WAN Environment
Mahendra Kutare
Summer Intern (Georgia Tech)
Mentor – Sai Zeng
Manager – Milind Naphade, Chitra Dorai


© 2008 IBM Corporation

Talk Outline

• Live Migration Overview
• Live Migration in LAN
• Live Migration in WAN
• Gap Analysis for WAN
• Live Migration Road Map
• Live Migration Benchmarking for WAN
• Future Work

Live Migration Overview

• Live migration
  - Movement of a running virtual machine from one physical host to another
  - Goal: Minimize the response delays to end users
• Motivation
  - Maintenance and upgrade
  - Load balancing
  - Server consolidation
  - Business acquisition, outsourcing
  - High availability

Live Migration Common Steps

• Pre Migration
  - Source machine selection, source VM selection, destination machine selection
  - Source validation, destination validation
• Migration
  - Disk copy: block image copy
  - Memory copy: iterative pre copy, suspend and copy; demand paging
  - Network: network traffic redirect
  - Destination VM activation
• Destination VM Validation
  - Destination active VM validation
  - Destination local device connection
• Post Migration
  - Destination VM operation resumption
  - Source VM commit
  - Migration complete, discard original VM

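Purely as an illustration of how these common steps chain together, a minimal sketch follows; the phase and step names mirror the slide, while the execution callback is a placeholder rather than any vendor's API.

```python
"""Sketch of the common live-migration phases as an ordered pipeline.
Step names mirror the slide above; the bodies are placeholders, not a
real hypervisor API."""

PHASES = {
    "pre_migration": [
        "source machine selection", "source VM selection",
        "destination machine selection",
        "source validation", "destination validation",
    ],
    "migration": [
        "block image copy (disk)", "iterative pre copy (memory)",
        "suspend and copy", "demand paging",
        "network traffic redirect", "destination VM activation",
    ],
    "destination_vm_validation": [
        "destination active VM validation",
        "destination local device connection",
    ],
    "post_migration": [
        "destination VM operation resumption", "source VM commit",
        "discard original VM",
    ],
}

def run_migration(execute_step):
    """Run every step in order; execute_step(phase, step) does the real work."""
    for phase, steps in PHASES.items():
        for step in steps:
            execute_step(phase, step)

if __name__ == "__main__":
    run_migration(lambda phase, step: print(f"[{phase}] {step}"))
```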


Live Migration in LAN

• Overview
  - Targeted at cluster environment
  - Top players – VMware, Xen – provide LAN migration capabilities
  - Different migration phases and their impact

Evaluation of Current Live Migration Solutions in LAN

• VMware
  - Suite of products to enable and support live migration: VMotion, Storage VMotion, DRS, HA
  - Disruption time – less than a sec
• Xen
  - Command line tools – xend, xm
  - Least disruption time – less than 250 ms
• IBM System p
  - POWER6 processors provide partition mobility support
  - Disruption time – SAP demo, less than a sec
• Hyper-V
  - "Quick" motion – clustering failover solution
  - No real-time "live" migration
  - Disruption time – around 8 sec

VMotion Live Migration Under The Hood

• Virtual machine is running on esx01
• Copy the current configuration file to esx02
• Pre-copy memory from esx01 to esx02, with ongoing changes logged to a memory bitmap
• Suspend the virtual machine on esx01 and copy the bitmap of modified memory to esx02
• Resume the virtual machine on esx02; "demand page" and "background page" remaining memory from esx01 until all memory has been copied
• Delete the virtual machine on esx01 when VMotion has completed successfully
• Requirements: shared Gigabit Ethernet backplane, shared LUN visibility on the SAN, CPUs of the same type

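To make the pre-copy mechanism above concrete, here is a small, self-contained simulation of iterative pre-copy with a shrinking dirty set. It is illustrative only and not VMware's implementation; the write-rate model, page counts and thresholds are assumptions.

```python
def pre_copy_migration(memory_pages, dirty_after_round, max_rounds=30,
                       stop_threshold=16):
    """Simulate iterative pre-copy.

    memory_pages: total number of guest pages (all sent in round 0).
    dirty_after_round(round_no): pages dirtied while that round was in flight,
        a stand-in for the guest's write rate.
    Returns (pages_sent_live, pages_sent_while_suspended, rounds_used).
    """
    to_send = memory_pages
    sent_live = 0
    for round_no in range(max_rounds):
        sent_live += to_send                     # copied while the VM keeps running
        to_send = dirty_after_round(round_no)    # pages re-dirtied meanwhile
        if to_send <= stop_threshold:            # small enough: suspend and copy
            break
    return sent_live, to_send, round_no + 1

# Example: 262144 pages (1 GiB of 4 KiB pages), write working set halving each round
live, suspended, rounds = pre_copy_migration(
    262144, dirty_after_round=lambda r: int(20000 * 0.5 ** r))
print(f"{rounds} pre-copy rounds, {live} pages sent live, "
      f"{suspended} pages left for the brief stop-and-copy")
```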

IBM System p – POWER6 Live Migration – Under the Hood

• Armstrong, W. L., et al., IBM POWER6 partition mobility: Moving virtual servers seamlessly between physical systems

Live Migration Vendor Summary

Vendor \ Memory Phase   Pure Demand Paging   Iterative Pre Copy of Updates   All Pages Moved in First Iteration
IBM                     Yes                  No                              No
VMware                  Yes                  No                              Yes
Xen                     No                   Yes                             Yes

Live Migration Timeline and Performance Impact

Impact \ Memory Phase                Pure Demand Paging   Iterative Pre Copy of Updates   All Pages Not in First Iteration
Down Time                            Less                 More                            -
Total Migration Time                 More                 Less                            More
Dedicated Bandwidth                  -                    -                               No
Performance Impact After Migration   More                 Less                            More

Live Migration State Diagram (Rob Strom) – Common Live Migration Process Under the Hood

• [State diagram: timelines for the source (Application(S)) and target (Application(T)) over time. While the source application is live, "Initiate migration" starts the precopy phase and pre-copied pages are sent to the target. "Stop Source" then stops the source application, a start message with the list of outstanding pages goes to the target, and delayed pages continue to arrive. Once the target application is live, demand paging serves "Request for Page" / "Send page" exchanges until done. The gap between the source application stopping and the target application becoming live is the disruption; both machines stay live throughout.]

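The demand-paging phase in the diagram can be sketched as follows; this is a toy model (the page store, fetch callback and batch size are all assumptions), not IBM's or any hypervisor's actual code.

```python
class DemandPagedMemory:
    """Toy model of destination-side demand paging after resume."""

    def __init__(self, transferred_pages, missing_pages, fetch_from_source):
        self.pages = dict(transferred_pages)   # page_no -> contents
        self.missing = set(missing_pages)      # not yet transferred ("invalid")
        self.fetch = fetch_from_source         # callable(page_no) -> contents
        self.demand_fetches = 0

    def read(self, page_no):
        if page_no in self.missing:            # invalid page: fault on access
            self.pages[page_no] = self.fetch(page_no)   # high-priority request
            self.missing.discard(page_no)
            self.demand_fetches += 1
        return self.pages[page_no]

    def background_copy(self, batch=64):
        """Pull a batch of still-missing pages without waiting for faults."""
        for page_no in list(self.missing)[:batch]:
            self.pages[page_no] = self.fetch(page_no)
            self.missing.discard(page_no)

# Example usage with a fake source
mem = DemandPagedMemory({0: b"...", 1: b"..."}, missing_pages=range(2, 100),
                        fetch_from_source=lambda n: b"page-%d" % n)
mem.read(50)                 # faults and fetches page 50 from the source
while mem.missing:
    mem.background_copy()
print(mem.demand_fetches, "demand fetches;", len(mem.missing), "pages missing")
```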

Live Migration in LAN - Constraints

Source and destination hosts must be:
• Part of the same datacenter – a cluster of physical hosts in a LAN environment
• Connected to the same dedicated Gigabit network
• Connected to the same storage – shared storage common to both source and target servers
• Homogenous environment
  - E.g. VMware ESX servers must have compatible CPU models; System p: POWER6, AIX 5.3 and above

• [Diagram: a control center manages Host A and Host B over the network; VM1 moves from Host A to Host B; Disk 1 sits on a shared SAN]

Live Migration Capability Requirements - WAN

• Three requirements
  - Consistency: behaves as if it were a single VM; once the destination VM starts, its file system should be consistent with the source VM
  - Minimum service disruption: migration does not significantly degrade performance as perceived by the end user
  - Network connection: after the IP address changes, new connections are seamlessly redirected to the new IP address at the destination

Live Migration in WAN – Characteristics and Challenges

Source and destination hosts:
• Cannot share the same storage – need to migrate the disk over the network
• Have different IP addresses – network connectivity must be preserved even if the IP address changes
• Are connected by a low-bandwidth, high-latency network – this impacts migration policies
• Homogenous environment
  - E.g. VMware ESX servers must have compatible CPU models; System p: POWER6, AIX 5.3 and above

• [Diagram: a control center manages Host A (IP1, Disk 1) and Host B (IP2, Disk 2) over the network; VM1 and its disk move from Host A to Host B]

Live Migration Capability Gaps - WAN

• Network: IP address changes
  - Available solutions – Mobile IP and IP tunneling
  - Academic effort with focus on Xen
• Memory
  - Same solution as LAN
• Storage
  - Possible solutions such as replication
  - Deal with local persistent state storage
• Best solution has poor performance – 68 seconds disruption; no isolation of root causes
• References
  - Bradford, R., et al., Deutsche Telekom Lab, Live Wide-Area Migration of Virtual Machines Including Local Persistent State
  - Ramakrishnan, K. K., et al., AT&T Labs-Research, Live Data Center Migration across WANs: A Robust Cooperative Context Aware Approach
  - Clark, C., et al., Live Migration of Virtual Machines
  - Many more…

Live Migration Road Map

• Memory Migration
  - Benchmark and evaluate existing technologies for WAN environment
    - Iterative pre copy
    - Demand paging
  - Develop technologies leveraging existing solution concepts for WAN
• Storage Migration
  - Benchmark and evaluate existing technologies for WAN environment
    - Xen storage migration
    - Replication – synchronous / asynchronous
  - Develop technologies leveraging existing solution concepts for WAN

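As background for the replication option listed above, a toy sketch of the difference between synchronous and asynchronous replication of disk writes; this is purely illustrative, and the replica interface and flush policy are assumptions.

```python
class ReplicatedDisk:
    """Toy model of a primary disk replicating writes to a remote replica."""

    def __init__(self, send_to_replica, synchronous=True):
        self.send = send_to_replica     # callable(offset, data): ships one write
        self.synchronous = synchronous
        self.pending = []               # writes not yet shipped (async mode)
        self.local = {}

    def write(self, offset, data):
        self.local[offset] = data
        if self.synchronous:
            self.send(offset, data)     # acknowledge only after the replica has it
        else:
            self.pending.append((offset, data))   # acknowledge now, ship later

    def flush(self):
        """Drain the async backlog, e.g. right before the final VM switch-over."""
        while self.pending:
            self.send(*self.pending.pop(0))

# Example: async replication defers network cost but needs a flush before cut-over
remote = {}
disk = ReplicatedDisk(lambda off, data: remote.update({off: data}),
                      synchronous=False)
disk.write(0, b"block-0")
disk.write(1, b"block-1")
disk.flush()
print(sorted(remote))   # [0, 1] once the backlog is drained
```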

Live Migration Benchmarking in WAN

• Focus: Memory migration
• Objective: Establish baselines for new technology development
• Overview
  - Investigate migration policies (e.g. when to suspend the source VM)
  - Total migration time, disruption time
  - Requires baseline measurements to evaluate future developed technologies
• Design
  - VM migration without any workload
  - VM with self-contained workload
  - VM with web server, simulated backend and load generator
  - VM with web server, application server, simulated backend and load generator

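A possible harness for these baseline measurements is sketched below. It assumes a Xen host where `xm migrate --live <domain> <destination>` is available and estimates disruption by probing a TCP port on the guest, so the exact command, guest names, port and polling interval are assumptions rather than the setup actually used.

```python
import socket
import subprocess
import time

def probe(host, port=22, timeout=0.2):
    """Return True if the guest answers a TCP connect (rough liveness check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def timed_live_migration(domain, destination, guest_ip):
    """Run a Xen live migration and report total and disruption time in seconds."""
    start = time.time()
    # Assumed Xen xm toolstack syntax; adjust for the local environment.
    proc = subprocess.Popen(["xm", "migrate", "--live", domain, destination])
    down_start = down_end = None
    while proc.poll() is None:
        alive = probe(guest_ip)
        now = time.time()
        if not alive and down_start is None:
            down_start = now             # first missed probe
        if alive and down_start is not None and down_end is None:
            down_end = now               # guest answering again
        time.sleep(0.1)
    total = time.time() - start
    # Note: disruption that extends past the command's exit is not captured here.
    disruption = (down_end - down_start) if down_start and down_end else 0.0
    return total, disruption

if __name__ == "__main__":
    total, disruption = timed_live_migration("rhel5-guest", "hostB", "10.0.0.42")
    print(f"total migration time: {total:.1f}s, observed disruption: {disruption:.1f}s")
```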


Experimental Setup

• Red Hat Enterprise Linux 5 Server / Xen – host OS (dom0)
• Red Hat Enterprise Linux 5 Client – guest OS (domU)
• Sample workloads
  - SPECjvm2008
  - SPECweb2005
  - IBM Trader 6
• Network emulator
  - WANem – WAN characteristics
• Monitoring
  - Workload performance
  - TCP traces

Experiment 1 – VM Live Migration without Workload

• Goal: Understand correlation between memory and total migration time

• [Setup diagram: a VM on Host A is migrated to Host B; each host runs a hypervisor with shared storage, connected through a simulated WAN network]

Experiment 1 Results – Memory vs. Total Migration Time

• Observations:
  - Total migration time is a linear function of memory
  - There is overhead time (non-zero intercept)

Regression fit:

            Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept   1.86102573     0.162785817      11.43236   1E-10      1.52342824   2.198623212
Memory      0.00919992     0.000141606      64.96847   1.23E-26   0.00890624   0.009493588

• [Memory line fit plot: measured and predicted total migration time (0–25) against memory (0–2500)]

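Using the fitted coefficients above, predicted total migration time is intercept plus slope times memory. A small sketch, assuming (as the plot axes suggest, though the slide does not state units) memory in MB and time in seconds:

```python
# Linear model from the Experiment 1 regression output above.
# Units are assumed to be MB and seconds based on the plot axes.
INTERCEPT = 1.86102573   # fixed overhead per migration
SLOPE = 0.00919992       # additional time per unit of guest memory

def predicted_migration_time(memory_mb: float) -> float:
    return INTERCEPT + SLOPE * memory_mb

for memory in (256, 512, 1024, 2048):
    print(f"{memory:>5} MB -> {predicted_migration_time(memory):5.1f} s (predicted)")
```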

Experiment 2 - VM Live Migration with Workload

• Goal: Understand impact of migration on benchmark performance
• Setup: VM loaded with SPEC JVM – a benchmark to evaluate performance of the JRE
  - Low file I/O, no network I/O

• [Setup diagram: the VM running SPEC JVM on Host A is migrated to Host B over the simulated WAN; shared storage]

Experiment 2 Results – Free Heap Size vs. Total Migration Time

• Goal: Understand impact of migration on benchmark performance
• Setup: VM loaded with SPEC JVM benchmark to evaluate performance of the JRE
  - Low file I/O, no network I/O
• Observations:
  - Performance metrics are not the best (operations / free heap size)

• [Plots: free heap size (bytes) against elapsed time (ms) for the Derby workload, one operation with migration (Derby-1Operation-migration) and one operation for a single iteration (Derby-1operation-1iteration)]

Experiment 3 - VM with web server, simulated backend and load generator

• [Setup diagram: SPECweb clients 1…n and a prime client on Hosts C and D drive a SPECweb VM on Host A; the VM migrates to Host B through a WAN emulator over the simulated network; shared storage]

Experiment 3
• Goal: Understand the relation between service disruption time / total migration time and network bandwidth and latency
• VM loaded with a benchmark tool offering the capability of measuring both SSL and non-SSL request/response performance
• Status:
  - Finished SPECweb installation on blade servers and laptop clients
  - Setting up a TCP trace analysis tool for measuring TCP throughput

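As a sketch of the planned TCP throughput measurement (the trace tool itself is not named here, so the post-processing below is only an assumed approach): bin per-packet samples from a capture into intervals and report throughput over the elapsed migration time.

```python
from collections import defaultdict

def throughput_timeline(samples, interval=1.0):
    """samples: iterable of (timestamp_seconds, payload_bytes) from a TCP trace.
    Returns a list of (interval_start, mbit_per_s), useful for spotting the
    throughput dip around the migration's disruption window."""
    samples = list(samples)
    if not samples:
        return []
    start = min(t for t, _ in samples)
    bins = defaultdict(int)
    for t, nbytes in samples:
        bins[int((t - start) // interval)] += nbytes
    return [(start + i * interval, bins[i] * 8 / (interval * 1e6))
            for i in range(max(bins) + 1)]

# Example with synthetic samples: a steady transfer with a 2-second stall
samples = [(t * 0.01, 12_000) for t in range(0, 800) if not 300 <= t < 500]
for when, mbps in throughput_timeline(samples):
    print(f"t={when:4.1f}s  {mbps:6.2f} Mbit/s")
```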


Future Work

• Memory Migration
  - Benchmarking next steps:
    - Test runs measuring TCP throughput over elapsed migration time
    - Test runs measuring total migration/disruption time vs. bandwidth for fixed latency
    - Test runs measuring total migration/disruption time vs. latency for fixed bandwidth
  - Develop technologies leveraging existing solution concepts for WAN
• Storage Migration
  - Benchmark and evaluate existing technologies for WAN environment
    - Xen storage migration
    - Replication – synchronous / asynchronous
  - Develop technologies leveraging existing solution concepts for WAN


Live Migration Vendor Summary

Vendor \ Memory Phase   Pure Demand Paging   Iterative Pre Copy of Updates   All Pages Moved in First Iteration
IBM                     Yes                  No                              No
VMware                  Yes                  No                              Yes
Xen                     No                   Yes                             Yes

Live Migration Technical Impact Summary

Impact \ Memory Phase                Pure Demand Paging   Iterative Pre Copy of Updates
Down Time                            Less                 More
Total Migration Time                 More                 Less
Performance Impact After Migration   More                 Less

SPECweb2005 Workload

• Configured on Apache with PHP support
• Backend simulator
• Apache with PHP and the backend simulator on 9.2.252.243 (blade server), listening on ports 80 and 81
• Clients configured on 9.2.84.87 (laptop)
• Characteristics
  - Number of workload clients = 1
  - Simultaneous sessions = 5
  - Banking workload execution

WANem

• Installed on 9.2.252.243 (blade server) as a WANem VM (9.2.252.218)
• Characteristics
  - Configured to run at various bandwidth and delay values
  - 1.5 Mbps to 155 Mbps, among others

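WANem itself is driven through its own web console; purely to illustrate the kind of shaping involved, the sketch below applies comparable delay and rate limits with Linux tc/netem on a test interface. The interface name, rate and delay values are placeholders, and netem's built-in rate limiting assumes a reasonably recent iproute2/kernel.

```python
import subprocess

def emulate_wan(interface: str = "eth1", rate: str = "1.5mbit",
                delay: str = "100ms") -> None:
    """Shape a test interface to WAN-like bandwidth and delay using netem.
    Illustrative only: WANem is normally configured via its own web UI."""
    # 'replace' installs the qdisc, overwriting any existing root qdisc.
    subprocess.run(["tc", "qdisc", "replace", "dev", interface, "root",
                    "netem", "delay", delay, "rate", rate], check=True)

if __name__ == "__main__":
    # T1-like link with 100 ms one-way delay, within the slide's 1.5 to 155 Mbps range
    emulate_wan("eth1", rate="1.5mbit", delay="100ms")
```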

Live Migration in Xen


Live Migration in WAN


Live Migration Capability Challenges - WAN

• Network
  - Maintaining connectivity and application dependencies for end users even if the IP address changes
• Memory
  - Amount of data – page size
  - Network characteristics – no dedicated bandwidth, high latency
• Storage
  - No shared storage between source and target machines
  - Size of storage
  - VMs at the source machine have a shared database

Live Migration Capability Gaps - WAN

• Network: IP address changes
  - Available solutions – Mobile IP and IP tunneling
  - Academic effort with focus on Xen
• Memory
  - Same solution as LAN
• Storage
  - Possible solutions such as replication
  - Deal with local persistent state storage
• Best solution has poor performance – 68 seconds disruption; no isolation of root causes
• References
  - Bradford, R., et al., Deutsche Telekom Lab, Live Wide-Area Migration of Virtual Machines Including Local Persistent State
  - Ramakrishnan, K. K., et al., AT&T Labs-Research, Live Data Center Migration across WANs: A Robust Cooperative Context Aware Approach
  - Clark, C., et al., Live Migration of Virtual Machines
  - Many more…

Live Migration – Why

• Live migration
  - Movement of a running virtual machine (OS, application, etc.) from one physical host to another
  - Minimizing the response delays to end users
• Many reasons to do migration
  - Maintenance and upgrade
  - Load balancing
  - Server consolidation
  - Business acquisition, outsourcing
  - Shut down of data center
  - High availability
  - Disaster recovery
  - More…
• Characteristics
  - Frequent vs. non-frequent
  - Planned vs. unplanned

What is Hyper-V Quick Motion

• Quick Motion
  - Renamed feature of host clustering (available in 2006)
  - Virtual Server plus clustering
  - Guest virtual machine can fail over between cluster nodes (for both planned and unplanned downtime)
  - Offers the ability to quickly move a virtual machine from physical server to physical server
  - Benchmark: a 512 MB virtual machine can be migrated from one server to another in about 6 seconds using 1 Gb iSCSI

Source: http://blogs.technet.com/daven/archive/2007/06/08/virtual-server-quick-migration.aspx

Live Migration – Setting the Stage

• Live Application Migration Common Processes and Technological Capabilities: An Overview
• Live Application Migration Capability: Gap Analysis – LAN and WAN
• Top players considered
  - VMware – VMotion
  - System p – System p Live Mobility
  - Xen – XenMotion
  - Microsoft – Hyper-V QuickMotion
• Our approach – a process-centric approach
  - Process breakdown: e.g. pre migration, migration, post migration
  - Capability mapping to process steps
  - Technologies and tools mapping to capabilities


Speaker notes

  1. Memory Live Migration Core Steps Under the Hood
  2. Live migration is the movement of a virtual machine from one physical host to another while continuously powered-up. When properly carried out, this process takes place without any noticeable effect from the point of view of the end user. Live migration allows an administrator to take a virtual machine offline for maintenance or upgrading without subjecting the system's users to downtime. One of the most significant advantages of live migration is the fact that it facilitates proactive maintenance. If an imminent failure is suspected, the potential problem can be resolved before disruption of service occurs. Live migration can also be used for load balancing, in which work is shared among computers in order to optimize the utilization of available CPU resources. Planned downtime is the easier of the two (because it's scheduled, not a surprise) and the most common. Generally, planned downtime is for hardware servicing (adding additional memory, storage or updating a BIOS) or software patching. Most people schedule this work off hours (early mornings or on weekends). Unplanned downtime is the more difficult one, where a server is unexpectedly powered off and you want the virtual machines running on that server to automatically restart on another server without user intervention. Resource balancing: a system does not have enough resources for the workload while another system does. Server consolidation: allows moving applications from individual, stand-alone servers to consolidated servers. New system deployment: a workload running on an existing system must be migrated to a new, more powerful one. Availability requirements: when a system requires maintenance, its hosted applications must not be stopped and can be migrated to another system.
  3. Iterative pre-copy: during the first iteration, all pages are transferred from A to B. Subsequent iterations copy only those pages dirtied during the previous transfer phase. Stop and copy: suspend the running OS instance at A and redirect its network traffic to B. As described earlier, CPU state and any remaining inconsistent memory pages are then transferred. At the end of this stage there is a consistent suspended copy of the VM at both A and B. The copy at A is still considered to be primary and is resumed in case of failure. Source validation and destination validation are the steps to make sure the migration can happen successfully; for instance, we have to check what is inside the source machine we want to migrate and what the source configuration is, from hardware all the way to software. On the destination, we need to check the compatibility of the destination machine's host OS and hypervisor, as well as the available resources. Block image copy is the process in which the system copies the VM's disk image from the source to the destination while the file is in use by the running VM at the source. It is worth noting that the data transferred during migration does not need to be the entire file system used by the VM. Some technology, such as XenoServer, can have a template disk image, which allows transferring only the difference between the template and the customized disk image. In the past months, we have carefully studied the migration processes of VMware, Xen and System p; finally, we summarized the common steps among all these top players, which can serve as a reference to help us develop our own live migration processes.
  6. [This slide is a 7-click animation. Please practice with it before presenting.] Let's explore how VMotion works in more detail. First there are some important configuration requirements for VMotion: VMotion is only supported by ESX Server hosts under VirtualCenter management. A dedicated gigabit Ethernet network segment is needed between ESX Server hosts to accommodate the rapid data transfers performed. The ESX Server hosts must share storage LUNs on the same SAN and the virtual disk files for the virtual machines to be migrated must be contained in those shared LUNs. Finally, the processors on the ESX Server hosts must be of the same type. For example, VMotion from a Xeon host to an Opteron host is not supported because the processor architectures are too different. [Click 1] We start with virtual machine A running on host ESX01. We want to move VM A to our second host, ESX02, so that we can perform maintenance on host ESX01, but VM A has active user connections and network sessions which we want preserved. [Click 2] The VMotion migration is initiated from the VirtualCenter client or a VirtualCenter scheduled task or SDK script. The first action is to copy the VM A configuration file to host ESX02 to establish an instance of the VM on the new host. The virtual machine configuration file is simply a small text file listing the virtual machine's properties. [Click 3] Next, the memory image of VM A is copied to the target host. The memory image can be quite large, so the dedicated gigabit Ethernet link required by VMotion lets that copy proceed at high speed. Immediately before the VM A memory image copy begins, VMotion redirects new memory write operations on host ESX01 to a memory bitmap which will record all VM A memory updates during the course of the VMotion migration. In that way, the full memory image is read-only and static during the VMotion operation. Because the virtual disk file for VM A is stored on a VMFS-formatted SAN LUN mounted by both ESX01 and ESX02, we don't need to transfer that potentially very large file. The multiple access feature of the VMFS file system enables this time-saving method. [Click 4] Now, we suspend VM A on the source host and copy the memory bitmap to the target host. Because the bulk of VM A's memory was copied earlier, the transfer of the memory bitmap proceeds quickly – only taking a second or two. This step is the only one in which activity is interrupted and that interruption is too short to cause connections to be dropped and is barely noticeable to users. [Click 5] As soon as the memory bitmap with the changes made to memory finishes copying, we resume VM A on its new home, ESX02. VMotion also sends an Address Resolution Protocol (ARP) ping packet to the production network switch to inform it that the switch port to use for VM A has changed. That preserves all the network connections to VM A. Some modified memory pages may still reside on ESX01 after VM A has resumed. When VM A needs access to those pages, VMotion will "demand page" them or transfer them as needed over to ESX02. This technique minimizes the service interruption when the memory bitmap is copied. [Click 6] VMotion completes the memory image transfer by background paging the remaining memory of VM A over to target host ESX02 and does a final commit of all the modified memory pages to the full VM A memory image. Now VM A is back to using its full memory image in read/write mode.
[Click 7] The VMotion migration is now complete and we finish with a cleanup operation of deleting VM A from host ESX01. Depending on the size of the virtual machine's memory, it may take several minutes to complete a VMotion migration, but the multi-staged memory transfer method employed by VMotion reduces the period of actual interruption to just a second or two, and there is no noticeable downtime for the virtual machine or its applications.
  7. Armstrong, W. L., et al., "IBM POWER6 partition mobility: Moving virtual servers seamlessly between physical systems." The hypervisor keeps track of the pages that need to be migrated in a dirty page table. All pages of the partition are marked as dirty at the start of the migration. Pages that have been sent are set to an effective state of read-only in the PPT and marked clean. Whenever the partition attempts to write to one of the clean pages, the write is intercepted by the hypervisor by means of a VPM interrupt. The hypervisor reverts that page to the dirty state, makes the page writable again, and returns control to the partition at the point of interruption. The process of sending or resending pages to the destination hypervisor continues until there is sufficient partition memory state on the destination so that the processing of the partition can be transferred to processors on the destination server and resume its operation there. The source hypervisor suspends the partition and transfers its internal processor and other necessary state to the destination hypervisor. The source hypervisor also sends the dirty page table to the destination hypervisor. The destination hypervisor receives the dirty page table and uses it to set the state of all dirty pages to an "invalid" access state. The partition is then resumed on the destination hypervisor. The source hypervisor continues sending the remaining partition page frames to the destination hypervisor, which marks them as clean upon their successful arrival. The destination hypervisor resumes the partition with the virtual processors of the partition in VPM mode. After the partition is resumed, any attempt by the partition to access a page whose state is invalid causes a VPM interrupt, which is handled by the hypervisor. The destination hypervisor blocks the virtual processor and then makes a high-priority "demand paging" request to the source hypervisor for that page. The requested page is sent ahead of other pages that are waiting to be transferred to the destination hypervisor. When the requested page arrives, the hypervisor marks the page as "valid" and resumes the virtual processor at the point of interruption. This process continues transparently to the partition until all remaining partition pages have been transferred from the source to the destination. Once all pages are resident on the destination server, the destination hypervisor takes the virtual processors of the partition out of VPM mode. During the period of time that the partition is in VPM mode for movement, other storage access interrupts may occur. The source or the destination hypervisor uses the VPT to analyze an interrupt and passes control to the OS interrupt handler if the interrupt is not associated with partition movement.
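A tiny sketch of the destination-side page flow described here may help: background page transfers are served in order, while a fault on an "invalid" page issues a high-priority demand-paging request that is served ahead of them. The queue model and every name below are hypothetical; this is not the POWER6 hypervisor code.

```python
import heapq

# Hypothetical model of the destination-side page flow: background transfers
# trickle in, but a fault on an invalid page jumps the queue with a
# high-priority "demand paging" request, as in the description above.

DEMAND, BACKGROUND = 0, 1          # lower value = served first

class TransferQueue:
    def __init__(self, pending_pages):
        self._heap = [(BACKGROUND, p) for p in pending_pages]
        heapq.heapify(self._heap)
        self._arrived = set()

    def fault(self, page):
        """Partition touched an invalid page: request it ahead of everything else."""
        heapq.heappush(self._heap, (DEMAND, page))

    def next_page(self):
        while self._heap:
            _, page = heapq.heappop(self._heap)
            if page not in self._arrived:
                self._arrived.add(page)    # page becomes "valid" on arrival
                return page
        return None

q = TransferQueue(pending_pages=[10, 11, 12])
q.fault(12)                # partition accesses page 12 before it has arrived
print(q.next_page())       # -> 12, sent ahead of the waiting background pages
```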
  8. We have a source VM and a target machine. The yellow bar represents the time during which the source machine is alive; the green line represents the time during which the application is running inside it. An external decision to migrate the application initiates the migration process and enters the pre-copy state. During the pre-copy phase, pages are sent in an order determined by the source/control point. During this phase, pages that have already been sent or are queued to be sent can be written again, so a mechanism is needed to detect that. When the destination has received sufficient pages, the application at the source enters the stopped state: pages still in the queue keep being sent, but no page can be written any more. This guarantees that the queue of pages will drain, whereas in the pre-copy phase there is no such guarantee. When the target is ready to start, the source sends a start message along with a list of invalid pages to the destination, informing it which pages have not yet been sent and which pages were sent but subsequently written. After receiving this message, the application starts on the destination while the delayed pages keep arriving. Once the application is alive on the destination, it enters the demand-paging phase: any attempt to access a page that is invalid on the target causes the target to issue a high-priority demand-paging request to the source for the most recent state of that page, and the requested page is sent ahead of the other pages that are waiting to be transferred to the destination.
  9. Virtualization software support is installed and available on the source. Virtualization software support is installed and available on the destination. Source and destination are connected to the same shared storage, share a dedicated gigabit network, and are on the same shared network. Checks for a successful VM migration: the source and target hosts must be part of the same datacenter, belong to a cluster of physical hosts in a LAN environment, be connected to the same gigabit network, be connected to the same storage (shared storage common to both the source and target ESX servers), and have compatible CPU models (source and target ESX servers have processors of the same family). Candidate virtual machines must not be connected to internal networks or local devices.
  10. Virtualization software support is installed and available on the source. Virtualization software support is installed and available on the destination. Source and destination are connected to the same shared storage, share a dedicated gigabit network, and are on the same shared network. Checks for a successful VM migration: the source and target hosts must be part of the same datacenter, belong to a cluster of physical hosts in a LAN environment, be connected to the same gigabit network, be connected to the same storage (shared storage common to both the source and target ESX servers), and have compatible CPU models (source and target ESX servers have processors of the same family). Candidate virtual machines must not be connected to internal networks or local devices.
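These pre-migration checks lend themselves to a simple automated gate. The sketch below is purely illustrative: the dictionary keys are made-up stand-ins, not a real VMware or Xen API.

```python
# Illustrative pre-migration validation gate based on the checklist above.
# The dictionary keys are hypothetical stand-ins, not a real hypervisor API.

def can_migrate(vm, src, dst):
    """Return the list of failed checks; an empty list means migration may proceed."""
    failures = []
    if src["datacenter"] != dst["datacenter"]:
        failures.append("hosts are not part of the same datacenter")
    if not set(vm["disk_luns"]) <= (src["shared_luns"] & dst["shared_luns"]):
        failures.append("VM disks are not on storage visible to both hosts")
    if not (src["gigabit_migration_nic"] and dst["gigabit_migration_nic"]):
        failures.append("no dedicated gigabit migration network between the hosts")
    if src["cpu_family"] != dst["cpu_family"]:
        failures.append("source and target CPU families are not compatible")
    if vm["local_devices"] or vm["internal_only_networks"]:
        failures.append("VM is connected to local devices or an internal-only network")
    return failures
```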
  11. Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, Harald Schioberg (Deutsche Telekom Lab), "Live Wide-Area Migration of Virtual Machines Including Local Persistent State"; K. K. Ramakrishnan et al. (AT&T Labs-Research), "Live Data Center Migration across WANs: A Robust Cooperative Context Aware Approach"; Clark, C., et al., "Live Migration of Virtual Machines". Network redirect and disk copy are required for state stored in data stores and are more relevant for WAN migration. Virtual machine disk files are not external disks; they are virtual machine files that need to be moved to the other physical host during live migration. Disk snapshot: create a disk-only snapshot, i.e. a disk image at a particular point in time, to create a child disk. REDO logs: a log of disk write activity that can be replayed to restore the remote disk and keep it consistent with the local disk. Snapshot consolidation: consolidation of the child and parent disk snapshots after the copy. Asynchronous replication: the local and remote storage systems are allowed to diverge; the amount of divergence between the local and remote copies is typically bounded by either a certain amount of time or a certain amount of data.
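As a rough sketch of the snapshot plus REDO-log technique (in the spirit of Bradford et al., but not their code), the snapshot is shipped in bulk while later writes are logged and replayed at the destination. The block-level dictionaries and function names below are hypothetical.

```python
# Toy model of the snapshot + REDO-log disk transfer described above.
# Disks are dictionaries mapping block numbers to contents; everything here
# is a hypothetical illustration, not the mechanism of any specific product.

def take_snapshot(local_disk):
    return dict(local_disk)              # parent image, shipped in bulk to the remote site

def record_write(redo_log, block, data):
    redo_log.append((block, data))       # child disk: writes made after the snapshot

def consolidate(remote_disk, redo_log):
    for block, data in redo_log:         # replay the log to bring the remote copy up to date
        remote_disk[block] = data

local = {0: b"boot", 1: b"data"}
remote = take_snapshot(local)            # bulk copy over the WAN
redo_log = []
local[1] = b"data-v2"                    # the VM keeps writing during the copy
record_write(redo_log, 1, b"data-v2")
consolidate(remote, redo_log)            # snapshot consolidation at the destination
assert remote == local
```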
  12. Scott Trent: AIX guru (SPECweb wiki); SPEC IBM representative: Alan Adamson; SWG virtualization efficiency: Chris Floyd and Joe Jakubowski (STG System x).
  13. Scott Trent: AIX guru (SPECweb wiki); SPEC IBM representative: Alan Adamson; SWG virtualization efficiency: Chris Floyd and Joe Jakubowski (STG System x).
  14. The measuring mechanism sits inside the VM, so its timekeeping is not trustworthy.
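One common workaround, sketched below under assumptions of my own (a reachable guest address and TCP service, a fixed probe interval), is to measure the interruption from an external observer whose clock is unaffected by the migration.

```python
import socket
import time

# Hedged sketch: measure service interruption from outside the VM, since the
# guest's own clock cannot be trusted. The address, port, probe interval, and
# observation window below are illustrative assumptions, not values from the deck.

def probe_downtime(host="192.0.2.10", port=80, duration=120.0, interval=0.1):
    first_fail = None
    last_fail = None
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                pass                         # VM answered; service is up
        except OSError:
            now = time.monotonic()
            if first_fail is None:
                first_fail = now             # first missed probe
            last_fail = now                  # most recent missed probe
        time.sleep(interval)
    return 0.0 if first_fail is None else (last_fail - first_fail)

# print("observed downtime ~ %.1f s" % probe_downtime())
```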
  15. Scott Trent: AIX guru (SPECweb wiki); SPEC IBM representative: Alan Adamson; SWG virtualization efficiency: Chris Floyd and Joe Jakubowski (STG System x).
  16. Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, Harald Schioberg (Deutsche Telekom Lab), "Live Wide-Area Migration of Virtual Machines Including Local Persistent State"; K. K. Ramakrishnan et al. (AT&T Labs-Research), "Live Data Center Migration across WANs: A Robust Cooperative Context Aware Approach"; Clark, C., et al., "Live Migration of Virtual Machines". Network redirect and disk copy are required for state stored in data stores and are more relevant for WAN migration. Virtual machine disk files are not external disks; they are virtual machine files that need to be moved to the other physical host during live migration. Disk snapshot: create a disk-only snapshot, i.e. a disk image at a particular point in time, to create a child disk. REDO logs: a log of disk write activity that can be replayed to restore the remote disk and keep it consistent with the local disk. Snapshot consolidation: consolidation of the child and parent disk snapshots after the copy. Asynchronous replication: the local and remote storage systems are allowed to diverge; the amount of divergence between the local and remote copies is typically bounded by either a certain amount of time or a certain amount of data.
  17. Live migration is the movement of a virtual machine from one physical host to another while it remains continuously powered on. When properly carried out, this process takes place without any noticeable effect from the point of view of the end user. Live migration allows an administrator to take the underlying physical host offline for maintenance or upgrading without subjecting the system's users to downtime. One of the most significant advantages of live migration is that it facilitates proactive maintenance: if an imminent failure is suspected, the potential problem can be resolved before any disruption of service occurs. Live migration can also be used for load balancing, in which work is shared among computers in order to optimize the utilization of available CPU resources. Downtime comes in two forms, planned and unplanned. Planned downtime is the easier of the two (because it is scheduled, not a surprise) and the most common; generally it covers hardware servicing (adding memory or storage, updating a BIOS) or software patching, and most people schedule this work off hours (early mornings or weekends). Unplanned downtime is the more difficult one, where a server is unexpectedly powered off and you want the virtual machines running on that server to automatically restart on another server without user intervention. Motivations: resource balancing, where one system does not have enough resources for its workload while another system does; server consolidation, which allows applications to be moved from individual, stand-alone servers to consolidated servers; new system deployment, where a workload running on an existing system must be migrated to a new, more powerful one; and availability requirements, where a system that needs maintenance must not stop its hosted applications, which can instead be migrated to another system.
  18. http://blog.scottlowe.org/2007/07/23/live-migration-vs-quick-migration/ Quick Migration simply saves the state of a running virtual machine (memory to disk), moves the storage connectivity from one physical server to another, and then restores the virtual machine (disk to memory). This is quick (a matter of seconds), but the time depends on how much memory needs to be written to disk and on the speed of the connectivity to the storage. For reference, a virtual machine with 512 MB of memory can be migrated from one server to another in about six seconds using 1 Gb iSCSI.
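As a back-of-the-envelope check (my own arithmetic, assuming an effective 1 Gbit/s link and ignoring protocol, suspend/restore, and read-back overhead), a single pass of 512 MB over the wire already accounts for most of that six seconds:

```python
# Rough sanity check of the six-second Quick Migration figure above.
# Assumes an effective 1 Gbit/s link; protocol and save/restore overheads
# are deliberately ignored, so this is a lower bound, not a measurement.
memory_bytes = 512 * 1024 ** 2        # 512 MB of saved VM memory
link_bits_per_second = 1e9            # 1 Gbit/s iSCSI
seconds = memory_bytes * 8 / link_bits_per_second
print("one pass over the link ~ %.1f s" % seconds)   # ~4.3 s
```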