TCP Issues in Virtualized Datacenter Networks
Hemanth Kumar Mantri
Department of Computer Science
Selected Papers
• The TCP Outcast Problem: Exposing Unfairness in Data Center Networks
– NSDI '12
• vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload
– ACM/IEEE SC, 2010
Background and Motivation
• A data center is a shared environment
– Multi-tenancy
• Virtualization: a key enabler of cloud computing
– Amazon EC2
• Resource sharing
– CPU and memory are strictly shared
– Network sharing is largely laissez-faire
Data Center Networks
• Flows compete via TCP
• Ideally, TCP should achieve true fairness
– All flows get equal share of link capacity
• In practice, TCP exhibits RTT-bias
– Throughput is inversely proportional to RTT
• 2 Major Issues
– Unfairness (in general)
– Low Throughput (in virtualized environments)
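The RTT bias mentioned above can be made concrete with the widely cited steady-state approximation by Mathis et al. for loss-based TCP: throughput ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p). This formula does not appear on the slides; it is brought in here only as an illustration, and the MSS, RTT, and loss-rate values are made-up inputs.

```python
import math


def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Approximate steady-state TCP throughput in bytes per second.

    Illustrative model only: throughput ~ (MSS / RTT) * sqrt(3/2) / sqrt(p).
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate must be in (0, 1]")
    return (mss_bytes / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)


# Hypothetical flows that differ only in RTT (200 us vs 400 us).
short_rtt = mathis_throughput(1460, 0.0002, 0.01)
long_rtt = mathis_throughput(1460, 0.0004, 0.01)
ratio = short_rtt / long_rtt  # conventional RTT bias favors the short-RTT flow
```

Under this model the flow with half the RTT is expected to get about twice the throughput, which is the expectation that the measurements on the following slides contradict.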
Datacenter Topology (Hierarchical)
Traffic Pattern: Many to One
Key Finding: Unfairness
• Inverse RTT bias?
• Low RTT = low throughput
Further Investigation
(Figure: instantaneous and average throughput per flow)
• The 2-hop flow is consistently starved!
TCP Outcast Problem
• Some flows are "outcast" and receive very low throughput compared to others
• Almost an order of magnitude reduction in some cases
Experiments
• Same RTTs
• Same Hop Length
• Unsynchronized Flows
• Introduce Background Traffic
• Vary Switch Buffer Size
• Vary the TCP variant
– Reno, MPTCP, BIC, CUBIC + SACK
• Unfairness persists!
Observation
• Flow differential at input ports is the culprit!
• Vary the number of flows competing at the bottleneck switch
Reason: Port Blackout
1. Packets are roughly the same size
2. Similar inter-arrival rates (predictable timing)
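How equal packet sizes and predictable timing can starve one input port is easier to see in a toy model. The sketch below is not the paper's testbed: it assumes one output queue with tail drop, one departure per tick, and one arrival per tick from each of two input ports in a fixed order. The names `simulate` and `longest_drop_run` are hypothetical.

```python
from collections import deque


def simulate(ticks, capacity, first_port="A"):
    """Return per-port lists of accept (True) / drop (False) outcomes.

    Each tick: the output port transmits one packet, then one packet
    arrives from each input port in a fixed order (assumed timing).
    """
    order = ["A", "B"] if first_port == "A" else ["B", "A"]
    queue = deque()
    outcome = {"A": [], "B": []}
    for _ in range(ticks):
        if queue:
            queue.popleft()              # one departure frees one slot
        for port in order:               # same-size packets, fixed timing
            if len(queue) < capacity:
                queue.append(port)
                outcome[port].append(True)
            else:                        # tail drop: queue is full
                outcome[port].append(False)
    return outcome


def longest_drop_run(outcomes):
    """Length of the longest run of consecutive drops."""
    best = run = 0
    for accepted in outcomes:
        run = 0 if accepted else run + 1
        best = max(best, run)
    return best


result = simulate(ticks=10, capacity=4, first_port="A")
```

Once the queue fills, the port whose packet arrives right after a departure always takes the free slot and the other port loses every packet. In this toy model the blackout is permanent because the timing never drifts; on real switches it lasts for short intervals and can move between ports.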
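Port Blackout
• A run of consecutive packets arriving on one input port is dropped at the tail-drop output queue while packets from the other input port are accepted
• Can occur on any input port
• Happens for small intervals of time
• Has a more catastrophic effect on throughput when the port carries fewer flows
– Experiments showed that the same number of packet drops hurts the throughput of a few flows much more than that of several concurrent flows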
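A small calculation shows why the same burst of drops hurts a port with few flows more. It assumes, purely for illustration, that packets from the flows sharing an input port interleave round-robin, so a burst of consecutive drops is spread evenly over them. The function name `drops_per_flow` is hypothetical.

```python
def drops_per_flow(burst_drops, flows_on_port):
    """Spread a burst of consecutive drops round-robin over a port's flows."""
    if flows_on_port < 1:
        raise ValueError("flows_on_port must be at least 1")
    counts = [0] * flows_on_port
    for i in range(burst_drops):
        counts[i % flows_on_port] += 1   # assumed round-robin interleaving
    return counts


single_flow = drops_per_flow(6, 1)   # one flow absorbs the whole burst
six_flows = drops_per_flow(6, 6)     # each flow loses a single packet
```

A flow that loses several consecutive packets is likely to lose the tail of its window and wait for a retransmission timeout, whereas a flow that loses one packet can usually recover with fast retransmit. That is one plausible reading of why the port with fewer flows becomes the outcast.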
Conditions for TCP Outcast
Solutions?
• Stochastic Fair Queuing (SFQ)
– Explicitly enforces fairness among flows
– Expensive for commodity switches
• Equal-Length Routing
– All flows are forced to go through the core
– Better interleaving of packets alleviates port blackout
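The first option can be sketched in a few lines. This is a minimal toy version of stochastic fair queuing, not a switch implementation: flows are hashed into buckets, each bucket is a bounded FIFO, and buckets are served round-robin. The class name, the CRC32 default hash, and the `hash_fn` parameter are assumptions made for the example; the periodic hash perturbation used by real SFQ is left out.

```python
import zlib
from collections import deque


class StochasticFairQueue:
    """Toy SFQ: hash each flow to a bucket, serve buckets round-robin."""

    def __init__(self, buckets=8, per_bucket_limit=4, hash_fn=None):
        self.queues = [deque() for _ in range(buckets)]
        self.limit = per_bucket_limit
        self.next_bucket = 0
        # Default hash is CRC32 of the flow id; any stable hash would do.
        self.hash_fn = hash_fn or (lambda flow_id: zlib.crc32(flow_id.encode()))

    def enqueue(self, flow_id, packet):
        """Queue a packet; return False if the flow's bucket is full."""
        q = self.queues[self.hash_fn(flow_id) % len(self.queues)]
        if len(q) >= self.limit:
            return False                 # drop only hurts this bucket
        q.append((flow_id, packet))
        return True

    def dequeue(self):
        """Return the next (flow_id, packet) in round-robin order, or None."""
        n = len(self.queues)
        for i in range(n):
            idx = (self.next_bucket + i) % n
            if self.queues[idx]:
                self.next_bucket = (idx + 1) % n
                return self.queues[idx].popleft()
        return None


# Fixed bucket mapping so the example is deterministic.
sfq = StochasticFairQueue(
    buckets=2, per_bucket_limit=4,
    hash_fn=lambda flow: {"heavy": 0, "light": 1}[flow],
)
accepted_heavy = sum(sfq.enqueue("heavy", i) for i in range(10))
accepted_light = sum(sfq.enqueue("light", i) for i in range(2))
service_order = [sfq.dequeue()[0] for _ in range(6)]
```

Because the heavy flow can only fill its own bucket, the light flow keeps all of its packets and is served in alternation with the heavy one.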
VM Consolidation
• Multiple VMs hosted by one physical host
• Multiple VMs sharing the same core
– Flexibility, scalability, and economy
(Figure: VM 1 to VM 4 running on a virtualization layer over shared hardware)
Observation: VM consolidation negatively impacts network performance!
Investigating the Problem
(Figure: a client sends to a server whose hardware and virtualization layer host VM 1, VM 2, and VM 3)
Effect of CPU Sharing
(Figure: RTT (ms), axis from 40 to 180, versus number of VMs, 2 to 5)
• RTT increases in proportion to the VM scheduling slice (30 ms)
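A back-of-the-envelope version of this effect: if the VMs on one core are scheduled strictly round-robin and each one uses its full 30 ms slice, a packet arriving just after its VM is descheduled waits for every other VM to run first. Strict round-robin with full slices is an assumption for illustration; real schedulers such as Xen's credit scheduler behave less regularly. The function name is hypothetical.

```python
SLICE_MS = 30  # VM scheduling slice quoted on the slide


def worst_case_wait_ms(num_vms, slice_ms=SLICE_MS):
    """Worst-case wait before a VM runs again under strict round-robin."""
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    return (num_vms - 1) * slice_ms


# Added delay for 2 to 5 VMs sharing a core.
waits = {n: worst_case_wait_ms(n) for n in range(2, 6)}
```

Under these assumptions each additional VM adds one slice of waiting time, which matches the linear trend described on the slide.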
Exact Culprit
(Figure: packets from the sender pass through the hardware and the device driver in the driver domain (dom0), then wait in per-VM buffers for VM 1, VM 2, and VM 3)
Impact on TCP Throughput
(Figure: TCP throughput of connections to dom0 (+) and to a VM (x))
• A connection to the VM is much slower than a connection to dom0!
Solution: vSnoop
• Alleviates the negative effect of VM scheduling on TCP throughput
• Implemented within the driver domain to accelerate TCP connections
• Does not require any modifications to the VM
• Does not violate end-to-end TCP semantics
• Applicable across a wide range of VMMs
– Xen, VMware, KVM, etc.
TCP Connection to a VM
(Figure: timeline across the sender, the driver domain, and the VM1 buffer while VM1, VM2, and VM3 are scheduled in turn; SYN and SYN,ACK exchanges are marked with RTT and VM scheduling latency)
• Sender establishes a TCP connection to VM1
• Each RTT includes the VM scheduling latency
Key Idea: Acknowledgement Offload
(Figure: the same timeline across the sender, the driver domain, and the VM shared buffer, labeled "w/ vSnoop")
• Faster progress during TCP slow start
Challenges
• Challenge 1: Out-of-order and special packets (SYN, FIN)
– Solution: Let the VM handle these packets
• Challenge 2: Packet loss after vSnoop
– Solution: Let vSnoop acknowledge only if there is room in the buffer
• Challenge 3: ACKs generated by the VM
– Solution: Suppress or rewrite ACKs already generated by vSnoop
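The three rules above can be written down as two small decision functions. This is a sketch under stated assumptions, not vSnoop's actual code: the `Segment` and `FlowState` types, the function names, and the returned action strings are all hypothetical, sequence-number wraparound is ignored, and ACK rewriting is not modeled (anything that is not a pure duplicate ACK is simply forwarded).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int            # first sequence number carried by the segment
    length: int         # payload length in bytes
    syn: bool = False
    fin: bool = False


@dataclass
class FlowState:
    next_seq: int          # next in-order sequence number expected
    early_acked: int = 0   # highest ACK number sent on the VM's behalf


def on_segment(flow, seg, buffer_free_bytes):
    """Decide what the driver domain does with a segment from the network."""
    # Challenge 1: special or out-of-order packets are left to the VM.
    if seg.syn or seg.fin or seg.seq != flow.next_seq:
        return "to_vm"
    # Challenge 2: acknowledge early only if the buffer can hold the data.
    if seg.length > buffer_free_bytes:
        return "no_ack"
    flow.next_seq += seg.length
    flow.early_acked = flow.next_seq
    return "early_ack"


def on_vm_ack(flow, ack_no, has_payload):
    """Decide what to do with an ACK later generated by the VM."""
    # Challenge 3: a pure ACK that repeats an early ACK is redundant.
    if ack_no <= flow.early_acked and not has_payload:
        return "suppress"
    return "forward"


flow = FlowState(next_seq=1000)
first = on_segment(flow, Segment(seq=1000, length=100), buffer_free_bytes=4096)
```

Keeping the early-ACK decision tied to buffer space is what preserves the promise made to the sender: data is acknowledged only when it is safely held for the VM.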
vSnoop Implementation in Xen
(Figure: the driver domain (dom0) contains the bridge, vSnoop, and one Netback per VM; VM1, VM2, and VM3 each run Netfront and have a shared buffer (buf); "Tuning" is marked at Netfront)
TCP Throughput Improvement
• 3 VMs consolidated, 1000 transfers of a 100 KB file
• Median throughput:
– Vanilla Xen (+): 0.192 MB/s
– Xen+tuning (x): 0.778 MB/s
– Xen+tuning+vSnoop (*): 6.003 MB/s
• 30x improvement
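The "30x" figure can be checked directly from the three medians. The mapping of each median to a configuration follows the order in which the slide lists them, which is an assumption.

```python
# Median throughput in MB/s, in the order the configurations are listed.
median_mb_s = {
    "Vanilla Xen": 0.192,
    "Xen+tuning": 0.778,
    "Xen+tuning+vSnoop": 6.003,
}

baseline = median_mb_s["Vanilla Xen"]
# Speedup of each configuration relative to vanilla Xen, to one decimal.
speedup = {name: round(value / baseline, 1) for name, value in median_mb_s.items()}
```

Tuning alone gives roughly a 4x gain, and tuning plus vSnoop gives about 31x, consistent with the rounded "30x" on the slide.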
Thank You!
• References
– http://friends.cs.purdue.edu/dokuwiki/doku.php
– https://www.usenix.org/conference/nsdi12/tech-schedule/technical-sessions
• Most animations and pictures are taken from the authors' original slides and the NSDI '12 conference talk.
BACKUP SLIDES
Conditions for Outcast
• Switches use the tail-drop queue management discipline
• A large set of flows and a small set of flows, arriving at two different input ports, compete for a bottleneck output port at a switch
Why does Unfairness Matter?
• Multi-tenant clouds
– Some tenants get better performance than others
• MapReduce applications
– Straggler problems
– One delayed flow affects overall job completion
State Machine Maintained Per-Flow
(Figure: per-flow state diagram)
• States: Start, Active (online), No buffer (offline), Unexpected Sequence
• Transition labels: packet received; in-order packet with buffer space available; in-order packet with no buffer; out-of-order packet
• Actions:
– Early acknowledgements for in-order packets
– Don't acknowledge
– Pass out-of-order packets to the VM
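One way to read the diagram is as a small per-flow tracker. The exact edges cannot be recovered from the slide text, so the transition rules below are an assumption: the first packet moves the flow from Start to Active and goes to the VM, and afterwards the next state depends only on whether the packet is in order and whether buffer space is available. The class and action names are hypothetical.

```python
from enum import Enum


class State(Enum):
    START = "start"
    ACTIVE = "active (online)"
    NO_BUFFER = "no buffer (offline)"
    UNEXPECTED_SEQ = "unexpected sequence"


class FlowTracker:
    """Per-flow state, updated once for every received packet."""

    def __init__(self):
        self.state = State.START

    def on_packet(self, in_order, buffer_available):
        """Update the state and return the action taken for this packet."""
        if self.state is State.START:
            # Assumed: the first packet (connection setup) goes to the VM.
            self.state = State.ACTIVE
            return "pass to VM"
        if not in_order:
            self.state = State.UNEXPECTED_SEQ
            return "pass to VM"
        if not buffer_available:
            self.state = State.NO_BUFFER
            return "don't acknowledge"
        self.state = State.ACTIVE
        return "early acknowledgement"


tracker = FlowTracker()
actions = [
    tracker.on_packet(in_order=True, buffer_available=True),   # leaves Start
    tracker.on_packet(in_order=True, buffer_available=True),   # stays Active
    tracker.on_packet(in_order=True, buffer_available=False),  # goes offline
    tracker.on_packet(in_order=False, buffer_available=True),  # unexpected seq
    tracker.on_packet(in_order=True, buffer_available=True),   # back to Active
]
```

In this reading, early acknowledgements are issued only in the Active state, and the flow returns to Active as soon as an in-order packet arrives while buffer space is available.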
vSnoop’s Impact on TCP Flows
• Slow Start
– Early acknowledgements help connections progress faster
– Most significant benefit for short transfers, which are more prevalent in data centers
• Congestion Avoidance and Fast Retransmit
– Large flows in the steady state can also benefit from vSnoop
– The benefit is smaller than for Slow Start
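Why short transfers gain the most can be illustrated by counting slow-start round trips. The sketch assumes an idealized slow start in which the congestion window doubles every RTT, with no losses, no delayed ACKs, and no receive-window limit; the MSS of 1460 bytes, the initial window of one segment, and the two RTT values are made-up inputs.

```python
import math


def slow_start_rounds(segments, initial_window=1):
    """Round trips needed to send `segments` when cwnd doubles every RTT."""
    rounds, sent, cwnd = 0, 0, initial_window
    while sent < segments:
        sent += cwnd     # one window of segments per round trip
        cwnd *= 2        # idealized slow start
        rounds += 1
    return rounds


def transfer_time_ms(size_bytes, rtt_ms, mss=1460, initial_window=1):
    """Idealized transfer time: number of slow-start rounds times the RTT."""
    segments = math.ceil(size_bytes / mss)
    return slow_start_rounds(segments, initial_window) * rtt_ms


# A 100 KB transfer with a hypothetical 60 ms RTT (scheduling delay included)
# versus a hypothetical 1 ms RTT (acknowledged early by the driver domain).
slow = transfer_time_ms(100_000, rtt_ms=60)
fast = transfer_time_ms(100_000, rtt_ms=1)
```

A 100 KB transfer finishes within a handful of round trips, so its duration is almost entirely set by the RTT that the sender observes. Shortening that RTT through early acknowledgements therefore shrinks the whole transfer, while a long flow in steady state spends most of its time outside slow start.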
