Hedera: Dynamic Flow
Scheduling for Data Center
Networks
Mohammad Al-Fares, Sivasankar
Radhakrishnan, Barath Raghavan, Nelson
Huang, Amin Vahdat
- USENIX NSDI 2010 -
Presenter: Jason, Tsung-Cheng, HOU
Advisor: Wanjiun Liao
Dec. 22nd, 2011
Problem
• Relying on multipathing, due to…
– Limited port densities of
routers/switches
– Horizontal expansion
• Multi-rooted tree topologies
– Example: Fat-tree / Clos
Problem
• BW demand is essential and volatile
– Must route among multiple paths
– Avoid bottlenecks and deliver aggregate BW
• However, current multipath routing…
– Mostly: flow-hash-based ECMP
– Static and oblivious to link-utilization
– Causes long-term large-flow collisions
• Inefficiently utilizing path diversity
– Need a protocol or a scheduler
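
A minimal sketch of why static flow hashing can collide large flows. The 5-tuple layout, the SHA-256 hash, and the function names are illustrative assumptions, not what any particular switch implements. The path is chosen from the flow identifier alone, so link load never influences it.

```python
import hashlib

def ecmp_path(flow, num_paths):
    """Pick a path index from a hash of the flow tuple (static, load-oblivious)."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

def path_loads(flows, num_paths):
    """Sum the demand (fraction of link capacity) hashed onto each path."""
    loads = [0.0] * num_paths
    for flow, demand in flows:
        loads[ecmp_path(flow, num_paths)] += demand
    return loads

# Hypothetical example: five line-rate flows over four equal-cost paths.
# At least two of them must share a path, whatever the hash function is.
flows = [((f"10.0.0.{i}", f"10.2.0.{i}", 5000 + i, 80, "tcp"), 1.0) for i in range(5)]
print(path_loads(flows, 4))
```

Because the mapping is fixed for the lifetime of a flow, two colliding elephants stay collided until one of them ends.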
Collisions of elephant flows
• Collisions in two ways: Upward or Downward
[Figure: topology with sources S1 to S4 and destinations D1 to D4]
Equal Cost Paths
• Many equal cost paths going up to the core
switches
• Only one path down from each core switch
• Need to find good flow-to-core mapping
[Figure: paths between S and D]
Goal
• Given dynamic flow demands
– Need to find paths that maximize
network bisection BW
– No end hosts modifications
• However, switch-local information is
insufficient to find a proper allocation
– Need a central scheduler
– Must use commodity Ethernet switches
– OpenFlow
Architecture
• Detect Large Flows
– Flows that need bandwidth but are network-limited
• Estimate Flow Demands
– Use max-min fairness to allocate flows between SD
pairs
• Allocate Flows
– Use estimated demands to heuristically find better
placement of large flows on the EC paths
– Arrange switches and iterate again
[Figure: control loop: Detect Large Flows → Estimate Flow Demands → Allocate Flows]
Architecture
• Feedback loop
• Optimize achievable bisection BW by
assigning flow-to-core mappings
• Heuristics of flow demand estimation and
placement
• Central Scheduler
– Global knowledge of all links in the network
– Control tables of all switches (OpenFlow)
Elephant Detection
• Scheduler polls edge switches
– Flows exceeding threshold are “large”
– 10% of hosts’ link capacity (> 100Mbps)
• Small flows: Default ECMP hashing
• Hedera complements ECMP
– Default forwarding is ECMP
– Only schedules large flows contributing
to bisection BW bottlenecks
• Centralized functions: the essentials
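
The threshold test above can be sketched as follows. The function name, the dictionary of polled per-flow rates, and the default 1 Gbps link are assumptions for illustration; only the 10% threshold comes from the slide.

```python
def detect_large_flows(flow_rates_bps, link_capacity_bps=1_000_000_000, percent=10):
    """Return the flows whose polled rate exceeds `percent` of the host link capacity.

    flow_rates_bps: {flow_id: rate in bit/s}, as gathered from edge switches.
    Flows not returned here stay on default ECMP forwarding.
    """
    threshold = link_capacity_bps * percent // 100  # 100 Mbps for a 1 Gbps link
    return {flow for flow, rate in flow_rates_bps.items() if rate > threshold}

# Hypothetical polled rates: only the 150 Mbps flow is handed to the scheduler.
rates = {("A", "X"): 150_000_000, ("B", "Y"): 50_000_000}
print(detect_large_flows(rates))
```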
Demand Estimation
• Current flow rate: misleading
– May be already constrained by network
• Need to find flow’s “natural” BW
demand when not limited by network
– As if only limited by NIC of S or D
• Allocate S/D capacity among flows
using max-min fairness
• Equal to the BW allocation under optimal
routing; used as input to the placement algorithm
Demand Estimation
• Given the set of large flows, adjust
each flow's estimated size at S/D iteratively
– Sender (S): distributes unconverged BW equally among its flows
– Receiver (R), if limited: redistributes BW among
excessive-demand flows
– Repeat until all flows converge
• Guaranteed to converge in O(|F|)
– Linear in the no. of flows
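
The sender and receiver passes can be sketched as below. This is a simplified reading of the iteration, assuming one flow per (source, destination) pair and NIC capacity normalized to 1; the function names are illustrative. It is run on the A, B, C to X, Y example used on the following slides.

```python
from fractions import Fraction

def _estimate_source(src, flows, demand, converged):
    """Split the sender's capacity not held by converged flows equally among the rest."""
    mine = [f for f in flows if f[0] == src]
    fixed = sum((demand[f] for f in mine if converged[f]), Fraction(0))
    open_flows = [f for f in mine if not converged[f]]
    if open_flows:
        share = (1 - fixed) / len(open_flows)
        for f in open_flows:
            demand[f] = share

def _estimate_destination(dst, flows, demand, converged):
    """If the receiver is oversubscribed, cap its largest flows at an equal share."""
    mine = [f for f in flows if f[1] == dst]
    if sum((demand[f] for f in mine), Fraction(0)) <= 1:
        return  # receiver is not the bottleneck
    limited = set(mine)           # flows still treated as receiver-limited
    small_total = Fraction(0)     # demand of flows already below the equal share
    share = Fraction(1, len(limited))
    while True:
        small = {f for f in limited if demand[f] < share}
        if not small:
            break
        small_total += sum(demand[f] for f in small)
        limited -= small
        share = (1 - small_total) / len(limited)
    for f in limited:
        demand[f] = share
        converged[f] = True

def estimate_demands(flows):
    """flows: list of (src, dst). Returns {flow: demand as a fraction of NIC capacity}."""
    demand = {f: Fraction(0) for f in flows}
    converged = {f: False for f in flows}
    while True:
        before = dict(demand)
        for src in sorted({s for s, _ in flows}):
            _estimate_source(src, flows, demand, converged)
        for dst in sorted({d for _, d in flows}):
            _estimate_destination(dst, flows, demand, converged)
        if demand == before:      # no estimate changed in this round
            return demand

result = estimate_demands([("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")])
for flow, d in result.items():
    print(flow, d)   # AX 2/3, AY 1/3, BY 1/3, CY 1/3
```

Exact fractions are used so that the stopping test compares estimates without floating-point noise.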
Demand Estimation
[Figure: senders A, B, C; receivers X, Y]

Flow   Estimate   Conv.?
AX
AY
BY
CY

Senders
Sender   Available Unconv. BW   Flows   Share
A        1                      2       1/2
B        1                      1       1
C        1                      1       1
Demand Estimation
Receivers
Recv   RL?   Non-SL Flows   Share
X      No    -              -
Y      Yes   3              1/3

Flow   Estimate   Conv.?
AX     1/2
AY     1/2
BY     1
CY     1

[Figure: senders A, B, C; receivers X, Y]
Demand Estimation
Flow   Estimate   Conv.?
AX     1/2
AY     1/3        Yes
BY     1/3        Yes
CY     1/3        Yes

Senders
Sender   Available Unconv. BW   Flows   Share
A        2/3                    1       2/3
B        0                      0       0
C        0                      0       0

[Figure: senders A, B, C; receivers X, Y]
Demand Estimation
Flow   Estimate   Conv.?
AX     2/3        Yes
AY     1/3        Yes
BY     1/3        Yes
CY     1/3        Yes

Receivers
Recv   RL?   Non-SL Flows   Share
X      No    -              -
Y      No    -              -

[Figure: senders A, B, C; receivers X, Y]
Placement Heuristics
• Find a good large-flow-to-core mapping
– such that average bisection BW is maximized
• Two approaches
• Global First Fit: Greedily choose path that
has sufficient unreserved BW
– O((ports/switch)^2)
• Simulated Annealing: Iteratively find a
globally better mapping of paths to flows
– O(# flows)
Global First-Fit
• New flow found, linearly search all paths from SD
• Place on first path with links can fit the flow
• Once flow ends, entries + reservations time out
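
A minimal sketch of the first-fit search described above. Paths are modeled as tuples of link names and reservations as a dictionary; both, along with returning `None` when nothing fits, are assumptions made for illustration. Expiry of reservations when a flow ends is left out.

```python
def global_first_fit(flow_demand, candidate_paths, reserved, capacity=1.0):
    """Place a flow on the first path whose links can all fit it.

    candidate_paths: equal-cost paths from S to D, each a tuple of link names.
    reserved: {link: bandwidth already reserved}; updated in place on success.
    Returns the chosen path, or None if no path has enough unreserved BW
    (assumption: the caller then leaves the flow on default ECMP forwarding).
    """
    for path in candidate_paths:
        if all(reserved.get(link, 0.0) + flow_demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + flow_demand
            return path
    return None

# Hypothetical two-path example between S and D.
paths = [("S-core0", "core0-D"), ("S-core1", "core1-D")]
reserved = {}
print(global_first_fit(0.6, paths, reserved))  # fits on the first path
print(global_first_fit(0.6, paths, reserved))  # first path is full, uses the second
```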
[Figure: Scheduler placing Flow A, Flow B, Flow C between S and D; switches labeled 0 to 3]
Simulated Annealing
• Annealing: letting metal cool down slowly
to obtain a better crystal structure
– Heating up to enter a higher energy state
– Cooling to a lower energy state with a
better structure, stopping at a given temp
• Simulated Annealing:
– Search the neighborhood for possible states
– Probabilistically accept a worse state
– Always accept a better state; settle gradually
– Avoids getting stuck in local minima
Simulated Annealing
• State / State Space
– Possible solutions
• Energy
– Objective
• Neighborhood
– Other options
• Boltzmann’s Function
– Prob. of moving to a higher-energy state
• Control Temperature
– Current temp. affects the
prob. of moving to a higher-energy state
• Cooling Schedule
– How temp. falls
• Stopping Criterion
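[Equation: acceptance probability P in terms of the energies E and the temperature t; formula lost in extraction]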
Simulated Annealing
• State Space:
– All possible large-flow-to-core mappings
– However, all flows to the same destination map to the same core
– This reduces the state space; it works as long as there are not too many
large flows and the threshold is chosen properly
• Neighborhood:
– Swap cores for two hosts within the same pod, or
attached to the same edge / aggregation switch
– Avoids local minima
Simulated Annealing
• Energy:
– Computed from the estimated demands of flows
– Total BW by which link capacities are exceeded; to be minimized
• Temperature: remaining iterations
• Probability:
• Final state is published to switches and
used as initial state for next round
• Incremental calculation of exceeded capacity
• No recalculation over all links; only for newly found large
flows and neighborhood swaps
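
The annealing loop can be sketched on a toy model. Several things here are assumptions, not the authors' implementation: energy is computed per core instead of per link, pods are ignored when picking the two hosts to swap, and the acceptance rule is the standard Metropolis form exp((E - E_new) / T), since the slide's own probability formula did not survive. As on the slide, the temperature is the number of remaining iterations.

```python
import math
import random

def energy(assignment, flows, capacity=1.0):
    """Total demand exceeding capacity, summed over cores (toy stand-in for links)."""
    load = {}
    for _src, dst, demand in flows:
        core = assignment[dst]
        load[core] = load.get(core, 0.0) + demand
    return sum(max(0.0, used - capacity) for used in load.values())

def anneal(flows, assignment, iterations=1000, seed=0):
    """Search destination-to-core mappings; return the best state seen and its energy."""
    rng = random.Random(seed)
    current = dict(assignment)
    e_cur = energy(current, flows)
    best, e_best = dict(current), e_cur
    hosts = sorted(current)
    for temp in range(iterations, 0, -1):       # temperature = remaining iterations
        a, b = rng.sample(hosts, 2)             # neighbor: swap the cores of two hosts
        neighbor = dict(current)
        neighbor[a], neighbor[b] = current[b], current[a]
        e_new = energy(neighbor, flows)
        # Always accept a better state; accept a worse one with some probability.
        if e_new < e_cur or rng.random() < math.exp((e_cur - e_new) / temp):
            current, e_cur = neighbor, e_new
        if e_cur < e_best:
            best, e_best = dict(current), e_cur
    return best, e_best

# Hypothetical input: two 0.8 flows start on the same core and overload it.
flows = [("s1", "d1", 0.8), ("s2", "d2", 0.8), ("s3", "d3", 0.1), ("s4", "d4", 0.1)]
start = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}
print(energy(start, flows), anneal(flows, start))
```

The returned mapping could then seed the next scheduling round, in line with the slide's note that the final state is reused as the initial state.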
Evaluation
Implementation
• 16 hosts, k=4 fat-tree data plane
– 20 switches: 4-port NetFPGAs / OpenFlow
– Parallel 48-port non-blocking Quanta switch
– 1 scheduler, OpenFlow control protocol
– Testbed: PortLand
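
The host and switch counts quoted in these slides follow from the standard k-ary fat-tree construction (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches). A small helper, with an illustrative name, to check the arithmetic:

```python
def fat_tree_size(k):
    """Element counts for a k-ary fat-tree built from k-port switches (k even)."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "hosts": k * half * half,              # k^3 / 4
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "switches": 2 * k * half + half * half,  # 5k^2 / 4
        "inter_pod_paths": half * half,        # equal-cost paths between hosts in different pods
    }

print(fat_tree_size(4))   # 16 hosts, 20 switches
print(fat_tree_size(32))  # 8192 hosts
```

For k=48 the same formula gives 27,648 hosts, which matches the "27K hosts" figure used in the timing results.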
Simulator
• k=32; 8,192 hosts
– Packet-level simulators not applicable
– 1 Gbps for 8k hosts takes 2.5 × 10^11 pkts
• Model TCP flows
– TCP’s AIMD when constrained by topology
– Poisson arrival of flows
– No pkt size variations
– No bursty traffic
– No inter-flow dynamics
PortLand/OpenFlow, k=4
Simulator
Reactiveness
• Demand Estimation:
– 27K hosts, 250K flows, converges < 200ms
• Simulated Annealing:
– Runtime asymptotically depends on # of flows + # of
iter.; 50K flows and 1K iter.: 11ms
– Most of the final bisection BW is reached within a few hundred iter.
• Scheduler control loop:
– Polling + Est. + SA = 145ms for 27K hosts
Comments
• Flows destined to the same host go via the same core
– May congest at cores, but how severely?
– Large flows to/from a host: <k/2
– No proof, no evaluation
• Decreases search space and runtime
– Scalable on a per-flow basis? For large k?
• No protection for mice flows, RPCs
– Only assumes they work well under ECMP
– Does not address how they fare when routed alongside large flows
Comments
• Own flow-level simulator
– Aims to saturate the network
– No breakdown of flow counts by size
– Traffic generation: avg. flow size and arrival
rates (Poisson) with a mean
– Only the above descriptions, no specific numbers
– Too ideal, or not volatile enough?
– Avg. bisection BW is reported, but what about real-time graphs?
• States that per-flow VLB = per-flow ECMP
– Does not compare with other options (VL2)
– No further elaboration
Comments
• Shared responsibility
– Controller only deals with critical situations
– Switches perform default measures
– Improves performance and saves time
– How to strike a balance?
– Adapt to different problems?
• Default multipath routing
– States problems of per-flow VLB and ECMP
– How about per-pkt? Authors’ future work
– How to improve switches’ default actions?
Comments
• Critical controller actions
– Considers that large flows degrade overall efficiency
– What are critical situations?
– How to detect and react?
– How to improve reactiveness and adaptability?
• Amin Vahdat’s lab
– Proposes fat-tree topology
– Develops PortLand L2 virtualization
– Hedera: enhances multipath performance
– Integrates all of the above
References
• M. Al-Fares et al., “Hedera: Dynamic Flow Scheduling for
Data Center Networks”, USENIX NSDI 2010
• Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data
Center Networks”, UC Berkeley course CS 294
• M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data
Center Networks”, USENIX NSDI 2010, slides
Supplement
Fault-Tolerance
• Link / Switch failure
– Use PortLand’s fault notification protocol
– Hedera routes around failed components
[Figure: Scheduler with Flow A, Flow B, Flow C; switches labeled 0 to 3]
Fault-Tolerance
• Scheduler failure
– Soft-state, not required for correctness
(connectivity)
– Switches fall back to ECMP
[Figure: Scheduler with Flow A, Flow B, Flow C; switches labeled 0 to 3]
Limitations
• Dynamic workloads, large flow turnover faster than control loop
– Scheduler will be continually chasing the traffic matrix
• Need to include penalty term for unnecessary SA flow re-assignments
[Figure: Flow Size vs. Matrix Stability (Stable to Unstable); regions labeled ECMP and Hedera]

More Related Content

What's hot

ensemble learning
ensemble learningensemble learning
ensemble learningbutest
 
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...Sri Ambati
 
Adversarial Attacks and Defenses in Deep Learning.pdf
Adversarial Attacks and Defenses in Deep Learning.pdfAdversarial Attacks and Defenses in Deep Learning.pdf
Adversarial Attacks and Defenses in Deep Learning.pdfMichelleHoogenhout
 
Optimization technique genetic algorithm
Optimization technique genetic algorithmOptimization technique genetic algorithm
Optimization technique genetic algorithmUday Wankar
 
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)Universitat Politècnica de Catalunya
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 
Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)cairo university
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural NetworksDatabricks
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)Susang Kim
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...NILESH VERMA
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Edureka!
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityRaouf KESKES
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentationAyanaRukasar
 

What's hot (20)

ensemble learning
ensemble learningensemble learning
ensemble learning
 
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
Explaining Black-Box Machine Learning Predictions - Sameer Singh, Assistant P...
 
Adversarial Attacks and Defenses in Deep Learning.pdf
Adversarial Attacks and Defenses in Deep Learning.pdfAdversarial Attacks and Defenses in Deep Learning.pdf
Adversarial Attacks and Defenses in Deep Learning.pdf
 
Optimization technique genetic algorithm
Optimization technique genetic algorithmOptimization technique genetic algorithm
Optimization technique genetic algorithm
 
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
Deep Learning for Computer Vision: Recurrent Neural Networks (UPC 2016)
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)Machine Learning lecture4(logistic regression)
Machine Learning lecture4(logistic regression)
 
Training Neural Networks
Training Neural NetworksTraining Neural Networks
Training Neural Networks
 
[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)[Paper] attention mechanism(luong)
[Paper] attention mechanism(luong)
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...Demystifying NLP Transformers: Understanding the Power and Architecture behin...
Demystifying NLP Transformers: Understanding the Power and Architecture behin...
 
Rnn and lstm
Rnn and lstmRnn and lstm
Rnn and lstm
 
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
Restricted Boltzmann Machine | Neural Network Tutorial | Deep Learning Tutori...
 
Machine Learning Interpretability / Explainability
Machine Learning Interpretability / ExplainabilityMachine Learning Interpretability / Explainability
Machine Learning Interpretability / Explainability
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Support Vector Machine ppt presentation
Support Vector Machine ppt presentationSupport Vector Machine ppt presentation
Support Vector Machine ppt presentation
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
 
Machine learning
Machine learningMachine learning
Machine learning
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingJason TC HOU (侯宗成)
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureGunawan Jusuf
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureAnkita Mahajan
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksJason TC HOU (侯宗成)
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.pptsumadi26
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc NetworksKishan Patel
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithmRoshan Karunarathna
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptPatrick Theuri
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.pptVimalMallick
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.pptalirezakgm
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layerNitesh Singh
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAPNIC
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptMekiPetitSeg
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptSmitNiks
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptssuser2cc0d4
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRAPNIC
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed networkGOKULKANNANMMECLECTC
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBRAPNIC
 

Similar to Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN) (20)

Data Center Network Multipathing
Data Center Network MultipathingData Center Network Multipathing
Data Center Network Multipathing
 
Valiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious RoutingValiant Load Balancing and Traffic Oblivious Routing
Valiant Load Balancing and Traffic Oblivious Routing
 
A Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network ArchitectureA Scalable, Commodity Data Center Network Architecture
A Scalable, Commodity Data Center Network Architecture
 
FATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network ArchitectureFATTREE: A scalable Commodity Data Center Network Architecture
FATTREE: A scalable Commodity Data Center Network Architecture
 
DevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance NetworksDevoFlow - Scaling Flow Management for High-Performance Networks
DevoFlow - Scaling Flow Management for High-Performance Networks
 
24-ad-hoc.ppt
24-ad-hoc.ppt24-ad-hoc.ppt
24-ad-hoc.ppt
 
Energy Efficient Routing Approaches in Ad-hoc Networks
                Energy Efficient Routing Approaches in Ad-hoc Networks                Energy Efficient Routing Approaches in Ad-hoc Networks
Energy Efficient Routing Approaches in Ad-hoc Networks
 
Introduction to backwards learning algorithm
Introduction to backwards learning algorithmIntroduction to backwards learning algorithm
Introduction to backwards learning algorithm
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
layer2-network-design.ppt
layer2-network-design.pptlayer2-network-design.ppt
layer2-network-design.ppt
 
12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt12-adhocssasalirezaalirezalakakssaas.ppt
12-adhocssasalirezaalirezalakakssaas.ppt
 
Quality of service
Quality of serviceQuality of service
Quality of service
 
Routing protocols-network-layer
Routing protocols-network-layerRouting protocols-network-layer
Routing protocols-network-layer
 
AusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBRAusNOG 2019: TCP and BBR
AusNOG 2019: TCP and BBR
 
CS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).pptCS553_ST7_Ch15-LANOverview (1).ppt
CS553_ST7_Ch15-LANOverview (1).ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
CS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.pptCS553_ST7_Ch15-LANOverview.ppt
CS553_ST7_Ch15-LANOverview.ppt
 
NZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBRNZNOG 2020: Buffers, Buffer Bloat and BBR
NZNOG 2020: Buffers, Buffer Bloat and BBR
 
Tcp congestion control topic in high speed network
Tcp congestion control topic  in high speed networkTcp congestion control topic  in high speed network
Tcp congestion control topic in high speed network
 
RIPE 76: TCP and BBR
RIPE 76: TCP and BBRRIPE 76: TCP and BBR
RIPE 76: TCP and BBR
 

More from Jason TC HOU (侯宗成)

More from Jason TC HOU (侯宗成) (11)

A Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTVA Data Culture in Daily Work - Examples @ KKTV
A Data Culture in Daily Work - Examples @ KKTV
 
Triangulating Data to Drive Growth
Triangulating Data to Drive GrowthTriangulating Data to Drive Growth
Triangulating Data to Drive Growth
 
Design & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck SharingDesign & Growth @ KKTV - uP!ck Sharing
Design & Growth @ KKTV - uP!ck Sharing
 
文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA文武雙全的產品設計 DESIGNING WITH DATA
文武雙全的產品設計 DESIGNING WITH DATA
 
Growth @ KKTV
Growth @ KKTVGrowth @ KKTV
Growth @ KKTV
 
Growth 的基石 用戶行為追蹤
Growth 的基石   用戶行為追蹤Growth 的基石   用戶行為追蹤
Growth 的基石 用戶行為追蹤
 
App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率App 的隱形殺手 - 留存率
App 的隱形殺手 - 留存率
 
Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012Software-Defined Networking , Survey of HotSDN 2012
Software-Defined Networking , Survey of HotSDN 2012
 
Software-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief IntroductionSoftware-Defined Networking SDN - A Brief Introduction
Software-Defined Networking SDN - A Brief Introduction
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network Issues
 
OpenStack Framework Introduction
OpenStack Framework IntroductionOpenStack Framework Introduction
OpenStack Framework Introduction
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount

Hedera - Dynamic Flow Scheduling for Data Center Networks, an Application of Software-Defined Networking (SDN)

  • 1. Hedera: Dynamic Flow Scheduling for Data Center Networks • Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan, Nelson Huang, Amin Vahdat • USENIX NSDI 2010 • Presenter: Jason, Tsung-Cheng, HOU • Advisor: Wanjiun Liao • Dec. 22nd, 2011
  • 2. Problem • Data centers rely on multipathing, due to… – Limited port densities of routers/switches – Horizontal expansion • Multi-rooted tree topologies – Example: Fat-tree / Clos
  • 3. Problem • BW demand is essential and volatile – Must route among multiple paths – Avoid bottlenecks and deliver aggregate BW • However, current multipath routing… – Mostly: flow-hash-based ECMP – Static and oblivious to link utilization – Causes long-term large-flow collisions • Inefficiently utilizing path diversity – Need a protocol or a scheduler
  • 4. Collisions of elephant flows • Collisions in two ways: Upward or Downward [Figure: hosts labeled S1–S4 and D1–D4]
  • 5. Equal Cost Paths • Many equal-cost paths going up to the core switches • Only one path down from each core switch • Need to find a good flow-to-core mapping [Figure: paths between a source S and a destination D]
  • 6. Goal • Given dynamic flow demands – Need to find paths that maximize network bisection BW – No end-host modifications • However, local switch information alone is insufficient to find a proper allocation – Need a central scheduler – Must use commodity Ethernet switches – OpenFlow
  • 7. Architecture • Detect Large Flows – Flows that need bandwidth but are network-limited • Estimate Flow Demands – Use max-min fairness to allocate flows between source-destination (S-D) pairs • Allocate Flows – Use estimated demands to heuristically find a better placement of large flows on the equal-cost (EC) paths – Arrange switches and iterate again [Figure: control loop: Detect Large Flows, Estimate Flow Demands, Allocate Flows]
  • 8. Architecture • Feedback loop • Optimize achievable bisection BW by assigning flow-to-core mappings • Heuristics of flow demand estimation and placement • Central Scheduler – Global knowledge of all links in the network – Control tables of all switches (OpenFlow) [Figure: control loop: Detect Large Flows, Estimate Flow Demands, Allocate Flows]
  • 10. Elephant Detection • Scheduler polls edge switches – Flows exceeding a threshold are “large” – Threshold: 10% of a host’s link capacity (> 100 Mbps on a 1 Gbps link) • Small flows: Default ECMP hashing • Hedera complements ECMP – Default forwarding is ECMP – Only schedules large flows contributing to bisection BW bottlenecks • Centralized functions: the essentials
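The threshold test on slide 10 can be sketched in a few lines. This is a minimal illustration, not Hedera's code: the function name `find_large_flows`, the flow-key format, and the sample rates are hypothetical, and how the scheduler actually obtains per-flow rates from edge switches is not modeled.

```python
def find_large_flows(flow_rates_bps, link_capacity_bps=1_000_000_000, threshold_percent=10):
    """Return the flows whose polled rate exceeds the given percentage of host link capacity.

    flow_rates_bps maps a (src, dst) key to a measured rate in bits per second.
    Integer arithmetic is used so the threshold is exact.
    """
    threshold = link_capacity_bps * threshold_percent // 100
    return {flow: rate for flow, rate in flow_rates_bps.items() if rate > threshold}


# Hypothetical poll result from the edge switches (bits per second).
polled = {
    ("h1", "h5"): 450_000_000,  # above 100 Mbps: scheduled by the controller
    ("h2", "h6"): 20_000_000,   # small flow: stays on default ECMP
    ("h3", "h7"): 100_000_000,  # exactly at the threshold: not "large"
}
large = find_large_flows(polled)
print(large)
```

Flows that are not returned keep their default ECMP forwarding, which matches the slide's point that the scheduler only handles the large flows.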
  • 12. Demand Estimation • Current flow rate: misleading – May already be constrained by the network • Need to find a flow’s “natural” BW demand when not limited by the network – As if only limited by the NIC of the source (S) or destination (D) • Allocate S/D capacity among flows using max-min fairness • Equals the BW allocation under optimal routing; serves as input to the placement algorithm
  • 13. Demand Estimation • Given pairs of large flows, modify each flow size at S/D iteratively – Sender (S) distributes its unconverged (unconv.) BW equally among its unconverged flows – Receiver-limited (RL) receiver: redistributes BW among the flows with excessive demand – Repeat until all flows converge • Guaranteed to converge in O(|F|) – Linear in the number of flows
  • 14. Demand Estimation • Hosts: senders A, B, C; receivers X, Y • Flows (estimate / converged?): AX, AY, BY, CY (no estimates yet) • Senders (available unconv. BW / unconv. flows / share): A: 1 / 2 / 1/2; B: 1 / 1 / 1; C: 1 / 1 / 1
  • 15. Demand Estimation • Receivers (receiver-limited (RL)? / non-sender-limited (non-SL) flows / share): X: No / - / -; Y: Yes / 3 / 1/3 • Flows (estimate / converged?): AX: 1/2; AY: 1/2; BY: 1; CY: 1 (none converged yet)
  • 16. Demand Estimation • Flows (estimate / converged?): AX: 1/2; AY: 1/3 / Yes; BY: 1/3 / Yes; CY: 1/3 / Yes • Senders (available unconv. BW / unconv. flows / share): A: 2/3 / 1 / 2/3; B: 0 / 0 / 0; C: 0 / 0 / 0
  • 17. Demand Estimation • Flows (estimate / converged?): AX: 2/3 / Yes; AY: 1/3 / Yes; BY: 1/3 / Yes; CY: 1/3 / Yes • Receivers (RL? / non-SL flows / share): X: No / - / -; Y: No / - / -
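The worked example on slides 14 to 17 can be reproduced with a short sketch of the iterative sender/receiver passes. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, NIC capacity is normalized to 1, and exact fractions are used so the results match the slides' values.

```python
from fractions import Fraction


def _sender_pass(src, flows, demand, converged):
    """Split the sender's capacity not used by converged flows equally among its other flows."""
    mine = [f for f in flows if f[0] == src]
    fixed = sum((demand[f] for f in mine if converged[f]), Fraction(0))
    open_flows = [f for f in mine if not converged[f]]
    if open_flows:
        share = (1 - fixed) / len(open_flows)
        for f in open_flows:
            demand[f] = share


def _receiver_pass(dst, flows, demand, converged):
    """If the receiver is oversubscribed, cap its largest flows at an equal share."""
    mine = [f for f in flows if f[1] == dst]
    if sum((demand[f] for f in mine), Fraction(0)) <= 1:
        return  # receiver is not the bottleneck
    limited, small_total = list(mine), Fraction(0)
    share = Fraction(1, len(limited))
    while True:
        small = [f for f in limited if demand[f] < share]
        if not small:
            break
        # Flows below the share keep their demand; the rest split what remains.
        small_total += sum((demand[f] for f in small), Fraction(0))
        limited = [f for f in limited if demand[f] >= share]
        share = (1 - small_total) / len(limited)
    for f in limited:
        demand[f] = share
        converged[f] = True


def estimate_demands(flows):
    """flows: list of (src, dst) pairs. Returns {flow: demand as a fraction of NIC capacity}."""
    demand = {f: Fraction(0) for f in flows}
    converged = {f: False for f in flows}
    senders = sorted({s for s, _ in flows})
    receivers = sorted({d for _, d in flows})
    while True:
        before = dict(demand)
        for s in senders:
            _sender_pass(s, flows, demand, converged)
        for d in receivers:
            _receiver_pass(d, flows, demand, converged)
        if demand == before:  # no estimate changed in a full round
            return demand


result = estimate_demands([("A", "X"), ("A", "Y"), ("B", "Y"), ("C", "Y")])
print(result)
```

In the first round Y is oversubscribed (1/2 + 1 + 1), so its three flows are capped at 1/3 each; in the second round A hands its freed capacity to AX, which rises to 2/3.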
  • 19. Placement Heuristics • Find a good large-flow-to-core mapping – such that average bisection BW is maximized • Two approaches • Global First Fit: Greedily choose a path that has sufficient unreserved BW – O([ports/switch]^2) • Simulated Annealing: Iteratively find a globally better mapping of paths to flows – O(# flows)
  • 20. Global First-Fit • When a new large flow is found, linearly search all paths from S to D • Place it on the first path whose links can all fit the flow • Once the flow ends, entries + reservations time out [Figure: scheduler placing Flows A, B, C from S to D across core switches 0–3]
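The first-fit rule on slide 20 can be sketched as follows. The function name `global_first_fit`, the link naming, and the normalized capacity of 1.0 are assumptions for illustration; timing out reservations when a flow ends is not modeled.

```python
def global_first_fit(flow_demand, candidate_paths, reserved, capacity=1.0):
    """Reserve flow_demand on the first path whose links all have room.

    candidate_paths: list of paths, each a list of link identifiers.
    reserved: dict mapping link -> bandwidth already reserved (updated in place).
    Returns the chosen path, or None if no path can fit the flow.
    """
    for path in candidate_paths:
        if all(reserved.get(link, 0.0) + flow_demand <= capacity for link in path):
            for link in path:
                reserved[link] = reserved.get(link, 0.0) + flow_demand
            return path
    return None


# Two hypothetical equal-cost paths from S to D, one via each core switch.
via_core0 = [("S", "core0"), ("core0", "D")]
via_core1 = [("S", "core1"), ("core1", "D")]
reserved = {("S", "core0"): 0.75}  # core0's uplink is already mostly reserved

first = global_first_fit(0.5, [via_core0, via_core1], reserved)
second = global_first_fit(0.5, [via_core0, via_core1], reserved)
third = global_first_fit(0.5, [via_core0, via_core1], reserved)
print(first, second, third)
```

The third flow gets no reservation because both paths are full; in that case a scheduler could simply leave the flow on its default ECMP path.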
  • 21. Simulated Annealing • Annealing: letting metal cool down to obtain a better crystal structure – Heating up to enter a higher energy state – Cooling to a lower energy state with a better structure and stopping at a set temperature • Simulated Annealing: – Search the neighborhood for possible states – Probabilistically accept a worse state – Accept a better state, settle gradually – Avoid local minima
  • 22. Simulated Annealing • State / State Space – Possible solutions • Energy – Objective • Neighborhood – Other options • Boltzmann’s Function – Prob. of moving to a higher state: $P(\Delta E) \propto e^{-\Delta E / t}$ • Control Temperature – Current temp. affects prob. of moving to a higher state • Cooling Schedule – How temp. falls • Stopping Criterion
  • 23. Simulated Annealing • State Space: – All possible large-flow-to-core mappings – However, flows to the same destination map to the same core – Reduces the state space, as long as there are not too many large flows and the threshold is set properly • Neighborhood: – Swap cores for two hosts within the same pod, attached to the same edge / aggregation switch – Avoids local minima
  • 24. Simulated Annealing • Energy: – Estimated demand of flows – Total exceeded BW capacity of links, to be minimized • Temperature: remaining iterations • Probability: • Final state is published to switches and used as the initial state for the next round • Incremental calculation of exceeded capacity • No recalculation of all links, only for newly found large flows and neighborhood swaps
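A minimal sketch of the annealing loop described on slides 22 to 24 follows. Several parts are assumptions, not taken from the slides: the energy model treats each core as a single link, the neighborhood swaps any two hosts (ignoring the same-pod restriction), and the acceptance rule is the standard Boltzmann form with a hypothetical scaling constant `scale`. The names `energy` and `anneal` are illustrative.

```python
import math
import random


def energy(mapping, demand, num_cores, capacity=1.0):
    """Total demand exceeding capacity, summed over cores (simplified: one link per core)."""
    load = [0.0] * num_cores
    for host, core in mapping.items():
        load[core] += demand[host]
    return sum(max(0.0, l - capacity) for l in load)


def anneal(mapping, demand, num_cores, iterations=1000, scale=500.0, rng=None):
    """Search destination-host-to-core mappings; returns the lowest-energy mapping seen."""
    rng = rng or random.Random(0)
    current = dict(mapping)
    e_cur = energy(current, demand, num_cores)
    best, e_best = dict(current), e_cur
    hosts = list(current)
    for t in range(iterations, 0, -1):  # temperature = remaining iterations
        a, b = rng.sample(hosts, 2)
        neighbor = dict(current)
        neighbor[a], neighbor[b] = current[b], current[a]  # swap the two hosts' cores
        e_new = energy(neighbor, demand, num_cores)
        # Always accept a better or equal state; accept a worse one with a
        # probability that shrinks as the temperature falls (assumed rule).
        if e_new <= e_cur or rng.random() < math.exp(scale * (e_cur - e_new) / t):
            current, e_cur = neighbor, e_new
            if e_cur < e_best:
                best, e_best = dict(current), e_cur
    return best


# Hypothetical demands (fraction of link capacity) of large flows per destination host.
demand = {"h1": 0.6, "h2": 0.6, "h3": 0.3, "h4": 0.3}
initial = {"h1": 0, "h2": 0, "h3": 1, "h4": 1}  # both heavy destinations share core 0
final = anneal(initial, demand, num_cores=2)
print(energy(initial, demand, 2), energy(final, demand, 2))
```

Because the best mapping seen is returned, the result is never worse than the initial state. The slides' incremental energy update is omitted here; this sketch recomputes all core loads on every step for clarity.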
  • 26. Implementation • 16 hosts, k=4 fat-tree data plane – 20 switches: 4-port NetFPGAs / OpenFlow – Parallel 48-port non-blocking Quanta switch – 1 scheduler, OpenFlow control protocol – Testbed: PortLand
  • 27. Simulator • k=32; 8,192 hosts – Packet-level simulators not applicable – 1 Gbps for 8K hosts takes 2.5 × 10^11 pkts • Model TCP flows – TCP’s AIMD when constrained by topology – Poisson arrival of flows – No pkt size variations – No bursty traffic – No inter-flow dynamics
  • 30. Reactiveness • Demand Estimation: – 27K hosts, 250K flows: converges in < 200 ms • Simulated Annealing: – Asymptotically dependent on # of flows + # of iterations; 50K flows and 1K iterations: 11 ms – Most of the final bisection BW is reached within a few hundred iterations • Scheduler control loop: – Polling + Est. + SA = 145 ms for 27K hosts
  • 32. Comments • Flows destined to the same host go via the same core – May congest at cores, but how severely? – Large flows to/from a host: < k/2 – No proof, no evaluation • Decreases search space and runtime – Scalable on a per-flow basis? For large k? • No protection for mice flows, RPCs – Only assumes they work well under ECMP – Not addressed when they are routed alongside large flows
  • 33. Comments • Own flow-level simulator – Aims to saturate the network – No breakdown of flow counts by size – Traffic generation: avg. flow size and arrival rates (Poisson) with a mean – Only the above descriptions, no specific numbers – Too ideal or not volatile enough? – Avg. bisection BW, but what about real-time graphs? • States that per-flow VLB = per-flow ECMP – Does not compare with other options (VL2) – No further elaboration
  • 34. Comments • Shared responsibility – Controller only deals with critical situations – Switches perform default measures – Improves performance and saves time – How to strike a balance? – Adapt to different problems? • Default multipath routing – States problems of per-flow VLB and ECMP – How about per-pkt? Authors’ future work – How to improve switches’ default actions?
  • 35. Comments • Critical controller actions – Assumes large flows degrade overall efficiency – What are critical situations? – How to detect and react? – How to improve reactiveness and adaptability? • Amin Vahdat’s lab – Proposes fat-tree topology – Develops PortLand L2 virtualization – Hedera: enhances multipath performance – Integrates all of the above
  • 36. References • M. Al-Fares et al., “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010 • Tathagata Das, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, UC Berkeley course CS 294 • M. Al-Fares, “Hedera: Dynamic Flow Scheduling for Data Center Networks”, USENIX NSDI 2010, slides
  • 38. Fault-Tolerance • Link / Switch failure – Use PortLand’s fault notification protocol – Hedera routes around failed components [Figure: scheduler and core switches 0–3 carrying Flows A, B, C]
  • 39. Fault-Tolerance • Scheduler failure – Soft-state, not required for correctness (connectivity) – Switches fall back to ECMP [Figure: scheduler and core switches 0–3 carrying Flows A, B, C]
  • 40. Limitations • Dynamic workloads, large-flow turnover faster than the control loop – Scheduler will be continually chasing the traffic matrix • Need to include a penalty term for unnecessary SA flow re-assignments [Figure: flow size vs. traffic matrix stability (stable to unstable), with ECMP and Hedera regions]