20. Behavior
(from Phanishayee et al., “Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems”)
21. RFC 6298
(2.4) Whenever RTO is computed, if it is less than 1
second, then the RTO SHOULD be rounded up to 1 second.
- in practice, the minimum RTO is often 200 ms
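To see why this floor matters in a data center, here is a minimal sketch of the RFC 6298 estimator (SRTT, RTTVAR, RTO = SRTT + max(G, K·RTTVAR)) followed by the (2.4) round-up. The constants ALPHA, BETA, K and the 1 s initial RTO are from RFC 6298; the class name, the clock granularity G, and the min_rto values are illustrative assumptions. With 100 μs RTTs the computed RTO is about 1 ms, so the floor, not the network, sets the timeout.

```python
# Minimal sketch of the RFC 6298 RTO estimator plus the (2.4) minimum clamp.
# ALPHA, BETA, K and the 1 s initial RTO come from RFC 6298; the clock
# granularity and the min_rto values are illustrative assumptions.

class RtoEstimator:
    ALPHA = 1 / 8   # gain for SRTT
    BETA = 1 / 4    # gain for RTTVAR
    K = 4           # RTTVAR multiplier

    def __init__(self, min_rto=1.0, granularity=0.001):
        self.min_rto = min_rto          # seconds; RFC says 1 s, stacks often use 0.2 s
        self.granularity = granularity  # clock granularity G in seconds (assumed)
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # initial RTO before any sample (RFC 6298, 2.1)

    def on_rtt_sample(self, r):
        """Update the estimator with an RTT sample r (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement (2.2)
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Later measurements (2.3): RTTVAR uses the old SRTT, so update it first
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        # (2.4): anything below the floor is rounded up to the floor
        self.rto = max(rto, self.min_rto)
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator(min_rto=0.2)     # assumed 200 ms floor
    for _ in range(10):
        est.on_rtt_sample(100e-6)       # 100 microsecond data-center RTTs
    print(f"RTO = {est.rto * 1e3:.0f} ms for a 0.1 ms RTT")
```

A single lost segment therefore stalls the flow for roughly 2000 RTTs before it is retransmitted.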
RFC 2581
The delayed ACK algorithm specified in [Bra89] SHOULD be
used by a TCP receiver. When used, a TCP receiver MUST NOT
excessively delay acknowledgments. Specifically, an ACK SHOULD
be generated for at least every second full-sized segment, and
MUST be generated within 500 ms of the arrival of the first
unacknowledged packet.
- in practice, the delayed-ACK timeout is often 40 ms
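The delayed-ACK rule can be sketched as a function that takes the arrival times of full-sized segments and returns when ACKs go out: every second segment is ACKed immediately, and a lone segment waits for a timer started when it arrived. The function name and the 40 ms default are assumptions for illustration; the 500 ms bound is the RFC 2581 MUST.

```python
# Hypothetical sketch of the RFC 2581 delayed-ACK rule for a receiver that
# sees only full-sized segments: ACK every second segment, otherwise ACK when
# a timer started at the first unacknowledged segment expires.

def delayed_ack_times(arrivals, ack_timeout=0.040):
    """Return the times (seconds) at which ACKs are sent for the given
    sorted segment arrival times. ack_timeout = 40 ms is an assumed value."""
    if not 0 < ack_timeout <= 0.5:
        raise ValueError("RFC 2581 requires an ACK within 500 ms")
    acks = []
    pending = 0        # full-sized segments received but not yet ACKed
    deadline = None    # when the delayed-ACK timer fires
    for t in arrivals:
        if pending and t >= deadline:
            # Timer expired before this segment arrived: ACK the lone segment
            acks.append(deadline)
            pending = 0
        if pending == 0:
            deadline = t + ack_timeout   # start the timer on the first unACKed segment
        pending += 1
        if pending == 2:
            # Every second full-sized segment is ACKed immediately
            acks.append(t)
            pending = 0
    if pending:
        acks.append(deadline)   # a trailing lone segment waits for the timer
    return acks


print(delayed_ack_times([0.000, 0.001, 0.002]))
```

With three back-to-back segments, the first two are ACKed at 1 ms, but the odd third one is not ACKed until its timer fires about 40 ms later, a delay hundreds of times larger than a data-center RTT.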
22. Solutions
• Proposal 1: Adjust RTO (Vasudevan et al.)
• Proposal 2: DCTCP (Alizadeh et al.)
28. DCTCP
• Three goals
• Low latency for short flows
• High burst tolerance (incast)
• High throughput for long flows
• Basic approach: keep switch queues short
29. Queue Length
• RTT measurements are noisy
• At high speeds, queueing delays are very small
• GigE: 10 packets is 120μs
• 10GigE: 10 packets is 12μs
• Use ECN (explicit congestion notification)
• RFC 3168
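The GigE and 10GigE figures follow from the time needed to transmit the queued packets: packets × size × 8 bits ÷ link rate. The 1500-byte packet size is an assumption (a standard Ethernet MTU) that reproduces the slide's numbers; the function name is hypothetical.

```python
# Drain time of a queue holding `packets` full-sized packets. The 1500-byte
# packet size is an assumption (standard Ethernet MTU), not from the slide.

def queue_delay_us(packets, link_bps, packet_bytes=1500):
    """Microseconds needed to transmit `packets` packets at `link_bps` bits/s."""
    bits = packets * packet_bytes * 8
    return bits * 1_000_000 / link_bps   # integer inputs keep the result exact


for name, rate in [("GigE", 1_000_000_000), ("10GigE", 10_000_000_000)]:
    print(f"{name}: 10 packets = {queue_delay_us(10, rate):g} us")
# GigE: 10 packets = 120 us
# 10GigE: 10 packets = 12 us
```

A queue of tens of microseconds is easily lost in RTT measurement noise, which is why an explicit switch signal such as ECN is more reliable than inferring queue length from delay.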
31. Monitoring α
• Per RTT, measure F, the fraction of packets
sent that were ECN-marked by switches
• DCTCP ACKs copy the ECN bit of the corresponding
data packets into the ECN-Echo field
• Compute α, an exponentially weighted moving average (EWMA) of F
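A minimal sketch of the per-RTT update: F is the fraction of this RTT's ACKs that carried ECN-Echo, and α ← (1 − g)·α + g·F. The weight g = 1/16, the function name, and the representation of a window as a list of booleans (one per ACKed packet) are assumptions for illustration.

```python
# Minimal sketch of DCTCP's per-RTT alpha update. The weight g = 1/16 is an
# assumed value; each window is a list of booleans, one per ACK, True if the
# ACK carried ECN-Echo (i.e., the data packet was marked).

def update_alpha(alpha, ece_flags, g=1 / 16):
    """Fold one RTT's worth of ACKs into alpha, the EWMA of F."""
    if not ece_flags:
        return alpha                       # no ACKs this RTT: nothing to measure
    f = sum(ece_flags) / len(ece_flags)    # F: fraction of packets marked
    return (1 - g) * alpha + g * f


alpha = 0.0
for _ in range(50):
    # Hypothetical steady congestion: 3 of every 10 packets are marked
    window = [True] * 3 + [False] * 7
    alpha = update_alpha(alpha, window)
print(f"alpha after 50 RTTs: {alpha:.3f}")
```

Under a constant F of 0.3, α climbs toward 0.3; a small g smooths out per-RTT noise at the cost of reacting more slowly to changes in congestion.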
33. DCTCP Caveat
“We stress that DCTCP is designed for the data
center environment. In this paper, we make no
claims about suitability of DCTCP for wide area
networks.”
34. Data Center Networks
• Very different from the wide-area Internet
• Tiny RTTs
• Different traffic patterns
• Single administrative domain
• Standards (e.g., IETF) much less important
• Much novel network design