TCP Tuning for the Web
Jason Cook - @macros - jason@fastly.com
Wednesday, June 19, 13
Me
• Co-founder and Operations at Fastly
• Former Operations Engineer at Wikia
• Lots of Sysadmin and Linux consulting
The Goal
• Make the best use of our limited resources to deliver the best user
experience
Focus
Linux
• I like it
• I use it
• It won’t hurt my feelings if you don’t
• Examples will be aimed primarily at Linux
Small Requests
• Optimizing towards small objects like HTML and JS/CSS
Just Enough TCP
• Not a deep dive into TCP
The accept() loop
Entry point from the kernel to
your application
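• Client sends SYN
• Kernel replies with SYN/ACK and holds the half-open connection in the backlog
• Client sends ACK, completing the handshake
• Kernel queues the established connection for the server
• Server calls accept() to pick it up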
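From the application side, the loop can be sketched roughly as below. This is a minimal illustration, not how any particular server does it: the loopback address, the backlog of 511, and the helper names `accept_loop` and `demo` are assumptions made for the example.

```python
import socket
import threading


def accept_loop(listener: socket.socket, max_conns: int) -> list:
    """Accept up to max_conns connections and answer one message on each."""
    seen = []
    for _ in range(max_conns):
        conn, _addr = listener.accept()  # blocks until a connection is ready
        with conn:
            data = conn.recv(1024)
            conn.sendall(b"ok:" + data)
            seen.append(data)
    return seen


def demo() -> bytes:
    """Run the loop against a loopback client and return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(511)             # backlog requested by the application
    port = listener.getsockname()[1]

    worker = threading.Thread(target=accept_loop, args=(listener, 1))
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)
    worker.join()
    listener.close()
    return reply
```

Calling `demo()` opens one connection over loopback; the application never touches the handshake packets, it only sees `accept()` return a connected socket.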
Backlog
• Number of connections allowed to be in a SYN state
• Kernel will drop new SYNs when this limit is hit
• Clients with the classic 3s initial timeout retry after 3s, then again at 9s (the timeout doubles on each failure)
Backlog Tuning (kernel side)
• net.ipv4.tcp_max_syn_backlog and net.core.somaxconn
• Default value of 1024 is for “systems with more than 128MB of
memory”
• 64 bytes per entry
• Undocumented max of 65535
Backlog Tuning (app side)
• Set when you call listen()
• nginx, redis, apache default to 511
• mysql default of 50
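The kernel and application limits interact: Linux silently truncates the backlog passed to listen() to net.core.somaxconn. A small sketch of checking that, assuming a Linux /proc/sys layout; the helper names and the fallback default are hypothetical.

```python
from pathlib import Path


def read_sysctl_int(name: str, default: int) -> int:
    """Read an integer sysctl through /proc/sys, or return default if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return default


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog the kernel will actually honour for a listen(requested) call."""
    return min(requested, somaxconn)


if __name__ == "__main__":
    # 128 is only a fallback for machines without /proc/sys (an assumption).
    somaxconn = read_sysctl_int("net.core.somaxconn", 128)
    for app, requested in (("nginx", 511), ("mysql", 50)):
        print(app, "requested", requested, "gets", effective_backlog(requested, somaxconn))
```

If the printed value is lower than what the application asked for, raising the application setting alone will not help.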
DDoS Handling
The SYN Flood
• Resource exhaustion attack
• Cheaper for attacker than target
• Client sends SYN with bogus return address
• Until the client's final ACK arrives, the half-open connection occupies a slot in the backlog queue
The SYN Cookie
• When enabled, kicks in when the backlog queue is full
• Sends a slightly more expensive but carefully crafted SYN/ACK, then drops the SYN from the queue
• If the client responds to the SYN/ACK, the kernel can rebuild the original SYN from the ACK and proceed
• Disables large windows while active, but that is better than dropping connections entirely
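The idea can be sketched as follows. This is a simplified illustration of the concept, not the Linux algorithm (real cookies also encode details such as the MSS): the secret, the time-slot counter, and both function names are assumptions.

```python
import hashlib
import hmac
import struct

# Hypothetical secret; a real stack would generate this randomly.
SECRET = b"hypothetical-per-boot-secret"


def make_cookie(src: str, sport: int, dst: str, dport: int, counter: int) -> int:
    """Derive a 32-bit initial sequence number from the connection 4-tuple."""
    msg = f"{src}:{sport}>{dst}:{dport}@{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_ack(src: str, sport: int, dst: str, dport: int,
              counter: int, ack_number: int) -> bool:
    """Accept an ACK that acknowledges a cookie from this or the previous time slot."""
    isn = struct.pack("!I", (ack_number - 1) & 0xFFFFFFFF)
    return any(
        hmac.compare_digest(
            isn, struct.pack("!I", make_cookie(src, sport, dst, dport, slot))
        )
        for slot in (counter, counter - 1)
    )
```

Because the server can recompute the cookie from the returning ACK, it keeps no per-connection state between sending the SYN/ACK and receiving the reply.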
Dealing with a SYN Flood
• Default to syncookies enabled and alert when triggered
• tcpdump/wireshark
• Frequently attacks have a detectable signature such as all having the same initial window size
• iptables is very flexible for matching these signatures, but can be expensive
• If hardware filters are available, use iptables to identify and hardware to block
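Spotting a signature such as a shared initial window size can be sketched as below. It assumes the window sizes of incoming SYNs have already been extracted from a capture; the function name and the 80% threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def dominant_window(windows: Iterable[int],
                    threshold: float = 0.8) -> Optional[Tuple[int, float]]:
    """Return (window, share) if one window size accounts for >= threshold of SYNs."""
    counts = Counter(windows)
    total = sum(counts.values())
    if total == 0:
        return None
    window, hits = counts.most_common(1)[0]
    share = hits / total
    return (window, share) if share >= threshold else None
```

A result like `(512, 0.9)` suggests a single value worth matching on; `None` means the traffic looks too mixed for this particular signature.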
The Joys of Hardware
Queues
• Modern network hardware is multi-queue
• By default assigns a queue per CPU core
• Should give even balancing of incoming requests, but irqbalance can mess that up
• Intel ships a script with their drivers to aid in static assignment to avoid irqbalance
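Static assignment comes down to writing a CPU bitmask per queue interrupt into /proc/irq/<irq>/smp_affinity. A rough sketch of computing those masks, assuming fewer than 32 CPUs; the IRQ numbers and function names are hypothetical, and this is not the Intel script.

```python
from typing import Dict, Sequence


def affinity_mask(cpu: int) -> str:
    """Hex bitmask selecting a single CPU, in the form smp_affinity expects."""
    if cpu < 0:
        raise ValueError("cpu must be non-negative")
    return format(1 << cpu, "x")


def plan_affinity(irqs: Sequence[int], cpus: Sequence[int]) -> Dict[int, str]:
    """Assign each queue IRQ to a CPU round-robin and return the mask per IRQ."""
    return {irq: affinity_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}
```

For example, `plan_affinity([64, 65, 66, 67], [0, 1, 2, 3])` pins four queue interrupts to four cores, one each.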
Packet Filters
• Intel and others have packet filters in their NICs
• Small, only 128 entries in the Intel 82599
• Much more limited matchers than iptables
• src, dst, type, port, vlan
Hardware Flow Director
• Same mechanism as filters, just includes a mapping destination
• Set affinity to put both queue and app on same core
• Good for things like SSH and BGPD
• Maintain access in the face of an attack on other services
TCP Offload Engines
Full Offload
• Not so good on the public net
• Limited buffer resources on card
• Black box from a security perspective
• Can break features like filtering and QoS
Partial Offload
• Better, but with their own caveats
Large Receive Offload
• Collapses packets into a larger buffer before handing to OS
• Great for large volume ingress, which is not HTTP
• Doesn’t work with IPv6
Generic Receive Offload
• Similar to LRO, but more careful
• Will only merge “safe” packets into a buffer
• Will flush at timestamp tick
• Usually a win and you should test
TCP Segmentation
• OS fills a 64KB buffer and produces a single header template
• NIC splits the buffer into segments and checksums them before sending
• Can save lots of overhead, but not much of a win in small request/response cycles
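What the NIC does with that buffer can be mimicked in a few lines. A sketch only: the MSS of 1448 bytes is an assumption (1500 byte MTU with TCP timestamps), and `segment` is a hypothetical name.

```python
from typing import List


def segment(buffer: bytes, mss: int = 1448) -> List[bytes]:
    """Split one large send buffer into MSS-sized segments, as the NIC would."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [buffer[i:i + mss] for i in range(0, len(buffer), mss)]
```

A full 64KB buffer becomes a few dozen segments, each of which the host would otherwise build and checksum itself; a small response fits in one or two segments, so there is little left to offload.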
TX/RX Checksumming
• For small packets there is almost no win here
Bonding
• Linux bonding driver uses a single queue, large bottleneck for high
packet rates
• teaming driver should be better, but the userspace tools only worked in modern Fedora Core, so gave up
• Myricom hardware can do bonding natively
TCP Slow Start
Why Slow Start
• Early TCP implementations allowed the sender to immediately send as much data as the client window allowed
• In 1986 the internet suffered its first congestion collapse
• 1000x reduction in effective throughput
• 1988: Van Jacobson proposes Slow Start
Slow Start
• Goal is to avoid putting more data in flight than there is bandwidth
available
• Client sends a receive window size of how much data they can buffer
• Server sends an initial burst based on the server's initial congestion window
• Window grows by one segment for each received ACK, doubling every round trip
• Increases until a packet drop or the slow start threshold is reached
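A toy model shows why the initial window matters for small objects. It assumes no loss, a receive window that never limits the sender, an MSS of 1448 bytes, and a window that doubles each round trip up to a hypothetical threshold of 64 segments; `round_trips` is an illustrative name.

```python
import math


def round_trips(payload_bytes: int, initcwnd: int = 10,
                mss: int = 1448, ssthresh: int = 64) -> int:
    """Round trips needed to deliver a payload during slow start (loss-free model)."""
    segments_left = math.ceil(payload_bytes / mss)
    cwnd = initcwnd
    trips = 0
    while segments_left > 0:
        segments_left -= cwnd            # send one full window this round trip
        trips += 1
        cwnd = min(cwnd * 2, ssthresh)   # window doubles until the threshold
    return trips
```

In this model a 14KB response takes three round trips with an initial window of 3 segments and one round trip with an initial window of 10.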
Tuning Slow Start
• Increase your congestion window
• 2.6.39+ defaults to 10
• 2.6.19+ can be set as a path attribute
• ip route change default via 172.16.0.1 dev eth0 proto static initcwnd 10
Proportional Rate Reduction
• Added in Linux 3.2
• Prior to PRR a loss would halve the congestion window, potentially dropping it below the slow start threshold
• PRR paces retransmits to smooth out recovery
• Makes disabling net.ipv4.tcp_no_metrics_save safer
TCP Buffering
Throughput = Buffer Size / Latency
Buffer Tuning
• 16MB buffer at 50ms RTT = 320MB/s max rate
• Set as 3 values; min, default, and max
• net.ipv4.tcp_rmem = 4096 65536 16777216
• net.ipv4.tcp_wmem = 4096 65536 16777216
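The arithmetic behind these numbers is Throughput = Buffer Size / Latency, which can also be turned around to size a buffer for a target rate. A small sketch; the function names are illustrative.

```python
MIB = 1024 * 1024


def max_throughput(buffer_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on bytes per second for one connection with this buffer."""
    return buffer_bytes / rtt_seconds


def required_buffer(target_bytes_per_s: float, rtt_seconds: float) -> float:
    """Buffer size needed to sustain a target rate at a given round-trip time."""
    return target_bytes_per_s * rtt_seconds


if __name__ == "__main__":
    rate = max_throughput(16 * MIB, 0.050)
    print(f"16MB at 50ms RTT: {rate / MIB:.0f} MB/s")
```

The same 16MB buffer at 200ms RTT caps a single connection at a quarter of that rate, which is why long-distance paths need the larger maximum.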
TIME_WAIT
• State entered by whichever side closes the connection first, often the server
• Kept around in case our final ACK is lost and the peer resends its FIN, or delayed segments turn up
Busy servers collect a lot
• TIME_WAIT lasts 2 * MSL, which Linux fixes at 60s
• net.ipv4.tcp_fin_timeout covers FIN_WAIT_2 rather than TIME_WAIT, but is still worth dropping
• net.ipv4.tcp_fin_timeout = 10
Tuning TIME_WAIT
• net.ipv4.tcp_tw_reuse=1
• Reuse sockets in TIME_WAIT if safe to do so
• net.ipv4.tcp_max_tw_buckets
• Default of 131072, way too small for most sites
• Connections/s * timeout value
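The sizing rule is simple enough to check in code. A sketch with illustrative names; the 60 second timeout and the connection rates in the example are assumptions, so substitute your own measurements.

```python
import math


def time_wait_sockets(connections_per_second: float, timeout_seconds: float) -> int:
    """Steady-state number of sockets sitting in TIME_WAIT."""
    return math.ceil(connections_per_second * timeout_seconds)


def buckets_sufficient(connections_per_second: float, timeout_seconds: float,
                       max_tw_buckets: int = 131072) -> bool:
    """True if net.ipv4.tcp_max_tw_buckets covers the expected TIME_WAIT count."""
    return time_wait_sockets(connections_per_second, timeout_seconds) <= max_tw_buckets
```

At 5000 connections per second and a 60 second timeout the estimate is 300000 sockets, well past the 131072 default.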
net.ipv4.tcp_tw_recycle
• EVIL!
net.ipv4.tcp_tw_recycle
• Allows the reuse of a TIME_WAIT socket if the client’s timestamp increases
• Silently drops SYNs from the client if it doesn’t, as can happen behind NAT
• Still useful for high churn servers when you can make assumptions about the local network
SSL and Keepalives
• Enable them if at all possible
• The initial handshake is the most computationally intensive part
• 1-10ms on modern hardware
• 6 * RTT before user gets data really sucks over long distances
• SV to LON = 160ms = 1s before user starts getting content
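The latency cost is easy to estimate. A sketch using the 6 * RTT figure from this slide; the function name, the optional CPU term, and the single round trip assumed for a request on an already open keepalive connection are assumptions.

```python
def time_to_first_byte(rtt_seconds: float, round_trips: int = 6,
                       handshake_cpu_seconds: float = 0.0) -> float:
    """Rough delay before the user starts receiving content."""
    return round_trips * rtt_seconds + handshake_cpu_seconds


if __name__ == "__main__":
    rtt = 0.160  # SV to LON
    print(f"new SSL connection: {time_to_first_byte(rtt):.2f}s")
    # Assumption: a request on a kept-alive connection costs one round trip.
    print(f"keepalive reuse:    {time_to_first_byte(rtt, round_trips=1):.2f}s")
```

The handshake CPU time of 1-10ms barely registers next to the round trips, which is why keepalives and shorter distances matter more than faster hardware here.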
SSL and Geo
• If you can move SSL termination closer to your users, do so
• Most CDNs offer this, even for non-cacheable things
• EC2 and Route53 are another option
In closing
• Upgrade your kernel
• Increase your initial congestion window
• Check your backlog and time wait limits
• Size your buffers to something reasonable
• Get closer to your users if you can
Thanks!
Jason Cook
@macros
jason@fastly.com
Velocity 2013 TCP Tuning for the Web
