Evolution of kube-proxy (Brussels, FOSDEM 2020)
Laurent Bernaille
@lbernail
Datadog
Monitoring service
Over 350 integrations
Over 1,200 employees
Over 8,000 customers
Runs on millions of hosts
Trillions of data points per day
10,000s of hosts in our infra
Tens of Kubernetes clusters
Clusters from 50 to 3000 nodes
Multi-cloud
Very fast growth
What is kube-proxy?
● Network proxy running on each Kubernetes node
● Implements the Kubernetes service abstraction
○ Map service Cluster IPs (virtual IPs) to pods
○ Enable ingress traffic
Kube-proxy
Deployment and service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echodeploy
  labels:
    app: echo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
      - name: echopod
        image: lbernail/echo:0.5

apiVersion: v1
kind: Service
metadata:
  name: echo
  labels:
    app: echo
spec:
  type: ClusterIP
  selector:
    app: echo
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 5000
Created resources
Deployment ReplicaSet Pod 1
label: app=echo
Pod 2
label: app=echo
Pod 3
label: app=echo
Service
Selector: app=echo
kubectl get all
NAME AGE
deploy/echodeploy 16s
NAME AGE
rs/echodeploy-75dddcf5f6 16s
NAME READY
po/echodeploy-75dddcf5f6-jtjts 1/1
po/echodeploy-75dddcf5f6-r7nmk 1/1
po/echodeploy-75dddcf5f6-zvqhv 1/1
NAME TYPE CLUSTER-IP
svc/echo ClusterIP 10.200.246.139
The endpoint object
Deployment ReplicaSet Pod 1
label: app=echo
Pod 2
label: app=echo
Pod 3
label: app=echo
kubectl describe endpoints echo
Name: echo
Namespace: datadog
Labels: app=echo
Annotations: <none>
Subsets:
Addresses: 10.150.4.10,10.150.6.16,10.150.7.10
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
http 5000 TCP
Endpoints
Addresses:
10.150.4.10
10.150.6.16
10.150.7.10
Service
Selector: app=echo
Readiness
readinessProbe:
  httpGet:
    path: /ready
    port: 5000
  periodSeconds: 2
  successThreshold: 2
  failureThreshold: 2
● A pod can be started but not yet ready to serve requests
○ Initialization
○ Connection to backends
● Kubernetes provides an abstraction for this: Readiness Probes
Accessing the service
kubectl run -it test --image appropriate/curl ash
# while true ; do curl 10.200.246.139 ; sleep 1 ; done
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Endpoints only consist of pods that are Ready
How does it work?
Pod statuses
API Server
Node
kubelet pod
HC
Status updates
Node
kubelet pod
HC
ETCD
pods
Endpoints
API Server
Node
kubelet pod
HC
Status updates
Controller Manager
Watch
- pods
- services
endpoint
controller
Node
kubelet pod
HC
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
Accessing services
API Server
Node A
Controller Manager
Watch
- pods
- services
endpoint
controller
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
kube-proxy proxier
Watch
- services
- endpoints
Accessing services
API Server
Node A
Controller Manager
endpoint
controller
ETCD
pods
services
endpoints
kube-proxy proxier
client Node Bpod 1
Node Cpod 2
Implementations
userspace
proxy-mode=userspace
API Server
Node A
kube-proxy
iptables client
Node B
Node C
pod 1
pod 2
kube-proxy binds 1 port per service
Client traffic redirected to kube-proxy
kube-proxy connects to pods
proxy-mode=userspace
PREROUTING
any / any =>
KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
proxy-mode=userspace
PREROUTING
any / any =>
KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
KUBE-PORTALS-CONTAINER
any / VIP:PORT => NodeIP:PortX
Portal chain
One rule per service
One port allocated for each service (and bound by kube-proxy)
● Performance (userland proxying)
● Source IP cannot be preserved
● Not recommended anymore
● Since Kubernetes 1.2, default is iptables
Limitations
iptables
proxy-mode=iptables
API Server
Node A
kube-proxy iptables
client
Node B
Node C
pod 1
pod 2
Outgoing traffic
1. Client to Service IP
2. DNAT: Client to Pod1 IP
Reverse path
1. Pod1 IP to Client
2. Reverse NAT: Service IP to client
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
All traffic is processed by kube chains
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
Global Service chain
Identify service and jump to appropriate service chain
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
Service chain (one per service)
Use statistic iptables module (probability of rule being applied)
Rules are evaluated sequentially (hence the 33%, 50%, 100%)
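The sequential probabilities follow from conditioning: rule i must match 1/(n-i) of the traffic that reaches it so that each endpoint ends up with exactly 1/n overall. A quick sketch (plain Python, not from the talk) verifying that 33% / 50% / 100% yields a uniform split across three endpoints:

```python
def rule_probabilities(n):
    # Probability used by rule i (0-indexed) on the traffic that reaches it
    return [1 / (n - i) for i in range(n)]

def overall_split(n):
    """Unconditional probability of each endpoint being chosen."""
    remaining = 1.0
    split = []
    for p in rule_probabilities(n):
        split.append(remaining * p)   # traffic that reaches rule i and matches
        remaining *= (1 - p)          # traffic that falls through to the next rule
    return split

print(rule_probabilities(3))  # the 33% / 50% / 100% rules from the slide
print(overall_split(3))       # each endpoint gets ~1/3 of the traffic
```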
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
Endpoint Chain
Mark hairpin traffic (client = target) for SNAT
DNAT to the endpoint
Hairpin traffic?
API Server
Node A
kube-proxy iptables
pod 1
Node B
Node C
pod 2
pod 3
Client can also be a destination
After DNAT:
Src IP= Pod1, Dst IP= Pod1
No reverse NAT possible
=> SNAT on host for this traffic
1. Pod1 IP => SVC IP
2. SNAT: HostIP => SVC IP
3. DNAT: HostIP => Pod1 IP
Reverse path
1. Pod1 IP => Host IP
2. Reverse NAT: SVC IP => Pod1IP
Persistency
spec:
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent : set rsource KUBE-SEP-AAA
Use “recent” module
Add Source IP to set named KUBE-SEP-AAA
Persistency
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent : set rsource KUBE-SEP-AAA
Use “recent” module
Add Source IP to set named KUBE-SEP-AAA
KUBE-SVC-XXX
any / any recent: rcheck KUBE-SEP-AAA => KUBE-SEP-AAA
any / any recent: rcheck KUBE-SEP-BBB => KUBE-SEP-BBB
any / any recent: rcheck KUBE-SEP-CCC => KUBE-SEP-CCC
Load-balancing rules
Use recent module
If Source IP is in set named KUBE-SEP-AAA,
jump to KUBE-SEP-AAA
● iptables was not designed for load-balancing
● hard to debug (rule count very quickly reaches the 10,000s)
● performance impact
○ control plane: syncing rules
○ data plane: going through rules
Limitations
“Scale Kubernetes to Support 50,000 Services” Haibin Xie & Quinton Hoole
Kubecon Berlin 2017
ipvs
● L4 load-balancer built into the Linux kernel
● Many load-balancing algorithms
● Native persistency
● Fast atomic update (add/remove endpoint)
● Still relies on iptables for some use cases (SNAT)
● GA since 1.11
proxy-mode=ipvs
proxy-mode=ipvs
Pod X
Pod Y
Pod Z
Service ClusterIP:Port S
Backed by pod:port X, Y, Z
Virtual Server S
Realservers
- X
- Y
- Z
kube-proxy apiserver
Watches endpoints
Updates IPVS
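kube-proxy programs IPVS through netlink, but the resulting state can be sketched with ipvsadm (illustrative commands, not from the talk, reusing the ClusterIP and pod IPs from the earlier example):

```shell
# Create the virtual server for the ClusterIP (round-robin scheduler)
ipvsadm -A -t 10.200.246.139:80 -s rr
# Add one realserver per ready endpoint, in masquerade (NAT) mode
ipvsadm -a -t 10.200.246.139:80 -r 10.150.4.10:5000 -m
ipvsadm -a -t 10.200.246.139:80 -r 10.150.6.16:5000 -m
ipvsadm -a -t 10.200.246.139:80 -r 10.150.7.10:5000 -m
# Inspect the virtual server and its realservers
ipvsadm -Ln -t 10.200.246.139:80
```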
New connection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- X
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
App establishes connection to S
IPVS associates Realserver X
Pod X deleted
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Apiserver removes X from S endpoints
Kube-proxy removes X from realservers
Kernel drops traffic (no realserver)
Clean-up
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
net/ipv4/vs/expire_nodest_conn
Delete conntrack entry on next packet
Forcefully terminate (RST)
RST
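The drop-on-missing-realserver behavior above is controlled by the sysctl named on the slide, which kube-proxy enables in IPVS mode:

```shell
# Expire the IPVS conntrack entry on the next packet when the realserver
# is gone, instead of silently blackholing the connection
sysctl -w net.ipv4.vs.expire_nodest_conn=1
```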
Limit?
● No graceful termination
● As soon as a pod is Terminating, its connections are destroyed
● Addressing this took time
Graceful termination (1.12)
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- X Weight:0
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Apiserver removes X from S endpoints
Kube-proxy sets Weight to 0
No new connection
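Setting the weight to 0 can be sketched with ipvsadm (illustrative; kube-proxy does the equivalent via netlink):

```shell
# Weight 0: existing connections keep flowing to the realserver,
# but the scheduler assigns it no new connections
ipvsadm -e -t 10.200.246.139:80 -r 10.150.4.10:5000 -m -w 0
```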
Garbage collection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers
- Y
- Z
kube-proxy apiserver
IPVS conntrack
App A => S >> A => X
Pod exits and sends FIN
Kube-proxy removes realserver when it
has no active connection
FIN
What if X doesn’t send FIN?
Pod Y
Pod Z
Virtual Server S
Realservers
- X Weight:0
- Y
- Z
kube-proxy
IPVS conntrack
App A => S >> A => X
Conntrack entries expire after 900s
If A sends traffic, the entry never expires
Traffic is blackholed until the app detects the issue (~15 min)
> Mitigation: lower tcp_retries2
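tcp_retries2 bounds how long an established connection keeps retransmitting before giving up; the default of 15 retries is what produces the ~15 minutes above. A hedged sketch of the mitigation:

```shell
# Default is 15 retries (~15 min of retransmissions); lowering it makes
# clients detect a blackholed connection much faster (on the order of 100s)
sysctl -w net.ipv4.tcp_retries2=8
```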
● Default timeouts
○ tcp / tcpfin : 900s / 120s
○ udp: 300s
● Under high load, ports can be reused quickly and keep the entry active
○ Especially with conn_reuse_mode=0
● Very bad for DNS => No graceful termination for UDP
IPVS connection tracking
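The default timeouts above can be inspected and changed with ipvsadm (illustrative; shown with the default values):

```shell
# Show current IPVS timeouts (tcp / tcpfin / udp)
ipvsadm -L --timeout
# Set them explicitly: 900s tcp, 120s tcpfin, 300s udp
ipvsadm --set 900 120 300
```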
● Very recent feature: configurable IPVS timeouts
● Ideally we’d want:
○ Terminating pod: set weight to 0
○ Deleted pod: remove backend
> This would require an API change for Endpoints
IPVS connection tracking
● Works pretty well at large scale
● Not 100% feature parity (a few edge cases)
● Tuning the IPVS parameters took time
● Improvements under discussion
IPVS status
Common challenges
Control plane scalability
● On each update to a service (new pod, readiness change)
○ The full endpoint object is recomputed
○ The endpoint object is sent to every kube-proxy
● Example of a single endpoint change
○ Service with 2000 endpoints, ~100B / endpoint, 5000 nodes
○ Each node gets 2000 x 100B = 200kB
○ Apiservers send 5000 x 200kB = 1GB
● Rolling update?
○ Total traffic => 2000 x 1GB = 2TB
=> Endpoint Slices
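The arithmetic on this slide, as a quick sanity check (plain Python, sizes taken from the slide):

```python
# Endpoint update fan-out for one large service (numbers from the slide)
endpoints = 2000          # pods behind the service
endpoint_size = 100       # ~bytes per endpoint in the Endpoints object
nodes = 5000              # kube-proxies watching the apiserver

per_node = endpoints * endpoint_size      # full object resent on each change
per_change = nodes * per_node             # apiserver egress per single change
rolling_update = endpoints * per_change   # every pod replaced once

print(per_node)        # bytes per node  (200 kB)
print(per_change)      # bytes per change (1 GB)
print(rolling_update)  # bytes per rolling update (2 TB)
```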
● Service potentially mapped to multiple EndpointSlices
● Maximum 100 endpoints in each slice
● Much more efficient for services with many endpoints
● Beta in 1.17
Conntrack size
● kube-proxy relies on the conntrack
○ Not great for services receiving a lot of connections
○ Pretty bad for the cluster DNS
● DNS strategy
○ Use node-local cache and force TCP for upstream
○ Much more manageable
Conntrack and UDP
● Very good surprise with Kernel 5.0+
Recent features
New features
● Dual-stack support
○ Alpha in 1.16
● Topology aware routing
○ Use topology keys to route to services
○ Alpha in 1.17
Conclusion
Conclusion
● kube-proxy works at significant scale
● A lot of ongoing efforts to improve kube-proxy
● iptables and IPVS are not perfect matches for the service abstraction
○ Not designed for client-side load-balancing
● Very promising eBPF alternative: Cilium
○ No need for iptables / IPVS
○ Works without conntrack
○ See Daniel Borkmann’s talk
Thank you

 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Evolution of kube-proxy (Brussels, FOSDEM 2020)

○ Initialization
○ Connection to backends
● Kubernetes provides an abstraction for this: Readiness Probes
Accessing the service
kubectl run -it test --image appropriate/curl ash
# while true ; do curl 10.200.246.139 ; sleep 1 ; done
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.7.10 | Source: 10.150.6.17 | Version: v2
Container: 10.150.6.16 | Source: 10.150.6.17 | Version: v2
Container: 10.150.4.10 | Source: 10.150.6.17 | Version: v2
Endpoints only include pods that are Ready
How does it work?
Pod statuses
API Server
Node
kubelet
pod HC
Status updates
Node
kubelet
pod HC
ETCD
pods
Endpoints
API Server
Node
kubelet
pod HC
Status updates
Controller Manager
Watch:
- pods
- services
endpoint controller
Node
kubelet
pod HC
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
Accessing services
API Server
Node A
Controller Manager
Watch:
- pods
- services
endpoint controller
Sync endpoints:
- list pods matching selector
- add IP to endpoints
ETCD
pods
services
endpoints
kube-proxy
proxier
Watch:
- services
- endpoints
Accessing services
API Server
Node A
Controller Manager
endpoint controller
ETCD
pods
services
endpoints
kube-proxy
proxier
client
Node B: pod 1
Node C: pod 2
proxy-mode=userspace
API Server
Node A
kube-proxy
iptables
client
Node B
Node C
pod 1
pod 2
kube-proxy binds 1 port per service
Client traffic is redirected to kube-proxy
kube-proxy connects to pods
proxy-mode=userspace
PREROUTING
any / any => KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
proxy-mode=userspace
PREROUTING
any / any => KUBE-PORTALS-CONTAINER
All traffic is processed by kube chains
KUBE-PORTALS-CONTAINER
any / VIP:PORT => NodeIP:PortX
Portal chain
One rule per service
One port allocated for each service (and bound by kube-proxy)
Limitations
● Performance (userland proxying)
● Source IP cannot be kept
● Not recommended anymore
● Since Kubernetes 1.2, default is iptables
proxy-mode=iptables
API Server
Node A
kube-proxy
iptables
client
Node B
Node C
pod 1
pod 2
Outgoing traffic
1. Client to Service IP
2. DNAT: Client to Pod1 IP
Reverse path
1. Pod1 IP to Client
2. Reverse NAT: Service IP to client
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
All traffic is processed by kube chains
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
Global Service chain
Identify the service and jump to the appropriate service chain
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
Service chain (one per service)
Uses the statistic iptables module (probability of a rule being applied)
Rules are evaluated sequentially (hence the 33%, 50%, 100%)
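A quick numeric sketch (not from the talk) of why the sequential probabilities 33% / 50% / 100% produce a uniform split: each rule only sees the traffic that earlier rules did not match.

```python
# Each of n rules is evaluated in order; rule i matches with
# probability 1/(n-i) among the traffic that reaches it, so every
# endpoint ends up with an equal share of the total.
def effective_split(n):
    """Overall probability of each of n endpoints being selected."""
    remaining = 1.0   # fraction of traffic reaching the current rule
    split = []
    for i in range(n):
        p = 1.0 / (n - i)            # rule probability (last rule: 1.0)
        split.append(remaining * p)  # traffic captured by this endpoint
        remaining *= 1.0 - p         # traffic falling through
    return split

print(effective_split(3))  # each endpoint gets ~1/3
```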
proxy-mode=iptables
PREROUTING / OUTPUT
any / any => KUBE-SERVICES
KUBE-SERVICES
any / VIP:PORT => KUBE-SVC-XXX
KUBE-SVC-XXX
any / any proba 33% => KUBE-SEP-AAA
any / any proba 50% => KUBE-SEP-BBB
any / any => KUBE-SEP-CCC
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
Endpoint chain
Mark hairpin traffic (client = target) for SNAT
DNAT to the endpoint
Hairpin traffic?
API Server
Node A
kube-proxy
iptables
pod 1
Node B
Node C
pod 2
pod 3
Client can also be a destination
After DNAT: Src IP = Pod1, Dst IP = Pod1
No reverse NAT possible => SNAT on host for this traffic
1. Pod1 IP => SVC IP
2. SNAT: Host IP => SVC IP
3. DNAT: Host IP => Pod1 IP
Reverse path
1. Pod1 IP => Host IP
2. Reverse NAT: SVC IP => Pod1 IP
Persistency
spec:
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 600
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent: set rsource KUBE-SEP-AAA
Uses the "recent" module
Adds the source IP to the set named KUBE-SEP-AAA
Persistency
KUBE-SEP-AAA
endpoint IP / any => KUBE-MARK-MASQ
any / any => DNAT endpoint IP:Port
recent: set rsource KUBE-SEP-AAA
Uses the "recent" module
Adds the source IP to the set named KUBE-SEP-AAA
KUBE-SVC-XXX
any / any recent: rcheck KUBE-SEP-AAA => KUBE-SEP-AAA
any / any recent: rcheck KUBE-SEP-BBB => KUBE-SEP-BBB
any / any recent: rcheck KUBE-SEP-CCC => KUBE-SEP-CCC
Load-balancing rules
Use the recent module
If the source IP is in the set named KUBE-SEP-AAA, jump to KUBE-SEP-AAA
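The pseudo-rules above can be sketched as actual iptables commands. This is a simplified illustration, not the exact rules kube-proxy emits (real rules also carry comments and masquerade marks); the endpoint IP is taken from the earlier example.

```shell
# Endpoint chain: remember the client source IP in the "recent" set
# named after the endpoint, then DNAT to the pod.
iptables -t nat -A KUBE-SEP-AAA -p tcp \
  -m recent --name KUBE-SEP-AAA --set --rsource \
  -j DNAT --to-destination 10.150.4.10:5000

# Service chain: if the client was seen within the affinity timeout,
# jump back to the same endpoint chain.
iptables -t nat -A KUBE-SVC-XXX -p tcp \
  -m recent --name KUBE-SEP-AAA --rcheck --seconds 600 --reap \
  -j KUBE-SEP-AAA
```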
Limitations
● iptables was not designed for load-balancing
● Hard to debug (very quickly, 10,000s of rules)
● Performance impact
○ Control plane: syncing rules
○ Data plane: going through rules
"Scale Kubernetes to Support 50,000 Services"
Haibin Xie & Quinton Hoole, KubeCon Berlin 2017
ipvs
proxy-mode=ipvs
● L4 load-balancer built into the Linux kernel
● Many load-balancing algorithms
● Native persistency
● Fast atomic updates (add/remove endpoint)
● Still relies on iptables for some use cases (SNAT)
● GA since 1.11
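kube-proxy programs IPVS through netlink, but the resulting state can be sketched with ipvsadm; the VIP and pod IPs below reuse the echo-service example from earlier slides.

```shell
# Virtual server for the service ClusterIP, round-robin scheduling
ipvsadm -A -t 10.200.246.139:80 -s rr

# One realserver per ready endpoint, NAT (masquerading) mode
ipvsadm -a -t 10.200.246.139:80 -r 10.150.4.10:5000 -m
ipvsadm -a -t 10.200.246.139:80 -r 10.150.6.16:5000 -m
ipvsadm -a -t 10.200.246.139:80 -r 10.150.7.10:5000 -m

# Inspect the table (numeric output)
ipvsadm -L -n
```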
proxy-mode=ipvs
Pod X
Pod Y
Pod Z
Service
ClusterIP:Port S
Backed by pod:port X, Y, Z
Virtual Server S
Realservers:
- X
- Y
- Z
kube-proxy
apiserver
Watches endpoints
Updates IPVS
New connection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers:
- X
- Y
- Z
kube-proxy
apiserver
IPVS conntrack
App
A => S >> A => X
App establishes a connection to S
IPVS associates Realserver X
Pod X deleted
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers:
- Y
- Z
kube-proxy
apiserver
IPVS conntrack
App
A => S >> A => X
Apiserver removes X from S endpoints
kube-proxy removes X from realservers
Kernel drops traffic (no realserver)
Clean-up
Pod Y
Pod Z
Virtual Server S
Realservers:
- Y
- Z
kube-proxy
apiserver
IPVS conntrack
App
A => S >> A => X
net/ipv4/vs/expire_nodest_conn
Delete conntrack entry on next packet
Forcefully terminate (RST)
RST
Limits?
● No graceful termination
● As soon as a pod is Terminating, connections are destroyed
● Addressing this took time
Graceful termination (1.12)
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers:
- X Weight: 0
- Y
- Z
kube-proxy
apiserver
IPVS conntrack
App
A => S >> A => X
Apiserver removes X from S endpoints
kube-proxy sets Weight to 0
No new connections
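Sketched with ipvsadm (kube-proxy does the equivalent via netlink), graceful termination is a weight change followed by a later removal; the addresses reuse the earlier example.

```shell
# Weight 0: the scheduler stops picking this realserver for new
# connections, but established connections keep flowing.
ipvsadm -e -t 10.200.246.139:80 -r 10.150.4.10:5000 -m -w 0

# Once the realserver has no remaining active connections,
# kube-proxy deletes it from the virtual server.
ipvsadm -d -t 10.200.246.139:80 -r 10.150.4.10:5000
```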
Garbage collection
Pod X
Pod Y
Pod Z
Virtual Server S
Realservers:
- Y
- Z
kube-proxy
apiserver
IPVS conntrack
App
A => S >> A => X
Pod exits and sends FIN
kube-proxy removes the realserver when it has no active connections
FIN
What if X doesn't send FIN?
Pod Y
Pod Z
Virtual Server S
Realservers:
- X Weight: 0
- Y
- Z
kube-proxy
IPVS conntrack
App
A => S >> A => X
Conntrack entries expire after 900s
If A keeps sending traffic, the entry never expires
Traffic is blackholed until the app detects the issue (~15 min)
> Mitigation: lower tcp_retries2
IPVS connection tracking
● Default timeouts
○ tcp / tcpfin: 900s / 120s
○ udp: 300s
● Under high load, ports can be reused fast and keep an entry active
○ Especially with conn_reuse_mode=0
● Very bad for DNS
=> No graceful termination for UDP
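The tunables mentioned above, written out as node-level settings. The sysctl and ipvsadm names are real; the values shown are the defaults discussed in the slides plus an illustrative tcp_retries2, not recommendations from the talk.

```shell
# IPVS conntrack timeouts: tcp, tcpfin, udp (seconds)
ipvsadm --set 900 120 300

# Don't reuse an existing conntrack entry for a new connection
sysctl -w net.ipv4.vs.conn_reuse_mode=1

# Expire entries whose realserver has been removed (next packet gets RST)
sysctl -w net.ipv4.vs.expire_nodest_conn=1

# Fail blackholed TCP connections faster (kernel default is 15)
sysctl -w net.ipv4.tcp_retries2=8
```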
IPVS connection tracking
● Very recent feature: configurable timeouts
● Ideally we'd want:
○ Terminating pod: set weight to 0
○ Deleted pod: remove backend
> This would require an API change for Endpoints
IPVS status
● Works pretty well at large scale
● Not 100% feature parity (a few edge cases)
● Tuning the IPVS parameters took time
● Improvements under discussion
Control plane scalability
● On each update to a service (new pod, readiness change)
○ The full Endpoints object is recomputed
○ The Endpoints object is sent to all kube-proxies
● Example of a single endpoint change
○ Service with 2000 endpoints, ~100B / endpoint, 5000 nodes
○ Each node gets 2000 x 100B = 200kB
○ Apiservers send 5000 x 200kB = 1GB
● Rolling update?
○ Total traffic => 2000 x 1GB = 2TB
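A back-of-the-envelope check of the numbers above (the per-endpoint size is the rough estimate from the slide):

```python
# Fan-out cost of Endpoints updates before EndpointSlices existed.
endpoints = 2000        # pods behind the service
endpoint_size = 100     # bytes per endpoint entry (rough estimate)
nodes = 5000            # kube-proxies watching the object

per_node = endpoints * endpoint_size      # full object sent to one node
per_update = nodes * per_node             # one endpoint change, fanned out
rolling_update = endpoints * per_update   # every pod replaced once

print(per_node)         # -> 200000 bytes (~200 kB)
print(per_update)       # -> 1000000000 bytes (~1 GB)
print(rolling_update)   # -> 2000000000000 bytes (~2 TB)
```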
=> Endpoint Slices
● A service is potentially mapped to multiple EndpointSlices
● Maximum 100 endpoints in each slice
● Much more efficient for services with many endpoints
● Beta in 1.17
Conntrack size
● kube-proxy relies on conntrack
○ Not great for services receiving a lot of connections
○ Pretty bad for the cluster DNS
● DNS strategy
○ Use a node-local cache and force TCP for upstream queries
○ Much more manageable
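Two mitigations sketched as node/pod settings. The sysctl and resolv.conf option names are real, but the value is illustrative and these are not the exact settings from the talk:

```shell
# Grow the conntrack table on busy nodes (value is illustrative)
sysctl -w net.netfilter.nf_conntrack_max=1048576

# Forcing TCP for DNS lookups can be done per pod via dnsConfig,
# which maps to the resolv.conf "use-vc" option:
#   dnsConfig:
#     options:
#       - name: use-vc
```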
Conntrack and UDP
● Very good surprise with kernel 5.0+
New features
● Dual-stack support
○ Alpha in 1.16
● Topology-aware routing
○ Use topology keys to route to services
○ Alpha in 1.17
Conclusion
● kube-proxy works at significant scale
● A lot of ongoing effort to improve kube-proxy
● iptables and IPVS are not perfect matches for the service abstraction
○ Not designed for client-side load-balancing
● Very promising eBPF alternative: Cilium
○ No need for iptables / IPVS
○ Works without conntrack
○ See Daniel Borkmann's talk