
DCUS17: Docker networking deep dive

This presentation covers Docker Networking from different perspectives and dissects the stack using various network troubleshooting tools.


  1. 1. Docker Networking deep dive: Application-Plane to Data-Plane
     Madhu Venugopal, Sr. Director, Networking, Docker Inc.
  2. 2. Network Layers, Planes and Dimensions
  3. 3. Application dimension
     "OSI is a beautiful dream, and TCP/IP is living it!" - Einar Stefferud
     OSI model: Application, Presentation, Session, Transport, Network, Data Link, Physical
     TCP/IP model: Application (HTTP, DNS, SSH, DHCP, …), Transport (TCP, UDP), Network (IPv4, IPv6, ARP), Data Link (Ethernet)
  4. 4. Infrastructure dimension
     - Management plane: user/operator/tools managing the network infrastructure (UX, CLI, REST-API, SNMP, …)
     - Control plane: signaling between network entities to exchange reachability states; distributed (OSPF, BGP, gossip-based) or centralized (OpenFlow, OVSDB)
     - Data plane: actual movement of application data packets (IPTables, IPVS, OVS-DP, DPDK, BPF, routing tables, …)
  5. 5. Docker networking (spans the Application/Transport/Network/Data Link layers and the Mgmt/Control/Data planes)
     - Provides portable application services: service discovery, load balancing
     - Built-in and pluggable network drivers: overlay, macvlan, bridge, plus remote drivers/plugins
     - Built-in management plane: API, CLI, Docker Stack/Compose
     - Built-in distributed control plane: gossip-based
     - Encrypted control & data plane
  6. 6. Deep dive
  7. 7. Application Stack
     Stack file (d.yml):
       version: "3"
       services:
         web:
           ports:
             - "8080:80"
           networks:
             - frontend
           deploy:
             replicas: 2
         app:
           networks:
             - frontend
             - backend
         db:
           networks:
             - backend
       networks:
         frontend:
           driver: overlay
         backend:
           driver: overlay
           driver_opts:
             encrypted: true
     Stack deploy:
       $ docker stack deploy -c d.yml demo
       Creating network demo_frontend
       Creating network demo_backend
       Creating service demo_web
       Creating service demo_app
       Creating service demo_db
       $ docker network ls
       NETWORK ID     NAME            DRIVER    SCOPE
       n5myqlubepvl   demo_backend    overlay   swarm
       4m5e9hn5x0xx   demo_frontend   overlay   swarm
       $ docker service ls
       ID             NAME       MODE         REPLICAS
       69rwee5mbbzm   demo_web   replicated   2/2
       gkwx4z4ksrz1   demo_app   replicated   1/1
       4m5e9hn5x0xx   demo_db    replicated   1/1
  8. 8. Application Stack
     $ docker stack deploy -c d.yml demo
     Creating service demo_web
     Creating service demo_app
     Creating service demo_db
     Creating network demo_frontend
     Creating network demo_backend
     A day in the life of a stack deploy:
     - Manager-only operation
     - Network resources (subnet, vxlan-id) are reserved at the mgmt plane; no impact on the data plane yet
     - The manager reserves service and task resources: the service VIP and the task IPs
     - Tasks are scheduled to swarm workers
     - Network-scoped service registration on the Docker DNS server:
       service name -> VIP; task name -> task IP; tasks.<service-name> -> all task IPs
     - SD & LB states are exchanged via gossip
     - The data plane is prepared: driver APIs are called and driver states are exchanged via gossip
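A few commands make these allocations visible after the deploy. A minimal sketch, assuming the demo stack above is deployed and the commands run on a manager node:

```bash
# Where the scheduler placed each task of the service
docker service ps demo_web

# The VIP the allocator reserved for the service on each attached network
docker service inspect --format '{{json .Endpoint.VirtualIPs}}' demo_web

# The subnet and VXLAN ID reserved at the management plane
docker network inspect demo_frontend
```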
  9. 9. Resource Allocation
     On the manager, a network or service create flows through Orchestrator -> Allocator -> Scheduler (task create) -> Dispatcher (task dispatch) out to the workers' Engine/Libnetwork.
     - Centralized resource and policy definition
     - Networks are a definition of policy
     - Central resource allocation (IP subnets, addresses, VNIs)
     - Can mutate state as long as managers are available
  10. 10. De-centralized events
      Gossip runs at swarm scope and, separately, at network scope among the nodes (W1-W5) attached to each network.
      - Eventually consistent
      - State dissemination through de-centralized events: service registration, load-balancer configs, routing states
      - Fast convergence: ~O(log n)
      - Highly scalable
      - Continues to function even if all managers are down
  11. 11. State dissemination
      - On a state change, node A broadcasts it to 3 random nodes in the network scope (C, D, E)
      - Receivers rebroadcast: 9 more nodes receive it, and the rebroadcast reaches the entire cluster
      - A node accepts a state update only if the entry's Lamport time is greater than that of its existing entry
      - Periodic bulk sync to a random node in the network scope (F)
  12. 12. Service Discovery and Routing states (diagram)
      Worker1 runs task1.web (10.0.1.5) and task2.web (10.0.1.6); Worker2 runs task1.app (10.0.1.9 / 10.0.2.6); Worker3 runs task1.db (10.0.2.5). Every node runs a Docker DNS server, every task's DNS resolver points at 127.0.0.11, and states are exchanged between nodes via gossip.
      demo_frontend overlay network (vxlan-id 4097):
      - Service Discovery states: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9
      - Routing states on Worker1: 10.0.1.6 -> {Worker2, 4097}, 10.0.1.9 -> {Worker2, 4097}; on Worker2: 10.0.1.5 -> {Worker1, 4097}
      demo_backend overlay network (vxlan-id 4098):
      - Service Discovery states: db 10.0.2.4 (vip), app 10.0.2.8 (vip), task1.db 10.0.2.5, task1.app 10.0.2.6
      - Routing states on Worker2: 10.0.2.5 -> {Worker3, 4098}; on Worker3: 10.0.2.6 -> {Worker2, 4098}
  13. 13. Troubleshooting Control-Plane
      $ docker network inspect -v demo_frontend
      [
        {
          "Name": "demo_frontend",
          "Id": "m669nibgiwc0mfleq8geaa6mk",
          "Created": "2017-04-12T13:18:58.049831936Z",
          "Scope": "swarm",
          "Driver": "overlay",
          "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4096"
          },
      …
  14. 14. Troubleshooting Control-Plane
      …
      "Peers": [
        {
          "Name": "ip-172-31-28-108",
          "IP": "172.31.28.108"
        },
        {
          "Name": "ip-172-31-46-47",
          "IP": "172.31.46.47"
        }
      ]
  15. 15. Troubleshooting Control-Plane
      "Services": {
        "web": {
          "VIP": "10.1.0.6",
          "LocalLBIndex": 5,
          "Tasks": [
            {
              "Name": "web.1",
              "EndpointID": "1a5323d0e94c",
              "EndpointIP": "10.1.0.7",
              "Info": {
                "Host IP": "172.31.28.108"
              }
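Since `docker network inspect -v` emits plain JSON, the sections shown on slides 13-15 can be pulled out directly. A sketch, assuming jq is available on the host:

```bash
# Peers participating in this network's gossip
docker network inspect -v demo_frontend | jq '.[0].Peers'

# Service -> VIP -> task mappings known on this node
docker network inspect -v demo_frontend | jq '.[0].Services'
```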
  16. 16. Service Discovery
  17. 17. (Same diagram as slide 12: Service Discovery and Routing states for the demo_frontend and demo_backend overlay networks, exchanged between workers via gossip.)
  18. 18. Dissecting the DNS lookup
      task1.web has /etc/resolv.conf: nameserver 127.0.0.11
      - task1.web issues a DNS A record query for "app" to 127.0.0.11
      - An IPTables DNAT rule on {127.0.0.11, 53} redirects the query to the Docker DNS server inside the Docker daemon
      - The DNS server holds the network's records: web 10.0.1.4 (vip), app 10.0.1.8 (vip), task1.web 10.0.1.5, task2.web 10.0.1.6, task1.app 10.0.1.9, task2.app 10.0.1.10
  19. 19. Dissecting the DNS lookup (response)
      The Docker DNS server answers through the same {127.0.0.11, 53} DNAT path.
      DNS A record response: "app" -> 10.0.1.8 (the service VIP)
  20. 20. Dissecting the DNS-rr lookup
      docker service create --name=app --endpoint-mode=dns-rr demo/my-app
      With endpoint-mode dns-rr there is no VIP; the Docker DNS server stores all task IPs for "app" (task1.app 10.0.1.9, task2.app 10.0.1.10).
      DNS A record response: "app" -> [10.0.1.9, 10.0.1.10]
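These lookups are easy to reproduce from inside any container on the network. A sketch, assuming a shell in the task and standard DNS tools in the image; `tasks.<service>` is the built-in name that returns the individual task IPs:

```bash
# The task's resolver always points at the embedded DNS server
cat /etc/resolv.conf      # nameserver 127.0.0.11

nslookup app              # VIP mode: a single A record with the service VIP
nslookup tasks.app        # all task IPs behind the service
# With --endpoint-mode=dns-rr, "app" itself returns all task IPs

# The DNAT that redirects 127.0.0.11:53 to the DNS server lives in the
# container's own netns (requires iptables inside the image)
iptables -t nat -nL OUTPUT
```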
  21. 21. Dataplane
  22. 22. Drivers provide the data-plane
      $ docker info
      …
      Logging Driver: json-file
      Cgroup Driver: cgroupfs
      Plugins:
       Volume: local
       Network: bridge contiv/v2plugin:latest host macvlan null overlay
      Swarm: active
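Remote drivers such as the contiv entry above ship as managed plugins. A minimal sketch; the exact plugin options vary per driver and are omitted here:

```bash
# Install a remote network driver as a managed plugin
docker plugin install contiv/v2plugin:latest

# Confirm the driver is now listed next to the built-ins
docker info --format '{{.Plugins.Network}}'
```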
  23. 23. What is Docker Overlay Networking
      The overlay driver enables simple and secure multi-host networking. Containers A-F spread across three Docker hosts share one overlay network, and all containers on the overlay network can communicate!
  24. 24. Docker Overlay: VXLAN
      - The overlay driver uses VXLAN technology
      - A VXLAN tunnel is created on top of underlay network(s)
      - At each end of the tunnel is a VXLAN tunnel end point (VTEP)
      - The VTEP performs encapsulation and de-encapsulation
      - The VTEP exists in the Docker host's network namespace
  25. 25. Building an Overlay Network (more detailed)
      Two Docker hosts (172.31.1.5 and 192.168.1.25) are connected by a Layer 3 IP transport network. On each host, a dedicated network namespace holds a bridge Br0 and a VTEP listening on UDP 4789; containers (C1: 10.0.0.3, C2: 10.0.0.4) attach to Br0 via veth pairs, and the VXLAN tunnel runs between the two VTEPs.
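Because the VTEPs exchange traffic on UDP 4789, the encapsulation is directly observable on the underlay. A sketch using tcpdump (from the tools list on the next slide); eth0 stands in for whatever the host's underlay interface is:

```bash
# Watch VXLAN-encapsulated overlay traffic on the underlay interface
sudo tcpdump -nn -i eth0 udp port 4789
```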
  26. 26. The Ten Commandments
      1. docker network <commands>
      2. nsenter --net=<net-namespace>
      3. tcpdump -nnvvXXS -i <interface> port <port>
      4. iptables -nvL -t <table>
      5. ipvsadm -L
      6. ip <commands>
      7. bridge <commands>
      8. drill
      9. netstat -tulpn
      10. iperf <commands>
      All-in-one tools container: https://github.com/nicolaka/netshoot
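The netshoot image bundles all of these tools; per its README, it is typically attached straight to the network namespace under inspection. The container name below is illustrative:

```bash
# Share a running container's network namespace
docker run -it --net container:task1.web nicolaka/netshoot

# Or troubleshoot from the host's namespace
docker run -it --net host nicolaka/netshoot
```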
  27. 27. Overlay dataplane
      root@my-host $ docker network ls
      NETWORK ID     NAME              DRIVER    SCOPE
      jm1eohsff6b4   demo_default      overlay   swarm
      a5f124aef90b   docker_gwbridge   bridge    local
      root@my-host $ ls /var/run/docker/netns
      1-jm1eohsff6  1-o2hnj2jm1f  2229639766c2  79f0ad997956  ingress_sbox
      root@my-host $ nsenter --net=/var/run/docker/netns/1-jm1eohsff6
      root@my-host $ brctl show br0
      bridge name   bridge id           STP enabled   interfaces
      br0           8000.3a87525fe051   no            vxlan0
                                                      veth0
                                                      veth1
  28. 28. Overlay dataplane
      root@my-host $ ip -d link show br0
      2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
          link/ether 3a:87:52:5f:e0:51 brd ff:ff:ff:ff:ff:ff promiscuity 0
          bridge forward_delay 1500 hello_time 200 max_age 2000 addrgenmode eui64
      root@my-host $ ip -d link show veth0
      17: veth0@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP mode DEFAULT group default
          link/ether be:dc:c5:da:8c:0d brd ff:ff:ff:ff:ff:ff link-netnsid 2 promiscuity 1
          veth
          bridge_slave state forwarding priority 32 cost 2 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode eui64
  29. 29. Overlay dataplane
      root@my-host $ ip -d link show vxlan0
      14: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN mode DEFAULT group default
          link/ether f6:ae:70:27:6c:9c brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 1
          vxlan id 4097 srcport 0 0 dstport 4789 proxy l2miss l3miss ageing 300
          bridge_slave state forwarding priority 32 cost 100 hairpin off guard off root_block off fastleave off learning on flood on addrgenmode eui64
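The `vxlan id 4097` above should match the VNI the manager allocated for the network. A sketch to cross-check it, assuming the overlay is the demo_default network from slide 27:

```bash
# VNI recorded in the network's options (compare with `ip -d link show vxlan0`)
docker network inspect \
  --format '{{index .Options "com.docker.network.driver.overlay.vxlanid_list"}}' \
  demo_default
```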
  30. 30. Overlay dataplane
      root@my-host $ ip -s neighbor show
      10.0.0.6 dev vxlan0 lladdr 02:42:0a:00:00:06 used 1100/1100/1100 probes 0 PERMANENT
      10.0.0.3 dev vxlan0 lladdr 02:42:0a:00:00:03 used 1101/1101/1101 probes 0 PERMANENT
      root@my-host $ bridge fdb show
      …
      f6:ae:70:27:6c:9c dev vxlan0 vlan 1 master br0 permanent
      02:42:0a:00:00:03 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
      02:42:0a:00:00:06 dev vxlan0 dst 192.168.56.101 link-netnsid 0 self permanent
      be:dc:c5:da:8c:0d dev veth0 vlan 1 master br0 permanent
      3a:87:52:5f:e0:51 dev veth1 vlan 1 master br0 permanent
      …
  31. 31. Inside container netns
  32. 32. Inside container netns
      The tasks (task1.web and task2.web on Worker1, task1.app on Worker2, task1.db on Worker3) reach each other east-west over the demo_frontend and demo_backend overlay networks, and reach the L2/L3 underlay network (north-south connectivity) through the docker_gwbridge on each worker.
  33. 33. Inside container netns
      root@my-host $ docker inspect demo_app.1.d35s03a7xryoeta34lqys1v5j | grep Key
      "SandboxKey": "/var/run/docker/netns/2229639766c2",
      root@my-host $ ifconfig
      eth0  Link encap:Ethernet  HWaddr 02:42:0a:00:00:08
            inet addr:10.0.0.8  Bcast:0.0.0.0  Mask:255.255.255.0
            UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
            RX packets:2 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
      eth1  Link encap:Ethernet  HWaddr 02:42:ac:a8:01:42
            inet addr:172.168.1.66  Bcast:0.0.0.0  Mask:255.255.0.0
            UP BROADCAST MULTICAST  MTU:1500  Metric:1
            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
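The routing table inside the same netns shows how the two interfaces split the traffic. A sketch, reusing the SandboxKey found above; eth0 carries the overlay (east-west) subnet while eth1 defaults to docker_gwbridge for north-south traffic:

```bash
# Routes inside the task's network namespace
sudo nsenter --net=/var/run/docker/netns/2229639766c2 ip route

# Host side of eth1: the gateway bridge providing north-south connectivity
docker network inspect docker_gwbridge
```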
  34. 34. Load Balancing
  35. 35. Client-side VIP Load Balancing
      task1.web resolves app to its VIP 10.0.1.8; in the IPTables mangle table, the OUTPUT chain MARKs packets to 10.0.1.8 with lb-index 5; IPVS maps lb-index 5 round-robin (RR) to the task IPs 10.0.1.9 and 10.0.1.10; the conntracker keeps each flow pinned to the backend it was assigned.
  36. 36. Client-side Load Balancing
      root@my-host $ iptables -nvL -t mangle
      Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
      pkts bytes target  prot opt in  out  source     destination
      0    0     MARK    all  --  *   *    0.0.0.0/0  10.0.0.7     MARK set 0x101
      0    0     MARK    all  --  *   *    0.0.0.0/0  10.0.0.4     MARK set 0x100
      root@my-host $ ipvsadm -L
      Prot LocalAddress:Port Scheduler Flags
        -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
      FWM  256 rr
        -> 10.0.0.5:0           Masq    1      0          0
        -> 10.0.0.6:0           Masq    1      0          0
      FWM  257 rr
        -> 10.0.0.3:0           Masq    1      0          0
      root@my-host $ conntrack -L
      tcp 6 431997 ESTABLISHED src=10.0.0.8 dst=10.0.0.4 sport=33635 dport=80 src=10.0.0.5 dst=10.0.0.8 sport=80 dport=33635 [ASSURED] mark=0 use=1
  37. 37. Client-side DNS-rr Load Balancing
      docker service create --name=app --endpoint-mode=dns-rr demo/my-app
      task1.web (/etc/resolv.conf: nameserver 127.0.0.11) queries the Docker DNS server in the daemon, which holds all task IPs for app (task1.app 10.0.1.9, task2.app 10.0.1.10) and answers with the full list in the DNS A record response: "app" -> [10.0.1.9, 10.0.1.10]; the client then balances across app : [10.0.1.9, 10.0.1.10].
  38. 38. Routing Mesh
      - Native load balancing of requests coming from an external source
      - Services get published on a single port across the entire Swarm
      - Incoming traffic to the published port can be handled by all Swarm nodes
      - Traffic is internally load balanced as per normal service VIP load balancing
      Ingress network: after docker service create -p 8080:80 nginx, port 8080 is open on every Docker host, and IPVS on each host forwards traffic over the ingress network to task1.myservice and task2.myservice.
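The mesh is easy to demonstrate: once the service is created, the published port answers on every node, whether or not that node runs a task. A sketch; the service name and host name are illustrative:

```bash
docker service create --name myservice -p 8080:80 nginx

# Any swarm node accepts the connection; IPVS forwards it to a task
curl http://docker-host-3:8080
```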
  39. 39. Linux Kernel NetFilter dataflow
  40. 40. Routing Mesh
      Packet path through the Linux kernel NetFilter stack:
      1. On the receiving host, the iptables NAT table's DOCKER-INGRESS chain DNATs traffic arriving on eth0 for the published port into the ingress-sbox (attached via eth1 to docker_gwbridge).
      2. Inside the ingress-sbox, the iptables MANGLE table's PREROUTING chain MARKs the published port with a <fw-mark-id>.
      3. IPVS matches the <fw-mark-id> and masquerades, round-robin across the container IPs, sending traffic out through the ingress-overlay-bridge and the ingress network's VXLAN tunnel (with its VNI) to the target host.
      4. In the target container-sbox, the iptables NAT table's PREROUTING chain redirects to the target port.
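Each stage of this path can be inspected with the tools from slide 26. A sketch, run as root on any node that publishes the port; ingress_sbox is the hidden sandbox that appeared in the netns listing on slide 27:

```bash
# 1. DNAT from the published port into the ingress sandbox
sudo iptables -nvL -t nat | grep -A3 DOCKER-INGRESS

# 2-3. fw-mark and IPVS round-robin state inside ingress_sbox
sudo nsenter --net=/var/run/docker/netns/ingress_sbox \
  sh -c 'iptables -nvL -t mangle; ipvsadm -L'
```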
  41. 41. Homework: deep-dive into Routing Mesh
      Questions? Tweet: @MadhuVenugopal; Slack: madhu in the #dockercommunity org
  42. 42. Thank You.
      106270 - Deep Dive in Docker Overlay Networks (Apr 19, 3:45 PM)
      110420 - Docker Networking in Production at Visa (Apr 19, 2:25 PM)
      @docker #dockercon
