Linux Container Primitives and Runtimes - AWS Summit Sydney

In this session we'll explore the Linux primitives commonly used to implement container runtimes. We'll learn about cgroups, namespaces and union filesystems, and explain how container runtimes such as Docker leverage them to deliver a powerful container management system. We'll demonstrate how Docker uses each of these primitives and show how you can effectively inspect and troubleshoot containers from the host operating system.

  1. AWS Summit Sydney
  2. Linux Container Primitives and Runtimes
      Alastair Cousins, Senior Solutions Architect – Media & Entertainment, Amazon Web Services
      © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  4. Diagram: containers 1 to 6 run on a container runtime, which is built on three Linux kernel features: control groups, namespaces and a union filesystem.
  6. Cgroups and subsystems
      • Cgroups are an abstract framework
      • Subsystems are concrete implementations
      • Most subsystems are resource controllers
      Examples of subsystems:
      • Memory
      • CPU time
      • Block I/O
      • Number of discrete processes (pids)
      • CPU & memory pinning
      • Freezer (used by docker pause)
      • Devices
      • Network priority
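The subsystems available in the running kernel are listed in /proc/cgroups. As a minimal sketch, the Python below parses that file's whitespace-separated columns (subsys_name, hierarchy, num_cgroups, enabled); the sample text used to check it is illustrative, not output captured from the demo host.

```python
from __future__ import annotations

from pathlib import Path


def parse_proc_cgroups(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/cgroups into {subsystem: {hierarchy, num_cgroups, enabled}}."""
    subsystems = {}
    for line in text.splitlines():
        # Skip the "#subsys_name hierarchy num_cgroups enabled" header.
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, hierarchy, num_cgroups, enabled = fields[:4]
        subsystems[name] = {
            "hierarchy": int(hierarchy),  # 0: not on any cgroup v1 hierarchy
            "num_cgroups": int(num_cgroups),
            "enabled": int(enabled),
        }
    return subsystems


if __name__ == "__main__":
    proc_cgroups = Path("/proc/cgroups")
    if proc_cgroups.exists():
        for name, info in parse_proc_cgroups(proc_cgroups.read_text()).items():
            state = "enabled" if info["enabled"] else "disabled"
            print(f"{name:12} hierarchy={info['hierarchy']:<3} {state}")
```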
  7. Hierarchical representation
      • Different subsystems can organise processes separately
      • Every process belongs to exactly one cgroup in each hierarchy
      • New processes inherit cgroups from their parents
      ├── blkio
      │   └── docker
      │       └── b211c37
      ├── cpu,cpuacct
      │   └── docker
      │       └── b211c37
      ├── cpuset
      │   └── docker
      │       └── b211c37
      ├── devices
      │   └── docker
      │       └── b211c37
      ├── freezer
      │   └── docker
      │       └── b211c37
      ├── hugetlb
      │   └── docker
      │       └── b211c37
      ├── memory
      │   └── docker
      │       └── b211c37
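Because every process belongs to exactly one cgroup per hierarchy, the kernel reports a process's full membership in /proc/<pid>/cgroup as one `hierarchy-ID:controller-list:path` line per hierarchy. The sketch below, assuming that format, maps each controller to its cgroup path. Co-mounted controllers such as cpu,cpuacct share a line, and a cgroup v2 entry (empty controller list) is stored under "unified", a key chosen here for illustration.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_proc_pid_cgroup(text: str) -> dict[str, str]:
    """Map each controller to the cgroup path listed in /proc/<pid>/cgroup."""
    membership = {}
    for line in text.splitlines():
        if not line:
            continue
        # maxsplit=2 keeps any ':' inside the path intact.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if not controllers:
            membership["unified"] = path  # cgroup v2 entry ("0::/path")
            continue
        # Co-mounted controllers such as "cpu,cpuacct" share one hierarchy.
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


if __name__ == "__main__":
    own = Path(f"/proc/{os.getpid()}/cgroup")
    if own.exists():
        for controller, path in parse_proc_pid_cgroup(own.read_text()).items():
            print(f"{controller:14} {path}")
```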
  8. cgroup virtual filesystem
      • Typically mounted at /sys/fs/cgroup
      • The tasks virtual file lists the IDs of all tasks (threads) in the cgroup; cgroup.procs lists the process IDs
      • Subsystem-specific files hold settings and utilisation data
      ├── cgroup.clone_children
      ├── cgroup.procs
      ├── cgroup.sane_behavior
      ├── cpuacct.stat
      ├── cpuacct.usage
      ├── cpuacct.usage_all
      ├── cpuacct.usage_percpu
      ├── cpuacct.usage_percpu_sys
      ├── cpuacct.usage_percpu_user
      ├── cpuacct.usage_sys
      ├── cpuacct.usage_user
      ├── cpu.cfs_period_us
      ├── cpu.cfs_quota_us
      ├── cpu.rt_period_us
      ├── cpu.rt_runtime_us
      ├── cpu.shares
      ├── cpu.stat
      ├── notify_on_release
      ├── release_agent
      └── tasks
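The CPU files in a listing like this can be turned into a limit expressed in cores: cpu.cfs_quota_us is the CPU time the cgroup may use in each cpu.cfs_period_us window, and a quota of -1 means no limit. For example, a quota of 150000 with a period of 100000 allows 1.5 CPUs. The helper below is a minimal sketch assuming a cgroup v1 cpu,cpuacct directory; the container path in the usage block is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def cpu_limit(quota_us: int, period_us: int) -> float | None:
    """CPU limit in cores, or None when the quota is -1 (unlimited)."""
    if quota_us < 0:
        return None
    return quota_us / period_us


def read_cpu_limit(cgroup_dir: str) -> float | None:
    """Read the CFS quota and period from a cgroup v1 cpu,cpuacct directory."""
    base = Path(cgroup_dir)
    quota = int((base / "cpu.cfs_quota_us").read_text())
    period = int((base / "cpu.cfs_period_us").read_text())
    return cpu_limit(quota, period)


if __name__ == "__main__":
    # Hypothetical container cgroup, following the docker/<id> layout.
    container = Path("/sys/fs/cgroup/cpu,cpuacct/docker/b211c37")
    if container.is_dir():
        print(read_cpu_limit(str(container)))
```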
  10. What can you use cgroups for?
      • cgroups can be used independently of containers
      • cgroups define resource limits for processes
      • Monitor processes and organise them
      • Be careful not to break any assumptions your container runtime or orchestrator might make
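As an example of using cgroups without any container runtime, the sketch below creates a child cgroup, caps its memory and moves a process into it. It assumes root privileges and a cgroup v1 memory hierarchy mounted at /sys/fs/cgroup/memory (cgroup v2 uses memory.max rather than memory.limit_in_bytes). The function and the cgroup name "lcp-demo" are hypothetical; the cgroup sits outside the docker/ subtree so it does not disturb cgroups the runtime manages.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: cgroup v1 with the memory controller mounted here.
MEMORY_ROOT = "/sys/fs/cgroup/memory"


def limit_process_memory(pid: int, name: str, limit_bytes: int,
                         root: str = MEMORY_ROOT) -> Path:
    """Create cgroup <root>/<name>, cap its memory, and move `pid` into it."""
    cgroup = Path(root) / name
    # In cgroupfs, mkdir creates a cgroup and the kernel populates its files.
    cgroup.mkdir(exist_ok=True)
    (cgroup / "memory.limit_in_bytes").write_text(f"{limit_bytes}\n")
    # Writing a PID to cgroup.procs moves the whole process (all its threads);
    # children it forks later inherit the cgroup.
    (cgroup / "cgroup.procs").write_text(f"{pid}\n")
    return cgroup
```

On such a host, `limit_process_memory(pid, "lcp-demo", 256 * 1024 * 1024)` should leave the process with a 256 MiB limit. The `root` parameter exists so the logic can also be exercised against an ordinary directory.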
  12. What namespaces are available?
      • Network
      • Filesystem (mounts)
      • Processes (pid)
      • Inter-process communication (ipc)
      • Hostname and domain name (uts)
      • User and group IDs
      • cgroup
  13. Namespace structure (diagram): processes A, B, C and D mapped to the namespaces pid:[1], pid:[2], pid:[3], net:[4], net:[5], net:[6], mount:[7] and mount:[8]
  14. procfs virtual filesystem
      • Namespaces are visible in /proc, organised by PID
      • Files are symbolic links to the namespace
      • The link contains the namespace type and the inode number that identifies the namespace
      $ readlink /proc/$$/ns/*
      cgroup:[4026531835]
      ipc:[4026531839]
      mnt:[4026531840]
      net:[4026531993]
      pid:[4026531836]
      user:[4026531837]
      uts:[4026531838]
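The same information can be read programmatically. This sketch lists /proc/<pid>/ns and splits each link target into its type and inode number; per namespaces(7), two processes share a namespace when these links have the same device and inode numbers. Some entries (for example pid_for_children) point at a namespace whose type differs from the entry name, which is why the result is keyed by entry rather than by type.

```python
from __future__ import annotations

import os
import re

# A namespace link target looks like "net:[4026531993]".
NS_LINK = re.compile(r"^(?P<type>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split 'net:[4026531993]' into ('net', 4026531993)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match["type"], int(match["inode"])


def namespaces_of(pid: int | str = "self") -> dict[str, int]:
    """Map each entry of /proc/<pid>/ns to the inode number of its namespace."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


if __name__ == "__main__":
    try:
        for entry, inode in namespaces_of().items():
            print(f"{entry:18} {inode}")
    except (OSError, ValueError) as exc:
        print("cannot read namespaces:", exc)
```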
  15. Network namespace
      • Frequently used in containers
      • By default, docker run uses a separate network namespace per container
      • Multiple containers can share a network namespace
        • Kubernetes pods
        • Amazon Elastic Container Service (Amazon ECS) tasks with the awsvpc networking mode
      • Improve isolation by creating dedicated network interfaces
        • ECS awsvpc networking mode
        • EKS amazon-vpc-cni-k8s plugin
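Sharing is visible directly in procfs: processes in the same network namespace (for example the containers of one Kubernetes pod) have identical /proc/<pid>/ns/net link targets. A minimal sketch that groups the visible processes that way; without root, processes owned by other users are silently skipped.

```python
from __future__ import annotations

import os
from collections import defaultdict


def group_by_namespace(links: dict[int, str]) -> dict[str, list[int]]:
    """Group PIDs by link target, e.g. {'net:[4026532283]': [101, 102]}."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pid, target in sorted(links.items()):
        groups[target].append(pid)
    return dict(groups)


def read_ns_links(ns_type: str = "net") -> dict[int, str]:
    """Read /proc/<pid>/ns/<ns_type> for every process we are allowed to inspect."""
    links = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            links[int(entry)] = os.readlink(f"/proc/{entry}/ns/{ns_type}")
        except OSError:
            # The process exited, or we lack permission (typically non-root).
            continue
    return links


if __name__ == "__main__":
    for target, pids in group_by_namespace(read_ns_links("net")).items():
        print(target, pids)
```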
  16. Mount namespace
      • Provides containers with their own filesystem
      • The container image is mounted as the root filesystem
      bash-4.2# mount
      overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/Q5EBZ7CIJYELLG2MBKZIRRFWW6:/var/lib/docker/overlay2/l/PKATP76T57BQZ5D44JXYFIB26E,upperdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/diff,workdir=/var/lib/docker/overlay2/88816f9510a9ff38b31eaaceccbef6ffc9cc3c06bcc451f9684850db5ee1b152/work)
      proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
      tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
      devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
      sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
      tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,relatime,mode=755)
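The overlay entry for / carries the container's whole layer stack in its mount options. As a sketch, the helpers below find that entry in /proc/self/mounts output and split its options; per the kernel's overlayfs documentation, the leftmost lowerdir is the topmost lower layer. The sketch assumes no layer path contains an escaped comma or colon.

```python
from __future__ import annotations


def overlay_root_options(mounts_text: str) -> str | None:
    """Return the option string of the overlay mounted at '/', if any."""
    for line in mounts_text.splitlines():
        # /proc/self/mounts: source mountpoint fstype options dump pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "/" and fields[2] == "overlay":
            return fields[3]
    return None


def parse_overlay_options(options: str) -> dict:
    """Split overlay options into flags, a lowerdir list, upperdir and workdir."""
    parsed: dict = {"flags": []}
    for option in options.split(","):
        key, sep, value = option.partition("=")
        if not sep:
            parsed["flags"].append(key)  # e.g. rw, relatime
        elif key == "lowerdir":
            # Colon-separated lower layers, topmost first.
            parsed["lowerdir"] = value.split(":")
        else:
            parsed[key] = value
    return parsed


if __name__ == "__main__":
    with open("/proc/self/mounts") as mounts:
        root_options = overlay_root_options(mounts.read())
    print(parse_overlay_options(root_options) if root_options else "/ is not an overlay mount")
```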
  17. Creating namespaces
      • clone(2) and unshare(2)
      • CLONE_NEW* flags specify which namespaces to create
      • clone(2) creates a new process in new namespaces
      • unshare(2) lets an existing process create new namespaces for itself
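To make the flags concrete, here is a sketch that ORs together CLONE_NEW* values (copied from <linux/sched.h>) and calls unshare(2) through ctypes. It assumes Linux with a glibc-compatible libc; creating most namespace types requires CAP_SYS_ADMIN, so `unshare()` raises OSError when run unprivileged. The helper names are hypothetical.

```python
from __future__ import annotations

import ctypes
import os

# Flag values from <linux/sched.h>.
CLONE_NEW = {
    "mnt": 0x00020000,     # CLONE_NEWNS
    "cgroup": 0x02000000,  # CLONE_NEWCGROUP
    "uts": 0x04000000,     # CLONE_NEWUTS
    "ipc": 0x08000000,     # CLONE_NEWIPC
    "user": 0x10000000,    # CLONE_NEWUSER
    "pid": 0x20000000,     # CLONE_NEWPID
    "net": 0x40000000,     # CLONE_NEWNET
}


def clone_flags(*names: str) -> int:
    """OR together the CLONE_NEW* flags for the given namespace names."""
    flags = 0
    for name in names:
        flags |= CLONE_NEW[name]
    return flags


def unshare(*names: str) -> None:
    """Give the calling process new namespaces via unshare(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(clone_flags(*names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

As root, calling `unshare("uts")` and then `socket.sethostname("lcp-demo")` should change the hostname only inside the new UTS namespace. Note that with "pid" only the caller's subsequently created children land in the new PID namespace.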
  18. Entering namespaces
      • Open a file from /proc/$$/ns (or a bind mount of one)
      • Pass the file descriptor to setns(2) to enter the existing namespace
      • Once entered, the namespace remains alive as long as the process is running, even if the original file goes away
      • nsenter(1) is a command for doing this interactively
      • ip-netns(8) works specifically for network namespaces
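The following sketch mirrors what nsenter does for a single namespace: a forked child opens /proc/<pid>/ns/<type>, calls setns(2) through ctypes (nstype 0 means "accept any namespace type"), then execs a command, so the caller's own namespaces never change. It assumes Linux and enough privilege (typically root) for setns; the function name is hypothetical.

```python
from __future__ import annotations

import ctypes
import os


def run_in_namespace(pid: int, ns_type: str, argv: list[str]) -> int:
    """Run argv inside one namespace of `pid` (roughly `nsenter -t PID --net`).

    Only the forked child calls setns(2). Returns the child's exit code
    (127 if joining the namespace or exec failed).
    """
    ns_path = f"/proc/{pid}/ns/{ns_type}"
    child = os.fork()
    if child == 0:
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            fd = os.open(ns_path, os.O_RDONLY)
            # nstype 0: accept whatever namespace type the descriptor refers to.
            if libc.setns(fd, 0) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err), ns_path)
            os.close(fd)
            os.execvp(argv[0], argv)  # does not return on success
        except Exception as exc:
            os.write(2, f"run_in_namespace: {exc}\n".encode())
        finally:
            os._exit(127)  # never fall back into the parent's code
    _, status = os.waitpid(child, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, `run_in_namespace(host_pid, "net", ["ip", "addr"])` with a container's host PID would list the interfaces the container sees, using the host's ip binary because only the network namespace was joined.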
  19. Persisting namespaces
      • The kernel automatically garbage-collects namespaces by reference counting
      • A new namespace remains alive as long as
        • a process runs in it, or
        • a bind mount of its file is in place
      • To persist one, bind-mount a file in /proc/$$/ns to another place on the filesystem
      $ mount --bind /proc/$$/ns/net /var/run/netns/lcp-demo
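On reasonably recent kernels, bind-mounted namespace files appear in the mount table with the nsfs filesystem type, which gives a way to find persisted namespaces such as those created by ip netns. A sketch that scans /proc/self/mountinfo for them; the sample line in the check is illustrative, and octal escapes in mount points (such as \040 for a space) are left undecoded.

```python
from __future__ import annotations

from pathlib import Path


def persisted_namespaces(mountinfo: str) -> dict[str, str]:
    """Return {mount point: namespace} for every nsfs mount in mountinfo text."""
    found = {}
    for line in mountinfo.splitlines():
        # ID parent major:minor root mount-point options [optional...] - fstype source super-options
        before, sep, after = line.partition(" - ")
        if not sep:
            continue
        pre, post = before.split(), after.split()
        if len(pre) >= 5 and post and post[0] == "nsfs":
            # For nsfs, the "root" field names the namespace, e.g. net:[4026532575].
            found[pre[4]] = pre[3]
    return found


if __name__ == "__main__":
    mountinfo = Path("/proc/self/mountinfo")
    if mountinfo.exists():
        for mount_point, namespace in persisted_namespaces(mountinfo.read_text()).items():
            print(f"{mount_point} -> {namespace}")
```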
  21. How can you leverage this?
      • Use nsenter or ip netns to troubleshoot container networking
      • Monitor containers by entering the pid namespace
      • Access binaries in your containers with the mount namespace
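When monitoring, it often helps to translate between host PIDs and the PIDs a container sees. Since Linux 4.1, /proc/<pid>/status has an NSpid line listing the process's PID in each nested PID namespace, outermost first. This sketch reads it; it complements, rather than replaces, entering the namespace with nsenter. The function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def ns_pids(status_text: str) -> list[int]:
    """PIDs from the NSpid line of /proc/<pid>/status, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(value) for value in line.split()[1:]]
    return []


def pid_inside_container(host_pid: int) -> int | None:
    """PID of `host_pid` as seen from its innermost PID namespace, if known."""
    chain = ns_pids(Path(f"/proc/{host_pid}/status").read_text())
    return chain[-1] if chain else None


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print(ns_pids(status.read_text()))
```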
  23. How Docker layers work
      • A copy-on-write view of your files
      • New files exist only in the top layer
      • When a file is modified, it is copied up to the top layer
      • Unmodified files exist in the layer they were added in
      • Deleted files are hidden, but still exist in the lower layers
      Diagram (top to bottom):
        Top layer (read-write)
        Intermediate layer (read-only)
        Base layer (read-only)
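The copy-on-write rules above fit in a small toy model. The class below is purely illustrative (its names are hypothetical, and it is not how Docker or overlayfs is implemented): lookups go top-down, every write lands in the top layer, modifying a lower file copies the whole file up first, and deleting records a whiteout while the lower copy stays put.

```python
from __future__ import annotations

WHITEOUT = object()  # marker meaning "deleted as seen from this layer down"


class LayeredFS:
    """Toy copy-on-write layer stack; layers[0] is the writable top layer."""

    def __init__(self, *read_only_layers: dict[str, str]) -> None:
        # Read-only layers are given top to bottom; each maps path -> contents.
        self.layers: list[dict] = [{}] + [dict(layer) for layer in read_only_layers]

    def _lookup(self, path: str) -> tuple[int, str]:
        for index, layer in enumerate(self.layers):
            if path in layer:
                if layer[path] is WHITEOUT:
                    break  # a whiteout hides every copy further down
                return index, layer[path]
        raise FileNotFoundError(path)

    def read(self, path: str) -> str:
        return self._lookup(path)[1]

    def layer_of(self, path: str) -> int:
        """Index of the layer currently providing `path` (0 = top)."""
        return self._lookup(path)[0]

    def write(self, path: str, contents: str) -> None:
        # All changes go to the top layer; lower layers are never modified.
        self.layers[0][path] = contents

    def append(self, path: str, extra: str) -> None:
        # Copy-up: the whole file is copied to the top layer, then changed.
        self.write(path, self.read(path) + extra)

    def delete(self, path: str) -> None:
        self._lookup(path)  # raises if the file is not visible
        self.layers[0][path] = WHITEOUT
```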
  24. Overlay filesystem
      • Joins two directories (upper and lower) to form a union filesystem
      • The file path is the unique identifier: a file in the upperdir hides a lower file at the same path
      • An upperdir can have multiple lowerdirs
      • When writing to the overlay
        • lowerdir is not modified; all changes go to upperdir
        • Existing files are copied up to the upperdir for modification
        • The whole file is copied, not just the changed blocks
      • “Deleting” a file that exists in a lowerdir creates a whiteout in the upperdir
        • Files: character devices with 0/0 device number
        • Opaque directories (which hide the lowerdir's contents): xattr “trusted.overlay.opaque” set to “y”
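Whiteouts and opaque directories can be spotted directly in an upperdir. A sketch assuming Linux (os.getxattr is Linux-only) and the representation described above; reading trusted.* attributes normally requires CAP_SYS_ADMIN, so without it opaque directories are simply not reported. The function names are hypothetical.

```python
from __future__ import annotations

import os
import stat


def is_whiteout(st: os.stat_result) -> bool:
    """Overlay marks a deleted file with a character device numbered 0/0."""
    return stat.S_ISCHR(st.st_mode) and st.st_rdev == 0


def is_opaque(path: str) -> bool:
    """True if the directory has trusted.overlay.opaque set to "y"."""
    try:
        return os.getxattr(path, "trusted.overlay.opaque") == b"y"
    except OSError:
        # Missing attribute, unsupported filesystem, or no CAP_SYS_ADMIN.
        return False


def scan_upperdir(upperdir: str) -> tuple[list[str], list[str]]:
    """Return (whiteout files, opaque directories) found under an upperdir."""
    whiteouts: list[str] = []
    opaque: list[str] = []
    for dirpath, dirnames, filenames in os.walk(upperdir):
        for name in dirnames:
            if is_opaque(os.path.join(dirpath, name)):
                opaque.append(os.path.join(dirpath, name))
        for name in filenames:
            full = os.path.join(dirpath, name)
            if is_whiteout(os.lstat(full)):
                whiteouts.append(full)
    return whiteouts, opaque
```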
  26. How can you leverage this?
      • Locate files in your layers
      • Examine which files and layers contribute to your disk usage
      • Understand the impact of writable files in your containers and how to reduce it
      # du -h . | sort -hr
      753M    .
      211M    ./e33f37/diff
      211M    ./e33f37
      204M    ./e33f37/diff/usr
      169M    ./f87973/diff
      …
      # ls ./f87973
      diff  link
      # ls ./e33f37
      diff  link  lower  work
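A rough Python counterpart of the du pipeline above, ranking layer diff directories by size. It assumes Docker's overlay2 layout under /var/lib/docker/overlay2 (normally readable only by root) and sums apparent file sizes, so its figures can differ from du, which counts allocated blocks and visits each hard-linked file once.

```python
from __future__ import annotations

import os
import stat


def tree_size(path: str) -> int:
    """Sum of apparent sizes of regular files under `path` (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def layer_sizes(root: str = "/var/lib/docker/overlay2") -> list[tuple[str, int]]:
    """(layer, bytes) for every layer that has a diff/ directory, largest first."""
    sizes = []
    for layer in os.listdir(root):
        diff = os.path.join(root, layer, "diff")
        if os.path.isdir(diff):  # entries without diff/ (such as l/) are skipped
            sizes.append((layer, tree_size(diff)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    try:
        for layer, size in layer_sizes():
            print(f"{size / 2**20:8.1f} MiB  {layer}")
    except OSError as exc:
        print("cannot read layer directory:", exc)
```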
  29. OCI runtime spec
      • Containers are “bundles”
        • Filesystem
        • JSON document
      • Filesystem can be a union
      • JSON document describes
        • cgroups
        • Namespaces
        • Additional mounts
        • Linux capabilities
        • Linux security modules
        • And more
      {
        "ociVersion": "1.0.1",
        ⋮
        "root": {
          "path": "/var/lib/docker/overlay2/03004c/merged"
        },
        ⋮
        "hooks": {
          "prestart": [{"path": "/proc/9306/exe"}]
        },
        "linux": {
          "resources": {
            "cpu": {"shares": 0},
            "pids": {"limit": 0},
            ⋮
          },
          "cgroupsPath": "/docker/bd5cebc8950c",
          "namespaces": [
            {"type": "mount"},
            {"type": "network"},
            ⋮
          ],
          ⋮
        }
      }
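A sketch of pulling the container primitives back out of a bundle's JSON document. It relies only on the fields shown above plus the spec's optional `path` on namespace entries, which tells the runtime to join an existing namespace (such as one persisted with a bind mount) instead of creating a new one. The function name is hypothetical, and the configuration in the check is a made-up minimal example, not the one from the slide.

```python
from __future__ import annotations

import json


def summarise_oci_config(config_text: str) -> dict:
    """Extract rootfs, cgroups path, namespaces and resource types from config JSON."""
    config = json.loads(config_text)
    linux = config.get("linux", {})
    return {
        "rootfs": config.get("root", {}).get("path"),
        "cgroups_path": linux.get("cgroupsPath"),
        # A namespace entry with "path" joins that namespace; without it, a new one is created.
        "namespaces": [(ns["type"], ns.get("path")) for ns in linux.get("namespaces", [])],
        "resources": sorted(linux.get("resources", {})),
    }
```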
  30. Notebooks: https://github.com/alcousins/lcp-notebooks
  31. Thank you!
      Alastair Cousins, acousins@amazon.com
