The State of Rootless Containers
Aleksa Sarai / SUSE
Akihiro Suda / NTT
@lordcyphar
@_AkihiroSuda_
Who are we?
Aleksa Sarai
• Senior Software Engineer
at SUSE.
• Maintainer of runc and
several other Open
Container Initiative
projects.
Akihiro Suda
• Software engineer at NTT
(the largest telco in Japan)
• Maintainer of Moby
(formerly Docker Engine),
BuildKit, containerd, and
more.
Agenda
• What are Rootless Containers? What are they for?
– User Namespaces
– Network Namespaces
– Mount Namespaces
– cgroups
– Current adoption status
• Demo: “Usernetes”
Introduction to Rootless Containers
• Most container runtimes* require root privileges.
– ... and lack sufficient protections against privilege escalation.
• What can you do if you don't have (and can't get) root privileges?
– (Computing clusters in universities for example.)
• Rootless containers are containers that can be created and managed
without privileged codepaths (some caveats apply).
– Requires quite a few kernel technologies, as well as some
userspace tricks...
“The Security Argument”
Another justification is to avoid privileged codepaths entirely:
• No privilege escalation if you never actually have privileges!
docker:CVE-2014-9357 docker:CVE-2015-3629 docker:CVE-2015-3627
• Configuration mistakes cannot escalate privileges above the original
user. docker:CVE-2016-8867
• Path traversal vulnerabilities only affect paths the user can already
access. docker:CVE-2015-3630 k8s:CVE-2017-1002101 k8s:CVE-2017-1002102
docker:CVE-2018-15664
(This is not a panacea: the kernel features we use have had security flaws
in the past, especially user namespaces. But you can also restrict their
usage inside rootless containers!)
User Namespaces
• The key component of rootless containers.
– Map UIDs/GIDs in the guest to different UIDs/GIDs on the host.
– Unprivileged (on the host) users can have (limited) root inside!
• Root has UID 0 and full capabilities, but obvious restrictions apply.
– Inaccessible files, inserting kernel modules, rebooting, ...
• Unprivileged users can map only their own UID/GID (to itself or root).
– We need something better to be able to use package managers.
User Namespaces
• To allow multi-user mappings, shadow-utils now provides newuidmap
and newgidmap (packaged by most distributions).
– SETUID binaries writing mappings configured in /etc/sub[ug]id
/etc/subuid (provided by the admin, i.e. real root):
1000:420000:65536
/proc/42/uid_map (the user can configure the UID mappings after
unsharing a user namespace):
0 1000 1
1 420000 65536
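The relationship between the /etc/subuid entry and the resulting uid_map can be sketched in Python. This is only an illustration of the arithmetic, not what newuidmap does internally; the helper names (parse_subid_line, build_uid_map, to_host_uid) are hypothetical, and the values are the ones from the slide.

```python
from typing import List, NamedTuple, Optional, Tuple


class IdMapEntry(NamedTuple):
    inside: int   # first ID as seen inside the user namespace
    outside: int  # first ID as seen on the host
    count: int    # number of consecutive IDs covered


def parse_subid_line(line: str) -> Tuple[str, int, int]:
    """Parse one /etc/subuid-style line of the form "owner:start:count"."""
    owner, start, count = line.strip().split(":")
    return owner, int(start), int(count)


def build_uid_map(own_uid: int, sub_start: int, sub_count: int) -> List[IdMapEntry]:
    """Map container root to the user's own UID, then UIDs 1.. to the sub-ID range."""
    return [
        IdMapEntry(0, own_uid, 1),
        IdMapEntry(1, sub_start, sub_count),
    ]


def format_uid_map(uid_map: List[IdMapEntry]) -> str:
    """Render the mapping in the three-column layout used by /proc/<pid>/uid_map."""
    return "\n".join(f"{e.inside} {e.outside} {e.count}" for e in uid_map)


def to_host_uid(uid_map: List[IdMapEntry], container_uid: int) -> Optional[int]:
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for e in uid_map:
        if e.inside <= container_uid < e.inside + e.count:
            return e.outside + (container_uid - e.inside)
    return None


owner, start, count = parse_subid_line("1000:420000:65536")
uid_map = build_uid_map(own_uid=1000, sub_start=start, sub_count=count)
print(format_uid_map(uid_map))
print(to_host_uid(uid_map, 0))   # container root is the unprivileged host user
print(to_host_uid(uid_map, 33))  # any other UID lands in the sub-ID range
```

With this mapping, files owned by a non-root UID inside the container are owned on the host by an ID from the user's sub-ID range, which is what lets package managers chown files to several distinct users.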
User Namespaces
Problems:
• SETUID binary can be dangerous
– newuidmap & newgidmap have had two CVEs so far:
• CVE-2016-6252 (CVSS v3: 7.8): integer overflow issue
• CVE-2018-7169 (CVSS v3: 5.3): supplementary GID issue
• Hard to maintain subuid & subgid
– Having 64K sub-IDs should be ok for most cases, but to allow
nesting user namespaces, an enormous number of sub-IDs would
be needed
• Potential sub-ID (up to 4G entries) starvation, especially in
LDAP environments with many users
User Namespaces
Alternative way: Single-mapping mode + Ptrace + Xattr
• Single-mapping mode does not require newuidmap/newgidmap
• Ptrace can emulate fake sub-UIDs/sub-GIDs
– No need to hook all syscalls (unlike gVisor)
– Seccomp could be used as well in future
• Xattr (extended file attributes) can be used for persistent chown(2)
emulation (see user.rootlesscontainers).
Free from potential newuidmap/newgidmap CVEs
• But slow and no real isolation across sub-UIDs/sub-GIDs
• Mostly adequate for image building, but not a panacea
Network Namespaces
An unprivileged user can create network namespaces by becoming root
in a user namespace, but cannot set up a veth pair across the parent and
the child (i.e. no Internet connection).
• Note: isolating the network namespace is not mandatory, but without it there are no iptables, no bridges,
and no namespaced abstract UNIX sockets.
[Diagram: The Internet; Host (“parent”); UserNS + NetNS (“child”); NetNS, NetNS]
Network Namespaces
Prior work: LXC uses SETUID binary (lxc-user-nic) for setting up the
veth pair across the parent and the child
Problem: SETUID binary can be dangerous!
• CVE-2017-5985 (CVSS v3: 3.3): netns privilege escalation
• CVE-2018-6556 (NEW! disclosure: 8/10/2018): arbitrary file open(2)
Network Namespaces
Our approach: use usermode network (“Slirp”) with a TAP device
• Completely unprivileged
[Diagram: The Internet; Host; UserNS + NetNS; NetNS, NetNS; TAP; “Slirp”; TAP FD passed via “sendfd” (SCM_RIGHTS cmsg)]
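The “sendfd” step relies on passing an open file descriptor over a UNIX socket as SCM_RIGHTS ancillary data. A minimal Python sketch of that mechanism follows. It is not the slirp4netns code: a socketpair stands in for the channel between the namespaces and a pipe stands in for the TAP file descriptor, both chosen only so the example runs without privileges.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # One byte of ordinary data accompanies the control message.
    sock.sendmsg([b"\0"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent with send_fd()."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore any trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


# Stand-ins (assumptions for the demo): socketpair = channel, pipe = "TAP fd".
receiver_sock, sender_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()

send_fd(sender_sock, read_end)
received = recv_fd(receiver_sock)

# The received descriptor refers to the same open file as read_end.
os.write(write_end, b"packet")
payload = os.read(received, 6)
print(payload)

for fd in (read_end, write_end, received):
    os.close(fd)
receiver_sock.close()
sender_sock.close()
```

The receiver gets a new descriptor number that refers to the same open file, so the process on the host side can read and write the device without ever entering the child's namespaces.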
Network Namespaces
Benchmark of several “Slirp” implementations:
• slirp4netns (our implementation based on QEMU) is the fastest because
it avoids copying packets across the namespaces
Implementation     MTU=1500    MTU=4000     MTU=16384    MTU=65520
vde_plug           763 Mbps    Unsupported  Unsupported  Unsupported
VPNKit             514 Mbps    526 Mbps     540 Mbps     Unsupported
slirp4netns        1.07 Gbps   2.78 Gbps    4.55 Gbps    9.21 Gbps
cf. rootful veth   52.1 Gbps   45.4 Gbps    43.6 Gbps    51.5 Gbps
Benchmark: iperf3 (netns -> host), measured on Travis CI
See rootless-containers/rootlesskit#12
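To put the figures above in proportion, the short Python calculation below expresses slirp4netns throughput as a percentage of the rootful veth baseline at each MTU. The numbers are transcribed from the benchmark; the function name is illustrative.

```python
# Throughput in Gbps, transcribed from the benchmark figures above.
MTUS = [1500, 4000, 16384, 65520]
SLIRP4NETNS_GBPS = [1.07, 2.78, 4.55, 9.21]
ROOTFUL_VETH_GBPS = [52.1, 45.4, 43.6, 51.5]


def percent_of_baseline(measured, baseline):
    """Return measured/baseline as percentages, rounded to one decimal place."""
    return [round(100.0 * m / b, 1) for m, b in zip(measured, baseline)]


shares = percent_of_baseline(SLIRP4NETNS_GBPS, ROOTFUL_VETH_GBPS)
for mtu, share in zip(MTUS, shares):
    print(f"MTU={mtu}: {share}% of rootful veth")

# Gain for slirp4netns itself when going from the smallest to the largest MTU.
mtu_gain = round(SLIRP4NETNS_GBPS[-1] / SLIRP4NETNS_GBPS[0], 1)
print(f"MTU 1500 -> 65520: {mtu_gain}x")
```

Even at the largest MTU the usermode network stays well below the rootful veth figure, so the MTU setting matters a great deal for rootless throughput.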
Network Namespaces
Setting up /etc/resolv.conf (without chroot) is a mess…
• resolv.conf may point to 127.0.0.X (for systemd-resolved /
dnsmasq)
• But a 127.0.0.X DNS server is inaccessible from network namespaces
• We can use bind-mount for replacing resolv.conf, but it is often
forcibly unmounted by systemd-resolved / NetworkManager
Solution: isolate /etc
• Mount an empty tmpfs on /etc
• Create the new resolv.conf on the new /etc
• Create symlinks for the real /etc/*, except resolv.conf
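The last two steps of this solution can be sketched in Python as follows. The tmpfs mount itself is left out, because it needs root inside a user and mount namespace. The sketch assumes the original /etc is still reachable at some other path (real_etc), and both the function name and the nameserver address are placeholders, not values taken from the slides.

```python
import os
import tempfile


def populate_isolated_etc(real_etc: str, new_etc: str, nameserver: str) -> None:
    """Fill new_etc with symlinks to real_etc entries plus a fresh resolv.conf."""
    for name in sorted(os.listdir(real_etc)):
        if name == "resolv.conf":
            continue  # replaced below instead of being linked
        os.symlink(os.path.join(real_etc, name), os.path.join(new_etc, name))
    with open(os.path.join(new_etc, "resolv.conf"), "w") as f:
        f.write(f"nameserver {nameserver}\n")


# Demo on temporary directories so that nothing on the real system is touched.
with tempfile.TemporaryDirectory() as real_etc, tempfile.TemporaryDirectory() as new_etc:
    for name, text in [("hosts", "127.0.0.1 localhost\n"),
                       ("resolv.conf", "nameserver 127.0.0.53\n")]:
        with open(os.path.join(real_etc, name), "w") as f:
            f.write(text)

    # "10.0.2.3" is a placeholder address chosen for this example.
    populate_isolated_etc(real_etc, new_etc, nameserver="10.0.2.3")

    hosts_is_link = os.path.islink(os.path.join(new_etc, "hosts"))
    resolv_is_link = os.path.islink(os.path.join(new_etc, "resolv.conf"))
    with open(os.path.join(new_etc, "resolv.conf")) as f:
        resolv_text = f.read()
    print(hosts_is_link, resolv_is_link, resolv_text.strip())
```

Because resolv.conf in the new /etc is a regular file rather than a bind mount, there is nothing for systemd-resolved or NetworkManager to unmount.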
Root Filesystems
Your container root filesystem has to live somewhere. Many filesystem
features used by “rootful” container runtimes aren’t available.
• Ubuntu allows overlayfs in a user namespace, but this isn't supported
upstream (due to security concerns).
• Btrfs allows unprivileged subvolume management, but requires
privileges to set it up beforehand.
• Devicemapper is completely locked away from us.
Root Filesystems
A “simple” work-around is to just extract images to a directory!
• It works … but people want storage deduplication.
Alternatives:
• Reflinks to a "known good" extracted image (inode exhaustion).
– (Can use on XFS, btrfs, ... but not ext4 family.)
• Unprivileged userspace overlayfs using FUSE (Linux >=4.18).
(Container images themselves have significant flaws as well.)
cgroups
/sys/fs/cgroup is a roadblock to many features we want in rootless
containers (accounting, pause and resume, even getting a list of PIDs!).
• By default completely owned by root (and managed by systemd).
There are a variety of workarounds, with various downsides:
• cgroup namespaces (with nsdelegate) only work in cgroupv2.
• LXC’s pam_cgfs requires installation of a PAM module (and only works
for logged-in users).
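Since the cgroup namespace workaround depends on cgroup v2, a tool may first want to check whether a process is in the v2 (unified) hierarchy. The Python sketch below parses the text of /proc/<pid>/cgroup for that purpose. The function name and the sample inputs are illustrative, and a real tool would also need to check that the cgroup directory is writable.

```python
from typing import Optional


def unified_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the process's cgroup v2 path, or None if it has no v2 entry.

    Each line of /proc/<pid>/cgroup is "hierarchy-ID:controller-list:path".
    The cgroup v2 hierarchy uses ID 0 and an empty controller list.
    """
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


# Illustrative inputs; on a real system read open("/proc/self/cgroup").read().
V1_ONLY = "12:pids:/user.slice/user-1000.slice\n1:name=systemd:/user.slice\n"
V2_ONLY = "0::/user.slice/user-1000.slice/session-1.scope\n"

print(unified_cgroup_path(V1_ONLY))
print(unified_cgroup_path(V2_ONLY))
```

On hybrid setups the file contains both v1 lines and a `0::` line, so a non-None result alone does not mean that the controllers are available in the v2 hierarchy.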
Current adoption status
runc
Fully supported since 1.0.0-rc4 (merged March 2017).
• Some minor features don’t work because of outside restrictions.
• Originally only supported completely-unprivileged (no funny
business) mode.
With 1.0.0-rc5, it supports “partially privileged” mode:
• /sys/fs/cgroup can be used if it is set up to be writable.
• Multi-user mappings are supported if they are set up with
/etc/sub[ug]id.
CLONE_NEWCGROUP is still not supported (but nsdelegate is v2-only).
umoci and orca-build
umoci is the original generic OCI image manipulation tool.
• https://github.com/openSUSE/umoci
• Supports extraction (unpack) and layer generation (repack).
• It has supported rootless mode since the beginning.
– Emulates CAP_DAC_OVERRIDE with recursive chmod.
– Supports persistent xattr-based chown(2) emulation.
orca-build was one of the first daemonless OCI (Dockerfile) builders.
• Built on top of umoci, skopeo, and runc.
• Supports rootless building, and is only 500 lines of Python.
• There are currently plans to merge it into umoci as a contrib/ wrapper.
BuildKit and img
• BuildKit: next-generation backend for `docker build`
– Integrated into Docker since v18.06, but can also be used as a
standalone daemon, with support for rootless mode
– Uses the host network namespace at the moment
• Not a huge problem when BuildKit itself is containerized
– Rootless BuildKit has been used in OpenFaaS cloud
• img: rootless and daemonless image builder based on BuildKit, by
Jessie Frazelle
– Same as BuildKit but daemonless
Kaniko
• Google’s unprivileged container image builder
• Different from our approach
– Kaniko itself needs to be executed in a container (without
--privileged)
– Dockerfile RUN instructions are executed without creating nested
containers inside the Kaniko container
• A RUN instruction gains root in the Kaniko container
• Seems unsuitable for untrusted (potentially malicious) Dockerfiles due to the lack of isolation
– Potential cloud credential leakage: #106
Docker (Moby) & Podman
• Docker / Moby
– Rootless mode is being proposed: #37375
– Supports both slirp4netns and VPNKit for network isolation
– Even Swarm mode works! (except overlay networks at the moment)
• Podman: Red Hat’s daemonless replacement for docker
– Already supports rootless mode
– Uses slirp4netns (Thanks Giuseppe Scrivano!)
Kubernetes & CRI runtimes
• kubelet, kube-proxy, and dockershim require a bunch of hacks for
running without cgroups and sysctl
– No hack needed for kube-apiserver and kube-scheduler
– POC available; Planning to propose KEP to SIG-node soon
• Alternative CRI runtimes:
– CRI-O: Already supports rootless mode
– containerd: rootless mode is planned
• TODO: stability improvements & multi-node networking
“Usernetes”
Experimental binary distribution of rootless Moby (Docker), CRI-O and
Kubernetes, installable under $HOME without making a mess
https://github.com/rootless-containers/usernetes
$ tar xjvf usernetes-x86_64.tbz
$ cd usernetes
$ ./run.sh
$ ./kubectl.sh run -it --image..
Demo: “Usernetes”
The State of Rootless Containers

Más contenido relacionado

La actualidad más candente

[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020Akihiro Suda
 
[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless modeAkihiro Suda
 
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...Akihiro Suda
 
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKitAkihiro Suda
 
[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep DiveAkihiro Suda
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container imagesAkihiro Suda
 
SCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with ChefSCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with ChefMatt Ray
 
Building images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKitBuilding images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKitNTT Software Innovation Center
 
Introduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdIntroduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdKohei Tokunaga
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - IndroducAl Gifari
 
Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019Allen Vailliencourt
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013dotCloud
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack bostondotCloud
 
Docker and OpenStack Boston Meetup
Docker and OpenStack Boston MeetupDocker and OpenStack Boston Meetup
Docker and OpenStack Boston MeetupKamesh Pemmaraju
 
Docker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - GoraeDocker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - GoraeRhio kim
 
Secure container: Kata container and gVisor
Secure container: Kata container and gVisorSecure container: Kata container and gVisor
Secure container: Kata container and gVisorChing-Hsuan Yen
 
Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)Ariel Shuper
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Lalatendu Mohanty
 

La actualidad más candente (20)

ISC HPCW talks
ISC HPCW talksISC HPCW talks
ISC HPCW talks
 
[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020[KubeCon NA 2020] containerd: Rootless Containers 2020
[KubeCon NA 2020] containerd: Rootless Containers 2020
 
[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode
 
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
 
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
[KubeConEU] Building images efficiently and securely on Kubernetes with BuildKit
 
[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive
 
[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images[FOSDEM 2020] Lazy distribution of container images
[FOSDEM 2020] Lazy distribution of container images
 
SCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with ChefSCALE 2011 Deploying OpenStack with Chef
SCALE 2011 Deploying OpenStack with Chef
 
Building images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKitBuilding images efficiently and securely on Kubernetes with BuildKit
Building images efficiently and securely on Kubernetes with BuildKit
 
Introduction and Deep Dive Into Containerd
Introduction and Deep Dive Into ContainerdIntroduction and Deep Dive Into Containerd
Introduction and Deep Dive Into Containerd
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - Indroduc
 
Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019Upstate DevOps - Containers 101 - March 28, 2019
Upstate DevOps - Containers 101 - March 28, 2019
 
App container rkt
App container rktApp container rkt
App container rkt
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013
 
Docker open stack boston
Docker open stack bostonDocker open stack boston
Docker open stack boston
 
Docker and OpenStack Boston Meetup
Docker and OpenStack Boston MeetupDocker and OpenStack Boston Meetup
Docker and OpenStack Boston Meetup
 
Docker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - GoraeDocker, Docker Swarm mangement tool - Gorae
Docker, Docker Swarm mangement tool - Gorae
 
Secure container: Kata container and gVisor
Secure container: Kata container and gVisorSecure container: Kata container and gVisor
Secure container: Kata container and gVisor
 
Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)Containers Sandboxing (KubeCon 2018)
Containers Sandboxing (KubeCon 2018)
 
Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)Introduction to Project atomic (CentOS Dojo Bangalore)
Introduction to Project atomic (CentOS Dojo Bangalore)
 

Similar a The State of Rootless Containers

Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containersRed Hat Developers
 
Docking postgres
Docking postgresDocking postgres
Docking postgresrycamor
 
Docker Security
Docker SecurityDocker Security
Docker Securityantitree
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stackNikos Kormpakis
 
[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless PodmanAkihiro Suda
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Hajime Tazaki
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetesDongwon Kim
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerAkihiro Nomura
 
Stateless Hypervisors at Scale
Stateless Hypervisors at ScaleStateless Hypervisors at Scale
Stateless Hypervisors at ScaleAntony Messerl
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorAnil Madhavapeddy
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernelMahendra M
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSDocker, Inc.
 
Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Jérôme Petazzoni
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'acorehard_by
 

Similar a The State of Rootless Containers (20)

Podman rootless containers
Podman rootless containersPodman rootless containers
Podman rootless containers
 
Commit to excellence - Java in containers
Commit to excellence - Java in containersCommit to excellence - Java in containers
Commit to excellence - Java in containers
 
Docking postgres
Docking postgresDocking postgres
Docking postgres
 
Docker Security
Docker SecurityDocker Security
Docker Security
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
 
[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman
 
Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)Network Stack in Userspace (NUSE)
Network Stack in Userspace (NUSE)
 
Docker and kubernetes
Docker and kubernetesDocker and kubernetes
Docker and kubernetes
 
Introducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 SupercomputerIntroducing Container Technology to TSUBAME3.0 Supercomputer
Introducing Container Technology to TSUBAME3.0 Supercomputer
 
Stateless Hypervisors at Scale
Stateless Hypervisors at ScaleStateless Hypervisors at Scale
Stateless Hypervisors at Scale
 
Unikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
 
Introduction to NetBSD kernel
Introduction to NetBSD kernelIntroduction to NetBSD kernel
Introduction to NetBSD kernel
 
Sanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticiansSanger, upcoming Openstack for Bio-informaticians
Sanger, upcoming Openstack for Bio-informaticians
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Unikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOSUnikernels: the rise of the library hypervisor in MirageOS
Unikernels: the rise of the library hypervisor in MirageOS
 
Docker_AGH_v0.1.3
Docker_AGH_v0.1.3Docker_AGH_v0.1.3
Docker_AGH_v0.1.3
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Containerization using docker and its applications
Containerization using docker and its applicationsContainerization using docker and its applications
Containerization using docker and its applications
 
Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
 

Más de Akihiro Suda

20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_Akihiro Suda
 
20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdfAkihiro Suda
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdfAkihiro Suda
 
[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilionAkihiro Suda
 
[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilionAkihiro Suda
 
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdfAkihiro Suda
 
[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2Akihiro Suda
 
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...Akihiro Suda
 
The internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesThe internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesAkihiro Suda
 
[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilionAkihiro Suda
 
[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilionAkihiro Suda
 
[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?Akihiro Suda
 
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with DockerfileAkihiro Suda
 
[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] LimaAkihiro Suda
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOSAkihiro Suda
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Akihiro Suda
 
[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10Akihiro Suda
 
DockerとPodmanの比較
DockerとPodmanの比較DockerとPodmanの比較
DockerとPodmanの比較Akihiro Suda
 
[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User Namespaces[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User NamespacesAkihiro Suda
 

Más de Akihiro Suda (20)

20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_
 
20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf
 
[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion
 
[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion
 
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
 
[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2
 
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
 
The internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesThe internals and the latest trends of container runtimes
The internals and the latest trends of container runtimes
 
[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion
 
[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion
 
[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?
 
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
 
[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima
 
[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS[KubeCon EU 2022] Running containerd and k3s on macOS
[KubeCon EU 2022] Running containerd and k3s on macOS
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10
 
DockerとPodmanの比較
DockerとPodmanの比較DockerとPodmanの比較
DockerとPodmanの比較
 
[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User Namespaces[Container Runtime Meetup] runc & User Namespaces
[Container Runtime Meetup] runc & User Namespaces
 


The State of Rootless Containers

  • 1. The State of Rootless Containers Aleksa Sarai / SUSE Akihiro Suda / NTT @lordcyphar @_AkihiroSuda_
  • 2. Who are we?
    Aleksa Sarai
    • Senior Software Engineer at SUSE.
    • Maintainer of runc and several other Open Container Initiative projects.
    Akihiro Suda
    • Software engineer at NTT (the largest telco in Japan).
    • Maintainer of Moby (formerly Docker Engine), BuildKit, containerd, etc.
  • 3. Agenda
    • What are Rootless Containers? What are they for?
      – User Namespaces
      – Network Namespaces
      – Mount Namespaces
      – cgroups
      – Current adoption status
    • Demo: “Usernetes”
  • 4. Introduction to Rootless Containers
    • Most container runtimes* require root privileges.
      – ... and lack sufficient protections against privilege escalation.
    • What can you do if you don't have (and can't get) root privileges?
      – (Computing clusters in universities for example.)
    • Rootless containers are containers that can be created and managed without privileged codepaths (some caveats apply).
      – Requires quite a few kernel technologies, as well as some userspace tricks...
  • 5. “The Security Argument”
    Another justification is to avoid privileged codepaths entirely:
    • No privilege escalation if you never actually have privileges! docker:CVE-2014-9357 docker:CVE-2015-3629 docker:CVE-2015-3627
    • Configuration mistakes cannot escalate privileges above the original user. docker:CVE-2016-8867
    • Path traversal vulnerabilities only affect paths the user can already access. docker:CVE-2015-3630 k8s:CVE-2017-1002101 k8s:CVE-2017-1002102 docker:CVE-2018-15664
    (This is not a panacea: the kernel features we use have had security flaws in the past -- especially user namespaces. But you can also restrict their usage inside rootless containers!)
  • 6. User Namespaces
    • The key component of rootless containers.
      – Map UIDs/GIDs in the guest to different UIDs/GIDs on the host.
      – Unprivileged (on the host) users can have (limited) root inside!
    • Root has UID 0 and full capabilities, but obvious restrictions apply.
      – Inaccessible files, inserting kernel modules, rebooting, ...
    • Unprivileged users can map only their own UID/GID (to itself or root).
      – We need something better to be able to use package managers.
  • 7. User Namespaces
    • To allow multi-user mappings, shadow-utils now provides newuidmap and newgidmap (packaged by most distributions).
      – SETUID binaries that write mappings configured in /etc/sub[ug]id
    /etc/subuid (provided by the admin, i.e. real root):
      1000:420000:65536
    /proc/42/uid_map (the user can configure the UID map after unsharing a user namespace):
      0 1000 1
      1 420000 65536
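The mapping on this slide can be read as a lookup table. The following is a minimal Python sketch, assuming the three-column uid_map format (ID inside the namespace, ID outside, range length). The helper names `parse_uid_map` and `to_host_uid` are hypothetical and are not part of shadow-utils or the kernel interface.

```python
from typing import List, Optional, Tuple

# Example contents of /proc/<pid>/uid_map, taken from the slide above.
UID_MAP_TEXT = """\
0 1000 1
1 420000 65536
"""


def parse_uid_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map lines into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(entries: List[Tuple[int, int, int]], container_uid: int) -> Optional[int]:
    """Translate a UID inside the namespace to the host UID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


entries = parse_uid_map(UID_MAP_TEXT)
```

With this map, root inside the namespace (UID 0) is the unprivileged host user 1000, and UIDs 1 to 65536 inside land in the sub-ID range that starts at 420000 on the host.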
  • 8. User Namespaces
    Problems:
    • SETUID binary can be dangerous
      – newuidmap & newgidmap have had two CVEs so far:
        • CVE-2016-6252 (CVSS v3: 7.8): integer overflow issue
        • CVE-2018-7169 (CVSS v3: 5.3): supplementary GID issue
    • Hard to maintain subuid & subgid
      – Having 64K sub-IDs should be ok for most cases, but to allow nesting user namespaces, an enormous number of sub-IDs would be needed
        • Potential sub-ID (up to 4G entries) starvation, especially in LDAP environments with many users
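The starvation concern can be made concrete with some rough arithmetic. This is only a sketch under stated assumptions: a 32-bit ID space (the "4G entries" above), one disjoint 64K range per allocation, and no accounting for IDs already taken by real accounts. The function names are illustrative.

```python
UID_SPACE = 2**32   # 32-bit UID space, i.e. the "4G entries" on the slide
RANGE_SIZE = 65536  # 64K sub-IDs per allocation, as in the /etc/subuid example


def max_ranges(uid_space: int = UID_SPACE, range_size: int = RANGE_SIZE) -> int:
    """Upper bound on how many disjoint sub-ID ranges fit in the ID space."""
    return uid_space // range_size


def sub_ids_needed(containers: int, range_size: int = RANGE_SIZE) -> int:
    """Sub-IDs one user needs if each container gets its own disjoint range (an assumption)."""
    return containers * range_size
```

Under these assumptions at most 65536 such ranges exist in total, so a site with many LDAP users, or users who each want several disjoint ranges for nested namespaces, can exhaust the space quickly.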
  • 9. User Namespaces
    Alternative way: Single-mapping mode + Ptrace + Xattr
    • Single-mapping mode does not require newuidmap/newgidmap
    • Ptrace can emulate fake sub-UIDs/sub-GIDs
      – No need to hook all syscalls (unlike gVisor)
      – Seccomp could be used as well in future
    • Xattr (extended file attributes) can be used for persistent chown(2) emulation (see user.rootlesscontainers).
    Free from potential newuidmap/newgidmap CVEs
    • But slow, and no real isolation across sub-UIDs/sub-GIDs
    • Almost adequate for image-building purposes, but not a panacea
  • 10. Network Namespaces
    An unprivileged user can create network namespaces by becoming root in a user namespace, but cannot set up a veth pair across the parent and the child (i.e. no Internet connection).
    • Note: isolating the network namespace is not mandatory (but without it there are no iptables, no bridges, and no namespaced abstract UNIX sockets).
    [Diagram: the Internet, the host (“parent”), and a UserNS + NetNS (“child”)]
  • 11. Network Namespaces
    Prior work: LXC uses a SETUID binary (lxc-user-nic) for setting up the veth pair across the parent and the child.
    Problem: SETUID binary can be dangerous!
    • CVE-2017-5985 (CVSS v3: 3.3): netns privilege escalation
    • CVE-2018-6556 (NEW! disclosure: 8/10/2018): arbitrary file open(2)
  • 12. Network Namespaces
    Our approach: use usermode network (“Slirp”) with a TAP device
    • Completely unprivileged
    [Diagram: the Internet, the host running “Slirp”, and a UserNS + NetNS with a TAP device; the TAP fd is passed via “sendfd” (SCM_RIGHTS cmsg)]
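The “sendfd” step relies on passing an open file descriptor over a UNIX socket as SCM_RIGHTS ancillary data. Below is a minimal Python sketch of that mechanism only, assuming Python 3.9+ for `socket.send_fds` and `socket.recv_fds`. A pipe stands in for the TAP device, and both ends live in one process for brevity; this is not how slirp4netns itself is written.

```python
import os
import socket


def pass_fd_demo() -> bytes:
    """Send a file descriptor over a UNIX socket and use the received copy."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # stand-in for the TAP device fd
    try:
        # Sender side: attach the fd as SCM_RIGHTS ancillary data.
        socket.send_fds(sender, [b"tapfd"], [write_end])
        # Receiver side: the kernel installs a new fd for the same open file.
        msg, fds, _flags, _addr = socket.recv_fds(receiver, 1024, 1)
        received_fd = fds[0]
        os.write(received_fd, b"packet")
        os.close(received_fd)
        return msg + b":" + os.read(read_end, 6)
    finally:
        os.close(read_end)
        os.close(write_end)
        receiver.close()
        sender.close()
```

The receiver never opens the device itself; it only gets a descriptor that another process already opened, which is what lets the two sides sit in different namespaces.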
  • 13. Network Namespaces
    Benchmark of several “Slirp” implementations:
    • slirp4netns (our implementation based on QEMU) is the fastest because it avoids copying packets across the namespaces

                        MTU=1500    MTU=4000     MTU=16384    MTU=65520
      vde_plug          763 Mbps    Unsupported  Unsupported  Unsupported
      VPNKit            514 Mbps    526 Mbps     540 Mbps     Unsupported
      slirp4netns       1.07 Gbps   2.78 Gbps    4.55 Gbps    9.21 Gbps
      cf. rootful veth  52.1 Gbps   45.4 Gbps    43.6 Gbps    51.5 Gbps

    Benchmark: iperf3 (netns -> host), measured on Travis CI. See rootless-containers/rootlesskit#12
  • 14. Network Namespaces
    Setting up /etc/resolv.conf (without chroot) is messy...
    • resolv.conf may point to 127.0.0.X (for systemd-resolved / dnsmasq)
    • But a 127.0.0.X DNS server is inaccessible from network namespaces
    • We can use a bind mount to replace resolv.conf, but it is often forcibly unmounted by systemd-resolved / NetworkManager
    Solution: isolate /etc
    • Mount an empty tmpfs on /etc
    • Create the new resolv.conf on the new /etc
    • Create symlinks for the real /etc/*, except resolv.conf
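The last two steps of the "isolate /etc" solution can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation. It assumes the empty tmpfs is already mounted (here `new_etc`) and that the original /etc is still reachable at some path `real_etc`, for instance through a bind mount made beforehand. The function name and the nameserver argument are hypothetical.

```python
import os


def populate_isolated_etc(real_etc: str, new_etc: str, nameserver: str) -> None:
    """Fill new_etc with a fresh resolv.conf plus symlinks to everything else in real_etc."""
    # Step 2 on the slide: create the new resolv.conf on the new /etc.
    with open(os.path.join(new_etc, "resolv.conf"), "w") as f:
        f.write(f"nameserver {nameserver}\n")
    # Step 3 on the slide: symlink every other entry back to the real /etc.
    for name in os.listdir(real_etc):
        if name == "resolv.conf":
            continue  # replaced by the file written above
        os.symlink(os.path.join(real_etc, name), os.path.join(new_etc, name))
```

Because resolv.conf is now an ordinary file on a private tmpfs rather than a bind mount over the host's file, the host's resolver daemons have nothing to unmount.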
  • 15. Root Filesystems
    Your container root filesystem has to live somewhere. Many filesystem features used by “rootful” container runtimes aren’t available.
    • Ubuntu allows overlayfs in a user namespace, but this isn't supported upstream (due to security concerns).
    • Btrfs allows unprivileged subvolume management, but requires privileges to set it up beforehand.
    • Devicemapper is completely locked away from us.
  • 16. Root Filesystems
    A “simple” work-around is to just extract images to a directory!
    • It works ... but people want storage deduplication.
    Alternatives:
    • Reflinks to a "known good" extracted image (inode exhaustion).
      – (Usable on XFS, btrfs, ... but not the ext4 family.)
    • Unprivileged userspace overlayfs using FUSE (Linux >=4.18).
    (Container images themselves have significant flaws as well.)
  • 17. cgroups
    /sys/fs/cgroup is a roadblock to many features we want in rootless containers (accounting, pause and resume, even getting a list of PIDs!).
    • By default completely owned by root (and managed by systemd).
    There are a variety of workarounds, with various downsides:
    • cgroup namespaces (with nsdelegate) only work in cgroupv2.
    • LXC’s pam_cgfs requires installation of a PAM module (and only works for logged-in users).
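Since the nsdelegate workaround only applies to cgroup v2, a rootless tool may first want to know whether the current process sits in a v2 hierarchy at all. The sketch below parses text in the `/proc/<pid>/cgroup` format (`hierarchy-ID:controller-list:path`), where the v2 entry has ID 0 and an empty controller list. The function name is hypothetical and the check is deliberately minimal.

```python
from typing import Optional


def cgroup_v2_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from /proc/<pid>/cgroup contents, or None."""
    for line in proc_cgroup_text.splitlines():
        if not line:
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        # The unified (v2) hierarchy is listed with ID 0 and no controllers.
        if hierarchy_id == "0" and controllers == "":
            return path
    return None
```

In practice the input would come from reading `/proc/self/cgroup`; a result of `None` means only v1 hierarchies are present, so the cgroup-namespace approach is not an option.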
  • 19. runc
    Fully supported since 1.0.0-rc4 (merged March 2017).
    • Some minor features don’t work because of outside restrictions.
    • Originally only supported completely-unprivileged (no funny business) mode.
    With 1.0.0-rc5, it supports “partially privileged” mode:
    • /sys/fs/cgroup can be used if it is set up to be writable.
    • Multi-user mappings are supported if they are set up with /etc/sub[ug]id.
    CLONE_NEWCGROUP is still not supported (but nsdelegate is v2-only).
  • 20. umoci and orca-build
    umoci is the original generic OCI image manipulation tool.
    • https://github.com/openSUSE/umoci
    • Supports extraction (unpack) and layer generation (repack).
    • It has supported rootless mode since the beginning.
      – Emulates CAP_DAC_OVERRIDE with recursive chmod.
      – Supports persistent xattr-based chown(2) emulation.
    orca-build was one of the first daemon-less OCI (Dockerfile) builders.
    • Built on top of umoci, skopeo, and runc.
    • Supports rootless building, and is only 500 lines of Python.
    • There are currently plans to merge it into umoci as a contrib/ wrapper.
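The idea behind emulating CAP_DAC_OVERRIDE with chmod is that the owner of a file tree can always change its modes, so an unprivileged tool can temporarily open up a restrictive directory, do its work, and restore the mode afterwards. The following Python sketch shows this for a single directory level only. It is an illustration of the idea, not umoci's code (umoci is written in Go), and the function name is hypothetical.

```python
import os
import stat


def read_with_dac_override(path: str) -> bytes:
    """Read a file we own even if its parent directory's mode denies access."""
    parent = os.path.dirname(path)
    original_mode = stat.S_IMODE(os.stat(parent).st_mode)
    # Temporarily grant the owner rwx on the directory.
    os.chmod(parent, original_mode | stat.S_IRWXU)
    try:
        with open(path, "rb") as f:
            return f.read()
    finally:
        # Restore the original mode so the tree is left as the image defines it.
        os.chmod(parent, original_mode)
```

A full implementation would have to repeat this for every path component that blocks access, which is presumably why the slide speaks of a recursive chmod.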
  • 21. BuildKit and img
    • BuildKit: next-generation backend for `docker build`
      – Integrated into Docker since v18.06, but can also be used as a standalone daemon, with support for rootless mode
      – Uses the host network namespace at the moment
        • Not a huge problem when BuildKit itself is containerized
      – Rootless BuildKit has been used in OpenFaaS cloud
    • img: rootless and daemonless image builder based on BuildKit, by Jessie Frazelle
      – Same as BuildKit but daemonless
  • 22. Kaniko
    • Google’s unprivileged container image builder
    • Different from our approach
      – Kaniko itself needs to be executed in a container (without --privileged)
      – Dockerfile RUN instructions are executed without creating nested containers inside the Kaniko container
        • A RUN instruction gains root in the Kaniko container
    • Seems unsuitable for malicious Dockerfiles due to the lack of isolation
      – Potential cloud credential leakage: #106
  • 23. Docker (Moby) & Podman
    • Docker / Moby
      – Rootless mode is being proposed: #37375
      – Supports both slirp4netns and VPNKit for network isolation
      – Even Swarm-mode works! (except overlay networks at the moment)
    • Podman: Red Hat’s daemonless replacement for docker
      – Already supports rootless mode
      – Uses slirp4netns (Thanks Giuseppe Scrivano!)
  • 24. Kubernetes & CRI runtimes
    • kubelet, kube-proxy, and dockershim require a bunch of hacks for running without cgroups and sysctl
      – No hack needed for kube-apiserver and kube-scheduler
      – POC available; planning to propose a KEP to SIG-node soon
    • Alternative CRI runtimes:
      – CRI-O: already supports rootless mode
      – containerd: rootless mode is planned
    • TODO: stability improvements & multi-node networking
  • 25. “Usernetes”
    Experimental binary distribution of rootless Moby (Docker), CRI-O and Kubernetes, installable under $HOME without making a mess
    https://github.com/rootless-containers/usernetes
    $ tar xjvf usernetes-x86_64.tbz
    $ cd usernetes
    $ ./run.sh
    $ ./kubectl.sh run -it --image..