containerd Deep Dive
Akihiro Suda (NTT) & Wei Fu (Alibaba Cloud)
KubeCon EU 2020 Virtual (Aug 19, 2020)
containerd overview
Akihiro Suda (NTT)
What is containerd?
● “Mid-level Container runtime”
○ Below platforms (Docker, Kubernetes)
○ Above lower level runtimes (runc)
● Resource Manager
○ Container processes
○ Image artifacts
○ Filesystem snapshots
○ Metadata and dependencies
● CNCF graduated project since February 2019
○ Following Kubernetes, Prometheus, Envoy, and CoreDNS
Highly customizable
● Runtime plugins
○ Runc, gVisor, Kata, Firecracker...
● Snapshotter plugins
○ OverlayFS, BtrFS, ZFS, …
● Content store plugins
○ Local, IPFS...
● Stream processor plugins
○ ImgCrypt, zstd...
Adoption of containerd
● Container engines
○ Docker & Moby, k3c, PouchContainer
● Kubernetes distributions
○ k3s, kubespray, microk8s, kind, minikube, Charmed Kubernetes
● Managed Kubernetes Services
○ Alibaba ACK, Amazon EKS (Fargate nodes), Azure AKS, Google GKE, IBM IKS
● And more...
Adoption of containerd
● BuildKit
○ The modern implementation of `docker build`
● LinuxKit
○ Small Linux distro with containerd as the init
● Faasd
○ OpenFaaS for containerd
● VMware Fusion Nautilus
○ containerd on macOS, using VMware as the runtime plugin
Upcoming features in v1.4
Akihiro Suda (NTT)
Lazy pulling of images
● Run containers before the image download completes
● Use cases:
○ Python/Ruby/Java/dotNET images
○ FaaS
○ Web apps with a huge number of HTML templates and media files
○ Jupyter Notebooks with big data samples included
○ Full GNOME/KDE desktop
Lazy pulling of images: Stargz & eStargz
● The containerd snapshotter plugin for Stargz & eStargz
https://github.com/containerd/stargz-snapshotter
● Stargz: seekable tar.gz for lazy-pullable container images
● eStargz: extended Stargz for batching frequently used files
● Both are fully compatible with legacy OCI tar.gz
Lazy pulling of images: Stargz & eStargz
[Figure: layout of a legacy tar.gz versus Stargz]
● legacy tar.gz: Metadata 0, File 0, Metadata 1, File 1, ..., Metadata {n-1}, File {n-1}, Footer, all compressed as a single gzip stream
○ Can’t inspect file offsets without reading the whole archive
● Stargz: each entry (Metadata i, File i) is compressed as its own gzip member, followed by a gzip member holding the metadata index (stargz.index.json) and a gzip Footer
○ Can inspect the file offsets immediately
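The per-entry gzip layout can be illustrated with the Go standard library alone. This is a minimal sketch, not the stargz-snapshotter implementation: the helper names (buildArchive, readMember, readWhole) are made up for the example, and the offsets are kept in a slice rather than in a stargz.index.json. It shows that concatenated gzip members still decompress as one legacy stream, while a reader that knows an offset can decompress a single member.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// buildArchive compresses every chunk as its own gzip member and concatenates
// the members. It returns the archive and the byte offset of each member.
func buildArchive(chunks []string) ([]byte, []int, error) {
	var buf bytes.Buffer
	offsets := make([]int, 0, len(chunks))
	for _, c := range chunks {
		offsets = append(offsets, buf.Len())
		zw := gzip.NewWriter(&buf)
		if _, err := zw.Write([]byte(c)); err != nil {
			return nil, nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, nil, err
		}
	}
	return buf.Bytes(), offsets, nil
}

// readMember decompresses only the gzip member that starts at offset.
func readMember(archive []byte, offset int) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(archive[offset:]))
	if err != nil {
		return "", err
	}
	zr.Multistream(false) // stop at the end of this member
	b, err := io.ReadAll(zr)
	return string(b), err
}

// readWhole decompresses the archive the way a legacy gzip reader would:
// all members are read back to back as one stream.
func readWhole(archive []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(archive))
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(zr)
	return string(b), err
}

func main() {
	archive, offsets, err := buildArchive([]string{"file0;", "file1;", "file2;"})
	if err != nil {
		panic(err)
	}
	one, err := readMember(archive, offsets[1])
	if err != nil {
		panic(err)
	}
	all, err := readWhole(archive)
	if err != nil {
		panic(err)
	}
	fmt.Println("single member:", one)
	fmt.Println("whole archive:", all)
}
```

In a real lazy pull the member at a known offset would be fetched with an HTTP range request instead of being sliced from memory.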
Lazy pulling of images: Stargz & eStargz
● eStargz profiles the actual file access pattern and reorders the file entries, so that relevant files can be prefetched in a single HTTP request
[Figure: file order within a layer, Stargz versus eStargz]
○ Stargz (original order): /usr/bin/apt-get, /bin/ls, /bin/vi, /lib/libc.so, /lib/libjpeg.so, /usr/bin/python3, ..., /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, /app.py
○ eStargz (reordered): /bin/ls, /app.py, /usr/bin/python3, /lib/libc.so, /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, ..., /bin/vi, /lib/libjpeg.so, /usr/bin/apt-get
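The reordering idea can be sketched as a plain list operation. This is an illustration only: the function name prioritize and the way the access profile is passed in (a slice of paths in access order) are assumptions for the example, not the eStargz optimizer itself.

```go
package main

import "fmt"

// prioritize moves the files named in accessed to the front (in access order)
// and keeps the remaining files in their original order, so the accessed
// files end up in one contiguous range at the start of the layer.
func prioritize(entries, accessed []string) []string {
	present := make(map[string]bool, len(entries))
	for _, e := range entries {
		present[e] = true
	}
	out := make([]string, 0, len(entries))
	moved := make(map[string]bool, len(accessed))
	for _, a := range accessed {
		if present[a] && !moved[a] {
			out = append(out, a)
			moved[a] = true
		}
	}
	for _, e := range entries {
		if !moved[e] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	entries := []string{"/usr/bin/apt-get", "/bin/ls", "/bin/vi", "/lib/libc.so", "/usr/bin/python3", "/app.py"}
	accessed := []string{"/bin/ls", "/app.py", "/usr/bin/python3", "/lib/libc.so"}
	for _, e := range prioritize(entries, accessed) {
		fmt.Println(e)
	}
}
```

Because the accessed files are contiguous, a single range request covering the front of the layer is enough to prefetch them.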
Lazy pulling of images: Stargz & eStargz
Yesterday’s talk
https://sched.co/ZepQ
Support for SELinux MCS on CRI mode
● MCS: multi-category security
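[Figure: two containers, both running as UID=0, are labeled with MCS categories C42 and C43; each container’s volume carries the same category as its container]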
Support for cgroup v2
● The new cgroup hierarchy, adopted by Fedora (since 31)
● Simpler layout
○ V1: /sys/fs/cgroup/{memory,cpu,devices,pids,...}/foo
○ V2: /sys/fs/cgroup/foo
● Supports eBPF integration, pressure metrics, improved OOM control...
● Friendly to non-root users
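The layout difference can be sketched in a few lines of Go. The helper names are made up for the example. The detection relies on the cgroup.controllers file, which exists at the root of a cgroup v2 unified hierarchy; hybrid setups that mount v2 elsewhere are not covered by this sketch.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"path/filepath"
)

// isCgroupV2 reports whether root looks like a cgroup v2 (unified) mount by
// checking for the cgroup.controllers file.
func isCgroupV2(root string) bool {
	_, err := os.Stat(filepath.Join(root, "cgroup.controllers"))
	return err == nil
}

// groupPath returns where a group lives: v1 has one tree per controller,
// v2 has a single tree shared by all controllers.
func groupPath(v2 bool, controller, group string) string {
	if v2 {
		return path.Join("/sys/fs/cgroup", group)
	}
	return path.Join("/sys/fs/cgroup", controller, group)
}

func main() {
	v2 := isCgroupV2("/sys/fs/cgroup")
	fmt.Println("cgroup v2:", v2)
	fmt.Println("memory settings for foo:", groupPath(v2, "memory", "foo"))
}
```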
Improved support for rootless mode
● Run containerd (and relevant components) as a non-root user
● Protect the host from potential vulnerabilities
● Adoption in containerd-related projects
○ Docker
○ BuildKit
○ k3s
○ k3c (planned)
○ Kubernetes (proposed, KEP 1371)
Improved support for rootless mode
● [v1.3] No support for resource limits (docker run --cpus … --memory ...)
○ Because unprivileged users cannot control cgroups
→ v1.4 supports resource limits (requires cgroup v2 and systemd)
● [v1.3] No support for the overlayfs snapshotter
○ Because unprivileged users cannot mount overlayfs (except on Ubuntu/Debian kernels)
○ The “native” snapshotter can be used, but it is slow and wastes disk space
→ v1.4 supports the FUSE-OverlayFS snapshotter (requires kernel >= 4.18)
Demo: Rootless Kubernetes with Cgroup v2
“Usernetes” https://github.com/rootless-containers/usernetes
https://asciinema.org/a/349859
Other changes in v1.4
● Windows CRI
● systemd NOTIFY_SOCKET
● Support reloading CNI config without restarting the daemon
● Socat binary is no longer needed
Release note: https://github.com/containerd/containerd/releases
v1.5 planning
● NRI: Node Resource Interface (#4411)
○ The new common interface for node resources such as cgroup
○ The plugin spec is very similar to CNI
● Sandbox API (#4131)
○ Pod sandbox as a first-class object
○ No “/pause” process
● Filesystem quota (#759)
containerd: external plugins
Wei Fu (Alibaba Cloud)
Backend as external plugins
● Big goal - no re-compilation required!!!
● Stream processors
● gRPC proxy plugin for image storage
● RuntimeV2 proto for OCI Runtime
Stream processor
● OCI image layer data is packaged in a tar archive
● The OCI image spec supports only a few compression algorithms
○ +gzip/+zstd, but +gzip is more common
● How to handle a stream with an experimental media type?
○ Or one used for encryption?
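[Figure: an image layer passes through a processor that yields a tar stream, which the diff service applies to a snapshot; +gzip is handled, a custom media type is the open question]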
Stream processor
● A stream processor (SP) is a binary plugin that handles a media-type stream
○ Accepts custom media types and returns another one
○ Calls a binary to convert between media types
● Example
○ containerd/imgcrypt
[Figure: stream processors are chained: Tar(+Gzip)+encrypted → SP → Tar+Gzip → SP → Tar → Diff Service → Snapshot; other custom SPs can be plugged in the same way]
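What such a converter binary does can be sketched as a filter. The sketch assumes the layer stream arrives on stdin and the converted stream is written to stdout, which matches a command such as `zstd -dcf`. It uses gzip only because it is in the Go standard library, and the helper names (process, compress, run) are made up for the example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// process reads a gzip-compressed stream from in and writes the decompressed
// bytes to out. A real processor binary would call process(os.Stdin, os.Stdout).
func process(in io.Reader, out io.Writer) error {
	zr, err := gzip.NewReader(in)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(out, zr)
	return err
}

// compress is a demo helper that produces the input a processor would receive.
func compress(s string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(s)) // writes to a bytes.Buffer do not fail
	zw.Close()
	return buf.Bytes()
}

// run feeds input through process using in-memory buffers.
func run(input []byte) (string, error) {
	var out bytes.Buffer
	err := process(bytes.NewReader(input), &out)
	return out.String(), err
}

func main() {
	plain, err := run(compress("pretend this is a tar stream"))
	if err != nil {
		panic(err)
	}
	fmt.Println(plain)
}
```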
Stream processor - Demo
● Integrate with +zstd media-type
● asciinema link
[stream_processors]
  [stream_processors."zstd"]
    accepts = ["application/vnd.oci.image.layer.v1.tar+zstd"]
    returns = "application/vnd.oci.image.layer.v1.tar"
    path = "zstd"
    args = ["-dcf"]
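The accepts/returns pairs can be read as edges that lead from a layer's media type to plain tar. The following is a sketch of that lookup under stated assumptions, not containerd's code: the processor struct, resolveChain, and the "unwrap" processor with its media type are hypothetical, and only the zstd entry mirrors the configuration above.

```go
package main

import "fmt"

// processor mirrors one [stream_processors.<id>] entry.
type processor struct {
	ID      string
	Accepts []string
	Returns string
}

const tarType = "application/vnd.oci.image.layer.v1.tar"

// find returns the first processor that accepts mediaType.
func find(procs []processor, mediaType string) (processor, bool) {
	for _, p := range procs {
		for _, a := range p.Accepts {
			if a == mediaType {
				return p, true
			}
		}
	}
	return processor{}, false
}

// resolveChain lists the processor IDs to apply, in order, to turn mediaType
// into target. It gives up when no processor matches or a cycle is detected.
func resolveChain(procs []processor, mediaType, target string) ([]string, error) {
	var chain []string
	current := mediaType
	for steps := 0; current != target; steps++ {
		if steps >= len(procs) {
			return nil, fmt.Errorf("cycle while resolving %q", mediaType)
		}
		next, ok := find(procs, current)
		if !ok {
			return nil, fmt.Errorf("no processor accepts %q", current)
		}
		chain = append(chain, next.ID)
		current = next.Returns
	}
	return chain, nil
}

func main() {
	procs := []processor{
		{ID: "zstd", Accepts: []string{tarType + "+zstd"}, Returns: tarType},
		// Hypothetical processor and media type, for illustration only.
		{ID: "unwrap", Accepts: []string{"application/x.example.tar+zstd+wrapped"}, Returns: tarType + "+zstd"},
	}
	chain, err := resolveChain(procs, "application/x.example.tar+zstd+wrapped", tarType)
	fmt.Println(chain, err)
}
```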
Snapshot proxy plugin
// Snapshot service manages snapshots
service Snapshots {
    rpc Prepare(PrepareSnapshotRequest) returns (PrepareSnapshotResponse);
    rpc View(ViewSnapshotRequest) returns (ViewSnapshotResponse);
    rpc Mounts(MountsRequest) returns (MountsResponse);
    rpc Commit(CommitSnapshotRequest) returns (google.protobuf.Empty);
    rpc Remove(RemoveSnapshotRequest) returns (google.protobuf.Empty);
    rpc Stat(StatSnapshotRequest) returns (StatSnapshotResponse);
    rpc Update(UpdateSnapshotRequest) returns (UpdateSnapshotResponse);
    rpc List(ListSnapshotsRequest) returns (stream ListSnapshotsResponse);
    rpc Usage(UsageRequest) returns (UsageResponse);
}
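The lifecycle behind Prepare, View and Commit can be sketched with a toy in-memory model. This is not the containerd snapshotter interface: the store type and its methods are made up for the example, and mounts, usage and labels are left out. The one rule it encodes is that a new snapshot may only descend from a committed parent.

```go
package main

import "fmt"

type kind int

const (
	active    kind = iota // writable, created by Prepare
	view                  // read-only, created by View
	committed             // immutable, created by Commit
)

type info struct {
	Kind   kind
	Parent string
}

// store is a toy registry of snapshots keyed by name.
type store struct{ snaps map[string]info }

func newStore() *store { return &store{snaps: map[string]info{}} }

func (s *store) create(k kind, key, parent string) error {
	if _, ok := s.snaps[key]; ok {
		return fmt.Errorf("snapshot %q already exists", key)
	}
	if parent != "" {
		if p, ok := s.snaps[parent]; !ok || p.Kind != committed {
			return fmt.Errorf("parent %q is not a committed snapshot", parent)
		}
	}
	s.snaps[key] = info{Kind: k, Parent: parent}
	return nil
}

// Prepare creates a writable snapshot on top of parent ("" means no parent).
func (s *store) Prepare(key, parent string) error { return s.create(active, key, parent) }

// View creates a read-only snapshot on top of parent.
func (s *store) View(key, parent string) error { return s.create(view, key, parent) }

// Commit turns the active snapshot key into the committed snapshot name.
func (s *store) Commit(name, key string) error {
	in, ok := s.snaps[key]
	if !ok || in.Kind != active {
		return fmt.Errorf("%q is not an active snapshot", key)
	}
	if _, exists := s.snaps[name]; exists {
		return fmt.Errorf("snapshot %q already exists", name)
	}
	delete(s.snaps, key)
	s.snaps[name] = info{Kind: committed, Parent: in.Parent}
	return nil
}

func main() {
	s := newStore()
	fmt.Println(s.Prepare("extract-1", ""))        // unpack the first layer
	fmt.Println(s.Commit("layer-1", "extract-1"))  // freeze it
	fmt.Println(s.Prepare("container", "layer-1")) // writable rootfs on top
}
```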
Snapshot proxy plugin
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	"github.com/containerd/containerd/api/services/snapshots/v1"
	"github.com/containerd/containerd/contrib/snapshotservice"
)

func main() {
	// Create a gRPC server
	rpc := grpc.NewServer()

	// CustomSnapshotter stands for your own snapshotter implementation
	sn := CustomSnapshotter()

	// Wrap the snapshotter as a gRPC service and register it
	service := snapshotservice.FromSnapshotter(sn)
	snapshots.RegisterSnapshotsServer(rpc, service)

	// Listen and serve
	l, err := net.Listen("unix", "/var/run/mysnapshotter.sock")
	if err != nil {
		log.Fatalf("error: %v\n", err)
	}
	if err := rpc.Serve(l); err != nil {
		log.Fatalf("error: %v\n", err)
	}
}
● Configure with proxy_plugins
● Example
○ stargz-snapshotter
○ CVMFS Containerd Snapshotter
[proxy_plugins]
  [proxy_plugins.customsnapshot]
    type = "snapshot"
    address = "/var/run/mysnapshotter.sock"
Runtime V2
● A first-class shim API for runtime authors to integrate with containerd
○ More VM-like runtimes have internal state and more abstract actions
○ A CLI approach introduces issues with state management
○ Each runtime has its own values, but containerd stays within a solid core scope
● Examples
○ gVisor
○ Kata Containers
○ Firecracker
Runtime V2
service Task {
    rpc State(StateRequest) returns (StateResponse);
    rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
    rpc Start(StartRequest) returns (StartResponse);
    rpc Delete(DeleteRequest) returns (DeleteResponse);
    rpc Pids(PidsRequest) returns (PidsResponse);
    rpc Pause(PauseRequest) returns (google.protobuf.Empty);
    rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
    rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
    rpc Kill(KillRequest) returns (google.protobuf.Empty);
    rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
    rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
    rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
    rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
    rpc Wait(WaitRequest) returns (WaitResponse);
    rpc Stats(StatsRequest) returns (StatsResponse);
    rpc Connect(ConnectRequest) returns (ConnectResponse);
    rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
}
Runtime V2 - Binary
● Binary naming convention
○ Name io.containerd.runc.v2 --> Binary containerd-shim-runc-v2
■ So both io.containerd.runc.v1 and io.containerd.runc.v2 are Runtime V2 shims
■ runc.v2 supports grouping several containers to use fewer resources
■ runc.v2 is the CRI plugin’s default runtime
○ Resolved via a runtime binary available in containerd’s PATH
● Required start/delete sub-commands
○ Resources created by the container are cleaned up by the delete sub-command
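The naming convention is a string mapping: the last two dot-separated parts of the runtime name become the suffix of the shim binary. A minimal sketch follows; the function name shimBinaryName is made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName maps a runtime name such as "io.containerd.runc.v2" to the
// shim binary expected in PATH, here "containerd-shim-runc-v2".
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	name, version := parts[len(parts)-2], parts[len(parts)-1]
	if name == "" || version == "" {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	return fmt.Sprintf("containerd-shim-%s-%s", name, version), nil
}

func main() {
	for _, rt := range []string{"io.containerd.runc.v1", "io.containerd.runc.v2"} {
		bin, err := shimBinaryName(rt)
		fmt.Println(rt, "-->", bin, err)
	}
}
```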
Runtime V2 - Logging
● fifo/npipe as default channel
○ Receiver consumes more resources to handle log output.
○ And it requires that the receiver is alive!!!
○ Running containers are impacted if the receiver is down for too long.
[Figure: the containerd shim writes log output to a named pipe in the kernel, and dockerd or the CRI plugin reads from it; when the receiver is down, only the shim and the named pipe remain]
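Why a missing receiver affects the writer can be sketched with an in-memory pipe. This is an analogy, not shim code: io.Pipe has no buffer at all, whereas a named pipe only starts blocking once its kernel buffer is full. The function name and the timeout are assumptions for the example.

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// writerBlocksWithoutReader reports whether a write to a pipe is still
// pending after waitMillis milliseconds while nobody reads from it.
// Afterwards it drains the pipe so the writer can finish.
func writerBlocksWithoutReader(waitMillis int) bool {
	pr, pw := io.Pipe()
	done := make(chan struct{})
	go func() {
		defer close(done)
		pw.Write([]byte("log line\n")) // blocks until a reader consumes it
		pw.Close()
	}()

	blocked := false
	select {
	case <-done:
	case <-time.After(time.Duration(waitMillis) * time.Millisecond):
		blocked = true // the "container" is stuck on its log output
	}

	io.Copy(io.Discard, pr) // the receiver comes back and drains the pipe
	<-done
	return blocked
}

func main() {
	fmt.Println("writer blocked while receiver was down:", writerBlocksWithoutReader(50))
}
```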
Runtime V2 - Logging
● Support pluggable logging via STDIO URIs
○ fifo - Linux (default)
○ npipe - Windows (default)
○ binary - Linux & Windows
○ file - Linux & Windows
STDIO URI: scheme://path?key=value
○ file: file:///var/log/cntr/hi?maxSize=100MB
○ binary: binary:///usr/bin/syslog?addr=192.168.0.3
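Such a URI splits cleanly with Go's net/url package. This is a minimal sketch: the logURI type and parseLogURI are made up for the example, and the option names (maxSize, addr) are taken from the examples above rather than from a documented list of supported keys.

```go
package main

import (
	"fmt"
	"net/url"
)

// logURI holds the three parts of a STDIO URI.
type logURI struct {
	Scheme  string
	Path    string
	Options map[string]string
}

// parseLogURI splits scheme://path?key=value and rejects unknown schemes.
func parseLogURI(raw string) (logURI, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return logURI{}, err
	}
	switch u.Scheme {
	case "fifo", "npipe", "binary", "file":
	default:
		return logURI{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	opts := map[string]string{}
	for k, v := range u.Query() {
		if len(v) > 0 {
			opts[k] = v[0] // keep the first value of each key
		}
	}
	return logURI{Scheme: u.Scheme, Path: u.Path, Options: opts}, nil
}

func main() {
	for _, raw := range []string{
		"file:///var/log/cntr/hi?maxSize=100MB",
		"binary:///usr/bin/syslog?addr=192.168.0.3",
	} {
		u, err := parseLogURI(raw)
		fmt.Println(u.Scheme, u.Path, u.Options, err)
	}
}
```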
Thank you

Más contenido relacionado

La actualidad más candente

Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
Brendan Gregg
 

La actualidad más candente (20)

Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Velocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPFVelocity 2017 Performance analysis superpowers with Linux eBPF
Velocity 2017 Performance analysis superpowers with Linux eBPF
 
Linux Profiling at Netflix
Linux Profiling at NetflixLinux Profiling at Netflix
Linux Profiling at Netflix
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
Blazing Performance with Flame Graphs
Blazing Performance with Flame GraphsBlazing Performance with Flame Graphs
Blazing Performance with Flame Graphs
 
An overview of the Kubernetes architecture
An overview of the Kubernetes architectureAn overview of the Kubernetes architecture
An overview of the Kubernetes architecture
 
Docker, LinuX Container
Docker, LinuX ContainerDocker, LinuX Container
Docker, LinuX Container
 
Nginx Reverse Proxy with Kafka.pptx
Nginx Reverse Proxy with Kafka.pptxNginx Reverse Proxy with Kafka.pptx
Nginx Reverse Proxy with Kafka.pptx
 
CD using ArgoCD(KnolX).pdf
CD using ArgoCD(KnolX).pdfCD using ArgoCD(KnolX).pdf
CD using ArgoCD(KnolX).pdf
 
eBPF - Observability In Deep
eBPF - Observability In DeepeBPF - Observability In Deep
eBPF - Observability In Deep
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Kubernetes Introduction
Kubernetes IntroductionKubernetes Introduction
Kubernetes Introduction
 
Intro to kubernetes
Intro to kubernetesIntro to kubernetes
Intro to kubernetes
 
Introduction to Git and GitHub
Introduction to Git and GitHubIntroduction to Git and GitHub
Introduction to Git and GitHub
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 
Containerd + buildkit breakout
Containerd + buildkit breakoutContainerd + buildkit breakout
Containerd + buildkit breakout
 
MySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & GrafanaMySQL Monitoring using Prometheus & Grafana
MySQL Monitoring using Prometheus & Grafana
 

Similar a [KubeCon EU 2020] containerd Deep Dive

A Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPFA Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPF
oholiab
 
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
DevSecCon
 
LXC Docker and the Future of Software Delivery
LXC Docker and the Future of Software DeliveryLXC Docker and the Future of Software Delivery
LXC Docker and the Future of Software Delivery
Docker, Inc.
 

Similar a [KubeCon EU 2020] containerd Deep Dive (20)

Extended and embedding: containerd update & project use cases
Extended and embedding: containerd update & project use casesExtended and embedding: containerd update & project use cases
Extended and embedding: containerd update & project use cases
 
Let's Containerize New York with Docker!
Let's Containerize New York with Docker!Let's Containerize New York with Docker!
Let's Containerize New York with Docker!
 
OSDC 2016 | rkt and Kubernetes: What’s new with Container Runtimes and Orches...
OSDC 2016 | rkt and Kubernetes: What’s new with Container Runtimes and Orches...OSDC 2016 | rkt and Kubernetes: What’s new with Container Runtimes and Orches...
OSDC 2016 | rkt and Kubernetes: What’s new with Container Runtimes and Orches...
 
OSDC 2016 - rkt and Kubernentes what's new with Container Runtimes and Orches...
OSDC 2016 - rkt and Kubernentes what's new with Container Runtimes and Orches...OSDC 2016 - rkt and Kubernentes what's new with Container Runtimes and Orches...
OSDC 2016 - rkt and Kubernentes what's new with Container Runtimes and Orches...
 
containerd and CRI
containerd and CRIcontainerd and CRI
containerd and CRI
 
Making your app soar without a container manifest
Making your app soar without a container manifestMaking your app soar without a container manifest
Making your app soar without a container manifest
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy PullingFaster Container Image Distribution on a Variety of Tools with Lazy Pulling
Faster Container Image Distribution on a Variety of Tools with Lazy Pulling
 
Настройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'aНастройка окружения для кросскомпиляции проектов на основе docker'a
Настройка окружения для кросскомпиляции проектов на основе docker'a
 
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special EditionIntroduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
Introduction to Docker, December 2014 "Tour de France" Bordeaux Special Edition
 
KubeCon EU 2016: "rktnetes": what's new with container runtimes and Kubernetes
KubeCon EU 2016: "rktnetes": what's new with container runtimes and KubernetesKubeCon EU 2016: "rktnetes": what's new with container runtimes and Kubernetes
KubeCon EU 2016: "rktnetes": what's new with container runtimes and Kubernetes
 
A Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPFA Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPF
 
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
DevSecCon London 2019: A Kernel of Truth: Intrusion Detection and Attestation...
 
Docker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los AngelesDocker 0.11 at MaxCDN meetup in Los Angeles
Docker 0.11 at MaxCDN meetup in Los Angeles
 
LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013LXC, Docker, and the future of software delivery | LinuxCon 2013
LXC, Docker, and the future of software delivery | LinuxCon 2013
 
LXC Docker and the Future of Software Delivery
LXC Docker and the Future of Software DeliveryLXC Docker and the Future of Software Delivery
LXC Docker and the Future of Software Delivery
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
 
Enabling Security via Container Runtimes
Enabling Security via Container RuntimesEnabling Security via Container Runtimes
Enabling Security via Container Runtimes
 
Docker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12xDocker and-containers-for-development-and-deployment-scale12x
Docker and-containers-for-development-and-deployment-scale12x
 
Intro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps WorkshopIntro to Kubernetes & GitOps Workshop
Intro to Kubernetes & GitOps Workshop
 

Más de Akihiro Suda

Más de Akihiro Suda (20)

20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_20240321 [KubeCon EU Pavilion] Lima.pdf_
20240321 [KubeCon EU Pavilion] Lima.pdf_
 
20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf20240320 [KubeCon EU Pavilion] containerd.pdf
20240320 [KubeCon EU Pavilion] containerd.pdf
 
20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf20240201 [HPC Containers] Rootless Containers.pdf
20240201 [HPC Containers] Rootless Containers.pdf
 
[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman[Podman Special Event] Kubernetes in Rootless Podman
[Podman Special Event] Kubernetes in Rootless Podman
 
[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion[KubeConNA2023] Lima pavilion
[KubeConNA2023] Lima pavilion
 
[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion[KubeConNA2023] containerd pavilion
[KubeConNA2023] containerd pavilion
 
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
[DockerConハイライト] OpenPubKeyによるイメージの署名と検証.pdf
 
[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2[CNCF TAG-Runtime] Usernetes Gen2
[CNCF TAG-Runtime] Usernetes Gen2
 
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
[DockerCon 2023] Reproducible builds with BuildKit for software supply chain ...
 
The internals and the latest trends of container runtimes
The internals and the latest trends of container runtimesThe internals and the latest trends of container runtimes
The internals and the latest trends of container runtimes
 
[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion[KubeConEU2023] Lima pavilion
[KubeConEU2023] Lima pavilion
 
[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion[KubeConEU2023] containerd pavilion
[KubeConEU2023] containerd pavilion
 
[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?[Container Plumbing Days 2023] Why was nerdctl made?
[Container Plumbing Days 2023] Why was nerdctl made?
 
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
[FOSDEM2023] Bit-for-bit reproducible builds with Dockerfile
 
[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima[CNCF TAG-Runtime 2022-10-06] Lima
[CNCF TAG-Runtime 2022-10-06] Lima
 
Dockerからcontainerdへの移行
Dockerからcontainerdへの移行Dockerからcontainerdへの移行
Dockerからcontainerdへの移行
 
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
[Paris Container Day 2021] nerdctl: yet another Docker & Docker Compose imple...
 
[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10[Docker Tokyo #35] Docker 20.10
[Docker Tokyo #35] Docker 20.10
 
[KubeCon EU 2021] Introduction and Deep Dive Into Containerd
[KubeCon EU 2021] Introduction and Deep Dive Into Containerd[KubeCon EU 2021] Introduction and Deep Dive Into Containerd
[KubeCon EU 2021] Introduction and Deep Dive Into Containerd
 

Último

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Último (20)

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 

[KubeCon EU 2020] containerd Deep Dive

  • 1. containerd Deep Dive Akihiro Suda (NTT) & Wei Fu (Alibaba Cloud) KubeCon EU 2020 Virtual (Aug 19, 2020)
  • 3. What is ? ● “Mid-level Container runtime” ○ Below platforms (Docker, Kubernetes) ○ Above lower level runtimes (runc) ● Resource Manager ○ Container processes ○ Image artifacts ○ Filesystem snapshots ○ Metadata and dependencies ● CNCF graduated project since February 2019 ○ Following Kubernetes, Prometheus, Envoy, and CoreDNS
  • 4.
  • 5. Highly customizable ● Runtime plugins ○ Runc, gVisor, Kata, Firecracker... ● Snapshotter plugins ○ OverlayFS, BtrFS, ZFS, … ● Content store plugins ○ Local, IPFS... ● Stream processor plugins ○ ImgCrypt, zstd...
  • 6. Adoption of containerd ● Container engines ● Kubernetes distributions ● Managed Kubernetes Services Docker & Moby k3c PouchContainer k3s kubespray microk8s Alibaba ACK Amazon EKS (Fargate nodes) Azure AKS Google GKE IBM IKS kind minikube Charmed Kubernetes And more...
  • 7. Adoption of containerd ● BuildKit ○ The modern implementation of `docker build` ● LinuxKit ○ Small Linux distro with containerd as the init ● Faasd ○ OpenFaaS for containerd ● VMware Fusion Nautilus ○ containerd on macOS, using VMware as the runtime plugin
  • 8. Upcoming features in v1.4 Akihiro Suda (NTT)
  • 9. Lazy pulling of images ● Run containers before completion of downloading the images ● Use cases: ○ Python/Ruby/Java/dotNET images ○ FaaS ○ Web apps with huge amount of HTML templates and media files ○ Jupyter Notebooks with big data samples included ○ Full GNOME/KDE desktop
  • 10. Lazy pulling of images: Stargz & eStargz ● The containerd snapshotter plugin for Stargz & eStargz https://github.com/containerd/stargz-snapshotter ● Stargz: seekable tar.gz for lazy-pullable container images ● eStargz: extended Stargz for batching frequently used files ● Both are fully compatible with legacy OCI tar.gz
  • 11. Lazy pulling of images: Stargz & eStargz Metadata 0 File 0 Metadata 1 File 1 Metadata {n-1} File {n-1} Footer ... gzip legacy tar.gz Stargz Metadata 0 File 0 gzip Metadata 1 File 1 gzip ... Metadata {n-1} File {n-1} gzip gzip Footer Metadata stargz.index.json Can’t inspect file offsets without reading the whole archive Can inspect the file offsets immediately
  • 12. Lazy pulling of images: Stargz & eStargz ● eStargz profiles the actual file access pattern and reorders the file entries, so that relevant files can be prefetched in a single HTTP request /usr/bin/apt-get /bin/ls /bin/vi /lib/libc.so /lib/libjpeg.so /usr/bin/python3 .../usr/lib/python3/.../foo /usr/lib/python3/.../bar /app.py /bin/ls /app.py /usr/bin/python3 /lib/libc.so /usr/lib/python3/.../foo /usr/lib/python3/.../bar .../bin/vi /lib/libjpeg.so /usr/bin/apt-get Stargz eStargz
  • 13. Lazy pulling of images: Stargz & eStargz
  • 14. Lazy pulling of images: Stargz & eStargz Yesterday’s talk https://sched.co/ZepQ
  • 15. Support for SELinux MCS on CRI mode ● MCS: multi-category security Containers Volumes UID=0 C42 UID=0 C42 UID=0 C43 UID=0 C43
  • 16. Support for cgroup v2 ● The new cgroup hierarchy, adopted by Fedora (since 31) ● Simpler layout ○ v1: /sys/fs/cgroup/{memory,cpu,devices,pids,...}/foo ○ v2: /sys/fs/cgroup/foo ● Supports eBPF integration, pressure metrics, improved OOM control... ● Friendly to non-root users
  • 17. Improved support for rootless mode ● Run containerd (and relevant components) as a non-root user ● Protect the host from potential vulnerabilities ● Adoption in containerd-related projects ○ Docker ○ BuildKit ○ k3s ○ k3c (planned) ○ Kubernetes (proposed, KEP 1371)
  • 18. Improved support for rootless mode ● [v1.3] No support for resource limitation (docker run --cpus … --memory ...) ○ Because unprivileged users cannot control cgroups ● [v1.3] No support for overlayfs snapshotter ○ Because unprivileged users cannot mount overlayfs (except on Ubuntu/Debian kernels) ○ The “native” snapshotter can be used, but it is slow and wastes disk space
  • 19. Improved support for rootless mode (continued) → v1.4 supports resource limitation (requires cgroup v2 and systemd) → v1.4 supports FUSE-OverlayFS snapshotter (requires kernel >= 4.18)
  • 20. Demo: Rootless Kubernetes with Cgroup v2 “Usernetes” https://github.com/rootless-containers/usernetes https://asciinema.org/a/349859
  • 21. Other changes in v1.4 ● Windows CRI ● systemd NOTIFY_SOCKET ● Support reloading CNI config without restarting the daemon ● Socat binary is no longer needed Release note: https://github.com/containerd/containerd/releases
  • 22. v1.5 planning ● NRI: Node Resource Interface (#4411) ○ The new common interface for node resources such as cgroup ○ The plugin spec is very similar to CNI ● Sandbox API (#4131) ○ Pod sandbox as a first-class object ○ No “/pause” process ● Filesystem quota (#759)
  • 23. containerd: external plugins Wei Fu (Alibaba Cloud)
  • 25. Backend as external plugins ● Big goal - no re-compilation required!!! ● Stream processors ● gRPC proxy plugin for image storage ● RuntimeV2 proto for OCI Runtime
  • 26. Stream processor ● OCI image layer data is packaged as a tar archive ● The OCI image spec supports only a few compression algorithms ○ +gzip/+zstd, but +gzip is more common ● How to handle a stream with an experimental media type? ○ Or an encrypted one? ● [Diagram] Image Layer → Diff Service (stream processor: Tar, +gzip, custom?) → Snapshot
  • 27. Stream processor ● A stream processor (SP) is a binary plugin that handles a media-type stream ○ Accepts custom media types and returns another media type ○ containerd calls the binary as a media-type converter ● Example ○ containerd/imgcrypt ● [Diagram] Image Layer → Diff Service (SPs: Tar, Tar+Gzip, Tar(+Gzip)+encrypted, other custom SP) → Snapshot
  • 28. Stream processor - Demo ● Integrate with +zstd media-type ● asciinema link
        [stream_processors]
          [stream_processors."zstd"]
            accepts = ["application/vnd.oci.image.layer.v1.tar+zstd"]
            returns = "application/vnd.oci.image.layer.v1.tar"
            path = "zstd"
            args = ["-dcf"]
  • 29. Snapshot proxy plugin
        // Snapshot service manages snapshots
        service Snapshots {
            rpc Prepare(PrepareSnapshotRequest) returns (PrepareSnapshotResponse);
            rpc View(ViewSnapshotRequest) returns (ViewSnapshotResponse);
            rpc Mounts(MountsRequest) returns (MountsResponse);
            rpc Commit(CommitSnapshotRequest) returns (google.protobuf.Empty);
            rpc Remove(RemoveSnapshotRequest) returns (google.protobuf.Empty);
            rpc Stat(StatSnapshotRequest) returns (StatSnapshotResponse);
            rpc Update(UpdateSnapshotRequest) returns (UpdateSnapshotResponse);
            rpc List(ListSnapshotsRequest) returns (stream ListSnapshotsResponse);
            rpc Usage(UsageRequest) returns (UsageResponse);
        }
  • 30. Snapshot proxy plugin
        package main

        import (
            "log"
            "net"

            "google.golang.org/grpc"

            snapshotsapi "github.com/containerd/containerd/api/services/snapshots/v1"
            "github.com/containerd/containerd/contrib/snapshotservice"
        )

        func main() {
            // Create a gRPC server
            rpc := grpc.NewServer()

            // CustomSnapshotter is a placeholder for your own
            // snapshots.Snapshotter implementation
            sn := CustomSnapshotter()

            // Convert the snapshotter to a gRPC service and register it
            service := snapshotservice.FromSnapshotter(sn)
            snapshotsapi.RegisterSnapshotsServer(rpc, service)

            // Listen and serve
            l, err := net.Listen("unix", "/var/run/mysnapshotter.sock")
            if err != nil {
                log.Fatalf("error: %v\n", err)
            }
            if err := rpc.Serve(l); err != nil {
                log.Fatalf("error: %v\n", err)
            }
        }
    ● Configure with proxy_plugins ● Example ○ stargz-snapshotter ○ CVMFS Containerd Snapshotter
        [proxy_plugins]
          [proxy_plugins.customsnapshot]
            type = "snapshot"
            address = "/var/run/mysnapshotter.sock"
  • 31. Runtime V2 ● A first-class shim API for runtime authors to integrate with containerd ○ VM-like runtimes have more internal state and more abstract actions ○ A CLI approach introduces issues with state management ○ Each runtime has its own value, while containerd keeps to a solid core scope ● Example ○ gVisor ○ Kata Containers ○ Firecracker
  • 32. Runtime V2
        service Task {
            rpc State(StateRequest) returns (StateResponse);
            rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
            rpc Start(StartRequest) returns (StartResponse);
            rpc Delete(DeleteRequest) returns (DeleteResponse);
            rpc Pids(PidsRequest) returns (PidsResponse);
            rpc Pause(PauseRequest) returns (google.protobuf.Empty);
            rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
            rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
            rpc Kill(KillRequest) returns (google.protobuf.Empty);
            rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
            rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
            rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
            rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
            rpc Wait(WaitRequest) returns (WaitResponse);
            rpc Stats(StatsRequest) returns (StatsResponse);
            rpc Connect(ConnectRequest) returns (ConnectResponse);
            rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
        }
  • 33. Runtime V2 - Binary ● Binary naming convention ○ Name io.containerd.runc.v2 --> Binary containerd-shim-runc-v2 ■ So both io.containerd.runc.v1 and io.containerd.runc.v2 are runtime V2 ■ runc.v2 supports grouping several containers, using fewer resources ■ runc.v2 is the CRI plugin’s default runtime ○ Resolved via a runtime binary available in containerd’s PATH ● Required start/delete sub-commands ○ Resources created by a container are cleaned up by the delete sub-command
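The naming convention above can be written as a small Go function. This is a sketch under the assumption that only the last two dot-separated components of the runtime name matter, as the slide's example suggests; `shimBinaryName` is a hypothetical name, not containerd's own function.

```go
package main

import (
	"fmt"
	"strings"
)

// shimBinaryName maps a runtime name such as "io.containerd.runc.v2" to the
// shim binary name "containerd-shim-runc-v2" using its last two components.
func shimBinaryName(runtime string) (string, error) {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	name, version := parts[len(parts)-2], parts[len(parts)-1]
	if name == "" || version == "" {
		return "", fmt.Errorf("invalid runtime name %q", runtime)
	}
	return "containerd-shim-" + name + "-" + version, nil
}

func main() {
	for _, rt := range []string{"io.containerd.runc.v1", "io.containerd.runc.v2"} {
		bin, err := shimBinaryName(rt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(rt, "-->", bin)
	}
}
```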
  • 34. Runtime V2 - Logging ● fifo/npipe as default channel ○ Receiver consumes more resources to handle log output. ● [Diagram] shim → Named Pipe (kernel) → containerd → dockerd / CRI-plugin
  • 35. Runtime V2 - Logging ● fifo/npipe as default channel ○ Receiver consumes more resources to handle log output. ○ It also requires the receiver to stay alive. ○ Running containers are affected if the receiver is down for too long. ● [Diagram] shim → Named Pipe (kernel) → containerd
  • 36. Runtime V2 - Logging ● Support pluggable logging via STDIO URIs ○ fifo - Linux (default) ○ npipe - Windows (default) ○ binary - Linux & Windows ○ file - Linux & Windows ● STDIO URI format: scheme://path?key=value ○ file: file:///var/log/cntr/hi?maxSize=100MB ○ binary: binary:///usr/bin/syslog?addr=192.168.0.3