Talked at Container Plumbing Days about speeding up container startup by lazy pulling images on Kubernetes, containerd, BuildKit, Podman and CRI-O with eStargz and zstd:chunked.
eStargz and Stargz Snapshotter: https://github.com/containerd/stargz-snapshotter
zstd:chunked proposal: https://github.com/containers/storage/pull/775
Patch set to enable lazy pulling on Podman and CRI-O (a.k.a. Additional Layer Store): https://github.com/containers/storage/pull/795
https://github.com/containerd/stargz-snapshotter/pull/281
Starting up Containers Super Fast With Lazy Pulling of Images
1. Copyright(c)2021 NTT Corp. All Rights Reserved
Star%ng up Containers Super Fast With Lazy Pulling of Images
Kohei Tokunaga, NTT Corporation
Container Plumbing Days (March 10)
2. Copyright(c)2021 NTT Corp. All Rights Reserved
TL;DR
l Pull is one of the ?me-consuming steps in the container lifecycle
l OCI-alterna?ve but OCI-compa?ble image formats are trying to solve it by lazy pulling
• Legacy (lazy-pull-agnos?c) run?mes can run them (zstd requires recent run?mes).
• eStargz by containerd Stargz SnapshoJer subproject
• zstd:chunked discussed in Podman community
l Collabora9on in community
• Lazy pulling (eStargz) is available on: containerd, Kubernetes, BuildKit, Kaniko, go-containerregistry, nerdctl
• On-going discussion of “Addi9onal Layer Store” feature to enable lazy pulling on Podman and CRI-O
• hJps://github.com/containers/storage/pull/795
l On-going in 2021: Standardizing eStargz in OCI and improvements for stabilizing Stargz SnapshoJer
Host: EC2 Oregon (m5.2xlarge, Ubuntu 20.04)
Registry: GitHub Container Registry (ghcr.io)
Run?me: containerd + Stargz SnapshoJer (47596d3)
(See detailed info in the later slides)
[sec]
0 5 10 15 20 25 30
zstd:chunked
estargz
estargz-noopt
legacy
gcc:10.2.0 (Compiles and runs printf("hello"))
pull create run
3. Copyright(c)2021 NTT Corp. All Rights Reserved
Pull is 'me-consuming
pulling packages accounts for 76% of container start Cme,
but only 6.4% of that data is read [Harter et al. 2016]
[Harter et al. 2016] Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. "Slacker: Fast DistribuCon with
Lazy Docker Containers". 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016, Santa Clara, CA, USA
Caching images
Minimizing image size
Cold start is still slow
Not all images are minimizable
Language runtimes, frameworks, etc.
Workarounds are known but not enough
Node
Registry
Image Container
pull run
4. Copyright(c)2021 NTT Corp. All Rights Reserved
Problem on the current OCI/Docker image
sha256:deadbeaf…
sha256:1a3b5c…
sha256:ffe63c…
sha256:6ccde1…
GET /v2/<image-name>/blobs/
bin/bash
bin/ls
etc/passwd
etc/group
usr/bin/apt
layer = tarball (+compression)
A container is a set of tarball layers
A container can’t be started until the all layers become locally available
l Need to scan the entire blob even for
extracting single file entry
• If the blob is gzip-compressed, it’s
non-seekable anymore
l No parallel extraction
• Need to scan the blob from the top,
sequentially
5. Copyright(c)2021 NTT Corp. All Rights Reserved
Alterna(ve image formats: eStargz and zstd:chunked
OCI run?mes
Container Registry
lazypull
doesn’t download the entire image on pull operation but fetches necessary chunks of contents on-demand
eStargz
zstd:chunked
BuildKit
l Enable lazy pulling of container images from the registry
l Compa?ble to OCI Image Specifica?on so lazy-pulling-agnos(c run(mes s(ll can run them
• eStargz by containerd Stargz SnapshoJer subproject is compa?ble to gzip layers
• zstd:chunked discussed in Podman community is compa?ble to zstd layers
• Note: zstd requires recent run?mes (discussed later)
CRI-O
Podman
7. Copyright(c)2021 NTT Corp. All Rights Reserved
eStargz overview
l A variant of gzip compression. Compa@ble to RFC1952 (gzip)
• Proposed by containerd Stargz SnapshoGer hGps://github.com/containerd/stargz-snapshoGer
• Based on the stargz by Google CRFS (hGps://github.com/google/crfs)
• eStargz comes with content verifica@on and performance op@miza@on
l gzip-compressed layer is supported by OCI/Docker Image Spec and common in the current ecosystem
• Legacy (lazy-pulling-agnos@c) run@mes s@ll can run eStargz (but without lazy pulling)
l Lazy pulling of eStargz starts to be supported in community
• Crea@ng eStargz: Kaniko, nerdctl, go-containerregistry, etc.
• Lazy pulling of eStargz: containerd (and Kubernetes), BuildKit
• Tracking at hGps://github.com/containerd/stargz-snapshoGer/issues/258
8. Copyright(c)2021 NTT Corp. All Rights Reserved
The structure of eStargz
l Seekable: Each file payload is compressed separately so can be extracted separately
• SEll compaEble to RFC 1952 (gzip)
• Each file/chunk can be fetched from the registry on-demand, using HTTP Range Request
l Metadata (name, type, owner, …) of all files are stored to TOC (Table Of Contents) entry
• Filesystem can be mounted without fetching the enEre blob
l PrioriAzed files enables to prefetch likely accessed files
⚠ eStargz is incompaEble to stargz: “footer” is changed to make eStargz compaEble to RFC 1952
eStargz
bin/ls
usr/bin/apt
entrypoint.sh
bin/bash PrioriAzed files
Prefetched by a single HTTP Range Request
TOC(Table Of Contents) and footer
l TOC contains files metadata, offset, etc…
l footer contains offset of TOC
⚠ footer is incompaEble to stargz
Files fetched on demand
Can also be aggressively downloadeed in
background
gzip member
per regular file/chunk
Each chunk can be fetched and
extracted separately,
using HTTP Range Request of registry
stargz
bin/ls
usr/bin/apt
entrypoint.sh
bin/bash
For more details: hUps://github.com/containerd/stargz-snapshoUer/blob/master/docs/stargz-estargz.md
9. Copyright(c)2021 NTT Corp. All Rights Reserved
Workload-based Op/miza/on
proc
container
Input image Output image
Convert & Op/mize
Profile file access
l Downloading each file/chunk on-demand costs extra overhead on each file access.
l Leveraging eStargz, CLI converter command ctr-remote provides workload-based op/miza/on
• Workload: entrypoint, envvar, etc… specified in Dockerfile (e.g. ENTRYPOINT)
l ctr-remote (opRmizer tool) analyzes which files are likely accessed during runRme
• Runs provided image and profiles all file accesses
• Regards accessed files are also likely accessed during runRme (= priori/zed files)
• Stargz SnapshoTer will prefetch these files when mounts this image
eStargz
For more details: hTps://github.com/containerd/stargz-snapshoTer/blob/master/docs/ctr-remote.md
10. Copyright(c)2021 NTT Corp. All Rights Reserved
Content Verifica-on in eStargz
chunkDigest chunkDigest chunkDigest
containerd.io/snapshot/stargz/toc.digest
file/chunk data file/chunk data file/chunk data
Verified on resolve
Verified on mount
Verified on each fetch
references by digest
references by digest
OCI Manifest
TOC (metadata of all files)
l Chunks are lazily pulled from registry on-demand
• so they cannot verified when mounKng the layer
l Chunks are “lazily” verified
• TOC (metadata file) records digests per chunk
• Each chunk can be verified when it’s fetched to the node
• TOC itself is verified when mounKng that layer using the digest wriPen in the manifest
(OCI AnnotaKon)
For more details: hPps://github.com/containerd/stargz-snapshoPer/blob/master/docs/verificaKon.md
(the above figure is from this doc)
12. Copyright(c)2021 NTT Corp. All Rights Reserved
zstd:chunked overview
l A variant of zstd compression discussed in Podman community.
• Proposed by Giuseppe Scrivano, Red Hat: hHps://github.com/containers/storage/pull/775
• Fast decompression by zstd
• The idea is based on stargz (i.e. compresses each file separately)
l Our lazy pulling implementaOon for zstd:chunked
• hHps://github.com/containerd/stargz-snapshoHer/pull/281
l zstd-compressed (non-chunked) layer is start to be supported by recent runOmes
• OCI image spec supports since August 2019
• hHps://github.com/opencontainers/image-spec/pull/788
• Containerd (v1.5 beta) supports zstd compressed layers.
• CRI-O, Podman (github.com/containers/storage) supports zstd layers.
• Docker: In progress hHps://github.com/moby/moby/issues/28394
13. Copyright(c)2021 NTT Corp. All Rights Reserved
The structure of zstd:chunked
Figure is from the proposal by Giuseppe Scrivano, Red Hat:
hGps://github.com/containers/storage/pull/775
l Seekable: Each file payload is compressed separately so can be extracted separately
l Metadata (name, type, owner, …) of all files are stored to the Manifest(= TOC of eStargz/stargz)
Ø Filesystem can be mounted without fetching the enUre blob
l Manifest(TOC) and Footer are prefixed by skippable frame
Ø Invisible to zstd:chunked-agnosUc runUmes (decompressor)
l Prefetch is not supported (currently)
14. Copyright(c)2021 NTT Corp. All Rights Reserved
Content Verifica-on in zstd:chunked
digest digest digest
io.containers.zstd-chunked.manifest-checksum
File payload File payload File payload
Verified on resolve
Verified on mount
Verified on each fetch
references by digest
references by digest
OCI Manifest
TOC Manifest (metadata of all files)
l “lazy” verificaJon based on eStargz
• Manifest (TOC; metadata file) records digests of all files payload
• Each file payload can be verified when it’s fetched to the node
• TOC itself is verified when mounJng that layer using the digest wriPen in the manifest
(OCI AnnotaJon)
16. Copyright(c)2021 NTT Corp. All Rights Reserved
Time to take for container startup
l Measures the container startup ?me which includes:
• Pulling an image from GitHub Container Registry
• For language container, running “print hello world” program in the container
• For server container, wai?ng for the readiness (un?l “up and running” message is printed)
• This method is based on Hello Bench [Harter, et al. 2016]
l Takes 95 percen?le of 100 opera?ons
l Host: EC2 Oregon (m5.2xlarge, Ubuntu 20.04)
l Registry: GitHub Container Registry (ghcr.io)
l Run?me: containerd + Stargz Snapsho[er (patched for enabling zstd:chunked)
• h[ps://github.com/containerd/stargz-snapsho[er/pull/281
• Commit: 47596d33fa737ada970432938e36e9eff93c19f
[Harter et al. 2016] Tyler Harter, Brandon Salmon, Rose Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau. "Slacker: Fast Distribu?on with
Lazy Docker Containers". 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016, Santa Clara, CA, USA
17. Copyright(c)2021 NTT Corp. All Rights Reserved
Time to take for container startup
[sec]
Waits for prefetch compleAon
0 5 10 15 20 25 30
zstd:chunked
estargz
estargz-noopt
legacy
gcc:10.2.0 (Compiles and runs prinJ("hello"))
pull create run
18. Copyright(c)2021 NTT Corp. All Rights Reserved
Time to take for container startup
[sec]
Waits for prefetch compleAon
0 5 10 15 20 25
zstd:chunked
estargz
estargz-noopt
legacy
tomcat:10.0.0-jdk15-openjdk-buster(runs unAl "Server startup" is printed)
pull create run
20. Copyright(c)2021 NTT Corp. All Rights Reserved
Lazy pulling on Kubernetes (eStargz)
Nodes
Stargz
Snapsho9er
External process
gRPC API
kubelet
Lazy
pull
Container Registry
Stargz
Snapsho9er
Stargz
Snapsho9er
l Lazy pulling can be enabled on Kubernetes without patches (eStargz is only supported currently)
l containerd + Stargz SnapshoEer (discussed later) are needed on each node (CRI-O is WIP)
l Real-world use-case at CERN for speeding up analysis pipeline [1] (13x faster pull for 5GB image)
l On-going integraUon with KinD and k3s
• KinD: hEps://github.com/k3s-io/k3s/pull/2936
• k3s: hEps://github.com/kubernetes-sigs/kind/pull/2076
[1] Ricardo Rocha & Spyridon Trigazis, CERN. “Speeding Up Analysis Pipelines with Remote Container Images”. KubeCon+CloudNaUveCon 2020
NA. hEps://sched.co/ekDj
image
21. Copyright(c)2021 NTT Corp. All Rights Reserved
Lazy pulling on containerd (eStargz)
proc
container
Node
Stargz
Snapsho8er
Lazy
pull
Container Registry
Fetching files/chunks on demand MounAng rooBs as FUSE
l Stargz Snapsho?er plugin enables lazy pulling of eStargz on containerd (zstd:chunked is on-proposal)
• Stargz Snapsho?er is developed in a non-core subproject of containerd
• h?ps://github.com/containerd/stargz-snapsho?er
l Mounts rooJs snapshots from registry as FUSE and downloads accessed file contents on-demand
• Mountable from arbitrary remote store by implemenPng dedicated remote snapsho?er
l nerdctl (Docker-compaPble CLI for containerd; h?ps://github.com/AkihiroSuda/nerdctl) supports lazy
pulling of eStargz on containerd
Implemented as a
“remote snapsho?er”
image
22. Copyright(c)2021 NTT Corp. All Rights Reserved
FROM ghcr.io/stargz-containers/golang:1.15.3-esgz as dev
COPY ./hello.go /hello.go
RUN go build -o hello /hello.go
COPY and RUN without wai2ng for the pull comple2on
Container Registry
Lazy pulling on BuildKit (eStargz)
golang:1.15.3-esgz
• /usr/local/go/bin/go
• /usr/local/go/src/fmt/…
etc...
Fetch files/chunks on demand
Build on node
Lazy
pull
l BuildKit > v0.8.0 experimentally supports lazy pulling of eStargz base images during build
• FROM instrucMon is skipped and chunks are lazily pulled on-demand during COPY/RUN/etc.
l Can shorten the Mme of build e.g. on temporary (and fresh) CI instances with big base images.
l More details at blog: hVps://medium.com/nVlabs/buildkit-lazypull-66c37690963f
• speeding up building ”hello world” image from tens of seconds to a few seconds at the best
image
23. Copyright(c)2021 NTT Corp. All Rights Reserved
Lazy pulling on Podman & CRI-O (eStargz,zstd:chunked) [WIP]
l AddiFonal layer store enables lazy pulling on Podman and CRI-O
• Patches are WIP: hGps://github.com/containers/storage/pull/795
l Podman/CRI-O use layers provided by addiLonal layer store for rooNs instead of pulling the
image from the registry
Node
AddiFonal
Layer Store
Lazy
pull
Remote Store
proc
container
Provides layers
Podman, CRI-O
Using layers for rooKs
image
24. Copyright(c)2021 NTT Corp. All Rights Reserved
Discussion status of addi-onal layer store
l Our PoC of addi@onal layer store: hCps://github.com/containerd/stargz-snapshoCer/pull/281
• Lazily pulls eStargz and zstd:chunked from the standard container registries
• Mounts image layers from the registry to the node
l On-going discussions:
• chunk verifica@on, expor@ng layers from addi@onal layer store, and GC
Node
proc
container
Moun-ng layers as FUSE
Podman, CRI-O
Using layers for rooFs
Addi-onal
Layer Store
Lazy
pull
Container Registry
Fetching files/chunks on demand
image
26. Copyright(c)2021 NTT Corp. All Rights Reserved
Updates will come in 2021
Standardizing Lazy pulling
l eStargz is proposed to OCI Image Spec
l Discussion is on-going
l Backward-compaHble extensions
hKps://github.com/opencontainers/image-spec/issues/815
Enabling lazy pulling in more projects hKps://github.com/containerd/stargz-snapshoKer/issues/258
l KinD: hKps://github.com/kubernetes-sigs/kind/pull/2076
l K3s: hKps://github.com/k3s-io/k3s/pull/2936
Features and improvements for stabilizing Stargz SnapshoAer & AddiDonal Layer Store
l Less-privileged authenHcaHon on Kubernetes: hKps://github.com/containerd/containerd/issues/5105
l Higher availability of Stargz SnapshoKer (mounHng images from mulHple backend registries)
l Improvements on memory consumpHon of Stargz SnapshoKer
l Speeding up image conversion
l StaHc opHmizaHon of images
l etc…
27. Copyright(c)2021 NTT Corp. All Rights Reserved
Summary
l Pull is one of the ?me-consuming steps in the container lifecycle
l OCI-alterna?ve but OCI-compa?ble image formats are trying to solve it by lazy pulling
• Legacy (lazy-pull-agnos?c) run?mes can run them (zstd requires recent run?mes).
• eStargz by containerd Stargz SnapshoJer subproject
• zstd:chunked discussed in Podman community
l Collabora6on in community
• Lazy pulling (eStargz) is available on: containerd, Kubernetes, BuildKit, Kaniko, go-containerregistry, nerdctl
• On-going discussion of “Addi6onal Layer Store” feature to enable lazy pulling on Podman and CRI-O
• hJps://github.com/containers/storage/pull/795
l On-going in 2021: Standardizing eStargz in OCI and improvements for stabilizing Stargz SnapshoJer