A hands-on workshop that covers 18 best practices in 4 categories, or in other words, ✅ Dos & 🚫 Don'ts.
After a general introduction, we will have a look at the essential practices (aka must do), then move to the image practices, then we will go through the security practices, and finally, some general practices.
Please note, this workshop assumes that you have a basic knowledge of Docker.
Hands-on repo:
https://github.com/aabouzaid/docker-best-practices-workshop
1. Docker Best Practices Workshop
How to work effectively with Docker
Ahmed AbouZaid, DevOps Engineer, Camunda
21.09.2021
2. About
Ahmed AbouZaid
A passionate DevOps engineer, Cloud/Kubernetes specialist, Free/Open source geek, and an author.
• I believe in self CI/CD (Continuous Improvements/Development) and that “The whole is greater than the sum of its parts”
• DevOps transformation, automation, data, and metrics are my preferred areas
• And I like to help both businesses and people to grow
Find me at:
tech.aabouzaid.com | linkedin.com/in/aabouzaid
September 2021, Kayaking in the Spree
✅ Do Kayaking 🚫 Don’t sit like that!
3. Content
Quick Introduction
Essential Practices
• Use Dockerfile linter
• Check Docker language-specific best practices
• Create a single application per Docker image
• Create configurable ephemeral containers
Image Practices
• Understanding Docker image
• Use optimal base image
• Pin versions everywhere
• Create image with the optimal size
• Use multi-stage whenever possible
• Avoid any unnecessary files
Security Practices
• Always use trusted images
• Never use untrusted resources
• Never store sensitive data in the image
• Use a non-root user
• Scan image vulnerabilities
Misc Practices
• Leverage Docker build cache
• Avoid system cache
• Create a unified image across envs
• Use ENTRYPOINT with CMD
Next steps
5. Overview
In this workshop, in a hands-on approach, we will cover 18 best practices in 4 categories, or in other words, ✅ Dos & 🚫 Don'ts.
After a general introduction, we will have a look at the essential practices (aka must do), then move to the image practices, then we will go through the security practices, and finally, some general practices.
Please note, this workshop assumes that you have a basic knowledge of Docker.
Timeline
• 30 min: Review the best practices
• 10 min: Questions
• 10 min: Break
• 20 min: Apply the best practices
• 20 min: Discussion
7. Containers, Docker, and Kubernetes
Containers
Technology for packaging an application along with its runtime dependencies
Docker
The de facto standard for building and sharing containerized apps
Kubernetes
A cloud-native platform to manage and orchestrate container workloads
Image: o_m/Shutterstock
8. Dockerfile, Docker Image, and Docker Container
Dockerfile
A text file containing a set of instructions that is used to build a Docker image
Docker Image
A set of filesystem layers stacked on top of each other to form a usable, customizable image
Docker Container
A runtime instance of a Docker image
10. 1.1 Use Dockerfile linter
• First things first, use a Dockerfile linter! Use hadolint!
• It will help you apply best practices by default
• By using hadolint, you will avoid a large share of common Dockerfile issues (at least 50%, as a rough estimate)
• Use it via CLI or integrate it with your IDE, e.g. the VS Code hadolint extension
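To give a feel for the kind of rule a Dockerfile linter enforces, here is a rough sketch of a single check written as a shell function. It is not how hadolint is implemented, and hadolint covers far more cases. The function name find_unpinned_bases is made up for this illustration. It prints every base image that has no tag or uses the “latest” tag.

```shell
#!/bin/sh
# Toy lint rule (NOT hadolint's implementation): print every base image in a
# Dockerfile that has no tag or uses the "latest" tag.
find_unpinned_bases() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      if ($i ~ /^--platform=/) i++          # skip an optional --platform flag
      image = $i
      is_stage = (image in stages)          # "FROM builder" may name an earlier stage
      if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1
      if (is_stage || image == "scratch" || image ~ /@sha256:/) next
      n = split(image, parts, "/")          # a tag can only follow the last "/"
      if (parts[n] !~ /:/ || parts[n] ~ /:latest$/) print image
    }
  ' "$1"
}
```

Running find_unpinned_bases Dockerfile in a repository prints nothing when every base image is pinned. In practice, a real linter in the IDE and in CI is the better choice.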
11. 1.2 Check Docker language-specific best practices
• There are general Docker best practices that work for all languages
• Usually each language group (e.g., interpreted, native, JVM) has common best practices
• Some languages have their own best practices
• Check if the language that you use has language-specific best practices
12. 1.3 Create a single application per Docker image
• A Docker image with a single application is more:
  • Maintainable
  • Scalable
  • Secure
  • Reusable
  • Portable
• Multiple processes within a container are usually a nightmare in development as well as in operations
Image: Docker.com - What is a Container?
13. 1.4 Create configurable ephemeral containers
• “An ephemeral container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum set up and configuration”
• Avoid dynamic configuration at runtime whenever possible
• Set configuration defaults, but don’t store environment-specific configuration in the image
• Follow “The Twelve-Factor App” methodology as much as possible
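As a minimal sketch of “defaults in the image, environment-specific values injected at run time”, an entrypoint script could load its settings like this. The function name load_config and the variables LOG_LEVEL, APP_PORT, and DATABASE_URL are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical entrypoint helper: every setting comes from the environment,
# so the same image can run unchanged in any environment.
load_config() {
  # Safe defaults that are baked into the image.
  : "${LOG_LEVEL:=info}"
  : "${APP_PORT:=8080}"
  # Environment-specific values have no default and must be injected at run time.
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is required" >&2
    return 1
  fi
  export LOG_LEVEL APP_PORT DATABASE_URL
}
```

With this approach, a container is configured by passing env vars when it starts (for example with the -e option of docker run), and it can be destroyed and recreated at any time without extra setup.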
15. Understanding Docker image
• A Docker image is made of layers
• Docker image layers are immutable (read-only)
• Each instruction in the Dockerfile adds a layer to the Docker image
• Previous layers cannot be changed by later instructions
• Removing files from a previous layer only hides them; they are still there
ℹ Note
Only “ADD”, “COPY”, and “RUN” can create filesystem layers (which increase image size)
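The “hidden but still there” behaviour can be illustrated without Docker by treating two layers as plain tar archives. This is a toy model under simplifying assumptions: the file name token.txt is made up, and the “.wh.” prefix follows the whiteout convention used by OCI image layers to mark a deleted file.

```shell
#!/bin/sh
# Toy model of two image layers as plain tar archives (an illustration only,
# not a real image build).
work="$(mktemp -d)"
mkdir "$work/layer1" "$work/layer2"

# Layer 1: an instruction adds a file.
echo "top_secret" > "$work/layer1/token.txt"
tar -cf "$work/layer1.tar" -C "$work/layer1" .

# Layer 2: a later "rm token.txt" only records a whiteout marker.
: > "$work/layer2/.wh.token.txt"
tar -cf "$work/layer2.tar" -C "$work/layer2" .

# The merged filesystem no longer shows the file, but layer 1 still ships with
# the image, so its content can be read back directly.
recovered="$(tar -xOf "$work/layer1.tar" ./token.txt)"
echo "recovered from layer 1: $recovered"
```

The same reasoning explains why deleting a large file in a later instruction does not shrink the image: the earlier layer is unchanged.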
16. 2.1 Use optimal base image
• Use official images or images from well-known identities
• Use the smallest base image that fits your use case
• Avoid using generic images when good language-specific images are available
✅ Do
FROM python:3.8.10-alpine3.14
🚫 Don’t
FROM alpine:3.14
RUN apk add 'python3=3.8.10-r0'
17. 2.2 Pin versions everywhere
• Never use a base image without a tag or with the ‘latest’ tag
• Avoid pinning only to the major version
• In most cases, pinning the minor version should be fine
• Pin up to the patch version for critical components
• Also pin the versions of the dependencies
✅ Do
FROM python:3.8
RUN pip install Flask==2.0.0
🚫 Don’t
FROM python
RUN pip install Flask
18. 2.3 Create image with the optimal size
• As a rule of thumb, smaller Docker images are better
• However, be aware that:
  • A base image that is too small means an increase in the build time (CI)
  • A base image that is too big means an increase in the deploy time (CD)
• Try to find the sweet spot that balances build and deploy time according to your needs and use cases
✅ Do (or not)
FROM node:14.17.6-alpine3.14
RUN apk add --no-cache curl
Build time: 2s (3 builds avg, no layers cache)
Image size: 120MB
🚫 Don’t
FROM alpine:3.14
RUN apk add --no-cache 'nodejs=14.17.6-r0' curl
Build time: 6s (3 builds avg, no layers cache)
Image size: 46.3MB
19. 2.4 Use multi-stage whenever possible
• The multi-stage feature allows you to build smaller and cleaner images by splitting the build image from the runtime image
• It’s super useful for languages that create artifacts like Golang, Java, etc.
• It’s also helpful for running various tests during development
• Additionally, it’s better for security because it reduces the attack surface
✅ Do
# Build stage.
FROM maven:3.6-openjdk-17 AS builder
[...]
RUN mvn clean package
# Runtime stage.
FROM openjdk:17-jdk-alpine3.14
COPY --from=builder /myapp.jar /opt/
ENTRYPOINT ["java", "-jar", "/opt/myapp.jar"]
20. 2.5 Avoid any unnecessary files
• Every extra file could increase build time, image size, or even both!
• Specify the files and paths that need to be part of the image
• Use “.dockerignore” to filter out any unnecessary files
• If necessary, restructure your repo/code to have only the needed files in separate folders
✅ Do
FROM python
# Only needed files are added to the image
COPY myapp.py /opt
ENTRYPOINT ["python", "/opt/myapp.py"]
🚫 Don’t
FROM python
# The whole repo/context is added to the image
COPY . /opt
ENTRYPOINT ["python", "/opt/myapp.py"]
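Since the slide mentions “.dockerignore” without showing one, here is a minimal sketch that writes an allow-list style file: everything is excluded from the build context, then only the needed file is re-included. PROJECT_DIR is a hypothetical variable used for this illustration, and myapp.py is the file from the example above.

```shell
#!/bin/sh
# Writes an allow-list style .dockerignore into the project directory
# (PROJECT_DIR is hypothetical; it defaults to the current directory).
project_dir="${PROJECT_DIR:-.}"
cat > "$project_dir/.dockerignore" <<'EOF'
# Exclude everything from the build context by default...
*
# ...then re-include only the files the image needs.
!myapp.py
EOF
```

An allow-list keeps the context small even when new files are added to the repo later. A deny-list (listing only what to exclude) is the more common alternative and may fit larger projects better.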
22. 3.1 Always use trusted images
• Use images from trusted repositories
• Use official images whenever possible
• If there is no official image, use only images from well-known identities
• For critical components, don’t use public Docker repositories
• Sign your images with Docker Content Trust (DCT)
✅ Do
FROM openjdk:12
🚫 Don’t
FROM coolestGuyInTheTown/openjdk:12
23. 3.2 Never use untrusted resources
• Using a trusted image doesn’t help if untrusted resources are used in the image itself
• Always use resources from trusted sources
• When a Git resource is used, always pin it to a commit hash, because Git tags and branches are mutable
• In general, try to minimize the number of external resources used in the image
✅ Do
FROM alpine
# You know exactly what you get
ARG HELPER_SCRIPT_URL=https://raw.githubusercontent.com/trusted-user/awesome-scripts/5330224/some-helper-script.sh
# Or better:
COPY scripts/some-helper-script.sh /tmp
🚫 Don’t
FROM alpine
# The resource could be changed anytime!
ARG HELPER_SCRIPT_URL=https://raw.githubusercontent.com/random-user/awesome-scripts/master/some-helper-script.sh
24. 3.3 Never store sensitive data in the image
• Any data saved in one of the layers cannot be removed in a later layer! It will only be hidden and could be easily retrieved
• For runtime secrets, use env vars to access the sensitive data
• For build-time secrets, use Docker BuildKit, which lets you access sensitive data securely during the build (never use ARG for build-time secrets)
✅ Do
# The secret is mounted as a file under /run/secrets/ for this RUN only;
# the token is removed from the npm config before the layer is committed
RUN --mount=type=secret,id=GITHUB_NPM_TOKEN \
    npm set //npm.pkg.github.com/:_authToken "$(cat /run/secrets/GITHUB_NPM_TOKEN)" \
    && npm install \
    && npm config delete //npm.pkg.github.com/:_authToken
$ export GITHUB_NPM_TOKEN=top_secret
$ export DOCKER_BUILDKIT=1
$ docker build --secret id=GITHUB_NPM_TOKEN,env=GITHUB_NPM_TOKEN .
🚫 Don’t
# This file will be stored in the image
COPY .npmrc .
RUN npm install && rm .npmrc
# Also build args will be stored in the image
ARG GITHUB_NPM_TOKEN
RUN npm set //npm.pkg.github.com/:_authToken \
    $GITHUB_NPM_TOKEN && npm install
25. 3.4 Use a non-root user
• By default, Docker will use “root” to execute the container commands
• Using the root user is a bad practice and considered a security risk
• Always (or whenever possible) set the “USER” instruction to a non-root user
• Remember that the user must already exist in the Docker image system to be used with the “USER” instruction
✅ Do
FROM alpine
USER nobody
CMD ["whoami"]
Output: nobody
🚫 Don’t
FROM alpine
# The root user will be used to execute commands
CMD ["whoami"]
Output: root
26. 3.5 Scan image vulnerabilities
• Docker image vulnerability scanning tools mainly aim to detect known vulnerabilities in the image’s packages and libraries
• There are many solutions and tools like Trivy and Snyk, and some are integrated with cloud registries like GCR (Google Container Registry)
• Scan your images during development as well as in production
• Depending on your use case, scan your images with every build or at least daily
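As one possible way to scan “with every build”, a CI step could wrap the scanner in a small function that fails the pipeline on serious findings. This is a sketch that assumes Trivy is the chosen scanner and is installed on the build machine. The function name scan_image and the image name in the usage note are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical CI helper: fail the build when the image has HIGH or CRITICAL
# findings. Assumes Trivy is installed; other scanners have similar options.
scan_image() {
  image="$1"
  if ! command -v trivy >/dev/null 2>&1; then
    echo "trivy is not installed, cannot scan $image" >&2
    return 2
  fi
  # --exit-code 1 makes Trivy return non-zero when matching findings exist.
  trivy image --exit-code 1 --severity HIGH,CRITICAL "$image"
}
```

Calling scan_image myapp:v1 right after the build step is enough for a basic gate. The severity threshold is a judgment call and depends on the use case.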
28. 4.1 Leverage Docker build cache
• As mentioned before, a Docker image consists of a stack of immutable layers
• Each instruction of the Dockerfile is an independent layer
• When a layer is generated, it’s cached locally to be reused
• However, if there is a change in one layer, its cache is invalidated together with the cache of all subsequent layers
29. 4.1 Leverage Docker build cache (continued)
• In the Dockerfile, put less frequently changing instructions at the top of the file and more frequently changing instructions at the end of the file
• Docker build cache is super helpful in local development as well as in CI/CD (when the build is done on a single machine or with a distributed caching layer)
✅ Do
FROM alpine
# The ENV and RUN layers will be reused
# even when the source code changed
ENV LOG_LEVEL=info
RUN apk add python3
COPY myapp.py /opt
🚫 Don’t
FROM alpine
# Any change in the source code will invalidate
# the cache of all next layers
COPY myapp.py /opt
RUN apk add python3
ENV LOG_LEVEL=info
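The invalidation rule can be sketched with a toy model. It is not Docker’s implementation, and plan_build is a made-up name. Each line of a file stands for the cache key of one build step, for example the instruction text plus a checksum of the copied files. Steps are reused until the first key that differs from the previous build, and everything after that point is rebuilt.

```shell
#!/bin/sh
# Toy model of the build cache (not Docker's implementation).
# $1: file with the step keys of the previous build
# $2: file with the step keys of the current build
plan_build() {
  awk '
    NR == FNR { previous[FNR] = $0; next }   # first file: keys of the last build
    {
      if (!invalidated && previous[FNR] == $0) {
        print "CACHED  " $0
      } else {
        invalidated = 1                       # every later step is rebuilt too
        print "REBUILD " $0
      }
    }
  ' "$1" "$2"
}
```

Feeding it the “Do” ordering with a changed COPY key marks only the last step for rebuild, while the “Don’t” ordering marks all three steps.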
30. 4.2 Avoid system cache
• Systems use caching to speed up things that are used frequently
• Each system caches different things, for example package manager metadata
• In Docker image builds, system caches usually don’t add any value since containers are immutable and each command runs in its own layer
• As a rule of thumb, avoid system caches because they increase image size
• Remember that each system has different options to disable caches
✅ Do
FROM alpine
RUN apk add --no-cache curl
🚫 Don’t
FROM alpine
RUN apk add curl
31. 4.3 Create a unified image across envs
• In general, try to build your image the same way for all envs (e.g., dev, stage, and prod)
• Try to make your image env-agnostic so it works seamlessly across envs
• Utilize multi-stage whenever possible and use “prod” as a base for other envs
• For advanced/complex use cases, use Docker BuildKit, which gives you more control over builds
✅ Do
FROM alpine AS base
RUN apk add curl
FROM base AS prod
RUN apk add python3
FROM prod AS dev
RUN apk add python3-dev
# Build dev image (build the whole file)
$ docker build -t myapp:dev .
# Build prod image (stop at the prod stage)
$ docker build --target prod -t myapp:v1 .
32. 4.4 Use ENTRYPOINT with CMD
• Both “ENTRYPOINT” and “CMD” are Dockerfile instructions that control the default command within the Docker image
• Either “ENTRYPOINT” or “CMD” can be used independently
• However, using both of them at the same time makes it easier to customize container behaviour, especially in Kubernetes
• As a rule of thumb, if your application is customizable via arguments, use “ENTRYPOINT” for the main command and “CMD” for default arguments
✅ Do
FROM alpine
ENTRYPOINT ["echo"]
CMD ["-e", "Hello\nWorld"]
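How the two instructions combine can be sketched with a toy model. It is not Docker code, and effective_command is a made-up name. Arguments passed after the image name replace “CMD”, while “ENTRYPOINT” stays in place.

```shell
#!/bin/sh
# Toy model (not Docker code) of how the container command is assembled.
# $1: ENTRYPOINT, $2: CMD (default arguments), rest: arguments given at run time
effective_command() {
  entrypoint="$1"
  default_args="$2"
  shift 2
  if [ "$#" -gt 0 ]; then
    printf '%s %s\n' "$entrypoint" "$*"
  else
    printf '%s %s\n' "$entrypoint" "$default_args"
  fi
}

effective_command "echo" "-e Hello"          # no runtime args: CMD is used
effective_command "echo" "-e Hello" "Bye"    # runtime args: CMD is replaced
```

Kubernetes follows the same split: the container “command” field overrides “ENTRYPOINT” and the “args” field overrides “CMD”, which is why keeping default arguments in “CMD” makes them easy to change per deployment.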
34. Next steps
• Find the last Docker image you have created and refactor it according to the best practices in this workshop
• Integrate hadolint (Dockerfile linter) with your local IDE and your team’s CI pipeline
• Find some interesting Docker scenarios on Katacoda and get hands-on
• Advanced topics:
  • Sign your Docker images with Docker Content Trust (DCT)
  • Take a look at BuildKit, which is a Dockerfile-agnostic builder toolkit
    More details: Faster Builds and Smaller Images Using BuildKit
  • Did you know that Docker is not the only container management system?
    Read more about Docker Alternative Container Tools
36. References
• Intro Guide to Dockerfile Best Practices - Docker Blog
• Best practices for writing Dockerfiles - Docker Documentation
• Image-building best practices - Docker Documentation
• Best practices for building containers - Google Cloud Architecture Center
• Top 20 Dockerfile best practices for security - Sysdig
• On Docker Articles - vsupalov.com