Internal presentation on Docker, lightweight virtualization, and Linux containers, given at Spotify's NYC offices, featuring engineers from Yandex, LinkedIn, Criteo, and NASA!
2. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
13. Many targets
● your local development environment
● your coworkers' development environment
● your QA team's test environment
● some random demo/test server
● the staging server(s)
● the production server(s)
● bare metal
● virtual machines
● shared hosting
20. Solution to the transport problem:
the intermodal shipping container
21. Solution to the transport problem:
the intermodal shipping container
● 90% of all cargo now shipped in a standard container
● faster and cheaper to load and unload on ships (by an order of magnitude)
● less theft, less damage
● freight cost used to be >25% of final goods cost, now <3%
● 5000 ships deliver 200M containers per year
23. Linux containers...
Units of software delivery (ship it!)
● run everywhere
– regardless of host distro
– regardless of kernel version
– (but container and host architecture must match*)
● run anything
– if it can run on the host, it can run in the container
– i.e., if it can run on a Linux kernel, it can run
*Unless you emulate CPU with qemu and binfmt
24. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
26. High level approach:
it's a lightweight VM
● own process space
● own network interface
● can run stuff as root
● can have its own /sbin/init (different from the host)
« Machine Container »
27. Low level approach:
it's chroot on steroids
● can also not have its own /sbin/init
● container = isolated process(es)
● share kernel with host
● no device emulation (neither HVM nor PV)
« Application Container »
28. Separation of concerns:
Dave the Developer
● inside my container:
– my code
– my libraries
– my package manager
– my app
– my data
29. Separation of concerns:
Oscar the Ops guy
● outside the container:
– logging
– remote access
– network configuration
– monitoring
30. How does it work?
Isolation with namespaces
● pid
● mnt
● net
● uts
● ipc
● user
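These namespaces can be inspected from userland without any container tooling; a minimal sketch (assumes a Linux host with /proc mounted):

```shell
# Every process exposes its namespace memberships as symlinks in /proc.
# Two processes in the same namespace see the same inode number here.
ls /proc/self/ns

# The link target encodes the namespace type and inode, e.g. "uts:[4026531838]".
readlink /proc/self/ns/uts
```

A process inside a container shows different inode numbers here than the host's PID 1, which is how you can tell the two apart.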
31. pid namespace
jpetazzo@tarrasque:~$ ps aux | wc -l
212
jpetazzo@tarrasque:~$ sudo docker run -t -i ubuntu bash
root@ea319b8ac416:/# ps aux
USER  PID %CPU %MEM   VSZ  RSS TTY  STAT START TIME COMMAND
root    1  0.0  0.0 18044 1956 ?    S    02:54 0:00 bash
root   16  0.0  0.0 15276 1136 ?    R+   02:55 0:00 ps aux
(That's 2 processes)
36. user namespace
● no « demo » for this one... Yet!
● UID 0→1999 in container C1 is mapped to UID 10000→11999 in host;
UID 0→1999 in container C2 is mapped to UID 12000→13999 in host; etc.
● required lots of VFS and FS patches (esp. XFS)
● what will happen with copy-on-write?
– double translation at VFS?
– single root UID on read-only FS?
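The mapping above is pure offset arithmetic; here is a minimal sketch (a hypothetical helper, not part of any kernel interface) mirroring the "container-base host-base count" triplets that /proc/<pid>/uid_map uses:

```shell
# map_uid CONTAINER_UID CONTAINER_BASE HOST_BASE COUNT
# Translate a container UID into the host UID range, like a uid_map entry.
map_uid() {
  cuid=$1 cbase=$2 hbase=$3 count=$4
  if [ "$cuid" -lt "$cbase" ] || [ "$cuid" -ge $((cbase + count)) ]; then
    echo "UID $cuid outside mapped range" >&2
    return 1
  fi
  echo $((hbase + cuid - cbase))
}

map_uid 0 0 10000 2000     # container C1: prints 10000
map_uid 1999 0 12000 2000  # container C2: prints 13999
```

In a real setup the kernel applies this translation on every syscall that deals with UIDs, which is why so many VFS patches were needed.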
37. How does it work?
Isolation with cgroups
● memory
● cpu
● blkio
● devices
38. memory cgroup
● keeps track of pages used by each group:
– file (read/write/mmap from block devices; swap)
– anonymous (stack, heap, anonymous mmap)
– active (recently accessed)
– inactive (candidate for eviction)
● each page is « charged » to a group
● pages can be shared (e.g. if you use any COW FS)
● individual (per-cgroup) limits and out-of-memory killer
39. cpu and cpuset cgroups
● keep track of user/system CPU time
● set relative weight per group
● pin groups to specific CPU(s)
– can be used to « reserve » CPUs for some apps
– this is also relevant for big NUMA systems
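The same pinning idea can be tried per-process with taskset from util-linux (assuming it is installed); the cgroup version applies it to a whole group of processes instead:

```shell
# Run a command restricted to CPU 0 only (util-linux taskset):
taskset -c 0 echo "pinned to CPU 0"

# The cgroup-v1 cpuset equivalent (sketch only; needs root and a
# mounted cgroup filesystem, so left as comments):
#   mkdir /sys/fs/cgroup/cpuset/mygroup
#   echo 0 > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
#   echo 0 > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems
#   echo $$ > /sys/fs/cgroup/cpuset/mygroup/tasks
```

Every process later added to `mygroup` (and its children) is then confined to CPU 0, which is how CPUs get « reserved » for an app.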
40. blkio cgroups
● keep track of IOs for each block device
– read vs write; sync vs async
● set relative weights
● set throttle (limits) for each block device
– read vs write; bytes/sec vs operations/sec
Note: earlier versions (pre-3.8) didn't account async correctly.
3.8 is better, but use 3.10 for best results.
44. Efficiency: almost no overhead
● processes are isolated, but run straight on the host
● CPU performance = native performance
● memory performance = a few % shaved off for (optional) accounting
● network performance = small overhead; can be optimized to zero overhead
45. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
50. What can Docker do?
● Open Source engine to commoditize LXC
● uses copy-on-write for quick provisioning
● lets you create and share images
● standard format for containers (stack of layers; 1 layer = tarball+metadata)
● standard, reproducible way to easily build trusted images (Dockerfile, Stackbrew...)
51. Authoring images
with run/commit
1) docker run ubuntu bash
2) apt-get install this and that
3) docker commit <containerid> <imagename>
4) docker run <imagename> bash
5) git clone git://.../mycode
6) pip install -r requirements.txt
7) docker commit <containerid> <imagename>
8) repeat steps 4-7 as necessary
9) docker tag <imagename> <user/image>
10) docker push <user/image>
52. Authoring images
with a Dockerfile
FROM ubuntu
RUN apt-get -y update
RUN apt-get install -y g++
RUN apt-get install -y erlang-dev erlang-manpages erlang-base-hipe ...
RUN apt-get install -y libmozjs185-dev libicu-dev libtool ...
RUN apt-get install -y make wget
RUN wget -O - http://.../apache-couchdb-1.3.1.tar.gz | tar -C /tmp -zxf -
RUN cd /tmp/apache-couchdb-* && ./configure && make install
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]

docker build -t jpetazzo/couchdb .
53. Running containers
● SSH to Docker host and manual pull+run
● REST API (feel free to add SSL certs, OAuth...)
● OpenStack Nova
● OpenStack Heat
● who's next? OpenShift, CloudFoundry?
● multiple Open Source PAAS built on Docker (more on this later)
54. Yes, but...
● « I don't need Docker; I can do all that stuff with LXC tools, rsync, some scripts! »
● correct on all accounts; but it's also true for apt, dpkg, rpm, yum, etc.
● the whole point is to commoditize, i.e. make it ridiculously easy to use
57. What this really means…
● instead of writing « very small shell scripts » to manage containers, write them to do the rest:
– continuous deployment/integration/testing
– orchestration
● = use Docker as a building block
● re-use other people's images (yay ecosystem!)
58. Docker: sharing images
● you can push/pull images to/from a registry (public or private)
● you can search images through a public index
● dotCloud → Docker Inc. → the community maintains a collection of base images (Ubuntu, Fedora...)
● coming soon: Stackbrew
● satisfaction guaranteed or your money back
59. Docker: not sharing images
● private registry
– for proprietary code
– or security credentials
– or fast local access
● the private registry is available as an image on the public registry (yes, that makes sense)
60. Example of powerful workflow
● code in local environment (« dockerized » or not)
● each push to the git repo triggers a hook
● the hook tells a build server to clone the code and run « docker build » (using the Dockerfile)
● the containers are tested (nosetests, Jenkins...), and if the tests pass, pushed to the registry
● production servers pull the containers and run them
● for network services, load balancers are updated
61. Orchestration (0.6.5)
● you can name your containers
● they get a generated name by default (red_ant, gold_monkey...)
● you can link your containers
docker run -d -name frontdb
docker run -d -link frontdb:sql frontweb
→ container frontweb gets one bazillion environment vars
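No Docker daemon is needed to see the naming scheme; this sketch simulates roughly how the link alias and exposed port became variable names in early Docker (the address and port are illustrative values, not real output):

```shell
# Simulate link environment variables for alias "sql", assuming the
# linked container exposes port 3306 at 172.17.0.2 (made-up values).
ALIAS=SQL
ADDR=172.17.0.2
PORT=3306

echo "${ALIAS}_PORT=tcp://${ADDR}:${PORT}"            # SQL_PORT=tcp://172.17.0.2:3306
echo "${ALIAS}_PORT_${PORT}_TCP_ADDR=${ADDR}"         # SQL_PORT_3306_TCP_ADDR=172.17.0.2
echo "${ALIAS}_PORT_${PORT}_TCP_PORT=${PORT}"         # SQL_PORT_3306_TCP_PORT=3306
```

One variable set like this is generated per exposed port, which is why a link can inject « one bazillion » variables into frontweb.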
64. Dynamic Disco
● beam
– introspection API
– based on Redis protocol (i.e. all Redis clients work)
– works well for synchronous req/rep and streams
– reimplementation of Redis core in Go
– think of it as « live environment variables » that you can watch/subscribe to
65. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
66. What's Docker exactly?
● rewrite of dotCloud's internal container engine
– original version: Python, tied to dotCloud's internal stuff
– released version: Go, legacy-free
● the Docker daemon runs in the background
– manages containers, images, and builds
– HTTP API (over UNIX or TCP socket)
– embedded CLI talking to the API
● Open Source (GitHub public repository + issue tracking)
● user and dev mailing lists
67. Docker: the community
● Docker: >200 contributors
● <7% of them work for Docker Inc. (formerly dotCloud)
● latest milestone (0.6): 40 contributors
● ~50% of all commits by external contributors
● GitHub repository: >800 forks
68. Docker Inc.: the company
● dotCloud Inc.
– the first polyglot PAAS ever
– 2010: YCombinator
– 2011: 10M$ funding by Trinity+Benchmark
– 2013: start Docker project
● Docker Inc.
– March 2013: public repository on GitHub
– October 2013: name change
69. Docker: the ecosystem
● Cocaine (PAAS; has Docker plugin)
● CoreOS (full distro based on Docker)
● Deis (PAAS; available)
● Dokku (mini-Heroku in 100 lines of bash)
● Flynn (PAAS; in development)
● Maestro (orchestration from a simple YAML file)
● OpenStack integration (in Havana, Nova has a Docker driver)
● Pipework (high-performance, Software Defined Networks)
● Shipper (fabric-like orchestration)
And many more; including SAAS offerings (Orchard, Quay...)
70. Outline
● Why Linux Containers?
● What are Linux Containers exactly?
● What do we need on top of LXC?
● Why Docker?
● What is Docker exactly?
● Where is it going?
73. device-mapper thin snapshots
(aka « thinp »)
● start with a 10 GB empty ext4 filesystem
– snapshot: that's the root of everything
● base image:
– clone the original snapshot
– untar image on the clone
– re-snapshot; that's your image
● create container from image:
– clone the image snapshot
– run; repeat cycle as many times as needed
74. AUFS vs THINP
AUFS                                  THINP
● easy to see changes                 ● must diff manually
● small change = copy whole file      ● small change = copy 1 block (100k-1M)
● ~42 layers                          ● unlimited layers
● patched kernel                      ● stock kernel (>3.2)
  (Debian, Ubuntu OK)                   (RHEL 2.6.32 OK)
● efficient caching                   ● duplicated pages
● no quotas                           ● FS size acts as quota
75. Misconceptions about THINP
● « performance degradation »
no; that was with « old » LVM snapshots
● « can't handle 1000s of volumes »
that's LVM; Docker uses devmapper directly
● « if snapshot volume is out of space, it breaks and you lose everything »
that's « old » LVM snapshots; thinp halts I/O
● « it still uses disk space after 'rm -rf' »
no, thanks to 'discard passdown'