Docker Container: Isolation and Security
Eric Fu
1
chroot
In UNIX, everything is a file.
2
Overview
Isolation ‑ Linux Namespaces
Isolation ‑ Control Groups
Container Security
3
Isolation ‑ Linux Namespaces
Process‑level Isolation
4
Linux Namespaces
Category            Clone Flag      Kernel version
Mount namespaces    CLONE_NEWNS     Linux 2.4.19
UTS namespaces      CLONE_NEWUTS    Linux 2.6.19
IPC namespaces      CLONE_NEWIPC    Linux 2.6.19
PID namespaces      CLONE_NEWPID    Linux 2.6.24
Network namespaces  CLONE_NEWNET    Linux 2.6.24, completed in 2.6.29
User namespaces     CLONE_NEWUSER   Linux 2.6.23, completed in 3.8
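The flags in the table are plain bit values that can be OR-ed together. The following is a minimal sketch, not part of the original slides, of a lookup from the short namespace names used under /proc/<pid>/ns/ to those flags. The names ns_table, ns_flag_by_name and ns_flags_all are hypothetical. The constants are taken from the kernel header <linux/sched.h>; glibc's <sched.h> exposes the same values when _GNU_SOURCE is defined.

```c
#include <linux/sched.h>  /* CLONE_NEW* constants (kernel uapi header) */
#include <stddef.h>
#include <string.h>

/* One row per namespace type from the table above (hypothetical helper type). */
struct ns_entry {
    const char *name;  /* short name, as used under /proc/<pid>/ns/ */
    int flag;          /* flag accepted by clone(2) and unshare(2) */
};

static const struct ns_entry ns_table[] = {
    {"mnt",  CLONE_NEWNS},
    {"uts",  CLONE_NEWUTS},
    {"ipc",  CLONE_NEWIPC},
    {"pid",  CLONE_NEWPID},
    {"net",  CLONE_NEWNET},
    {"user", CLONE_NEWUSER},
};

/* Return the clone flag for a namespace name, or 0 if the name is unknown. */
int ns_flag_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof ns_table / sizeof ns_table[0]; i++) {
        if (strcmp(ns_table[i].name, name) == 0)
            return ns_table[i].flag;
    }
    return 0;
}

/* Return all six flags OR-ed together, i.e. "isolate everything". */
int ns_flags_all(void)
{
    int flags = 0;
    for (size_t i = 0; i < sizeof ns_table / sizeof ns_table[0]; i++)
        flags |= ns_table[i].flag;
    return flags;
}
```

A caller could pass ns_flag_by_name("uts") | ns_flag_by_name("pid") as the flags argument of clone() to request only those two namespaces.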
5
clone()
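#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>
#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];
char* const container_args[] = {"/bin/bash", NULL};
int container_main(void* arg)
{
    // Open a shell
    execv(container_args[0], container_args);
    // Only reached if execv() fails
    return 1;
}
int main()
{
    // Pass the top of the stack: it grows downward
    int container_pid = clone(container_main, container_stack+STACK_SIZE,
                              SIGCHLD, NULL);
    waitpid(container_pid, NULL, 0);
    return 0;
}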
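The same clone() call can be wrapped so that the namespace flags introduced on the following slides are passed in and the child's exit status is returned. This is a minimal sketch under stated assumptions: run_in_child, CHILD_STACK_SIZE and exit_with_seven are hypothetical names, and namespace flags such as CLONE_NEWUTS normally require root (CAP_SYS_ADMIN), so the plain call with no extra flags is the only one that works unprivileged.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/*
 * Run fn(arg) in a child created with clone(2) and return its exit status,
 * or -1 on any failure. extra_flags is OR-ed with SIGCHLD so the parent
 * can wait for the child with waitpid().
 */
int run_in_child(int (*fn)(void *), void *arg, int extra_flags)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(fn, stack + CHILD_STACK_SIZE, extra_flags | SIGCHLD, arg);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Hypothetical child body, used only for illustration. */
int exit_with_seven(void *arg)
{
    (void)arg;
    return 7;  /* the value returned by fn becomes the child's exit status */
}
```

Calling run_in_child(exit_with_seven, NULL, 0) runs the child without any new namespace; replacing 0 with CLONE_NEWUTS (as root) would give the child its own hostname.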
6
UTS Namespace ( CLONE_NEWUTS )
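Isolates system identifiers: nodename and domainname.
int container_main(void* arg)
{
    // Affects only the new UTS namespace (clone() with CLONE_NEWUTS)
    sethostname("container", 9);  // length excludes the trailing '\0'
    // Open a shell
    execv(container_args[0], container_args);
    // Only reached if execv() fails
    return 1;
}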
7
IPC Namespace ( CLONE_NEWIPC )
Isolates IPC resources: SystemV IPC objects and POSIX message queues.
root@eric-vm:/home/eric/linux_namespace# ipcmk -Q
Message queue id: 0
root@eric-vm:/home/eric/linux_namespace# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xd5467105 0 root 644 0 0
root@eric-vm:/home/eric/linux_namespace# ./test_ipc_ns
Parent - start a container!
Container - inside the container!
root@container:/home/eric/linux_namespace# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
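Besides comparing ipcs output, namespace membership can be checked directly: each process exposes its namespaces as symlinks under /proc/<pid>/ns/, and two processes share a namespace when the link targets match. A minimal sketch, assuming a Linux /proc is mounted; the helper names ns_identity and same_namespace are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read the identity of one namespace of a process from the symlink
 * /proc/<pid>/ns/<type>; the target has the form "<type>:[<inode>]".
 * Returns 0 on success, -1 on error.
 */
int ns_identity(pid_t pid, const char *type, char *buf, size_t size)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/ns/%s", (long)pid, type);
    if (n < 0 || (size_t)n >= sizeof path || size == 0)
        return -1;

    ssize_t len = readlink(path, buf, size - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';  /* readlink() does not NUL-terminate */
    return 0;
}

/* Return 1 if both processes share the namespace, 0 if not, -1 on error. */
int same_namespace(pid_t a, pid_t b, const char *type)
{
    char id_a[64], id_b[64];
    if (ns_identity(a, type, id_a, sizeof id_a) != 0 ||
        ns_identity(b, type, id_b, sizeof id_b) != 0)
        return -1;
    return strcmp(id_a, id_b) == 0;
}
```

For the session above, same_namespace(parent_pid, container_pid, "ipc") would be expected to return 0 once the container was cloned with CLONE_NEWIPC; reading another user's process may require root.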
8
PID Namespace ( CLONE_NEWPID )
Isolate the PID space.
Processes in different PID namespaces can have the same PID.
eric@eric-vm:~/linux_namespace$ sudo ./test_pid_ns
Parent (2536) - start a container!
Container (1) - inside the container!
Why does ps aux still show all processes? (It reads /proc, which is still the host's procfs mount.)
9
Mount Namespace ( CLONE_NEWNS )
Isolate the set of filesystem mount points seen by a group of processes.
Processes in different mount namespaces can have different views of the filesystem hierarchy.
mount("proc", "/proc", "proc", 0, NULL);
Inside the container:
/ # ps aux
PID USER TIME COMMAND
1 root 0:00 /bin/sh
3 root 0:00 ps aux
10
Mount a Real Docker Image
docker save alpine | undocker -i -o rootfs alpine
// System mount points
mount("proc", "rootfs/proc", "proc", 0, NULL);
mount("sysfs", "rootfs/sys", "sysfs", 0, NULL);
mount("none", "rootfs/tmp", "tmpfs", 0, NULL);
mount("udev", "rootfs/dev", "devtmpfs", 0, NULL);
// Config files
mount("conf/hosts", "rootfs/etc/hosts", "none", MS_BIND, NULL);
mount("conf/hostname", "rootfs/etc/hostname", "none", MS_BIND, NULL);
mount("conf/resolv.conf", "rootfs/etc/resolv.conf", "none", MS_BIND, NULL);
// Chroot
chdir("./rootfs");
chroot("./");
11
User namespace ( CLONE_NEWUSER )
Isolates the user and group ID spaces.
A process's UID and GID can be different inside and outside a user namespace.
// Write one "<inside> <outside> <length>" mapping line to a uid_map/gid_map file
void set_map(const char* file, int inside_id, int outside_id, int len) {
    FILE *fp = fopen(file, "w");
    if (fp == NULL) {
        perror(file);
        return;
    }
    fprintf(fp, "%d %d %d", inside_id, outside_id, len);
    fclose(fp);
}
void set_uid_map(pid_t pid, int inside_id, int outside_id, int len) {
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/uid_map", pid);
    set_map(file, inside_id, outside_id, len);
}
void set_gid_map(pid_t pid, int inside_id, int outside_id, int len) {
    char file[256];
    snprintf(file, sizeof(file), "/proc/%d/gid_map", pid);
    set_map(file, inside_id, outside_id, len);
}
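Each line written to uid_map or gid_map has three fields: the first ID inside the namespace, the first ID outside it, and the length of the mapped range. The sketch below separates formatting from file I/O so the line can be checked before it is written; format_id_map and id_map_path are hypothetical helper names, not part of the original slides.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format one mapping line "<inside> <outside> <count>\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int format_id_map(char *buf, size_t size,
                  unsigned inside_id, unsigned outside_id, unsigned count)
{
    int n = snprintf(buf, size, "%u %u %u\n", inside_id, outside_id, count);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Build "/proc/<pid>/uid_map" (is_gid == 0) or "/proc/<pid>/gid_map".
 * Returns the path length, or -1 if buf is too small.
 */
int id_map_path(char *buf, size_t size, pid_t pid, int is_gid)
{
    int n = snprintf(buf, size, "/proc/%ld/%s", (long)pid,
                     is_gid ? "gid_map" : "uid_map");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, format_id_map(line, sizeof line, 0, 1000, 1) produces a line that maps UID 0 inside the namespace to UID 1000 outside it. On Linux 3.19 and later, an unprivileged writer must write deny to /proc/<pid>/setgroups before gid_map can be written.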
12
Network namespace ( CLONE_NEWNET )
Preparation
brctl addbr br0
ifconfig br0 192.168.10.1/24 up
Host
ip link add veth0 type veth peer name veth1
ip link set veth1 netns $PID
brctl addif br0 veth0
ip link set veth0 up
Container
ip link set dev veth1 name eth0
ip link set eth0 up
ip link set lo up
ip addr add 192.168.10.2/24 dev eth0
ip route add default via 192.168.10.1
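The default route only works because the container address (192.168.10.2) and the bridge address (192.168.10.1) fall in the same /24 subnet, so the gateway is directly reachable over the veth pair. A small sketch of that check, assuming IPv4 only; same_subnet is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Return 1 if IPv4 addresses a and b share the first prefix_len bits,
 * 0 if they do not, and -1 on malformed input.
 */
int same_subnet(const char *a, const char *b, int prefix_len)
{
    struct in_addr ia, ib;
    if (prefix_len < 0 || prefix_len > 32)
        return -1;
    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return -1;

    /* Build the netmask in host byte order; a /0 prefix matches everything. */
    uint32_t mask = prefix_len == 0 ? 0 : UINT32_MAX << (32 - prefix_len);
    return (ntohl(ia.s_addr) & mask) == (ntohl(ib.s_addr) & mask);
}
```

With the addresses from this slide, same_subnet("192.168.10.2", "192.168.10.1", 24) reports that the gateway is on-link.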
13
Network Topology
14
Isolation ‑ Control Groups
Resource Limiting
15
Linux Control Groups
blkio (Disk I/O)
cpu (CPU quota)
cpuset (CPU cores)
devices
memory
net_cls (Network packet class ID)
net_prio (Network packet priority)
hugetlb (HugeTLB)
cpuacct
freezer
16
Glance
root@eric-vm:/sys/fs/cgroup# ls
blkio cpuacct cpuset freezer memory net_cls,net_prio perf_event systemd
cpu cpu,cpuacct devices hugetlb net_cls net_prio pids
root@eric-vm:/sys/fs/cgroup/cpu$ sudo mkdir test
root@eric-vm:/sys/fs/cgroup/cpu/test$ ls
cgroup.clone_children cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.stat
cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.shares notify_o
17
We have a CPU killer
int main()
{
int i = 0;
for (;;) i++;
return 0;
}
 top 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3985 eric 20 0 4224 648 576 R 99.9 0.1 0:15.53 deadloop
18
Usage
Create a group. (Yes, just mkdir.)
sudo mkdir /sys/fs/cgroup/cpu/test
Set a limit. A quota of 20000 µs per default 100000 µs period means 20% CPU time.
echo 20000 > /sys/fs/cgroup/cpu,cpuacct/test/cpu.cfs_quota_us
Add a process to our group.
echo 3985 >> /sys/fs/cgroup/cpu,cpuacct/test/tasks
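The value 20000 is a quota in microseconds per scheduling period: with the default cpu.cfs_period_us of 100000, it works out to 20% of one CPU. The sketch below shows that arithmetic and the file write that the echo commands perform, assuming the cgroup v1 layout shown on the previous slide; cpu_quota_us and cgroup_write_long are hypothetical helper names.

```c
#include <stdio.h>

/* CFS quota in microseconds for a CPU share given in percent of one CPU. */
long cpu_quota_us(long period_us, int percent)
{
    return period_us * percent / 100;
}

/*
 * Write one decimal value to <group_dir>/<file>, e.g.
 * /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us. Returns 0 on success, -1 on failure.
 */
int cgroup_write_long(const char *group_dir, const char *file, long value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", group_dir, file);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fprintf(fp, "%ld\n", value) > 0;
    if (fclose(fp) != 0)  /* the kernel may reject the value at flush time */
        ok = 0;
    return ok ? 0 : -1;
}
```

Run as root, cgroup_write_long("/sys/fs/cgroup/cpu/test", "cpu.cfs_quota_us", cpu_quota_us(100000, 20)) would set the same limit as the echo command, and writing a PID to the tasks file would add the process to the group.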
19
Container Security
20
"Container"
Linux kernel namespaces provide the isolation (hence “container”) in which we place one or more processes
Linux kernel cgroups (“Control groups”) provide resource limiting and accounting (CPU, memory, I/O bandwidth, etc.)
21
Container Properties
A shared kernel across all containers on a single host.
Unique filesystem, a layered model using CoW (copy‑on‑write) union filesystems.
Linux namespaces are shareable (Kubernetes “pod”)
One process per container
22
Linux Capabilities
Add needed capabilities to a container, or drop unnecessary ones from it.
$ docker run --rm -ti busybox sh
/ # hostname foo
hostname: sethostname: Operation not permitted
$ docker run --rm -ti --cap-add=SYS_ADMIN busybox sh
/ # hostname foo
<hostname changed>
$ docker run --rm -ti --cap-drop=NET_RAW busybox sh
/ # ping 8.8.8.8
ping: permission denied (are you root?)
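Which capabilities a process actually holds can be read from the CapEff line of /proc/<pid>/status, a hexadecimal bitmask with one bit per capability number. A minimal sketch of decoding it; parse_cap_eff and mask_has_cap are hypothetical names, and the capability numbers are the values defined in <linux/capability.h>.

```c
#include <stdio.h>

/* Capability numbers as defined in <linux/capability.h>. */
enum { CAP_NUM_NET_RAW = 13, CAP_NUM_SYS_ADMIN = 21 };

/* Return 1 if capability number cap is set in the bitmask, else 0. */
int mask_has_cap(unsigned long long mask, int cap)
{
    return (int)((mask >> cap) & 1ULL);
}

/*
 * Parse a "CapEff:" line as found in /proc/<pid>/status.
 * Returns 0 and stores the mask on success, -1 if the line does not match.
 */
int parse_cap_eff(const char *line, unsigned long long *mask)
{
    return sscanf(line, "CapEff: %llx", mask) == 1 ? 0 : -1;
}
```

As an illustration, a mask such as 00000000a80425fb (an assumed sample value, similar to what a container with a reduced default capability set might report) has the NET_RAW bit set but not SYS_ADMIN, which matches the hostname and ping behavior shown above.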
23
Linux Capabilities
24
Seccomp
Block specific syscalls from being used by container binaries.
$ cat policy.json
{
"defaultAction": "SCMP_ACT_ALLOW",
"syscalls": [
{
"name": "chmod",
"action": "SCMP_ACT_ERRNO"
}
]
}
$ docker run --rm -it --security-opt seccomp:policy.json busybox chmod 640 /etc/resolv.conf
chmod: /etc/resolv.conf: Operation not permitted
25
AppArmor/SELinux
Limit access to specific filesystem paths in container
https://raw.githubusercontent.com/jessfraz/bane/master/docker-nginx-sample
$ docker run --rm -ti --security-opt="apparmor:docker-nginx-sample" -p 80:80 nginx bash
root@6da5a2a930b9:/# top
bash: /usr/bin/top: Permission denied
root@6da5a2a930b9:/# touch ~/thing
touch: cannot touch 'thing': Permission denied
26
Attack a Container!
“attack surface”
Host <‑> Container
Container <‑> Container
External ‑> Container
Application Security
27
Host <‑> Container
Protecting the host from containers
THREAT: DoS Host (use up CPU, memory, disk), Forkbomb
MITIGATION: Cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + Kernel 4.3)
THREAT: Access host/private information
MITIGATION: Namespace configuration; AppArmor/SELinux profiles, seccomp (1.10)
THREAT: Kernel modification/insert module
MITIGATION: Capabilities (already dropped); seccomp, LSMs; don’t run --privileged mode
THREAT: Docker administrative access (API socket access)
MITIGATION: Don’t share the Docker UNIX socket without Authz plugin limitations; use TLS certificates for TCP endpoint configurations
28
Container <‑> Container
Malicious or Multi‑tenant
THREAT: DoS other containers (noisy neighbor using significant % of CPU, memory, disk)
MITIGATION: Cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + Kernel 4.3)
THREAT: Access other container’s information (pids, files, etc.)
MITIGATION: Namespace configuration; AppArmor/SELinux profile for containers
THREAT: Docker API access (full control over other containers)
MITIGATION: Don’t share the Docker UNIX socket without Authz plugin limitations (1.10); use TLS certificates for TCP endpoint configurations
29
External ‑> Container
The big, bad Internet
THREAT: DDoS attacks
MITIGATION: Cgroup controls, disk quotas (1.12), kernel pids limit (1.11 + Kernel 4.3), proactive monitoring infrastructure/operational readiness
THREAT: Malicious (remote) access
MITIGATION: Appropriate application security model; no weak/default passwords; --read-only filesystem (limit blast radius)
THREAT: Unpatched exploits (underlying OS layers)
MITIGATION: Vulnerability scanning (IBM Bluemix, Docker Data Center, CoreOS Clair, Red Hat “SmartState” CloudForms (w/Black Duck))
30
Application Security
Significant container benefit: provided protections are in place (seccomp, LSMs, dropped caps, user namespaces), the exploited application has greatly reduced ability to inflict harm beyond container “walls”
Proper handling of secrets through dev/build/deploy process (no passwords in Dockerfile, as an example)
Unnecessary services not exposed externally (shared namespaces; internal/management networks)
Secure coding/design principles
31
Thank You!
32
