Running Containers at Scale at Netflix. An update on the usage of containers at Netflix, with technical discussion of the new features and concepts we've added across container scheduling and execution.
3. Netflix’s Container Management Platform: Titus
Scheduling
● Service & batch jobs
● Resource management
Container Execution
● Docker/AWS Integration
● Netflix Infra Support
[Diagram: service and batch workloads flow through job and fleet management, resource management & optimization, container execution, and integration.]
4. Growing set of container use cases
● 1000+ applications
● Netflix API, Node.js backend UI scripts
● Machine learning (GPUs) for personalization
● Encoding and content use cases
● Netflix Studio use cases
● CDN tracking and planning
● Massively parallel CI system
● Data pipeline routing and SPaaS (Stream Processing as a Service)
● Big data platform use cases
[Timeline: Batch (Q4 2015) → Basic Service (Q1 2016) → Production Service (Q4 2016) → Customer-Facing Service (Q2 2017), with a shadow-traffic phase along the way.]
5. High Level Titus Architecture
[Diagram]
● Titus Control Plane: API, scheduling (via Fenzo), job lifecycle control; state stored in Cassandra
● Titus Agents: Mesos agent, Docker, and Titus system services, running user containers on AWS virtual machines
● Shared infrastructure: Mesos, EC2 Autoscaling, Docker Registry
● Callers: batch/workflow systems and service CI/CD
6. Q1 2018 Container Usage
Common
● Jobs launched: 176K / day
● Different applications: 1K+ different images
● Regionally isolated Titus stacks: 7
Services
● Single-app cluster size: 5K containers (real), 12K containers (benchmark)
● Agents managed: 16K VMs
Batch
● Containers launched: 430K / day
● Agents autoscaled: 350K VMs / month
7. Leveraging existing Netflix and AWS Infrastructure
Single consistent cloud environment between VMs and containers
[Diagram: service apps on VMs (EC2, AWS Autoscaler) and service and batch apps in containers (Titus job control) run in the same VPC, each using the same cloud platform (metrics, IPC, health) and shared services such as Atlas, Eureka, and Edda.]
8. Most Native AWS Container Platform
IP per container
● VPC IP, ENI and security group
● Optimized to share ENIs
● ENI pre-attaching, opportunistic batching of IPs (bursty deploys)
IAM Roles and Metadata Endpoint per container
● Container view of 169.254.169.254
Cryptographic identity per container
● Using Amazon instance identity document
Service job container autoscaling
● Using native AWS CloudWatch and Auto Scaling policies and engine
Application Load Balancer (ALB) support
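The per-container metadata view can be pictured as a small proxy that answers the standard instance-metadata credentials path with credentials for that container's IAM role. A minimal sketch, assuming 169.254.169.254 is bound to a local interface the container can reach (the eth-md device on slide 16) and using STS AssumeRole; the role ARN and wiring here are illustrative, not Titus's actual implementation:

```python
# Hypothetical per-container metadata proxy sketch.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

import boto3

CONTAINER_ROLE_ARN = "arn:aws:iam::123456789012:role/app-role"  # assumed role
sts = boto3.client("sts")

class MetadataProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve temporary credentials for the container's IAM role at the
        # path SDKs already use for instance credentials.
        if self.path.startswith("/latest/meta-data/iam/security-credentials/"):
            creds = sts.assume_role(
                RoleArn=CONTAINER_ROLE_ARN,
                RoleSessionName="container-session",
            )["Credentials"]
            body = json.dumps({
                "AccessKeyId": creds["AccessKeyId"],
                "SecretAccessKey": creds["SecretAccessKey"],
                "Token": creds["SessionToken"],
                "Expiration": creds["Expiration"].isoformat(),
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# Bind to the metadata address, assumed to exist on a local interface
# (the per-container eth-md device shown on slide 16).
HTTPServer(("169.254.169.254", 80), MetadataProxy).serve_forever()
```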
10. Scheduling / Placement
Considering the realities of …
● Docker, Linux, Image Pulling, etc.
● Complex resources (ENIs)
● Amazon rate limiting
● Filtering (constraints) and ranking (fitness)
● Different profiles for service | batch, critical | operational, etc.
[Diagram: trade-offs among reliability, provisioning time, and cost.]
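The filter-then-rank pattern (hard constraints, then fitness) can be sketched in a few lines. Fenzo, which Titus uses, is a Java library, so the Python below is only an illustration with made-up names and a single fitness signal (security-group affinity, which also helps ENI sharing):

```python
# Illustrative filter-then-rank placement sketch; not Fenzo's API.
from dataclasses import dataclass, field

@dataclass
class Agent:
    free_cpus: int
    free_enis: int
    security_groups: set = field(default_factory=set)

@dataclass
class Task:
    cpus: int
    security_groups: set

def satisfies_constraints(task: Task, agent: Agent) -> bool:
    # Hard constraints: enough CPUs, and a free ENI slot unless an
    # existing ENI already carries the task's security groups.
    sg_match = task.security_groups <= agent.security_groups
    return agent.free_cpus >= task.cpus and (sg_match or agent.free_enis > 0)

def fitness(task: Task, agent: Agent) -> float:
    # Ranking: prefer agents already running containers in the same
    # security groups, so ENIs (and EC2 API calls) can be shared.
    return 1.0 if task.security_groups <= agent.security_groups else 0.0

def place(task: Task, agents: list):
    candidates = [a for a in agents if satisfies_constraints(task, a)]
    return max(candidates, key=lambda a: fitness(task, a)) if candidates else None
```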
11. Capacity Management
Users configure “capacity groups” based on workload type
Critical (RIs)
● Preallocated instances in order to achieve low provisioning time
● Buffer to support temporary extra capacity needs for deployments
Flex (On-Demand)
● Autoscaled instances based on demand
Opportunistic (Spot) - Coming
● Utilize extra instances with the ability to preempt or evict the workload
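A hypothetical configuration sketch of the three capacity tiers above; the field and group names are illustrative, not Titus's actual API:

```python
# Illustrative capacity-group model for the Critical / Flex /
# Opportunistic tiers described above.
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    CRITICAL = "critical"            # preallocated reserved instances
    FLEX = "flex"                    # autoscaled on-demand instances
    OPPORTUNISTIC = "opportunistic"  # preemptible spot instances

@dataclass
class CapacityGroup:
    name: str
    tier: Tier
    guaranteed_cpus: int   # preallocated capacity for CRITICAL groups
    buffer_cpus: int = 0   # extra headroom for deployments

groups = [
    CapacityGroup("api-prod", Tier.CRITICAL, guaranteed_cpus=2000, buffer_cpus=400),
    CapacityGroup("batch-ml", Tier.FLEX, guaranteed_cpus=0),
]
```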
12. Centralized Agent Management
A unified component for tracking agent information. It consumes health checks, cluster lifecycle events, and other signals, and powers other subsystems such as task migration, canaries, and agent remediation.
For example, task migration: agents in Cluster A are set to state “non-schedulable, drain tasks” while agents in Cluster B remain “schedulable”; the task-migration subsystem reads this cluster state from agent management.
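A minimal sketch of the agent-state model behind that task-migration example, assuming just the two states from the slide and a drain step; the names and the scheduler hook are illustrative:

```python
# Illustrative agent states and cluster drain for task migration.
from enum import Enum

class AgentState(Enum):
    SCHEDULABLE = "schedulable"
    NON_SCHEDULABLE = "non-schedulable"

class Agent:
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.state = AgentState.SCHEDULABLE
        self.tasks = []

def migrate_cluster(old_cluster, new_cluster, scheduler):
    # Stop placing new work on the old cluster first, then drain its
    # tasks; the scheduler relaunches them on schedulable agents.
    for agent in old_cluster:
        agent.state = AgentState.NON_SCHEDULABLE
    for agent in old_cluster:
        for task in list(agent.tasks):
            agent.tasks.remove(task)
            scheduler.relaunch(task, candidates=new_cluster)  # hypothetical hook
```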
15. Multi-tenant networking is hard
We decided early on that we wanted a full IP stack per container
But what about?
● Security group support
● IAM role support
● Network bandwidth isolation
● VPC integration
16. Titus Networking
[Diagram, one virtual machine host: eth0 (IP 1, sg=Titus) carries control-plane traffic. Pre-attached ENIs eth1 (sg=A,B) and eth2 (sg=B,C) carry app traffic, routed with IPvlan, BPF, and IFBs. Container 1 (eth0, IP 2, sg=A,B) and Container 3 (eth0, IP 4, sg=A,B) share eth1; Container 2 (eth0, IP 3, sg=B,C) uses eth2. Each container also has an eth-md interface to a local metadata service at 169.254.169.254. The Titus executor wires all of this up.]
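A minimal sketch of handing a container an IPvlan sub-interface on a pre-attached ENI, using pyroute2. The slide names IPvlan; the interface names, mode choice, ENI device, and container PID here are assumptions, not Titus's implementation:

```python
# Illustrative IPvlan plumbing for one container, via pyroute2.
from pyroute2 import IPRoute

ipr = IPRoute()

def attach_container_interface(eni_device: str, container_pid: int) -> None:
    """Create an IPvlan sub-interface of a pre-attached ENI and move it
    into the container's network namespace."""
    parent = ipr.link_lookup(ifname=eni_device)[0]
    # Mode value per linux uapi if_link.h (1 = L3); the choice here is
    # illustrative.
    ipr.link("add", ifname="ipvl-tmp", kind="ipvlan",
             link=parent, ipvlan_mode=1)
    idx = ipr.link_lookup(ifname="ipvl-tmp")[0]
    # Hand the interface to the container's namespace; the executor
    # would then rename it to eth0 and assign the container's VPC IP
    # from inside that namespace.
    ipr.link("set", index=idx, net_ns_pid=container_pid)

# Example (requires root and a real ENI):
#   attach_container_interface("eth1", container_pid=12345)
```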
17. Next challenge: Speed limits of EC2 Networking
Largest EC2 challenge: speed of networking reconfiguration
Changes in how we work with EC2 APIs
● Worked with Amazon to redefine networking-related API rate limits and buckets
● Pre-attach all network interfaces
● Request IPs in bulk, opportunistically (sketched below)
Also, coordination with the scheduler
● Prefer instances that already have containers in the same security group
For large-scale failovers
● Before … hours; after … minutes
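The bulk, opportunistic IP request above maps onto a single EC2 call; a sketch using the real boto3 API, where the ENI ID and headroom size are assumptions:

```python
# Illustrative bulk secondary-IP allocation on a pre-attached ENI.
import boto3

ec2 = boto3.client("ec2")

def allocate_ips_in_bulk(eni_id: str, needed: int, headroom: int = 4) -> None:
    # Ask for more secondary private IPs than immediately needed, so a
    # bursty deploy doesn't pay one rate-limited API call per container.
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=eni_id,
        SecondaryPrivateIpAddressCount=needed + headroom,
    )

# Example: a deploy needing 8 containers on this ENI.
#   allocate_ips_in_bulk("eni-0123456789abcdef0", needed=8)
```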
18. Overcoming failures on each agent
● Detection - health checks
○ Linux subsystems (systemd, filesystems)
○ Docker aspects (runtime health, registry pulls)
○ Titus processes (networking, GPU, security drivers)
○ Mesos aspects (agent, executor)
● Remediation
○ Local reconciliation
○ Docker image cleanup
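A minimal sketch of an agent-local detect/remediate loop of this kind; the specific checks, thresholds, and the agent-management hook are illustrative:

```python
# Illustrative agent health-check loop: detect, remediate locally,
# otherwise report the agent as non-schedulable (slide 12).
import shutil
import subprocess
import time

def docker_healthy() -> bool:
    # `docker info` exercises the daemon end to end.
    return subprocess.run(["docker", "info"],
                          capture_output=True).returncode == 0

def disk_healthy(path: str = "/var/lib/docker", min_free_gb: int = 20) -> bool:
    return shutil.disk_usage(path).free >= min_free_gb * 1024**3

def remediate_disk() -> None:
    # Local remediation: prune unused images before marking the agent bad.
    subprocess.run(["docker", "image", "prune", "-af"], capture_output=True)

def report_unschedulable() -> None:
    # Hypothetical hook into centralized agent management.
    print("agent marked non-schedulable")

while True:
    if not disk_healthy():
        remediate_disk()
    if not docker_healthy():
        report_unschedulable()
        break
    time.sleep(30)
```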
19. Process Model Evolution
Single process containers
● Worked for some time, until we needed system services
System services
● Telemetry, IAM support, log uploading
● Added as host installed daemons; isolation & multi-tenancy concerns
● Currently injecting system services into containers
Composing system services into containers
● Considered pods; lifecycle and usage complexities limited value
● Considering future of both systemd and docker image composability
20. Resource Isolation
● CPU
○ Started with bursting; it interfered with predictability
○ Resource tiers to avoid interference problems
● Memory
○ Hard limit, OOM kills entire container
● Network
○ Bandwidth throttling
● Disk space
● GPUs
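The memory and CPU behavior above corresponds to standard Docker resource controls; a sketch with docker-py, where the image name and limit values are illustrative:

```python
# Illustrative hard memory limit (OOM-kills the container when exceeded)
# and CPU pinning, via docker-py.
import docker

client = docker.from_env()

container = client.containers.run(
    "myapp:latest",            # assumed image
    detach=True,
    mem_limit="4g",            # hard limit: exceeding it OOM-kills the container
    memswap_limit="4g",        # no swap headroom beyond the limit
    nano_cpus=2_000_000_000,   # 2 CPUs' worth of runtime
    cpuset_cpus="0,1",         # pin to specific cores to reduce interference
)
```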
21. Security Isolation
● Deployed user namespaces
○ Challenging due to shared systems without UID shifting
● Needed ad hoc debugging
○ Titus-ssh for user-level access to their containers
○ Still required power user access for kernel functions
○ Working to automate through tools like Vector (NetflixOSS)
● Seccomp overhead and complexity is prohibitive
○ Working towards automated policies and BPF driven implementations
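For context, a per-container seccomp policy of the kind the slide calls costly to maintain can be applied through docker-py as below; the trivial profile (blocking two key-management syscalls) is purely illustrative, not a Titus policy:

```python
# Illustrative seccomp profile applied to one container via docker-py.
import json
import docker

client = docker.from_env()

# Allow everything except two kernel keyring syscalls; Docker's seccomp
# profile format, passed inline the way the CLI transmits profile files.
profile = {
    "defaultAction": "SCMP_ACT_ALLOW",
    "syscalls": [
        {"names": ["keyctl", "add_key"], "action": "SCMP_ACT_ERRNO"},
    ],
}

container = client.containers.run(
    "myapp:latest",  # assumed image
    detach=True,
    security_opt=["seccomp=" + json.dumps(profile)],
)
```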
22. Open Sourcing
Currently in private open source collaboration with those who want ...
● The NetflixOSS container solution (Spinnaker + Titus + Netflix RPC)
● A unified batch and service Mesos scheduler
● A more robust & native AWS container platform
Hope to fully open source in early Q2
● If you want access now, let us know
● Looking for collaborators, feedback