Red Hat Summit 2018 5 New High Performance Features in OpenShift
1. 5 NEW HIGH-PERFORMANCE
FEATURES IN RED HAT OPENSHIFT
Patterns and technology to run critical, high-performance line-of-business applications on Red Hat OpenShift Container Platform
Derek Carr, Jeremy Eder
Red Hat Product Engineering
May 2018
14. WORKLOAD TYPES
Going beyond generic web hosting workloads
Big Data
NFV
FSI
Animation
ISVs
HPC
Machine Learning
● Identify requirement overlap across verticals
● Plumb enhancements generically
● Allow flexibility
15. DRILL DOWN ON OVERLAP
Feature                             | FSI | NFV | ISV | BD/ML | ANIM  | HPC
CPU pinning (cpuset)                | Yes | Yes | Yes | Maybe | Maybe | Yes
Device passthrough (GPU, NIC, etc.) | Yes | Yes | Yes | Yes   | Maybe | Yes
Sysctl support                      | Yes | Yes | Yes | Yes   | Yes   | Yes
HugePages                           | Yes | Yes | Yes | Yes   | Maybe | Maybe
NUMA                                | Yes | Yes | Yes | Maybe | Maybe | Yes
Separate control and data plane     | Yes | Yes | Yes | Yes   | Yes   | Yes
Kernel module loading               | Yes | Yes | Yes | Maybe | Yes   | Maybe
16. PROGRESS REPORT
What has been done in the last year?
● CPU manager (static pinning)
● HugePages
● Device Plugins (GPU, etc.)
● Sysctl support
● Extended Resources
18. RESOURCES AND TUNING OPTIONS
Natively understood
● CPU
● Memory
● Ephemeral storage
● Persistent storage
● HugePages
● Device Plugins
● Extended Resources
Control knobs available
● CPU Manager policy (none, static)
● sysctls (safe / unsafe)
19. RESOURCE REQUIREMENTS
Describes the compute resources needed by a pod
Requests
● Minimum amount (guaranteed)
Limits
● Maximum burst (if available)
Overcommit
● Ratio of limit to request
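A minimal sketch of how requests and limits appear in a pod spec (the image name is illustrative). Here the limit is twice the request for both CPU and memory, i.e. a 2x overcommit ratio:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resources-example
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    resources:
      requests:                       # minimum, guaranteed by the scheduler
        cpu: 250m
        memory: 256Mi
      limits:                         # maximum burst, if spare capacity exists
        cpu: 500m
        memory: 512Mi
```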
20. QUALITY OF SERVICE
Keep the end-user API simple, let the platform optimize for SLA guarantees
Burstable
● Requests < Limits
Best Effort
● No resource requests or limits
Guaranteed
● Requests = Limits
Beware
● Workload SLA
● Eviction
Future
● QoS is an abstraction that allows the kubelet to support different tuning options for particular resource types in the future while keeping the API simple
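For comparison with the Burstable case, a sketch of a pod that lands in the Guaranteed class because requests equal limits for every resource (image name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    resources:
      requests:
        cpu: "1"
        memory: 512Mi
      limits:                         # identical to requests
        cpu: "1"                      # => status.qosClass: Guaranteed
        memory: 512Mi
```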
21. CLUSTER TOPOLOGY
[Diagram: a load balancer (LB) fronting the control plane of three "master and etcd" nodes; an infrastructure tier of three "registry and router" nodes; and the compute nodes and storage tier]
22. NODE BOOTSTRAPPING
Compute nodes fetch their configuration (node-config.yaml) from the server, rendered from per-role Config Maps (node-compute, node-cpu-bound, node-master, node-highmem):
● Default kubelet arguments
● Default labels
● Default taints
● Changes are kept in sync
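A sketch of what one of these per-role Config Maps might look like; the name, namespace, and values are illustrative, not taken from the deck:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-cpu-bound             # hypothetical role name
  namespace: openshift-node
data:
  node-config.yaml: |
    kubeletArguments:              # default kubelet arguments for this role
      cpu-manager-policy:
      - static
      node-labels:                 # default labels applied to matching nodes
      - workload=cpu-bound
```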
24. CPU
How is it exposed?
● Compressible
● Measured in cores
● Not normalized for clock speed
● Use node labels to differentiate
● Assigned per container
25. CPU
How is it enforced?
Completely Fair Scheduler (CFS)
● Requests via cpu.shares
● Limits via cpu.cfs_quota_us
Result
● Distributed across all cores
● Throttling
Challenge
● CPU-bound workloads (cache affinity and scheduling latency) are impacted
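A sketch of how the spec values map onto the CFS cgroup knobs (image name is illustrative; the arithmetic is the standard millicores-to-cgroup conversion):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cfs-example
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    resources:
      requests:
        cpu: 500m   # -> cpu.shares = 512   (500/1000 * 1024)
      limits:
        cpu: "2"    # -> cpu.cfs_quota_us = 200000, with cpu.cfs_period_us = 100000
```

With no cpuset in play, the container's threads float across all cores; once it consumes 200 ms of CPU time in any 100 ms period (i.e. 2 cores' worth), CFS throttles it until the next period.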
26. CPU MANAGEMENT POLICY
Tuning the node for cpu-bound workloads
Supported policies
● none is the default policy (just integrates with CFS)
● static grants containers in Guaranteed pods with integer CPU requests exclusive CPUs on the node, enforced via the cpuset cgroup controller
Benefits
● End-user API is simple (kubelet option)
● Increases CPU affinity and decreases context switches
● Kubelet manages local node topology (important when doing devices)
● More dynamic policies could be introduced in the future
27. DEMO 1 - CPU Pinning
Enable CPU pinning via dynamic node config: Demo
● Inspect the node config map; see kubeletArguments for --cpu-manager-policy=static
● Inspect cpuset.cpus of pod containers assigned either shared or exclusive cores
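A sketch of the two pieces the demo inspects. The kubelet arguments enable the static policy (the static policy also needs some CPU reserved for system use), and the pod qualifies for exclusive cores because it is Guaranteed with an integer CPU request; names are illustrative:

```yaml
# node-config.yaml fragment (hypothetical values)
kubeletArguments:
  cpu-manager-policy:
  - static
  kube-reserved:          # reserve CPU for system daemons; required by the static policy
  - cpu=500m
---
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    resources:
      requests:
        cpu: "2"          # integer CPU count + Guaranteed QoS
        memory: 1Gi       #   => 2 exclusive cores via the cpuset cgroup
      limits:
        cpu: "2"
        memory: 1Gi
```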
29. HUGE PAGES
Supports the allocation and consumption of pre-allocated huge pages
Scenario
● Large memory working set sensitive to TLB misses (RDBMS, JVM, cache, packet processors)
30. HUGE PAGES
Example Pod
Usage
● Pod request
● Node must pre-allocate
● EmptyDir (medium: HugePages)
● shmget w/ SHM_HUGETLB
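A sketch of the example pod the slide refers to, assuming 2 MiB pages pre-allocated on the node; the image name is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example
spec:
  containers:
  - name: app
    image: registry.example.com/app   # hypothetical image
    volumeMounts:
    - mountPath: /hugepages           # hugetlbfs mount backed by the volume below
      name: hugepage
    resources:
      requests:
        hugepages-2Mi: 256Mi          # pod request against pre-allocated pages
        memory: 128Mi
      limits:
        hugepages-2Mi: 256Mi          # huge page requests must equal limits
        memory: 128Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
```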
31. DEMO 2 - Pod that requires huge pages
Dynamically pre-allocate huge pages and schedule a pod: Demo
● Deploy DaemonSet to pre-allocate huge pages
● Inspect node allocatable
● Deploy a pod that consumes huge pages
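A sketch of the pre-allocation DaemonSet, assuming a privileged container may write the host's vm.nr_hugepages (512 x 2 MiB = 1 GiB here); values are illustrative, and depending on the Kubernetes version the kubelet may need a restart before the new pages show up in node allocatable:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hugepages-allocator
spec:
  selector:
    matchLabels:
      name: hugepages-allocator
  template:
    metadata:
      labels:
        name: hugepages-allocator
    spec:
      containers:
      - name: allocator
        image: busybox
        securityContext:
          privileged: true          # required to write the host kernel tunable
        command:
        - sh
        - -c
        - "echo 512 > /proc/sys/vm/nr_hugepages && sleep 86400"
```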
33. DEMO 3 - Extended Resources
Counting dongles: Demo
● Implementation detail
○ For device plugins
● Industry leading UX!
○ (PATCH via curl)
34. DEVICE PLUGINS
gRPC service to expose devices to kubelet
Initialization
● Is the device healthy?
Registration
● Register with kubelet
Serving mode
● Monitor device health
● Allocate device
35. DEMO 4, 5 - GPUs
Consume a GPU in OpenShift: Infrastructure Demo, Multi-GPU Jupyter/Caffe Demo
● Deploy nvidia-device-plugin DaemonSet
● Inspect node allocatable
● Deploy a pod that consumes a GPU
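A sketch of the consuming pod, assuming the nvidia-device-plugin DaemonSet is already advertising the nvidia.com/gpu resource on the node; the image is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
  - name: cuda
    image: nvidia/cuda                # illustrative CUDA base image
    resources:
      limits:
        nvidia.com/gpu: 1             # extended resource exposed by the device plugin;
                                      # requests default to limits for extended resources
```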
37. SYSCTL
The three types
Unsafe
● Experimental Kubelet Flag
● kernel.sem*
● kernel.shm*
● kernel.msg*
● fs.mqueue.*
● net.*
Safe
● Enabled by default
● kernel.shm_rmid_forced
● net.ipv4.ip_local_port_range
● net.ipv4.tcp_syncookies
Node-level
● Can't be set from a pod
● Potentially affects other pods
● Many interesting sysctls
● Use TuneD
Red Hat is working to graduate this feature to Beta in the Kubernetes 1.11 release
● KEP: Promote sysctl annotations to fields
● Feedback welcome!
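A sketch of consuming a safe sysctl from a pod, using the first-class field shape the KEP proposes (prior to 1.11 this was expressed via an annotation instead); the image name is illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced   # safe sysctl, enabled by default
      value: "1"
  containers:
  - name: app
    image: registry.example.com/app  # hypothetical image
```

Unsafe sysctls (e.g. anything under net.*) would additionally require the cluster admin to allow them via the kubelet flag.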
40. ROADMAP
Red Hat continues to invest in evolving support
Topic areas
● NUMA
● Co-located device scheduling
● External device monitoring
● Resource API V2