Implement Advanced Scheduling Techniques in Kubernetes
Oleg Chunikhin | CTO, Kublr | February 2018
Introduction
• Oleg Chunikhin
• CTO @ Kublr
• Chief Software Architect @ EastBanc Technologies
• Kublr
• Enterprise Kubernetes cluster manager
• Application delivery platform
What to Look For
• Kubernetes overview
• Scheduling algorithm
• Scheduling controls
• Advanced scheduling techniques
• Examples, use cases, and recommendations
Kubernetes | Technology Stack
Kubernetes
• Orchestration
• Network
• Configuration
• Service discovery
• Ingress
• Persistence
• …
Docker
• Distribution
• Configuration
• Isolation
Docker | Architecture
[Diagram: Docker daemon on an instance, driven by the Docker CLI, pulling images from a Docker image repository; it runs application containers with app data and an overlay network.]
Kubernetes | Architecture
[Diagram: the Master runs the K8s master components (etcd, scheduler, API, controller) and stores K8s metadata; each Node runs kubelet, Docker, the K8s node components (overlay network, discovery, connectivity), infrastructure and application containers, and app data.]
Kubernetes | Nodes and Pods
[Diagram: Node 1 hosts Pod A-1 (10.0.0.3, containers Cnt1 and Cnt2) and Pod B-1 (10.0.0.8, container Cnt3); Node 2 hosts Pod A-2 (10.0.1.5, containers Cnt1 and Cnt2).]
Kubernetes | Container Orchestration
[Diagram sequence: User, K8s Master API, K8s Scheduler(s), and K8s Controller(s), with two worker nodes (Node 1, Node 2), each running kubelet and Docker.]
1. It all starts empty.
2. Kubelet registers a node object in the master.
3. Both nodes (Node 1 and Node 2) are now registered in the master.
4. The user creates (unscheduled) Pod object(s) in the master.
5. The scheduler notices unscheduled Pods…
6. …identifies the best node to run them on…
7. …and marks the pods as scheduled on the corresponding nodes.
8. Kubelet notices pods scheduled to its nodes…
9. …and starts the pods’ containers.
Scheduler finds the best node to run pods. HOW?
Kubernetes | Scheduling Algorithm
For each pod that needs scheduling:
1. Filter nodes
2. Calculate nodes priorities
3. Schedule pod if possible
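The same three steps can be sketched in a few lines of Go. This is a minimal illustration, not the kube-scheduler code: Node, Pod, fits, and score are simplified stand-ins, and the scoring rule is just one example of a "least requested" style heuristic.

package main

import "fmt"

// Simplified stand-ins for the real API objects (illustrative only).
type Node struct {
	Name           string
	AllocatableCPU int // free millicores
	AllocatableMem int // free MiB
}

type Pod struct {
	Name       string
	RequestCPU int // requested millicores
	RequestMem int // requested MiB
}

// fits plays the role of the filter phase: keep only nodes that can satisfy
// the pod's resource requests.
func fits(p Pod, n Node) bool {
	return p.RequestCPU <= n.AllocatableCPU && p.RequestMem <= n.AllocatableMem
}

// score plays the role of the priority phase: prefer the node with the most
// room left after placement ("least requested" style).
func score(p Pod, n Node) int {
	return (n.AllocatableCPU - p.RequestCPU) + (n.AllocatableMem - p.RequestMem)
}

// schedule applies the three steps for a single pod.
func schedule(p Pod, nodes []Node) (Node, bool) {
	var best Node
	bestScore := -1
	found := false
	for _, n := range nodes {
		if !fits(p, n) { // 1. filter nodes
			continue
		}
		if s := score(p, n); s > bestScore { // 2. calculate node priorities
			best, bestScore, found = n, s, true
		}
	}
	return best, found // 3. schedule the pod if possible
}

func main() {
	nodes := []Node{
		{Name: "node1", AllocatableCPU: 500, AllocatableMem: 512},
		{Name: "node2", AllocatableCPU: 4000, AllocatableMem: 8192},
	}
	pod := Pod{Name: "pod-a", RequestCPU: 1000, RequestMem: 1024}
	if n, ok := schedule(pod, nodes); ok {
		fmt.Printf("schedule %s onto %s\n", pod.Name, n.Name)
	} else {
		fmt.Printf("%s is unschedulable\n", pod.Name)
	}
}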
Kubernetes | Scheduling Algorithm
Volume filters
• Do the zones of the pod’s requested volumes fit the node’s zone?
• Can the node attach to the volumes?
• Are there conflicts with already-mounted volumes?
• Are there additional volume topology constraints?
(Scheduling stages: Volume filters → Resource filters → Topology filters → Prioritization)
Kubernetes | Scheduling Algorithm
Resource filters
• Do the pod’s requested resources (CPU, RAM, GPU, etc.) fit the node’s available resources?
• Can the pod’s requested ports be opened on the node?
• Is the node free of memory or disk pressure?
Kubernetes | Scheduling Algorithm
Topology filters
• Is the pod requested to run on this
node?
• Are there inter-pod affinity
constraints?
• Does the node match the pod’s
node selector?
• Can the pod tolerate the node’s
taints?
Kubernetes | Scheduling Algorithm
Prioritize with weights for
• Pod replicas distribution
• Least (or most) node utilization
• Balanced resource usage
• Inter-pod affinity priority
• Node affinity priority
• Taint toleration priority
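The priorities are combined as a weighted sum, which is what the "priorities" and "weight" entries of a scheduler policy (shown later) configure. A minimal Go sketch of that combination; the priority functions, names, and weights below are made up for illustration and are not the real kube-scheduler implementations.

package main

import "fmt"

// A priority function scores a candidate node for a given pod; the scheduler
// combines several of them with configurable weights.
type priorityFn func(pod, node string) int

// weighted folds several priority functions into one score, mirroring the
// "priorities" + "weight" entries of a scheduler policy.
func weighted(fns map[string]priorityFn, weights map[string]int) priorityFn {
	return func(pod, node string) int {
		total := 0
		for name, fn := range fns {
			total += weights[name] * fn(pod, node)
		}
		return total
	}
}

func main() {
	leastRequested := func(pod, node string) int { return 7 } // pretend: node has plenty of room
	nodeAffinity := func(pod, node string) int { return 3 }   // pretend: weak affinity match

	score := weighted(
		map[string]priorityFn{
			"LeastRequestedPriority": leastRequested,
			"NodeAffinityPriority":   nodeAffinity,
		},
		map[string]int{
			"LeastRequestedPriority": 1,
			"NodeAffinityPriority":   2,
		},
	)
	fmt.Println("combined score:", score("pod-a", "node1")) // 1*7 + 2*3 = 13
}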
Scheduling | Controlling Pod Destination
• Specify resource requirements
• Be aware of volumes
• Use node constraints
• Use affinity and anti-affinity
• Scheduler configuration
• Custom / multiple schedulers
Scheduling Controlled | Resources
• CPU, RAM, other (GPU)
• Requests and limits
• Reserved resources
kind: Node
status:
  allocatable:
    cpu: "4"
    memory: 8070796Ki
    pods: "110"
  capacity:
    cpu: "4"
    memory: 8Gi
    pods: "110"

kind: Pod
spec:
  containers:
  - name: main
    resources:
      requests:
        cpu: 100m
        memory: 1Gi
Scheduling Controlled | Volumes
• Request volumes in the right zones
• Make sure the node can attach
enough volumes
• Avoid volume location conflicts
• Use volume topology constraints
(alpha in 1.7)
[Diagram: nodes in Zone A, requested volume in Zone B; a pod whose requested volume sits in a different zone than its candidate nodes is unschedulable.]
Scheduling Controlled | Volumes
• Request volumes in the right zones
• Make sure the node can attach
enough volumes
• Avoid volume location conflicts
• Use volume topology constraints
(alpha in 1.7)
[Diagram: Node 1 already running Pod A and Pod B with Volume 1 and Volume 2 attached; Pod C requests one more volume.]
Scheduling Controlled | Volumes
• Request volumes in the right zones
• Make sure node can attach enough
volumes
• Avoid volume location conflicts
• Use volume topology constraints
(alpha in 1.7)
[Diagram: Pod A with Volume 1 on Node 1, Pod B with Volume 2 on Node 2, and Pod C still to be placed.]
Scheduling Controlled | Volumes
• Request volumes in the right zones
• Make sure node can attach enough
volumes
• Avoid volume location conflicts
• Use volume topology constraints
(alpha in 1.7)
annotations:
  "volume.alpha.kubernetes.io/node-affinity": '{
    "requiredDuringSchedulingIgnoredDuringExecution": {
      "nodeSelectorTerms": [{
        "matchExpressions": [{
          "key": "kubernetes.io/hostname",
          "operator": "In",
          "values": ["docker03"]
        }]
      }]
    }}'
Scheduling Controlled | Constraints
• Host constraints
• Labels and node selectors
• Taints and tolerations
[Diagram: Pod A pinned to Node 1.]

kind: Pod
spec:
  nodeName: node1

kind: Node
metadata:
  name: node1
Scheduling Controlled | Node Constraints
• Host constraints
• Labels and node selectors
• Taints and tolerations
[Diagram: Pod A goes to the node labeled tier: backend among Node 1, Node 2, and Node 3.]

kind: Node
metadata:
  labels:
    tier: backend

kind: Pod
spec:
  nodeSelector:
    tier: backend
Scheduling Controlled | Node Constraints
• Host constraints
• Labels and node selectors
• Taints and tolerations
kind: Pod
spec:
  tolerations:
  - key: error
    value: disk
    operator: Equal
    effect: NoExecute
    tolerationSeconds: 60

kind: Node
spec:
  taints:
  - effect: NoSchedule
    key: error
    value: disk
    timeAdded: null

[Diagram: Node 1 is tainted; Pod A tolerates the taint and can run there, Pod B cannot.]
Scheduling Controlled | Taints
Taints communicate
node conditions
• Key – condition category
• Value – specific condition
• Operator – value wildcard
• Equal
• Exists
• Effect
• NoSchedule – filter at scheduling time
• PreferNoSchedule – prioritize at scheduling time
• NoExecute – filter at scheduling time, evict if executing
• TolerationSeconds – time to tolerate “NoExecute” taint
kind: Pod
spec:
  tolerations:
  - key: <taint key>
    value: <taint value>
    operator: <match operator>
    effect: <taint effect>
    tolerationSeconds: 60
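A toleration matches a taint roughly as follows: the keys must match, the effect must match (an empty toleration effect matches any effect), and the operator decides whether the values are compared. A simplified Go sketch of that matching rule, not the exact kubelet/scheduler code:

package main

import "fmt"

// Simplified taint and toleration, mirroring the fields on the slide.
type Taint struct{ Key, Value, Effect string }

type Toleration struct{ Key, Operator, Value, Effect string }

// tolerates reports whether this toleration matches the given taint:
// keys must match, an empty effect matches any effect, and the operator
// decides whether values are compared ("Exists" ignores the value).
func (t Toleration) tolerates(taint Taint) bool {
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	default: // "Equal" (and an empty operator) compare values
		return t.Value == taint.Value
	}
}

func main() {
	taint := Taint{Key: "error", Value: "disk", Effect: "NoSchedule"}
	tolA := Toleration{Key: "error", Operator: "Equal", Value: "disk", Effect: "NoSchedule"}
	var tolB Toleration // Pod B carries no matching toleration

	fmt.Println(tolA.tolerates(taint)) // true: Pod A may be scheduled on the tainted node
	fmt.Println(tolB.tolerates(taint)) // false: Pod B is filtered out
}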
Scheduling Controlled | Affinity
• Node affinity
• Inter-pod affinity
• Inter-pod anti-affinity
kind: Pod
spec:
  affinity:
    nodeAffinity: { ... }
    podAffinity: { ... }
    podAntiAffinity: { ... }
Scheduling Controlled | Node Affinity
Scope
• Preferred during scheduling, ignored during execution
• Required during scheduling, ignored during execution
kind: Pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference: { <node selector term> }
      - ...
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - { <node selector term> }
        - ...
Interlude | Node Selector vs Node Selector Term
nodeSelector:
  <label 1 key>: <label 1 value>
  ...

<node selector term>:
  matchExpressions:
  - key: <label key>
    operator: In | NotIn | Exists | DoesNotExist | Gt | Lt
    values:
    - <label value 1>
    - ...
Scheduling Controlled | Inter-pod Affinity
Scope
• Preferred during scheduling, ignored during execution
• Required during scheduling, ignored during execution
kind: Pod
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        podAffinityTerm: { <pod affinity term> }
      - ...
      requiredDuringSchedulingIgnoredDuringExecution:
      - { <pod affinity term> }
      - ...
Scheduling Controlled | Inter-pod Anti-affinity
Scope
• Preferred during scheduling, ignored during execution
• Required during scheduling, ignored during execution
kind: Pod
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        podAffinityTerm: { <pod affinity term> }
      - ...
      requiredDuringSchedulingIgnoredDuringExecution:
      - { <pod affinity term> }
      - ...
Scheduling Controlled | Pod Affinity Terms
• topologyKey – nodes’ label key defining co-location
• labelSelector and namespaces – select group of pods
<pod affinity term>:
  topologyKey: <topology label key>
  namespaces: [ <namespace>, ... ]
  labelSelector:
    matchLabels:
      <label key>: <label value>
      ...
    matchExpressions:
    - key: <label key>
      operator: In | NotIn | Exists | DoesNotExist
      values: [ <value 1>, ... ]
    - ...
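In other words, a candidate node satisfies a pod affinity term if its value of the topologyKey label matches the value on some node that already runs a pod selected by labelSelector in the listed namespaces. A simplified Go sketch of that check, ignoring namespaces and matchExpressions:

package main

import "fmt"

// Minimal stand-ins: a node has labels; a running pod has labels and a node.
type Node struct {
	Name   string
	Labels map[string]string
}

type RunningPod struct {
	Labels map[string]string
	Node   Node
}

// matchesLabels is a simplified matchLabels selector.
func matchesLabels(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// affinityTermSatisfied checks a required pod affinity term for a candidate
// node: the candidate is acceptable if its value for topologyKey equals the
// value on some node already running a pod that matches the label selector.
func affinityTermSatisfied(candidate Node, topologyKey string, selector map[string]string, running []RunningPod) bool {
	for _, p := range running {
		if matchesLabels(selector, p.Labels) &&
			p.Node.Labels[topologyKey] == candidate.Labels[topologyKey] {
			return true
		}
	}
	return false
}

func main() {
	node1 := Node{Name: "node1", Labels: map[string]string{"kubernetes.io/hostname": "node1"}}
	node2 := Node{Name: "node2", Labels: map[string]string{"kubernetes.io/hostname": "node2"}}
	running := []RunningPod{{Labels: map[string]string{"component": "db"}, Node: node1}}
	selector := map[string]string{"component": "db"}

	fmt.Println(affinityTermSatisfied(node1, "kubernetes.io/hostname", selector, running)) // true: co-located with the db pod
	fmt.Println(affinityTermSatisfied(node2, "kubernetes.io/hostname", selector, running)) // false
}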
Scheduling Controlled | Affinity Example
affinity:
  topologyKey: tier
  labelSelector:
    matchLabels:
      group: a

[Diagram: nodes labeled tier: a and tier: b; pods labeled group: a end up co-located on nodes that share the same value of the tier label.]
Scheduling Controlled | Scheduler Configuration
• Algorithm provider
• Policy configuration file / ConfigMap
• Extender
Default Scheduler | Algorithm Provider
kube-scheduler
--scheduler-name=default-scheduler
--algorithm-provider=DefaultProvider
--algorithm-provider=ClusterAutoscalerProvider
Default Scheduler | Custom Policy Config
kube-scheduler
--config=<file>
--policy-config-file=<file>
--use-legacy-policy-config=<true|false>
--policy-configmap=<config map name>
--policy-configmap-namespace=<config map ns>
Default Scheduler | Custom Policy Config
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
    {"name" : "PodFitsHostPorts"},
    ...
    {"name" : "HostName"}
  ],
  "priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    ...
    {"name" : "EqualPriority", "weight" : 1}
  ],
  "hardPodAffinitySymmetricWeight" : 10,
  "alwaysCheckAllPredicates" : false
}
Default Scheduler | Scheduler Extender
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [...],
  "priorities" : [...],
  "extenders" : [{
    "urlPrefix": "http://127.0.0.1:12346/scheduler",
    "filterVerb": "filter",
    "bindVerb": "bind",
    "prioritizeVerb": "prioritize",
    "weight": 5,
    "enableHttps": false,
    "nodeCacheCapable": false
  }],
  "hardPodAffinitySymmetricWeight" : 10,
  "alwaysCheckAllPredicates" : false
}
Default Scheduler | Scheduler Extender
func filter(pod, nodes) api.NodeList
func prioritize(pod, nodes) HostPriorityList
func bind(pod, node)
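A rough sketch of such an extender as a Go HTTP service, matching the urlPrefix, filterVerb, and prioritizeVerb from the policy above (the bind verb is omitted). The request and response struct fields are assumptions that only approximate the extender wire format; a production extender should reuse the argument and result types shipped with the Kubernetes scheduler instead of hand-rolled structs like these.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Simplified request/response shapes (assumed field names, not the real API types).
type extenderArgs struct {
	Pod   json.RawMessage `json:"pod"`
	Nodes json.RawMessage `json:"nodes"`
}

type filterResult struct {
	Nodes       json.RawMessage   `json:"nodes"`
	FailedNodes map[string]string `json:"failedNodes"`
	Error       string            `json:"error,omitempty"`
}

type hostPriority struct {
	Host  string `json:"host"`
	Score int    `json:"score"`
}

// filter echoes every candidate node back unchanged; a real extender would
// drop unsuitable nodes and report them in FailedNodes.
func filter(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	json.NewEncoder(w).Encode(filterResult{Nodes: args.Nodes, FailedNodes: map[string]string{}})
}

// prioritize returns a flat score; a real extender would rank the nodes it received.
func prioritize(w http.ResponseWriter, r *http.Request) {
	json.NewEncoder(w).Encode([]hostPriority{{Host: "node1", Score: 5}})
}

func main() {
	// Paths follow the policy above: urlPrefix + "/" + filterVerb, etc.
	http.HandleFunc("/scheduler/filter", filter)
	http.HandleFunc("/scheduler/prioritize", prioritize)
	log.Fatal(http.ListenAndServe("127.0.0.1:12346", nil))
}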
Scheduling Controlled | Multiple Schedulers
kind: Pod
metadata:
  name: pod2
spec:
  schedulerName: my-scheduler

kind: Pod
metadata:
  name: pod1
spec:
  ...
Scheduling Controlled | Custom Scheduler
Naive implementation
• In an infinite loop:
• Get list of Nodes: /api/v1/nodes
• Get list of Pods: /api/v1/pods
• Select Pods with
status.phase == Pending and
spec.schedulerName == our-name
• For each pod:
• Calculate target Node
• Create a new Binding object: POST /api/v1/bindings
apiVersion: v1
kind: Binding
metadata:
  namespace: default
  name: pod1
target:
  apiVersion: v1
  kind: Node
  name: node1
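A rough, self-contained Go version of this naive loop. It assumes the API server is reachable through `kubectl proxy` on 127.0.0.1:8001 and that the scheduler is registered under the hypothetical name "our-scheduler"; the "calculate target node" step is stubbed out to picking the first node, where a real implementation would apply the filters and priorities covered earlier.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

const apiBase = "http://127.0.0.1:8001/api/v1" // via `kubectl proxy`

type objectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type pod struct {
	Metadata objectMeta `json:"metadata"`
	Spec     struct {
		SchedulerName string `json:"schedulerName"`
		NodeName      string `json:"nodeName"`
	} `json:"spec"`
	Status struct {
		Phase string `json:"phase"`
	} `json:"status"`
}

type node struct {
	Metadata objectMeta `json:"metadata"`
}

type list[T any] struct {
	Items []T `json:"items"`
}

// getList fetches a list resource and returns its items.
func getList[T any](path string) ([]T, error) {
	resp, err := http.Get(apiBase + path)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var l list[T]
	err = json.NewDecoder(resp.Body).Decode(&l)
	return l.Items, err
}

// bind creates a Binding object assigning the pod to nodeName
// (the namespaced variant of the POST /api/v1/bindings call on the slide).
func bind(p pod, nodeName string) error {
	binding := map[string]any{
		"apiVersion": "v1",
		"kind":       "Binding",
		"metadata":   map[string]string{"namespace": p.Metadata.Namespace, "name": p.Metadata.Name},
		"target":     map[string]string{"apiVersion": "v1", "kind": "Node", "name": nodeName},
	}
	body, _ := json.Marshal(binding)
	url := fmt.Sprintf("%s/namespaces/%s/bindings", apiBase, p.Metadata.Namespace)
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	for { // the naive infinite loop from the slide
		time.Sleep(5 * time.Second)
		nodes, err := getList[node]("/nodes")
		if err != nil || len(nodes) == 0 {
			log.Println("listing nodes:", err)
			continue
		}
		pods, err := getList[pod]("/pods")
		if err != nil {
			log.Println("listing pods:", err)
			continue
		}
		for _, p := range pods {
			if p.Status.Phase != "Pending" || p.Spec.NodeName != "" || p.Spec.SchedulerName != "our-scheduler" {
				continue
			}
			target := nodes[0].Metadata.Name // placeholder for "calculate target node"
			if err := bind(p, target); err != nil {
				log.Println("bind failed:", err)
			} else {
				log.Printf("bound %s/%s to %s", p.Metadata.Namespace, p.Metadata.Name, target)
			}
		}
	}
}

The watch-based variants on the next two slides keep the same binding step but react to Pod and Node events instead of polling.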
Scheduling Controlled | Custom Scheduler
Better implementation
• Watch Pods: /api/v1/pods
• On each Pod event:
• Process if the Pod with
status.phase == Pending and
spec.schedulerName == our-name
• Get list of Nodes: /api/v1/nodes
• Calculate target Node
• Create a new Binding object: POST /api/v1/bindings
apiVersion: v1
kind: Binding
metadata:
  namespace: default
  name: pod1
target:
  apiVersion: v1
  kind: Node
  name: node1
Scheduling Controlled | Custom Scheduler
Even better implementation
• Watch Nodes: /api/v1/nodes
• On each Node event:
• Update Node cache
• Watch Pods: /api/v1/pods
• On each Pod event:
• Process if the Pod with
status.phase == Pending and
spec.schedulerName == our-name
• Calculate target Node
• Create a new Binding object: POST /api/v1/bindings
apiVersion: v1
kind: Binding
metadata:
  namespace: default
  name: pod1
target:
  apiVersion: v1
  kind: Node
  name: node1
Custom Scheduler | Standard Filters
• Minimal set of filters
• kube-scheduler
• Extend
• Re-implement
GitHub kubernetes/kubernetes
plugin/pkg/scheduler/scheduler.go
plugin/pkg/scheduler/algorithm/predicates/predicates.go
Use Case | Distributed Pods
apiVersion: v1
kind: Pod
metadata:
  name: db-replica-3
  labels:
    component: db
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values: [ "db" ]

[Diagram: db-replica-1, db-replica-2, and db-replica-3 spread across Node 1, Node 2, and Node 3, one replica per node.]
Use Case | Co-located Pods
apiVersion: v1
kind: Pod
metadata:
  name: app-replica-1
  labels:
    component: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values: [ "db" ]

[Diagram: app-replica-1 placed on the same node as db-replica-1.]
Use Case | Reliable Service on Spot Nodes
• “fixed” node group
Expensive, more reliable, fixed number
Tagged with label nodeGroup: fixed
• “spot” node group
Inexpensive, unreliable, auto-scaled
Tagged with label nodeGroup: spot
• Scheduling rules:
• At least two pods on “fixed” nodes
• All other pods favor “spot” nodes
• Custom scheduler
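A sketch, in Go, of the core placement rule such a custom scheduler could apply when choosing a node group for the next replica of the service. The node-group labels come from the slide; the selection logic and names are illustrative, and a real scheduler would combine this rule with the standard filters, priorities, and spreading.

package main

import "fmt"

// Node groups follow the labels on the slide ("nodeGroup: fixed" / "nodeGroup: spot").
type node struct {
	Name  string
	Group string // value of the nodeGroup label
}

// pickGroup keeps at least two replicas of the service on "fixed" nodes and
// sends every additional replica to "spot" nodes.
func pickGroup(replicasOnFixed int) string {
	if replicasOnFixed < 2 {
		return "fixed"
	}
	return "spot"
}

// pickNode returns a node from the desired group; a real scheduler would also
// apply the usual filters, priorities, and spreading on top of this rule.
func pickNode(nodes []node, replicasOnFixed int) (node, bool) {
	want := pickGroup(replicasOnFixed)
	for _, n := range nodes {
		if n.Group == want {
			return n, true
		}
	}
	return node{}, false
}

func main() {
	nodes := []node{
		{Name: "fixed-1", Group: "fixed"},
		{Name: "spot-1", Group: "spot"},
		{Name: "spot-2", Group: "spot"},
	}
	onFixed := 0
	for replica := 1; replica <= 4; replica++ {
		n, ok := pickNode(nodes, onFixed)
		if !ok {
			fmt.Printf("replica %d is unschedulable\n", replica)
			continue
		}
		if n.Group == "fixed" {
			onFixed++
		}
		fmt.Printf("replica %d -> %s (%s)\n", replica, n.Name, n.Group)
	}
}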
Scheduling | Dos and Don’ts
DO
• Use resource-based scheduling instead of
node-based
• Specify resource requests
• Keep requests == limits
• Especially for non-elastic resources
• Memory is non-elastic!
• Safeguard against missing resource specs
• Namespace default limits
• Admission controllers
• Plan architecture of localized volumes (EBS,
local)
• Use inter-pod affinity/anti-affinity if possible
DON’T
• ... assign pods to nodes directly
• ... run pods with no resource requests
• ... rely on node-based placement rather than resource requests
• ... use node affinity or direct node assignment unless necessary
Scheduling | Key Takeaways
• Scheduling filters and priorities
• Resource requests and availability
• Inter-pod affinity/anti-affinity
• Volumes localization (AZ)
• Node labels and selectors
• Node affinity/anti-affinity
• Node taints and tolerations
• Scheduler(s) tweaking and customization
Oleg Chunikhin
Chief Technology Officer
oleg@kublr.com
kublr.com
Thank you!
Editor's notes
1. Thank you for coming to see my presentation. Oleg Chunikhin, CTO at Kublr and Chief Software Architect at EastBanc Technologies. At Kublr we develop an enterprise Kubernetes management platform. We see that quite often the rich and powerful scheduling controls Kubernetes provides are underutilized, and essentially manual scheduling is used. We prepared this scheduling overview presentation to explain how cloud native applications can be made better by utilizing the full power of k8s scheduling.
2. I will spend a few minutes reintroducing Docker and Kubernetes architecture concepts before we dig into Kubernetes scheduling. Talking about scheduling, I’ll try to explain the capabilities, the controls available to cluster users and administrators, and the extension points. We’ll also look at a couple of examples and some recommendations.
3. Kubernetes can schedule other types of containers, e.g. rkt. Docker containers can be managed through other orchestration technologies, such as Mesos, Docker Swarm, or Hashicorp Nomad. Docker-Kubernetes is still arguably the most common combination and we will be talking specifically about it today.
4. The architecture and concepts (distribution, configuration, isolation) are shared with other container runtimes. The image repository may be public or private; signed images are supported. An overlay network is not required. Different Linux process isolation technologies are used: namespaces, security groups, etc.
5. Master: API and metadata database; can run in HA mode (1, 3, or 5 instances). Nodes: K8s agents, Docker, system containers, and application containers. After initialization and setup, nodes are fully controlled by the master.
6. Nodes register with the master. Pods are assigned to nodes. Each pod gets an address from the pool of overlay network addresses allocated to the node at registration. Containers in a pod are started together and share the pod’s address space and data volumes. The overall life cycle of the pod and its containers is very simple: a pod cannot be moved or changed, it must be re-created.
7. The Master API maintains the general picture: the vision of the desired and the current known state. The master relies on other components (controllers, kubelet) to update the current known state. The user modifies the to-be state and reads the current state. Controllers “clarify” the to-be state. Kubelet performs actions to achieve the to-be state and reports the current state. The scheduler is just one of the controllers, responsible for assigning unassigned pods to specific nodes.
  8. First there was nothing
9.-17. (The same note as 7, repeated verbatim for each step of the container orchestration sequence.)
  18. Pod requests new volumes, can they be created in a zone where the can be attached to the node? If requested volumes already exist, can they be attached to the node? If the volumes are already attached/mounted, can they be mounted to this node? Any other user-specified constraints?
  19. This most often happens in AWS, where EBS can only be attached to instances in the same AZ where EBS is located
  20. This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces, where co-located is defined as running on a node whose value of the label with key topologyKey matches that of any node on which any of the selected pods is running. Empty topologyKey: For PreferredDuringScheduling pod anti-affinity, empty topologyKey is interpreted as "all topologies" ("all topologies" here means all the topologyKeys indicated by scheduler command-line argument --failure-domains); For affinity and for RequiredDuringScheduling pod anti-affinity, empty topologyKey is not allowed.
21. (Same as note 20.)