SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
SFO17- 421 The Linux Kernel
Scheduler
Viresh Kumar (PMWG)
ENGINEERS AND DEVICES
WORKING TOGETHER
Topics
● CPU Scheduler
● The O(1) scheduler
● Current scheduler design
● Scheduling classes
● schedule()
● Scheduling classes and policies
● Sched class: STOP
● Sched class: Deadline
● Sched class: Realtime
● Sched class: CFS
● Sched class: Idle
● Runqueue
ENGINEERS AND DEVICES
WORKING TOGETHER
CPU Scheduler
● Shares CPU between tasks.
● Selects next task to run (on each CPU) when:
○ Running task terminates
○ Running task sleeps (waiting for an event)
○ A new task created or sleeping task wakes up
○ Running task’s timeslice is over
● Goals:
○ Be *fair*
○ Timeslice based on task priority
○ Low task response time
○ High (task completion) throughput
○ Balance load among CPUs
○ Power efficient
○ Low overhead of running scheduler’s code
● Work with mainframes, servers, desktops/laptops, embedded/phones.
ENGINEERS AND DEVICES
WORKING TOGETHER
The O(1) scheduler
● 140 priorities. 0-99: RT tasks and 100-139: User tasks.
● Each CPU’s runqueue had 2 arrays: Active and Expired.
● Each arrays had 140 entries, one per priority level.
● Each entry was a linked list serviced in FIFO order.
● Bitmap (of 140 bits) used to find which priority lists aren’t empty.
● Timeslices were assigned to tasks based on these priorities.
● Expired tasks moved from Active to Passive array.
● Once Active array was empty, the arrays were swapped.
● Enqueue and dequeue of tasks and next task selection done in constant time.
● Replaced by CFS in Linux 2.6.23 (2007).
ENGINEERS AND DEVICES
WORKING TOGETHER
The O(1) scheduler (Cont..)
Priority 0
Priority 1
Priority 139
...
Priority 99
...
CPU Active array
Priority 0
Priority 1
Priority 139
...
Priority 99
...
CPU Expired array
Real Time task priorities
User task priorities
ENGINEERS AND DEVICES
WORKING TOGETHER
Current scheduler design
● Introduced by Ingo Molnar in 2.6.23 (2007).
● Scheduling policies within scheduling classes.
● Scheduling class with higher priority served first.
● Task can migrate between CPUs, scheduling policies and scheduling classes.
ENGINEERS AND DEVICES
WORKING TOGETHER
Scheduling Classes
● Represented by following structure:
struct sched_class {
const struct sched_class *next;
void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
struct task_struct * (*pick_next_task) (struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
…
};
STOP
next
DL RT CFS IDLE
nullnext next next next
ENGINEERS AND DEVICES
WORKING TOGETHER
Schedule()
● Picks the next runnable task to run on a CPU.
● Searches for a task starting from the highest priority class (stop).
● Helper: for_each_class().
● pick_next_task():
again:
for_each_class(class) {
p = class->pick_next_task(rq, prev, rf);
if (p) {
if (unlikely(p == RETRY_TASK))
goto again;
return p;
}
}
/* The idle class should always have a runnable task: */
BUG();
ENGINEERS AND DEVICES
WORKING TOGETHER
Scheduling classes and policies
● Stop
○ No policy
● Deadline
○ SCHED_DEADLINE
● Real Time
○ SCHED_FIFO
○ SCHED_RR
● Fair
○ SCHED_NORMAL
○ SCHED_BATCH
○ SCHED_IDLE
● Idle
○ No policy
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: STOP
● Highest priority class.
● Only available for SMP (stop_machine() isn’t used in UP).
● Can preempt everything and is preempted by nothing.
● Mechanism to stop running everything else and run a specific function on
CPU(s).
● No scheduling policies.
● One kernel thread per CPU: “migration/N”.
● Used by task migration, CPU hotplug, RCU, ftrace, clockevents, etc.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Deadline (DL)
● Introduced by Dario Faggioli & Juri Lelli, v3.14 (2013).
● Highest priority tasks in the system.
● Scheduling policy: SCHED_DEADLINE.
● Implemented with red-black tree (self balancing).
● Used for periodic real time tasks, eg. Video encoding/decoding.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Realtime (RT)
● POSIX “real-time” tasks.
● Task priorities: 0 - 99.
● Inverted priorities: 0 highest in kernel, lowest in user space.
● Scheduling policies for tasks at same priority:
○ SCHED_FIFO
○ SCHED_RR, 100ms timeslice by default.
● Implemented with Linked lists.
● Used for short latency sensitive tasks, eg. IRQ threads.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: CFS (Completely fair scheduler)
● Introduced by Ingo Molnar (motivated by Rotating Staircase Deadline
Scheduler by Con Kolivas).
● Scheduling policies:
○ SCHED_NORMAL: Normal Unix tasks
○ SCHED_BATCH: Batch (non-interactive) tasks
○ SCHED_IDLE: Low priority tasks
● Implemented with red-black tree (self balancing).
● Tracks Virtual runtime of tasks (amount of time task has run).
● Tasks with shortest vruntime runs first.
● Priority is used to set task’s weight, that affects vruntime.
● Higher the weight, slower will vruntime increase.
● Task’s priority is calculated by 120 + nice (-20 to +19).
● Used for all other system tasks, eg: shell.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Idle
● Lowest priority scheduling class.
● No scheduling policies.
● One kernel thread (idle) per CPU: “swapper/N”.
● Idle thread runs only when nothing else is runnable on a CPU.
● Idle thread may take the CPU to low power state.
ENGINEERS AND DEVICES
WORKING TOGETHER
Runqueue
● Each CPU has an instance of “struct rq”.
● Each “rq” has DL, RT and CFS runqueues within it.
● Runnable tasks are enqueued on those runqueues.
● Lots of other information, stats are available in struct rq.
struct rq {
…
struct cfs_rq cfs;
struct rt_rq rt;
struct dl_rq dl;
...
}
ENGINEERS AND DEVICES
WORKING TOGETHER
Runqueue (Cont.)
STOP
CFS
IDLE
DEADLINE
REALTIME
RQ CPU 0
STOP
CFS
IDLE
DEADLINE
REALTIME
RQ CPU 1
Thank You
#SFO17
BUD17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

Más contenido relacionado

La actualidad más candente

U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
Macpaul Lin
 

La actualidad más candente (20)

BusyBox for Embedded Linux
BusyBox for Embedded LinuxBusyBox for Embedded Linux
BusyBox for Embedded Linux
 
kdump: usage and_internals
kdump: usage and_internalskdump: usage and_internals
kdump: usage and_internals
 
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common KernelLAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
LAS16-105: Walkthrough of the EAS kernel adaptation to the Android Common Kernel
 
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
Linux scheduler
Linux schedulerLinux scheduler
Linux scheduler
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
WALT vs PELT : Redux - SFO17-307
WALT vs PELT : Redux  - SFO17-307WALT vs PELT : Redux  - SFO17-307
WALT vs PELT : Redux - SFO17-307
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
SFO15-302: Energy Aware Scheduling: Progress Update
SFO15-302: Energy Aware Scheduling: Progress UpdateSFO15-302: Energy Aware Scheduling: Progress Update
SFO15-302: Energy Aware Scheduling: Progress Update
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals
 
Linux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureLinux on ARM 64-bit Architecture
Linux on ARM 64-bit Architecture
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
A practical guide to buildroot
A practical guide to buildrootA practical guide to buildroot
A practical guide to buildroot
 
Embedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernelEmbedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernel
 
ACPI Debugging from Linux Kernel
ACPI Debugging from Linux KernelACPI Debugging from Linux Kernel
ACPI Debugging from Linux Kernel
 
Linux Serial Driver
Linux Serial DriverLinux Serial Driver
Linux Serial Driver
 
Linux kernel
Linux kernelLinux kernel
Linux kernel
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
 
U boot porting guide for SoC
U boot porting guide for SoCU boot porting guide for SoC
U boot porting guide for SoC
 

Similar a The Linux Kernel Scheduler (For Beginners) - SFO17-421

Similar a The Linux Kernel Scheduler (For Beginners) - SFO17-421 (20)

Linux Scheduler Latest_ viresh Kumar.pdf
Linux Scheduler Latest_ viresh Kumar.pdfLinux Scheduler Latest_ viresh Kumar.pdf
Linux Scheduler Latest_ viresh Kumar.pdf
 
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
 
Scheduling in Android
Scheduling in AndroidScheduling in Android
Scheduling in Android
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
multiprocessor real_ time scheduling.ppt
multiprocessor real_ time scheduling.pptmultiprocessor real_ time scheduling.ppt
multiprocessor real_ time scheduling.ppt
 
Os2
Os2Os2
Os2
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
RTAI - Earliest Deadline First
RTAI - Earliest Deadline FirstRTAI - Earliest Deadline First
RTAI - Earliest Deadline First
 
An End to Order
An End to OrderAn End to Order
An End to Order
 
Linux Internals - Part II
Linux Internals - Part IILinux Internals - Part II
Linux Internals - Part II
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
Gluster dev session #6 understanding gluster's network communication layer
Gluster dev session #6  understanding gluster's network   communication layerGluster dev session #6  understanding gluster's network   communication layer
Gluster dev session #6 understanding gluster's network communication layer
 
RTX Kernal
RTX KernalRTX Kernal
RTX Kernal
 
05_Performance Optimization.pdf
05_Performance Optimization.pdf05_Performance Optimization.pdf
05_Performance Optimization.pdf
 
Real Time System
Real Time SystemReal Time System
Real Time System
 
An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)
 
Rt linux-lab1
Rt linux-lab1Rt linux-lab1
Rt linux-lab1
 

Más de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 

Más de Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Último

Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
FIDO Alliance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 

Último (20)

AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 

The Linux Kernel Scheduler (For Beginners) - SFO17-421

  • 1. SFO17- 421 The Linux Kernel Scheduler Viresh Kumar (PMWG)
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Topics ● CPU Scheduler ● The O(1) scheduler ● Current scheduler design ● Scheduling classes ● schedule() ● Scheduling classes and policies ● Sched class: STOP ● Sched class: Deadline ● Sched class: Realtime ● Sched class: CFS ● Sched class: Idle ● Runqueue
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER CPU Scheduler ● Shares CPU between tasks. ● Selects next task to run (on each CPU) when: ○ Running task terminates ○ Running task sleeps (waiting for an event) ○ A new task created or sleeping task wakes up ○ Running task’s timeslice is over ● Goals: ○ Be *fair* ○ Timeslice based on task priority ○ Low task response time ○ High (task completion) throughput ○ Balance load among CPUs ○ Power efficient ○ Low overhead of running scheduler’s code ● Work with mainframes, servers, desktops/laptops, embedded/phones.
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER The O(1) scheduler ● 140 priorities. 0-99: RT tasks and 100-139: User tasks. ● Each CPU’s runqueue had 2 arrays: Active and Expired. ● Each arrays had 140 entries, one per priority level. ● Each entry was a linked list serviced in FIFO order. ● Bitmap (of 140 bits) used to find which priority lists aren’t empty. ● Timeslices were assigned to tasks based on these priorities. ● Expired tasks moved from Active to Passive array. ● Once Active array was empty, the arrays were swapped. ● Enqueue and dequeue of tasks and next task selection done in constant time. ● Replaced by CFS in Linux 2.6.23 (2007).
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER The O(1) scheduler (Cont..) Priority 0 Priority 1 Priority 139 ... Priority 99 ... CPU Active array Priority 0 Priority 1 Priority 139 ... Priority 99 ... CPU Expired array Real Time task priorities User task priorities
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Current scheduler design ● Introduced by Ingo Molnar in 2.6.23 (2007). ● Scheduling policies within scheduling classes. ● Scheduling class with higher priority served first. ● Task can migrate between CPUs, scheduling policies and scheduling classes.
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Scheduling Classes ● Represented by following structure: struct sched_class { const struct sched_class *next; void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); struct task_struct * (*pick_next_task) (struct rq *rq, struct task_struct *prev, struct rq_flags *rf); … }; STOP next DL RT CFS IDLE nullnext next next next
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER Schedule() ● Picks the next runnable task to run on a CPU. ● Searches for a task starting from the highest priority class (stop). ● Helper: for_each_class(). ● pick_next_task(): again: for_each_class(class) { p = class->pick_next_task(rq, prev, rf); if (p) { if (unlikely(p == RETRY_TASK)) goto again; return p; } } /* The idle class should always have a runnable task: */ BUG();
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Scheduling classes and policies ● Stop ○ No policy ● Deadline ○ SCHED_DEADLINE ● Real Time ○ SCHED_FIFO ○ SCHED_RR ● Fair ○ SCHED_NORMAL ○ SCHED_BATCH ○ SCHED_IDLE ● Idle ○ No policy
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: STOP ● Highest priority class. ● Only available for SMP (stop_machine() isn’t used in UP). ● Can preempt everything and is preempted by nothing. ● Mechanism to stop running everything else and run a specific function on CPU(s). ● No scheduling policies. ● One kernel thread per CPU: “migration/N”. ● Used by task migration, CPU hotplug, RCU, ftrace, clockevents, etc.
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Deadline (DL) ● Introduced by Dario Faggioli & Juri Lelli, v3.14 (2013). ● Highest priority tasks in the system. ● Scheduling policy: SCHED_DEADLINE. ● Implemented with red-black tree (self balancing). ● Used for periodic real time tasks, eg. Video encoding/decoding.
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Realtime (RT) ● POSIX “real-time” tasks. ● Task priorities: 0 - 99. ● Inverted priorities: 0 highest in kernel, lowest in user space. ● Scheduling policies for tasks at same priority: ○ SCHED_FIFO ○ SCHED_RR, 100ms timeslice by default. ● Implemented with Linked lists. ● Used for short latency sensitive tasks, eg. IRQ threads.
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: CFS (Completely fair scheduler) ● Introduced by Ingo Molnar (motivated by Rotating Staircase Deadline Scheduler by Con Kolivas). ● Scheduling policies: ○ SCHED_NORMAL: Normal Unix tasks ○ SCHED_BATCH: Batch (non-interactive) tasks ○ SCHED_IDLE: Low priority tasks ● Implemented with red-black tree (self balancing). ● Tracks Virtual runtime of tasks (amount of time task has run). ● Tasks with shortest vruntime runs first. ● Priority is used to set task’s weight, that affects vruntime. ● Higher the weight, slower will vruntime increase. ● Task’s priority is calculated by 120 + nice (-20 to +19). ● Used for all other system tasks, eg: shell.
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Idle ● Lowest priority scheduling class. ● No scheduling policies. ● One kernel thread (idle) per CPU: “swapper/N”. ● Idle thread runs only when nothing else is runnable on a CPU. ● Idle thread may take the CPU to low power state.
  • 15. ENGINEERS AND DEVICES WORKING TOGETHER Runqueue ● Each CPU has an instance of “struct rq”. ● Each “rq” has DL, RT and CFS runqueues within it. ● Runnable tasks are enqueued on those runqueues. ● Lots of other information, stats are available in struct rq. struct rq { … struct cfs_rq cfs; struct rt_rq rt; struct dl_rq dl; ... }
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER Runqueue (Cont.) STOP CFS IDLE DEADLINE REALTIME RQ CPU 0 STOP CFS IDLE DEADLINE REALTIME RQ CPU 1
  • 17. Thank You #SFO17 BUD17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org