Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette
LCA14-306: CPUidle & CPUfreq integration with scheduler
Introduction
● Power-aware scheduling discussion
● Patchset « Small task packing »
− Some information shared between cpuidle and the scheduler
− https://lwn.net/Articles/520857/
● « Line in the sand » by Ingo Molnar
− Integrate cpuidle and cpufreq with the scheduler first
− http://lwn.net/Articles/552885/
CPUidle + scheduler : Current design
[Diagram: Scheduler → switch_to → Idle task → cpuidle_idle_call → CPUidle; CPUidle → cpuidle_select → Governor; CPUidle → cpuidle_enter → CPUidle backend driver]
Idle time measurement
● From the scheduler :
− The duration the idle task is running
− Includes the interrupt processing time
● From CPUidle :
− The duration between interrupts
● CPUidle code runs with local interrupts disabled
● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
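As a small worked example of the relation above, the sketch below adds up a few sleep periods and the interrupt handling time between them. The function names and the microsecond unit are assumptions made for illustration; they do not come from the kernel or from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum an array of durations given in microseconds. */
static uint64_t sum_us(const uint64_t *durations, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += durations[i];
    return total;
}

/*
 * Idle time as the scheduler sees it: every period spent in an idle
 * state (what CPUidle measures) plus the interrupt processing time
 * that happened while the idle task was still the current task.
 */
uint64_t idle_task_time_us(const uint64_t *cpuidle_us, size_t n_sleeps,
                           const uint64_t *irq_us, size_t n_irqs)
{
    return sum_us(cpuidle_us, n_sleeps) + sum_us(irq_us, n_irqs);
}
```

With three sleep periods of 900, 1500 and 400 us and two interrupts of 20 and 35 us, CPUidle accounts for 2800 us while the scheduler accounts for 2855 us.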
Idle time measurement unification
● What is the impact of returning to the scheduler each time an interrupt occurs ?
− The scheduler will choose the idle task again if there is nothing to do
− Mainloop code simplified
− Idle time measured nearly the same by the scheduler and cpuidle
− Probably a negative impact on performance, to be fixed
Load balance
● Taking the decision to balance a task when going to idle
■ Uses avg_idle
● Does not use how long the cpu will sleep
■ The idle state should be selected beforehand
■ CPUidle should give the state the cpu will be in
● Balance a task to the idlest cpu
■ Does not use the cpu's exit latency
■ CPUidle should give back the state the cpu is in
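One way to picture the second point: if CPUidle reported the exit latency of the state each idle cpu is in, the balancer could prefer the idle cpu that is cheapest to wake. The sketch below is only an illustration under that assumption; `struct cpu_idle_info` and `pick_cheapest_idle_cpu` are hypothetical names, not scheduler code.

```c
#include <stdint.h>

/* Hypothetical per-cpu view that CPUidle could give back to the scheduler. */
struct cpu_idle_info {
    int idle;                 /* non-zero when the cpu is idle */
    uint32_t exit_latency_us; /* exit latency of the idle state it is in */
};

/*
 * Return the index of the idle cpu with the smallest exit latency,
 * or -1 when no cpu is idle. Ties keep the lowest cpu index.
 */
int pick_cheapest_idle_cpu(const struct cpu_idle_info *cpus, int nr_cpus)
{
    int best = -1;

    for (int i = 0; i < nr_cpus; i++) {
        if (!cpus[i].idle)
            continue;
        if (best < 0 || cpus[i].exit_latency_us < cpus[best].exit_latency_us)
            best = i;
    }
    return best;
}
```

A real policy would weigh more than latency (topology, load, power cost); the sketch isolates the one input the slide says is missing today.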
CPUidle main function
● Reduce the distance between the scheduler and the cpuidle framework
− Move the idle task to kernel/sched
− Move the cpuidle_idle function into the idle task code
− Integrate the idle mainloop and cpuidle_idle_call
● Allows access to the scheduler's private structure definitions
Menu governor split
● The events could be classified in three categories :
1. Predictable → timers
2. Repetitive → IOs
3. Random → keystroke, incoming packet
● Category 2 could be integrated into the scheduler
IO latency tracking
● IOs are repetitive enough, within a reasonable interval, to be treated as predictable
IO latency tracking
● Measurement from the scheduler
− io_schedule
− io_schedule_timeout
● Track the IO latency per task
− The IO history migrates with the task, unlike with the current governor
− Latency constraint for the task
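A minimal sketch of what per-task tracking could look like, assuming timestamps are taken when a task blocks on IO and when it resumes (the points where `io_schedule` would be entered and left). The structure, the helper names, and the exponentially weighted average with weight 1/4 are all assumptions for illustration; the slides do not say how the patches compute the estimate.

```c
#include <stdint.h>

/* Hypothetical per-task IO latency history. */
struct io_latency_sketch {
    uint64_t avg_us;   /* running estimate of the task's IO wait */
    uint64_t start_us; /* timestamp taken when the task blocks on IO */
    int samples;       /* number of completed IO waits observed */
};

/* Called when the task is about to block on IO. */
void io_wait_begin(struct io_latency_sketch *t, uint64_t now_us)
{
    t->start_us = now_us;
}

/* Called when the task resumes; folds the wait into the running average. */
void io_wait_end(struct io_latency_sketch *t, uint64_t now_us)
{
    uint64_t waited = now_us - t->start_us;

    if (t->samples == 0)
        t->avg_us = waited;                       /* first sample seeds the average */
    else
        t->avg_us = (3 * t->avg_us + waited) / 4; /* new sample weighs 1/4 */
    t->samples++;
}
```

Because the history lives in a per-task structure, it follows the task when it migrates to another cpu, which is the property the slide contrasts with the current governor.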
Combining information
● Move the predictable event framework into the scheduler
● Information combined between the scheduler and the menu governor will be more accurate
− Idle balance decision based on the idle state a cpu is in or is about to enter
− Load tracking from task for idle state exit latency
− CPU computation power and topology
− DVFS strategies for exit idle state boost
Scheduler + CPUidle
● The scheduler should have all the information to tell CPUidle :
− How long it will sleep
− What the latency constraint is
● CPUidle should use the information provided by the scheduler :
− Select an idle state
− Use the backend driver idle callback
− No more heuristics
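With those two inputs, state selection reduces to a table lookup. The sketch below assumes the idle states are sorted from shallowest to deepest and that each one advertises an exit latency and a target residency; the type and function names are hypothetical and this is not the governor from the patches.

```c
#include <stdint.h>

/* Hypothetical description of one idle state. */
struct idle_state_sketch {
    uint32_t exit_latency_us;     /* time needed to wake up from the state */
    uint32_t target_residency_us; /* minimum sleep that makes the state worthwhile */
};

/*
 * Pick the deepest state that fits both the expected sleep duration and
 * the latency constraint. States must be ordered shallowest first, with
 * non-decreasing latency and residency. State 0 is the fallback.
 */
int select_idle_state(const struct idle_state_sketch *states, int nr_states,
                      uint64_t expected_sleep_us, uint32_t latency_limit_us)
{
    int chosen = 0;

    for (int i = 1; i < nr_states; i++) {
        if (states[i].target_residency_us > expected_sleep_us)
            break; /* would not sleep long enough to pay off */
        if (states[i].exit_latency_us > latency_limit_us)
            break; /* would wake up too slowly */
        chosen = i;
    }
    return chosen;
}
```

The function holds no history and makes no prediction of its own: everything it needs is passed in by the caller, which is the split of responsibilities the slide describes.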
Status
● A lot of cleanups around the idle mainloop
● CPUidle main function inside the idle mainloop
− Code distance reduced, structures shared between the scheduler and cpuidle
− Communication between sub-systems made easier
Work in progress
● First iteration of IO latency tracking implemented
− Validation in progress
● Simple governor for CPUIdle
− Select a state
● Idle time unification experimentation
CPUfreq + scheduler
The title is misleading … CPUfreq may completely disappear in the future.
Goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler
Nobody knows what this will look like, so please ask questions and raise suggestions
CPUfreq today
• Polling workqueue
• E.g. ondemand
• Based on idle time / busyness
• No relation to decisions taken by the scheduler
• Task may be run at any time
• No relation to idle task
• In fact, task will not wake-up during idle
Event driven behavior
• Replace polling loop with event driven action
• Scheduler already takes action which affects available compute capacity
• Load balance
• Migrating tasks to and from CPUs of different compute capacity
• DVFS transitions are a natural fit
Lots of work ahead
• Method to initiate CPU DVFS transitions from the scheduler
• Identify call sites to initiate those transitions
• Enqueue/dequeue task
• Load balance
• Idle entry/exit
• Aggressively schedule deadline tasks
• Maybe others
• Define interface between the scheduler & the DVFS thingy
• Currently a power driver in Morten’s RFC
• Remove CPUfreq governor layer from the power driver completely?
Lots of work ahead, part 2
• Experiment with policy
• When and where to evaluate if frequency should be changed
• What metrics are important to the algorithm?
• DVFS versus race-to-idle
• Integrate with power model
• Benchmark performance & power
• Performance regressions
• Does it save power?
• Make it work with non-CPUfreq things like PSCI and ACPI for changing CPU P-state
Morten’s power aware scheduling RFC
• https://lkml.org/lkml/2013/10/11/547
• Replaces polling loop in CPUfreq governor with scheduler event-driven action
• CPUfreq machine drivers are re-used initially
• CPUfreq governor becomes a shim layer to the power driver
Nitty gritty details
• DVFS task is itself scheduled on a workqueue
• Might not be run for some time after the scheduler determines that a DVFS transition should happen
• Kworker threads are filtered out
• Prevents infinite reentrancy into the scheduler
• CPU capacity is not changed when enqueuing and dequeuing these tasks
include/linux/sched/power.h
struct power_driver {
    /*
     * Power driver calls may happen from scheduler context with irq
     * disabled and rq locks held. This must be taken into account in
     * the power driver.
     */

    /* cpu already at max capacity? */
    int (*at_max_capacity)(int cpu);
    /* Increase cpu capacity hint */
    int (*go_faster)(int cpu, int hint);
    /* Decrease cpu capacity hint */
    int (*go_slower)(int cpu, int hint);
    /* Best cpu to wake up */
    int (*best_wake_cpu)(void);
    /* Scheduler call-back without rq lock held and with irq enabled */
    void (*late_callback)(int cpu);
};
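To make the interface more concrete, here is a user-space sketch of a toy driver that fills in three of the callbacks, plus a hook of the kind the scheduler might call when a task is enqueued. Everything prefixed `sketch_` is hypothetical, the struct is repeated only so the example compiles on its own, and the return conventions (0 on success, `hint` ignored) are assumptions; the slides do not define them.

```c
#include <stddef.h>

/* Copy of the interface shown on the slide, repeated for self-containment. */
struct power_driver {
    int (*at_max_capacity)(int cpu);
    int (*go_faster)(int cpu, int hint);
    int (*go_slower)(int cpu, int hint);
    int (*best_wake_cpu)(void);
    void (*late_callback)(int cpu);
};

#define SKETCH_NR_CPUS   2
#define SKETCH_MAX_LEVEL 3

/* Hypothetical per-cpu performance level, 0 = slowest. */
static int sketch_level[SKETCH_NR_CPUS];

static int sketch_at_max(int cpu)
{
    return sketch_level[cpu] >= SKETCH_MAX_LEVEL;
}

static int sketch_go_faster(int cpu, int hint)
{
    (void)hint; /* the meaning of the hint is not specified; ignored here */
    if (sketch_level[cpu] < SKETCH_MAX_LEVEL)
        sketch_level[cpu]++;
    return 0;
}

static int sketch_go_slower(int cpu, int hint)
{
    (void)hint;
    if (sketch_level[cpu] > 0)
        sketch_level[cpu]--;
    return 0;
}

/* Toy driver; best_wake_cpu and late_callback are left NULL. */
const struct power_driver sketch_driver = {
    .at_max_capacity = sketch_at_max,
    .go_faster       = sketch_go_faster,
    .go_slower       = sketch_go_slower,
};

/* Hypothetical scheduler-side hook run when a task is enqueued on cpu. */
int sketch_on_enqueue(const struct power_driver *drv, int cpu)
{
    if (!drv || !drv->at_max_capacity || !drv->go_faster)
        return -1;
    if (drv->at_max_capacity(cpu))
        return 0; /* nothing left to gain */
    return drv->go_faster(cpu, 0);
}

/* Accessor so callers can observe the toy driver's state. */
int sketch_get_level(int cpu)
{
    return sketch_level[cpu];
}
```

In the kernel these callbacks can run with interrupts disabled and runqueue locks held, as the comment in the struct warns, so a real driver could not sleep or perform a slow bus transaction inside them; the toy driver sidesteps that by only updating a counter.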
Incremental changes on top
• https://github.com/mturquette/linux/commits/sched-cpufreq
• Replaced workqueue method with per-CPU kthread
• This allows removal of the kworker filter
• Please commence bikeshedding over the name of this kthread
• Use SCHED_FIFO policy for the task
• Will be run before the normal work (right?)
• These patches were just validated yesterday
• Bugs
• Holes in logic
• Misunderstandings
• Voided warranties
What’s next?
• Gather more opinions on the power driver interface
• Is go_faster/go_slower the right way?
• Spoiler alert: Probably not.
• When else might we want to evaluate CPU frequency?
• Idle entry/exit as mentioned by Daniel
• Cluster-level considerations
• Sched domains
• Not just per-core
• Four Cortex-A9s with a single CPU clock
• Coordinate with the power model work
Questions?
More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members

Integrating CPU idle, frequency scaling with Linux scheduler

  • 1. Wed 5 March, 11:15am, Daniel Lezcano, Mike Turquette LCA14-306: CPUidle & CPUfreq integration with scheduler
  • 2. Introduction ● Power aware discussion ● Patchset « Small task packing » − Some information shared between cpuidle and the scheduler − https://lwn.net/Articles/520857/ ● « Line in the sand » by Ingo Molnar − First integrate cpuidle and cpufreq with the scheduler − http://lwn.net/Articles/552885/
  • 3. CPUidle + scheduler: current design (diagram; boxes: Scheduler, CPUidle, Idle task, Governor, CPUidle backend driver; calls: switch_to, cpuidle_idle_call, cpuidle_select, cpuidle_enter)
  • 4. Idle time measurement ● From the scheduler: − The duration for which the idle task is running − Includes the interrupt processing time ● From CPUidle: − The duration between interrupts ● CPUidle code runs with local interrupts disabled ● T(idle task) = Σ T(CPUidle) + Σ T(irqs)
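The relation on slide 4 can be checked with a small worked example. This is only a userspace sketch: `struct idle_run`, `sum_us` and `idle_task_time_us` are hypothetical names made up for illustration, not kernel interfaces, and the durations are assumed to be in microseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one run of the idle task (all times in microseconds). */
struct idle_run {
    const uint64_t *cpuidle_us;  /* idle durations as measured by CPUidle */
    size_t n_cpuidle;
    const uint64_t *irq_us;      /* interrupt processing durations */
    size_t n_irq;
};

/* Sum n durations. */
static uint64_t sum_us(const uint64_t *v, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* T(idle task) = sum of T(CPUidle) + sum of T(irqs) */
uint64_t idle_task_time_us(const struct idle_run *run)
{
    return sum_us(run->cpuidle_us, run->n_cpuidle) +
           sum_us(run->irq_us, run->n_irq);
}
```

With three idle periods of 900, 450 and 1200 µs separated by interrupts costing 30 and 20 µs, CPUidle would account 2550 µs while the scheduler would see the idle task running for 2600 µs.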
  • 6. Idle time measurement unification ● What is the impact of returning to the scheduler each time an interrupt occurs? − Scheduler will choose the idle task again if there is nothing to do − Mainloop code simplified − Idle time measured nearly the same for the scheduler and cpuidle − Probably a negative performance impact that will need fixing
  • 7. Load balance ● Deciding whether to balance a task when going idle ■ Use of avg_idle ● Does not use how long the cpu will sleep ■ The idle state should be selected beforehand ■ CPUidle should give the state the cpu will be in ● Balance a task to the idlest cpu ■ Does not use the cpu's exit latency ■ CPUidle should give back the state the cpu is in
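One way to read the second point of slide 7: if CPUidle reported each CPU's current idle state back to the scheduler, the wake-up target could be the idle CPU that is cheapest to wake. The sketch below assumes exactly that. `struct cpu_idle_info` and `pick_wake_cpu` are hypothetical names, and choosing purely by exit latency is an illustrative policy, not something the slides specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU information the scheduler could get back from CPUidle. */
struct cpu_idle_info {
    int is_idle;               /* non-zero if the CPU is currently idle */
    uint32_t exit_latency_us;  /* exit latency of the idle state it is in */
};

/*
 * Return the index of the idle CPU with the smallest exit latency,
 * or -1 if no CPU is idle. Ties go to the lowest CPU index.
 */
int pick_wake_cpu(const struct cpu_idle_info *cpus, size_t ncpus)
{
    int best = -1;

    for (size_t i = 0; i < ncpus; i++) {
        if (!cpus[i].is_idle)
            continue;
        if (best < 0 || cpus[i].exit_latency_us < cpus[best].exit_latency_us)
            best = (int)i;
    }
    return best;
}
```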
  • 8. CPUidle main function ● Reduce the distance between the scheduler and the cpuidle framework − Move the idle task to kernel/sched − Move the cpuidle_idle function into the idle task code − Integrate the idle mainloop and cpuidle_idle_call ● Allows access to the scheduler's private structure definitions
  • 9. Menu governor split ● The events could be classified into three categories: 1. Predictable → timers 2. Repetitive → IOs 3. Random → keystroke, incoming packet ● Category 2 could be integrated into the scheduler
  • 10. IO latency tracking ● IOs repeat within a reasonable interval, so they can be treated as predictable enough
  • 11. IO latency tracking ● Measurement from the scheduler − io_schedule − io_schedule_timeout ● Track the IO latency per task − Task migration moves the IO history with the task, unlike the current governor − Latency constraint for the task
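A per-task IO latency history could be kept as a running average stored with the task, which is why it would follow the task on migration. The slides do not say how the history is computed; the exponential moving average with weight 1/8 below is an assumption chosen for illustration, and `struct task_io_stats` and `io_latency_update` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical per-task IO history; it lives with the task, so it migrates with it. */
struct task_io_stats {
    uint64_t avg_latency_us;  /* running average of observed IO waits */
    uint32_t samples;         /* number of waits folded in so far */
};

/*
 * Fold one observed IO wait into the average.
 * The first sample seeds the average; later samples move it by 1/8 of the
 * difference (an assumed weight, not taken from the slides).
 */
void io_latency_update(struct task_io_stats *st, uint64_t wait_us)
{
    if (st->samples == 0)
        st->avg_latency_us = wait_us;
    else if (wait_us >= st->avg_latency_us)
        st->avg_latency_us += (wait_us - st->avg_latency_us) / 8;
    else
        st->avg_latency_us -= (st->avg_latency_us - wait_us) / 8;
    st->samples++;
}
```

In a real implementation the sample would presumably be the time spent blocked in io_schedule or io_schedule_timeout, as the slide suggests.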
  • 12. Combine information ● Move predictable event framework into the scheduler ● Information combined between the scheduler and the menu governor will be more accurate − Idle balance decision based on the idle state a cpu is in or is about to enter − Load tracking from task for idle state exit latency − CPU computation power and topology − DVFS strategies for exit idle state boost
  • 13. Scheduler + CPUidle ● The scheduler should have all the information needed to tell CPUidle: − How long it will sleep − What the latency constraint is ● CPUidle should use the information provided by the scheduler to: − Select an idle state − Use the backend driver idle callback − No more heuristics
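With the two inputs named on slide 13 (expected sleep length and latency constraint), state selection can become a plain table lookup. The sketch below assumes the states are ordered from shallowest to deepest with increasing exit latency and target residency. `struct idle_state` and `select_idle_state` are hypothetical names, not the CPUidle API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle state description; array ordered shallowest to deepest. */
struct idle_state {
    uint32_t exit_latency_us;      /* time needed to wake up from the state */
    uint32_t target_residency_us;  /* minimum sleep that makes the state worthwhile */
};

/*
 * Return the index of the deepest state whose target residency fits the
 * expected sleep and whose exit latency respects the latency constraint.
 * State 0 is always the fallback.
 */
size_t select_idle_state(const struct idle_state *states, size_t nstates,
                         uint64_t expected_sleep_us, uint32_t latency_limit_us)
{
    size_t chosen = 0;

    for (size_t i = 1; i < nstates; i++) {
        if (states[i].target_residency_us > expected_sleep_us)
            break;  /* sleep too short for this and any deeper state */
        if (states[i].exit_latency_us > latency_limit_us)
            break;  /* waking up would violate the constraint */
        chosen = i;
    }
    return chosen;
}
```

No prediction happens inside the function: both values come from the caller, which is the division of labour the slide describes.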
  • 14. Status ● A lot of cleanups around the idle mainloop ● CPUidle main function inside the idle mainloop − Code distance reduced; structures shared between scheduler and cpuidle − Communication between sub-systems made easier
  • 15. Work in progress ● First iteration of IO latency tracking implemented − Validation in progress ● Simple governor for CPUidle − Select a state ● Idle time unification experimentation
  • 18. CPUfreq + scheduler ● The title is misleading … CPUfreq may completely disappear in the future. ● The goal is to initiate CPU dynamic voltage & frequency scaling (DVFS) from the Linux scheduler. ● Nobody knows what this will look like, so please ask questions and raise suggestions.
  • 19. CPUfreq today • Polling workqueue • E.g. ondemand • Based on idle time / busyness • No relation to decisions taken by the scheduler • Task may be run at any time • No relation to idle task • In fact, task will not wake up during idle
  • 20. Event driven behavior • Replace polling loop with event driven action • Scheduler already takes action which affects available compute capacity • Load balance • Migrating tasks to and from CPUs of different compute capacity • DVFS transitions are a natural fit
  • 21. Lots of work ahead • Method to initiate CPU DVFS transitions from the scheduler • Identify call sites to initiate those transitions • Enqueue/dequeue task • Load balance • Idle entry/exit • Aggressively schedule deadline tasks • Maybe others • Define interface between the scheduler & the DVFS thingy • Currently a power driver in Morten’s RFC • Remove CPUfreq governor layer from the power driver completely?
  • 22. Lots of work ahead, part 2 • Experiment with policy • When and where to evaluate if frequency should be changed • What metrics are important to the algorithm? • DVFS versus race-to-idle • Integrate with power model • Benchmark performance & power • Performance regressions • Does it save power? • Make it work with non-CPUfreq things like PSCI and ACPI for changing CPU P-state
  • 23. Morten’s power aware scheduling RFC • https://lkml.org/lkml/2013/10/11/547 • Replaces polling loop in CPUfreq governor with scheduler event-driven action • CPUfreq machine drivers are re-used initially • CPUfreq governor becomes a shim layer to the power driver
  • 24. Nitty gritty details • DVFS task is itself scheduled on a workqueue • Might not be run for some time after the scheduler determines that a DVFS transition should happen • Kworker threads are filtered out • Prevents infinite reentrancy into the scheduler • CPU capacity is not changed when enqueuing and dequeuing these tasks
  • 25. include/linux/sched/power.h

        struct power_driver {
                /*
                 * Power driver calls may happen from scheduler context with irq
                 * disabled and rq locks held. This must be taken into account in
                 * the power driver.
                 */

                /* cpu already at max capacity? */
                int (*at_max_capacity)(int cpu);
                /* Increase cpu capacity hint */
                int (*go_faster)(int cpu, int hint);
                /* Decrease cpu capacity hint */
                int (*go_slower)(int cpu, int hint);
                /* Best cpu to wake up */
                int (*best_wake_cpu)(void);

                /* Scheduler call-back without rq lock held and with irq enabled */
                void (*late_callback)(int cpu);
        };
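To make the shape of this interface concrete, here is a toy userspace backend for three of the callbacks. Everything prefixed `toy_` is hypothetical, as is the frequency table. The slide does not define what `hint` means or what the callbacks return, so this sketch ignores the hint, moves one operating point per call, and assumes 0 for success and -1 when no further step is possible.

```c
#include <stddef.h>

/* Subset of the struct from the slide: callbacks offered to the scheduler. */
struct power_driver {
    int (*at_max_capacity)(int cpu);
    int (*go_faster)(int cpu, int hint);
    int (*go_slower)(int cpu, int hint);
};

#define TOY_NR_CPUS 4

/* Hypothetical operating points in kHz, lowest first. */
static const unsigned int toy_freqs_khz[] = { 300000, 600000, 1000000 };
#define TOY_NR_FREQS (sizeof(toy_freqs_khz) / sizeof(toy_freqs_khz[0]))

/* Current operating point per CPU; cpu must be below TOY_NR_CPUS. */
static size_t toy_cur_idx[TOY_NR_CPUS];

static int toy_at_max_capacity(int cpu)
{
    return toy_cur_idx[cpu] == TOY_NR_FREQS - 1;
}

/* 'hint' is ignored: its semantics are not given on the slide. */
static int toy_go_faster(int cpu, int hint)
{
    (void)hint;
    if (toy_at_max_capacity(cpu))
        return -1;  /* assumed error convention */
    toy_cur_idx[cpu]++;
    return 0;
}

static int toy_go_slower(int cpu, int hint)
{
    (void)hint;
    if (toy_cur_idx[cpu] == 0)
        return -1;  /* already at the lowest operating point */
    toy_cur_idx[cpu]--;
    return 0;
}

const struct power_driver toy_power_driver = {
    .at_max_capacity = toy_at_max_capacity,
    .go_faster = toy_go_faster,
    .go_slower = toy_go_slower,
};

/* Helper for inspection: current frequency of a CPU in kHz. */
unsigned int toy_cur_freq_khz(int cpu)
{
    return toy_freqs_khz[toy_cur_idx[cpu]];
}
```

A real driver would also have to honour the comment in the struct: these callbacks may run with interrupts disabled and runqueue locks held, so they cannot sleep.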
  • 26. Incremental changes on top • https://github.com/mturquette/linux/commits/sched-cpufreq • Replaced workqueue method with per-CPU kthread • This allows removal of the kworker filter • Please commence bikeshedding over the name of this kthread • Use SCHED_FIFO policy for the task • Will be run before the normal work (right?) • These patches were just validated yesterday • Bugs • Holes in logic • Misunderstandings • Voided warranties
  • 27. What’s next? • Gather more opinions on the power driver interface • Is go_faster/go_slower the right way? • Spoiler alert: Probably not. • When else might we want to evaluate CPU frequency? • Idle entry/exit as mentioned by Daniel • Cluster-level considerations • Sched domains • Not just per-core • Four Cortex-A9s with a single CPU clock • Coordinate with the power model work
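The "four Cortex-A9s with a single CPU clock" case shows why per-core requests are not enough: the shared clock has to satisfy every CPU in the cluster. A minimal sketch, assuming the policy is to take the highest per-CPU request (a common choice, but not one stated on the slides), with the hypothetical name `cluster_target_khz`:

```c
#include <stddef.h>

/*
 * For a cluster whose CPUs share one clock, return the frequency that
 * satisfies the most demanding CPU. Returns 0 for an empty cluster.
 */
unsigned int cluster_target_khz(const unsigned int *req_khz, size_t ncpus)
{
    unsigned int target = 0;

    for (size_t i = 0; i < ncpus; i++) {
        if (req_khz[i] > target)
            target = req_khz[i];
    }
    return target;
}
```

With per-CPU requests of 300, 1000, 300 and 600 MHz, the whole cluster runs at 1000 MHz, so a go_slower request from one CPU has no effect while another CPU still needs the higher frequency.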
  • 29. More about Linaro Connect: http://connect.linaro.org More about Linaro: http://www.linaro.org/about/ More about Linaro engineering: http://www.linaro.org/engineering/ Linaro members: www.linaro.org/members