SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
BFQ, fairness and low latency in block I/O
The Two Towers
Paolo Valente
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Contents
● Why BFQ?
● What happened since the last episode?
● What is still to come?
○ The two Towers
● How is it going in general with latency and fairness?
ENGINEERS
AND DEVICES
WORKING
TOGETHER
What are we talking about?
● I/O Scheduling: deciding the order in which I/O requests
are to be served
○ Especially if several processes compete for the same storage device
● Order chosen so as to guarantee:
○ High I/O throughput
○ Low latency
○ High responsiveness
○ Fairness
● I/O scheduling is performed
by ad-hoc components in the
block layer
○ (legacy) blk and blk-mq
ENGINEERS
AND DEVICES
WORKING
TOGETHER
blk-mq and I/O schedulers
I/O scheduling
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Why BFQ
● In blk-mq, there is now a new I/O Scheduler: BFQ
● Why yet another scheduler?
● Recap with our last demo on a HiKey
○ https://youtu.be/ANfqNiJVoVE
ENGINEERS
AND DEVICES
WORKING
TOGETHER
BFQ upstreamed in April 2017
● After several years
● Merged from 4.12, end of April
● In blk-mq
● What happened since?
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Progress summary
● 34 Linux commits upstreamed (16 mine)
● 5 patches waiting for approval (4 ours, see next item)
● Task force of four students set up
● One new module and one new test upstreamed for the
Phoronix Test Suite
● First third-party evaluation of BFQ benefits finally available
○ On Phoronix
● BFQ resubmitted for android-hikey-linaro-4.9 too
● Several kernel developers, plus some enthusiastic users,
have contributed by testing, reviewing or submitting
patches
● Let’s see in detail
ENGINEERS
AND DEVICES
WORKING
TOGETHER
BFQ task force
Luca Miccio
Angelo Ruocco
Mirko Montanari
Daniele Toschi
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Upstreaming finalization
● 25 of the commits (9 mine) were devoted to finalization
and maintenance
○ Mainly bug fixing
● Plus a patch to disable writeback throttling also with BFQ
○ Made by Luca Miccio
○ First patch for mainline from a student of mine!
● One of the bugs was caused by a general issue between
blk-mq I/O schedulers and cgroups
○ My fix provided an example of how to address this issue in any
scheduler
● Only one of the bugs reported still pending
○ Yet not reported/reproduced any more
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Other contributions in a moment
● The other commits and work-in-progress patches are
related to the near future of BFQ
● We postpone their description to the part of this
presentation on the next challenges
ENGINEERS
AND DEVICES
WORKING
TOGETHER
BFQ for Android
● BFQ won’t be available in Android for a long time
○ BFQ is available only from Linux 4.12, in blk-mq
○ It will take time till Android moves to Linux 4.12
● To fill this gap, we prepared and submitted a legacy blk
version of BFQ for android-hikey-linaro-4.9
● It might be a first step to let BFQ enter Google AOSP
● But such an upstreaming is unlikely to be easy
○ Google Android crew feel that possible regressions might be more
relevant than possible benefits
● This resistance is probably related to the following
problem
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Current vision of BFQ 1/2
● Best case
○ “BFQ loads programs extremely faster” (only case backed up by
numbers)
○ “... BFQ on a multi-disk server has made it much more responsive.”
○ “BFQ has greatly helped to have a responsive system during [heavy
background I/O] and … never … any interruption of the video
streams”
○ “does seem to make using spinning rust a bit more pleasant” [Mike
Galbraith]
○ “It seems to have cured an interactivity issue … Can't really say for
sure given this is not based on measurement” [Mike Galbraith]
● Worst case
○ “Got anything besides anecdotal evidence?”
○ “BFQ like BFS seems like software based on snake oil placebo
effects instead of any real world performance gains”
○ “Do you have independent benchmarks to back this up? I have yet to
see any speedups or reduction in latency by using BFQ over CFQ”
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Current idea of BFQ 2/2
● Undecided case
○ “And thanks CK for your work.” [email to me from an enthusiastic
user ]
● Opinion most frequently expressed by kernel developers
○ “” (empty string)
● So, is BFQ an I/O scheduler with super powers, or just
snake oil?
● How are so discordant opinions possible?
ENGINEERS
AND DEVICES
WORKING
TOGETHER
BFQ folklore
● Probably for two main reasons
● People still know little about the actual benefits of BFQ
● For those who know these benefits, they know them
○ just as folklore
○ or because of their impressions (no measurements)
● Repeatable tests are available in my S suite
○ But S is very little popular, if not almost completely unknown
○ Nobody is using S to confirm or deny my public results
● In the end, the actual impact of BFQ is not well-known
ENGINEERS
AND DEVICES
WORKING
TOGETHER
From folklore to sound knowledge
● To address these issues, we tried to have BFQ
○ properly and measurably tested, in terms of its target goals
○ by a third party
○ on a very popular, authoritative benchmark suite
● So, we contributed to the Phoronix Test Suite
○ Module that helps users configure BFQ correctly for the target tests
○ New test, measuring the start-up time of popular applications in the
presence of background workload
● Using these contributions, Michael Larabel already wrote
an article (report) on Phoronix, for a very fast SSD
● A companion article for an HHD is expected soon
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Phoronix results 1/2
● The report highlights latency and throughput issues
● For one of the test cases, latency issues (bad
responsiveness) were caused by a combination of four
subtle bugs
● Bugs already found and fixed with the help of BFQ task
force
○ Four patches prepared and submitted for mainline
○ Angelo Ruocco has co-signed two of the patches
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Phoronix results 2/2
● The other issues were all due to the characteristics of the
storage device
○ BFQ is not yet effective in guaranteeing a high throughput in all
scenarios with very fast, queueing storage devices
○ BFQ is not yet so fast enough (in terms of execution time) to comply
with the request-completion times of modern, very fast devices
● Motivation for the two challenges that BFQ has to face
ENGINEERS
AND DEVICES
WORKING
TOGETHER
The Two Towers
● Improve effectiveness
○ Make BFQ effective in providing both high throughput and strong
service guarantees (latency, fairness) in any working condition
● Improve efficiency
○ Reduce BFQ execution overhead to such an extent that it complies
even with very fast flash-based devices
ENGINEERS
AND DEVICES
WORKING
TOGETHER
First contributions to effectiveness
Three commit series already upstreamed
1. Improve handling of flash-based devices with no
command queueing
○ Significant throughput boost for random I/O
○ On a HiKey board, throughput increased by up to 125%, growing,
e.g., from 6.9MB/s to 15.6MB/s with two or three random readers in
parallel
2. Improve throughput indirectly
○ Added instructions, in official documentation (in kernel tree), on how
to configure BFQ to get max performance depending on target tasks
and devices
3. Improve responsiveness with heavy sync-write workloads
in the background
○ gnome-terminal start-up time decreased from 50 to 8 seconds on an
HDD (against 5 seconds with no background workload)
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Contributions waiting for approval
● The 4-patch series containing fixes for the responsiveness
issues highlighted by one of the test cases considered in
the Phoronix article
ENGINEERS
AND DEVICES
WORKING
TOGETHER
First tower
● Going beyond these initial contributions is rather more
challenging
● In order to provide strong service guarantees, full control
on service order is needed
● Guaranteeing both full control and, at the same time, high
throughput is very hard with modern devices
● Let’s see why
ENGINEERS
AND DEVICES
WORKING
TOGETHER
The synchronous I/O problem 1/2
● Consider a process, with a very high weight, that shares
storage with other very-low-weight processes
● Virtually all the storage device bandwidth must be devoted
to the high-weight process
○ Almost every time the high-weight process has an I/O request, that
request must be served immediately
● Suppose that
○ The high-weight process does synchronous I/O: after each I/O
request, it systematically blocks, waiting for request completion, then
issues a new I/O request
○ The other process always have some pending I/O request
ENGINEERS
AND DEVICES
WORKING
TOGETHER
The synchronous I/O problem 2/2
● What to do when the high-weight process blocks waiting
for the completion of its current I/O request?
● If the I/O scheduler dispatches I/O requests from other
processes, to not leave the device idle
○ A lot of low-weight I/O may then happen to be queued in the device
while the high-weight process is blocked
○ The high-weight process will find the device busy when it finally
issues its new I/O request
○ The new high-weight I/O request may not be served immediately as it
had to be, but only after the already-queued low-weight I/O
● Control on the service order of high-weight and
low-weight I/O may be completely lost
○ Bandwidth control may be completely lost
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Device idling and epic failure
● A simple, clean is solution is just idling the device while
waiting for next I/O request for a high-weight synchronous
process
● Effective solution for devices without internal queueing
● Nefarious for devices with internal queueing
○ Idling the device means making its internal queue empty
○ Severe loss of throughput with a modern device!
● Paradoxical failure for fairness
○ The device can become so underutilized with device
idling that a process gets much less throughput if
treated fairly (with idling) than if treated unfairly
(without idling)!
ENGINEERS
AND DEVICES
WORKING
TOGETHER
So what?
● Working hard for a solution to have the best of both
worlds
○ Or at least enough good from both worlds
● We have some ideas
● Expectedly, it will be about choosing the right trade off …
● The contributions of Adam Manzanares on I/O-request
priorities may come in handy
○ Limited to devices honouring these priorities
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Second tower
● Improving efficiency is the most difficult task
● BFQ is
○ Large (~9K LOC)
○ Complex
○ Designed with efficiency in mind: unlikely to hide trivial inefficiencies
● Anyway, already implementing some
improvements
○ Reducing the average number of steps
for updating the I/O schedule
● Major improvements will involve
○ Non-trivial profiling
○ Non-trivial redesign
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Latency and fairness landscape
● Last part of this presentation
● Overview of the state of affairs with latency and fairness in
storage I/O
○ Ambiguity in the very word latency
○ Unfair fairness
○ The strange case of Android lag
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Latency ambiguity
● Ambiguity on the very word latency
● For technicians, average overall processing time of an I/O
request
○ Measured in the absence of long queueing in the OS
● For users, waiting time for tasks to be accomplished
○ E.g., start an app, get next video frame
○ Worst case much more important than average
○ Problems caused exactly by long queueing!
● A proper full name for user-perceived latency might be
worst-case I/O-task latency
○ Abbrev.: task latency?
● The term lag is already commonly used for task latency if
task is interactive
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Paradoxical failure for fairness
● Enforcing bandwidth control is needed also to guarantee a
low task latency
○ The I/O of the task that must be guaranteed a low latency must be
guaranteed a high fraction of the bandwidth
● Idling is needed to control bandwidth
○ Throughput may then be lost
● A user may be willing to sacrifice throughput when needed
for low task latency
● Things change dramatically for fairness
○ With device idling, a process may get less throughput if treated
fairly than if not!
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Fairness without idling 1/2
● Only throttling currently available
○ Simple
○ Efficient
● Yet rigid and indirect
○ To let a process get its intended share of the throughput, limit the
other processes
○ Need to guess the right limits for the other processes, as a function
of the expected throughput reached by the device
○ But overall throughput varies with the workload
○ And throttling influences workload!
● So, either be conservative to make sure the intended
share is guaranteed
○ Underutilize available bandwidth
● Or play at your own risk!
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Fairness without idling 2/2
● Yet, even in a risky configuration, throughput may
however be lost
○ If some process does I/O below its bandwidth limit,
other throttled processes cannot compensate for the unused
throughput!
● Throttling extensions now available in Linux to reduce this
bandwidth waste
● But with the following shortcomings
○ Very heuristic
○ Effectiveness largely dependent on and variable with workloads
○ Hard to configure so as to improve effectiveness
○ Unlikely to get optimal device utilization with generic, dynamic
workloads
ENGINEERS
AND DEVICES
WORKING
TOGETHER
Waiting for a complete solution ...
● Solving throughput problems of fair-queueing schedulers
like BFQ could finally provide a sound, complete solution
○ No need for complex configurations
○ Seamless redistribution of excess bandwidth
○ Throughput saturation (or quasi saturation) at all times
○ Tight task latencies guaranteed
○ And, of course, fairness
ENGINEERS
AND DEVICES
WORKING
TOGETHER
The strange case of Android lag
● All our experiments show serious lag issues related to I/O
○ E.g., during app updates
● We see the same issues on our phones in daily usage
○ Especially during app updates
○ We did not trace phones 24/7 to completely close the loop
● Users complain publicly
● It seems obvious that there are these lag issues
○ If there is bad or no control on storage access, and some app/service
are in an I/O-bound phase, then some other app/service may wait
long before having the opportunity to use storage
● Yet, Android seems to suffer from some lag in solving this
lag problem
○ Solution held up mainly by the concern that the needed kernel
changes may cause regressions or instability
Thank You
#SFO17
SFO17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

Más contenido relacionado

Más de Linaro

Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
Linaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
Linaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
Linaro
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
Linaro
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
Linaro
 

Más de Linaro (20)

Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 
HKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 ServersHKG18-116 - RAS Solutions for Arm64 Servers
HKG18-116 - RAS Solutions for Arm64 Servers
 
HKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with CoresightHKG18-TR14 - Postmortem Debugging with Coresight
HKG18-TR14 - Postmortem Debugging with Coresight
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

BFQ, fairness and low latency in block I/O - The Two Towers - SFO17-302

  • 1. BFQ, fairness and low latency in block I/O The Two Towers Paolo Valente
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Contents ● Why BFQ? ● What happened since the last episode? ● What is still to come? ○ The two Towers ● How is it going in general with latency and fairness?
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER What are we talking about? ● I/O Scheduling: deciding the order in which I/O requests are to be served ○ Especially if several processes compete for the same storage device ● Order chosen so as to guarantee: ○ High I/O throughput ○ Low latency ○ High responsiveness ○ Fairness ● I/O scheduling is performed by ad-hoc components in the block layer ○ (legacy) blk and blk-mq
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER Why BFQ ● In blk-mq, there is now a new I/O Scheduler: BFQ ● Why yet another scheduler? ● Recap with our last demo on a HiKey ○ https://youtu.be/ANfqNiJVoVE
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER BFQ upstreamed in April 2017 ● After several years ● Merged from 4.12, end of April ● In blk-mq ● What happened since?
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Progress summary ● 34 Linux commits upstreamed (16 mine) ● 5 patches waiting for approval (4 ours, see next item) ● Task force of four students set up ● One new module and one new test upstreamed for the Phoronix Test Suite ● First third-party evaluation of BFQ benefits finally available ○ On Phoronix ● BFQ resubmitted for android-hikey-linaro-4.9 too ● Several kernel developers, plus some enthusiastic users, have contributed by testing, reviewing or submitting patches ● Let’s see in detail
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER BFQ task force Luca Miccio Angelo Ruocco Mirko Montanari Daniele Toschi
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Upstreaming finalization ● 25 of the commits (9 mine) were devoted to finalization and maintenance ○ Mainly bug fixing ● Plus a patch to disable writeback throttling also with BFQ ○ Made by Luca Miccio ○ First patch for mainline from a student of mine! ● One of the bugs was caused by a general issue between blk-mq I/O schedulers and cgroups ○ My fix provided an example of how to address this issue in any scheduler ● Only one of the bugs reported still pending ○ Yet not reported/reproduced any more
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Other contributions in a moment ● The other commits and work-in-progress patches are related to the near future of BFQ ● We postpone their description to the part of this presentation on the next challenges
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER BFQ for Android ● BFQ won’t be available in Android for a long time ○ BFQ is available only from Linux 4.12, in blk-mq ○ It will take time till Android moves to Linux 4.12 ● To fill this gap, we prepared and submitted a legacy blk version of BFQ for android-hikey-linaro-4.9 ● It might be a first step to let BFQ enter Google AOSP ● But such an upstreaming is unlikely to be easy ○ Google Android crew feel that possible regressions might be more relevant than possible benefits ● This resistance is probably related to the following problem
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Current vision of BFQ 1/2 ● Best case ○ “BFQ loads programs extremely faster” (only case backed up by numbers) ○ “... BFQ on a multi-disk server has made it much more responsive.” ○ “BFQ has greatly helped to have a responsive system during [heavy background I/O] and … never … any interruption of the video streams” ○ “does seem to make using spinning rust a bit more pleasant” [Mike Galbraith] ○ “It seems to have cured an interactivity issue … Can't really say for sure given this is not based on measurement” [Mike Galbraith] ● Worst case ○ “Got anything besides anecdotal evidence?” ○ “BFQ like BFS seems like software based on snake oil placebo effects instead of any real world performance gains” ○ “Do you have independent benchmarks to back this up? I have yet to see any speedups or reduction in latency by using BFQ over CFQ”
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Current idea of BFQ 2/2 ● Undecided case ○ “And thanks CK for your work.” [email to me from an enthusiastic user ] ● Opinion most frequently expressed by kernel developers ○ “” (empty string) ● So, is BFQ an I/O scheduler with super powers, or just snake oil? ● How are so discordant opinions possible?
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER BFQ folklore ● Probably for two main reasons ● People still know little about the actual benefits of BFQ ● For those who know these benefits, they know them ○ just as folklore ○ or because of their impressions (no measurements) ● Repeatable tests are available in my S suite ○ But S is very little popular, if not almost completely unknown ○ Nobody is using S to confirm or deny my public results ● In the end, the actual impact of BFQ is not well-known
  • 15. ENGINEERS AND DEVICES WORKING TOGETHER From folklore to sound knowledge ● To address these issues, we tried to have BFQ ○ properly and measurably tested, in terms of its target goals ○ by a third party ○ on a very popular, authoritative benchmark suite ● So, we contributed to the Phoronix Test Suite ○ Module that helps users configure BFQ correctly for the target tests ○ New test, measuring the start-up time of popular applications in the presence of background workload ● Using these contributions, Michael Larabel already wrote an article (report) on Phoronix, for a very fast SSD ● A companion article for an HHD is expected soon
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER Phoronix results 1/2 ● The report highlights latency and throughput issues ● For one of the test cases, latency issues (bad responsiveness) were caused by a combination of four subtle bugs ● Bugs already found and fixed with the help of BFQ task force ○ Four patches prepared and submitted for mainline ○ Angelo Ruocco has co-signed two of the patches
  • 17. ENGINEERS AND DEVICES WORKING TOGETHER Phoronix results 2/2 ● The other issues were all due to the characteristics of the storage device ○ BFQ is not yet effective in guaranteeing a high throughput in all scenarios with very fast, queueing storage devices ○ BFQ is not yet so fast enough (in terms of execution time) to comply with the request-completion times of modern, very fast devices ● Motivation for the two challenges that BFQ has to face
  • 18. ENGINEERS AND DEVICES WORKING TOGETHER The Two Towers ● Improve effectiveness ○ Make BFQ effective in providing both high throughput and strong service guarantees (latency, fairness) in any working condition ● Improve efficiency ○ Reduce BFQ execution overhead to such an extent that it complies even with very fast flash-based devices
  • 19. ENGINEERS AND DEVICES WORKING TOGETHER First contributions to effectiveness Three commit series already upstreamed 1. Improve handling of flash-based devices with no command queueing ○ Significant throughput boost for random I/O ○ On a HiKey board, throughput increased by up to 125%, growing, e.g., from 6.9MB/s to 15.6MB/s with two or three random readers in parallel 2. Improve throughput indirectly ○ Added instructions, in official documentation (in kernel tree), on how to configure BFQ to get max performance depending on target tasks and devices 3. Improve responsiveness with heavy sync-write workloads in the background ○ gnome-terminal start-up time decreased from 50 to 8 seconds on an HDD (against 5 seconds with no background workload)
  • 20. ENGINEERS AND DEVICES WORKING TOGETHER Contributions waiting for approval ● The 4-patch series containing fixes for the responsiveness issues highlighted by one of the test cases considered in the Phoronix article
  • 21. ENGINEERS AND DEVICES WORKING TOGETHER First tower ● Going beyond these initial contributions is rather more challenging ● In order to provide strong service guarantees, full control on service order is needed ● Guaranteeing both full control and, at the same time, high throughput is very hard with modern devices ● Let’s see why
  • 22. ENGINEERS AND DEVICES WORKING TOGETHER The synchronous I/O problem 1/2 ● Consider a process, with a very high weight, that shares storage with other very-low-weight processes ● Virtually all the storage device bandwidth must be devoted to the high-weight process ○ Almost every time the high-weight process has an I/O request, that request must be served immediately ● Suppose that ○ The high-weight process does synchronous I/O: after each I/O request, it systematically blocks, waiting for request completion, then issues a new I/O request ○ The other process always have some pending I/O request
  • 23. ENGINEERS AND DEVICES WORKING TOGETHER The synchronous I/O problem 2/2 ● What to do when the high-weight process blocks waiting for the completion of its current I/O request? ● If the I/O scheduler dispatches I/O requests from other processes, to not leave the device idle ○ A lot of low-weight I/O may then happen to be queued in the device while the high-weight process is blocked ○ The high-weight process will find the device busy when it finally issues its new I/O request ○ The new high-weight I/O request may not be served immediately as it had to be, but only after the already-queued low-weight I/O ● Control on the service order of high-weight and low-weight I/O may be completely lost ○ Bandwidth control may be completely lost
  • 24. ENGINEERS AND DEVICES WORKING TOGETHER Device idling and epic failure ● A simple, clean is solution is just idling the device while waiting for next I/O request for a high-weight synchronous process ● Effective solution for devices without internal queueing ● Nefarious for devices with internal queueing ○ Idling the device means making its internal queue empty ○ Severe loss of throughput with a modern device! ● Paradoxical failure for fairness ○ The device can become so underutilized with device idling that a process gets much less throughput if treated fairly (with idling) than if treated unfairly (without idling)!
  • 25. ENGINEERS AND DEVICES WORKING TOGETHER So what? ● Working hard for a solution to have the best of both worlds ○ Or at least enough good from both worlds ● We have some ideas ● Expectedly, it will be about choosing the right trade off … ● The contributions of Adam Manzanares on I/O-request priorities may come in handy ○ Limited to devices honouring these priorities
  • 26. ENGINEERS AND DEVICES WORKING TOGETHER Second tower ● Improving efficiency is the most difficult task ● BFQ is ○ Large (~9K LOC) ○ Complex ○ Designed with efficiency in mind: unlikely to hide trivial inefficiencies ● Anyway, already implementing some improvements ○ Reducing the average number of steps for updating the I/O schedule ● Major improvements will involve ○ Non-trivial profiling ○ Non-trivial redesign
  • 27. ENGINEERS AND DEVICES WORKING TOGETHER Latency and fairness landscape ● Last part of this presentation ● Overview of the state of affairs with latency and fairness in storage I/O ○ Ambiguity in the very word latency ○ Unfair fairness ○ The strange case of Android lag
  • 28. ENGINEERS AND DEVICES WORKING TOGETHER Latency ambiguity ● Ambiguity on the very word latency ● For technicians, average overall processing time of an I/O request ○ Measured in the absence of long queueing in the OS ● For users, waiting time for tasks to be accomplished ○ E.g., start an app, get next video frame ○ Worst case much more important than average ○ Problems caused exactly by long queueing! ● A proper full name for user-perceived latency might be worst-case I/O-task latency ○ Abbrev.: task latency? ● The term lag is already commonly used for task latency if task is interactive
  • 29. ENGINEERS AND DEVICES WORKING TOGETHER Paradoxical failure for fairness ● Enforcing bandwidth control is needed also to guarantee a low task latency ○ The I/O of the task that must be guaranteed a low latency must be guaranteed a high fraction of the bandwidth ● Idling is needed to control bandwidth ○ Throughput may then be lost ● A user may be willing to sacrifice throughput when needed for low task latency ● Things change dramatically for fairness ○ With device idling, a process may get less throughput if treated fairly than if not!
  • 30. ENGINEERS AND DEVICES WORKING TOGETHER Fairness without idling 1/2 ● Only throttling currently available ○ Simple ○ Efficient ● Yet rigid and indirect ○ To let a process get its intended share of the throughput, limit the other processes ○ Need to guess the right limits for the other processes, as a function of the expected throughput reached by the device ○ But overall throughput varies with the workload ○ And throttling influences workload! ● So, either be conservative to make sure the intended share is guaranteed ○ Underutilize available bandwidth ● Or play at your own risk!
  • 31. ENGINEERS AND DEVICES WORKING TOGETHER Fairness without idling 2/2 ● Yet, even in a risky configuration, throughput may however be lost ○ If some process does I/O below its bandwidth limit, other throttled processes cannot compensate for the unused throughput! ● Throttling extensions now available in Linux to reduce this bandwidth waste ● But with the following shortcomings ○ Very heuristic ○ Effectiveness largely dependent on and variable with workloads ○ Hard to configure so as to improve effectiveness ○ Unlikely to get optimal device utilization with generic, dynamic workloads
  • 32. ENGINEERS AND DEVICES WORKING TOGETHER Waiting for a complete solution ... ● Solving throughput problems of fair-queueing schedulers like BFQ could finally provide a sound, complete solution ○ No need for complex configurations ○ Seamless redistribution of excess bandwidth ○ Throughput saturation (or quasi saturation) at all times ○ Tight task latencies guaranteed ○ And, of course, fairness
  • 33. ENGINEERS AND DEVICES WORKING TOGETHER The strange case of Android lag ● All our experiments show serious lag issues related to I/O ○ E.g., during app updates ● We see the same issues on our phones in daily usage ○ Especially during app updates ○ We did not trace phones 24/7 to completely close the loop ● Users complain publicly ● It seems obvious that there are these lag issues ○ If there is bad or no control on storage access, and some app/service are in an I/O-bound phase, then some other app/service may wait long before having the opportunity to use storage ● Yet, Android seems to suffer from some lag in solving this lag problem ○ Solution held up mainly by the concern that the needed kernel changes may cause regressions or instability
  • 34. Thank You #SFO17 SFO17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org