This document presents an approach called Request-centric I/O Prioritization (RCP) that holistically prioritizes I/Os along the entire I/O path to improve application performance. Existing I/O schedulers operate independently across different layers, which can cause I/O priority inversion and arbitrarily delay foreground tasks. RCP identifies critical I/Os using an enlightenment API and inherits I/O priorities through task and I/O dependencies to enforce critical I/O prioritization in caching, block allocation and queueing layers. Evaluation with PostgreSQL, MongoDB and Redis shows RCP improves throughput by 12-49% and reduces tail latencies by 5-20x compared to other schedulers.
38. Handling Transitive Dependency
• Possible states of dependent task
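• [Figure: a BG task that inherits priority from an FG task can be in one of three states]
• Blocked on task: the BG task is itself waiting on another BG task
• Blocked on I/O: the BG task is waiting on an outstanding I/O
• Blocked at admission stage: the BG task is waiting at the admission stage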
39. Handling Transitive Dependency
• Recording blocking status
• [Figure: animation build showing how each blocking state is resolved once an FG task inherits its priority to a BG task]
• Blocked on task: the blocking task is recorded, so the priority is inherited again (inherit)
• Blocked on I/O: the blocking I/O is recorded, so it can be reprioritized (reprio)
• Blocked at admission stage: the blocked task retries admission (retry)
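The recording scheme above can be sketched as a small model. This is a minimal illustration, not the RCP kernel implementation: the `Task`, `IO`, and `inherit` names, the fields, and the log strings are all hypothetical, and the sketch assumes each task records at most one blocking reason at a time.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class Prio(Enum):
    BG = auto()  # non-critical (background)
    FG = auto()  # critical (foreground)


@dataclass
class IO:
    name: str
    prio: Prio = Prio.BG


@dataclass
class Task:
    name: str
    prio: Prio = Prio.BG
    # Recorded blocking status (hypothetical fields; at most one is set).
    blocked_on_task: Optional["Task"] = None
    blocked_on_io: Optional[IO] = None
    blocked_at_admission: bool = False


def inherit(task: Optional[Task], log: List[str]) -> None:
    """Raise `task` to FG, then follow its recorded blocking status."""
    seen = set()  # guards against a cycle of tasks blocked on each other
    while task is not None and id(task) not in seen:
        seen.add(id(task))
        task.prio = Prio.FG
        log.append(f"inherit:{task.name}")
        if task.blocked_on_io is not None:
            # Blocked on I/O: reprioritize the recorded I/O.
            task.blocked_on_io.prio = Prio.FG
            log.append(f"reprio:{task.blocked_on_io.name}")
            return
        if task.blocked_at_admission:
            # Blocked at admission stage: the task should retry admission.
            log.append(f"retry:{task.name}")
            return
        # Blocked on task: continue with the recorded task (transitive step).
        task = task.blocked_on_task


# An FG task waits on `checkpointer`, which waits on `flusher`,
# which in turn waits on an outstanding write.
write = IO("write-42")
flusher = Task("flusher", blocked_on_io=write)
checkpointer = Task("checkpointer", blocked_on_task=flusher)
log: List[str] = []
inherit(checkpointer, log)
print(log)
```

Keeping the blocking status on the task itself means the inheriting side can walk the chain without the blocked task having to run, which is the point of recording it in advance.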
43. Challenges
• How to accurately identify I/O criticality
• Enlightenment API
• I/O priority inheritance
• Recording blocking status
• How to effectively enforce I/O criticality
44. Criticality-Aware I/O Prioritization
• Caching layer
• Apply low dirty ratio for non-critical writes (1% by default)
• Block layer
• Isolate allocation of block queue slots
• Maintain 2 FIFO queues
• Schedule critical I/O first
• Limit # of outstanding non-critical I/Os (1 by default)
• Support queue promotion to resolve I/O dependency
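The block-layer policy listed above can be illustrated with a toy scheduler. This is a sketch under stated assumptions rather than the actual Linux implementation: `TwoQueueScheduler`, `Request`, and all method names are hypothetical, promotion is modeled as moving a queued request by ID, and the isolation of block queue slot allocation is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Set


@dataclass
class Request:
    rid: int
    critical: bool


class TwoQueueScheduler:
    """Two FIFO queues: critical first, bounded non-critical dispatch."""

    def __init__(self, max_noncritical_outstanding: int = 1) -> None:
        self.critical: Deque[Request] = deque()
        self.noncritical: Deque[Request] = deque()
        self.limit = max_noncritical_outstanding
        self.outstanding: Set[int] = set()  # dispatched non-critical IDs

    def submit(self, req: Request) -> None:
        (self.critical if req.critical else self.noncritical).append(req)

    def promote(self, rid: int) -> bool:
        """Move a queued non-critical request to the critical queue."""
        req = next((r for r in self.noncritical if r.rid == rid), None)
        if req is None:
            return False
        self.noncritical.remove(req)
        req.critical = True
        self.critical.append(req)
        return True

    def dispatch(self) -> Optional[Request]:
        if self.critical:  # critical I/O is always scheduled first
            return self.critical.popleft()
        if self.noncritical and len(self.outstanding) < self.limit:
            req = self.noncritical.popleft()
            self.outstanding.add(req.rid)
            return req
        return None  # nothing eligible right now

    def complete(self, req: Request) -> None:
        self.outstanding.discard(req.rid)


sched = TwoQueueScheduler()  # limit of 1 outstanding non-critical I/O
sched.submit(Request(1, critical=False))
sched.submit(Request(2, critical=False))
sched.submit(Request(3, critical=True))

first = sched.dispatch()    # request 3: critical goes first
second = sched.dispatch()   # request 1: the one allowed non-critical I/O
third = sched.dispatch()    # None: non-critical limit reached
sched.promote(2)            # a critical I/O now depends on request 2
fourth = sched.dispatch()   # request 2, served from the critical queue
```

The limit on outstanding non-critical I/Os keeps device-level queues from filling with background work, while promotion prevents a critical request from waiting behind a non-critical one it depends on.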
45. Evaluation
• Implementation on Linux 3.13 w/ ext4
• Application studies
• PostgreSQL relational database
• Backend processes as foreground tasks
• I/O priority inheritance on LWLocks (semop)
• MongoDB document store
• Client threads as foreground tasks
• I/O priority inheritance on Pthread mutex and condition vars (futex)
• Redis key-value store
• Master process as foreground task
57. Summary of Other Results
• Performance results
• MongoDB: 12%-201% higher throughput, 5x-20x lower 99.9th-percentile latency
• Redis: 7%-49% higher throughput, 2x-20x lower 99.9th-percentile latency
• Analysis results
• System latency analysis using LatencyTOP
• System throughput vs. Application latency
• Need for holistic approach
58. Conclusions
• Key observation
• All the layers in the I/O path should be considered as a whole with I/O priority inversion in mind for effective I/O prioritization
• Request-centric I/O prioritization
• Enlightens the I/O path solely for application performance
• Improves throughput and latency of real applications
• Ongoing work
• Making the implementation practical
• Applying RCP to database cluster with multiple replicas