1. Performance and Availability Tradeoffs in Replicated File Systems
Peter Honeyman
Center for Information Technology Integration
University of Michigan, Ann Arbor
2. Acknowledgements
• Joint work with Dr. Jiaying Zhang
  • Now at Google
  • This was a chapter of her dissertation
• Partially supported by
  • NSF/NMI GridNFS
  • DOE/SciDAC Petascale Data Storage Institute
  • NetApp
  • IBM ARC
9. Parameters
• Failure-free, single-server run time
  • Can be estimated or measured
  • Our focus is on 1 to 10 days
10. Parameters
• Replication overhead
  • Penalty associated with replication to backup servers
  • Proportional to RTT
  • Ratio can be measured by running with a backup server a few msec away
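The measurement the slide describes can be sketched as a small extrapolation: measure run time with a backup only a few milliseconds away, infer the per-millisecond overhead ratio, and predict the penalty at a larger RTT. All numbers below are invented for illustration; this is not the authors' measurement code.

```python
# Hedged sketch: infer replication overhead per ms of RTT from one
# measurement with a nearby backup, then extrapolate to a distant one.

def overhead_ratio(t_single, t_replicated, rtt_ms):
    """Fractional replication penalty per millisecond of RTT."""
    return (t_replicated - t_single) / (t_single * rtt_ms)

def predicted_runtime(t_single, ratio, rtt_ms):
    """Predicted run time with a backup at the given RTT."""
    return t_single * (1 + ratio * rtt_ms)

# Hypothetical measurement: 100 s alone, 100.4 s with a backup 2 ms away.
k = overhead_ratio(t_single=100.0, t_replicated=100.4, rtt_ms=2)
print(predicted_runtime(100.0, k, rtt_ms=50))   # distant backup, ~110 s
```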
11. Parameters
• Recovery time
  • Time to detect failure of the primary server and switch to a backup server
  • Not a sensitive parameter
13. Server failure
• Estimated by analyzing PlanetLab ping data
  • 716 nodes, 349 sites, 25 countries
  • All-pairs, 15 minute interval, 1/04 to 6/05
  • 692 nodes were alive throughout
• We ascribe missing pings to node failure and network partition
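The "missing pings imply failure" rule can be sketched as a scan over a per-node reachability trace: a node is treated as down in any 15-minute interval in which no probe site reached it. This is a minimal illustration, not the authors' actual analysis code.

```python
# Hedged sketch: extract downtime intervals from an all-pairs ping trace.

def down_intervals(pings):
    """pings: per-interval sets of probe sites that reached the node.
    Returns (start, end) index pairs where the node was unreachable."""
    intervals, start = [], None
    for i, reachers in enumerate(pings):
        if not reachers and start is None:
            start = i                      # downtime begins
        elif reachers and start is not None:
            intervals.append((start, i))   # downtime ends
            start = None
    if start is not None:                  # still down at end of trace
        intervals.append((start, len(pings)))
    return intervals

# Example: node reachable, then unreachable for two 15-minute intervals.
trace = [{"A", "B"}, {"A"}, set(), set(), {"B"}]
print(down_intervals(trace))  # [(2, 4)]
```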
15. Correlated failures
P(n nodes down | 1 node down)

                      nodes per site
  failed nodes      2      3      4      5
       2          0.526  0.593  0.552  0.561
       3                 0.546  0.440  0.538
       4                        0.378  0.488
       5                               0.488
  number of sites  259     65     21     11
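The conditional probabilities in the table can be sketched as a simple ratio over per-site failure snapshots. The slide does not say whether "n nodes down" means exactly or at least n; the sketch below uses "at least n", and the data is invented for illustration, not the PlanetLab trace.

```python
# Hedged sketch: P(at least n nodes down | at least 1 node down) at a site,
# estimated from per-interval counts of simultaneously failed nodes.

def cond_failure_prob(snapshots, n):
    """snapshots: number of nodes down at the site in each interval."""
    any_down = [d for d in snapshots if d >= 1]
    if not any_down:
        return 0.0
    return sum(1 for d in any_down if d >= n) / len(any_down)

site = [0, 0, 1, 2, 2, 0, 3, 1]       # hypothetical per-interval down counts
print(cond_failure_prob(site, 2))     # 3 of 5 failure intervals -> 0.6
```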
16. Correlated failures
[Figure: average failure correlations vs RTT (ms), RTT 25–175 ms, correlation axis 0–0.25]

Linear fits:

  nodes   slope          y-intercept
    2     -2.4 x 10^-4     0.195
    3     -2.3 x 10^-4     0.155
    4     -2.3 x 10^-4     0.134
    5     -2.4 x 10^-4     0.119
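The slope and y-intercept columns are ordinary least-squares fits of average failure correlation against RTT. A minimal sketch in plain Python, using synthetic points chosen to lie on the reported 2-node fit (y = 0.195 - 2.4 x 10^-4 RTT), not the real measurements:

```python
# Hedged sketch: least-squares linear fit of correlation vs RTT.

def linear_fit(xs, ys):
    """Ordinary least squares; returns (slope, y-intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

rtt = [25, 75, 125, 175]             # ms
corr = [0.189, 0.177, 0.165, 0.153]  # synthetic, on the reported 2-node line
print(linear_fit(rtt, corr))         # slope ~ -2.4e-4, intercept ~ 0.195
```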
17. Run-time model
• Discrete event simulation for expected run time and utilization
[State diagram: start → run → end; run transitions to recover on fail; recover returns to run on ok, or repeats on fail]
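The run-time model above can be sketched as a small Monte Carlo simulation. The assumptions here are mine, not stated on the slides: exponentially distributed failure interarrivals, a fixed recovery time, and failover to a backup that preserves all progress, so the job resumes where it left off. Utilization is failure-free work divided by expected elapsed time.

```python
import random

# Hedged sketch of the start -> run -> (fail -> recover) -> end model.

def simulate(work, mtbf, recovery, trials=10000, rng=random.Random(42)):
    """Mean elapsed time and utilization for a job needing `work` hours,
    with mean time between failures `mtbf` and fixed `recovery` hours."""
    total = 0.0
    for _ in range(trials):
        done, elapsed = 0.0, 0.0
        while done < work:
            ttf = rng.expovariate(1.0 / mtbf)   # time to next failure
            if ttf >= work - done:              # job finishes first
                elapsed += work - done
                done = work
            else:                               # fail, recover, resume
                elapsed += ttf + recovery
                done += ttf
        total += elapsed
    mean = total / trials
    return mean, work / mean                    # (elapsed, utilization)

# Hypothetical ten-day job, 500-hour MTBF, 15-minute recovery.
print(simulate(work=240, mtbf=500, recovery=0.25))
```

Without replication the fail transition would instead discard all progress (restart from `done = 0`), which is what drives the much lower no-replication utilizations on the next slides.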
18. Simulation results
one hour; no replication: utilization = .995
[Figure: two panels, utilization (0.6–1.0) vs RTT at write intensities 0.0001, 0.001, 0.01, and 0.1; left panel one backup, right panel four backups]
19. Simulation results
one day; no replication: utilization = .934
[Figure: two panels, utilization (0.6–1.0) vs RTT at write intensities 0.0001, 0.001, 0.01, and 0.1; left panel one backup, right panel four backups]
20. Simulation results
ten days; no replication: utilization = .668
[Figure: two panels, utilization (0.50–1.00) vs RTT; left panel one backup, right panel four backups]
21. Simulation discussion
• Replication improves utilization for long-running jobs
• Multiple backup servers do not improve utilization (due to low PlanetLab failure rates)
22. Simulation discussion
• Distant backup servers improve utilization for light writers
• Distant backup servers do not improve utilization for heavy writers
• Implications for checkpoint interval …
23. Checkpoint interval
calculated on the back of a napkin
[Figure: checkpoint-interval curves, left panel one backup server, right panel four backup servers, for a one-day run with 20% checkpoint overhead and a ten-day run with 2% checkpoint overhead]
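The slides do not say which back-of-the-napkin formula was used; the usual napkin estimate is Young's approximation, which sets the optimal checkpoint interval to sqrt(2 * C * MTBF) for checkpoint cost C. A sketch under that assumption, with hypothetical numbers:

```python
import math

# Hedged sketch: Young's first-order approximation of the optimal
# checkpoint interval (assumed here; not named on the slide).

def young_interval(checkpoint_cost, mtbf):
    """Optimal checkpoint interval ~ sqrt(2 * C * MTBF), same time units."""
    return math.sqrt(2 * checkpoint_cost * mtbf)

# Hypothetical numbers: 5-minute checkpoints, 1-day MTBF.
print(young_interval(checkpoint_cost=5 / 60, mtbf=24))  # ~2.0 hours
```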
24. Work in progress
• Realistic failure data
  • Storage and processor failure
  • PDSI failure data repository
• Realistic checkpoint costs — help!
• Realistic replication overhead
  • Depends on amount of computation
  • Less than 10% for NAS Grid Benchmarks
25. Conclusions
• Conventional wisdom holds that consistent mutable replication in large-scale distributed systems is too expensive to consider
• Our study suggests otherwise
26. Conclusions
• Consistent replication in large-scale distributed storage systems is feasible and practical
  • Superior performance
  • Rigorous adherence to conventional file system semantics
  • Improved utilization
27. Thank you for your attention!
www.citi.umich.edu
Questions?