SlideShare una empresa de Scribd logo
1 de 27
Performance and
Availability Tradeoffs in
Replicated File Systems
            Peter Honeyman
 Center for Information Technology Integration
      University of Michigan, Ann Arbor
Acknowledgements

• Joint work with Dr. Jiaying Zhang
 • Now at Google
 • This was a chapter of her dissertation
• Partially supported by
 • NSF/NMI GridNFS
 • DOE/SciDAC Petascale Data Storage Institute
 • NetApp
 • IBM ARC
Storage replication

• Advantages ☺
 • Scalability
 • Reliability
 • Read performance
Storage replication

• Disadvantages ☹
 • Complex synchronization protocols
   • Concurrency
   • Durability
 • Write performance
Durability

• If we weaken the durability guarantee, we
    may lose data ...
•   And be forced to restart the computation
•   But it might be worth it
Utilization tradeoffs

• Adding replication servers enhances durability
 • Reduces the risk that computation must be
   restarted
 • Increases utilization ☺
• Replication increases run time
 • Reduces utilization ☹
Placement tradeoffs

• Nearby replication servers reduce the
  replication penalty
 • Increases utilization ☺
• Nearby replication servers are vulnerable
  to correlated failure
 • Reduces utilization ☹
Run-time model

            recover
        fail
                ok

           fail
start        run      end
Parameters


• Failure free, single server run time
 • Can be estimated or measured
 • Our focus is on 1 to 10 days
Parameters

• Replication overhead
 • Penalty associated with replication to
   backup servers
 • Proportional to RTT
 • Ratio can be measured by running with a
    backup server a few msec away
Parameters

• Recovery time
 • Time to detect failure of the primary
   server and switch to a backup server
 • Not a sensitive parameter
Parameters


• Probability distribution functions
 • Server failure
 • Successful recovery
Server failure

• Estimated by analyzing PlanetLab ping data
 • 716 nodes, 349 sites, 25 countries
 • All-pairs, 15 minute interval, 1/04 to 6/05
 • 692 nodes were alive throughout
• We ascribe missing pings to node failure
  and network partition
PlanetLab failure




cumulative failure: log-linear scale
Correlated failures
                  failed
                  nodes
 nodes per site               2      3       4       5
                    2        0.526   0.593   0.552   0.561
                    3                0.546   0.440   0.538
                    4                        0.378   0.488
                    5                                0.488
number of sites              259     65      21      11

                           P(n nodes down | 1 node down)
0.25
                                      Correlated failures
Average Failure Correlations




                               0.20

                               0.15

                               0.10

                               0.05

                                 0
                                      25            75                     125   175
                                                             RTT (ms)

                                           nodes     slope       y-intercept
                                            2      -2.4 x 10-4     0.195
                                            3      -2.3 x 10-4     0.155
                                            4      -2.3 x 10-4     0.134
                                            5      -2.4 x 10-4     0.119
Run-time model
• Discrete event simulation for expected run
  time and utilization

                     recover
              fail          ok

                     fail
     start             run        end
Simulation results
                   one hour         no replication: utilization = .995

                                          write intensity
                                                        0.0001
                                                          0.001
                                                           0.01
          RTT                                                0.1

1.0                           1.0

0.8                           0.8

0.6                           0.6
          RTT                                    RTT


      One backup                     Four backups
Simulation results
                   one day         no replication: utilization = .934

                                         write intensity
                                                       0.0001
                                                         0.001
                                                          0.01
          RTT                                               0.1

1.0                          1.0

0.8                          0.8

0.6                          0.6
          RTT                                   RTT


      One backup                    Four backups
Simulation results
                    ten days      no replication: utilization = .668




           RTT                                 RTT

1.00                       1.00

0.75                       0.75

0.50                       0.50
           RTT                                 RTT


       One backup                 Four backups
Simulation discussion

• Replication improves utilization for long-
  running jobs
• Multiple backup servers do not improve
  utilization (due to low PlanetLab failure
  rates)
Simulation discussion

• Distant backup servers improve utilization
  for light writers
• Distant backup servers do not improve
  utilization for heavy writers
• Implications for checkpoint interval …
Checkpoint interval

                                       calculated on the
                                       back of a napkin
 one day, 20% checkpoint overhead




  10 day, 2% checkpoint overhead      10 day, 2% checkpoint overhead


one backup server                   four backup servers
Work in progress
• Realistic failure data
 • Storage and processor failure
 • PDSI failure data repository
• Realistic checkpoint costs — help!
• Realistic replication overhead
 • Depends on amount of computation
 • Less than 10% for NAS Grid Benchmarks
Conclusions

• Conventional wisdom holds that
          consistent mutable replication
        in large-scale distributed systems
           is too expensive to consider
• Our study suggests otherwise
Conclusions
• Consistent replication in large-scale
  distributed storage systems is
               feasible and practical
• Superior performance
• Rigorous adherence to conventional file
  system semantics
• Improved utilization
Thank you for your attention!
                www.citi.umich.edu




   Questions?

Más contenido relacionado

Similar a Performance and Availability Tradeoffs in Replicated File Systems

Resilience at exascale
Resilience at exascaleResilience at exascale
Resilience at exascaleMarc Snir
 
Ertss2010 multicore scheduling
Ertss2010 multicore schedulingErtss2010 multicore scheduling
Ertss2010 multicore schedulingNicolas Navet
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
 
Performance Oriented Design
Performance Oriented DesignPerformance Oriented Design
Performance Oriented DesignRodrigo Campos
 
ClusterPresentation
ClusterPresentationClusterPresentation
ClusterPresentationWill Dixon
 
ioDrive de benchmarking 2011 1209_zem_distribution
ioDrive de benchmarking 2011 1209_zem_distributionioDrive de benchmarking 2011 1209_zem_distribution
ioDrive de benchmarking 2011 1209_zem_distributionMasahito Zembutsu
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...DataWorks Summit
 
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010Atlassian
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudRose Toomey
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
VMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld 2015: Extreme Performance Series - vSphere Compute & MemoryVMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld 2015: Extreme Performance Series - vSphere Compute & MemoryVMworld
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfRichHagarty
 
Structure for scale: Dialing in your apps for optimal performance
Structure for scale: Dialing in your apps for optimal performanceStructure for scale: Dialing in your apps for optimal performance
Structure for scale: Dialing in your apps for optimal performanceAtlassian
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudCeph Community
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudPatrick McGarry
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 

Similar a Performance and Availability Tradeoffs in Replicated File Systems (20)

Resilience at exascale
Resilience at exascaleResilience at exascale
Resilience at exascale
 
Ertss2010 multicore scheduling
Ertss2010 multicore schedulingErtss2010 multicore scheduling
Ertss2010 multicore scheduling
 
Multicore scheduling in automotive ECUs
Multicore scheduling in automotive ECUsMulticore scheduling in automotive ECUs
Multicore scheduling in automotive ECUs
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
 
Performance Oriented Design
Performance Oriented DesignPerformance Oriented Design
Performance Oriented Design
 
ClusterPresentation
ClusterPresentationClusterPresentation
ClusterPresentation
 
No stress with state
No stress with stateNo stress with state
No stress with state
 
ioDrive de benchmarking 2011 1209_zem_distribution
ioDrive de benchmarking 2011 1209_zem_distributionioDrive de benchmarking 2011 1209_zem_distribution
ioDrive de benchmarking 2011 1209_zem_distribution
 
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
Dr Elephant: LinkedIn's Self-Service System for Detecting and Treating Hadoop...
 
Database Health Check
Database Health CheckDatabase Health Check
Database Health Check
 
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010
Performance Tuning: Pulling a Rabbit From a Hat - Atlassian Summit 2010
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
VMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld 2015: Extreme Performance Series - vSphere Compute & MemoryVMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld 2015: Extreme Performance Series - vSphere Compute & Memory
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdf
 
Structure for scale: Dialing in your apps for optimal performance
Structure for scale: Dialing in your apps for optimal performanceStructure for scale: Dialing in your apps for optimal performance
Structure for scale: Dialing in your apps for optimal performance
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Deep Dive on Amazon EC2
Deep Dive on Amazon EC2Deep Dive on Amazon EC2
Deep Dive on Amazon EC2
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 

Último

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Performance and Availability Tradeoffs in Replicated File Systems

  • 1. Performance and Availability Tradeoffs in Replicated File Systems Peter Honeyman Center for Information Technology Integration University of Michigan, Ann Arbor
  • 2. Acknowledgements • Joint work with Dr. Jiaying Zhang • Now at Google • This was a chapter of her dissertation • Partially supported by • NSF/NMI GridNFS • DOE/SciDAC Petascale Data Storage Institute • NetApp • IBM ARC
  • 3. Storage replication • Advantages ☺ • Scalability • Reliability • Read performance
  • 4. Storage replication • Disadvantages ☹ • Complex synchronization protocols • Concurrency • Durability • Write performance
  • 5. Durability • If we weaken the durability guarantee, we may lose data ... • And be forced to restart the computation • But it might be worth it
  • 6. Utilization tradeoffs • Adding replication servers enhances durability • Reduces the risk that computation must be restarted • Increases utilization ☺ • Replication increases run time • Reduces utilization ☹
  • 7. Placement tradeoffs • Nearby replication servers reduce the replication penalty • Increases utilization ☺ • Nearby replication servers are vulnerable to correlated failure • Reduces utilization ☹
  • 8. Run-time model recover fail ok fail start run end
  • 9. Parameters • Failure free, single server run time • Can be estimated or measured • Our focus is on 1 to 10 days
  • 10. Parameters • Replication overhead • Penalty associated with replication to backup servers • Proportional to RTT • Ratio can be measured by running with a backup server a few msec away
  • 11. Parameters • Recovery time • Time to detect failure of the primary server and switch to a backup server • Not a sensitive parameter
  • 12. Parameters • Probability distribution functions • Server failure • Successful recovery
  • 13. Server failure • Estimated by analyzing PlanetLab ping data • 716 nodes, 349 sites, 25 countries • All-pairs, 15 minute interval, 1/04 to 6/05 • 692 nodes were alive throughout • We ascribe missing pings to node failure and network partition
  • 15. Correlated failures failed nodes nodes per site 2 3 4 5 2 0.526 0.593 0.552 0.561 3 0.546 0.440 0.538 4 0.378 0.488 5 0.488 number of sites 259 65 21 11 P(n nodes down | 1 node down)
  • 16. 0.25 Correlated failures Average Failure Correlations 0.20 0.15 0.10 0.05 0 25 75 125 175 RTT (ms) nodes slope y-intercept 2 -2.4 x 10-4 0.195 3 -2.3 x 10-4 0.155 4 -2.3 x 10-4 0.134 5 -2.4 x 10-4 0.119
  • 17. Run-time model • Discrete event simulation for expected run time and utilization recover fail ok fail start run end
  • 18. Simulation results one hour no replication: utilization = .995 write intensity 0.0001 0.001 0.01 RTT 0.1 1.0 1.0 0.8 0.8 0.6 0.6 RTT RTT One backup Four backups
  • 19. Simulation results one day no replication: utilization = .934 write intensity 0.0001 0.001 0.01 RTT 0.1 1.0 1.0 0.8 0.8 0.6 0.6 RTT RTT One backup Four backups
  • 20. Simulation results ten days no replication: utilization = .668 RTT RTT 1.00 1.00 0.75 0.75 0.50 0.50 RTT RTT One backup Four backups
  • 21. Simulation discussion • Replication improves utilization for long- running jobs • Multiple backup servers do not improve utilization (due to low PlanetLab failure rates)
  • 22. Simulation discussion • Distant backup servers improve utilization for light writers • Distant backup servers do not improve utilization for heavy writers • Implications for checkpoint interval …
  • 23. Checkpoint interval calculated on the back of a napkin one day, 20% checkpoint overhead 10 day, 2% checkpoint overhead 10 day, 2% checkpoint overhead one backup server four backup servers
  • 24. Work in progress • Realistic failure data • Storage and processor failure • PDSI failure data repository • Realistic checkpoint costs — help! • Realistic replication overhead • Depends on amount of computation • Less than 10% for NAS Grid Benchmarks
  • 25. Conclusions • Conventional wisdom holds that consistent mutable replication in large-scale distributed systems is too expensive to consider • Our study suggests otherwise
  • 26. Conclusions • Consistent replication in large-scale distributed storage systems is feasible and practical • Superior performance • Rigorous adherence to conventional file system semantics • Improved utilization
  • 27. Thank you for your attention! www.citi.umich.edu Questions?