FAULT TOLERANCE
By Gaurav Singh Rawat
Electrical Department
Systems Engineering
Fault Tolerance
Fault-tolerant computing is the art and science of building computing systems that continue to operate satisfactorily in the presence of faults. A fault-tolerant system may be able to tolerate one or more fault types, including:
i) transient (caused by external disturbances), intermittent (caused by marginal design), or permanent hardware faults,
ii) software and hardware design errors,
iii) operator errors, or
iv) externally induced upsets or physical damage.
Fault tolerance concept taxonomy
Fault tolerance
◦ Threats: Faults, Errors, Failures
◦ Attributes: Availability, Performability, Graceful Degradation, Maintainability, Testability
◦ Means: Error Detection, System Recovery, Fault Masking, Reconfiguration, Redundancy
Basic Concept
Dependability includes:
• Availability
• Reliability
• Safety
• Maintainability
Availability & Reliability
• Availability: a measure of whether a system is ready to be used immediately
◦ The system is available at any given moment
• Reliability: a measure of whether a system can run continuously without failure
◦ The system continues to function for a long period of time
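The difference between availability and reliability can be made concrete with two standard formulas, shown here as a minimal Python sketch. It assumes a constant failure rate (exponentially distributed time to failure), so the reliability over a mission time t is R(t) = e^(-λt), and the steady-state availability is MTTF / (MTTF + MTTR). The function names and the figures in the example are hypothetical.

```python
import math


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """Probability of running t hours without any failure, assuming a
    constant failure rate lambda (exponential lifetime): R(t) = exp(-lambda*t)."""
    return math.exp(-failure_rate_per_hour * t_hours)


if __name__ == "__main__":
    # Hypothetical system: fails on average every 1000 h, repaired in 1 h.
    a = steady_state_availability(1000.0, 1.0)
    r = reliability(1000.0, 1 / 1000.0)
    print(f"availability = {a:.4f}")               # availability = 0.9990
    print(f"reliability over 1000 h = {r:.4f}")    # reliability over 1000 h = 0.3679
```

With these assumed numbers the system is up 99.9% of the time, yet the chance of running 1000 hours without a single failure is only about 37%: highly available, but not very reliable.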
Safety & Maintainability
• Safety: a measure of how safe failures are
◦ When the system fails, nothing serious happens
◦ For instance, a high degree of safety is required for systems controlling nuclear power plants
• Maintainability: a measure of how easy it is to repair a system
◦ A highly maintainable system may also show a high degree of availability
◦ When failures can be detected and repaired automatically, the system is self-healing
What Is a Fault?
• A system fails when it cannot meet its promises (specifications)
• An error is the part of a system state that may lead to a failure
• A fault is the cause of an error
• Fault tolerance: the system can provide its services even in the presence of faults
• Faults can be:
◦ Transient (appear once and disappear)
◦ Intermittent (appear, disappear, and reappear)
   e.g., a loose contact on a connector causes an intermittent fault
◦ Permanent (appear and persist until repaired)
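The three fault classes differ only in how the fault behaves over time. As an illustration, the hypothetical `classify_fault` function below applies the definitions above to a time-ordered record of periodic checks. It is a heuristic sketch: a finite observation window cannot truly distinguish a permanent fault from an intermittent one that simply has not disappeared yet.

```python
from typing import Sequence


def classify_fault(observations: Sequence[bool]) -> str:
    """Label a fault from periodic checks (True = fault observed at that check).

    Heuristic, for illustration only:
      never observed                          -> "none"
      appears, disappears, and reappears      -> "intermittent"
      one burst, still present at last check  -> "permanent"
      one burst, gone by the last check       -> "transient"
    """
    bursts = 0
    previous = False
    for present in observations:
        if present and not previous:  # a new appearance of the fault
            bursts += 1
        previous = present
    if bursts == 0:
        return "none"
    if bursts > 1:
        return "intermittent"
    return "permanent" if observations[-1] else "transient"


if __name__ == "__main__":
    print(classify_fault([False, True, False, False]))       # transient
    print(classify_fault([True, False, True, False, True]))  # intermittent
    print(classify_fault([False, True, True, True]))         # permanent
```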
Failure Model
• Crash failure: A server halts, but is working correctly until it halts
• Omission failure: A server fails to respond to incoming requests
◦ Receive omission: A server fails to receive incoming messages
◦ Send omission: A server fails to send messages
• Timing failure: A server's response lies outside the specified time interval
• Response failure: The server's response is incorrect
◦ Value failure: The value of the response is wrong
◦ State-transition failure: The server deviates from the correct flow of control
• Arbitrary (Byzantine) failure: A server may produce arbitrary responses at arbitrary times
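Several of these failure classes can be told apart by a client that knows the expected answer and a deadline. The sketch below is a hypothetical classifier for a single request/response pair (the `Observation` type and `classify_failure` name are assumptions). Crash failures, which look like repeated omissions, and arbitrary failures, which require comparing several replicas, cannot be identified from one observation and are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Observation:
    response: Optional[Any]  # None means no reply arrived at all
    latency_s: float         # seconds until the reply arrived


def classify_failure(obs: Observation, expected: Any, deadline_s: float) -> str:
    """Map one request/response pair onto the failure model above.

    Assumption: checks run in the order omission, timing, value, so a reply
    that is both late and wrong is reported as a timing failure.
    """
    if obs.response is None:
        return "omission"
    if obs.latency_s > deadline_s:
        return "timing"
    if obs.response != expected:
        return "value"
    return "ok"


if __name__ == "__main__":
    print(classify_failure(Observation(None, 0.0), expected=42, deadline_s=1.0))  # omission
    print(classify_failure(Observation(42, 2.5), expected=42, deadline_s=1.0))    # timing
    print(classify_failure(Observation(41, 0.1), expected=42, deadline_s=1.0))    # value
```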
Error Detection
• Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.
• There are many error-detection schemes:
1. Repetition codes
2. Parity bits
3. Checksums
4. Cyclic redundancy checks (CRCs)
5. Cryptographic hash functions
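Four of the listed schemes can be compared with Python's standard library: `zlib.crc32` for the CRC and `hashlib.sha256` for the cryptographic hash, while the parity bit and the additive checksum are small hand-written helpers (hypothetical names). In each case the sender computes a check value, the receiver recomputes it, and a mismatch signals an error. The sketch flips one bit of a message to simulate channel noise.

```python
import hashlib
import zlib


def even_parity_bit(data: bytes) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def additive_checksum(data: bytes) -> int:
    """Simple 8-bit checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated channel noise)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


message = b"fault tolerance"
corrupted = flip_bit(message, 13)

# Receiver side: recompute each check value and compare with the sender's.
for name, check in [
    ("parity", even_parity_bit),
    ("checksum", additive_checksum),
    ("crc32", zlib.crc32),
    ("sha256", lambda d: hashlib.sha256(d).hexdigest()),
]:
    detected = check(message) != check(corrupted)
    print(f"{name:8s} detects single-bit flip: {detected}")
```

All four detect this single-bit error. They differ in what they miss: a single parity bit, for example, cannot detect an even number of flipped bits, whereas CRCs are designed to also catch burst errors.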
System Recovery
• So far we have discussed how faults are tolerated, but not what happens after a fault has occurred.
• A process that exhibits a failure has to be able to recover to a correct state.
• There are two types of recovery:
1. Backward recovery
2. Forward recovery
Backward Recovery
• The goal of backward recovery is to bring the system from an erroneous state back to a prior correct state.
• The state of the system must be recorded (checkpointed) from time to time, and then restored when things go wrong.
• Examples:
◦ Reliable communication through packet retransmission
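Backward recovery can be sketched as checkpoint-and-rollback. In this minimal Python example (the class and function names are hypothetical), the state is deep-copied before a batch of updates; if an error is detected partway through, the saved copy is restored so the system returns to its last correct state.

```python
import copy
from typing import Any, Dict, List


class CheckpointedState:
    """Backward recovery: save copies of known-good state, roll back on error."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self.state: Dict[str, Any] = initial
        self._checkpoints: List[Dict[str, Any]] = []

    def checkpoint(self) -> None:
        # Deep copy so later mutations cannot alter the saved snapshot.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        # Restore the most recent checkpoint (the last known correct state).
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = copy.deepcopy(self._checkpoints[-1])


def apply_batch(store: CheckpointedState, updates: List[int]) -> None:
    """Apply updates to the balance; on any detected error, roll back."""
    store.checkpoint()
    try:
        for amount in updates:
            if amount < 0:  # stand-in for an error detected mid-batch
                raise ValueError(f"invalid update {amount}")
            store.state["balance"] += amount
    except ValueError:
        store.rollback()


if __name__ == "__main__":
    account = CheckpointedState({"balance": 100})
    apply_batch(account, [10, 20])
    apply_batch(account, [5, -1, 7])  # fails after +5 has been applied
    print(account.state)              # {'balance': 130}
```

Real systems must also decide how often to checkpoint, since each checkpoint costs time and storage, and in a distributed system the checkpoints of different processes must be mutually consistent.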
Forward Recovery
• The goal of forward recovery is to bring a system from an erroneous state to a correct new state (not a previous state).
• Examples:
◦ Reliable communication via erasure correction, such as an (n, k) block erasure code, where lost packets are reconstructed from the packets that did arrive instead of being retransmitted.
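The simplest erasure code is a single XOR parity packet: k data packets plus their XOR form an (n, k) = (k + 1, k) code, and any one lost packet can be rebuilt by XOR-ing the survivors. This minimal sketch assumes equal-length packets and that the receiver knows which packet is missing (which is what makes it an erasure rather than an undetected error). The function names are hypothetical; practical codes such as Reed-Solomon can repair up to n − k erasures.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_packets: List[bytes]) -> List[bytes]:
    """(k+1, k) code: append one parity packet equal to the XOR of all k packets."""
    parity = reduce(xor_bytes, data_packets)
    return data_packets + [parity]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the k data packets when at most one of the k+1 packets was lost
    (marked None). The lost packet equals the XOR of all the others."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one erasure")
    packets = list(received)
    if missing:
        survivors = [p for p in received if p is not None]
        packets[missing[0]] = reduce(xor_bytes, survivors)
    return packets[:-1]  # drop the parity packet


if __name__ == "__main__":
    sent = encode([b"abcd", b"efgh", b"ijkl"])
    sent[1] = None       # packet 1 lost in transit; its position is known
    print(recover(sent))  # [b'abcd', b'efgh', b'ijkl']
```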
Fault Masking
• Fault masking is a structural redundancy technique that completely masks faults within a set of redundant modules.
• Redundancy is a key technique for hiding failures.
• Redundancy, however, can have an adverse impact on the performance of a system. For example, it can increase the length of transmitted data or increase resource consumption.
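Triple modular redundancy (TMR) is the classic example of fault masking: three replicas compute the same result and a voter outputs the majority value, so a single faulty module is masked without the fault ever surfacing as a failure. A minimal Python sketch, with a hypothetical `faulty` module that always outputs 0:

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the redundant modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: the fault cannot be masked")
    return value


def tmr(modules: Sequence[Callable[[int], T]], x: int) -> T:
    """Triple modular redundancy: run three replicas and vote on their outputs."""
    if len(modules) != 3:
        raise ValueError("TMR uses exactly three modules")
    return majority_vote([module(x) for module in modules])


def good(x: int) -> int:
    return x * x


def faulty(x: int) -> int:  # hypothetical module with a stuck-at-zero output
    return 0


if __name__ == "__main__":
    print(tmr([good, good, faulty], 7))  # 49: the faulty replica is outvoted
```

The voter itself becomes a single point of failure, which is one reason some hardware TMR designs replicate the voter as well.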
Reconfiguration
• Reconfiguration is the “process of eliminating a faulty entity from a system and restoring the system to some operational condition or state”.
• When using reconfiguration, the designer must be concerned with fault detection, fault location, fault containment, and fault recovery.
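The four concerns above map onto a simple reconfiguration loop. In this hypothetical sketch, a health check detects and locates a faulty unit, the unit is removed from the active set (containment), and a spare, if one is left, takes its place (recovery). When no spares remain, the system keeps running with fewer units, i.e. graceful degradation. The class name and the health-check callback are assumptions for illustration.

```python
from typing import Callable, List


class ReconfigurablePool:
    """Active units plus spares; faulty units are swapped out for spares."""

    def __init__(self, active: List[str], spares: List[str],
                 health_check: Callable[[str], bool]) -> None:
        self.active = list(active)
        self.spares = list(spares)
        self.health_check = health_check
        self.removed: List[str] = []

    def reconfigure(self) -> List[str]:
        # Iterate over a snapshot so the active list can be modified safely.
        # Spares promoted during this pass are checked on the next pass.
        for unit in list(self.active):
            if not self.health_check(unit):       # detection and location
                self.active.remove(unit)          # containment: isolate the unit
                self.removed.append(unit)
                if self.spares:                   # recovery: promote a spare
                    self.active.append(self.spares.pop(0))
        return self.active


if __name__ == "__main__":
    healthy = {"A": True, "B": False, "C": True, "S1": True}
    pool = ReconfigurablePool(["A", "B", "C"], ["S1"], lambda u: healthy[u])
    print(pool.reconfigure())  # ['A', 'C', 'S1']
    healthy["A"] = False
    print(pool.reconfigure())  # ['C', 'S1']: no spares left, degraded mode
```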
Redundancy
• In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system.
• There are four types of redundancy:
1. Hardware (such as dual and triple modular redundancy, DMR and TMR)
2. Software (N-version programming)
3. Time (transient fault detection, such as alternating logic)
4. Information (error-detecting or error-correcting codes)
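Time redundancy can be as simple as executing the same computation more than once and comparing the results. The sketch below is a generic re-execution scheme, not the alternating-logic technique named above: a mismatch between two runs is treated as a transient fault and the pair of runs is repeated. The function name and the retry limit are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_time_redundancy(compute: Callable[[], T], max_attempts: int = 3) -> T:
    """Run the same computation twice and compare the results.

    A mismatch suggests a transient fault, so both runs are repeated, up to
    max_attempts times. A permanent fault that corrupts every run in the
    same way yields matching (wrong) results and is NOT detected.
    """
    for _ in range(max_attempts):
        first = compute()
        second = compute()
        if first == second:
            return first
    raise RuntimeError("results kept disagreeing: the fault may not be transient")


if __name__ == "__main__":
    calls = {"n": 0}

    def glitchy() -> int:
        # Hypothetical computation hit by a one-off transient fault on call 2.
        calls["n"] += 1
        return 99 if calls["n"] == 2 else 42

    print(run_with_time_redundancy(glitchy))  # 42
```

Because both runs use the same hardware, permanent faults need hardware or information redundancy instead.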
Conclusion
Fault tolerance is achieved by applying a set of analysis and design techniques to create systems with dramatically improved dependability. As new technologies are developed and new applications arise, new fault-tolerance approaches are also needed. Modern chips contain complex, highly integrated functions, and hardware and software must be crafted to meet a variety of standards to be economically viable. Thus, a great deal of current research focuses on implementing fault tolerance using COTS (commercial off-the-shelf) technology.
Fault tolerance

Más contenido relacionado

La actualidad más candente

Agreement Protocols, distributed File Systems, Distributed Shared Memory
Agreement Protocols, distributed File Systems, Distributed Shared MemoryAgreement Protocols, distributed File Systems, Distributed Shared Memory
Agreement Protocols, distributed File Systems, Distributed Shared MemorySHIKHA GAUTAM
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system modelHarshad Umredkar
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systemssumitjain2013
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Synchronization in distributed computing
Synchronization in distributed computingSynchronization in distributed computing
Synchronization in distributed computingSVijaylakshmi
 
Distributed file system
Distributed file systemDistributed file system
Distributed file systemAnamika Singh
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processingPage Maker
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed SystemSunita Sahu
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance Systemprakashjjaya
 
Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance SHIKHA GAUTAM
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bankpkaviya
 
Security services and mechanisms
Security services and mechanismsSecurity services and mechanisms
Security services and mechanismsRajapriya82
 
Program security
Program securityProgram security
Program securityG Prachi
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingSayed Chhattan Shah
 
Key management and distribution
Key management and distributionKey management and distribution
Key management and distributionRiya Choudhary
 
key distribution in network security
key distribution in network securitykey distribution in network security
key distribution in network securitybabak danyal
 

La actualidad más candente (20)

Agreement Protocols, distributed File Systems, Distributed Shared Memory
Agreement Protocols, distributed File Systems, Distributed Shared MemoryAgreement Protocols, distributed File Systems, Distributed Shared Memory
Agreement Protocols, distributed File Systems, Distributed Shared Memory
 
distributed Computing system model
distributed Computing system modeldistributed Computing system model
distributed Computing system model
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Synchronization in distributed computing
Synchronization in distributed computingSynchronization in distributed computing
Synchronization in distributed computing
 
Distributed file system
Distributed file systemDistributed file system
Distributed file system
 
Stream oriented communication
Stream oriented communicationStream oriented communication
Stream oriented communication
 
Applications of paralleL processing
Applications of paralleL processingApplications of paralleL processing
Applications of paralleL processing
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance
 
CS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question BankCS8791 Cloud Computing - Question Bank
CS8791 Cloud Computing - Question Bank
 
Security services and mechanisms
Security services and mechanismsSecurity services and mechanisms
Security services and mechanisms
 
6.distributed shared memory
6.distributed shared memory6.distributed shared memory
6.distributed shared memory
 
Program security
Program securityProgram security
Program security
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
Introduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed ComputingIntroduction to Parallel and Distributed Computing
Introduction to Parallel and Distributed Computing
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Key management and distribution
Key management and distributionKey management and distribution
Key management and distribution
 
key distribution in network security
key distribution in network securitykey distribution in network security
key distribution in network security
 

Destacado

Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Sri Prasanna
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentationskadyan1
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computingPalani murugan
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemanujos25
 
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...idealconnaissances
 
Software reliability & quality
Software reliability & qualitySoftware reliability & quality
Software reliability & qualityNur Islam
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant systemarvinthsaran
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testingBipul Roy Bpl
 
Fault avoidance and fault tolerance
Fault avoidance and fault toleranceFault avoidance and fault tolerance
Fault avoidance and fault toleranceJabez Winston
 
Fault tolerance in wsn
Fault tolerance in wsnFault tolerance in wsn
Fault tolerance in wsnElham Hormozi
 
Scheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukScheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukAndrii Vozniuk
 
Software Reliability
Software ReliabilitySoftware Reliability
Software Reliabilityranapoonam1
 
Technique de Cryptographie AES, DES et RSA
Technique de Cryptographie AES, DES et RSATechnique de Cryptographie AES, DES et RSA
Technique de Cryptographie AES, DES et RSAHouda Elmoutaoukil
 
I.1 Earthquakes
I.1 EarthquakesI.1 Earthquakes
I.1 Earthquakesaldelaitre
 
DISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementDISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementRasan Samarasinghe
 

Destacado (20)

Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentation
 
Fault tolerance and computing
Fault tolerance  and computingFault tolerance  and computing
Fault tolerance and computing
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating system
 
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...
Groupe Chèque déjeuner : « Des outils innovants pour accompagner la mise en œ...
 
Tract Rsa Avril
Tract Rsa AvrilTract Rsa Avril
Tract Rsa Avril
 
Cours3
Cours3Cours3
Cours3
 
Software reliability & quality
Software reliability & qualitySoftware reliability & quality
Software reliability & quality
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant system
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testing
 
Fault avoidance and fault tolerance
Fault avoidance and fault toleranceFault avoidance and fault tolerance
Fault avoidance and fault tolerance
 
Fault tolerance in wsn
Fault tolerance in wsnFault tolerance in wsn
Fault tolerance in wsn
 
Scheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii VozniukScheduling in distributed systems - Andrii Vozniuk
Scheduling in distributed systems - Andrii Vozniuk
 
Software Reliability
Software ReliabilitySoftware Reliability
Software Reliability
 
Technique de Cryptographie AES, DES et RSA
Technique de Cryptographie AES, DES et RSATechnique de Cryptographie AES, DES et RSA
Technique de Cryptographie AES, DES et RSA
 
I.1 Earthquakes
I.1 EarthquakesI.1 Earthquakes
I.1 Earthquakes
 
DFD level-0 to 1
DFD level-0 to 1DFD level-0 to 1
DFD level-0 to 1
 
Resilience ppt
Resilience pptResilience ppt
Resilience ppt
 
DISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementDISE - Software Testing and Quality Management
DISE - Software Testing and Quality Management
 

Similar a Fault tolerance

Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance SystemEhsan Ilahi
 
Adaptive fault tolerance in cloud survey
Adaptive fault tolerance in cloud surveyAdaptive fault tolerance in cloud survey
Adaptive fault tolerance in cloud surveywww.pixelsolutionbd.com
 
Software archiecture lecture05
Software archiecture   lecture05Software archiecture   lecture05
Software archiecture lecture05Luktalja
 
Software Reliability_CS-3059_VISHAL_PADME.pptx
Software Reliability_CS-3059_VISHAL_PADME.pptxSoftware Reliability_CS-3059_VISHAL_PADME.pptx
Software Reliability_CS-3059_VISHAL_PADME.pptxVishalPadme2
 
Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17koolkampus
 
Dependable Software Development in Software Engineering SE18
Dependable Software Development in Software Engineering SE18Dependable Software Development in Software Engineering SE18
Dependable Software Development in Software Engineering SE18koolkampus
 
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...Neelamani Samal
 
basic concepts of reliability
basic concepts of reliabilitybasic concepts of reliability
basic concepts of reliabilitydennis gookyi
 
Chapter13 -- ensuring integrity and availability
Chapter13  -- ensuring integrity and availabilityChapter13  -- ensuring integrity and availability
Chapter13 -- ensuring integrity and availabilityRaja Waseem Akhtar
 
Unit 2-software development process notes
Unit 2-software development process notes Unit 2-software development process notes
Unit 2-software development process notes arvind pandey
 
EMBEDDED SYSTEMS 1
EMBEDDED SYSTEMS 1EMBEDDED SYSTEMS 1
EMBEDDED SYSTEMS 1PRADEEP
 
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...vtunotesbysree
 
Ch11 - Reliability Engineering
Ch11 - Reliability EngineeringCh11 - Reliability Engineering
Ch11 - Reliability EngineeringHarsh Verdhan Raj
 
Ch13-Software Engineering 9
Ch13-Software Engineering 9Ch13-Software Engineering 9
Ch13-Software Engineering 9Ian Sommerville
 

Similar a Fault tolerance (20)

Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Critical Systems
Critical SystemsCritical Systems
Critical Systems
 
Adaptive fault tolerance in cloud survey
Adaptive fault tolerance in cloud surveyAdaptive fault tolerance in cloud survey
Adaptive fault tolerance in cloud survey
 
Software archiecture lecture05
Software archiecture   lecture05Software archiecture   lecture05
Software archiecture lecture05
 
Software Reliability_CS-3059_VISHAL_PADME.pptx
Software Reliability_CS-3059_VISHAL_PADME.pptxSoftware Reliability_CS-3059_VISHAL_PADME.pptx
Software Reliability_CS-3059_VISHAL_PADME.pptx
 
Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17Critical System Specification in Software Engineering SE17
Critical System Specification in Software Engineering SE17
 
Ch20
Ch20Ch20
Ch20
 
Dependable Software Development in Software Engineering SE18
Dependable Software Development in Software Engineering SE18Dependable Software Development in Software Engineering SE18
Dependable Software Development in Software Engineering SE18
 
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
 
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
A fault tolerant tokenbased atomic broadcast algorithm relying on responsive ...
 
Sda 3
Sda   3Sda   3
Sda 3
 
basic concepts of reliability
basic concepts of reliabilitybasic concepts of reliability
basic concepts of reliability
 
Chapter13 -- ensuring integrity and availability
Chapter13  -- ensuring integrity and availabilityChapter13  -- ensuring integrity and availability
Chapter13 -- ensuring integrity and availability
 
Unit 2-software development process notes
Unit 2-software development process notes Unit 2-software development process notes
Unit 2-software development process notes
 
SEPM_MODULE 2 PPT.pptx
SEPM_MODULE 2 PPT.pptxSEPM_MODULE 2 PPT.pptx
SEPM_MODULE 2 PPT.pptx
 
EMBEDDED SYSTEMS 1
EMBEDDED SYSTEMS 1EMBEDDED SYSTEMS 1
EMBEDDED SYSTEMS 1
 
Fault tolerance techniques
Fault tolerance techniquesFault tolerance techniques
Fault tolerance techniques
 
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
VTU 5TH SEM CSE SOFTWARE ENGINEERING SOLVED PAPERS - JUN13 DEC13 JUN14 DEC14 ...
 
Ch11 - Reliability Engineering
Ch11 - Reliability EngineeringCh11 - Reliability Engineering
Ch11 - Reliability Engineering
 
Ch13-Software Engineering 9
Ch13-Software Engineering 9Ch13-Software Engineering 9
Ch13-Software Engineering 9
 

Más de Gaurav Rawat

Six sense technology
Six sense technologySix sense technology
Six sense technologyGaurav Rawat
 
CCNA Based routing protocols
CCNA Based routing protocolsCCNA Based routing protocols
CCNA Based routing protocolsGaurav Rawat
 
Distance vector routing algorithm
Distance vector routing algorithmDistance vector routing algorithm
Distance vector routing algorithmGaurav Rawat
 

Más de Gaurav Rawat (6)

Computer network
Computer networkComputer network
Computer network
 
Six sense technology
Six sense technologySix sense technology
Six sense technology
 
Ic presentation
Ic presentationIc presentation
Ic presentation
 
CCNA Based routing protocols
CCNA Based routing protocolsCCNA Based routing protocols
CCNA Based routing protocols
 
Distance vector routing algorithm
Distance vector routing algorithmDistance vector routing algorithm
Distance vector routing algorithm
 
Computer network
Computer networkComputer network
Computer network
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide

Fault tolerance

  • 1. FAULT TOLERANCE
    By Gaurav Singh Rawat, Electrical Department, Systems Engineering
  • 2. Fault Tolerance
    Fault-tolerant computing is the art and science of building computing systems that continue to operate satisfactorily in the presence of faults. A fault-tolerant system may be able to tolerate one or more fault types, including:
    i) transient (caused by external disturbances), intermittent (caused by marginal design errors), or permanent hardware faults,
    ii) software and hardware design errors,
    iii) operator errors, or
    iv) externally induced upsets or physical damage.
  • 3. Fault Tolerance Concept Taxonomy
    Fault tolerance
    - Threats: faults, errors, failures
    - Attributes: availability, performability, graceful degradation, maintainability, testability
    - Means: error detection, system recovery, fault masking, reconfiguration, redundancy
  • 4. Basic Concepts
    Dependability includes:
    - Availability
    - Reliability
    - Safety
    - Maintainability
  • 5. Availability & Reliability
    - Availability: a measure of whether a system is ready to be used immediately
      ◦ The probability that the system is operating correctly at any given moment
    - Reliability: a measure of whether a system can run continuously without failure
      ◦ The system continues to function correctly over a long period of time
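To make the two measures concrete, here is a minimal Python sketch. It assumes the common textbook models rather than anything stated on the slide: steady-state availability as MTTF / (MTTF + MTTR), and reliability under a constant failure rate λ, R(t) = e^(-λt). The MTTF and MTTR figures are hypothetical.

```python
import math

def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """Probability of running t hours without failure, ASSUMING a constant
    failure rate (exponential lifetime model): R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * t_hours)

# Hypothetical system: fails on average every 1000 h, takes 1 h to repair.
mttf, mttr = 1000.0, 1.0
print(f"Availability: {steady_state_availability(mttf, mttr):.4f}")
# Under the constant-rate assumption, lambda = 1 / MTTF.
print(f"P(no failure in 24 h): {reliability(24.0, 1.0 / mttf):.4f}")
```

The two measures can diverge: a system that goes down for one millisecond every hour has an availability above 99.9999% yet is still highly unreliable.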
  • 6. Safety & Maintainability
    - Safety: a measure of how safe failures are
      ◦ When the system temporarily fails to operate correctly, nothing catastrophic happens
      ◦ For instance, a high degree of safety is required for systems controlling nuclear power plants
    - Maintainability: a measure of how easily a failed system can be repaired
      ◦ A highly maintainable system may also show a high degree of availability
      ◦ In self-healing systems, failures can be detected and repaired automatically
  • 7. What Is a Fault?
    - A system fails when it cannot meet its promises (specifications)
    - An error is the part of a system's state that may lead to a failure
    - A fault is the cause of an error
    - Fault tolerance: the system can provide its services even in the presence of faults
    - Faults can be:
      ◦ Transient (appear once and then disappear)
      ◦ Intermittent (appear, disappear, and reappear); e.g., a loose contact on a connector causes an intermittent fault
      ◦ Permanent (appear and persist until repaired)
  • 8. Failure Model
    - Crash failure: A server halts, but is working correctly until it halts.
    - Omission failure: A server fails to respond to incoming requests.
      ◦ Receive omission: A server fails to receive incoming messages.
      ◦ Send omission: A server fails to send messages.
    - Timing failure: A server's response lies outside the specified time interval.
    - Response failure: The server's response is incorrect.
      ◦ Value failure: The value of the response is wrong.
      ◦ State transition failure: The server deviates from the correct flow of control.
    - Arbitrary failure (Byzantine failure): A server may produce arbitrary responses at arbitrary times.
  • 9. Error Detection
    - Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.
    - Common error-detection schemes include:
      1. Repetition codes
      2. Parity bits
      3. Checksums
      4. Cyclic redundancy checks (CRCs)
      5. Cryptographic hash functions
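A minimal Python sketch of four of these schemes, each applied to the same message before and after a simulated single-bit transmission error. The additive checksum and the `flip_bit` helper are illustrative assumptions; CRC-32 and SHA-256 come from the standard `zlib` and `hashlib` modules.

```python
import hashlib
import zlib

def even_parity_bit(data: bytes) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2

def additive_checksum(data: bytes) -> int:
    """Simple 8-bit checksum (illustrative): sum of all bytes modulo 256."""
    return sum(data) % 256

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a transmission error by inverting one bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

message = b"fault tolerance"
received = flip_bit(message, 13)

checks = {
    "parity": even_parity_bit,
    "checksum": additive_checksum,
    "crc32": zlib.crc32,
    "sha256": lambda d: hashlib.sha256(d).hexdigest(),
}

for name, check in checks.items():
    # The receiver recomputes the check value and compares it with the sender's.
    detected = check(message) != check(received)
    print(f"{name:8s} detects the flipped bit: {detected}")
```

All four detect a single flipped bit, but they differ in strength: a single parity bit misses any even number of flipped bits, whereas a CRC also catches all burst errors up to its width.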
  • 10. System Recovery
    - So far we have discussed fault tolerance, but not what happens after a fault has occurred.
    - A process that exhibits a failure must be able to recover to a correct state.
    - There are two types of recovery:
      1. Backward recovery
      2. Forward recovery
  • 11. Backward Recovery
    - The goal of backward recovery is to bring the system from an erroneous state back to a previously correct state.
    - The state of the system must be recorded (checkpointed) from time to time and then restored when things go wrong.
    - Example:
      ◦ Reliable communication through packet retransmission
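A minimal Python sketch of checkpoint-based backward recovery. The `CheckpointedAccount` class and its non-negative-balance invariant are hypothetical; a real system would write checkpoints to stable storage rather than keep them in memory.

```python
import copy

class CheckpointedAccount:
    """Toy process whose state is checkpointed and can be rolled back."""

    def __init__(self) -> None:
        self.state = {"balance": 0, "log": []}
        self._saved = copy.deepcopy(self.state)

    def checkpoint(self) -> None:
        # Record a known-good state (kept in memory here for simplicity).
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        # Backward recovery: discard the erroneous state, restore the last checkpoint.
        self.state = copy.deepcopy(self._saved)

    def apply(self, amount: int) -> None:
        self.state["balance"] += amount
        self.state["log"].append(amount)
        if self.state["balance"] < 0:
            # Error detected: the state now violates an invariant.
            raise ValueError("negative balance")

account = CheckpointedAccount()
account.apply(100)
account.checkpoint()
try:
    account.apply(-250)        # leaves the state erroneous
except ValueError:
    account.rollback()         # return to the last correct state
print(account.state)  # {'balance': 100, 'log': [100]}
```

The deep copies ensure that later changes to the live state cannot corrupt the saved checkpoint.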
  • 12. Forward Recovery
    - The goal of forward recovery is to bring a system from an erroneous state to a correct new state (rather than a previous one).
    - Example:
      ◦ Reliable communication via erasure correction (reconstructing data blocks that are known to be lost), such as an (n, k) block erasure code
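A minimal Python sketch of forward recovery with the simplest (n, k) erasure code: a single XOR parity block, so n = k + 1 and any one lost block can be rebuilt without retransmission. This stands in for stronger codes such as Reed-Solomon; the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_blocks: List[bytes]) -> List[bytes]:
    """(k+1, k) code: append one parity block equal to the XOR of the k data blocks."""
    return data_blocks + [reduce(xor_blocks, data_blocks)]

def decode(blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one erased block (marked None) and return the k data blocks."""
    missing = [i for i, blk in enumerate(blocks) if blk is None]
    if len(missing) > 1:
        raise ValueError("one parity block can repair only one erasure")
    repaired = list(blocks)
    if missing:
        survivors = [blk for blk in blocks if blk is not None]
        # The XOR of all surviving blocks equals the missing one.
        repaired[missing[0]] = reduce(xor_blocks, survivors)
    return repaired[:-1]

data = [b"ABCD", b"EFGH", b"IJKL"]
sent = encode(data)
sent[1] = None             # block lost in transit (an erasure)
print(decode(sent))        # [b'ABCD', b'EFGH', b'IJKL']
```

The receiver moves straight to a correct new state by computing the lost data, instead of rolling back and asking for it again.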
  • 13. Fault Masking
    - Fault masking is a structural redundancy technique that completely masks faults within a set of redundant modules.
    - Redundancy is the key technique for hiding failures.
    - Redundancy, however, can hurt system performance; for example, it can increase the length of transmitted data or increase resource consumption.
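A minimal Python sketch of fault masking with triple modular redundancy (TMR): three modules compute the same word and a 2-out-of-3 voter takes the bitwise majority, so a fault in any single module never reaches the output. The bit patterns are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 voter for TMR: each output bit takes the value
    held by at least two of the three module outputs."""
    return (a & b) | (a & c) | (b & c)

correct = 0b1011_0010
faulty = correct ^ 0b0000_1000     # one module suffers a single-bit fault
print(bin(majority_vote(correct, faulty, correct)))  # 0b10110010

# With only two modules (DMR) the mismatch is detected but cannot be masked.
print(correct != faulty)  # True
```

The voter is itself a single point of failure, which is why TMR designs often replicate the voter too. Two modules failing in the same bit would also defeat the vote.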
  • 14. Reconfiguration
    - Reconfiguration is the “process of eliminating a faulty entity from a system and restoring the system to some operational condition or state”.
    - When using reconfiguration, the designer must be concerned with fault detection, fault location, fault containment, and fault recovery.
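A minimal Python sketch of run-time reconfiguration, assuming a hypothetical pool of interchangeable replicas where a raised exception counts as a detected fault. The comments map each step to the four concerns listed above; catching every `Exception` is a simplification.

```python
from typing import Callable, Dict

class ReconfigurablePool:
    """Hypothetical service that runs requests on replicas and removes faulty ones."""

    def __init__(self, replicas: Dict[str, Callable[[int], int]]) -> None:
        self.active: Dict[str, Callable[[int], int]] = dict(replicas)

    def call(self, x: int) -> int:
        for name in list(self.active):        # snapshot, since entries may be deleted
            try:
                return self.active[name](x)
            except Exception:
                # Detection (the call raised), location (we know which replica),
                # containment (stop routing to it); the loop then retries the
                # request on the next replica as the recovery step.
                del self.active[name]
        raise RuntimeError("no operational replicas left")

def broken(x: int) -> int:
    raise OSError("simulated disk failure")

def healthy(x: int) -> int:
    return 2 * x

pool = ReconfigurablePool({"node-a": broken, "node-b": healthy})
print(pool.call(21))          # 42
print(sorted(pool.active))    # ['node-b']
```

After the first failure the faulty replica is gone for good, so later requests go straight to the remaining one.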
  • 15. Redundancy
    - In engineering, redundancy is the duplication of critical components or functions of a system with the intention of increasing its reliability.
    - There are four types of redundancy:
      1. Hardware (e.g., dual and triple modular redundancy, DMR and TMR)
      2. Software (e.g., N-version programming)
      3. Time (e.g., transient fault detection using alternating logic)
      4. Information (error-detecting or error-correcting codes)
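A minimal Python sketch of software redundancy through N-version programming: three independently written versions of an integer square root run on the same input, and a majority vote selects the result. Version 3 carries a deliberately seeded design fault; all names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

def isqrt_v1(n: int) -> int:
    # Version 1: standard library (Python 3.8+).
    return math.isqrt(n)

def isqrt_v2(n: int) -> int:
    # Version 2: independently written binary search.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def isqrt_v3(n: int) -> int:
    # Version 3: linear search with a seeded design fault ("<" should be "<="),
    # so it is wrong for every perfect square n >= 1.
    r = 0
    while (r + 1) * (r + 1) < n:
        r += 1
    return r

def n_version(x: int, versions: List[Callable[[int], int]]) -> int:
    """Run every version and return the result produced by a strict majority."""
    votes = Counter(version(x) for version in versions)
    value, count = votes.most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError("no majority among versions")
    return value

VERSIONS = [isqrt_v1, isqrt_v2, isqrt_v3]
print(isqrt_v3(16), n_version(16, VERSIONS))  # 3 4
```

The vote helps only if the versions fail independently. A design fault shared by two versions would win the vote, so this independence is an assumption, not a guarantee.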
  • 16. Conclusion
    Fault tolerance is achieved by applying a set of analysis and design techniques to create systems with dramatically improved dependability. As new technologies are developed and new applications arise, new fault-tolerance approaches are also needed. Today's chips contain complex, highly integrated functions, and hardware and software must be crafted to meet a variety of standards to be economically viable. Thus, a great deal of current research focuses on implementing fault tolerance using commercial off-the-shelf (COTS) technology.