SlideShare una empresa de Scribd logo
1 de 15
CprE 458/558: Real-Time Systems


                              Lecture 17
                  Fault-tolerant design techniques




CprE 458/558               G. Manimaran (ISU)
Fault Tolerant Strategies
    Fault tolerance in computer system is achieved through
     redundancy in hardware, software, information, and/or
     computations. Such redundancy can be implemented in
     static, dynamic, or hybrid configurations.
    Fault tolerance can be achieved by many techniques:
        Fault masking is any process that prevents faults in a system
         from introducing errors. Example: Error correcting memories and
         majority voting.
        Reconfiguration is the process of eliminating faulty component
         from a system and restoring the system to some operational state.




CprE 458/558                 G. Manimaran (ISU)                          2
Reconfiguration Approach
    Fault detection is the process of recognizing that a fault
     has occurred. Fault detection is often required before any
     recovery procedure can be initiated.
    Fault location is the process of determining where a fault
     has occurred so that an appropriate recovery can be
     initiated.
    Fault containment is the process of isolating a fault and
     preventing the effects of that fault from propagating
     throughout the system.
    Fault recovery is the process of remaining operational or
     regaining operational status via reconfiguration even in the
     presence of faults.

CprE 458/558             G. Manimaran (ISU)                     3
The Concept of Redundancy
    Redundancy is simply the addition of information,
     resources, or time beyond what is needed for normal
     system operation.
    Hardware redundancy is the addition of extra
     hardware, usually for the purpose either detecting or
     tolerating faults.
    Software redundancy is the addition of extra software,
     beyond what is needed to perform a given function, to
     detect and possibly tolerate faults.
    Information redundancy is the addition of extra
     information beyond that required to implement a given
     function; for example, error detection codes.


CprE 458/558            G. Manimaran (ISU)                    4
The Concept of Redundancy (Cont’d)
    Time redundancy uses additional time to perform the
     functions of a system such that fault detection and often
     fault tolerance can be achieved. Transient faults are
     tolerated by this.

    The use of redundancy can provide additional capabilities
     within a system. But, redundancy can have very important
     impact on a system's performance, size, weight, power
     consumption, and reliability.




CprE 458/558             G. Manimaran (ISU)                      5
Hardware Redundancy
    Passive techniques use the concept of fault masking.
     These techniques are designed to achieve fault tolerance
     without requiring any action on the part of the system.
     Relies on voting mechanisms.
    Active techniques achieve fault tolerance by detecting
     the existence of faults and performing some action to
     remove the faulty hardware from the system. That is,
     active techniques use fault detection, fault location, and
     fault recovery in an attempt to achieve fault tolerance.




CprE 458/558             G. Manimaran (ISU)                       6
Hardware Redundancy (Cont’d)


    Hybrid techniques combine the attractive features of
     both the passive
     and active approaches.
        Fault masking is used in hybrid systems to prevent erroneous
         results from being generated.
        Fault detection, location, and recovery are also used to improve
         fault tolerance by removing faulty hardware and replacing it with
         spares.




CprE 458/558                 G. Manimaran (ISU)                              7
Hardware Redundancy - A Taxonomy

     Title:
     hard-fault.fig
     Creator:
     fig2dev Version 3.1 Patchlevel 2
     Preview:
     This EPS picture was not saved
     with a preview included in it.
     Comment:
     This EPS picture will print to a
     PostScript printer, but not to
     other types of printers.




CprE 458/558                            G. Manimaran (ISU)   8
Triple Modular Redundancy (TMR)


          Title:
          tmr1.fig
          Creator:
          fig2dev Version 3.1 Patchlevel 2
          Preview:
          This EPS picture was not saved
          with a preview included in it.
          Comment:
          This EPS picture will print to a
          PostScript printer, but not to
          other types of printers.




CprE 458/558                                 G. Manimaran (ISU)   9
Software Redundancy - to Detect
        Software Faults
    There are two popular approaches: N-Version
     Programming (NVP) and Recovery Blocks (RB).

    NVP is a forward recovery scheme - it masks faults.
    NVP: multiple versions of the same task is executed
     concurrently.
    NVP relies on voting.

    RB is a backward error recovery scheme.
    RB: the versions of a task are executed serially.
    RB relies on acceptance test.

CprE 458/558              G. Manimaran (ISU)               10
N-Version Programming (NVP)
    NVP is based on the principle of design diversity, that is coding a
     software module by different teams of programmers, to have multiple
     versions.

    Diversity can also be introduced by employing different algorithms for
     obtaining the same solution or by choosing different programming
     languages.

    NVP can tolerate both hardware and software faults.

    Correlated faults are not tolerated by the NVP.

    In NVP, deciding the number of versions required to ensure acceptable
     levels of software reliability is an important design consideration.


CprE 458/558                  G. Manimaran (ISU)                          11
N-Version Programming (Cont’d)

          Title:
          nvp.fig
          Creator:
          fig2dev Version 3.1 Patchlevel 2
          Preview:
          This EPS picture was not saved
          with a preview included in it.
          Comment:
          This EPS picture will print to a
          PostScript printer, but not to
          other types of printers.




CprE 458/558                                 G. Manimaran (ISU)   12
Recovery Blocks (RB)
    RB uses multiple alternates (backups) to perform the same
     function; one module (task) is primary and the others are
     secondary.

    The primary task executes first. When the primary task
     completes execution, its outcome is checked by an
     acceptance test.

    If the output is not acceptable, a secondary task executes
     after undoing the effects of primary (i.e., rolling back to
     the state at which primary was invoked) until either an
     acceptable output is obtained or the alternates are
     exhausted.

CprE 458/558             G. Manimaran (ISU)                    13
Recovery Blocks (Cont’d)
         Title:
         rblocks.fig
         Creator:
         fig2dev Version 3.1 Patchlevel 2
         Preview:
         This EPS picture was not saved
         with a preview included in it.
         Comment:
         This EPS picture will print to a
         PostScript printer, but not to
         other types of printers.




CprE 458/558                                G. Manimaran (ISU)   14
Recovery Blocks (Cont’d)
    The acceptance tests are usually sanity checks; these
     consist of making sure that the output is within a certain
     acceptable range or that the output does not change at
     more than the allowed maximum rate.

    Selecting the range for acceptance test is crucial. If the
     allowed ranges are too small, the acceptance tests may
     label correct outputs as bad. If they are too large, the
     probability that incorrect outputs will be accepted is more.

    RB can tolerate software faults because the alternates are
     usually implemented with different approaches; RB is also
     known as Primary-Backup approach.

CprE 458/558              G. Manimaran (ISU)                      15

Más contenido relacionado

La actualidad más candente

Foult Tolerence In Distributed System
Foult Tolerence In Distributed SystemFoult Tolerence In Distributed System
Foult Tolerence In Distributed SystemRajan Kumar
 
Limitations of memory system performance
Limitations of memory system performanceLimitations of memory system performance
Limitations of memory system performanceSyed Zaid Irshad
 
Software Engineering Process Models
Software Engineering Process Models Software Engineering Process Models
Software Engineering Process Models Satya P. Joshi
 
Off the-shelf components (cots)
Off the-shelf components (cots)Off the-shelf components (cots)
Off the-shelf components (cots)Himanshu
 
formal verification
formal verificationformal verification
formal verificationToseef Aslam
 
Optimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed SystemsOptimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed Systemsmridul mishra
 
Unified process model
Unified process modelUnified process model
Unified process modelRyndaMaala
 
Processor allocation in Distributed Systems
Processor allocation in Distributed SystemsProcessor allocation in Distributed Systems
Processor allocation in Distributed SystemsRitu Ranjan Shrivastwa
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating systemudaya khanal
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System Harshita Ved
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems SHATHAN
 
Operating Systems: Device Management
Operating Systems: Device ManagementOperating Systems: Device Management
Operating Systems: Device ManagementDamian T. Gordon
 
Design Concept software engineering
Design Concept software engineeringDesign Concept software engineering
Design Concept software engineeringDarshit Metaliya
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented CommunicationDilum Bandara
 
Software Testing Strategies ,Validation Testing and System Testing.
Software Testing Strategies ,Validation Testing and System Testing.Software Testing Strategies ,Validation Testing and System Testing.
Software Testing Strategies ,Validation Testing and System Testing.Tanzeem Aslam
 

La actualidad más candente (20)

Foult Tolerence In Distributed System
Foult Tolerence In Distributed SystemFoult Tolerence In Distributed System
Foult Tolerence In Distributed System
 
Fault tolerance
Fault toleranceFault tolerance
Fault tolerance
 
Limitations of memory system performance
Limitations of memory system performanceLimitations of memory system performance
Limitations of memory system performance
 
Software Engineering Process Models
Software Engineering Process Models Software Engineering Process Models
Software Engineering Process Models
 
Off the-shelf components (cots)
Off the-shelf components (cots)Off the-shelf components (cots)
Off the-shelf components (cots)
 
formal verification
formal verificationformal verification
formal verification
 
Optimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed SystemsOptimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed Systems
 
Unified process model
Unified process modelUnified process model
Unified process model
 
Processor allocation in Distributed Systems
Processor allocation in Distributed SystemsProcessor allocation in Distributed Systems
Processor allocation in Distributed Systems
 
Distributed operating system
Distributed operating systemDistributed operating system
Distributed operating system
 
Walkthroughs
WalkthroughsWalkthroughs
Walkthroughs
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
 
Distributed System ppt
Distributed System pptDistributed System ppt
Distributed System ppt
 
Operating Systems: Device Management
Operating Systems: Device ManagementOperating Systems: Device Management
Operating Systems: Device Management
 
Design Concept software engineering
Design Concept software engineeringDesign Concept software engineering
Design Concept software engineering
 
Message and Stream Oriented Communication
Message and Stream Oriented CommunicationMessage and Stream Oriented Communication
Message and Stream Oriented Communication
 
Software Testing Strategies ,Validation Testing and System Testing.
Software Testing Strategies ,Validation Testing and System Testing.Software Testing Strategies ,Validation Testing and System Testing.
Software Testing Strategies ,Validation Testing and System Testing.
 
3. challenges
3. challenges3. challenges
3. challenges
 
Distributed deadlock
Distributed deadlockDistributed deadlock
Distributed deadlock
 

Destacado

N-version programming
N-version programmingN-version programming
N-version programmingshabnam0102
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemanujos25
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentationskadyan1
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault ToleranceAnkit Singh
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant systemarvinthsaran
 
Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data BaseSiva Rushi
 
Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Sri Prasanna
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testingBipul Roy Bpl
 
Real time database (MDARTS)
Real time database (MDARTS)Real time database (MDARTS)
Real time database (MDARTS)Pradeep Kumar TS
 
Fault management presentation
Fault management presentationFault management presentation
Fault management presentationardhita banu adji
 
Fault Management System (OSS)
Fault Management System (OSS)Fault Management System (OSS)
Fault Management System (OSS)Riswan
 
Be information technology2008course
Be information technology2008courseBe information technology2008course
Be information technology2008courseAnuj Sharma
 
Chapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsChapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsWayne Jones Jnr
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systemscoolmirza143
 
Real Time Systems & RTOS
Real Time Systems & RTOSReal Time Systems & RTOS
Real Time Systems & RTOSVishwa Mohan
 
DISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementDISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementRasan Samarasinghe
 

Destacado (20)

N-version programming
N-version programmingN-version programming
N-version programming
 
Fault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating systemFault tolerance techniques for real time operating system
Fault tolerance techniques for real time operating system
 
Fault tolerant presentation
Fault tolerant presentationFault tolerant presentation
Fault tolerant presentation
 
Software Fault Tolerance
Software Fault ToleranceSoftware Fault Tolerance
Software Fault Tolerance
 
Fault tolearant system
Fault tolearant systemFault tolearant system
Fault tolearant system
 
Main MeMory Data Base
Main MeMory Data BaseMain MeMory Data Base
Main MeMory Data Base
 
Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)Fault Tolerance (Distributed computing)
Fault Tolerance (Distributed computing)
 
Software engineering quality assurance and testing
Software engineering quality assurance and testingSoftware engineering quality assurance and testing
Software engineering quality assurance and testing
 
Vxworks
VxworksVxworks
Vxworks
 
Real time database (MDARTS)
Real time database (MDARTS)Real time database (MDARTS)
Real time database (MDARTS)
 
Fault management presentation
Fault management presentationFault management presentation
Fault management presentation
 
Fault Management System (OSS)
Fault Management System (OSS)Fault Management System (OSS)
Fault Management System (OSS)
 
Be information technology2008course
Be information technology2008courseBe information technology2008course
Be information technology2008course
 
Chapter 19 - Real Time Systems
Chapter 19 - Real Time SystemsChapter 19 - Real Time Systems
Chapter 19 - Real Time Systems
 
Ch21 real time software engineering
Ch21 real time software engineeringCh21 real time software engineering
Ch21 real time software engineering
 
DFD level-0 to 1
DFD level-0 to 1DFD level-0 to 1
DFD level-0 to 1
 
Introduction to Real-Time Operating Systems
Introduction to Real-Time Operating SystemsIntroduction to Real-Time Operating Systems
Introduction to Real-Time Operating Systems
 
Real Time Systems & RTOS
Real Time Systems & RTOSReal Time Systems & RTOS
Real Time Systems & RTOS
 
In-memory Databases
In-memory DatabasesIn-memory Databases
In-memory Databases
 
DISE - Software Testing and Quality Management
DISE - Software Testing and Quality ManagementDISE - Software Testing and Quality Management
DISE - Software Testing and Quality Management
 

Similar a Fault tolerance

1 introduction
1 introduction1 introduction
1 introductionhanmya
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management Argyle Executive Forum
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE2
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenationRVCE
 
Review Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultReview Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultAM Publications
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...vtunotesbysree
 
Systematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisSystematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisIDES Editor
 
Presentation
PresentationPresentation
Presentations1150056
 
Presentation
PresentationPresentation
Presentations1150056
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSFelipe Prado
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixZongYing Lyu
 
02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindseteva
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OSvampugani
 
Business Continuity Knowledge Share
Business Continuity Knowledge ShareBusiness Continuity Knowledge Share
Business Continuity Knowledge Share.Gastón. .Bx.
 
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsA Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsIRJET Journal
 

Similar a Fault tolerance (20)

1 introduction
1 introduction1 introduction
1 introduction
 
Implementing Vulnerability Management
Implementing Vulnerability Management Implementing Vulnerability Management
Implementing Vulnerability Management
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
Software rejuvenation
Software rejuvenationSoftware rejuvenation
Software rejuvenation
 
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
A Practical Fault Tolerance Approach in Cloud Computing Using Support Vector ...
 
Review Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software FaultReview Paper on Recovery of Data during Software Fault
Review Paper on Recovery of Data during Software Fault
 
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
SOLUTION MANUAL OF OPERATING SYSTEM CONCEPTS BY ABRAHAM SILBERSCHATZ, PETER B...
 
Systematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage AnalysisSystematic Model based Testing with Coverage Analysis
Systematic Model based Testing with Coverage Analysis
 
580 584
580 584580 584
580 584
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORSDEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
DEF CON 27 - ALI ISLAM and DAN REGALADO WEAPONIZING HYPERVISORS
 
Techno-Fest-15nov16
Techno-Fest-15nov16Techno-Fest-15nov16
Techno-Fest-15nov16
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
 
02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset02. Fault Tolerance Pattern 위한 mindset
02. Fault Tolerance Pattern 위한 mindset
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
Ch20
Ch20Ch20
Ch20
 
Business Continuity Knowledge Share
Business Continuity Knowledge ShareBusiness Continuity Knowledge Share
Business Continuity Knowledge Share
 
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed SystemsA Brief Review Of Approaches For Fault Tolerance In Distributed Systems
A Brief Review Of Approaches For Fault Tolerance In Distributed Systems
 

Último

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Último (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 

Fault tolerance

  • 1. CprE 458/558: Real-Time Systems Lecture 17 Fault-tolerant design techniques CprE 458/558 G. Manimaran (ISU)
  • 2. Fault Tolerant Strategies  Fault tolerance in computer system is achieved through redundancy in hardware, software, information, and/or computations. Such redundancy can be implemented in static, dynamic, or hybrid configurations.  Fault tolerance can be achieved by many techniques:  Fault masking is any process that prevents faults in a system from introducing errors. Example: Error correcting memories and majority voting.  Reconfiguration is the process of eliminating faulty component from a system and restoring the system to some operational state. CprE 458/558 G. Manimaran (ISU) 2
  • 3. Reconfiguration Approach  Fault detection is the process of recognizing that a fault has occurred. Fault detection is often required before any recovery procedure can be initiated.  Fault location is the process of determining where a fault has occurred so that an appropriate recovery can be initiated.  Fault containment is the process of isolating a fault and preventing the effects of that fault from propagating throughout the system.  Fault recovery is the process of remaining operational or regaining operational status via reconfiguration even in the presence of faults. CprE 458/558 G. Manimaran (ISU) 3
  • 4. The Concept of Redundancy  Redundancy is simply the addition of information, resources, or time beyond what is needed for normal system operation.  Hardware redundancy is the addition of extra hardware, usually for the purpose either detecting or tolerating faults.  Software redundancy is the addition of extra software, beyond what is needed to perform a given function, to detect and possibly tolerate faults.  Information redundancy is the addition of extra information beyond that required to implement a given function; for example, error detection codes. CprE 458/558 G. Manimaran (ISU) 4
  • 5. The Concept of Redundancy (Cont’d)  Time redundancy uses additional time to perform the functions of a system such that fault detection and often fault tolerance can be achieved. Transient faults are tolerated by this.  The use of redundancy can provide additional capabilities within a system. But, redundancy can have very important impact on a system's performance, size, weight, power consumption, and reliability. CprE 458/558 G. Manimaran (ISU) 5
  • 6. Hardware Redundancy  Passive techniques use the concept of fault masking. These techniques are designed to achieve fault tolerance without requiring any action on the part of the system. Relies on voting mechanisms.  Active techniques achieve fault tolerance by detecting the existence of faults and performing some action to remove the faulty hardware from the system. That is, active techniques use fault detection, fault location, and fault recovery in an attempt to achieve fault tolerance. CprE 458/558 G. Manimaran (ISU) 6
  • 7. Hardware Redundancy (Cont’d)  Hybrid techniques combine the attractive features of both the passive and active approaches.  Fault masking is used in hybrid systems to prevent erroneous results from being generated.  Fault detection, location, and recovery are also used to improve fault tolerance by removing faulty hardware and replacing it with spares. CprE 458/558 G. Manimaran (ISU) 7
  • 8. Hardware Redundancy - A Taxonomy Title: hard-fault.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 8
  • 9. Triple Modular Redundancy (TMR) Title: tmr1.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 9
  • 10. Software Redundancy - to Detect Software Faults  There are two popular approaches: N-Version Programming (NVP) and Recovery Blocks (RB).  NVP is a forward recovery scheme - it masks faults.  NVP: multiple versions of the same task is executed concurrently.  NVP relies on voting.  RB is a backward error recovery scheme.  RB: the versions of a task are executed serially.  RB relies on acceptance test. CprE 458/558 G. Manimaran (ISU) 10
  • 11. N-Version Programming (NVP)  NVP is based on the principle of design diversity, that is coding a software module by different teams of programmers, to have multiple versions.  Diversity can also be introduced by employing different algorithms for obtaining the same solution or by choosing different programming languages.  NVP can tolerate both hardware and software faults.  Correlated faults are not tolerated by the NVP.  In NVP, deciding the number of versions required to ensure acceptable levels of software reliability is an important design consideration. CprE 458/558 G. Manimaran (ISU) 11
  • 12. N-Version Programming (Cont’d) Title: nvp.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 12
  • 13. Recovery Blocks (RB)  RB uses multiple alternates (backups) to perform the same function; one module (task) is primary and the others are secondary.  The primary task executes first. When the primary task completes execution, its outcome is checked by an acceptance test.  If the output is not acceptable, a secondary task executes after undoing the effects of primary (i.e., rolling back to the state at which primary was invoked) until either an acceptable output is obtained or the alternates are exhausted. CprE 458/558 G. Manimaran (ISU) 13
  • 14. Recovery Blocks (Cont’d) Title: rblocks.fig Creator: fig2dev Version 3.1 Patchlevel 2 Preview: This EPS picture was not saved with a preview included in it. Comment: This EPS picture will print to a PostScript printer, but not to other types of printers. CprE 458/558 G. Manimaran (ISU) 14
  • 15. Recovery Blocks (Cont’d)  The acceptance tests are usually sanity checks; these consist of making sure that the output is within a certain acceptable range or that the output does not change at more than the allowed maximum rate.  Selecting the range for acceptance test is crucial. If the allowed ranges are too small, the acceptance tests may label correct outputs as bad. If they are too large, the probability that incorrect outputs will be accepted is more.  RB can tolerate software faults because the alternates are usually implemented with different approaches; RB is also known as Primary-Backup approach. CprE 458/558 G. Manimaran (ISU) 15