Fault Tolerant and Distributed System
1. Paradigms in Fault Tolerant Checkpointing Protocols in Distributed Mobile Systems
2. Abstract
• Distributed mobile systems are ubiquitous nowadays.
• Distributed mobile systems are not fault tolerant. They
introduce new challenges in the area of fault tolerant
computing.
• Mobile computing has many issues, such as low
throughput and high latency, low bandwidth of wireless channels,
lack of stable storage on mobile hosts, connection
breakdowns and inadequate battery life.
• This paper surveys the algorithms that restore the
system to a consistent state after a failure.
3. • Various techniques and algorithms have been devised and
developed in this regard. One commonly applied solution to
these failures is the use of a Checkpoint/Restart scheme.
• The problem with this technique is that it rolls back all the
processors to an earlier state, even if only a single processor
crashes.
• The idea behind most of the fault tolerance protocols is to
roll back only the crashed processor instead of rolling back all
the processors.
• In such cases, processors that do not depend on the
results of the crashed processor can continue to
perform their tasks without further waiting.
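The point about independent processors can be sketched in Python. This is only an illustration under stated assumptions: the dependencies between processors are assumed to be known in advance as a mapping (`depends_on`), and the function name `affected_by_crash` and the processor names are hypothetical, not taken from the surveyed protocols.

```python
from collections import deque

def affected_by_crash(crashed, depends_on):
    """Return the crashed processor plus every processor that (directly or
    transitively) depends on its results. All others can keep running."""
    # Invert the mapping: for each processor, who consumes its results?
    dependents = {}
    for proc, sources in depends_on.items():
        for src in sources:
            dependents.setdefault(src, set()).add(proc)

    affected = {crashed}
    queue = deque([crashed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected

# Hypothetical system: P2 uses results from P1, P3 uses results from P2,
# and P4 is independent of the others.
depends_on = {"P1": set(), "P2": {"P1"}, "P3": {"P2"}, "P4": set()}

affected = affected_by_crash("P1", depends_on)
unaffected = set(depends_on) - affected
print(sorted(affected), sorted(unaffected))
```

Under a plain Checkpoint/Restart scheme all four processors would be rolled back. In this sketch only the processors in `affected` are touched by the crash of `P1`, and `P4` continues without waiting.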
5. • A “distributed transaction” is a group of several sub-transactions,
each running and updating data on a different computer
system.
• Each system has a local “transaction manager” whose purpose is to enlist,
prepare, commit, and abort the calls made by the distributed
transactions.
• Before any distributed transaction takes place, each
participating transaction manager must agree to commit an
action, such as an update.
6. Failure Models in Mobile Distributed Systems
1) Timing faults - occur when a module does not complete its
services in time;
2) Omission faults - occur when a module completely fails to
accomplish its services;
3) Crash faults - occur when a module either stops operating
completely or never returns to a valid state;
4) Byzantine faults - faults that are arbitrary (random) in
nature.
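As a rough illustration, the four fault classes can be mapped onto what an observer sees from a module. The sketch below is an assumption-laden simplification: the `classify` function, its parameters, and the idea that the observer can tell whether a module is still running are all hypothetical, and real systems often cannot distinguish a crash from an omission.

```python
from enum import Enum
from typing import Optional

class Fault(Enum):
    TIMING = "timing"
    OMISSION = "omission"
    CRASH = "crash"
    BYZANTINE = "byzantine"

def classify(running: bool, elapsed: Optional[float], deadline: float,
             result_ok: bool = True) -> Optional[Fault]:
    """Classify one observed service request (hypothetical observer model).

    running   -- whether the module is still operating
    elapsed   -- seconds until it answered, or None if it never answered
    deadline  -- seconds allowed for the service
    result_ok -- whether the answer passed a validity check
    """
    if not running:
        return Fault.CRASH        # module stopped operating completely
    if elapsed is None:
        return Fault.OMISSION     # module runs but did not perform the service
    if elapsed > deadline:
        return Fault.TIMING       # service completed, but too late
    if not result_ok:
        return Fault.BYZANTINE    # on time, but the result is arbitrary
    return None                   # no fault observed

print(classify(True, 2.0, deadline=1.0))
```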
7. FAULT TOLERANCE PROTOCOLS
The Two-phase commit (2PC) protocol:
The two-phase commit (2PC) protocol is a distributed
algorithm that assures the reliable termination of a
transaction in a distributed environment.
8. Phase-I Protocol for the coordinator:
Start
i) Send transaction to the participating nodes.
ii) Wait for signal (YES/NO) from all participating nodes.
Stop
Phase-I Protocol for the participating nodes:
Start
i) Receive transaction from the coordinator.
ii) Do local processing.
iii) Send signal (YES/NO) to the coordinator node.
Stop
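The Phase-I steps above can be sketched in Python as follows. This is a minimal in-process illustration, assuming synchronous calls with no network, timeouts, or logging; the `Participant` class, its `can_commit` flag, and the `phase_one` function are hypothetical names introduced only for this example.

```python
class Participant:
    """Hypothetical participating node. `can_commit` stands in for the
    outcome of its local processing."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.pending = None

    def prepare(self, transaction):
        # i) receive the transaction, ii) do local processing,
        # iii) send YES or NO back to the coordinator.
        self.pending = transaction
        return "YES" if self.can_commit else "NO"


def phase_one(transaction, participants):
    """Coordinator side: send the transaction to every participating node
    and wait for (here: collect) the signal from each of them."""
    return {p.name: p.prepare(transaction) for p in participants}


votes = phase_one("update x", [Participant("A"), Participant("B", can_commit=False)])
print(votes)
```

A single NO among the collected signals is enough to send the coordinator down the failure path in Phase II.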
9. Decision-making phase (YES)
Phase-II Agreement Protocol for the coordinator:
Start
i) Send commit signal to the participating nodes.
ii) Receive acknowledgment from all participating nodes.
iii) Commit or complete the transaction.
Stop
Phase-II Agreement Protocol for the participating nodes:
Start
i) Receive commit signal from the coordinator.
ii) Commit the transaction.
iii) Release the resources.
iv) Send acknowledgement to the coordinator node.
Stop
10. In case of (NO)
Phase-II Failure Protocol for the coordinator:
Start
i) Send switchback signal to the participating nodes.
ii) Receive acknowledgment from all participating nodes.
iii) Undo transaction.
Stop
Phase-II Failure Protocol for the participating nodes:
Start
i) Receive switchback signal from the coordinator.
ii) Undo transaction.
iii) Release the resources.
iv) Send acknowledgement to the coordinator node.
Stop
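Putting both phases together, the coordinator's decision can be sketched in Python. This is a simplified single-process illustration, assuming reliable synchronous calls and no coordinator or participant crashes; the `Node` class, its method names, and `two_phase_commit` are hypothetical and chosen to mirror the steps listed in the slides.

```python
class Node:
    """Hypothetical participating node used only for this sketch."""

    def __init__(self, name, vote="YES"):
        self.name = name
        self.vote = vote
        self.state = "idle"

    def prepare(self, transaction):
        # Phase I: receive the transaction, process locally, then vote.
        self.state = "prepared"
        return self.vote

    def commit(self):
        # Phase II (YES): commit, release resources, acknowledge.
        self.state = "committed"
        return "ACK"

    def switch_back(self):
        # Phase II (NO): undo, release resources, acknowledge.
        self.state = "undone"
        return "ACK"


def two_phase_commit(transaction, nodes):
    """Coordinator: run Phase I, then the agreement or failure protocol."""
    votes = [n.prepare(transaction) for n in nodes]
    if all(v == "YES" for v in votes):
        acks = [n.commit() for n in nodes]        # send commit signal
        outcome = "committed"
    else:
        acks = [n.switch_back() for n in nodes]   # send switchback signal
        outcome = "undone"
    # The coordinator finishes only after every node has acknowledged.
    if len(acks) != len(nodes):
        raise RuntimeError("missing acknowledgement")
    return outcome


happy = [Node("A"), Node("B")]
unhappy = [Node("A"), Node("B", vote="NO")]
result_yes = two_phase_commit("update x", happy)
result_no = two_phase_commit("update x", unhappy)
print(result_yes, result_no)
```

Note that in the NO case every node is switched back, including those that voted YES, so all participants end in the same state.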
12. Conclusion
• Reliability can be restored in mobile distributed systems using the
techniques described above.
• However, new challenges will continue to arise, so such
protocols are still not fully suitable.
• Further protocols can be developed to add reliability to such
systems.
• This paper provides a further step toward restoring the
system to a consistent state even in the presence of
a failure.