Fault Tolerant and Distributed System

Paradigms in Fault Tolerant Checkpointing
Protocols in Distributed Mobile Systems

Abstract
• Distributed mobile systems are ubiquitous nowadays.
• Distributed mobile systems are not inherently fault tolerant,
and they introduce new challenges in the area of fault tolerant
computing.
• Mobile computing faces many issues, such as lower
throughput and higher latency, low bandwidth of wireless
channels, lack of stable storage on mobile hosts, connection
breakdowns, and limited battery life.
• This paper surveys algorithms that restore the system to a
consistent state after a failure.

• Various techniques and algorithms have been developed for
this purpose. One commonly applied solution to these failures
is the Checkpoint/Restart scheme.
• The problem with this technique is that it rolls back all
processors to an earlier state, even if only a single processor
crashes.
• The idea behind most fault tolerance protocols is to roll back
only the crashed processor instead of rolling back all
processors.
• In that case, processors that do not depend on the results of
the crashed processor can continue their tasks without
waiting.
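A minimal Python sketch of the difference between the two approaches follows. It assumes that each process's last checkpoint survives the crash and that we know which processes consumed another process's results since their last checkpoint (the `depends_on` map). `Process`, `global_rollback` and `selective_rollback` are hypothetical names, and following dependencies transitively is a simplification, not any specific published protocol.

```python
from dataclasses import dataclass


@dataclass
class Process:
    name: str
    state: int = 0        # current (volatile) state
    checkpoint: int = 0   # last saved state, assumed to be on stable storage

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state

    def restart(self) -> None:
        # Restore the last checkpoint after a failure.
        self.state = self.checkpoint


def global_rollback(processes: dict[str, Process], crashed: str) -> set[str]:
    """Classic checkpoint/restart: every process restarts, whichever one crashed."""
    for p in processes.values():
        p.restart()
    return set(processes)


def selective_rollback(processes: dict[str, Process], crashed: str,
                       depends_on: dict[str, set[str]]) -> set[str]:
    """Roll back the crashed process plus every process that (transitively)
    used its results since the last checkpoint; all others keep running."""
    to_roll = {crashed}
    changed = True
    while changed:
        changed = False
        for name, deps in depends_on.items():
            if name not in to_roll and deps & to_roll:
                to_roll.add(name)
                changed = True
    for name in to_roll:
        processes[name].restart()
    return to_roll


procs = {n: Process(n) for n in ("P1", "P2", "P3")}
for p in procs.values():
    p.state = 5
    p.take_checkpoint()
    p.state = 9   # progress made after the checkpoint
# Hypothetical dependency: P2 used a result from P1 after its checkpoint.
deps = {"P1": set(), "P2": {"P1"}, "P3": set()}
rolled = selective_rollback(procs, "P1", deps)
print(sorted(rolled))      # ['P1', 'P2']
print(procs["P3"].state)   # 9
```

Only P1 and the process that used its results (P2) restart; P3 keeps the progress it made after its checkpoint, whereas `global_rollback` would have reset all three.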

• A “distributed transaction” is a group of several
sub-transactions, each running and updating data on a
different computer system.
• Each system has a local “transaction manager” whose purpose
is to enlist, prepare, commit, and abort the calls made by
distributed transactions.
• Before a distributed transaction can take effect, each
participating transaction manager must agree to commit its
action (for example, an update).
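The role of the local transaction manager can be illustrated with a small Python class. The class `TransactionManager` and the helper `run_distributed_transaction` are hypothetical; their method names simply mirror the four calls listed above (enlist, prepare, commit, abort). The `can_commit` flag is an assumed stand-in for whatever local checks a real manager performs. The update takes effect only if every participating manager agrees.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TransactionManager:
    """Hypothetical local transaction manager running on one computer system."""

    def __init__(self, site: str, can_commit: bool = True):
        self.site = site
        self.can_commit = can_commit   # assumed stand-in for local checks (locks, constraints)
        self.state: dict[str, TxState] = {}

    def enlist(self, tx_id: str) -> None:
        # Register this site's sub-transaction as part of the distributed transaction.
        self.state[tx_id] = TxState.ACTIVE

    def prepare(self, tx_id: str) -> bool:
        # Agree (True) or refuse (False) to commit the local action.
        if self.can_commit:
            self.state[tx_id] = TxState.PREPARED
        return self.can_commit

    def commit(self, tx_id: str) -> None:
        self.state[tx_id] = TxState.COMMITTED

    def abort(self, tx_id: str) -> None:
        self.state[tx_id] = TxState.ABORTED


def run_distributed_transaction(tx_id: str, managers: list[TransactionManager]) -> bool:
    for tm in managers:
        tm.enlist(tx_id)
    votes = [tm.prepare(tx_id) for tm in managers]  # ask every manager
    agreed = all(votes)                              # commit only if all agree
    for tm in managers:
        if agreed:
            tm.commit(tx_id)
        else:
            tm.abort(tx_id)
    return agreed


print(run_distributed_transaction("t1", [TransactionManager("A"), TransactionManager("B")]))  # True
print(run_distributed_transaction("t2", [TransactionManager("A"), TransactionManager("B", can_commit=False)]))  # False
```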

Failure Models in Mobile Distributed Systems

1) Timing faults - occur when a module does not complete its
services in time;
2) Omission faults - occur when a module fails to deliver a
service at all;
3) Crash faults - occur when a module either stops operating
completely or never recovers to a working state;
4) Byzantine faults - arbitrary faults, in which a module may
behave in any way, including returning incorrect or
conflicting results.
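As a rough illustration, the four failure models can be distinguished by what an observer sees after sending a request to a module. This Python sketch assumes an idealized observer that knows the correct result and can probe whether the module is still alive. Real systems usually cannot separate these cases so cleanly. `Observation` and `classify` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fault(Enum):
    NONE = "none"
    TIMING = "timing"
    OMISSION = "omission"
    CRASH = "crash"
    BYZANTINE = "byzantine"


@dataclass
class Observation:
    reply: Optional[int]   # None if no reply arrived
    latency_ms: float      # time until the reply (ignored if reply is None)
    still_alive: bool      # e.g. whether the module answers a separate heartbeat probe


def classify(obs: Observation, expected: int, deadline_ms: float) -> Fault:
    """Toy classifier, assuming the correct result and liveness are known."""
    if obs.reply is None:
        # Alive but the service was not delivered -> omission; stopped -> crash.
        return Fault.OMISSION if obs.still_alive else Fault.CRASH
    if obs.reply != expected:
        # Arbitrary (possibly malicious) wrong output.
        return Fault.BYZANTINE
    if obs.latency_ms > deadline_ms:
        # Correct result, but too late.
        return Fault.TIMING
    return Fault.NONE


print(classify(Observation(42, 80.0, True), expected=42, deadline_ms=50.0))  # Fault.TIMING
```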

FAULT TOLERANCE PROTOCOLS

The Two-phase commit (2PC) protocol:

The two-phase commit (2PC) protocol is a distributed
algorithm that ensures the reliable termination of a
transaction in a distributed environment: either all
participating nodes commit the transaction, or all of them
undo it.

Phase-I Protocol for the coordinator:

Start
i) Send transaction to the participating nodes.
ii) Wait for signal (YES/NO) from all participating nodes.
Stop

Phase-I Protocol for the participating nodes:

Start
i) Receive transaction from the coordinator.
ii) Do local processing.
iii) Send signal (YES/NO) to the coordinator node.
Stop
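Phase I can be sketched in Python as a single-process simulation. This sketch makes a few assumptions. A method call stands in for sending a message. `local_work` is a hypothetical hook for the node's local processing. A node that cannot be reached (raising `ConnectionError`) is counted as a NO vote, which is the usual 2PC treatment of a missing vote. `Participant`, `on_transaction` and `phase_one` are hypothetical names.

```python
from typing import Callable

Vote = str  # "YES" or "NO"


class Participant:
    """Hypothetical participating node."""

    def __init__(self, name: str, local_work: Callable[[dict], bool]):
        self.name = name
        self.local_work = local_work
        self.ready = False

    def on_transaction(self, tx: dict) -> Vote:
        # i) receive transaction, ii) do local processing, iii) send YES/NO.
        self.ready = self.local_work(tx)
        return "YES" if self.ready else "NO"


def phase_one(tx: dict, participants: list[Participant]) -> dict[str, Vote]:
    """Coordinator side: send the transaction, then collect every node's vote."""
    votes: dict[str, Vote] = {}
    for p in participants:
        try:
            votes[p.name] = p.on_transaction(tx)
        except ConnectionError:
            # An unreachable node is treated as voting NO.
            votes[p.name] = "NO"
    return votes


ok = Participant("A", lambda tx: True)
stock = Participant("B", lambda tx: tx["qty"] <= 5)   # hypothetical local check
print(phase_one({"qty": 10}, [ok, stock]))   # {'A': 'YES', 'B': 'NO'}
```

The coordinator moves to the agreement protocol only if every entry in the returned votes is YES; otherwise it runs the failure protocol.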

Decision-making phase (all participating nodes voted YES)
Phase-II Agreement Protocol for the coordinator:
Start
i) Send commit signal to the participating nodes.
ii) Receive acknowledgment from all participating nodes.
iii) Mark the transaction as complete.
Stop

Phase-II Agreement Protocol for the participating nodes:
Start
i) Receive commit signal from the coordinator.
ii) Commit the transaction.
iii) Release the resources.
iv) Send acknowledgment to the coordinator node.
Stop
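The commit path of Phase II can be sketched as a small Python simulation. It assumes that every participant already voted YES and that a method call stands in for a network message. `Participant`, `on_commit` and `phase_two_commit` are hypothetical names. The step numbers in the comments refer to the steps listed above.

```python
class Participant:
    """Hypothetical participant that voted YES in Phase I."""

    def __init__(self, name: str):
        self.name = name
        self.committed = False
        self.locks_held = True    # resources held since Phase I

    def on_commit(self) -> str:
        # i) commit signal received from the coordinator
        self.committed = True     # ii) commit the transaction
        self.locks_held = False   # iii) release the resources
        return "ACK"              # iv) acknowledge to the coordinator


def phase_two_commit(participants: list[Participant]) -> bool:
    """Coordinator side, used only when every Phase I vote was YES."""
    acks = [p.on_commit() for p in participants]  # i) send commit, ii) collect acks
    return all(a == "ACK" for a in acks)          # iii) complete once every node has acknowledged


nodes = [Participant("A"), Participant("B")]
print(phase_two_commit(nodes))   # True
```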

Decision-making phase (any participating node voted NO)

Phase-II Failure Protocol for the coordinator:

Start
i) Send switchback (rollback) signal to the participating nodes.
ii) Receive acknowledgment from all participating nodes.
iii) Undo the transaction.
Stop

Phase-II Failure Protocol for the participating nodes:

Start
i) Receive switchback signal from the coordinator.
ii) Undo the transaction.
iii) Release the resources.
iv) Send acknowledgment to the coordinator node.
Stop
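For the failure path, a participant needs some way to undo its tentative work. This Python sketch assumes a simple undo log that stores the value each key had before the transaction changed it. That is a common approach, but the protocol description above does not prescribe it. `apply_tentatively`, `on_switchback` and `phase_two_abort` are hypothetical names, and method calls stand in for messages.

```python
class Participant:
    """Hypothetical participant with a simple undo log (before-images)."""

    def __init__(self, name: str, data: dict[str, int]):
        self.name = name
        self.data = data
        self.undo_log: dict[str, int] = {}
        self.locks_held = False

    def apply_tentatively(self, key: str, value: int) -> None:
        # Phase I local processing: remember the old value before updating it.
        self.undo_log.setdefault(key, self.data[key])
        self.data[key] = value
        self.locks_held = True

    def on_switchback(self) -> str:
        # i) switchback signal received from the coordinator
        self.data.update(self.undo_log)   # ii) undo the transaction
        self.undo_log.clear()
        self.locks_held = False           # iii) release the resources
        return "ACK"                      # iv) acknowledge to the coordinator


def phase_two_abort(participants: list[Participant]) -> bool:
    """Coordinator side, used when any participating node voted NO."""
    acks = [p.on_switchback() for p in participants]  # i) send switchback, ii) collect acks
    return all(a == "ACK" for a in acks)              # iii) undone once every node has acknowledged


a = Participant("A", {"x": 1})
a.apply_tentatively("x", 10)
phase_two_abort([a])
print(a.data)   # {'x': 1}
```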

Conclusion
• The techniques described above can restore reliability in
mobile distributed systems.
• However, mobile environments keep raising new challenges,
so these protocols are still not fully suitable for them.
• Further protocols can be developed to add reliability to such
systems.
• This paper takes a further step toward restoring the system
to a consistent state even in the presence of failures.
