2024: Domino Containers - The Next Step. News from the Domino Container commu...
Hadoop fault tolerance
1. A survey on Hadoop: Fault Tolerance
and Optimization of Fault Tolerance
Model
Group-11
Project Guide : Mr. R Patgiri
Members:
Pallav(10-1-5-023)
Prabhakar Barua(10-1-5-017)
Prabodh Hend(10-1-5-053)
Prem Chandra(09-1-5-062)
Jugal Assudani(10-1-5-068)
2. What is Apache Hadoop?
• Large scale, open source software framework
▫ Yahoo! has been the largest contributor to date
• Dedicated to scalable, distributed, data-intensive
computing
• Handles thousands of nodes and petabytes of data
• Supports applications under a free license
• 2 Hadoop subprojects:
▫ HDFS: Hadoop Distributed File System with high
throughput access to application data
▫ MapReduce: A software framework for distributed
processing of large data sets on computer clusters
3. Hadoop MapReduce
• MapReduce is a programming model and software
framework first developed by Google (Google’s
MapReduce paper submitted in 2004)
• Intended to facilitate and simplify the processing of
vast amounts of data in parallel on large clusters of
commodity hardware in a reliable, fault-tolerant
manner
▫ Petabytes of data
▫ Thousands of nodes
• Computational processing occurs on both:
▫ Unstructured data : filesystem
▫ Structured data : database
4. Hadoop Distributed File System (HDFS)
• Inspired by Google File System
• Scalable, distributed, portable filesystem written in Java for
Hadoop framework
▫ Primary distributed storage used by Hadoop applications
• HFDS can be part of a Hadoop cluster or can be a stand-alone
general purpose distributed file system
• An HFDS cluster primarily consists of
▫ NameNode that manages file system metadata
▫ DataNode that stores actual data
• Stores very large files in blocks across machines in a large
cluster
▫ Reliability and fault tolerance ensured by replicating data across
multiple hosts
• Has data awareness between nodes
• Designed to be deployed on low-cost hardware
5. Typical Hadoop cluster integrates
MapReduce and HFDS
• Master/slave architecture
• Master node contains
▫ Job tracker node (MapReduce layer)
▫ Task tracker node (MapReduce layer)
▫ Name node (HFDS layer)
▫ Data node (HFDS layer)
• Multiple slave nodes contain
▫ Task tracker node (MapReduce layer)
▫ Data node (HFDS layer)
• MapReduce layer has job and task tracker nodes
• HFDS layer has name and data nodes
7. MapReduce framework
• Per cluster node:
▫ Single JobTracker per master
1. Responsible for scheduling the jobs
component tasks on the slaves
2. Monitors slave progress
3. Re-executing failed tasks
▫ Single TaskTracker per slave
1.
Execute the tasks as directed by the master
8. MapReduce core functionality
• Code usually written in Java- though it can be written in
other languages with the Hadoop Streaming API
• Two fundamental pieces:
▫ Map step
1. Master node takes large problem input and slices it into
smaller sub problems; distributes these to worker nodes.
2.Worker node may do this again; leads to a multi-level tree
structure
3.Worker processes smaller problem and hands back to master
▫ Reduce step
1. Master node takes the answers to the sub problems and
combines them in a predefined way to get the output/answer
to original problem
9. MapReduce core functionality (II)
• Data flow beyond the two key pieces (map and reduce):
▫ Input reader – divides input into appropriate size splits
which get assigned to a Map function
▫ Map function – maps file data to smaller, intermediate
<key, value> pairs
▫ Compare function – input for Reduce is pulled from the
Map intermediate output and sorted according to ths
compare function
▫ Reduce function – takes intermediate values and reduces to
a smaller solution handed back to the framework
▫ Output writer – writes file output
10. MapReduce core functionality (III)
• A MapReduce Job controls the execution
▫ Splits the input dataset into independent chunks
▫ Processed by the map tasks in parallel
• The framework sorts the outputs of the maps
• A MapReduce Task is sent the output of the
framework to reduce and combine
• Both the input and output of the job are stored
in a filesystem
• Framework handles scheduling
▫ Monitors and re-executes failed tasks
16. Map class (II)
Two maps are
generated (1 per
file)
Second map
emits:
First map emits:
< hello, 1 >
< world, 1 >
< hello, 1 >
< moon, 1 >
< goodbye, 1 >
< world, 1 >
< goodnight, 1 >
< moon, 1 >
17. Hadoop MapReduce Word Count Source
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line);
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(word, one);
}
}
}
18. /**
* A reducer class that just emits the sum of the input values.
*/
public static class Reduce extends MapReduceBase
implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
19. Hadoop MapReduce Word Count Driver
public void run(String inputPath, String outputPath) throws Exception {
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount"); // the keys are words (strings)
conf.setOutputKeyClass(Text.class); // the values are counts (ints)
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(MapClass.class); conf.setReducerClass(Reduce.class);
FileInputFormat.addInputPath(conf, new Path(inputPath));
FileOutputFormat.setOutputPath(conf, new Path(outputPath));
JobClient.runJob(conf); }
21. 1.Addition of one more “Backup” state
in Hadoop Pipeline
• Motive
In a Hadoop pipeline when system failure occurs
then then the whole process of MapReduce is
computed again, although it is a straight forward and
the most plausible remedy, However when the
system turns faulty after the mapping process the
whole process of MapReduce is done again. If the
Intermediate key value pairs can be backedup then
even after the faults in the system the intermediate
key value pair data can be fed to another cluster of
reducers.
24. 1.Addition of one more “Backup” state
in Hadoop Pipeline
• Once this backup system is installed in the pipeline
then the adaptivity of hadoop in terms of fault
tolerance will increase. As the intermediate data is
preserved for the reducer, even if the earlier cluster is
non functional, the preserved data could be fed into a
new reducer. Although the scheduling decisions to be
taken by the master remains unchanged i.e. O(M+R),
where M and R are the mappers and Reducers, but
time required to allocate a new cluster for computing
MapReduce again will be saved. Once the Reducer
completes the process the indermediate data is
removed from the backup device.
25. 1.Addition of one more “Backup” state
in Hadoop Pipeline
Advantages & Disadvantages
Advantages
Unnecessary computation of Mapping is subsided.
BackUp is created as a new state in the pipeline.
Disadvantage
Cost overhead
There will be some delay in the reducer function as backup also will be accessing the
data.
26. 2. Protocol for Supernode
• Motive
Hadoop usually does not have any communication between
its slave nodes in which control signals regarding the status is
shared. This proposal tries to estalblish a differnt type of
communication between the slave nodes so that when any
node turns faulty, the neighbouring nodes try to do its job. In
this new infrastructure configuration, neighboring nodes will
behave as a supernode, and each node will know the other
nodes in the supernode and the tasks they have assigned.
Thus, in case one of the nodes fails, another node in the
supernode can take the role of the failed one, without
needing the JobTracker to know about it or take any action.
31. 2. Protocol for Supernode
• Detection of Fault
Every neighbouring nodes (Node 1, Node 2, Node 3, Node 4, Node 5 ) will
ping Node 1 periodically after time T(T<heartbeat) and keep track of it. If
there is no response from Node 1 for a certain time interval it will be
assumed as a non functional node.
• Info Node 1 sends to its neighbour during normal execution
1.
2.
3.
Task Information
Location of Data in Shared Memory
Progress of Task(Checkpoint)
• Failure of Node
Nodes 1,2,3,4,5 ping each other periodically, These nodes should get a
response from other nodes, however if there comes no response from
other nodes for a period of three handshake time, it will be considered as
a faulty node.
32. 2. Protocol for Supernode
• Re assingment of tasks to Node 2
Any of the neighbour 2,3,4,5 which has completed its
job or it is about to complete i.e. it has free resources,
will assign itself this task and its task tracker will
schedule it.
• Revival of Node 1
If Node 1 starts working again and tries to gain access to
the shared memory where Node 2 is already performing
the task allocated to Node 1,the Control Unit of the
shared memory will prevent the Node 1 from accessing
the shared memory location.
33. 2. Protocol for Supernode
• Control Unit
Control unit is present in the shared memory. Its job is to
handle all theshared memory in the cluster. It also prevents
simultaneous access of more than 1 node to a particular
memory segment.
• Client App request
Before this structure was theorised, ie. the current system in
use, the client app requests the main node for the address of
the reducers where it may find the required answers, now
will request the CU of the shared memory associated with the
task related to the client app via name node, the address of
the shared memory which will be kept track of by the main
node.
34. 2. Protocol for Supernode
• Advantages
1. More fault tolerant
When any node becomes non-functional, then the node present nearby
(ie. Supernode), which is near completion or has already completed its
task reassigns itself to the task of that faulty node, The description of
which is present in the shared memory. Therefore a faulty node does not
have to wait for the Master node to notice about its non-functionality and
hence reduce execution time in case any of the node gets faulty.
2.
Shared Memory
By use of shared memory, the memory of a particular node does not get
blocked in case it gets faulty, it is available to other nodes, with a
controlled access to it.
3.
Control Unit
Control Unit prevents simultaneous access of the same memory block,
thereby providing data integrity. It also provides the client application
about the data it requires.
35. 2. Protocol for Supernode
• Disadvantages
• Cost overhead
Extra hardware is needed to implement this
structure.
• Bandwidth consumption
There will be some bandwidth consumption during
the interaction between the nodes.
36. System Of Slaves
Motive
In Hadoop every node is a single commodity
hardware. However a vision of system of nodes has
never been implemented. If a node turns non
functional, its task is reassigned by the master to
another node. To decrease the time in assigning this
task and all the overhead caused during this
process is reduced by making a system of multiple
nodes as a unit.
38. System Of Slaves
Theorization
Unlike conventional Hadoop Mapreduce each slave is now
considered as a system of slaves which contain multiple
nodes. Athough similar to the process of task assignment in
Hadoop here also master distributes the task among several
system of slaves.Name node informs each system about the
location of data in the shared memory.System seeks for the
data and after it has got the access permission N1 and N2
acquires the data and they start computation. If N1 gets nonfunctional N2 resumes N1’s functioning by completing its own
task and after moving forwad to N1’s task. N2 gets the
information of N1 just by going through the checkpoints and
output data. If the whole system fails supernode algorithm
will come into play.
40. System Of Slaves
Division of data:If chunk size is 64 MB, N1 and N2 each will get 32MB of
data. Although N1 and N2 cannot access the data
simultaneously.
Firstly when the map-reduce job will be assigned to the
system, N1 will acquire the data. Here N1 is given higher
priority. After N1 takes its data, N2 will proceed.
However if N1 turns faulty, after some time, N2 will start
processing its data, and while submitting the checkpoint
it will notice the inavailablity of N1.