6. MapReduce
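• A distributed processing model developed at Google
• Inspired by the map and reduce primitives of functional programming: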
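    >>> from functools import reduce               # Python 3: reduce lives in functools
    >>> list(map(lambda x: x ** 2, range(1, 6)))   # map: apply a function to each element
    [1, 4, 9, 16, 25]
    >>> reduce(lambda a, b: a + b, range(1, 6))    # reduce: fold the elements into one value
    15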
• The programmer only writes the map and reduce functions; the framework handles the rest
• Part of Google's infrastructure stack:
    [Figure: MapReduce and Bigtable (a KVS) built on top of the Google File System]
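The sketch below unrolls the two REPL calls into plain loops to show what they compute. `my_map` and `my_reduce` are hypothetical helper names, not library functions. `my_reduce` is meant to behave like `functools.reduce` when it is called without an initial value.

```python
def my_map(f, xs):
    """Apply f to every element (like map, but eagerly returning a list)."""
    result = []
    for x in xs:
        result.append(f(x))
    return result


def my_reduce(f, xs):
    """Fold xs from the left with f, like functools.reduce with no initial value."""
    it = iter(xs)
    try:
        acc = next(it)  # the first element seeds the accumulator
    except StopIteration:
        raise TypeError("my_reduce() of empty sequence") from None
    for x in it:
        acc = f(acc, x)  # computes ((((1 + 2) + 3) + 4) + 5) for range(1, 6)
    return acc


squares = my_map(lambda x: x ** 2, range(1, 6))
total = my_reduce(lambda a, b: a + b, range(1, 6))
print(squares)  # [1, 4, 9, 16, 25]
print(total)    # 15
```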
7. Hadoop
• An open-source implementation of the systems described in Google's papers
• Its MapReduce implementation is called Hadoop MapReduce
                     Google               Hadoop
    KVS              Bigtable             HBase
    MapReduce        MapReduce            Hadoop MapReduce
    File system      Google File System   Hadoop Distributed File System (HDFS)
10. WordCount
[Figure: the JobClient, the JobTracker, and the input file in HDFS, which contains two lines:]
    the end of money is
    the end of love
11. WordCount
[Figure: the JobTracker assigns map tasks and a reduce task. One mapper reads "the end of money is" from HDFS and the other reads "the end of love"; a single reducer receives their output.]
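The sketch below models the assignment step in a deliberately simplified way. It assumes one input split per line, as in the figure, and deals the splits out to mappers round-robin. The functions `make_splits` and `assign_map_tasks` and the mapper names are hypothetical. The real JobTracker also takes data locality and free task slots into account.

```python
from itertools import cycle


def make_splits(text):
    """Hypothetical simplification: treat each non-empty line as one input split."""
    return [line for line in text.splitlines() if line.strip()]


def assign_map_tasks(splits, mappers):
    """Deal splits out to mappers round-robin; returns {mapper: [splits]}."""
    assignment = {m: [] for m in mappers}
    for split, mapper in zip(splits, cycle(mappers)):
        assignment[mapper].append(split)
    return assignment


text = "the end of money is\nthe end of love\n"
tasks = assign_map_tasks(make_splits(text), ["mapper1", "mapper2"])
print(tasks)
# {'mapper1': ['the end of money is'], 'mapper2': ['the end of love']}
```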
12. WordCount
[Figure: Map phase. Each mapper emits a (word, 1) pair for every word in its line.]
    mapper 1 ("the end of money is"):  the 1, end 1, of 1, money 1, is 1
    mapper 2 ("the end of love"):      the 1, end 1, of 1, love 1
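Here is a minimal Python sketch of the WordCount map function from the Map phase above. The name `mapper` is only illustrative. It splits on whitespace and pairs each word with a count of 1.

```python
def mapper(line):
    """Map step of WordCount: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


map_outputs = [mapper("the end of money is"), mapper("the end of love")]
for out in map_outputs:
    print(out)
# [('the', 1), ('end', 1), ('of', 1), ('money', 1), ('is', 1)]
# [('the', 1), ('end', 1), ('of', 1), ('love', 1)]
```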
13. WordCount
[Figure: Shuffle phase (copy & sort). The reducer copies the pairs from both mappers and sorts them by key.]
    end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
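The copy & sort step can be sketched in memory as shown below. This is a single-process simplification in which "copy" just gathers every mapper's pairs into one list. The map outputs are hard-coded from the Map phase.

```python
# Map outputs from the Map phase (one list per mapper).
map_outputs = [
    [("the", 1), ("end", 1), ("of", 1), ("money", 1), ("is", 1)],
    [("the", 1), ("end", 1), ("of", 1), ("love", 1)],
]

# "copy": gather every mapper's pairs in one place.
copied = [pair for out in map_outputs for pair in out]

# "sort": order the pairs by key so identical words end up next to each other.
shuffled = sorted(copied, key=lambda pair: pair[0])
print([key for key, _ in shuffled])
# ['end', 'end', 'is', 'love', 'money', 'of', 'of', 'the', 'the']
```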
14. WordCount
[Figure: after the shuffle, the values for each key are grouped into a list.]
    end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
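The grouping shown above can be sketched with `itertools.groupby`. It merges only adjacent equal keys, so the grouping is correct only because the pairs were sorted first.

```python
from itertools import groupby
from operator import itemgetter

# Sorted pairs from the shuffle.
shuffled = [("end", 1), ("end", 1), ("is", 1), ("love", 1), ("money", 1),
            ("of", 1), ("of", 1), ("the", 1), ("the", 1)]

# Collect the values of each run of equal keys into a list.
grouped = {key: [value for _, value in pairs]
           for key, pairs in groupby(shuffled, key=itemgetter(0))}
print(grouped)
# {'end': [1, 1], 'is': [1], 'love': [1], 'money': [1], 'of': [1, 1], 'the': [1, 1]}
```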
15. WordCount
[Figure: Reduce phase. The reducer sums each value list and writes the result to HDFS.]
    end 2, is 1, love 1, money 1, of 2, the 2
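Here is a minimal sketch of the reduce function for the Reduce phase above. The name `reducer` is illustrative. Printing stands in for writing the result to HDFS.

```python
def reducer(key, values):
    """Reduce step of WordCount: sum the 1s collected for one word."""
    return key, sum(values)


grouped = {"end": [1, 1], "is": [1], "love": [1], "money": [1],
           "of": [1, 1], "the": [1, 1]}
result = [reducer(key, values) for key, values in grouped.items()]
for word, count in result:
    print(f"{word}\t{count}")
# prints end 2, is 1, love 1, money 1, of 2, the 2 (tab-separated, one per line)
```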
17. MapReduce
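The same flow can be reproduced on a single machine with a shell pipeline, where sort stands in for the shuffle:
$ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce
Here text.txt contains:
    the end of money is
    the end of love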
18. MapReduce
Map phase (the map subroutine of mapred.pl):

    sub map {
        my $text = shift;
        my @words = split /\s+/, $text;    # split the line on whitespace
        foreach my $word (@words) {
            print $word, "\t", 1, "\n";    # emit "word<TAB>1"
        }
    }

Output for text.txt:
    the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
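For comparison, here is a rough Python counterpart of the Perl map subroutine. It writes the same line format: the word, a tab, then 1. The function `map_text` is hypothetical. It writes to any file-like object, so a `StringIO` can capture what would normally go to stdout.

```python
import io
import sys


def map_text(text, out=sys.stdout):
    """Write one "word<TAB>1" line per whitespace-separated word in text."""
    for word in text.split():  # split on any run of whitespace
        out.write(f"{word}\t1\n")


# Usage: map both lines of text.txt and capture the output.
buf = io.StringIO()
for line in ["the end of money is", "the end of love"]:
    map_text(line, buf)
mapped = buf.getvalue()
print(mapped, end="")
```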
19. MapReduce
Shuffle phase: sort plays the role of copy & sort, bringing identical keys together.
    before sort: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
    after sort:  end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
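The sketch below shows why sorting whole lines is enough for the shuffle: equal keys end up in one contiguous run. Python's `sorted()` on strings stands in for the `sort` command here. Note that GNU `sort` follows the current locale, and `LC_ALL=C` gives plain byte order.

```python
mapped_lines = ["the\t1", "end\t1", "of\t1", "money\t1", "is\t1",
                "the\t1", "end\t1", "of\t1", "love\t1"]

# Sort whole lines, as `sort` does in the pipeline.
sorted_lines = sorted(mapped_lines)
keys = [line.split("\t", 1)[0] for line in sorted_lines]

# Count runs of equal keys: if every key forms a single run, the number of
# runs equals the number of distinct keys.
runs = 1 + sum(1 for a, b in zip(keys, keys[1:]) if a != b)
print(runs == len(set(keys)))  # True
```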
20. MapReduce
Reduce phase (the reduce subroutine of mapred.pl):

    sub reduce {
        my ($key, @values) = @_;
        my $cnt = 0;
        foreach my $value (@values) {
            $cnt += $value;                # add up the counts for this key
        }
        print $key, "\t", $cnt, "\n";      # emit "word<TAB>total"
    }

Input (grouped): end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
Output:          end 2, is 1, love 1, money 1, of 2, the 2
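The Perl reduce subroutine receives `($key, @values)`, but the pipeline delivers flat, sorted lines. Something must therefore group consecutive lines that share a key. The rest of mapred.pl is not shown, so the streaming sketch below is an assumption about how that could work. `reduce_stream` is a hypothetical name. In a script you would pass it `sys.stdin`.

```python
from itertools import groupby


def reduce_stream(sorted_lines):
    """Group sorted "key<TAB>value" lines by key and yield "key<TAB>sum" lines."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        values = [int(value) for _, value in group]  # the @values of one key
        yield f"{key}\t{sum(values)}"


sorted_lines = ["end\t1\n", "end\t1\n", "is\t1\n", "love\t1\n", "money\t1\n",
                "of\t1\n", "of\t1\n", "the\t1\n", "the\t1\n"]
for out in reduce_stream(sorted_lines):
    print(out)
```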
26. Shuffle
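• Map output can be pre-aggregated on the map side by a Combiner before it is sent to the reducers
• Between map and reduce, the framework shuffles the data: partition, sort, copy, and sort & merge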
[Figure: each mapper splits its output into partitions by hash(key) % 2 and sorts each partition. Each reducer copies its partition from every mapper, then sorts & merges the copies. In the example, hash(the), hash(end), and hash(is) % 2 = 0, so those keys go to the first reducer. hash(of), hash(money), and hash(love) % 2 = 1, so those keys go to the second reducer. The example map output also contains the dummy keys hoge and fuga.]
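The partition → sort → copy → sort & merge flow can be sketched in a single process, as shown below. `zlib.crc32` stands in for the figure's `hash()`, because Python's built-in string hash is randomized per process. As a result, the reducer each word lands on may differ from the figure. The idea is similar in spirit to Hadoop's default HashPartitioner. `partition`, `map_side`, and `reduce_side` are hypothetical names. No Combiner is applied here.

```python
import heapq
import zlib
from itertools import groupby

NUM_REDUCERS = 2


def partition(key, num_reducers=NUM_REDUCERS):
    """Choose a reducer with hash(key) % num_reducers (stable crc32 hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_side(line):
    """Emit (word, 1), split the pairs by partition, and sort each partition."""
    parts = [[] for _ in range(NUM_REDUCERS)]
    for word in line.split():
        parts[partition(word)].append((word, 1))
    return [sorted(p) for p in parts]


def reduce_side(r, map_outputs):
    """Copy partition r from every mapper, merge the sorted runs, and sum per key."""
    runs = [out[r] for out in map_outputs]  # copy
    merged = heapq.merge(*runs)             # sort & merge (each run is already sorted)
    return {key: sum(v for _, v in group)
            for key, group in groupby(merged, key=lambda kv: kv[0])}


map_outputs = [map_side("the end of money is"), map_side("the end of love")]
results = [reduce_side(r, map_outputs) for r in range(NUM_REDUCERS)]
for r, counts in enumerate(results):
    print(r, counts)
```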