5. Map/Reduce
• Very scalable algorithm
• Inspired by map and reduce from
functional programming.
• Everything is based on key/value pairs
Tuesday, 14 September 2010
9. functional map
List("hello", "dude").map { x => x.substring(0, 1) }
10. Map/Reduce map
• Input is key/value
• Output is key/value
11. Simple Example, Map
• Count occurrences of words in a document
• Input is: <linenumber>, <content of line>
• For each word on the line, the output is
<word>, <count>
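The map step of the word count could be sketched as follows. This is a minimal illustration in plain Scala, not Hadoop's API: the object name WordCountMap, the map signature, and the choice to emit a count of 1 per occurrence are assumptions made for the example.

```scala
object WordCountMap {
  // Hypothetical map function: the key is the line number, the value is the line text.
  // Emits one (word, 1) pair for every word occurrence on the line.
  def map(lineNumber: Long, line: String): List[(String, Int)] =
    line.toLowerCase
      .split("\\W+")          // split on runs of non-word characters
      .filter(_.nonEmpty)     // drop the empty token a leading separator produces
      .map(word => (word, 1))
      .toList
}
```

Calling WordCountMap.map(1L, "Hello hello world") yields two pairs for "hello" and one for "world"; summing them is left to the reduce step.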
14. functional reduce
val sum=List(32,40,23).reduceLeft{_+_}
15. Map/Reduce reduce
• Input is key/list of values
• Output is key/value
16. Simple Example, Reduce
• Reduce input is <word>, <list of counts>
• For each value in the list, we add it to a running sum
• Output is <word>, <sum of counts>
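The reduce step of the word count could be sketched like this. It is plain Scala rather than Hadoop code; the names WordCountReduce, reduce and shuffle are hypothetical, and shuffle only stands in for the grouping by key that the framework performs between map and reduce.

```scala
object WordCountReduce {
  // Hypothetical reduce function: receives one word and every count emitted for it,
  // and returns the word together with the sum of those counts.
  def reduce(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // Stand-in for the framework's shuffle: group the map output by key so that
  // each word ends up with the list of all its counts.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2)) }
}
```

Feeding the pairs ("a", 1), ("b", 1), ("a", 1) through shuffle and then reduce gives a total of 2 for "a" and 1 for "b".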
32. Several iterations
[Diagram: Iteration 1, Iteration 2, Iteration 3]
33. Several iterations
[Diagram: Iteration 1 and Iteration 2 side by side, followed by Iteration 3]
34. Partitioning
[Diagram: the keys Paul, Mary, Kate, Lea, Jeff and Ali are partitioned into the groups (Ali, Jeff), (Lea, Kate) and (Paul, Mary), which are passed on to the reducers]
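How keys get assigned to reducers can be sketched with a hash-based partitioner. This is an assumption for illustration: the slide does not say which partitioning scheme it depicts, and the names Partitioning, partition and assign are hypothetical. The formula mirrors the common approach of masking off the sign bit of the hash code and taking the remainder by the number of reducers.

```scala
object Partitioning {
  // Pick a reducer index in the range 0 until numReducers for a key.
  // Masking with Int.MaxValue keeps the value non-negative before the modulus.
  def partition(key: String, numReducers: Int): Int =
    (key.hashCode & Int.MaxValue) % numReducers

  // Group a batch of keys by the reducer they would be sent to.
  def assign(keys: Seq[String], numReducers: Int): Map[Int, Seq[String]] =
    keys.groupBy(key => partition(key, numReducers))
}
```

The property that matters is that the same key always lands on the same reducer, so all values for a key are reduced together. The actual grouping produced for the six names depends on their hash codes and need not match the groups in the diagram.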
35. Comparison
[Diagram: Pres 1 (Paul, Lea, Ali) and Pres 2 (Jeff, Mary, Kate) are regrouped into the pairs (Paul, Kate), (Ali, Jeff) and (Lea, Mary), which are passed on to the reducers]
36. Guidelines
• Never access external sources during
computation.
• Your functions should be small and fast
• You might not have all the data available
37. Hadoop
• Hadoop reuses objects, so remember to
clone them if you plan to keep them.
• You can read and write all objects
implementing org.apache.hadoop.io.WritableComparable
  • write(DataOutput)
  • readFields(DataInput)
  • compareTo(Object)
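The three methods listed above can be sketched on a small key type. To keep the example runnable without Hadoop on the classpath, the hypothetical class WordKey only extends java.lang.Comparable; in a real job it would implement WritableComparable instead. The field names and the roundTrip helper are assumptions made for the illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput, DataInputStream, DataOutput, DataOutputStream}

// Hypothetical key type with mutable fields and a no-argument constructor,
// so that an instance can be refilled by readFields.
class WordKey(var word: String = "", var line: Int = 0) extends Comparable[WordKey] {
  // Serialize the fields in a fixed order.
  def write(out: DataOutput): Unit = {
    out.writeUTF(word)
    out.writeInt(line)
  }

  // Overwrite this instance's fields, reading them in the same order.
  def readFields(in: DataInput): Unit = {
    word = in.readUTF()
    line = in.readInt()
  }

  // Order by word first, then by line number.
  override def compareTo(other: WordKey): Int = {
    val byWord = word.compareTo(other.word)
    if (byWord != 0) byWord else Integer.compare(line, other.line)
  }
}

object WordKeyDemo {
  // Serialize a key to bytes and read it back into a fresh instance.
  def roundTrip(key: WordKey): WordKey = {
    val buffer = new ByteArrayOutputStream()
    key.write(new DataOutputStream(buffer))
    val copy = new WordKey()
    copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray)))
    copy
  }
}
```

Because readFields overwrites the fields of an existing instance, a framework can refill one object for every record. That is why a reference kept across records must point to a copy; a round trip like the one in WordKeyDemo is one way to produce an independent copy.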
54. Thank you
Ole-Martin Mørk
olemartin@gmail.com
twitter.com/olemartin
del.icio.us/olemartin/jz10
All images are licensed with Creative Commons.
See http://bit.ly/mr-photos for details.