3. Background
● Transformation operations are conceptually straightforward
○ Until data is large and the computation must be
distributed over hundred or thousands of machines
● So, Google created MapReduce
● MapReduce is a programming abstraction
○ Expresses simple computations
○ Hides complexity details
4. Model
● Utilizes higher-order shaping functions Map and Reduce to
take a set of input key/value pairs and produce a set of
output key/value pairs
● Map
○ Takes an input key/value pair and produces a set of
intermediate key/value pairs
● Reduce
○ Accepts an intermediate key I and a set of values for
that key, and merges those values to form possibly
smaller sets of values
5. Examples
● Distributed Grep
● Count of URL Access Frequency
● Reverse Web-Link Graph
● Term-Vector per Host
● Inverted Index
● Distributed Sort
7. Conclusions
● The MapReduce programming model proved to be a useful
abstraction for many different purposes
○ Easy to use
■ even for programmers without experience with
parallel and distributed systems
○ A large variety of problems are easily expressible as
MapReduce computations
○ The implementation scales to large clusters of machines
● Greatly simplifies large-scale computations at Google