SlideShare una empresa de Scribd logo
1 de 31
Distributed Computing Seminar Lecture 2: MapReduce Theory and Implementation Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet Summer 2007 Except as otherwise noted, the contents of this presentation are © Copyright 2007 University of Washington and licensed under the Creative Commons Attribution 2.5 License.
Outline ,[object Object],[object Object],[object Object],[object Object]
Functional Programming Review ,[object Object],[object Object],[object Object],[object Object]
Functional Programming Review ,[object Object],[object Object],[object Object]
Functional Updates Do Not Modify Structures ,[object Object],[object Object],[object Object],The append() function above reverses a list, adds a new element to the front, and returns all of that, reversed, which appends an item.  But it  never modifies lst !
Functions Can Be Used As Arguments ,[object Object],It does not matter what f does to its argument; DoDouble() will do it twice. What is the type of this function?
Map ,[object Object],[object Object]
Fold ,[object Object],[object Object]
fold left vs. fold right ,[object Object],[object Object],[object Object],SML Implementation: fun foldl f a []  = a | foldl f a (x::xs) = foldl f (f(x, a)) xs fun foldr f a []  = a | foldr f a (x::xs) = f(x, (foldr f a xs))
Example ,[object Object],[object Object],[object Object]
Example (Solved) ,[object Object],[object Object],[object Object],[object Object],[object Object]
A More Complicated Fold Problem ,[object Object],[object Object],[object Object]
A More Complicated Map Problem ,[object Object],[object Object]
map Implementation ,[object Object],[object Object],fun map f []  = [] | map f (x::xs) = (f x) :: (map f xs)
Implicit Parallelism In map ,[object Object],[object Object],[object Object]
MapReduce
Motivation: Large Scale Data Processing ,[object Object],[object Object],[object Object]
MapReduce ,[object Object],[object Object],[object Object],[object Object]
Programming Model ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
map ,[object Object],[object Object]
reduce ,[object Object],[object Object],[object Object]
 
Parallelism ,[object Object],[object Object],[object Object],[object Object]
Example: Count word occurrences map(String input_key, String input_value): // input_key: document name  // input_value: document contents  for each  word w  in  input_value:  EmitIntermediate (w, "1");  reduce(String output_key, Iterator intermediate_values):  // output_key: a word  // output_values: a list of counts  int  result = 0;  for each  v  in  intermediate_values:  result += ParseInt(v); Emit (AsString(result));
Example vs. Actual Source Code ,[object Object],[object Object],[object Object],[object Object]
Locality ,[object Object],[object Object]
Fault Tolerance ,[object Object],[object Object],[object Object],[object Object],[object Object]
Optimizations ,[object Object],[object Object],[object Object],Why is it safe to redundantly execute map tasks? Wouldn’t this mess up the total computation?
Optimizations ,[object Object],[object Object],Under what conditions is it sound to use a combiner?
MapReduce Conclusions ,[object Object],[object Object],[object Object],[object Object]
Next Time... ,[object Object]

Más contenido relacionado

La actualidad más candente

Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFishAnushree Prasanna Kumar
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersAbolfazl Asudeh
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: IntroductionRsquared Academy
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreducemakoto onizuka
 
Real Time Framework by Tonny
Real Time Framework by TonnyReal Time Framework by Tonny
Real Time Framework by TonnyAgate Studio
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsRsquared Academy
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reducebetabeers
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reduceRuben Orta
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsYu Liu
 
Parallel & Distributed Computing
Parallel & Distributed ComputingParallel & Distributed Computing
Parallel & Distributed Computingrohit_ainapure
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce scriptHaripritha
 
Data Visualization With R: Learn To Modify Font Of Graphical Parameters
Data Visualization With R: Learn To Modify Font Of Graphical ParametersData Visualization With R: Learn To Modify Font Of Graphical Parameters
Data Visualization With R: Learn To Modify Font Of Graphical ParametersRsquared Academy
 

La actualidad más candente (20)

Optimization of graph storage using GoFFish
Optimization of graph storage using GoFFishOptimization of graph storage using GoFFish
Optimization of graph storage using GoFFish
 
Map algebra
Map algebraMap algebra
Map algebra
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
 
Data Visualization With R: Introduction
Data Visualization With R: IntroductionData Visualization With R: Introduction
Data Visualization With R: Introduction
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
Optimization for iterative queries on Mapreduce
Optimization for iterative queries on MapreduceOptimization for iterative queries on Mapreduce
Optimization for iterative queries on Mapreduce
 
Real Time Framework by Tonny
Real Time Framework by TonnyReal Time Framework by Tonny
Real Time Framework by Tonny
 
Queues
QueuesQueues
Queues
 
Data Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple GraphsData Visualization With R: Learn To Combine Multiple Graphs
Data Visualization With R: Learn To Combine Multiple Graphs
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reduce
 
Mi primer map reduce
Mi primer map reduceMi primer map reduce
Mi primer map reduce
 
Main map reduce
Main map reduceMain map reduce
Main map reduce
 
Stack & heap
Stack & heap Stack & heap
Stack & heap
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
 
Parallel & Distributed Computing
Parallel & Distributed ComputingParallel & Distributed Computing
Parallel & Distributed Computing
 
Graph Plots in Matlab
Graph Plots in MatlabGraph Plots in Matlab
Graph Plots in Matlab
 
Stacks
StacksStacks
Stacks
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
Data Visualization With R: Learn To Modify Font Of Graphical Parameters
Data Visualization With R: Learn To Modify Font Of Graphical ParametersData Visualization With R: Learn To Modify Font Of Graphical Parameters
Data Visualization With R: Learn To Modify Font Of Graphical Parameters
 
Heap Management
Heap ManagementHeap Management
Heap Management
 

Destacado

Parallel Programming Primer 1
Parallel Programming Primer 1Parallel Programming Primer 1
Parallel Programming Primer 1mobius.cn
 
Industrial management
Industrial management Industrial management
Industrial management Anshu Singh
 
Industrial management
Industrial managementIndustrial management
Industrial managementAkshay Yawale
 
Compiler Design
Compiler DesignCompiler Design
Compiler DesignMir Majid
 
Introduction to computer network
Introduction to computer networkIntroduction to computer network
Introduction to computer networkAshita Agrawal
 
BASIC CONCEPTS OF COMPUTER NETWORKS
BASIC CONCEPTS OF COMPUTER NETWORKS BASIC CONCEPTS OF COMPUTER NETWORKS
BASIC CONCEPTS OF COMPUTER NETWORKS Kak Yong
 

Destacado (9)

Parallel Programming Primer 1
Parallel Programming Primer 1Parallel Programming Primer 1
Parallel Programming Primer 1
 
Industrial management
Industrial management Industrial management
Industrial management
 
Industrial management
Industrial managementIndustrial management
Industrial management
 
Lexical analyzer
Lexical analyzerLexical analyzer
Lexical analyzer
 
Compiler Design
Compiler DesignCompiler Design
Compiler Design
 
Compiler Chapter 1
Compiler Chapter 1Compiler Chapter 1
Compiler Chapter 1
 
Networking
NetworkingNetworking
Networking
 
Introduction to computer network
Introduction to computer networkIntroduction to computer network
Introduction to computer network
 
BASIC CONCEPTS OF COMPUTER NETWORKS
BASIC CONCEPTS OF COMPUTER NETWORKS BASIC CONCEPTS OF COMPUTER NETWORKS
BASIC CONCEPTS OF COMPUTER NETWORKS
 

Similar a Lec2 Mapred

Big data shim
Big data shimBig data shim
Big data shimtistrue
 
Functional Programming in F#
Functional Programming in F#Functional Programming in F#
Functional Programming in F#Dmitri Nesteruk
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-listpinakspatel
 
Real World Haskell: Lecture 6
Real World Haskell: Lecture 6Real World Haskell: Lecture 6
Real World Haskell: Lecture 6Bryan O'Sullivan
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduceHassan A-j
 
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3Philip Schwarz
 
Please I want a detailed complete answers for each part.Then make.pdf
Please I want a detailed complete answers for each part.Then make.pdfPlease I want a detailed complete answers for each part.Then make.pdf
Please I want a detailed complete answers for each part.Then make.pdfsiennatimbok52331
 
On fuctional programming, high order functions, ML
On fuctional programming, high order functions, MLOn fuctional programming, high order functions, ML
On fuctional programming, high order functions, MLSimone Di Maulo
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsPhilip Schwarz
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkDB Tsai
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technicalalpinedatalabs
 

Similar a Lec2 Mapred (20)

Big data shim
Big data shimBig data shim
Big data shim
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Unit3 MapReduce
Unit3 MapReduceUnit3 MapReduce
Unit3 MapReduce
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
 
Functional Programming in F#
Functional Programming in F#Functional Programming in F#
Functional Programming in F#
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Matlab1
Matlab1Matlab1
Matlab1
 
Stacks,queues,linked-list
Stacks,queues,linked-listStacks,queues,linked-list
Stacks,queues,linked-list
 
Real World Haskell: Lecture 6
Real World Haskell: Lecture 6Real World Haskell: Lecture 6
Real World Haskell: Lecture 6
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
Game of Life - Polyglot FP - Haskell - Scala - Unison - Part 3
 
Map Reduce basics
Map Reduce basicsMap Reduce basics
Map Reduce basics
 
Please I want a detailed complete answers for each part.Then make.pdf
Please I want a detailed complete answers for each part.Then make.pdfPlease I want a detailed complete answers for each part.Then make.pdf
Please I want a detailed complete answers for each part.Then make.pdf
 
MapReduce
MapReduceMapReduce
MapReduce
 
On fuctional programming, high order functions, ML
On fuctional programming, high order functions, MLOn fuctional programming, high order functions, ML
On fuctional programming, high order functions, ML
 
Fusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with ViewsFusing Transformations of Strict Scala Collections with Views
Fusing Transformations of Strict Scala Collections with Views
 
Multinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache SparkMultinomial Logistic Regression with Apache Spark
Multinomial Logistic Regression with Apache Spark
 
Alpine Spark Implementation - Technical
Alpine Spark Implementation - TechnicalAlpine Spark Implementation - Technical
Alpine Spark Implementation - Technical
 

Más de mobius.cn

Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clusteringmobius.cn
 
Lec5 Pagerank
Lec5 PagerankLec5 Pagerank
Lec5 Pagerankmobius.cn
 
Advanced Lighting Techniques Dan Baker (Meltdown 2005)
Advanced Lighting Techniques   Dan Baker (Meltdown 2005)Advanced Lighting Techniques   Dan Baker (Meltdown 2005)
Advanced Lighting Techniques Dan Baker (Meltdown 2005)mobius.cn
 
Influence map
Influence mapInfluence map
Influence mapmobius.cn
 

Más de mobius.cn (6)

Lec4 Clustering
Lec4 ClusteringLec4 Clustering
Lec4 Clustering
 
Lec3 Dfs
Lec3 DfsLec3 Dfs
Lec3 Dfs
 
Lec5 Pagerank
Lec5 PagerankLec5 Pagerank
Lec5 Pagerank
 
Lec1 Intro
Lec1 IntroLec1 Intro
Lec1 Intro
 
Advanced Lighting Techniques Dan Baker (Meltdown 2005)
Advanced Lighting Techniques   Dan Baker (Meltdown 2005)Advanced Lighting Techniques   Dan Baker (Meltdown 2005)
Advanced Lighting Techniques Dan Baker (Meltdown 2005)
 
Influence map
Influence mapInfluence map
Influence map
 

Lec2 Mapred

  • 1. Distributed Computing Seminar Lecture 2: MapReduce Theory and Implementation Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet Summer 2007 Except as otherwise noted, the contents of this presentation are © Copyright 2007 University of Washington and licensed under the Creative Commons Attribution 2.5 License.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.  
  • 23.
  • 24. Example: Count word occurrences map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_value: EmitIntermediate (w, "1"); reduce(String output_key, Iterator intermediate_values): // output_key: a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit (AsString(result));
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.