MapReduce: Simplified Data Processing on Large Clusters
Dae Ho Kim, Dept of Computer Science, Sangmyung Univ.
Introduction
The Age of Big Data
SNS, IoT, Smartphones
Large-scale Computation
Automatic, Powerful, Simple
MapReduce
Concept Description
MapReduce
Map: Input Data -> key / value
Reduce: Merge Values
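The Map and Reduce roles above can be illustrated with a word count. This is a minimal single-process sketch, not the distributed library: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names chosen for illustration, and the grouping step stands in for the shuffle between map and reduce tasks.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: turn one input record into intermediate (key, value) pairs."""
    for word in text.split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: merge all values that share the same intermediate key."""
    return sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    # Group intermediate values by key, then reduce each group.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    return {ikey: reduce_fn(ikey, values) for ikey, values in groups.items()}

docs = [("a.txt", "big data big cluster"), ("b.txt", "big data")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```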
Implementation
Overview, Fault Tolerance, …
Implementation
Execution Overview
Roles: Master, Worker
Task types: Map, Reduce
Task states: idle, in-progress, completed
Implementation
Fault Tolerance
1. Worker Failure
• If no response is received from a worker in a certain amount of time, the master marks the
worker as failed.
• Any map task or reduce task in progress on a failed worker is reset to idle and becomes
eligible for rescheduling.
• Completed map tasks are re-executed on a failure because their output is stored on the
local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do
not need to be re-executed since their output is stored in a global file system.
• When a map task is executed first by worker A and then later executed by worker B
(because A failed), all workers executing reduce tasks are notified of the re-execution. Any
reduce task that has not already read the data from worker A will read the data from
worker B.
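The worker-failure rules above can be sketched as a small bookkeeping routine on the master. This is an illustrative model only: the `Task` class, the `last_heard` map, and the `timeout` parameter are assumptions made for the example, not part of any real MapReduce API.

```python
from dataclasses import dataclass
from typing import Optional

IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

@dataclass
class Task:
    kind: str                      # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None   # worker currently or last assigned

def handle_worker_timeouts(tasks, last_heard, now, timeout):
    """Mark silent workers as failed and reset the tasks that must be redone."""
    failed = {w for w, t in last_heard.items() if now - t > timeout}
    for task in tasks:
        if task.worker not in failed:
            continue
        # In-progress tasks of either kind go back to idle. Completed map tasks
        # do too, because their output lived on the failed machine's local disk.
        # Completed reduce tasks stay completed: their output is in the global
        # file system.
        if task.state == IN_PROGRESS or (task.state == COMPLETED and task.kind == "map"):
            task.state, task.worker = IDLE, None
    return failed

tasks = [
    Task("map", COMPLETED, "A"),
    Task("reduce", IN_PROGRESS, "A"),
    Task("reduce", COMPLETED, "A"),
    Task("map", IN_PROGRESS, "B"),
]
failed = handle_worker_timeouts(tasks, {"A": 0.0, "B": 9.0}, now=10.0, timeout=5.0)
print(failed, [t.state for t in tasks])
```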
Implementation
Fault Tolerance
2. Master Failure
• Our current implementation aborts the MapReduce computation if the master fails.
Clients can check for this condition and retry the MapReduce operation if they desire.
3. Semantics in the Presence of Failures
• When the user-supplied map and reduce operators are deterministic functions of their
input values, our distributed implementation produces the same output as would have
been produced by a non-faulting sequential execution of the entire program.
• If the master receives a completion message for an already completed map task, it
ignores the message.
• If the same reduce task is executed on multiple machines, multiple rename calls will be
executed for the same final output file. We rely on the atomic rename operation provided
by the underlying file system to guarantee that the final file system state contains just the
data produced by one execution of the reduce task.
• The vast majority of our map and reduce operators are deterministic, and the fact that our
semantics are equivalent to a sequential execution in this case makes it very easy for
programmers to reason about their program’s behavior.
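The atomic-rename commit for reduce output can be sketched on a local file system. This is a stand-in under stated assumptions: `commit_reduce_output` and the temporary-file naming are hypothetical, and a local directory takes the place of the global file system the slides refer to.

```python
import os
import tempfile

def commit_reduce_output(final_path, data, attempt_id):
    """Write to a private temporary file, then atomically rename it into place."""
    tmp_path = f"{final_path}.attempt-{attempt_id}.tmp"
    with open(tmp_path, "w") as f:
        f.write(data)
    # os.replace is atomic on POSIX when both paths are on the same file system,
    # so the final file always holds the output of exactly one attempt.
    os.replace(tmp_path, final_path)

out_dir = tempfile.mkdtemp()
final = os.path.join(out_dir, "part-00000")
commit_reduce_output(final, "big\t3\n", attempt_id=1)
commit_reduce_output(final, "big\t3\n", attempt_id=2)  # duplicate execution of the same task

with open(final) as f:
    contents = f.read()
print(repr(contents), os.listdir(out_dir))
```

Because the reduce function is deterministic, both attempts write identical data, and it does not matter which rename lands last.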
Implementation
Backup Tasks
▪ Backup Tasks
• One of the common causes that lengthens the total
time taken for a MapReduce operation is a “straggler”:
a machine that takes an unusually long time to
complete one of the last few map or reduce tasks in
the computation.
• When a MapReduce operation is close to completion,
the master schedules backup executions of the
remaining in-progress tasks. The task is marked as
completed whenever either the primary or the backup
execution completes.
• The sort program described in Section 5.3 of the MapReduce paper takes 44%
longer to complete when the backup task mechanism is
disabled.
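The backup-task idea can be sketched with two threads racing on the same deterministic task. This is only an illustration: `run_task`, `run_with_backup`, and the artificial delays are assumptions for the example, and threads stand in for separate worker machines.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_task(data, delay):
    """A deterministic task; `delay` simulates a slow (straggler) machine."""
    time.sleep(delay)
    return sorted(data)

def run_with_backup(data, primary_delay, backup_delay):
    with ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(run_task, data, primary_delay)
        backup = pool.submit(run_task, data, backup_delay)
        # The task counts as completed as soon as either execution finishes.
        done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
        result = next(iter(done)).result()
    # Leaving the `with` block still waits for the slower thread to exit;
    # a real master would simply ignore the late completion message.
    return result

result = run_with_backup([3, 1, 2], primary_delay=0.3, backup_delay=0.01)
print(result)
```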
Refinements
Partitioning Function, Combiner Function, …
Refinements
Partitioning Function, Combiner Function
▪ Partitioning Function
• Users specify the number of reduce tasks/output files they want (R). Data gets partitioned across these tasks using a partitioning function on the intermediate key.
• A default partitioning function is provided that uses hashing (e.g. “hash(key) mod R”).
• For example, using “hash(Hostname(urlkey)) mod R” as the partitioning function causes all
URLs from the same host to end up in the same output file.
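Both partitioning functions above can be written in a few lines. This is a sketch under assumptions: `stable_hash`, `default_partition`, and `host_partition` are hypothetical names, and CRC32 is used as the hash only because Python's built-in `hash()` for strings changes between processes.

```python
import zlib
from urllib.parse import urlparse

def stable_hash(key):
    # CRC32 gives the same value in every process, unlike the salted built-in hash().
    return zlib.crc32(key.encode("utf-8"))

def default_partition(key, R):
    """hash(key) mod R"""
    return stable_hash(key) % R

def host_partition(url, R):
    """hash(Hostname(urlkey)) mod R: all URLs of one host share a partition."""
    return stable_hash(urlparse(url).hostname) % R

R = 4
urls = ["http://example.com/a", "http://example.com/b/c", "http://example.org/x"]
parts = [host_partition(u, R) for u in urls]
print(parts)
```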
▪ Combiner Function
• In some cases, there is significant repetition in the intermediate keys produced by each map
task, and the user-specified Reduce function is commutative and associative.
• We allow the user to specify an optional Combiner function that does partial merging of this
data before it is sent over the network.
• The Combiner function is executed on each machine that performs a map task.
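The effect of a combiner can be shown on word count, where the reduce operation is addition. This is a minimal sketch: `map_words` and `combine` are hypothetical helpers, and the list sizes stand in for the amount of data that would cross the network.

```python
from collections import Counter

def map_words(text):
    """Map side: emit (word, 1) for every word."""
    return [(word, 1) for word in text.split()]

def combine(pairs):
    """Partially merge pairs on the map machine.

    This is valid because addition is commutative and associative, so the
    final Reduce produces the same totals from the merged pairs.
    """
    merged = Counter()
    for key, value in pairs:
        merged[key] += value
    return sorted(merged.items())

raw = map_words("the cat and the dog and the bird")
combined = combine(raw)
print(len(raw), len(combined), combined)
```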
Refinements
Skipping Bad Records, Counters
▪ Skipping Bad Records
• Sometimes it is acceptable to ignore a few records, for example when doing statistical
analysis on a large data set.
• We provide an optional mode of execution where the MapReduce library detects which
records cause deterministic crashes and skips these records in order to make forward
progress.
• When the master has seen more than one failure on a particular record, it indicates that
the record should be skipped when it issues the next re-execution of the
corresponding Map or Reduce task.
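The skip rule can be sketched as a retry loop that counts failures per record. This is a simplified single-process model: `attempt`, `run_with_skipping`, and `flaky_map` are hypothetical, a Python exception stands in for a crash, and the `failures` counter plays the role of the master's view.

```python
from collections import Counter

def flaky_map(record):
    if record == "corrupt":
        raise ValueError("cannot parse record")   # deterministic crash
    return record.upper()

def attempt(records, map_fn, skip, failures):
    """One execution of the task; returns None if it crashed on a record."""
    output = []
    for i, record in enumerate(records):
        if i in skip:
            continue
        try:
            output.append(map_fn(record))
        except Exception:
            failures[i] += 1      # the failing record is reported to the master
            return None
    return output

def run_with_skipping(records, map_fn, max_attempts=10):
    failures = Counter()
    for _ in range(max_attempts):
        # Records with more than one recorded failure are skipped on re-execution.
        skip = {i for i, n in failures.items() if n > 1}
        output = attempt(records, map_fn, skip, failures)
        if output is not None:
            return output, sorted(skip)
    raise RuntimeError("task did not complete")

output, skipped = run_with_skipping(["a", "corrupt", "b"], flaky_map)
print(output, skipped)
```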
▪ Counters
• The MapReduce library provides a counter facility for counting occurrences of various events.
• To use this facility, user code creates a named counter object and then increments the
counter appropriately in the Map and/or Reduce function.
• The current counter values are also displayed on the master status page so that a human can
watch the progress of the live computation.
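A named counter used from a Map function might look like the following. The `Counters` class and its `increment` and `snapshot` methods are hypothetical and written only for this sketch; they are not the real library's interface.

```python
from collections import Counter

class Counters:
    """Hypothetical named-counter facility (illustration only)."""
    def __init__(self):
        self._values = Counter()

    def increment(self, name, amount=1):
        self._values[name] += amount

    def snapshot(self):
        # What a status page could display while the job is running.
        return dict(self._values)

counters = Counters()

def map_fn(doc_name, text):
    for word in text.split():
        if word.isupper():
            counters.increment("uppercase")   # count an event of interest
        yield word, 1

pairs = list(map_fn("a.txt", "MapReduce runs ON LARGE clusters"))
status = counters.snapshot()
print(status)
```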
Editor's notes

  1. This is what led to distributed computing, in which several computers split a single job and process it together. MapReduce was created to make distributed computing easier and simpler.
  2. Straggler causes: CPU, memory, local disk, network bandwidth, etc.