Wei’s Notes on Map-Reduce Job Scheduling Feb 2011
[Map-Reduce] Workflow
1. The master splits a job into small chunks (SIMD-style model) and assigns them to slaves with available mapper slots, taking data locality into account.
2. Each mapper reads its input data and passes it through the user-defined map function.
3. The mapper writes its intermediate results to local disk and reports their locations to the master.
4. The master records the status, picks slaves with available reducer slots, and pushes the location information to them for the reduce phase (*locality? Yes!).
5. Each reducer copies the data from the mappers via RPC, waits for all mappers to finish, sorts the data by intermediate key, and finally passes it through the user-defined reduce function.
6. The reducer writes the final output to the DFS and reports to the master.
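The six steps above can be walked through in a single-process Python simulation. This is a toy model under stated assumptions: workers are plain names, each mapper's "local disk" and the DFS are in-memory dicts, the RPC copy is a dict lookup, and intermediate data is partitioned with hash(k2) mod R (the default partitioning function described in the MapReduce paper). The names `run_job`, `replicas`, `wc_map`, and `wc_reduce` are hypothetical.

```python
from collections import defaultdict

def run_job(splits, replicas, workers, map_fn, reduce_fn, num_reducers):
    """Toy, single-process walk through the MapReduce workflow.

    splits:   {split_id: [(k1, v1), ...]}  input chunks produced by the master
    replicas: {split_id: {worker, ...}}    workers that store a copy of each chunk
    workers:  [worker, ...]                machines with free map slots
    """
    # Step 1: assign each map task, preferring a worker that holds the chunk
    # locally and breaking ties by current load.
    load = {w: 0 for w in workers}
    assignment = {}
    for split_id in splits:
        local = [w for w in workers if w in replicas.get(split_id, set())]
        worker = min(local or workers, key=lambda w: load[w])
        assignment[split_id] = worker
        load[worker] += 1

    # Steps 2-3: run map_fn, partition output by hash(k2) % R on the mapper's
    # "local disk", and report the partition locations to the master.
    local_disk = {}                 # (worker, split_id) -> {partition: [(k2, v2)]}
    locations = defaultdict(list)   # partition -> [(worker, split_id), ...]
    for split_id, records in splits.items():
        worker = assignment[split_id]
        buckets = defaultdict(list)
        for k1, v1 in records:
            for k2, v2 in map_fn(k1, v1):
                buckets[hash(k2) % num_reducers].append((k2, v2))
        local_disk[(worker, split_id)] = buckets
        for partition in buckets:
            locations[partition].append((worker, split_id))

    # Steps 4-6: all mappers are done; each reducer copies its partition from
    # every mapper (a dict lookup stands in for the RPC), sorts by k2, groups
    # the values, applies reduce_fn, and "writes" the result to the DFS dict.
    dfs = {}
    for partition in range(num_reducers):
        pairs = []
        for location in locations[partition]:
            pairs.extend(local_disk[location][partition])
        pairs.sort(key=lambda kv: kv[0])
        grouped = defaultdict(list)
        for k2, v2 in pairs:
            grouped[k2].append(v2)
        dfs[partition] = {k2: reduce_fn(k2, vs) for k2, vs in grouped.items()}
    return assignment, dfs

def wc_map(doc_id, text):
    return [(word, 1) for word in text.split()]

def wc_reduce(word, counts):
    return [sum(counts)]

splits = {"s0": [("d0", "a b a")], "s1": [("d1", "b c")]}
replicas = {"s0": {"w1"}, "s1": {"w0", "w1"}}
assignment, dfs = run_job(splits, replicas, ["w0", "w1"], wc_map, wc_reduce, num_reducers=2)
# assignment == {"s0": "w1", "s1": "w0"}: s0 is only stored on w1; s1 is local
# to both workers, so the less-loaded w0 gets it.
```

Running everything sequentially gives the "reducers wait for all mappers" barrier for free; a real system overlaps the copy with map execution.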
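[Map-Reduce] Data flow
Raw input -> Map(k1, v1) -> list(k2, v2)
Reduce(k2, list(v2)) -> list(v2)
*Why not v3? (In the original MapReduce paper, the intermediate keys and values are drawn from the same domain as the output keys and values.)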
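The type signatures above can be written down directly with Python type variables. This is a minimal sketch that assumes an in-memory grouping step in place of the distributed shuffle; `map_reduce`, `count_words`, and `sum_counts` are illustrative names. Note that the reduce function returns values of the intermediate type V2, exactly as on the slide.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

def map_reduce(
    records: Iterable[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], List[Tuple[K2, V2]]],   # Map(k1, v1) -> list(k2, v2)
    reduce_fn: Callable[[K2, List[V2]], List[V2]],     # Reduce(k2, list(v2)) -> list(v2)
) -> Dict[K2, List[V2]]:
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)      # group all v2 values by their k2
    return {k2: reduce_fn(k2, vs) for k2, vs in groups.items()}

# Word count: k1 = document name, v1 = contents, k2 = word, v2 = count.
def count_words(doc: str, text: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in text.split()]

def sum_counts(word: str, counts: List[int]) -> List[int]:
    return [sum(counts)]

result = map_reduce([("doc1", "the cat the"), ("doc2", "cat")], count_words, sum_counts)
# result == {"the": [2], "cat": [2]}
```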
[Map-Reduce] Fault Tolerance Upon machine failure:
[Map-Reduce] To-Dos
- Splitting:
  - When: upon arrival, or when the job reaches the head of the queue?
  - How is the number of map tasks M determined? (Based on chunk size; the splits "can be processed in parallel by different machines".)
- Cost of re-execution: map and reduce tasks.
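One way to answer the sizing question, assuming one map task per fixed-size chunk and a 64 MB chunk (the GFS default chunk size, and the upper end of the 16 MB to 64 MB per-task range suggested in the MapReduce paper). `make_splits` is a hypothetical helper; M is simply the number of splits it returns, i.e. M = ceil(input_bytes / chunk_bytes).

```python
CHUNK_BYTES = 64 * 2**20  # assumed chunk size: 64 MB

def make_splits(input_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> list:
    """Return (offset, length) pairs, one per map task; the last may be short."""
    return [
        (offset, min(chunk_bytes, input_bytes - offset))
        for offset in range(0, input_bytes, chunk_bytes)
    ]

M = len(make_splits(10 * 2**30))   # 10 GB of input -> 160 map tasks
```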
[Fair Scheduler] 3-phase allocation
1. Fully satisfy every pool whose min share is at least its demand.
2. Allocate resources to each of the other pools up to its min share.
3. Give the residual to the pools that are still unfilled, starting with the least-fulfilled pool.
Notes
- Resource allocation is pool-based instead of job-based.
- Each pool's min share is user-specified.
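A minimal sketch of the three phases, assuming integer slot counts, min shares that fit within the cluster, and that "least fulfilled" means the pool currently holding the fewest slots (which makes phase 3 a max-min fair split of the residual). `allocate` and the pool names are hypothetical.

```python
from typing import Dict, Tuple

def allocate(capacity: int, pools: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """pools: name -> (min_share, demand). Returns slots granted to each pool."""
    alloc = {}
    # Phases 1 and 2 touch disjoint pools, so one pass suffices:
    # a pool whose min share covers its demand gets its full demand,
    # every other pool gets exactly its min share.
    for name, (min_share, demand) in pools.items():
        alloc[name] = demand if min_share >= demand else min_share
    residual = capacity - sum(alloc.values())
    if residual < 0:
        raise ValueError("min shares exceed capacity (not handled in this sketch)")
    # Phase 3: hand out the residual one slot at a time to the unfilled pool
    # holding the fewest slots (ties go to the pool listed first).
    while residual > 0:
        unfilled = [n for n, (_, demand) in pools.items() if alloc[n] < demand]
        if not unfilled:
            break
        neediest = min(unfilled, key=lambda n: alloc[n])
        alloc[neediest] += 1
        residual -= 1
    return alloc

# A needs less than its min share; B and C split the 6 leftover slots.
print(allocate(10, {"A": (2, 1), "B": (3, 8), "C": (0, 9)}))
```

Here A is capped at its demand of 1, B and C start at 3 and 0, and the residual 6 slots bring them to 5 and 4.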
[Fair Scheduler] Reschedule
Policy: wait & kill
Algorithm:
1. Wait Tmin. If the min share is still not achieved, kill tasks of other pools.
2. Wait Tfair. If the fair share is still not achieved, kill more.
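A minimal sketch of the wait & kill policy, assuming each pool records when it first fell below its min share and below its fair share. `PoolState`, `update_timers`, and `tasks_to_preempt` are hypothetical names; choosing which tasks to kill is left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PoolState:
    running: int                               # slots the pool currently holds
    min_share: int
    fair_share: int
    below_min_since: Optional[float] = None    # when it last fell below min share
    below_fair_since: Optional[float] = None   # when it last fell below fair share

def update_timers(pool: PoolState, now: float) -> None:
    """Start or clear the starvation timers based on the current allocation."""
    if pool.running >= pool.min_share:
        pool.below_min_since = None
    elif pool.below_min_since is None:
        pool.below_min_since = now
    if pool.running >= pool.fair_share:
        pool.below_fair_since = None
    elif pool.below_fair_since is None:
        pool.below_fair_since = now

def tasks_to_preempt(pool: PoolState, now: float, t_min: float, t_fair: float) -> int:
    """How many tasks to kill in other pools on behalf of `pool`."""
    deficit = 0
    # Step 1: below min share for at least Tmin -> reclaim up to the min share.
    if pool.below_min_since is not None and now - pool.below_min_since >= t_min:
        deficit = max(deficit, pool.min_share - pool.running)
    # Step 2: below fair share for at least Tfair -> reclaim up to the fair share.
    if pool.below_fair_since is not None and now - pool.below_fair_since >= t_fair:
        deficit = max(deficit, pool.fair_share - pool.running)
    return deficit

pool = PoolState(running=1, min_share=3, fair_share=5)
update_timers(pool, now=0.0)
```

With Tmin = 5 and Tfair = 20, this pool gets nothing at t = 3, reclaims 2 tasks (up to its min share) at t = 10, and 4 tasks (up to its fair share) at t = 25.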
[Fair Scheduler] Issues & Solutions
- Data locality
  - Delay scheduling: addresses the sticky-slots issue
  - IO-rate biasing: addresses hotspot nodes
- Map/reduce interdependency
  - Copy-compute splitting: overlaps the IO-intensive copy with the CPU-intensive reduce
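Delay scheduling can be sketched as follows, in the simple form described by Zaharia et al.: when a slot frees up on a node, jobs are considered in fair-share order (approximated here by fewest running tasks); a job with no local task on that node is skipped, and once it has been skipped `max_skips` times it may launch a non-local task. `Job`, `assign_task`, and `max_skips` are hypothetical names; IO-rate biasing and copy-compute splitting are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    running: int        # running tasks; fewer means further below its share
    pending: dict       # task_id -> set of nodes holding that task's input
    skipped: int = 0    # scheduling opportunities skipped while waiting for locality

def assign_task(jobs, node, max_skips):
    """Called when a slot frees on `node`; returns (job, task, is_local) or None."""
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            task, is_local = local[0], True
            job.skipped = 0                  # locality achieved: reset the wait
        elif job.skipped >= max_skips:
            task, is_local = next(iter(job.pending)), False   # waited long enough
        else:
            job.skipped += 1                 # skip this job, let others use the slot
            continue
        del job.pending[task]
        job.running += 1
        return job.name, task, is_local
    return None

jobs = [Job("A", 0, {"t1": {"n2"}}), Job("B", 1, {"t2": {"n1"}})]
first = assign_task(jobs, "n1", max_skips=1)    # A is skipped; B runs t2 locally
second = assign_task(jobs, "n1", max_skips=1)   # A has waited once; runs t1 remotely
```

The skip counter is what breaks sticky slots: instead of handing a freed slot straight back to the same most-starved job, the scheduler lets that job wait briefly for a node that holds its data.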
[Fair Scheduler] Tradeoffs
- Batch response time: fairness vs. utilization (throughput)
- Average response time
- Space usage with intermediate data
- User isolation: “ability to provide worst-case performance comparable to owning a small private cluster regardless of user workload”
[Fair Scheduler] To-Dos (done)
- Rescheduling/reassignment: every UPDATE_INTERVAL, the FairScheduler checks all pools for tasks to preempt, sets the status of those tasks, and places them in an action queue. The next heartbeat picks up the changed task status and carries out the kills.
- Relationship between batch response time and throughput: they measure the same thing.
- Relationship between average response time (ART) and user isolation: the two can be correlated, but not always. ART is not a quantitative measure of user isolation.
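A toy model of the rescheduling mechanism in the first bullet. Only UPDATE_INTERVAL comes from the slide; the class, method names, task dicts, and status strings are hypothetical and are not taken from the Hadoop FairScheduler source. Deciding which tasks to preempt is delegated to a caller-supplied function.

```python
from collections import deque

class PreemptionLoop:
    """Toy model: a periodic update marks tasks, the next heartbeat kills them."""

    def __init__(self, update_interval):
        self.update_interval = update_interval   # plays the role of UPDATE_INTERVAL
        self.last_update = float("-inf")
        self.action_queue = deque()              # tasks marked for preemption

    def maybe_update(self, now, find_tasks_to_preempt):
        """Run the preemption check at most once per update_interval."""
        if now - self.last_update < self.update_interval:
            return
        self.last_update = now
        for task in find_tasks_to_preempt():
            if task["status"] != "TO_KILL":      # avoid queueing a task twice
                task["status"] = "TO_KILL"
                self.action_queue.append(task)

    def on_heartbeat(self, tracker):
        """Return ids of queued tasks on `tracker`; they are killed in the reply."""
        to_kill = [t for t in self.action_queue if t["tracker"] == tracker]
        self.action_queue = deque(t for t in self.action_queue if t["tracker"] != tracker)
        for task in to_kill:
            task["status"] = "KILLED"
        return [task["id"] for task in to_kill]

tasks = [{"id": "a", "tracker": "tt1", "status": "RUNNING"},
         {"id": "b", "tracker": "tt2", "status": "RUNNING"}]
loop = PreemptionLoop(update_interval=0.5)
loop.maybe_update(1.0, lambda: tasks)   # marks both tasks and queues them
killed_on_tt1 = loop.on_heartbeat("tt1")
```

Splitting the decision (update pass) from the action (heartbeat) means kills only happen when the tracker running the task next checks in, so task "b" stays marked until "tt2" sends a heartbeat.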
[Quincy] Model the problem as a flow network
- Flow network: a directed graph in which each edge e is annotated with a non-negative integer capacity and a cost, and each node v is annotated with an integer “supply”, where the total supply of the graph equals zero.
- Goal: construct the simplest graph whose only hard constraint is no starvation.
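To make the flow-network model concrete, here is a small min-cost flow solver plus a toy Quincy-style graph. Assumptions: the solver is a generic successive-shortest-path algorithm (Bellman-Ford on the residual graph), not the solver Quincy itself uses; the graph omits Quincy's rack and cluster aggregator nodes; the costs are made-up numbers standing in for data-transfer costs; `min_cost_flow` and all node names are hypothetical. Routing a task through U means leaving it unscheduled at a cost; in Quincy that cost grows with the time a task has waited, which is not modeled here.

```python
import math

def min_cost_flow(supply, edges):
    """Min-cost flow by successive shortest paths.

    supply: {node: int}, positive = source, negative = sink; must sum to zero.
    edges:  [(u, v, capacity, cost), ...] with non-negative integer capacities.
    Returns (total_cost, {(u, v): flow}).
    """
    if sum(supply.values()) != 0:
        raise ValueError("total supply must equal zero")
    names = sorted(set(supply) | {e[0] for e in edges} | {e[1] for e in edges})
    index = {name: i for i, name in enumerate(names)}
    src, dst = len(names), len(names) + 1        # super source / super sink
    graph = [[] for _ in range(len(names) + 2)]  # edge = [to, residual cap, cost, reverse idx]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])
        return graph[u][-1]                      # handle to the forward edge

    handles = [(u, v, cap, add_edge(index[u], index[v], cap, cost)) for u, v, cap, cost in edges]
    required = 0
    for name, s in supply.items():
        if s > 0:
            add_edge(src, index[name], s, 0)
            required += s
        elif s < 0:
            add_edge(index[name], dst, -s, 0)

    flow = total_cost = 0
    while True:
        # Bellman-Ford, because residual (reverse) edges carry negative costs.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0
        for _ in range(len(graph) - 1):
            changed = False
            for u, out in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, cost, _) in enumerate(out):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v], prev[v], changed = dist[u] + cost, (u, i), True
            if not changed:
                break
        if dist[dst] == math.inf:
            break                                # no augmenting path left
        push, v = math.inf, dst
        while v != src:                          # bottleneck capacity on the path
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = dst
        while v != src:                          # augment along the path
            u, i = prev[v]
            edge = graph[u][i]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[dst]
    if flow != required:
        raise ValueError("supplies cannot all be routed")
    return total_cost, {(u, v): cap - e[1] for u, v, cap, e in handles}

# Two tasks, two computers with one slot each, an "unscheduled" node U,
# and a sink that absorbs all supply (total supply 1 + 1 - 2 = 0).
supply = {"t0": 1, "t1": 1, "sink": -2}
edges = [
    ("t0", "c0", 1, 1), ("t0", "c1", 1, 3), ("t0", "U", 1, 10),
    ("t1", "c0", 1, 1), ("t1", "c1", 1, 4), ("t1", "U", 1, 10),
    ("c0", "sink", 1, 0), ("c1", "sink", 1, 0), ("U", "sink", 2, 0),
]
cost, flows = min_cost_flow(supply, edges)
placement = {u: v for (u, v), f in flows.items() if f > 0 and u in ("t0", "t1")}
# cost == 4, placement == {"t0": "c1", "t1": "c0"}
```

A greedy scheduler that gives t0 its cheapest computer c0 first would then have to put t1 on c1, for a total cost of 5; solving the flow problem globally finds the cheaper placement of cost 4.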
Quincy vs. Fair Scheduler
Readings
- MapReduce. Jeffrey Dean*
- Google: Cluster Computing and MR
- Job Scheduling for Multi-User. Matei Zaharia*
- Max-min fairness. Wikipedia + algo*
- Quincy. Michael Isard*
- An update on Google’s infrastructure
Topic
Before: existing systems use a predetermined, fixed allocation of resources (slots) to queries/tasks. Intuitively, if resources could be allocated to tasks dynamically, they would be better utilized.
After: enable the scheduler to make resource-aware decisions (IO, CPU, memory), and bring fair scheduling from the pool level down to the job level.
Tips from Prof Tan
Keep references for all the literature reviewed, and note where each work was published.
