SlideShare una empresa de Scribd logo
1 de 35
Descargar para leer sin conexión
Scheduling
Hadoopas batch processing system 
•Hadoopwas designed mainly for running large batch jobs such as web indexing and log mining. 
•Users submitted jobs to a queue, and the cluster ran them in order. 
•Soon, another use case became attractive: 
–Sharing a MapReducecluster between multiple users.
The benefits of sharing 
•With all the data in one place, users can run queries that they may never have been able to execute otherwise, and 
•Costs go down because system utilization is higher than building a separate Hadoopcluster for each group. 
•However, sharing requires support from the Hadoopjob scheduler to 
–provide guaranteed capacity to production jobs and 
–good response time to interactive jobs while allocating resources fairly between users.
Approaches to Sharing 
•FIFO : In FIFO scheduling, a JobTrackerpulls jobs from a work queue, oldest job first. This schedule had no concept of the priority or size of the job 
•Fair : Assign resources to jobs such that on average over time, each job gets an equal share of the available resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require more time to execute. This behavior allows for some interactivity among Hadoopjobs and permits greater responsiveness of the Hadoopcluster to the variety of job types submitted. 
•Capacity : In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (where the overall capacity of the cluster is the sum of each queue's capacity). Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
FIFO Scheduling 
Job Queue
FIFO Scheduling 
Job Queue
FIFO Scheduling 
Job Queue
Hadoop default scheduler (FIFO) 
–Problem:short jobs get stuck behind long ones 
•Separate clusters 
–Problem 1:poor utilization 
–Problem 2:costly data replication 
•Full replication across clusters nearly infeasible at Facebook/Yahoo! scale 
•Partial replication prevents cross-dataset queries
Fair Scheduling 
Job Queue
Fair Scheduling 
Job Queue
Fair Scheduler Basics 
•Group jobs into “pools” 
•Assign each pool a guaranteed minimum share 
•Divide excess capacity evenly between pools
Pools 
•Determined from a configurable job property 
–Default in 0.20: user.name (one pool per user) 
•Pools have properties: 
–Minimum map slots 
–Minimum reduce slots 
–Limit on # of running jobs
Example Pool Allocations 
entire cluster100 slots 
emp1 
emp2 
finance 
min share = 40 
emp6 
min share = 30 
job 2 
15 slots 
job 3 
15 slots 
job 1 
30 slots 
job 4 
40 slots
Scheduling Algorithm 
•Split each pool’s min share among its jobs 
•Split each pool’s total share among its jobs 
•When a slot needs to be assigned: 
–If there is any job below its min share, schedule it 
–Else schedule the job that we’ve been most unfair to (based on “deficit”)
Scheduler Dashboard
Scheduler Dashboard 
Change priority 
Change pool 
FIFO mode (for testing)
Additional Features 
•Weights for unequal sharing: 
–Job weights based on priority (each level = 2x) 
–Job weights based on size 
–Pool weights 
•Limits for # of running jobs: 
–Per user 
–Per pool
Installing the Fair Scheduler 
•Build it: 
–ant package 
•Place it on the classpath: 
–cp build/contrib/fairscheduler/*.jar lib
Configuration Files 
•Hadoop config(conf/mapred-site.xml) 
–Contains scheduler options, pointer to pools file 
•Pools file (pools.xml) 
–Contains min share allocations and limits on pools 
–Reloaded every 15 seconds at runtime
Minimal hadoop-site.xml 
<property> 
<name>mapred.jobtracker.taskScheduler</name> 
<value>org.apache.hadoop.mapred.FairScheduler</ value> 
</property> 
<property> 
<name>mapred.fairscheduler.allocation.file</name> 
<value>/path/to/pools.xml</value> 
</property>
Minimal pools.xml 
<?xml version="1.0"?> 
<allocations> 
</allocations>
Configuring a Pool 
<?xml version="1.0"?> 
<allocations> 
<pool name=“emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
</pool> 
</allocations>
Setting Running Job Limits 
<?xml version="1.0"?> 
<allocations> 
<pool name=“emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
<maxRunningJobs>3</maxRunningJobs> 
</pool> 
<user name=“emp1"> 
<maxRunningJobs>1</maxRunningJobs> 
</user> 
</allocations>
Default Per-User Running Job Limit 
<?xml version="1.0"?> 
<allocations> 
<pool name=“emp4"> 
<minMaps>10</minMaps> 
<minReduces>5</minReduces> 
<maxRunningJobs>3</maxRunningJobs> 
</pool> 
<user name=“emp1"> 
<maxRunningJobs>1</maxRunningJobs> 
</user> 
<userMaxJobsDefault>10</userMaxJobsDefault> 
</allocations>
Other Parameters 
mapred.fairscheduler.assignmultiple: 
•Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
Other Parameters 
mapred.fairscheduler.poolnameproperty: 
•Which JobConfproperty sets what pool a job is in 
-Default: user.name (one pool per user) 
-Can make up your own, e.g. “pool.name”, and pass in JobConfwith conf.set(“pool.name”, “mypool”)
Useful Setting 
<property> 
<name>mapred.fairscheduler.poolnameproperty</name> 
<value>pool.name</value> 
</property> 
<property> 
<name>pool.name</name> 
<value>${user.name}</value> 
</property> 
Make pool.name default to user.name
Issues with Fair Scheduler 
–Fine-grained sharing at level of map & reduce tasks 
–Predictable response times and user isolation 
•Problem:data locality 
–For efficiency, must run tasks near their input data 
–Strictly following any job queuing policy hurts locality: job picked by policy may not have data on free nodes 
•Solution:delay scheduling 
–Relax queuing policy for limited time to achieve locality
The Problem 
Job 2 
Master 
Job 1 
Scheduling order 
Slave 
Slave 
Slave 
Slave 
Slave 
Slave 
4 
2 
1 
1 
2 
2 
3 
3 
9 
5 
3 
3 
6 
7 
5 
6 
9 
4 
8 
7 
8 
2 
1 
1 
Task 2 
Task 5 
Task 3 
Task 1 
Task 7 
Task 4 
File 1: 
File 2:
The Problem 
Job 1 
Master 
Job 2 
Scheduling order 
Slave 
Slave 
Slave 
Slave 
Slave 
Slave 
4 
2 
1 
2 
2 
3 
9 
5 
3 
3 
6 
7 
5 
6 
9 
4 
8 
7 
8 
2 
1 
1 
Task 2 
Task 5 
3 
Task 1 
File 1: 
File 2: 
Task 1 
7 
Task 2 
Task 4 
3 
Problem: Fair decision hurts locality 
Especially bad for jobs with small input files 
1 
3
Solution: Delay Scheduling 
•Relax queuing policy to make jobs wait for a limited time if they cannot launch local tasks 
•Result: Very short wait time (1-5s) is enough to get nearly 100% locality
Delay Scheduling Example 
Job 1 
Master 
Job 2 
Scheduling order 
Slave 
Slave 
Slave 
Slave 
Slave 
Slave 
4 
2 
1 
1 
2 
2 
3 
3 
9 
5 
3 
3 
6 
7 
5 
6 
9 
4 
8 
7 
8 
2 
1 
1 
Task 2 
3 
File 1: 
File 2: 
Task 8 
7 
Task 2 
Task 4 
6 
Idea: Wait a short time to get data-local scheduling opportunities 
5 
Task 1 
1 
Task 3 
Wait!
Delay Scheduling Details 
•Scan jobs in order given by queuing policy, picking first that is permitted to launch a task 
•Jobs must wait before being permitted to launch non-local tasks 
–If wait < T1, only allow node-local tasks 
–If T1< wait < T2, also allow rack-local 
–If wait > T2, also allow off-rack 
•Increase a job’s time waited when it is skipped
Capacity Scheduler 
•Organizes jobs into queues 
•Queue shares as %’s of cluster 
•FIFO scheduling within each queue 
•Supports preemption 
•Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
End of session 
Day –1: Scheduling

Más contenido relacionado

La actualidad más candente

Cloud Computing: Hadoop
Cloud Computing: HadoopCloud Computing: Hadoop
Cloud Computing: Hadoopdarugar
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsLeila panahi
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentationateeq ateeq
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitEric Wendelin
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Лекция 12. Spark
Лекция 12. SparkЛекция 12. Spark
Лекция 12. SparkTechnopark
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Cloudera, Inc.
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 
Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Rajesh Ananda Kumar
 

La actualidad más candente (20)

Cloud Computing: Hadoop
Cloud Computing: HadoopCloud Computing: Hadoop
Cloud Computing: Hadoop
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
MapReduce Scheduling Algorithms
MapReduce Scheduling AlgorithmsMapReduce Scheduling Algorithms
MapReduce Scheduling Algorithms
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
Testing Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnitTesting Hadoop jobs with MRUnit
Testing Hadoop jobs with MRUnit
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Лекция 12. Spark
Лекция 12. SparkЛекция 12. Spark
Лекция 12. Spark
 
Introduction to Pig
Introduction to PigIntroduction to Pig
Introduction to Pig
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
Spark
SparkSpark
Spark
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

Similar a Scheduling Hadoop for Batch and Interactive Workloads

Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systemsknowdiff
 
Hadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHanborq Inc.
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of Viewaragozin
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Data Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTData Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTAmmarHassan80
 
Survey on Job Schedulers in Hadoop Cluster
Survey on Job Schedulers in Hadoop ClusterSurvey on Job Schedulers in Hadoop Cluster
Survey on Job Schedulers in Hadoop ClusterIOSR Journals
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingTeddy Choi
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Lu Wei
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCOlga Lavrentieva
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerancePallav Jha
 

Similar a Scheduling Hadoop for Batch and Interactive Workloads (20)

Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Hadoop
HadoopHadoop
Hadoop
 
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time SystemsSara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
Sara Afshar: Scheduling and Resource Sharing in Multiprocessor Real-Time Systems
 
BIG DATA Session 7 8
BIG DATA Session 7 8BIG DATA Session 7 8
BIG DATA Session 7 8
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Hadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep InsightHadoop MapReduce Introduction and Deep Insight
Hadoop MapReduce Introduction and Deep Insight
 
tdtechtalk20160330johan
tdtechtalk20160330johantdtechtalk20160330johan
tdtechtalk20160330johan
 
Scalable Hadoop in the cloud
Scalable Hadoop in the cloudScalable Hadoop in the cloud
Scalable Hadoop in the cloud
 
Hadoop availability
Hadoop availabilityHadoop availability
Hadoop availability
 
High Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of ViewHigh Performance Computing - Cloud Point of View
High Performance Computing - Cloud Point of View
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Data Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoTData Analytics and IoT, how to analyze data from IoT
Data Analytics and IoT, how to analyze data from IoT
 
Survey on Job Schedulers in Hadoop Cluster
Survey on Job Schedulers in Hadoop ClusterSurvey on Job Schedulers in Hadoop Cluster
Survey on Job Schedulers in Hadoop Cluster
 
Apache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query ProcessingApache Tez : Accelerating Hadoop Query Processing
Apache Tez : Accelerating Hadoop Query Processing
 
Hadoop
HadoopHadoop
Hadoop
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]Resource Aware Scheduling for Hadoop [Final Presentation]
Resource Aware Scheduling for Hadoop [Final Presentation]
 
MapReduce.pptx
MapReduce.pptxMapReduce.pptx
MapReduce.pptx
 
Взгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPCВзгляд на облака с точки зрения HPC
Взгляд на облака с точки зрения HPC
 
Hadoop fault tolerance
Hadoop  fault toleranceHadoop  fault tolerance
Hadoop fault tolerance
 

Más de Subhas Kumar Ghosh

07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descentSubhas Kumar Ghosh
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clusteringSubhas Kumar Ghosh
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hiveSubhas Kumar Ghosh
 
05 pig user defined functions (udfs)
05 pig user defined functions (udfs)05 pig user defined functions (udfs)
05 pig user defined functions (udfs)Subhas Kumar Ghosh
 
02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysisSubhas Kumar Ghosh
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tipsSubhas Kumar Ghosh
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorSubhas Kumar Ghosh
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepSubhas Kumar Ghosh
 
Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operationSubhas Kumar Ghosh
 

Más de Subhas Kumar Ghosh (20)

07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent07 logistic regression and stochastic gradient descent
07 logistic regression and stochastic gradient descent
 
06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
03 hive query language (hql)
03 hive query language (hql)03 hive query language (hql)
03 hive query language (hql)
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hive
 
01 hbase
01 hbase01 hbase
01 hbase
 
06 pig etl features
06 pig etl features06 pig etl features
06 pig etl features
 
05 pig user defined functions (udfs)
05 pig user defined functions (udfs)05 pig user defined functions (udfs)
05 pig user defined functions (udfs)
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
03 pig intro
03 pig intro03 pig intro
03 pig intro
 
02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis02 naive bays classifier and sentiment analysis
02 naive bays classifier and sentiment analysis
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Hadoop Day 3
Hadoop Day 3Hadoop Day 3
Hadoop Day 3
 
Hadoop exercise
Hadoop exerciseHadoop exercise
Hadoop exercise
 
Hadoop map reduce v2
Hadoop map reduce v2Hadoop map reduce v2
Hadoop map reduce v2
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Hadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparatorHadoop secondary sort and a custom comparator
Hadoop secondary sort and a custom comparator
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Hadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by stepHadoop deconstructing map reduce job step by step
Hadoop deconstructing map reduce job step by step
 
Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operation
 

Último

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

Scheduling Hadoop for Batch and Interactive Workloads

  • 2. Hadoopas batch processing system •Hadoopwas designed mainly for running large batch jobs such as web indexing and log mining. •Users submitted jobs to a queue, and the cluster ran them in order. •Soon, another use case became attractive: –Sharing a MapReducecluster between multiple users.
  • 3. The benefits of sharing •With all the data in one place, users can run queries that they may never have been able to execute otherwise, and •Costs go down because system utilization is higher than building a separate Hadoopcluster for each group. •However, sharing requires support from the Hadoopjob scheduler to –provide guaranteed capacity to production jobs and –good response time to interactive jobs while allocating resources fairly between users.
  • 4. Approaches to Sharing •FIFO : In FIFO scheduling, a JobTrackerpulls jobs from a work queue, oldest job first. This schedule had no concept of the priority or size of the job •Fair : Assign resources to jobs such that on average over time, each job gets an equal share of the available resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require more time to execute. This behavior allows for some interactivity among Hadoopjobs and permits greater responsiveness of the Hadoopcluster to the variety of job types submitted. •Capacity : In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a guaranteed capacity (where the overall capacity of the cluster is the sum of each queue's capacity). Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
  • 8. Hadoop default scheduler (FIFO) –Problem:short jobs get stuck behind long ones •Separate clusters –Problem 1:poor utilization –Problem 2:costly data replication •Full replication across clusters nearly infeasible at Facebook/Yahoo! scale •Partial replication prevents cross-dataset queries
  • 11. Fair Scheduler Basics •Group jobs into “pools” •Assign each pool a guaranteed minimum share •Divide excess capacity evenly between pools
  • 12. Pools •Determined from a configurable job property –Default in 0.20: user.name (one pool per user) •Pools have properties: –Minimum map slots –Minimum reduce slots –Limit on # of running jobs
  • 13. Example Pool Allocations entire cluster100 slots emp1 emp2 finance min share = 40 emp6 min share = 30 job 2 15 slots job 3 15 slots job 1 30 slots job 4 40 slots
  • 14. Scheduling Algorithm •Split each pool’s min share among its jobs •Split each pool’s total share among its jobs •When a slot needs to be assigned: –If there is any job below its min share, schedule it –Else schedule the job that we’ve been most unfair to (based on “deficit”)
  • 16. Scheduler Dashboard Change priority Change pool FIFO mode (for testing)
  • 17. Additional Features •Weights for unequal sharing: –Job weights based on priority (each level = 2x) –Job weights based on size –Pool weights •Limits for # of running jobs: –Per user –Per pool
  • 18. Installing the Fair Scheduler •Build it: –ant package •Place it on the classpath: –cp build/contrib/fairscheduler/*.jar lib
  • 19. Configuration Files •Hadoop config(conf/mapred-site.xml) –Contains scheduler options, pointer to pools file •Pools file (pools.xml) –Contains min share allocations and limits on pools –Reloaded every 15 seconds at runtime
  • 20. Minimal hadoop-site.xml <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</ value> </property> <property> <name>mapred.fairscheduler.allocation.file</name> <value>/path/to/pools.xml</value> </property>
  • 21. Minimal pools.xml <?xml version="1.0"?> <allocations> </allocations>
  • 22. Configuring a Pool <?xml version="1.0"?> <allocations> <pool name=“emp4"> <minMaps>10</minMaps> <minReduces>5</minReduces> </pool> </allocations>
  • 23. Setting Running Job Limits <?xml version="1.0"?> <allocations> <pool name=“emp4"> <minMaps>10</minMaps> <minReduces>5</minReduces> <maxRunningJobs>3</maxRunningJobs> </pool> <user name=“emp1"> <maxRunningJobs>1</maxRunningJobs> </user> </allocations>
  • 24. Default Per-User Running Job Limit <?xml version="1.0"?> <allocations> <pool name=“emp4"> <minMaps>10</minMaps> <minReduces>5</minReduces> <maxRunningJobs>3</maxRunningJobs> </pool> <user name=“emp1"> <maxRunningJobs>1</maxRunningJobs> </user> <userMaxJobsDefault>10</userMaxJobsDefault> </allocations>
  • 25. Other Parameters mapred.fairscheduler.assignmultiple: •Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput; recommendation: set to true
  • 26. Other Parameters mapred.fairscheduler.poolnameproperty: •Which JobConfproperty sets what pool a job is in -Default: user.name (one pool per user) -Can make up your own, e.g. “pool.name”, and pass in JobConfwith conf.set(“pool.name”, “mypool”)
  • 27. Useful Setting <property> <name>mapred.fairscheduler.poolnameproperty</name> <value>pool.name</value> </property> <property> <name>pool.name</name> <value>${user.name}</value> </property> Make pool.name default to user.name
  • 28. Issues with Fair Scheduler –Fine-grained sharing at level of map & reduce tasks –Predictable response times and user isolation •Problem:data locality –For efficiency, must run tasks near their input data –Strictly following any job queuing policy hurts locality: job picked by policy may not have data on free nodes •Solution:delay scheduling –Relax queuing policy for limited time to achieve locality
  • 29. The Problem Job 2 Master Job 1 Scheduling order Slave Slave Slave Slave Slave Slave 4 2 1 1 2 2 3 3 9 5 3 3 6 7 5 6 9 4 8 7 8 2 1 1 Task 2 Task 5 Task 3 Task 1 Task 7 Task 4 File 1: File 2:
  • 30. The Problem Job 1 Master Job 2 Scheduling order Slave Slave Slave Slave Slave Slave 4 2 1 2 2 3 9 5 3 3 6 7 5 6 9 4 8 7 8 2 1 1 Task 2 Task 5 3 Task 1 File 1: File 2: Task 1 7 Task 2 Task 4 3 Problem: Fair decision hurts locality Especially bad for jobs with small input files 1 3
  • 31. Solution: Delay Scheduling •Relax queuing policy to make jobs wait for a limited time if they cannot launch local tasks •Result: Very short wait time (1-5s) is enough to get nearly 100% locality
  • 32. Delay Scheduling Example Job 1 Master Job 2 Scheduling order Slave Slave Slave Slave Slave Slave 4 2 1 1 2 2 3 3 9 5 3 3 6 7 5 6 9 4 8 7 8 2 1 1 Task 2 3 File 1: File 2: Task 8 7 Task 2 Task 4 6 Idea: Wait a short time to get data-local scheduling opportunities 5 Task 1 1 Task 3 Wait!
  • 33. Delay Scheduling Details •Scan jobs in order given by queuing policy, picking first that is permitted to launch a task •Jobs must wait before being permitted to launch non-local tasks –If wait < T1, only allow node-local tasks –If T1< wait < T2, also allow rack-local –If wait > T2, also allow off-rack •Increase a job’s time waited when it is skipped
  • 34. Capacity Scheduler •Organizes jobs into queues •Queue shares as %’s of cluster •FIFO scheduling within each queue •Supports preemption •Queues are monitored; if a queue is not consuming its allocated capacity, this excess capacity can be temporarily allocated to other queues.
  • 35. End of session Day –1: Scheduling