2. About Me
• Facebook (c 2007)
– Ran/managed Hadoop for ~3 years
– Wrote Hive
– Mentor/PM for the Hadoop Fair Scheduler
– Used Hadoop/Hive (as a Warehouse/ETL dev)
– Rewrote significant chunks of Hadoop job scheduling (incl. Corona)
• Qubole (c 2014)
– Running the world’s largest Hadoop clusters on AWS
3. The Crime
Shared Hadoop Clusters
• Statistical multiplexing
• Largest jobs only fit on pooled hardware
• Data locality
• Easier to manage
4. … and the Punishment
• “Have you no Hadoop Etiquettes?” (c 2007)
(reducer count capped in response)
• User takes down entire Cluster (OOM) (c 2007-09)
• Bad Job slows down entire Cluster (c 2009)
• Steady State Latencies get intolerable (c 2010-)
• “How do I know I am getting my fair share?” (c 2011)
• “Too few reducer slots, cluster idle” (c 2013)
7. And then there’s Hadoop (1.x) …
• Single JobTracker for all Jobs
– Does not scale; a single point of failure (SPOF)
• Pull-Based Architecture
– Scalability and low latency at permanent war
– Inefficient: leaves slots idle
• Slot-Based Scheduling
– Inefficient
• Pessimistic Locking in the Tracker
– Scalability bottleneck
• Long-Running Tasks
– Fairness and efficiency at permanent war
8. Poll Driven Scheduling
insert overwrite table dest
select … from ads join
campaigns on …group by …;
Map Tasks
Job Tracker
Master
ReduceTasks
Heartbeat
MapTask
TaskTracker
Slave
Child
8
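
A minimal sketch of the pull model above, with hypothetical names (not the real Hadoop 1.x RPC interfaces): a TaskTracker heartbeats on a fixed interval, and the JobTracker can hand out work only inside the heartbeat response, so a slot that frees up just after a beat sits idle until the next one.

class PullingTaskTracker {
    static final long HEARTBEAT_MS = 3000;        // fixed polling interval
    interface JobTrackerStub {
        String[] heartbeat(int freeSlots);        // response carries new tasks
    }
    void run(JobTrackerStub jt) throws InterruptedException {
        while (true) {
            // pull: the only moment the JobTracker can assign us work
            String[] tasks = jt.heartbeat(freeMapSlots() + freeReduceSlots());
            for (String task : tasks) launchInChildJvm(task);
            Thread.sleep(HEARTBEAT_MS);           // freed slots idle until next beat
        }
    }
    int freeMapSlots()    { return 1; }           // stubbed slot accounting
    int freeReduceSlots() { return 1; }
    void launchInChildJvm(String task) { /* fork a Child JVM for the task */ }
}

Multiply one beat-interval of idle time by thousands of trackers and the scalability-vs-latency war of the previous slide falls out directly.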
9. Pessimistic Locking

// The whole scheduling scan runs while holding one global lock:
Task getBestTask() {
    for (Pool pool : sortedPools)
        for (Job job : pool.sortedJobs())
            for (Task task : job.tasks())
                if (betterMatch(task)) …      // track the best candidate so far
}

Task processHeartbeat() {
    synchronized (world) {                    // one cluster-wide lock
        return getBestTask();                 // O(pools × jobs × tasks) inside it
    }
}
10. Slot-Based Scheduling
• N CPUs, M map slots, R reduce slots
– Memory cannot be oversubscribed!
• How to divide?
– M < N: not enough map slots at times
– R < N: not enough reduce slots at times
– M = R = N: enough memory to run 2N tasks?
• Reduce tasks are problematic
– Network-intensive at first (shuffle), so CPU sits wasted
– Memory-intensive later
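
A back-of-envelope sketch (the numbers are illustrative assumptions, not from the deck) of why a static M/R split wastes cores: a map-only phase can occupy at most M cores and a reduce-only phase at most R, regardless of N.

class SlotWaste {
    public static void main(String[] args) {
        int n = 16, m = 10, r = 6;                 // cores, map slots, reduce slots
        int mapOnlyIdle    = Math.max(0, n - m);   // cores idle during a map-only phase
        int reduceOnlyIdle = Math.max(0, n - r);   // cores idle during a reduce-only phase
        System.out.printf("map-only: %d/%d cores idle%n", mapOnlyIdle, n);
        System.out.printf("reduce-only: %d/%d cores idle%n", reduceOnlyIdle, n);
        // Setting m = r = n removes the idle cores but needs memory for 2n tasks.
    }
}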
11. Long-Running Reducers
• Online Scheduling
– No advance information about future workload
• Greedy + Fair Scheduling
– Schedule tasks ASAP
– Preempt if the workload that later arrives disagrees
• Long-Running Reducers
– Preemption causes a restart and wasted work
– No effective way to use short bursts of idle CPU
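
A hypothetical sketch of that trade-off (names and numbers are ours): preempting a reducer throws away all the work it has done, so reclaiming its slot pays off only when the slot stays useful for longer than the work discarded, which is almost never true for a long-running reducer and a short idle burst.

class PreemptionCost {
    static boolean worthPreempting(long taskElapsedMs, long reclaimedUseMs) {
        // a preempted task restarts from scratch, wasting taskElapsedMs of work
        return reclaimedUseMs > taskElapsedMs;
    }
    public static void main(String[] args) {
        // a reducer two hours in vs. a five-minute burst of idle CPU
        System.out.println(worthPreempting(2 * 3_600_000L, 5 * 60_000L)); // false
    }
}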
12. Optimistic Locking

// Scan outside the global lock, iterating over snapshots; lock each
// task only briefly while inspecting it:
Task[] getBestTaskCandidates() {
    for (Pool pool : sortedPools)
        for (Job job : pool.sortedJobs.clone())    // snapshot; may go stale
            for (Task task : job.tasks.clone())
                synchronized (task) {              // per-task lock only
                    …
                }
}

Task[] processHeartbeat() {
    Task[] tasks = getBestTaskCandidates();        // no global lock held here
    synchronized (world) {
        return acquireTasks(tasks);                // re-validate, commit briefly
    }
}
13. Corona: Push Scheduling
1. JT subscribes for M maps and R reduces
– Receives availability from the Cluster Manager (CM)
2. CM publishes availability ASAP
– Pushes events to the JT
3. JT pushes tasks to available TTs
– In parallel
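
A minimal sketch of that push flow, with hypothetical interface names (the real Corona RPC classes differ):

interface CmStub {
    void subscribe(JtCallback jt, int maps, int reduces);  // 1. JT subscribes once
}
interface JtCallback {
    void onGrant(String taskTrackerHost, int slots);       // 2. CM pushes availability
}
class PushingJobTracker implements JtCallback {
    public void onGrant(String taskTrackerHost, int slots) {
        // 3. push tasks to the granted TT immediately, no heartbeat round-trip;
        //    grants for different nodes are processed in parallel
        for (int i = 0; i < slots; i++) pushTask(taskTrackerHost);
    }
    void pushTask(String host) { /* RPC straight to the TaskTracker */ }
}

The latency win is that a freed slot turns into a running task in one push instead of waiting out the next heartbeat poll.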
14. Corona/YARN: Scalability
1. JobTracker for each Job is now independent
– More fault-tolerant and isolated as well
2. Centralized Cluster/Resource Manager
– Must be super-efficient!
3. Fundamental Differences
– Corona ~ latency
– YARN ~ heterogeneous workloads
15. Pesky Reducers
• Hadoop 2 removes the distinction between M and R slots
• Not enough:
– Reduce tasks don’t use much CPU in shuffle
– Still long-running and bad to preempt
Re-architect to run millions of small Reducers
16. The Future is Cloudy
• Data Center Assumption:
– Cluster characteristics are known
– Job spec is fit to the cluster
• In the Cloud:
– Cluster can grow/shrink and change node type
– Job spec must be dynamic
– Uniform task configuration is untenable
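
A hypothetical sketch of a dynamic job spec (names are ours): derive task counts from the cluster's size at submit time rather than baking them into the job.

class DynamicSpec {
    static int reducerCount(int liveNodes, int slotsPerNode, double waves) {
        // scale reducers with whatever the elastic cluster looks like right now
        return (int) Math.ceil(liveNodes * slotsPerNode * waves);
    }
    public static void main(String[] args) {
        // the same job on yesterday's 10-node cluster and today's 40-node one
        System.out.println(reducerCount(10, 8, 1.0));  // 80
        System.out.println(reducerCount(40, 8, 1.0));  // 320
    }
}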