Scale-Out Resource Management at Microsoft Using Apache YARN
Raghu Ramakrishnan
CTO for Data, Microsoft
Technical Fellow; Head, Big Data Engineering
Store any data: relations, …
Do any analysis: SQL queries, Hive, …
At any speed: batch (Hive), …
At any scale … elastic!
Anywhere
Data to Intelligent Action
Windows
SMSG
Live
Ads
CRM/Dynamics
Windows Phone
Xbox Live
Office365
STB Malware Protection
Microsoft Stores
STB Commerce Risk
Messenger
LCA
Exchange
Yammer
Skype
Bing
data managed: EBs
cluster sizes: 10s of Ks
# machines: 100s of Ks
daily I/O: >100 PBs
# internal developers: 1000s
# daily jobs: 100s of Ks
Tiered Storage
• Interactive and real-time analytics require low-latency storage (e.g., RAM/flash)
• Massive data volumes require scale-out stores using commodity servers, even archival storage
• Seamlessly move data across tiers, mirroring life-cycle and usage patterns
• Schedule compute near low-latency copies of data
How can we manage this trade-off without moving data across different storage systems (and governance boundaries)?
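To make the idea concrete, here is a toy tier-selection policy in the spirit of the bullets above; it is purely illustrative (the tier names, thresholds, and method are invented for this sketch and are not the actual store's policy):

```java
// Toy sketch only (not the actual store): a tier-selection policy that mirrors
// life-cycle and usage, keeping hot data on low-latency media and aging cold
// data toward cheap archival storage. Thresholds and tier names are made up.
import java.time.Duration;
import java.time.Instant;

public class TierPolicySketch {
  enum Tier { MEMORY, SSD, HDD, ARCHIVE }

  /** Choose a tier from how recently and how often a file is read. */
  static Tier placeReplica(Instant lastAccess, long accessesLastWeek) {
    Duration age = Duration.between(lastAccess, Instant.now());
    if (age.toHours() < 1 && accessesLastWeek > 100) {
      return Tier.MEMORY;   // hot: keep a low-latency copy near compute
    } else if (age.toDays() < 7) {
      return Tier.SSD;
    } else if (age.toDays() < 90) {
      return Tier.HDD;
    }
    return Tier.ARCHIVE;    // cold: cheap, high-latency storage
  }
}
```

A real system would also weigh replication cost and pending compute placement, but the shape of the decision is the same.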
Resource Management with Multitenancy and SLAs
• Many different analytic engines (OSS and vendors; SQL, ML; batch, interactive, streaming)
• Many users' jobs (across these job types) run on the same machines (where the data lives)
• Policy-driven management of vast compute pools co-located with data
• Schedule computation "near" the data
How can we manage this multi-tenanted, heterogeneous job mix across tens of thousands of machines?
Shared Data and Compute
• Tiered storage, a relational query engine, and machine learning all run on a shared compute fabric (resource management)
• Multiple analytic engines share the same resource pool
• Compute and store/cache on the same machines
What’s Behind a U-SQL Query
Resource Managers for Big Data
• Allocate compute containers to competing jobs
• Multiple job engines share a common pool of containers
• YARN: the resource manager for Hadoop 2.x
• Related systems: Corona, Mesos, Omega
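For readers less familiar with YARN, the sketch below shows how an application master asks the ResourceManager for compute containers using the stock AMRMClient API; the container size and priority are illustrative only, and the heartbeat loop and container launch are trimmed:

```java
// Minimal sketch: a YARN ApplicationMaster requesting containers from the
// ResourceManager via the standard client API. Error handling, the
// allocate() heartbeat loop, and container launch are omitted for brevity.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new Configuration());
    rmClient.start();

    // Register this ApplicationMaster with the ResourceManager.
    rmClient.registerApplicationMaster("", 0, "");

    // Ask for one 2 GB / 1 vcore container anywhere in the cluster.
    Resource capability = Resource.newInstance(2048, 1);
    Priority priority = Priority.newInstance(1);
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, priority));

    // In a real AM, a heartbeat loop would now call rmClient.allocate(progress)
    // and launch tasks in the containers the scheduler grants.
  }
}
```

Every engine sharing the pool ultimately funnels its demand through requests of this shape, which is what lets a single scheduler arbitrate among competing jobs.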
YARN Gaps
• No support for resource-allocation SLOs
• Scalability limitations
• High allocation latency
• Limited support for specialized execution frameworks (interactive environments, long-running services)
Microsoft Contributions to OSS Apache YARN
• Amoeba / Rayon (reservation-based scheduling)
  • Status: shipping in Apache Hadoop 2.6
• Mercury and Yaq
  • Status: prototypes, JIRAs, and papers
• Federation
  • Status: prototype and JIRA
• Framework-level pooling
  • Enable frameworks that want to take over resource allocation to support millisecond-level response and adaptation times
  • Status: spec
Killing Tasks vs. Preemption
[Chart: job completion (% complete) vs. time (s), comparing killing tasks with preemption; preemption gives a 33% improvement]
Contributing to Apache: Engaging with OSS
[Diagram: a client submits Job1 to the RM Scheduler; an App Master and its tasks run on NodeManagers; annotated with the related JIRAs MR-5176, MR-5189, MR-5192, MR-5194, MR-5196, MR-5197, YARN-569]
• Talk with active developers
• Show early/partial work
• Small patches
• OK to leave things unfinished
Sharing a Cluster Between Production & Best-Effort Jobs
• Production jobs (P): money-making, large (recurrent) jobs with SLAs
  e.g., shows up at 3pm, deadline at 6pm, ~90 min runtime
• Best-effort jobs: interactive, exploratory jobs submitted by data scientists without SLAs
  However, latency still matters (a user is waiting)
Reservation-Based Scheduling in Hadoop
(Curino, Krishnan, Difallah, Douglas, Ramakrishnan, Rao; Rayon paper, SoCC 2014)
New idea: support SLOs for production jobs by using
- Job-provided resource requirements in RDL
- System-enforced admission control
Resource Definition Language (RDL)
e.g., atom (<2GB, 1 core>, 1, 10, 1 min, 10 bundles/min)
(simplified for the OSS release)
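As a rough illustration, an atom like the one above can be expressed through the reservation API that shipped with this work in Apache Hadoop 2.6. The mapping of atom fields to parameters, the queue name, and the time window below are assumptions made for the sketch, not the exact Rayon syntax, and later Hadoop releases adjust the submission call slightly:

```java
// Hedged sketch: roughly how the RDL atom above might be expressed with the
// reservation API in Apache Hadoop 2.6 (YARN-1051). Field mapping, queue name,
// and timestamps are illustrative assumptions.
import java.util.Collections;
import org.apache.hadoop.yarn.api.protocolrecords.ReservationSubmissionRequest;
import org.apache.hadoop.yarn.api.records.ReservationDefinition;
import org.apache.hadoop.yarn.api.records.ReservationRequest;
import org.apache.hadoop.yarn.api.records.ReservationRequestInterpreter;
import org.apache.hadoop.yarn.api.records.ReservationRequests;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class ReservationSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new org.apache.hadoop.conf.Configuration());
    yarn.start();

    long now = System.currentTimeMillis();
    // 10 containers of <2GB, 1 core>, at least 1 at a time, each for ~1 minute.
    ReservationRequest atom = ReservationRequest.newInstance(
        Resource.newInstance(2048, 1), /*numContainers*/ 10,
        /*concurrency (gang size)*/ 1, /*duration ms*/ 60_000L);

    ReservationRequests requests = ReservationRequests.newInstance(
        Collections.singletonList(atom), ReservationRequestInterpreter.R_ALL);

    // Window in which the Plan must satisfy the request (arrival, deadline).
    ReservationDefinition definition = ReservationDefinition.newInstance(
        now, now + 3 * 60 * 60 * 1000L, requests, "nightly-prod-job");

    // "prod" is a hypothetical reservable queue configured by the operator.
    yarn.submitReservation(
        ReservationSubmissionRequest.newInstance(definition, "prod"));
  }
}
```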
Steps:
1. App formulates reservation request in RDL
2. Request is “placed” in the Plan
3. Allocation is validated against sharing policy
4. System commits to deliver resources on time
5. Plan is dynamically enacted
6. Jobs get (reserved) resources
7. System adapts to live conditions
Reservation-Based Scheduling Architecture: Teach the RM About Time
[Diagram: inside the ResourceManager, a Planning Agent places RDL requests into a Plan guarded by a sharing policy; a Plan Follower and adapter dynamically enact the Plan on an adaptive Scheduler, which runs production and best-effort jobs on the NodeManagers; the numbers 1–7 in the diagram correspond to the steps above]
Results
• Meets all production job SLAs
• Lowers best-effort job latency
• Increases cluster utilization and throughput
Committed to Hadoop trunk and the 2.6 release
Now part of Cloudera CDH and Hortonworks HDP
Comparing Rayon with the CapacityScheduler
Rayon OSS
• Initial umbrella JIRA: YARN-1051 (14 sub-tasks)
• Rayon V2 umbrella JIRA: YARN-2572 (25 sub-tasks) — tooling, REST APIs, UI, documentation, perf improvements
• High-availability umbrella JIRA: YARN-2573 (7 sub-tasks)
• Heterogeneity/node-labels umbrella JIRA: YARN-4193 (8 sub-tasks)
• Algorithmic enhancements: YARN-3656 (1 sub-task)
• Included in Apache Hadoop 2.6; various enhancements in the upcoming Apache Hadoop 2.8
Folks involved: Carlo Curino, Subru Krishnan, Ishai Menache, Sean Po, Jonathan Yaniv, Arun Suresh, Anubhav Dhoot, Alexey Tumanov
Why Federation?
Problem:
• YARN scalability is bounded by the centralized ResourceManager
• Proportional to #nodes, #apps, #containers, and heartbeat frequency
• Maintenance and operations on a single massive cluster are painful
Solution:
• Scale by federating multiple YARN clusters
• Appears as a single massive cluster to an app
• NodeManagers heartbeat to one RM only
• Most apps talk to one RM only; a few apps might span sub-clusters (achieved by transparently proxying AM–RM communication) if a single app exceeds the sub-cluster size, or for load balancing
• Easier provisioning / maintenance
• Leverage the cross-company stabilization effort on smaller YARN clusters
• Use ~6k-node YARN clusters as-is as building blocks
Federation Architecture
[Diagram: a Client talks to a Router, which fronts multiple YARN sub-clusters (Sub-Cluster 1..N), each with its own ResourceManager (YARN RM-001..NNN) and NodeManagers (NM-001..NNN); a Global Policy Generator, a State Store, and a Policy Store coordinate them, with optional HDFS placement hints. Flow:
1. Sub-cluster RMs heartbeat membership to the State Store
2. Request capacity allocation
3. Write the (initial) capacity allocation to the Policy Store
4. The Global Policy Generator reads membership and load from the State Store
5. Write capacity-allocation updates & policies to the Policy Store
6. The Client submits a job to the Router
7. The Router reads policies (capacity allocation and job routing)
8. The Router writes the app -> sub-cluster mapping to the State Store
9. The Router submits the job to the selected sub-cluster RM]
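As a purely illustrative aside, the kind of job-routing policy the Router consults in step 7 can be as simple as a weighted-random choice across sub-clusters; the class below is a hypothetical toy, not the YARN-2915 implementation:

```java
// Toy sketch only: a weighted-random job-routing policy of the kind the Router
// consults. Class and method names are hypothetical; the real policies live
// under the YARN-2915 umbrella.
import java.util.Map;
import java.util.Random;

public class WeightedRouterPolicySketch {
  private final Random rng = new Random();

  /** Pick a sub-cluster id, favoring sub-clusters with more spare capacity. */
  public String route(Map<String, Long> spareCapacityPerSubCluster) {
    if (spareCapacityPerSubCluster.isEmpty()) {
      throw new IllegalArgumentException("no sub-clusters registered");
    }
    long total = spareCapacityPerSubCluster.values().stream()
        .mapToLong(Long::longValue).sum();
    long pick = (long) (rng.nextDouble() * total);
    long cumulative = 0;
    for (Map.Entry<String, Long> e : spareCapacityPerSubCluster.entrySet()) {
      cumulative += e.getValue();
      if (pick < cumulative) {
        return e.getKey();
      }
    }
    // Fallback when all reported capacities are zero.
    return spareCapacityPerSubCluster.keySet().iterator().next();
  }
}
```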
Federation JIRAs (umbrella: YARN-2915)
Work Item | Associated JIRA | Author
Federation StateStore APIs | YARN-3662 | Subru Krishnan
Federation PolicyStore APIs | YARN-3664 | Subru Krishnan
Federation "Capacity Allocation" across sub-clusters | YARN-3658 | Carlo Curino
Federation Router | YARN-3658 | Giovanni Fumarola
Federation intercepting and propagating AM–RM communications | YARN-3666 | Kishore Chaliparambil
Federation maintenance mechanisms (command propagation) | YARN-3657 | Carlo Curino
Federation sub-cluster membership mechanisms | YARN-3665 | Subru Krishnan
Federation State and Policy Store (SQL implementation) | YARN-3663 | Giovanni Fumarola
Federation Global Policy Generator (load balancing) | YARN-3660 | Subru Krishnan
• Mercury
• Yaq
Cluster Utilization in YARN
Workload:    5 sec  | 10 sec | 50 sec | Mixed-5-50 | Cosmos-gm
Utilization: 60.59% | 78.35% | 92.38% | 78.54%     | 83.38%
Mercury: Distributed Scheduling in YARN
Two types of schedulers:
• Central scheduler (YARN)
• Distributed schedulers (new)
[Diagram: a Mercury Runtime on each node, coordinated by the Mercury resource-management framework; each request carries a resource type — guaranteed or opportunistic (see the sketch below)]
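The guaranteed/opportunistic split later surfaced in Apache Hadoop as an ExecutionType on container requests. The sketch below assumes that newer AMRMClient builder API (roughly Hadoop 2.9+/3.x), not the original Mercury prototype interface, so treat the exact names as approximate:

```java
// Hedged sketch: requesting guaranteed vs. opportunistic containers using the
// ExecutionType API that later landed in Apache Hadoop (2.9+/3.x). Sizes and
// priorities are illustrative; exact builder names are assumed, not verified
// against the original Mercury prototype.
import org.apache.hadoop.yarn.api.records.ExecutionType;
import org.apache.hadoop.yarn.api.records.ExecutionTypeRequest;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class MercuryStyleRequests {
  // Latency-critical task: ask the central RM for a guaranteed container.
  static ContainerRequest guaranteed() {
    return ContainerRequest.newBuilder()
        .capability(Resource.newInstance(2048, 1))
        .priority(Priority.newInstance(1))
        .executionTypeRequest(
            ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED, true))
        .build();
  }

  // Short, best-effort task: an opportunistic container that the distributed
  // scheduler can place (and queue) directly at a NodeManager.
  static ContainerRequest opportunistic() {
    return ContainerRequest.newBuilder()
        .capability(Resource.newInstance(1024, 1))
        .priority(Priority.newInstance(10))
        .executionTypeRequest(
            ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true))
        .build();
  }
}
```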
Mercury: Task Throughput as Task Duration Increases
[Chart of results: "only-G" is stock YARN, "only-Q" is Mercury]
Yaq: Efficient Management of NM Queues
• Introduce queuing of tasks at NMs
• Explore queue-management techniques (see the sketch below)
• Techniques applied to: …
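As a toy illustration of one queue-management technique in this space (not Yaq's actual implementation), the sketch below bounds the per-node queue and reorders it so the task with the shortest expected duration runs first; all names and thresholds are hypothetical:

```java
// Toy illustration only: one queue-management idea is reordering tasks queued
// at a NodeManager, e.g., running the task with the shortest expected duration
// first, and bounding the queue length. Names and limits are made up.
import java.util.Comparator;
import java.util.PriorityQueue;

public class NodeManagerQueueSketch {
  static final class QueuedTask {
    final String id;
    final long expectedDurationMs;   // e.g., from the job's history/estimate
    QueuedTask(String id, long expectedDurationMs) {
      this.id = id;
      this.expectedDurationMs = expectedDurationMs;
    }
  }

  // Bounded queue ordered by expected duration (shortest first).
  private final int maxQueueLength = 10;
  private final PriorityQueue<QueuedTask> queue = new PriorityQueue<>(
      Comparator.comparingLong((QueuedTask t) -> t.expectedDurationMs));

  /** Returns false if the queue is full and the task should go elsewhere. */
  boolean offer(QueuedTask task) {
    if (queue.size() >= maxQueueLength) {
      return false;          // placement policy: try another node
    }
    return queue.offer(task);
  }

  /** Called when a container slot frees up on this node. */
  QueuedTask nextToRun() {
    return queue.poll();     // shortest expected task runs next
  }
}
```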
Evaluating Yaq on a Production Workload
Mercury and Yaq OSS
• Umbrella JIRA for Mercury: YARN-2877
  – Sub-task statuses: 5 RESOLVED, 2 PATCH AVAILABLE
Microsoft Contributions to OSS Apache YARN
• Amoeba / Rayon (reservation-based scheduling)
  • Status: shipping in Apache Hadoop 2.6
• Mercury and Yaq
  • Status: prototypes, JIRAs, and papers
• Federation
  • Status: prototype and JIRA
• Framework-level pooling
  • Enable frameworks that want to take over resource allocation to support millisecond-level response and adaptation times
  • Status: spec
Papers
(Won Best Paper at SoCC’13)
Comparison with Mesos/Omega/Borg
• Reservations and planning
• Queue-management techniques
• Scalability
REEF: http://www.reef-project.org and http://reef.incubator.apache.org
http://aka.ms/adltechblog/
Speaker notes

  1. You're familiar with SQL Server, and many of you know Hadoop and Azure HDInsight. This is a little bigger.
  2. Analytic storage for the cloud: users want to think about the content of their data, what it can tell them about their business, and who can access it. They don't want to think about remote vs. local storage, RAM vs. flash, or security plumbing.
  3. Analytic storage for the cloud: users want to think about the content of their data, what it can tell them about their business, and who can access it. They don't want to think about remote vs. local storage, RAM vs. flash, or security plumbing.
  4. Example?