© Cloudera, Inc. All rights reserved.
NEXT GENERATION SCHEDULING for YARN & K8s:
For Hybrid Cloud/On-prem Environment to run Mixed Workloads
Sunil Govindan & Weiwei Yang
SPEAKERS
SUNIL GOVINDAN
Engineering Manager @Cloudera
@sunilgovind
Apache Hadoop PMC & Committer
WEIWEI YANG
Staff Software Engineer @Cloudera
@abvclouds
Apache Hadoop PMC & Committer
AGENDA
Journey through Big Data Experience
Where are we today with the scheduling experience?
Motivation to improve scheduling
Introduction to YuniKorn
Demo
Future & Open Source Story
CUSTOMER JOURNEY - BIG DATA ECOSYSTEM
BIG DATA ECOSYSTEM - TODAY
[Diagram labels: Batch workloads, Deep learning apps, Hive on LLAP, Services, Compute (on-prem/on-cloud), Storage, Public cloud]
BIG DATA ECOSYSTEM
NEW Trends
Big Data & Containerization
• Portability for apps
• Resource Isolation
• Dependency Management
Moving to Cloud for better elasticity
• Budget Control
• Infinite Resources
• Cloud Burst
Decoupled Compute & Storage
• Low Cost
• Hybrid in nature
• Agility
• Mixed Workloads
WHERE ARE WE TODAY ?
RESOURCE ORCHESTRATOR PERSPECTIVE - STRENGTHS
APACHE YARN (Big Data Ecosystem)
KUBERNETES (Cloud Ecosystem)
CLOUD NATIVE (Public Cloud)
• Big Data: Optimized to run Big Data workloads.
• Batch Workloads: High-throughput scheduling for batch workloads.
• SLA: Better SLA for Big Data workloads.
• Multi-tenant: Quota management.
• Services: Optimized for containerized microservices.
• Networking: Strong network management support.
• Cloud Aware: Better tuned for cloud use cases.
• Storage: Persistent volumes.
• Cost: Budget centric.
• ∞: Infinite resources for infinite $$.
WHERE ARE WE TODAY ?
QUICK RECAP TO THE NEW TRENDS
Moving to Cloud
Big Data & Containerization
Mixed Workloads
WHERE ARE WE TODAY ?
CHALLENGES - APACHE YARN
APACHE YARN WORLD
• Managing THREE YARN schedulers for different use cases is a challenge
• Need to deploy and manage containerized microservices in the same Hadoop cluster
• Need strong networking and persistent volume support
• Need more powerful autoscaling and budget control
WHERE ARE WE TODAY ?
CHALLENGES - KUBERNETES
KUBERNETES WORLD
• Running Big Data workloads together with microservices is a challenge
• Need much better quota management and better SLAs
• No first-class application management concept for Big Data workloads
“WHERE IS MY SILVER BULLET TO SOLVE ALL THESE CHALLENGES?”
“SORRY, THERE ISN’T ANY”
SO HOW TO SOLVE THIS?
Assessment
• Native Big Data Apps: Moving batch workloads (MR, Tez, ...) from YARN to Kubernetes is costly, because they expect high throughput, low latency and a notion of a job.
• Adaptability: A platform optimized for services exposes hard-to-bridge gaps when it runs batch workloads, such as running a few workloads with or without Docker.
• Services on YARN: Services and web farms can run on YARN, but YARN is not as feature-rich as Kubernetes.
• Multiple Schedulers (YARN): Different schedulers focus on specific use cases, so it is not easy to drive continuous feature enhancements.
• One cannot replace another: Neither Kubernetes nor YARN can replace the other in the near future, given some fundamental architectural differences.
EXPENSIVE
Higher cost to achieve the goal
“WE NEED TO IMPROVE ON WHAT THEY ARE NOT GOOD AT TODAY”
APACHE YARN WORLD
What’s happening in YARN today?
Strengthening Dynamic Environments
• Improve YARN to work well for cloud (public & private)
• Focus on autoscaling, smarter scheduling, etc.
Refer: YARN-9548
Improving capabilities for persistent volumes
• Added CSI (Container Storage Interface) support
• Enhancing the CSI implementation to support storage such as S3, Ozone, etc. as mounted volumes in YARN containers
Native Service enhancements
• Improving native services support in YARN
• Microservice upgrades
KUBERNETES WORLD
What’s happening in KUBERNETES today?
Demand for better support of batch workloads
• Early efforts to support batch scheduling are in progress in the K8s community.
Efforts on running Spark on K8s
• The Spark and Kubernetes communities are working towards Spark on K8s deployments.
• Gaps in running Spark, such as dynamic resource allocation, security, etc., are still open.
CDP and Kubernetes
• A few Cloudera CDP microservices will be running on Kubernetes
MOTIVATION & EFFORTS TO IMPROVE
Not Optimized: Existing schedulers are not optimized to balance batch workloads against the scheduling needs of web farms or services.
Poor Resource Utilization: Not able to effectively utilize the complete resources in a cluster for services and workloads.
FRAGMENTED
Multiple YARN & Kubernetes schedulers are
UNIFIED RESOURCE SCHEDULER AND
APPLICATION MANAGEMENT
What we need is an effort to improve both the YARN and Kubernetes scheduling worlds.
Multiple schedulers power YARN & Kubernetes for different use cases
PROPOSAL
YuniKorn (/ˈyo͞onəˌkôrn/, Y for YARN, K for K8s, uni- for Unified)
• A common resource scheduler
• Platform independent
• Enhanced scheduling capabilities
WHAT YuniKorn IS (IS NOT) ?
YuniKorn is
• A better scheduler for the K8s world, for services and batch workloads
• A unified scheduler for the YARN world (FIFO, Fair and Capacity Scheduler)
• Providing a unified resource scheduling experience across YARN and K8s (and beyond)
• Suitable for both finite resource (datacenter) and infinite resources/dollars (cloud) worlds
Is NOT
• A system to port YARN applications to run on K8s without modification, or vice versa
YuniKorn - A UNIFIED RESOURCE SCHEDULER
Explore the feature set
Capacity Planning
Divide cluster resources into resource pools (queues) and define capacity ranges based on needs. Enforce resource quotas and limits.
Resource Scheduling
Resource fairness, preemption, high throughput, multi-tenancy, placement, etc.
Application Management
A central place to monitor application states.
Resource Monitoring
A unified view of cluster resources, with a dashboard to easily track resource usage by queue, user or organization.
...
CAPACITY PLANNING
Hierarchy of queues
Queues can be organized to mirror user groups or organizations, with multiple levels.
Elastic Capacity
Each queue has a min-max capacity; usage is elastic within this range for multiple users.
Resource Quotas
Resource caps for queues or users: a limited amount of resources, number of applications, etc.
Partition
A set of instances (nodes) that are physically isolated
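The queue hierarchy, elastic capacity and quota ideas on this slide can be sketched in a few lines of Python. This is an illustration only, not YuniKorn's data model or configuration format: the `Queue` class, its `guaranteed` and `max_capacity` fields, and the queue names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Queue:
    """Hypothetical queue node; field names are assumptions for illustration."""
    name: str
    guaranteed: int      # min capacity the queue is entitled to
    max_capacity: int    # hard cap (quota) the queue may never exceed
    used: int = 0
    parent: Optional["Queue"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "Queue"] = field(default_factory=dict, repr=False, compare=False)

    def add_child(self, child: "Queue") -> "Queue":
        child.parent = self
        self.children[child.name] = child
        return child

    def path(self) -> str:
        """Dotted path from the root, e.g. root.engineering.batch."""
        return self.name if self.parent is None else f"{self.parent.path()}.{self.name}"

    def can_allocate(self, amount: int) -> bool:
        """A request fits only if no queue on the path to the root exceeds its max."""
        node: Optional[Queue] = self
        while node is not None:
            if node.used + amount > node.max_capacity:
                return False
            node = node.parent
        return True

    def allocate(self, amount: int) -> bool:
        """Charge the allocation to this queue and every ancestor, if it fits."""
        if not self.can_allocate(amount):
            return False
        node: Optional[Queue] = self
        while node is not None:
            node.used += amount
            node = node.parent
        return True


# Two-level hierarchy: root -> engineering -> {batch, adhoc}
root = Queue("root", guaranteed=100, max_capacity=100)
eng = root.add_child(Queue("engineering", guaranteed=60, max_capacity=80))
batch = eng.add_child(Queue("batch", guaranteed=40, max_capacity=70))
adhoc = eng.add_child(Queue("adhoc", guaranteed=20, max_capacity=30))
```

In this sketch `batch` may grow past its guaranteed 40 units while spare capacity exists (elastic capacity), but a request is rejected as soon as it would push `batch` or any ancestor past its maximum (quota).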
RESOURCE SCHEDULING
Resource Fairness
Queue/User/App level fairness ensures each entity gets its own fair share of resources.
Priorities
Queue priority + App priority
Preemption
A queue that demands more resources has the chance to preempt resources from other queues for high-priority apps.
Placement Constraints
Affinity/anti-affinity, node constraints, etc.
[Diagram: YuniKorn schedules both Services (low latency, long running) and Batch (high throughput, short-lived) workloads]
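One way to picture how queue-level fairness and app priority can combine is the sketch below. It is an assumption-laden illustration, not the algorithm YuniKorn implements: the queue with the lowest used-to-fair-share ratio is served first, and inside that queue the highest-priority, earliest-submitted app wins. `App`, `QueueState` and `pick_next` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class App:
    name: str
    priority: int       # assumption: a higher value means more important
    submit_order: int   # smaller means submitted earlier


@dataclass
class QueueState:
    name: str
    fair_share: float   # resources the queue is entitled to (must be > 0)
    used: float         # resources currently allocated
    pending: List[App]

    def usage_ratio(self) -> float:
        return self.used / self.fair_share


def pick_next(queues: List[QueueState]) -> Optional[Tuple[str, str]]:
    """Return (queue name, app name) to schedule next, or None if nothing is pending."""
    candidates = [q for q in queues if q.pending]
    if not candidates:
        return None
    # Fairness: serve the queue furthest below its fair share (name breaks ties).
    queue = min(candidates, key=lambda q: (q.usage_ratio(), q.name))
    # Priority: within the queue, highest priority first, then submission order.
    app = min(queue.pending, key=lambda a: (-a.priority, a.submit_order))
    return queue.name, app.name


prod = QueueState("prod", fair_share=60, used=45,
                  pending=[App("etl", 1, 0), App("report", 5, 1)])
dev = QueueState("dev", fair_share=40, used=10,
                 pending=[App("notebook", 1, 2)])
```

Here `dev` is at 25% of its share and `prod` at 75%, so `dev` is served first; once `dev` catches up, `prod` is chosen and its higher-priority `report` app goes ahead of `etl`.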
RESOURCE MONITORING
Common dashboard to monitor resources
Hierarchy of queues
Cluster resources are divided into a hierarchy of queues; all queue state is visible.
Common View
A common view of resources, cross-platform.
Resource Centric
Focus on resources, total/available/used, and all
Resource Dashboard
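A resource-centric view over a queue hierarchy amounts to rolling leaf usage up to every parent and comparing it with the cluster total. The sketch below shows that roll-up under stated assumptions: queue paths are dotted strings, only leaf queues appear in the input, and `rollup_usage` is a hypothetical helper, not part of any YuniKorn API.

```python
from collections import defaultdict
from typing import Dict


def rollup_usage(leaf_usage: Dict[str, int], cluster_total: int) -> Dict[str, Dict[str, int]]:
    """Aggregate leaf-queue usage up the hierarchy and add a cluster summary.

    Assumes leaf_usage holds leaf queues only; listing a parent queue as well
    would count its usage twice.
    """
    used: Dict[str, int] = defaultdict(int)
    for path, amount in leaf_usage.items():
        parts = path.split(".")
        # Credit the usage to the leaf itself and to every ancestor queue.
        for depth in range(1, len(parts) + 1):
            used[".".join(parts[:depth])] += amount
    cluster_used = sum(leaf_usage.values())
    return {
        "cluster": {
            "total": cluster_total,
            "used": cluster_used,
            "available": cluster_total - cluster_used,
        },
        "queues": dict(used),
    }


report = rollup_usage(
    {"root.eng.batch": 50, "root.eng.adhoc": 30, "root.sales": 10},
    cluster_total=100,
)
```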
APPLICATION MANAGEMENT
Track applications in a consistent fashion
Application-oriented GUI to manage workloads (instead of individual pods in K8s).
The entire application lifecycle is visible and trackable.
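Tracking applications instead of individual pods means grouping pods by an application identifier and deriving one state per application. The sketch below assumes each pod is given as a `(pod_name, app_id, phase)` tuple, where the phase uses the standard Kubernetes pod phases (Pending, Running, Succeeded, Failed); the application state names and the `summarize_apps` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_apps(pods: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Collapse per-pod phases into one (illustrative) state per application."""
    phases_by_app: Dict[str, List[str]] = defaultdict(list)
    for _pod_name, app_id, phase in pods:
        phases_by_app[app_id].append(phase)

    summary: Dict[str, str] = {}
    for app_id, phases in phases_by_app.items():
        if "Failed" in phases:
            state = "Failed"       # any failed pod marks the application failed
        elif all(p == "Succeeded" for p in phases):
            state = "Completed"    # every pod finished successfully
        elif "Running" in phases:
            state = "Running"      # at least one pod is doing work
        else:
            state = "Accepted"     # pods exist but none has started yet
        summary[app_id] = state
    return summary


pods = [
    ("spark-driver-1", "spark-app-1", "Running"),
    ("spark-exec-1", "spark-app-1", "Pending"),
    ("etl-1", "etl-job", "Succeeded"),
    ("etl-2", "etl-job", "Succeeded"),
    ("web-1", "web-farm", "Pending"),
    ("train-1", "dl-train", "Failed"),
    ("train-2", "dl-train", "Running"),
]
app_states = summarize_apps(pods)
```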
ARCHITECTURE
[Architecture diagram; recoverable labels:]
• YuniKorn Core
• Scheduler Shim
• Scheduler Interface
• gRPC/API: resource requests, new application, node updates
• gRPC/API: container allocation, preemption
• Master node: Api-server, etcd (Kubernetes); Resource Manager (YARN)
• Slave nodes: kubelets (Kubernetes); Node Managers (YARN)
• Workloads: MR, Spark, Flink, Tez, MySQL, Spark, Web Server, Kafka
• Client API: allocate, release container
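The split between a platform-independent core and a platform-specific shim can be sketched as follows. This is a toy model, not the real scheduler interface: the shim feeds node updates and resource requests to the core, and the core hands allocations back through a callback. `Request`, `Allocation`, `Shim`, `Core` and `RecordingShim` are hypothetical names, and the first-fit placement is only a stand-in for real scheduling logic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Request:
    app_id: str
    ask_id: str
    size: int


@dataclass(frozen=True)
class Allocation:
    ask_id: str
    node: str


class Shim(ABC):
    """Platform side (K8s or YARN): turns core decisions into platform actions."""

    @abstractmethod
    def bind(self, allocation: Allocation) -> None:
        ...


class Core:
    """Platform-agnostic side: sees only nodes, requests and allocations."""

    def __init__(self, shim: Shim) -> None:
        self.shim = shim
        self.free: Dict[str, int] = {}
        self.pending: List[Request] = []

    def update_node(self, node: str, capacity: int) -> None:
        self.free[node] = capacity

    def submit(self, request: Request) -> None:
        self.pending.append(request)

    def schedule(self) -> None:
        still_pending: List[Request] = []
        for req in self.pending:
            # First fit over nodes in name order (placeholder policy).
            node = next((n for n, free in sorted(self.free.items()) if free >= req.size), None)
            if node is None:
                still_pending.append(req)
                continue
            self.free[node] -= req.size
            self.shim.bind(Allocation(req.ask_id, node))
        self.pending = still_pending


class RecordingShim(Shim):
    """Stand-in shim that only records what the core decided."""

    def __init__(self) -> None:
        self.bound: List[Allocation] = []

    def bind(self, allocation: Allocation) -> None:
        self.bound.append(allocation)
```

Because the core depends only on the `Shim` abstraction, the same scheduling loop could in principle sit behind a Kubernetes shim or a YARN shim, which is the point the diagram makes.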
YuniKorn SCHEDULER vs. OTHERS
Key capabilities of a resource scheduler from our perspective
Disclosure: this table is summarized based on speakers’ analysis
(v = yes, x = no, wip = work in progress, ? = unknown)

Capability                 Kube-default    Kube-batch   YARN CS/FS     YuniKorn
Resource Sharing
  Hierarchy queues         x               x            v              v
  Queue priority           x               x            v              x (wip)
  Queue elastic capacity   x               x            v              v
Resource Fairness
  Cross queue fairness     x               x            v              v
  User level fairness      x               x            v              v
  App level fairness       x               v            v              v
Preemption
  Basic preemption         v               v            v              v
  With fairness            x               x            v              v
  With priority            v               v            v              x (wip)
Throughput                 100+ allocs/s   ?            4k+ allocs/s   ?
Gang Scheduling            x               v            x              x (wip)
DEMO
Let’s explore what YuniKorn can do for KUBERNETES!
[Diagram labels: yunikorn-pod, YuniKorn Core, K8shim, Master, API Server, etcd, Kubelet, YuniKorn UI]
● YuniKorn is deployed in ONE pod
● YuniKorn core only talks to K8shim via scheduler-interface; K8shim talks to api-server
HERE WE GO...
TAKE AWAY
Scenario 1
BEFORE: K8s cluster on cloud/prem, existing K8s schedulers
AFTER: K8s cluster on cloud/prem, YuniKorn scheduler
Scenario 2
BEFORE: K8s cluster with existing K8s schedulers; YARN cluster with Capacity Scheduler / Fair Scheduler
AFTER: K8s cluster and YARN cluster, YuniKorn Scheduler
Scenario 3
BEFORE: YARN cluster, Capacity Scheduler / Fair Scheduler
AFTER: YARN cluster, YuniKorn scheduler
OPEN SOURCE
Yes, we are going to open source it soon.
STAY TUNED...
ACKNOWLEDGMENTS
A big shout out to the folks who helped to design, develop and make this project
possible.
❏ Vinod Kumar Vavilapalli
❏ Wangda Tan
❏ Wilfred Spiegelenburg
❏ Akhil PB
❏ Suma Shivaprasad
❏ and many others...
THANK YOU

Next Generation Scheduling for YARN and K8s: For Hybrid Cloud/On-prem Environment to run Mixed Workloads

  • 1. © Cloudera, Inc. All rights reserved. NEXT GENERATION SCHEDULING for YARN & K8s: For Hybrid Cloud/On-prem Environment to run Mixed Workloads Sunil Govindan & Weiwei Yang
  • 2. © Cloudera, Inc. All rights reserved. 2 SPEAKER SUNIL GOVINDAN Engineering Manager @Cloudera @sunilgovind Apache Hadoop PMC & Committer WEIWEI YANG Staff Software Engineer @Cloudera @abvclouds Apache Hadoop PMC & Committer
  • 3. © Cloudera, Inc. All rights reserved. 3 AGENDA Journey through Big Data Experience Where are we today on scheduling experience ? Motivation to improve scheduling Introduction to YuniKorn Demo Future & Open Source Story
  • 4. CUSTOMER JOURNEY - BIG DATA ECOSYSTEM. BIG DATA ECOSYSTEM - TODAY [diagram labels: Batch Workloads; Deep Learning Apps; Hive on LLAP; Services; Compute (on-prem/on-cloud); Storage; Public Cloud Storage]
  • 5. BIG DATA ECOSYSTEM: NEW TRENDS. Big Data & Containerization: portability for apps, resource isolation, dependency management. Moving to Cloud for better elasticity: budget control, infinite resources, cloud burst. Decoupled Compute & Storage: low cost, hybrid in nature, agility, mixed workloads.
  • 7. WHERE ARE WE TODAY? RESOURCE ORCHESTRATOR PERSPECTIVE - STRENGTHS. APACHE YARN (Big Data ecosystem): Big Data: optimized to run Big Data workloads; Batch Workloads: high-throughput scheduling for batch workloads; SLA: better SLA for Big Data workloads; Multi-tenant: quota management. KUBERNETES (cloud ecosystem): Services: optimized for containerized microservices; Networking: strong network management support; Cloud Aware: better tuned for cloud use cases; Storage: persistent volumes. CLOUD NATIVE (public cloud): Cost: budget centric; ∞: infinite resources for infinite $$.
  • 8. WHERE ARE WE TODAY? QUICK RECAP OF THE NEW TRENDS: Moving to Cloud; Big Data & Containerization; Mixed Workloads
  • 9. WHERE ARE WE TODAY? CHALLENGES - APACHE YARN WORLD • Managing THREE YARN schedulers for different use cases is a challenge • Containerized microservices need to be deployed and managed in the same Hadoop cluster • Need strong networking and persistent volume support • Need more powerful autoscaling and budget control
  • 10. WHERE ARE WE TODAY? CHALLENGES - KUBERNETES WORLD • Running Big Data workloads together with microservices is a challenge • Need much better quota management and better SLA • No first-class application management concept for Big Data workloads
  • 11. “WHERE IS MY SILVER BULLET TO SOLVE ALL THESE CHALLENGES?”
  • 12. “SORRY, THERE ISN’T ANY”
  • 13. SO HOW TO SOLVE THIS? Assessment: EXPENSIVE, higher cost to achieve the goal. Native Big Data Apps: moving batch workloads (MR, Tez, ...) from YARN to Kubernetes while keeping high throughput, low latency and the notion of a job is costly. Adaptability: Kubernetes is optimized for services, and running batch workloads on it exposes hard-to-bridge gaps, such as running some workloads with or without Docker. Services on YARN: services and web farms can run on YARN, but it is not as feature-rich as Kubernetes. Multiple Schedulers (YARN): different schedulers focus on specific use cases, which makes it hard to drive continuous feature enhancements. One cannot replace another: neither Kubernetes nor YARN can replace the other in the near future, given some fundamental architectural differences.
  • 15. “WE NEED TO IMPROVE ON WHAT THEY ARE NOT GOOD AT TODAY”
  • 16. APACHE YARN WORLD: What’s happening in YARN today? Strengthening Dynamic Environments: • Improve YARN to work well in the cloud (public & private) • Focus on autoscaling, smarter scheduling, etc. Refer: YARN-9548. Improving capabilities for persistent volumes: • Added CSI (Container Storage Interface) support • Enhancing the CSI implementation to support storage such as S3, Ozone, etc. as mounted volumes in YARN containers. Native Service enhancements: • Improving native services support in YARN • Microservice upgrades
  • 17. KUBERNETES WORLD: What’s happening in Kubernetes today? Demand for better support of batch workloads: • Early efforts to support batch scheduling are in progress in the K8s community. Efforts on running Spark on K8s: • The Spark and Kubernetes communities are working towards Spark on K8s deployments. • Gaps in running Spark, such as dynamic resource allocation, security, etc., are still open. CDP and Kubernetes: • A few Cloudera CDP microservices will be running on Kubernetes.
  • 18. MOTIVATION & EFFORTS TO IMPROVE. Not Optimized: scheduling is not optimized to balance use cases like batch workloads against needs like running web farms or services. Poor Resource Utilization: not able to effectively utilize the complete resources in the cluster for services and workloads. Fragmented: multiple schedulers power YARN & Kubernetes for different use cases. UNIFIED RESOURCE SCHEDULER AND APPLICATION MANAGEMENT: what we need is an effort to improve both the YARN and Kubernetes scheduling worlds.
  • 20. PROPOSAL: YuniKorn (/ˈyo͞onəˌkôrn/; Y for YARN, K for K8s, uni- for Unified) • A common resource scheduler • Platform independent • Enhanced scheduling capabilities
  • 21. WHAT YuniKorn IS (AND IS NOT). YuniKorn is: • A better scheduler for the K8s world, for services and batch workloads • A unified scheduler for the YARN world (FIFO, Fair and Capacity Scheduler) • A unified resource scheduling experience across YARN and K8s (and beyond) • Suitable for both the finite-resource (datacenter) and infinite-resources/dollars (cloud) worlds. YuniKorn is NOT: • A system to port YARN applications to run on K8s without modification, or vice versa
  • 22. YuniKorn - A UNIFIED RESOURCE SCHEDULER. Explore the feature set. Capacity Planning: divide cluster resources into resource pools (queues), define capacity ranges based on needs, and enforce resource quotas and limits. Resource Scheduling: resource fairness, preemption, high throughput, multi-tenancy, placement, etc. Application Management: a central place to monitor application states. Resource Monitoring: a unified view of cluster resources, with a dashboard to easily track resource usage by queue, user or organization.
  • 23. CAPACITY PLANNING. Hierarchy of queues: queues can be organized by user group or organization, with multiple levels. Elastic Capacity: each queue has a min-max capacity, and usage is elastic within this range for multiple users. Resource Quotas: resource caps for queues or users, limiting the amount of resources, number of applications, etc. Partition: a set of instances (nodes) that are physically isolated.
  • 24. RESOURCE SCHEDULING. Resource Fairness: queue/user/app-level fairness ensures each entity gets its fair share of resources. Priorities: queue priority + app priority. Preemption: queues that demand more resources can preempt resources from other queues for high-priority apps. Placement Constraints: affinity/anti-affinity, node constraints, etc. [Diagram labels: YuniKorn; Services: low latency, long running; Batch: high throughput, short-lived]
  • 25. RESOURCE MONITORING: a common dashboard to monitor resources (Resource Dashboard). Hierarchy of queues: cluster resources are divided into a hierarchy of queues, and all queue state is visible. Common View: a common, cross-platform view of resources. Resource Centric: focus on resources, total/available/used, and all
  • 26. APPLICATION MANAGEMENT: track applications in a consistent fashion. Application-oriented GUI to manage workloads (instead of individual pods in K8s). The entire application lifecycle is visible and trackable.
  • 27. ARCHITECTURE [diagram labels: YuniKorn Core; Scheduler Interface; Scheduler Shim; Master Node; Slave nodes; Kubernetes: api-server, etcd, kubelets; YARN: Resource Manager, Node Managers; gRPC/API: resource requests, new application, node updates; gRPC/API: container allocation, preemption; Client API: allocate, release container; workloads: MR, Spark, Flink, Tez, MySQL, Spark, Web Server, Kafka]
  • 28. YuniKorn SCHEDULER vs. OTHERS: key capabilities of a resource scheduler from our perspective. Disclosure: this table is summarized based on the speakers’ analysis.

    | Scheduler | Sharing: hierarchy queues | Sharing: queue priority | Sharing: queue elastic capacity | Fairness: cross queue | Fairness: user level | Fairness: app level | Preemption: basic | Preemption: with fairness | Preemption: with priority | Throughput | Gang scheduling |
    | Kube-default | x | x | x | x | x | x | v | x | v | 100+ allocs/s | x |
    | Kube-batch | x | x | x | x | x | v | v | x | v | ? | v |
    | YARN CS/FS | v | v | v | v | v | v | v | v | v | 4k+ allocs/s | x |
    | YuniKorn | v | x (wip) | v | v | v | v | v | v | x (wip) | ? | x (wip) |
  • 30. DEMO: Let’s explore what YuniKorn can do for Kubernetes! Here we go... ● YuniKorn is deployed in ONE pod ● YuniKorn core only talks to K8shim via the scheduler-interface; K8shim talks to the api-server [diagram labels: Master; yunikorn-pod; YuniKorn Core; K8shim; YuniKorn UI; API Server; etcd; Kubelet]
  • 32. TAKE AWAY (BEFORE → AFTER). Scenario 1: K8s cluster on cloud/prem with existing K8s schedulers → K8s cluster on cloud/prem with YuniKorn scheduler. Scenario 2: K8s cluster (existing K8s schedulers) plus YARN cluster (Capacity Scheduler, Fair Scheduler) → K8s cluster (YuniKorn Scheduler) plus YARN cluster. Scenario 3: YARN cluster (Capacity Scheduler, Fair Scheduler) → YARN cluster (YuniKorn scheduler).
  • 33. OPEN SOURCE: Yes, we are going to open source soon. STAY TUNED...
  • 34. ACKNOWLEDGMENTS: A big shout-out to the folks who helped design, develop and make this project possible: Vinod Kumar Vavilapalli, Wangda Tan, Wilfred Spiegelenburg, Akhil PB, Suma Shivaprasad, and many others...
  • 35. THANK YOU
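
Slides 23 and 24 describe hierarchical queues with a min-max (elastic) capacity, quotas, priorities and fair sharing. The Python sketch below is one way those ideas could fit together. It is not YuniKorn code: the `Queue` class, its fields, `pick_next`, and the rule "highest priority first, then lowest used/guaranteed ratio" are all assumptions made for illustration, resources are a single integer, and preemption and placement constraints are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Queue:
    """Illustrative queue node (hypothetical; not YuniKorn's data model)."""
    name: str
    guaranteed: int    # min capacity in arbitrary resource units, must be > 0
    maximum: int       # max capacity: the elastic upper bound (quota)
    priority: int = 0  # assumption: a higher value is served first
    used: int = 0
    parent: Optional["Queue"] = field(default=None, repr=False)
    children: List["Queue"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Queue") -> "Queue":
        child.parent = self
        self.children.append(child)
        return child

    def headroom(self) -> int:
        """Units this queue can still take without exceeding any ancestor's max."""
        room = self.maximum - self.used
        if self.parent is not None:
            room = min(room, self.parent.headroom())
        return max(room, 0)

    def allocate(self, amount: int) -> bool:
        """Charge `amount` to this queue and every ancestor, if the quotas allow."""
        if amount > self.headroom():
            return False
        node = self
        while node is not None:
            node.used += amount
            node = node.parent
        return True


def pick_next(queues: List[Queue], amount: int) -> Optional[Queue]:
    """Pick the queue to serve: highest priority, then furthest below its share."""
    eligible = [q for q in queues if q.headroom() >= amount]
    if not eligible:
        return None
    return min(eligible, key=lambda q: (-q.priority, q.used / q.guaranteed))


# Two-level hierarchy: root -> batch, services.
root = Queue("root", guaranteed=100, maximum=100)
batch = root.add_child(Queue("batch", guaranteed=60, maximum=100))
services = root.add_child(Queue("services", guaranteed=40, maximum=50))

batch.allocate(30)     # batch is at 30/60 of its guarantee
services.allocate(30)  # services is at 30/40 of its guarantee
chosen = pick_next([batch, services], amount=10)
print(chosen.name)     # prints "batch": same priority, lower usage ratio
```

Charging usage to every ancestor is what makes a parent's maximum act as a cap on the whole subtree. Here `services` can borrow beyond its guarantee of 40 but never beyond its own maximum of 50 or whatever the root has left.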