SlideShare a Scribd company logo
1 of 37
Download to read offline
Towards the True Elasticity
of Spark
Michael Le and Min Li
IBM,	
  T.J.	
  Watson	
  Research	
  Center	
  
Auto-Scaling
•  Detects changes of resource usage in current workloads →
Dynamically allocate/de-allocate resources
•  Meets SLA requirements at reduced cost
•  Existing auto-scaling approaches react slowly and often miss
optimization opportunities
•  YARN and Mesos have initial auto-scaling support, yet how
workloads can benefit from the capability?
Infrastructure as a Service"
Resource Manager (YARN/Mesos)"
Main Focus
•  Analyze how scaling affects Spark workloads
–  Is simply adding new resources sufficient for performance
improvement?
•  Analyze pros/cons of current Spark auto-scaling mechanism
–  Are there rooms for performance improvement?
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Experimental Setup
•  Baseline setup (6 nodes):
•  Node: 4 CPUs, 8GB, 100 GB HDDs
•  YARN executor: 3GB, 1CPU
•  Mesos executor: 6GB and 4CPUs
•  4 benchmarks (SparkBench*):
–  Kmeans (input 37GB)
–  Page Rank (input 1.1GB)
–  Spark SQL: SQL queries (input 39GB)
–  Logistic Regression (input 47GB)
•  Scaling up: add 4 new nodes “instantaneously”
•  Wait ~45 seconds after benchmark run
•  Scale down – wait 3min after benchmark run
*	
  h6ps://bitbucket.org/lm0926/sparkbench	
  
Mechanisms for Scaling
•  Scale up – add new node “instantaneously” (VM already
provisioned)
–  Run YARN node manager or Mesos slave daemon
–  New nodes are Task nodes (no HDFS component)
–  Scale down – kill resource manager processes
•  Spark on YARN – set total executors for app higher than
available in cluster to ensure executors get launched
•  Spark on Mesos – make use of new resource offers from
Mesos
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Runtime – Spark on YARN
Benchmark! Runtime
(baseline)!
Runtime !
(scale out)!
Runtime
Reduction!
Kmeans" 54min 44sec" 29min 26sec" 46%"
Page Rank" 9min 28sec" 8min 18sec" 12.3%"
Spark SQL queries" 15min 45sec" 13min 11sec" 16.3%"
LogisticRegression" 13min 10sec" 12min 55sec" 2%"
Similar behavior seen on Mesos
Why the variation?
•  Delay scheduling preventing tasks to be scheduled on
new node
Benchmark! Total
tasks!
% tasks locality
preference
RACK or ANY!
% tasks wait >3s
to be scheduled!
% tasks on new
nodes!
KMeans" 7800" 0.4%" 3.5%" 24.5%"
Page Rank" 21600" 0%" 0.01%" 13.5%"
Spark SQL" 9805" 3%" 0.2%" 11.8%"
LogisticRegression" 6637" 0%" 0.1%" 1.5%"
Tuning Spark for Scaling
•  locality wait time
–  How soon to change locality preference of tasks in
stage
•  resource revive interval
–  How soon to inform scheduler a resource that has not
been used is still available
Runtime – Tuning Spark
Benchmark! Runtime
(baseline)!
Runtime
(scale out)!
Runtime (scale
out– revive
interval 100ms)!
Runtime (scale
out– locality wait
time 100ms)!
Runtime (scale
out–locality wait
time 0ms)!
KMeans" 54min 44sec" 29min 26sec" 30min 39sec" 11min 32sec" 14min 1sec"
Page Rank" 9min 28sec" 8min 18sec" 8min 35sec" 7min 35sec" 5min 57sec"
Spark SQL queries" 15min 45sec" 13min 11sec" 12min 55sec" 11min 40sec" 19min 57sec"
LogisticRegression" 13min 10sec" 12min 55sec" 12min 43sec" 6min 54sec" 9min 12sec"
Benchmark! % tasks on new
node (locality wait
time 3s)!
% tasks on new node
(locality wait time
100ms)!
% tasks on new node
(locality wait time
0ms)!
KMeans" 24.5%" 39.1%" 38%"
Page Rank" 13.5%" 16.1%" 39%"
Spark SQL queries" 11.8%" 22.6%" 39%"
LogisticRegression" 1.5%" 38%" 39%"
KMeans - CPU Utilization Per Node
Scale out (locality wait - 100ms)
1 of the 4 new nodes
Base line (1 of the 6 base nodes) Scale out (locality wait - 100ms)
1 of the 6 base nodes
Scale out (locality wait - 0ms)
1 of the 6 base nodes
Scale out (locality wait - 0ms)
1 of the 4 new nodes
KMeans - Network Utilization
10 MB shuffle read
12 MB shuffle write
Baseline
10 MB shuffle read
11 MB shuffle write
Scale out (locality wait - 100ms)
Greater bandwidth consumption due to transferring of RDD partitions
Scale out (locality wait - 0ms)
10 MB shuffle read
12 MB shuffle write
KMeans - Memory Utilization
Baseline KMeans – scale out (locality wait - 100ms)
KMeans – scale out (locality wait - 0ms)
PageRank - CPU Utilization
Scale out (locality wait - 100ms)
1 of the 4 new nodes
Base line (1 of the 6 nodes) Scale out (locality wait - 100ms)
1 of the 6 nodes
Scale out (locality wait - 0ms)
1 of the 4 new nodes
Scale out (locality wait - 0ms)
1 of the 6 nodes
PageRank - Network Utilization
15.50 GB shuffle read
16.71 GB shuffle write
Baseline
16.07 GB shuffle read
16.71 GB shuffle write
Scale out (locality wait - 100ms)
Greater bandwidth consumption due to transferring of RDD partitions
16.10 GB shuffle read
16.73 GB shuffle write
Scale out (locality wait - 0ms)
PageRank - Memory Utilization
Baseline Scale out (locality wait - 100ms)
Scale out (locality wait - 0ms)
SQL – Network Utilization
Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
LogisticRegression – Network Utilization
Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
Take-aways
•  Locality wait time is key to improvement
– Tune during runtime?
– Adjust during scaling to force use of new
nodes
•  Need to consider gains from running task
on new nodes vs. network bandwidth used
Scaling Down
Benchmark" Runtime – Baseline
(10 nodes)"
Runtime – "
Scale in"
KMeans" 11min 54sec" 49min 13sec"
PageRank" 11min 41sec" 13min 12sec"
Spark SQL" 12min 50sec" 15min 52sec"
LogisticRegression" 10min 20sec" 14min 7sec"
Re-execution overhead worst for some workloads
Scaling Down with Mesos/YARN
•  Prevent Spark from scheduling more tasks on
nodes that are selected to removed
–  Mesos fine-grained, simply not offer resource
from node selected for removal
–  Mesos coarse-grained and YARN, requires
cooperation from Spark to not schedule new
tasks on nodes selected for removal
•  Shutdown node once tasks are drained
–  What to do about stored shuffle data?
Agenda
•  Introduction
•  Background
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Existing Auto-scaling Mechanism
•  Dynamic Executor Allocation (DEA)
–  Works only with YARN
–  Spark request new executors after a fixed time interval
when there are still pending tasks
–  Number of requested executors grow exponentially
•  Works but does have some potential inefficiency
Improving Auto-scaling
•  Main reason tasks are not scheduled on new node
is data locality preferences
–  Delay scheduling preventing tasks from being
run on new node
•  Approach:
–  Change locality wait time dynamically during
application runtime
–  Ideally, reduce locality wait time at point of scale
out, then after stabilizes, revert to previous
locality wait time value
Improving Auto-scaling Details
•  Ideally: tasks spread evenly among all executors
–  average # of tasks per executor:
•  T = Total tasks / # executors
•  ti = # of tasks per executori
•  If ti < alpha*T, for s seconds, then change locality
wait time
•  If ti is still below threshold after r seconds from first
change of locality time, then remove executor
•  If no executors below threshold, reset locality wait
time to initial value
Dynamically Adjusting Locality Wait Time at
Runtime with Dynamic Allocator Execution
Benchmark! Runtime 

(scale out w/
DEA)!
!
% tasks on new
node (locality
wait time 3s)!
Runtime!
(scale out w/
DEA and runtime
adjustment of
locality wait time)!
% tasks on new
node (dynamic
locality wait time)!
KMeans" 32min 15sec" 28.6%" 20min 35sec" 35%"
PageRank" 12min 2sec" 12.9%" 11min 23sec" 14%"
Spark SQL" 12min 48sec" 11.5%" 12min 26sec" 16%"
Logistic

Regression"
11min 58sec" 0.7%" 12min 16sec" 3.3%"
Parameters: alpha = 30%, s = 5sec, new locality wait time = 100ms
Mechanism helps at increasing % tasks on new nodes
Simple Auto-scaling Improvements
Existing
Mechanism
Description Drawbacks Proposal
When to scale? Backlogged tasks
exist for n secs
Request CPU
resources when
Tasks might be I/O
bound
CPU/memory of more than 50%
nodes are greater than a threshold t
(e.g., 98%) over n seconds
How many more
VMs?
Increase
exponentially number
of executors until
configured max or #
of pending tasks
Initially under request n% of the task queue length
Whether to scale
(still beneficial)?
Always scale if above
condition met
Scaling unnecessary
if near end or short
runtime
Option1: According to the ratio of
unprocessed data: e.g., < 80%.
Rough estimation of job execution
time: proportional to the data
process rate
Option2: Model driven – predict
runtime based on previous runs
Agenda
•  Introduction
•  Evaluation setup
•  Impact of scaling
•  Auto-scaling in Spark
•  Lessons and future work
Lessons
•  Naïve scaling can help but effectiveness varies greatly
across different workloads
•  Why some workloads do not respond well to scaling?
–  Delay scheduling preventing new nodes to be utilized
•  Dynamic executor allocation works but can be improved
–  Dynamically changing locality wait time can be
effective
•  Overhead of transferring RDD partitions can reduce
benefit of scaling
Future Work
•  Study scaling effects given multiple
simultaneous workloads
•  Implement better support for scaling down
•  Enhance DEA to make use of resource
monitors and job runtime prediction
Backup
Delay Scheduling
•  Intended for fixed-size clusters running
multiple workloads with short tasks
•  Emphasis on scheduling task on nodes
containing data
– Wait short time for resources to free up
on nodes containing data rather than run
task on node available now but further
away from data
Resource Managers + Auto-scaling
•  Resource management
→ different frameworks can coexist
→ high resource utilization
•  Facilitates on-demand resource allocation – support
elastic services
Infrastructure as a Service"
Resource Manager (YARN/
Mesos)"
Brief Intro: YARN
•  Resource Manager (RM) controls resource allocation
•  Application Master (AM) negotiates with RM for
resources and launch executors to run jobs
Brief Intro: Mesos
•  Framework schedulers accept or
reject offered resource
•  Resource preferences are
communicated to Mesos thru common
APIs
•  Coarse-grained mode
–  Mesos launches one long-running Spark
executor on each node to execute all Spark
tasks
•  Fine-grained mode
–  Mesos launches executor for each Spark
task
Runtime – Spark on Mesos
Benchmark! Runtime (baseline)! Runtime (scale out)!
Fine-grained" KMeans" 92min 54sec" 35min 37sec"
Page Rank" 19min 29sec" 16min 55 sec"
Spark SQL queries" 24min 45sec" 19min 7sec"
LogisticRegression" 14min 43sec" 14min 39sec"
Coarse-grained" KMeans" 89min 24sec" 11min 49sec"
Page Rank" 8min 29sec" 7min 41sec"
Spark SQL queries" 10min 32sec" 8min 30sec"
LogisticRegression" 10min 57sec" 7min 56sec"

More Related Content

What's hot

What's hot (20)

No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
 
Processing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And ToilProcessing 70Tb Of Genomics Data With ADAM And Toil
Processing 70Tb Of Genomics Data With ADAM And Toil
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
SSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine LearningSSR: Structured Streaming for R and Machine Learning
SSR: Structured Streaming for R and Machine Learning
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min ShenRandom Walks on Large Scale Graphs with Apache Spark with Min Shen
Random Walks on Large Scale Graphs with Apache Spark with Min Shen
 
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
 
Spark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer AgarwalSpark Summit EU talk by Sameer Agarwal
Spark Summit EU talk by Sameer Agarwal
 
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service700 Updatable Queries Per Second: Spark as a Real-Time Web Service
700 Updatable Queries Per Second: Spark as a Real-Time Web Service
 
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
A Spark Framework For &lt; $100, &lt; 1 Hour, Accurate Personalized DNA Analy...
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
Using SparkML to Power a DSaaS (Data Science as a Service) with Kiran Muglurm...
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
Extending the R API for Spark with sparklyr and Microsoft R Server with Ali Z...
 
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon OuelletteTime Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
 
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
Scaling Apache Spark MLlib to Billions of Parameters: Spark Summit East talk ...
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on Databricks
 

Viewers also liked

Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легенды
Ninel Kek
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочка
Ninel Kek
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRS
Jonathan Oliver
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4
HomichAlla
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
Apache Apex
 

Viewers also liked (20)

Цветочные легенды
Цветочные легендыЦветочные легенды
Цветочные легенды
 
Римский корсаков снегурочка
Римский корсаков снегурочкаРимский корсаков снегурочка
Римский корсаков снегурочка
 
High Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRSHigh Performance Distributed Systems with CQRS
High Performance Distributed Systems with CQRS
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
правописание приставок урок№4
правописание приставок урок№4правописание приставок урок№4
правописание приставок урок№4
 
бсп (обоб. урок)
бсп (обоб. урок)бсп (обоб. урок)
бсп (обоб. урок)
 
Troubleshooting mysql-tutorial
Troubleshooting mysql-tutorialTroubleshooting mysql-tutorial
Troubleshooting mysql-tutorial
 
Windowing in Apache Apex
Windowing in Apache ApexWindowing in Apache Apex
Windowing in Apache Apex
 
The 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy CodeThe 5 People in your Organization that grow Legacy Code
The 5 People in your Organization that grow Legacy Code
 
Hadoop File System Shell Commands,
Hadoop File System Shell Commands,Hadoop File System Shell Commands,
Hadoop File System Shell Commands,
 
Hadoop basic commands
Hadoop basic commandsHadoop basic commands
Hadoop basic commands
 
Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application  Introduction to Apache Apex and writing a big data streaming application
Introduction to Apache Apex and writing a big data streaming application
 
Build your shiny new pc, with Pangoly
Build your shiny new pc, with PangolyBuild your shiny new pc, with Pangoly
Build your shiny new pc, with Pangoly
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examplesIntroduction to UNIX Command-Lines with examples
Introduction to UNIX Command-Lines with examples
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 

Similar to Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)

Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
Oh Chan Kwon
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
Databricks
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Databricks
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Databricks
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Databricks
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
DataWorks Summit
 

Similar to Towards True Elasticity of Spark-(Michael Le and Min Li, IBM) (20)

Spark cep
Spark cepSpark cep
Spark cep
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
 
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov... Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Lov...
 
Spark on Mesos
Spark on MesosSpark on Mesos
Spark on Mesos
 
Tackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedInTackling Scaling Challenges of Apache Spark at LinkedIn
Tackling Scaling Challenges of Apache Spark at LinkedIn
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence SpracklenSpark Autotuning: Spark Summit East talk by Lawrence Spracklen
Spark Autotuning: Spark Summit East talk by Lawrence Spracklen
 
Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017 Spark Autotuning - Spark Summit East 2017
Spark Autotuning - Spark Summit East 2017
 
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark ClustersDownscaling: The Achilles heel of Autoscaling Apache Spark Clusters
Downscaling: The Achilles heel of Autoscaling Apache Spark Clusters
 
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedInMagnet Shuffle Service: Push-based Shuffle at LinkedIn
Magnet Shuffle Service: Push-based Shuffle at LinkedIn
 
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan ZhangExperiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
Experiences Migrating Hive Workload to SparkSQL with Jie Xiong and Zhan Zhang
 
Spark Overview and Performance Issues
Spark Overview and Performance IssuesSpark Overview and Performance Issues
Spark Overview and Performance Issues
 
Breaking data
Breaking dataBreaking data
Breaking data
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Recently uploaded (20)

Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 

Towards True Elasticity of Spark-(Michael Le and Min Li, IBM)

  • 1. Towards the True Elasticity of Spark Michael Le and Min Li IBM,  T.J.  Watson  Research  Center  
  • 2. Auto-Scaling •  Detects changes of resource usage in current workloads → Dynamically allocate/de-allocate resources •  Meets SLA requirements at reduced cost •  Existing auto-scaling approaches react slowly and often miss optimization opportunities •  YARN and Mesos have initial auto-scaling support, yet how workloads can benefit from the capability? Infrastructure as a Service" Resource Manager (YARN/Mesos)"
  • 3. Main Focus •  Analyze how scaling affects Spark workloads –  Is simply adding new resources sufficient for performance improvement? •  Analyze pros/cons of current Spark auto-scaling mechanism –  Are there rooms for performance improvement?
  • 4. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 5. Experimental Setup •  Baseline setup (6 nodes): •  Node: 4 CPUs, 8GB, 100 GB HDDs •  YARN executor: 3GB, 1CPU •  Mesos executor: 6GB and 4CPUs •  4 benchmarks (SparkBench*): –  Kmeans (input 37GB) –  Page Rank (input 1.1GB) –  Spark SQL: SQL queries (input 39GB) –  Logistic Regression (input 47GB) •  Scaling up: add 4 new nodes “instantaneously” •  Wait ~45 seconds after benchmark run •  Scale down – wait 3min after benchmark run *  h6ps://bitbucket.org/lm0926/sparkbench  
  • 6. Mechanisms for Scaling •  Scale up – add new node “instantaneously” (VM already provisioned) –  Run YARN node manager or Mesos slave daemon –  New nodes are Task nodes (no HDFS component) –  Scale down – kill resource manager processes •  Spark on YARN – set total executors for app higher than available in cluster to ensure executors get launched •  Spark on Mesos – make use of new resource offers from Mesos
  • 7. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 8. Runtime – Spark on YARN Benchmark! Runtime (baseline)! Runtime ! (scale out)! Runtime Reduction! Kmeans" 54min 44sec" 29min 26sec" 46%" Page Rank" 9min 28sec" 8min 18sec" 12.3%" Spark SQL queries" 15min 45sec" 13min 11sec" 16.3%" LogisticRegression" 13min 10sec" 12min 55sec" 2%" Similar behavior seen on Mesos
  • 9. Why the variation? •  Delay scheduling preventing tasks to be scheduled on new node Benchmark! Total tasks! % tasks locality preference RACK or ANY! % tasks wait >3s to be scheduled! % tasks on new nodes! KMeans" 7800" 0.4%" 3.5%" 24.5%" Page Rank" 21600" 0%" 0.01%" 13.5%" Spark SQL" 9805" 3%" 0.2%" 11.8%" LogisticRegression" 6637" 0%" 0.1%" 1.5%"
  • 10. Tuning Spark for Scaling •  locality wait time –  How soon to change locality preference of tasks in stage •  resource revive interval –  How soon to inform scheduler a resource that has not been used is still available
  • 11. Runtime – Tuning Spark Benchmark! Runtime (baseline)! Runtime (scale out)! Runtime (scale out– revive interval 100ms)! Runtime (scale out– locality wait time 100ms)! Runtime (scale out–locality wait time 0ms)! KMeans" 54min 44sec" 29min 26sec" 30min 39sec" 11min 32sec" 14min 1sec" Page Rank" 9min 28sec" 8min 18sec" 8min 35sec" 7min 35sec" 5min 57sec" Spark SQL queries" 15min 45sec" 13min 11sec" 12min 55sec" 11min 40sec" 19min 57sec" LogisticRegression" 13min 10sec" 12min 55sec" 12min 43sec" 6min 54sec" 9min 12sec" Benchmark! % tasks on new node (locality wait time 3s)! % tasks on new node (locality wait time 100ms)! % tasks on new node (locality wait time 0ms)! KMeans" 24.5%" 39.1%" 38%" Page Rank" 13.5%" 16.1%" 39%" Spark SQL queries" 11.8%" 22.6%" 39%" LogisticRegression" 1.5%" 38%" 39%"
  • 12. KMeans - CPU Utilization Per Node Scale out (locality wait - 100ms) 1 of the 4 new nodes Base line (1 of the 6 base nodes) Scale out (locality wait - 100ms) 1 of the 6 base nodes Scale out (locality wait - 0ms) 1 of the 6 base nodes Scale out (locality wait - 0ms) 1 of the 4 new nodes
  • 13. KMeans - Network Utilization 10 MB shuffle read 12 MB shuffle write Baseline 10 MB shuffle read 11 MB shuffle write Scale out (locality wait - 100ms) Greater bandwidth consumption due to transferring of RDD partitions Scale out (locality wait - 0ms) 10 MB shuffle read 12 MB shuffle write
  • 14. KMeans - Memory Utilization Baseline KMeans – scale out (locality wait - 100ms) KMeans – scale out (locality wait - 0ms)
  • 15. PageRank - CPU Utilization Scale out (locality wait - 100ms) 1 of the 4 new nodes Base line (1 of the 6 nodes) Scale out (locality wait - 100ms) 1 of the 6 nodes Scale out (locality wait - 0ms) 1 of the 4 new nodes Scale out (locality wait - 0ms) 1 of the 6 nodes
  • 16. PageRank - Network Utilization 15.50 GB shuffle read 16.71 GB shuffle write Baseline 16.07 GB shuffle read 16.71 GB shuffle write Scale out (locality wait - 100ms) Greater bandwidth consumption due to transferring of RDD partitions 16.10 GB shuffle read 16.73 GB shuffle write Scale out (locality wait - 0ms)
  • 17. PageRank - Memory Utilization Baseline Scale out (locality wait - 100ms) Scale out (locality wait - 0ms)
  • 18. SQL – Network Utilization Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
  • 19. LogisticRegression – Network Utilization Scale out (locality wait - 0ms)Scale out (locality wait - 100ms)
  • 20. Take-aways •  Locality wait time is key to improvement – Tune during runtime? – Adjust during scaling to force use of new nodes •  Need to consider gains from running task on new nodes vs. network bandwidth used
  • 21. Scaling Down Benchmark" Runtime – Baseline (10 nodes)" Runtime – " Scale in" KMeans" 11min 54sec" 49min 13sec" PageRank" 11min 41sec" 13min 12sec" Spark SQL" 12min 50sec" 15min 52sec" LogisticRegression" 10min 20sec" 14min 7sec" Re-execution overhead worst for some workloads
  • 22. Scaling Down with Mesos/YARN •  Prevent Spark from scheduling more tasks on nodes that are selected to removed –  Mesos fine-grained, simply not offer resource from node selected for removal –  Mesos coarse-grained and YARN, requires cooperation from Spark to not schedule new tasks on nodes selected for removal •  Shutdown node once tasks are drained –  What to do about stored shuffle data?
  • 23. Agenda •  Introduction •  Background •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 24. Existing Auto-scaling Mechanism •  Dynamic Executor Allocation (DEA) –  Works only with YARN –  Spark request new executors after a fixed time interval when there are still pending tasks –  Number of requested executors grow exponentially •  Works but does have some potential inefficiency
  • 25. Improving Auto-scaling •  Main reason tasks are not scheduled on new node is data locality preferences –  Delay scheduling preventing tasks from being run on new node •  Approach: –  Change locality wait time dynamically during application runtime –  Ideally, reduce locality wait time at point of scale out, then after stabilizes, revert to previous locality wait time value
  • 26. Improving Auto-scaling Details •  Ideally: tasks spread evenly among all executors –  average # of tasks per executor: •  T = Total tasks / # executors •  ti = # of tasks per executori •  If ti < alpha*T, for s seconds, then change locality wait time •  If ti is still below threshold after r seconds from first change of locality time, then remove executor •  If no executors below threshold, reset locality wait time to initial value
  • 27. Dynamically Adjusting Locality Wait Time at Runtime with Dynamic Allocator Execution Benchmark! Runtime 
 (scale out w/ DEA)! ! % tasks on new node (locality wait time 3s)! Runtime! (scale out w/ DEA and runtime adjustment of locality wait time)! % tasks on new node (dynamic locality wait time)! KMeans" 32min 15sec" 28.6%" 20min 35sec" 35%" PageRank" 12min 2sec" 12.9%" 11min 23sec" 14%" Spark SQL" 12min 48sec" 11.5%" 12min 26sec" 16%" Logistic
 Regression" 11min 58sec" 0.7%" 12min 16sec" 3.3%" Parameters: alpha = 30%, s = 5sec, new locality wait time = 100ms Mechanism helps at increasing % tasks on new nodes
  • 28. Simple Auto-scaling Improvements Existing Mechanism Description Drawbacks Proposal When to scale? Backlogged tasks exist for n secs Request CPU resources when Tasks might be I/O bound CPU/memory of more than 50% nodes are greater than a threshold t (e.g., 98%) over n seconds How many more VMs? Increase exponentially number of executors until configured max or # of pending tasks Initially under request n% of the task queue length Whether to scale (still beneficial)? Always scale if above condition met Scaling unnecessary if near end or short runtime Option1: According to the ratio of unprocessed data: e.g., < 80%. Rough estimation of job execution time: proportional to the data process rate Option2: Model driven – predict runtime based on previous runs
  • 29. Agenda •  Introduction •  Evaluation setup •  Impact of scaling •  Auto-scaling in Spark •  Lessons and future work
  • 30. Lessons •  Naïve scaling can help but effectiveness varies greatly across different workloads •  Why some workloads do not respond well to scaling? –  Delay scheduling preventing new nodes to be utilized •  Dynamic executor allocation works but can be improved –  Dynamically changing locality wait time can be effective •  Overhead of transferring RDD partitions can reduce benefit of scaling
  • 31. Future Work •  Study scaling effects given multiple simultaneous workloads •  Implement better support for scaling down •  Enhance DEA to make use of resource monitors and job runtime prediction
  • 33. Delay Scheduling •  Intended for fixed-size clusters running multiple workloads with short tasks •  Emphasis on scheduling task on nodes containing data – Wait short time for resources to free up on nodes containing data rather than run task on node available now but further away from data
  • 34. Resource Managers + Auto-scaling •  Resource management → different frameworks can coexist → high resource utilization •  Facilitates on-demand resource allocation – support elastic services Infrastructure as a Service" Resource Manager (YARN/ Mesos)"
  • 35. Brief Intro: YARN •  Resource Manager (RM) controls resource allocation •  Application Master (AM) negotiates with RM for resources and launch executors to run jobs
  • 36. Brief Intro: Mesos •  Framework schedulers accept or reject offered resource •  Resource preferences are communicated to Mesos thru common APIs •  Coarse-grained mode –  Mesos launches one long-running Spark executor on each node to execute all Spark tasks •  Fine-grained mode –  Mesos launches executor for each Spark task
  • 37. Runtime – Spark on Mesos Benchmark! Runtime (baseline)! Runtime (scale out)! Fine-grained" KMeans" 92min 54sec" 35min 37sec" Page Rank" 19min 29sec" 16min 55 sec" Spark SQL queries" 24min 45sec" 19min 7sec" LogisticRegression" 14min 43sec" 14min 39sec" Coarse-grained" KMeans" 89min 24sec" 11min 49sec" Page Rank" 8min 29sec" 7min 41sec" Spark SQL queries" 10min 32sec" 8min 30sec" LogisticRegression" 10min 57sec" 7min 56sec"