SCALE | SIMPLIFY | OPTIMIZE | EVOLVE
4/15/2016 TidalScale Proprietary & Confidential 1
Comparing a Virtual Supercomputer with a Cluster for Spark In-Memory Computations
Ike Nassi
Ike.nassi@tidalscale.com
Why Run Spark?
Spark originated as an in-memory alternative to Hadoop
Run huge analytics on clusters of commodity servers
Enjoy the hardware economy of “scale-out”
Apply a rich set of transformations and actions
Operate in memory as much as possible
Today’s Conundrum:
Scale Up vs. Scale Out?
Criterion | Scale Up | Scale Out
Software Simplicity | ✔ | ✗
HW Cost | ✗ | ✔
TidalScale – The Best of Both
Software Simplicity: ✔
HW Cost: ✔
Easy to say, but this is a ridiculously difficult problem!
Key takeaways
Simplicity of Scale up:
• We allow the simplicity of scale-up – you can run multi-terabyte analytics on a single Spark node.
Scale out “under the hood”
• We offer a new class of virtual supercomputers to host Spark – we hide the complexity of scale-out “under the hood”.
Traditional Spark in two layers
Programming Paradigm
RDD – Resilient Distributed Dataset / DataFrame
Parallel in-memory execution
Lazy, repeatable evaluation thanks to “wide dependencies”
Rich set of operators beyond just Map-Reduce
Implementation Plumbing
Clusters – standalone, Mesos, YARN
Data – HDFS, DataFrames
Memory management
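The "lazy, repeatable evaluation" idea in the programming-paradigm layer can be illustrated without Spark. The sketch below is a toy stand-in, not Spark's RDD API: the class name LazyDataset and its methods are hypothetical and exist only for this illustration. It records each transformation as lineage and computes nothing until an action (collect) is called; calling the action again recomputes the result from the same lineage.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    Transformations only record how to compute the data; the work
    happens when an action such as collect() is called.
    """

    def __init__(self, compute, parent=None, op="source"):
        self._compute = compute   # zero-argument callable producing an iterable
        self.parent = parent      # previous dataset in the lineage, if any
        self.op = op              # name of the operation that created this dataset

    def map(self, fn):
        # Transformation: wrap the parent's computation, do not run it.
        return LazyDataset(lambda: (fn(x) for x in self._compute()), self, "map")

    def filter(self, pred):
        # Transformation: also recorded lazily.
        return LazyDataset(lambda: (x for x in self._compute() if pred(x)), self, "filter")

    def collect(self):
        # Action: replays the whole lineage and materializes the result.
        return list(self._compute())

    def lineage(self):
        # Walk back through the parents to list the recorded operations.
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))


evaluated = []  # records every element the map function actually touches


def traced_square(x):
    evaluated.append(x)
    return x * x


numbers = LazyDataset(lambda: range(1, 6))
even_squares = numbers.map(traced_square).filter(lambda v: v % 2 == 0)

evaluated_before_action = len(evaluated)  # still 0: nothing has run yet
first_run = even_squares.collect()        # triggers the computation
second_run = even_squares.collect()       # recomputed from the same lineage
print(even_squares.lineage(), first_run, second_run)
```

Because each dataset keeps a reference to its parent and the operation that produced it, a lost result can in principle be rebuilt by replaying the recorded steps, which is the property the slide attributes to the RDD model.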
Alternative Spark in two layers
Programming Paradigm
RDD – Resilient Distributed Dataset / DataFrame
Parallel in-memory execution
Lazy, repeatable evaluation thanks to “wide dependencies”
Rich set of operators beyond just Map-Reduce
TidalScale as alternate plumbing!
Today’s Spark cluster with multiple nodes
[Diagram: a manager node runs the Spark application and cluster manager on its own operating system and hardware; three worker nodes each run an executor on their own OS and hardware.]
Virtual Supercomputer running Spark
[Diagram: the Spark application and cluster manager run on a single operating system, which runs on HyperKernel instances, one per hardware node (HW, HW, HW, …).]
Annotations on the diagram:
• Draws from a pool of processors and JVMs in a single coherent memory space.
• Standard Linux, FreeBSD, Windows
• The OS sees a collection of cores, disks, and networks in a huge address space
A tale of two approaches
Feature | Scale out under the hood | Scale out with worker nodes
Organization | One super-node | Cluster of worker nodes
Cross-connect | 10Gb Ethernet | TCP/IP
Shared variables and shuffle | Across JVMs in one address space | Across distinct nodes
RDD partitioning | See shuffle | See shuffle
Scale out | Add servers “under the hood” | Add servers to the cluster
Scale up | Scale-out creates a bigger computer | None
Reuse | Run any application | Other cluster techs like Hadoop
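The "shuffle" row is easier to picture with a small example. The sketch below is plain Python, not Spark code, and the function name shuffle_by_key is hypothetical. It redistributes (key, value) records by key and counts how many records have to leave the partition they started in. On a cluster of worker nodes those moves cross between distinct machines; in the single-address-space case the table describes, they are moves between JVMs that share memory.

```python
def shuffle_by_key(input_partitions, num_partitions):
    """Redistribute (key, value) records so equal keys share a partition.

    Returns the new partitions and the number of records that had to
    leave the partition they started in. Integer keys are assumed so
    that `key % num_partitions` is a deterministic partitioner.
    """
    output = [[] for _ in range(num_partitions)]
    moved = 0
    for src, partition in enumerate(input_partitions):
        for key, value in partition:
            dst = key % num_partitions
            output[dst].append((key, value))
            if dst != src:
                moved += 1
    return output, moved


# Two input partitions holding four records in total.
before = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")]]
after, moved = shuffle_by_key(before, num_partitions=2)
print(after, moved)
```

In this toy input, two of the four records change partition. The count of moved records is the quantity that the cross-connect has to carry in either organization.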
Experiment Setup
SynthBenchmark benchmark from Apache.org
• git://git.apache.org/spark.git (spark-1.6.1-bin-hadoop2.6.tgz)
• Applies the PageRank algorithm to a generated graph
• Benchmark scaled from 15GB to 150GB by number of vertices
Scale Out Spark Configuration on EC2:
• 1 Master: EC2 r3.2xlarge (8 CPUs, 61G)
• 5 Workers: EC2 r3.xlarge (4 CPUs, 28.5G Spark worker memory each)
• 4 Intel E5 2670 CPUs x 5 servers = 20 CPUs total allocated to Spark
Scale Up Spark Configuration on TidalScale:
• TidalScale TidalPod with 5 nodes
• 20 Intel E5 2643 v3 CPUs allocated to Spark
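For readers unfamiliar with the workload, here is a minimal plain-Python sketch of the PageRank iteration. It is not the SynthBenchmark code and does not use Spark; the function name, the damping factor of 0.85, the iteration count, and the tiny edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, num_iterations=50, damping=0.85):
    """Iterative PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}  # start from a uniform distribution

    for _ in range(num_iterations):
        contribs = {node: 0.0 for node in nodes}
        dangling = 0.0  # rank held by nodes without outgoing edges
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = ranks[node] / len(targets)
                for target in targets:
                    contribs[target] += share
            else:
                dangling += ranks[node]
        # Each node keeps a base share and receives damped contributions;
        # rank from dangling nodes is spread evenly over all nodes.
        ranks = {
            node: (1 - damping) / n + damping * (contribs[node] + dangling / n)
            for node in nodes
        }
    return ranks


# Tiny hypothetical graph: a 3-node cycle plus one node pointing into it.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges)
print(ranks)
```

Every iteration touches every edge, so the memory footprint grows with the size of the graph, which is why the benchmark can be scaled by the number of vertices.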
Experiment Setup
EC2 Setup: Spark Cluster with 1 Driver server & 5 Worker servers (20 worker CPUs)
• Driver server: 8 CPUs, 61G (28G Driver)
• 5 Worker servers: 4 CPUs, 30.5G each (28.5G Worker each)
TidalScale Setup: Spark Standalone Mode, 20 worker CPUs (Guest: 28cpu 225GB)
• Single 140G Worker (20 CPUs)
• 40G Driver
• 5 HyperKernel nodes
Experiment Setup
Setting | A: EC2 Small | B: EC2 Big | C: TidalScale Big | D: TidalScale Bigger
Cluster Configuration | 10 nodes | 5 nodes | 1 TidalPod | 1 TidalPod
Edge partitions * | 10 | 10 | 10 | 20
Spark.worker.instances | 10 | 5 | 1 | 1
Spark.worker.cores | 20 | 20 | 20 | 20
Spark Memory per Node | 10GB | 28GB | 140GB | 300GB
Total Spark Memory | 100GB | 140GB | 140GB | 300GB
* Note: The number of edge partitions in this example has been set to a fixed constant across all workload sizes for illustrative purposes. Normal practice is to vary the number of edge partitions based on workload size.
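The "Total Spark Memory" row appears to be the number of worker instances multiplied by the Spark memory per node. The short check below restates the figures from the configuration table and verifies that relationship; the dictionary layout and field names are my own and are not Spark settings.

```python
# Figures copied from the configuration table (memory in GB).
configs = {
    "A: EC2 Small":         {"instances": 10, "memory_per_node_gb": 10,  "total_gb": 100},
    "B: EC2 Big":           {"instances": 5,  "memory_per_node_gb": 28,  "total_gb": 140},
    "C: TidalScale Big":    {"instances": 1,  "memory_per_node_gb": 140, "total_gb": 140},
    "D: TidalScale Bigger": {"instances": 1,  "memory_per_node_gb": 300, "total_gb": 300},
}


def total_memory_gb(config):
    """Total Spark memory as instances times memory per node."""
    return config["instances"] * config["memory_per_node_gb"]


computed = {name: total_memory_gb(cfg) for name, cfg in configs.items()}
for name, cfg in configs.items():
    print(name, computed[name], "GB, table says", cfg["total_gb"], "GB")
```

Configurations B and C hold the same total memory, which makes them the like-for-like pair: the same 140GB is either split across five workers or given to one.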
Experiment Results
[Three result charts; the plotted values did not survive text extraction. The first chart shows configurations A and B, the second adds C (standalone mode), and the third adds D.]
Experiment Observations
Tuning Spark is complex
• We spent most of our time tuning Spark parameters
• We are not sure we’ve tuned optimally for either the EC2 Spark distributed cluster or the TidalScale Spark standalone case, but the parameters were the same in both
Choice of the number of data partitions really matters
• A suboptimal choice can have a 2-3x performance impact
• We used 10 edge partitions for both EC2 and TidalScale configurations
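The slides do not say how the partition count should be chosen, only that it is normally varied with workload size. One possible rule of thumb, shown purely as an illustration, is to keep at least one partition per core and to cap the amount of data per partition. The function name, the 2GB-per-partition target, and the rule itself are assumptions, not values from the experiment.

```python
import math


def suggest_edge_partitions(workload_gb, cores, gb_per_partition=2.0):
    """Hypothetical heuristic for picking a partition count.

    Uses at least one partition per core so no core sits idle, and
    enough partitions that each holds at most `gb_per_partition` GB.
    """
    by_size = math.ceil(workload_gb / gb_per_partition)
    return max(cores, by_size)


# The benchmark was scaled from 15GB to 150GB on 20 worker CPUs.
small = suggest_edge_partitions(15, cores=20)
large = suggest_edge_partitions(150, cores=20)
print(small, large)
```

Under these assumed parameters the suggested count grows with the workload, unlike the fixed 10 partitions used for illustration in the experiment. Any such rule would still need to be validated by measurement, given the 2-3x sensitivity reported above.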
Possible mixed model with multi-terabyte manager
[Diagram: a “Super Manager” (Spark application and cluster manager on one operating system, running on HyperKernel instances across hardware nodes HW, HW, HW, …) alongside three conventional worker nodes, each running an executor on its own OS and hardware.]
Conclusions & Recommendations
Spark standalone on TidalScale performs similarly to a
cluster
Without TidalScale, larger workloads can run out of
memory without careful Spark tuning
We recommend using both scale up and scale out
Key messages – more obvious now?
A new class of virtual supercomputers to host Spark
Run multi-terabyte analytics on a single Spark node
Value Proposition
Scale:
• Aggregates compute resources for large scale in-memory analysis and decision support
• Scales like a cluster using commodity hardware at linear cost
• Allows customers to grow gradually as their needs develop
Simplify:
• Dramatically simplifies application development
• No need to distribute work across servers
• Existing applications run as a single instance, without modification, as if on a highly flexible
mainframe
Optimize:
• Automatic dynamic hierarchical resource optimization
Evolve:
• Applicable to modern and emerging microprocessors, memories, interconnects, persistent storage
& networks
SCALE | SIMPLIFY | OPTIMIZE | EVOLVE
Contact: Ike Nassi
ike.nassi@tidalscale.com

Dr. Ike Nassi, Founder, TidalScale at MLconf NYC - 4/15/16

  • 1. SCALE| SIMPLIFY| OPTIMIZE | EVOLVE 4/15/2016 TidalScale Proprietary & Confidential 1 Comparing a Virtual Supercomputer with a Cluster for Spark in-memory Computations Ike Nassi Ike.nassi@tidalscale.com
  • 2. Why Run Spark?
      • Spark originated as an in-memory alternative to Hadoop
      • Run huge analytics on clusters of commodity servers
      • Enjoy the hardware economy of “scale-out”
      • Apply a rich set of transformations and actions
      • Operate in memory as much as possible
  • 3. Today’s Conundrum: Scale Up vs. Scale Out?
      • Software Simplicity: Scale Up ✔, Scale Out ✗
      • HW Cost: Scale Up ✗, Scale Out ✔
  • 4. TidalScale – The Best of Both
      • Software Simplicity ✔
      • HW Cost ✔
      • Easy to say, but this is a ridiculously difficult problem!
  • 5. Key takeaways
      • Simplicity of scale-up: we allow the simplicity of scale-up, so you can run multi-terabyte analytics on a single Spark node.
      • Scale-out “under the hood”: we offer a new class of virtual supercomputers to host Spark, and we hide the complexity of scale-out “under the hood”.
  • 6. Traditional Spark in two layers
      • Programming Paradigm
          • RDD – Resilient Distributed Dataset / DataFrame
          • Parallel in-memory execution
          • Lazy, repeatable evaluation thanks to “wide dependencies”
          • Rich set of operators beyond just Map-Reduce
      • Implementation Plumbing
          • Clusters – standalone, Mesos, YARN
          • Data – HDFS, DataFrames
          • Memory management
  • 7. Alternative Spark in two layers
      • Programming Paradigm
          • RDD – Resilient Distributed Dataset / DataFrame
          • Parallel in-memory execution
          • Lazy, repeatable evaluation thanks to “wide dependencies”
          • Rich set of operators beyond just Map-Reduce
      • TidalScale as alternate plumbing!
  • 8. Today’s Spark cluster with multiple nodes
      • [Diagram: the Manager node runs the Spark Application and Cluster Manager on its Operating System and Hardware; three Worker nodes each run an Executor on their own OS and HW]
  • 9. Virtual Supercomputer running Spark
      • [Diagram: the Spark Application and Cluster Manager run on a single Operating System, which sits on HyperKernels spanning multiple HW nodes]
      • Draws from a pool of processors and JVMs in a single coherent memory space.
      • Standard Linux, FreeBSD, Windows
      • The OS sees a collection of cores, disks, and networks in a huge address space
  • 10. A tale of two approaches
      Feature | Scale out under the hood | Scale out with worker nodes
      Organization | One super-node | Cluster of worker nodes
      Cross-connect | 10Gb Ethernet | TCP/IP
      Shared variables and shuffle | Across JVMs in one address space | Across distinct nodes
      RDD partitioning | See shuffle | See shuffle
      Scale out | Add servers “under the hood” | Add servers to the cluster
      Scale up | Scale-out creates a bigger computer | None
      Reuse | Run any application | Other cluster techs like Hadoop
  • 11. Experiment Setup
      • SynthBenchmark benchmark from Apache.org
          • git://git.apache.org/spark.git (spark-1.6.1-bin-hadoop2.6.tgz)
          • Applies the PageRank algorithm to a generated graph
          • Benchmark scaled from 15GB to 150GB by number of vertices
      • Scale Out Spark Configuration on EC2:
          • 1 Master: ec2 r3.2xlarge (8 cpus, 61G)
          • 5 Workers: r3.xlarge (4 cpus, 28.5G)
          • 4 Intel E5 2670 CPUs x 5 servers = 20 CPUs total allocated to Spark
      • Scale Up Spark Configuration on TidalScale:
          • TidalScale TidalPod with 5 nodes
          • 20 Intel E5 2643 v3 CPUs allocated to Spark
  • 12. Experiment Setup
      • EC2 Setup: Spark Cluster with 1 Driver server & 5 Worker servers (20 worker CPUs)
          • Driver server: 8 CPUs, 61G, hosting a 28G Driver
          • Each of the 5 Worker servers: 4 CPUs, 30.5G, hosting a 28.5G Worker
      • TidalScale Setup: Spark Standalone Mode - 20 worker CPUs (Guest: 28cpu 225GB)
          • Single 140G Worker (20 CPUs) and a 40G Driver, running on 5 Hyper Kernels
  • 13. Experiment Setup
      Parameter | A: EC2 Small | B: EC2 Big | C: TidalScale Big | D: TidalScale Bigger
      Cluster Configuration | 10 nodes | 5 nodes | 1 TidalPod | 1 TidalPod
      Edge partitions * | 10 | 10 | 10 | 20
      Spark.worker.instances | 10 | 5 | 1 | 1
      Spark.worker.cores | 20 | 20 | 20 | 20
      Spark Memory per Node | 10G | 28G | 140GB | 300GB
      Total Spark Memory | 100GB | 140GB | 140GB | 300GB
      * Note: The number of edge partitions in this example has been set to a fixed constant over all workload sizes for illustrative purposes. Normal practice is to vary the number of edge partitions based on workload size.
  • 14. Experiment Results [Chart: run time versus workload size for configurations A and B]
  • 15. Experiment Results [Chart: run time versus workload size for configurations A, B and C (standalone mode)]
  • 16. Experiment Results [Chart: run time versus workload size for configurations A, B, C and D]
  • 17. Experiment Observations
      • Tuning Spark is complex
          • We spent most of our time tuning Spark parameters
          • We are not sure we’ve tuned optimally for either the EC2 Spark distributed cluster or the TidalScale Spark standalone case, but the parameters were the same in both
      • Choice of the number of data partitions really matters
          • A suboptimal choice can have a 2-3x performance impact
          • We used 10 edge partitions for both EC2 and TidalScale configurations
  • 18. Possible mixed model with multi-terabyte manager
      • [Diagram: a “Super Manager” runs the Spark Application, Cluster Manager and Operating System on HyperKernels spanning multiple HW nodes; the Workers are conventional nodes, each running an Executor on its own OS and HW]
  • 19. Conclusions & Recommendations
      • Spark standalone on TidalScale performs similarly to a cluster
      • Without TidalScale, larger workloads can run out of memory without careful Spark tuning
      • We recommend using both scale up and scale out
  • 20. Key messages – more obvious now?
      • A new class of virtual supercomputers to host Spark
      • Run multi-terabyte analytics on a single Spark node
  • 21. Value Proposition
      • Scale:
          • Aggregates compute resources for large scale in-memory analysis and decision support
          • Scales like a cluster using commodity hardware at linear cost
          • Allows customers to grow gradually as their needs develop
      • Simplify:
          • Dramatically simplifies application development
          • No need to distribute work across servers
          • Existing applications run as a single instance, without modification, as if on a highly flexible mainframe
      • Optimize:
          • Automatic dynamic hierarchical resource optimization
      • Evolve:
          • Applicable to modern and emerging microprocessors, memories, interconnects, persistent storage & networks
  • 22. SCALE | SIMPLIFY | OPTIMIZE | EVOLVE
      • Contact: Ike Nassi ike.nassi@tidalscale.com
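
The slides describe Spark's programming paradigm as lazy, repeatable evaluation over a rich set of transformations and actions. The following is a minimal pure-Python sketch of that idea, not Spark's implementation: the class `LazyDataset` and its methods are hypothetical names chosen for illustration. Transformations only record a step in the lineage; an action replays the lineage from the source.

```python
from functools import reduce


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark).

    Transformations record a step; actions replay all recorded steps.
    """

    def __init__(self, source, lineage=()):
        self._source = source      # any re-iterable, e.g. a list or range
        self._lineage = lineage    # tuple of ("map" | "filter", function)

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._lineage + (("filter", pred),))

    def _evaluate(self):
        # Replay the lineage from the source. Because the source is
        # re-iterable, every action can recompute the same result.
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger the actual computation.
    def collect(self):
        return list(self._evaluate())

    def reduce(self, fn):
        return reduce(fn, self._evaluate())


calls = []  # records every invocation of square()


def square(x):
    calls.append(x)
    return x * x


squares_of_evens = LazyDataset(range(10)).filter(lambda x: x % 2 == 0).map(square)
ran_before_action = len(calls)   # still 0: no action has been called yet
total = squares_of_evens.reduce(lambda a, b: a + b)
```

In this sketch `square` is not called until `reduce` runs, and calling `collect` afterwards recomputes the whole chain from the source, which is the "repeatable" half of the claim.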

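The memory and CPU totals quoted in slides 11 and 13 can be checked with a few lines of arithmetic. This is a sketch under one assumption: that "Total Spark Memory" is the number of worker instances multiplied by "Spark Memory per Node". The dictionary layout and variable names are illustrative only.

```python
# Values copied from the slide 13 table:
# (Spark.worker.instances, Spark memory per node in GB).
configs = {
    "A: EC2 Small": (10, 10),
    "B: EC2 Big": (5, 28),
    "C: TidalScale Big": (1, 140),
    "D: TidalScale Bigger": (1, 300),
}

# Assumed relationship: total Spark memory = worker instances x memory per node.
totals = {
    name: instances * mem_gb
    for name, (instances, mem_gb) in configs.items()
}

# Worker CPUs on EC2, from slide 11: 4 CPUs per server x 5 servers.
ec2_worker_cpus = 4 * 5

for name, total_gb in totals.items():
    print(f"{name}: {total_gb} GB total Spark memory")
print(f"EC2 worker CPUs: {ec2_worker_cpus}")
```

Under that assumption the products reproduce the table's last row, and configurations B and C end up with the same total memory and the same 20 worker CPUs.
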
Editor's notes

  1. I’m here to present a different approach to large-scale, in-memory computations. Ike’s contact details are on the last slide.
  2. With a little more time (25 mins.) we can set the context and thereby frame our message. Scale-out – not much for me to add to Ike… Spark has some 80 transformations (within a partition) and actions (often across partitions) that greatly enhance the original MapReduce of tools like Hadoop.
  3. It may seem sacrilegious (you’ll find your word) to address a group of Spark enthusiasts on the theme of a single huge node, but it’s a different way of thinking about the method. The remainder of this talk is about this different approach, which we claim is completely in line with Spark’s direction.
  4. We like Spark and actively support the technology. We think it’s useful to distinguish the powerful programming paradigm from the underlying implementation. Our message is that there are different ways to achieve the same end. [It may be a red herring here, but the power of the “80 operators” applied to RDDs is what makes Spark cool. This talk may not want to explore that.]
  5. We like Spark and actively support the technology. We think it’s useful to distinguish the powerful programming paradigm from the underlying implementation. Our message is that there are different ways to achieve the same end. [It may be a red herring here, but the power of the “80 operators” applied to RDDs is what makes Spark cool. This talk may not want to explore that.]
  6. Here’s a stock Spark diagram… The “driver program” runs on the manager, dispatching tasks to the executors.
  7. [We try to give the generic message without being too sales-y, YET.]
  8. The moral of this table is that you can have your cake (parallelization with in-memory processing) and eat it (solve non-Spark problems), too. The issue is where to scale out: under the hood, invisibly to the operating system; or at the server level, over a network. Spark shares variables and shuffles data across partitions, which is a key performance issue. The punchline is that when you scale up you get the benefit of reuse, the opportunity to run any demanding application, which is beneficial for experimentation.
  9. You can show this slide or just talk through these points while showing the next slide:
      • SynthBenchmark benchmark from Apache.org
          • git://git.apache.org/spark.git (spark-1.6.1-bin-hadoop2.6.tgz)
          • Applies the PageRank algorithm to a generated graph
          • Benchmark scaled from 15GB to 150GB by number of vertices
      • Scale Out Spark Configuration on EC2:
          • 1 Master: ec2 r3.2xlarge (8 cpus, 61G)
          • 5 Workers: r3.xlarge (4 cpus, 28.5G)
          • 4 Intel E5 2670 CPUs x 5 servers = 20 CPUs total allocated to Spark
      • Scale Up Spark Configuration on TidalScale:
          • TidalScale TidalPod with 5 nodes
          • 20 Intel E5 2643 v3 CPUs allocated to Spark
  10. Note: the TidalPod is booted with 224GB total: 200GB for Spark and 24GB for the OS. This means each of the 5 physical nodes is hosting about 45GB of the guest OS.
  11. We ran the PageRank workloads in four tests:
      • A “EC2 Small” - 10 node EC2 using 15GB servers (total Spark memory = 100GB)
      • B “EC2 Big” - 5 node EC2 using 31GB servers (total Spark memory = 140GB)
      • C “TidalScale Big” - 5 node TidalPod with hardware equivalent to B
      • D “TidalScale Bigger” - 5 node TidalPod booted at 2.5TB
      B “EC2 Big” and C “TidalScale Big” are the two to compare directly.
  12. Time in seconds is on the Y axis and size of workload in memory is on the X axis. Both axes are logarithmic (a log-log plot). These two lines show the effect of more sharding: the 10 node EC2 config is slower than the 5 node EC2 config. At the larger sizes the jobs fail with Out of Memory errors on the worker nodes (denoted by the red box on each line).
  13. The TidalPod “Big” result (“Big” meaning case C, configured with equivalent HW to the EC2 5 node config). This shows similar performance between the two 5 node configurations, EC2 and TidalScale (“B” – “EC2 Big” versus “C” – “TidalScale Big”).
  14. For fun we tested a larger TidalScale single Spark instance to see if we could get further up the workload size. The config we show here is a 400G Spark worker on a very large TidalPod with 20 edge partitions instead of 10. The shape of the performance result line is different because of the effect of the greater number of edge partitions. The job does NOT fail because of Out of Memory but because of another Spark standalone mode issue (a bug, according to one forum).
  15. Tuning is time consuming. TidalScale can help you address Out of Memory challenges!
  16. We’re committed to big data analytics as carried out in a variety of environments. More memory can expedite the assimilation of data from the workers. A more extreme example has virtual supercomputers for worker nodes.
  17. We give you flexibility in how you deploy node size in your Spark applications.
  18. [Repeat the mantra in slightly different form to reinforce the message.]
  19. Given the Spark context, here are some ground rules. We see huge opportunity in the 80% solution up to 15TB. We’ll talk at the end about the realm of hundreds of terabytes and challenge problems. One of the rules is to maintain the economy of scale-out. A multi-million dollar HPC-class machine is another conversation. Goals that we’ve added to the discussion are simplicity of deployment and use, especially for one-off experiments, but also the versatility to support different problems that arise.
  20. [I removed many words for readability. Still too many but the point isn’t to read every one.] Our work here is the outcome of years of development.
  21. This is the punchline of my talk: TidalScale technology is where scale-up meets scale-out. Spark provides an excellent, if at first surprising, context for this conversation. Spark is migrating from its original model of multiple JVMs on distributed machines to a more bare-metal approach of JIT-compiled code operating on memory allocated C-style.
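
The notes above say the benchmark applies the PageRank algorithm to a generated graph. For readers unfamiliar with it, here is a minimal single-process sketch of the PageRank iteration. It is not the SynthBenchmark or GraphX code: the damping factor of 0.85, the fixed iteration count, the uniform handling of dangling vertices, and the tiny example graph are all assumptions made for illustration.

```python
def pagerank(edges, num_iters=20, damping=0.85):
    """Iterative PageRank over a list of directed (src, dst) edges.

    Ranks always sum to 1. Vertices with no outgoing edges spread
    their rank uniformly over all vertices (an assumed convention).
    """
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(num_iters):
        # Rank held by vertices without outgoing edges.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}
        # Each vertex splits its damped rank evenly among its out-links.
        for src in vertices:
            if out_links[src]:
                share = damping * ranks[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-vertex graph: a cycle a -> b -> c -> a, plus d -> a.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(example_edges)
```

In a distributed run the inner loop becomes a shuffle: each edge partition sends rank contributions to wherever the destination vertex lives. That is why the number of edge partitions, and whether they sit in one address space or on separate nodes, affects the run time so strongly.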