1
Performance Characterization of In-Memory
Data Analytics on a Modern Cloud Server
Ahsan Javed Awan
EMJD-DC (KTH-UPC)
(https://www.kth.se/profile/ajawan/)
Mats Brorsson (KTH), Eduard Ayguade (UPC and BSC),
Vladimir Vlassov (KTH)
2
Motivation
Why should we care?
*Source: SGI
3
Motivation
Cont...
Our Focus
Improve the single node performance
in scale-out configuration
*Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/
Scale-up frameworks: Phoenix++, Metis, Ostrich, etc.
Scale-out frameworks: Hadoop, Spark, Flink, etc.
4
Progress Meeting 12-12-14
Which Scale-out Framework?
[Picture Courtesy: Amir H. Payberah]
5
Our Approach
● A three-fold analysis method at the application, thread, and micro-architectural levels
– Tuning of Spark internal parameters
– Tuning of JVM parameters (heap size, etc.)
– Thread-level analysis using Intel VTune
– Micro-architecture-level analysis using hardware performance counters
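The application-level tuning knobs listed above (thread count and JVM heap size) can be sketched as a small helper that assembles a `spark-submit` command line. This is a minimal sketch, assuming the benchmark runs in Spark local mode on a single scale-up node. The function name `build_submit_command`, the script name `benchmark.py`, and the example values (12 threads, a 50g heap) are hypothetical and are not taken from the slides.

```python
from typing import List


def build_submit_command(app: str, threads: int, heap: str) -> List[str]:
    """Assemble a spark-submit argument list for a single-node run.

    threads: size of the executor thread pool (local[N] master URL).
    heap:    heap size of the driver JVM, which also hosts the executor
             threads in local mode (for example "50g").
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "spark-submit",
        "--master", f"local[{threads}]",
        "--driver-memory", heap,
        app,
    ]


# Hypothetical values, for illustration only.
command = build_submit_command("benchmark.py", threads=12, heap="50g")
print(" ".join(command))
```

Sweeping `threads` over a range of values while keeping the rest fixed is one way to reproduce a multicore scalability study of this kind.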
Methodology
6
Our Approach
Benchmarks
3 GB of raw Wikipedia datasets, Amazon Movies Reviews, and
numerical records were used
7
Our Hardware Configuration
Machine Details
Hyper-Threading and Turbo Boost are disabled
8
Our Approach
System Configuration
9
Multicore Scalability of Spark
Application Level Performance
Spark scales poorly in a scale-up configuration
10
Multicore Scalability of Spark
Stage Level Performance
● Shuffle map stages don't scale beyond 12 threads across different workloads
● The number of files open concurrently in map-side shuffling is C*R, where C is the number of threads in the executor pool and R is the number of reduce tasks
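The C*R relationship above can be illustrated with a short calculation. This is a sketch only: the function name `map_side_open_files` and the value R = 100 are hypothetical and are not taken from the slides.

```python
def map_side_open_files(executor_threads: int, reduce_tasks: int) -> int:
    """Return C * R, the number of files open at once in map-side shuffling.

    executor_threads: C, the number of threads in the executor pool.
    reduce_tasks:     R, the number of reduce tasks.
    """
    if executor_threads < 0 or reduce_tasks < 0:
        raise ValueError("both arguments must be non-negative")
    return executor_threads * reduce_tasks


# Hypothetical R = 100 reduce tasks; C is swept over a few thread counts.
for c in (1, 6, 12, 24):
    print(f"C={c:2d}  open files={map_side_open_files(c, reduce_tasks=100)}")
```

The count grows linearly with the thread count, so doubling the executor pool doubles the number of concurrently open shuffle files.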
11
Multicore Scalability of Spark
Task Level Performance
Percentage increase in Area Under the Curve compared to 1-thread
12
Is there thread-level load imbalance?
13
CPU Utilization is not
scaling with performance
14
Is there any work-time inflation?
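One common way to quantify work-time inflation, sketched here under the assumption that per-thread work (busy) times are available from a profiler, is to compare the summed work time of all threads in the multithreaded run against the work time of the single-threaded run. The function name and the sample timings below are hypothetical; they are not measurements from the slides.

```python
from typing import Sequence


def work_time_inflation(single_thread_work: float,
                        per_thread_work: Sequence[float]) -> float:
    """Ratio of total multithreaded work time to single-threaded work time.

    A value of 1.0 means the threads together spent exactly as long on the
    work as one thread did; values above 1.0 indicate inflation.
    """
    if single_thread_work <= 0:
        raise ValueError("single_thread_work must be positive")
    return sum(per_thread_work) / single_thread_work


# Hypothetical timings in seconds: one thread versus four threads.
inflation = work_time_inflation(100.0, [30.0, 30.0, 30.0, 30.0])
print(f"work-time inflation = {inflation:.2f}x")
```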
15
How does micro-architecture contribute to
work-time inflation?
16
Cont...
17
Cont...
18
Is memory bandwidth a bottleneck?
19
Key Findings
● More than 12 threads in an executor pool do not yield a significant performance gain.
● The Spark runtime system needs to be improved to provide better load balancing and avoid work-time inflation.
● Work-time inflation and load imbalance on the threads are the scalability bottlenecks.
● Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls.
● Effort should be focused on removing the memory-bound stalls, since they account for up to 72% of stalls in the pipeline slots.
● The memory bandwidth of current processors is sufficient for in-memory data analytics.
20
Future Directions
NUMA Aware Task Scheduling
Cache Aware Transformations
Exploiting Processing In Memory Architectures
HW/SW Data Prefetching
Rethinking Memory Architectures
21
Our Approach
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance characterization of in-memory data analytics on a modern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015.
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server,” in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with VLDB, 2015.
What are the major bottlenecks?
22
How Data Volume Affects Spark Based Data Analytics on a
Scale-up Server
23
Do Spark-based data analytics benefit from using larger
scale-up servers?
24
Is GC detrimental to scalability of Spark applications?
25
How does performance scale with data volume?
26
Does GC time scale linearly with data volume?
27
How does CPU utilization scale with data volume?
28
Is file I/O detrimental to performance?
29
How does data size affect micro-architectural
performance?
30
How Data Volume Affects Spark Based Data Analytics on a
Scale-up Server
31
Cont..
32
Cont..
33
Key Findings
● Spark workloads do not benefit significantly from executors with more than 12 cores.
● The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time.
● Without any tuning, the Parallel Scavenge garbage collection scheme outperforms the Concurrent Mark Sweep and G1 garbage collectors for Spark workloads.
● Spark workloads exhibit improved instruction retirement at large volumes of data, due to fewer L1 cache misses and better utilization of the functional units inside cores.
● Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available off-chip bandwidth on our test machine.
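The three collectors compared above are selected with standard HotSpot JVM flags. The sketch below maps each collector to its flag and builds a value that could be passed to `spark-submit --driver-java-options`. It assumes a local-mode run in which the executor threads live in the driver JVM, and a JDK that still ships Concurrent Mark Sweep (it was removed in JDK 14). The names `GC_FLAGS` and `gc_java_options` are hypothetical.

```python
# HotSpot flags that select each of the collectors named in the findings.
GC_FLAGS = {
    "parallel_scavenge": "-XX:+UseParallelGC",
    "concurrent_mark_sweep": "-XX:+UseConcMarkSweepGC",
    "g1": "-XX:+UseG1GC",
}


def gc_java_options(collector: str, log_gc: bool = True) -> str:
    """Return a JVM option string that selects the given collector.

    When log_gc is True, -verbose:gc is appended so that collection
    events are logged and GC time can be measured afterwards.
    """
    if collector not in GC_FLAGS:
        raise ValueError(f"unknown collector: {collector}")
    flags = [GC_FLAGS[collector]]
    if log_gc:
        flags.append("-verbose:gc")
    return " ".join(flags)


for name in GC_FLAGS:
    print(f"{name}: {gc_java_options(name)}")
```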

 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 

Último (20)

Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

  • 1. Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server. Ahsan Javed Awan, EMJD-DC (KTH-UPC) (https://www.kth.se/profile/ajawan/); Mats Brorsson (KTH), Eduard Ayguade (UPC and BSC), Vladimir Vlassov (KTH)
  • 2. Motivation: Why should we care? *Source: SGI
  • 3. Motivation (cont.) Our focus: improve the single-node performance in scale-out configuration. *Source: http://navcode.info/2012/12/24/cloud-scaling-schemes/ Phoenix++, Metis, Ostrich, etc.; Hadoop, Spark, Flink, etc.
  • 4. Progress Meeting 12-12-14. Which scale-out framework? [Picture courtesy: Amir H. Payberah]
  • 5. Our Approach: Methodology ● A three-fold analysis method at application, thread and micro-architectural level – Tuning of Spark internal parameters – Tuning of JVM parameters (heap size, etc.) – Thread-level analysis using Intel VTune – Micro-architecture-level analysis using hardware performance counters
  • 6. Our Approach: Benchmarks. 3 GB of Wikipedia raw datasets, Amazon Movies Reviews and numerical records have been used.
  • 7. Our Hardware Configuration: Machine Details. Hyper-Threading and Turbo Boost are disabled.
  • 9. Multicore Scalability of Spark: Application-Level Performance. Spark scales poorly in scale-up configuration.
  • 10. Multicore Scalability of Spark: Stage-Level Performance ● Shuffle map stages don't scale beyond 12 threads across different workloads ● The number of files open concurrently in map-side shuffling is C*R, where C is the number of threads in the executor pool and R is the number of reduce tasks
  • 11. Multicore Scalability of Spark: Task-Level Performance. Percentage increase in area under the curve compared to 1 thread.
  • 12. Is there thread-level load imbalance?
  • 13. CPU utilization is not scaling with performance
  • 14. Is there any work-time inflation?
  • 15. How does micro-architecture contribute to work-time inflation?
  • 18. Is memory bandwidth a bottleneck?
  • 19. Key Findings ● More than 12 threads in an executor pool does not yield a significant performance gain. ● The Spark runtime system needs to be improved to provide better load balancing and avoid work-time inflation. ● Work-time inflation and load imbalance on the threads are the scalability bottlenecks. ● Removing the bottlenecks in the front-end of the processor would not remove more than 20% of stalls. ● Effort should be focused on removing the memory-bound stalls, since they account for up to 72% of stalls in the pipeline slots. ● Memory bandwidth of current processors is sufficient for in-memory data analytics.
  • 20. Future Directions: NUMA-aware task scheduling; cache-aware transformations; exploiting processing-in-memory architectures; HW/SW data prefetching; rethinking memory architectures
  • 21. Our Approach ● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance characterization of in-memory data analytics on a modern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015. ● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server,” in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB, 2015. What are the major bottlenecks?
  • 22. How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
  • 23. Do Spark-based data analytics benefit from using larger scale-up servers?
  • 24. Is GC detrimental to the scalability of Spark applications?
  • 25. How does performance scale with data volume?
  • 26. Does GC time scale linearly with data volume?
  • 27. How does CPU utilization scale with data volume?
  • 28. Is file I/O detrimental to performance?
  • 29. How does data size affect micro-architectural performance?
  • 30. How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
  • 33. Key Findings ● Spark workloads do not benefit significantly from executors with more than 12 cores. ● The performance of Spark workloads degrades with large volumes of data due to a substantial increase in garbage collection and file I/O time. ● Without any tuning, the Parallel Scavenge garbage collection scheme outperforms the Concurrent Mark Sweep and G1 garbage collectors for Spark workloads. ● Spark workloads exhibit improved instruction retirement due to lower L1 cache misses and better utilization of functional units inside cores at large volumes of data. ● Memory bandwidth utilization of Spark benchmarks decreases with large volumes of data and is 3x lower than the available off-chip bandwidth on our test machine.
  • 34. Our Approach ● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “Performance characterization of in-memory data analytics on a modern cloud server,” in 5th International IEEE Conference on Big Data and Cloud Computing, 2015. ● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How Data Volume Affects Spark Based Data Analytics on a Scale-up Server,” in 6th International Workshop on Big Data Benchmarking, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with VLDB, 2015. What are the major bottlenecks?
  • 35. Future Directions: NUMA-aware task scheduling; cache-aware transformations; exploiting processing-in-memory architectures; HW/SW data prefetching; rethinking memory architectures
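
Slide 10 states that map-side shuffling keeps C*R files open concurrently, where C is the number of threads in the executor pool and R is the number of reduce tasks. The following is a minimal Python sketch of that arithmetic only. The function name and the value R = 200 are illustrative assumptions; the slides do not give a value for R.

```python
def concurrent_shuffle_files(executor_threads: int, reduce_tasks: int) -> int:
    """Files open at once during map-side shuffling, using the C*R relation from slide 10."""
    if executor_threads < 0 or reduce_tasks < 0:
        raise ValueError("counts must be non-negative")
    return executor_threads * reduce_tasks


# Assumption for illustration: R = 200 reduce tasks (not a figure from the slides).
REDUCE_TASKS = 200

# The open-file count grows linearly with the executor thread count C.
for threads in (1, 6, 12, 24):
    print(f"C={threads:2d}  R={REDUCE_TASKS}  open files={concurrent_shuffle_files(threads, REDUCE_TASKS)}")
```

Under this relation, doubling the executor threads doubles the number of files open at once. The sketch only shows the count; it does not model the cost of keeping those files open.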