In the last decade, data analytics has rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work-time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work-time inflation.
Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
Ahsan Javed Awan
EMJD-DC (KTH-UPC)
(https://www.kth.se/profile/ajawan/)
Mats Brorsson(KTH), Eduard Ayguade(UPC and BSC),
Vladimir Vlassov(KTH)
Multicore Scalability of Spark
Application Level Performance
Spark scales poorly in Scale-up configuration
Multicore Scalability of Spark
Stage Level Performance
● Shuffle map stages don't scale beyond 12 threads
across different workloads
● The number of concurrent files open in map-side shuffling is
C × R, where C is the number of threads in the executor pool
and R is the number of reduce tasks
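To make the file-handle pressure concrete, the C × R relation from the bullet above can be sketched as follows (the formula is from the slide; the example values of C and R are hypothetical):

```python
def concurrent_shuffle_files(c_threads: int, r_reduce_tasks: int) -> int:
    """Files held open concurrently during map-side shuffling:
    each of the C executor threads writes one shuffle file per
    reduce task, giving C * R simultaneously open files."""
    return c_threads * r_reduce_tasks

# Hypothetical example: a 12-thread executor pool shuffling to
# 48 reduce tasks keeps 12 * 48 = 576 files open at once.
print(concurrent_shuffle_files(12, 48))  # 576
```

This multiplicative growth is why shuffle-heavy stages can hit file-descriptor and I/O limits well before the CPU is saturated.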
Multicore Scalability of Spark
Task Level Performance
Percentage increase in Area Under the Curve compared to the 1-thread case
Key Findings
● Using more than 12 threads in an executor pool does not yield a
significant performance improvement.
● The Spark runtime system needs to be improved to provide better
load balancing and avoid work-time inflation.
● Work-time inflation and load imbalance across threads are the
scalability bottlenecks.
● Removing the bottlenecks in the front-end of the processor would
eliminate no more than 20% of stalls.
● Effort should be focused on removing memory-bound stalls,
since they account for up to 72% of stalled pipeline slots.
● The memory bandwidth of current processors is sufficient for
in-memory data analytics.
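The front-end vs. memory-bound stall percentages above follow Intel's top-down micro-architectural analysis, which classifies every pipeline-issue slot into one of four level-1 categories. A minimal sketch of that classification (the counter values below are made up for illustration; the formulas assume a 4-wide issue core, as in the method):

```python
def topdown_level1(slots_retired, slots_bad_spec, slots_frontend_bound, total_slots):
    """Classify pipeline slots per the level-1 top-down method.
    Slots that are neither retired, wasted on bad speculation, nor
    starved by the front end are back-end bound; memory-bound stalls
    are a sub-category of back-end bound at level 2."""
    backend = total_slots - slots_retired - slots_bad_spec - slots_frontend_bound
    return {
        "retiring": slots_retired / total_slots,
        "bad_speculation": slots_bad_spec / total_slots,
        "frontend_bound": slots_frontend_bound / total_slots,
        "backend_bound": backend / total_slots,
    }

# Hypothetical counters: on a 4-wide core, total slots = 4 * cycles.
cycles = 1_000_000
breakdown = topdown_level1(
    slots_retired=1_200_000,
    slots_bad_spec=120_000,
    slots_frontend_bound=680_000,
    total_slots=4 * cycles,
)
print(breakdown)  # backend_bound comes out to 0.5 here
```

In this illustrative breakdown, half the slots are back-end bound, which is the category that a level-2 decomposition would further split into memory-bound and core-bound stalls.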
Motivation
Future Directions
NUMA Aware Task Scheduling
Cache Aware Transformations
Exploiting Processing In Memory Architectures
HW/SW Data Prefetching
Rethinking Memory Architectures
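As one concrete illustration of the NUMA-aware direction above, an executor JVM can be pinned to a socket and its local memory with `numactl`. This is a sketch, assuming a two-socket machine and a local-mode run; `my_app.py` and the thread count are placeholders:

```shell
# Pin the Spark JVM to NUMA node 0: run only on node-0 cores and
# allocate only from node-0 memory, avoiding remote-memory accesses.
numactl --cpunodebind=0 --membind=0 \
  "$SPARK_HOME/bin/spark-submit" --master "local[12]" my_app.py
```

Pinning at the process level is a coarse stand-in for true NUMA-aware task scheduling inside the runtime, but it shows the mechanism the scheduler would need to exploit.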
Our Approach
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade,
“Performance Characterization of In-Memory Data Analytics on a
Modern Cloud Server,” in 5th IEEE International Conference on
Big Data and Cloud Computing, 2015.
● A. J. Awan, M. Brorsson, V. Vlassov, and E. Ayguade, “How
Data Volume Affects Spark Based Data Analytics on a Scale-up
Server,” in 6th International Workshop on Big Data
Benchmarking, Performance Optimization and Emerging
Hardware (BPOE), held in conjunction with VLDB, 2015.
What are the major bottlenecks?
Key Findings
● Spark workloads do not benefit significantly from executors with
more than 12 cores.
● The performance of Spark workloads degrades with large volumes
of data due to a substantial increase in garbage collection and file
I/O time.
● Without any tuning, the Parallel Scavenge garbage collection
scheme outperforms the Concurrent Mark Sweep and G1 garbage
collectors for Spark workloads.
● Spark workloads exhibit improved instruction retirement due to
lower L1 cache misses and better utilization of functional units
inside cores at large volumes of data.
● Memory bandwidth utilization of Spark benchmarks decreases
with large volumes of data and is 3x lower than the available
off-chip bandwidth on our test machine.
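The garbage-collector comparison above amounts to switching the collector via executor JVM flags. A minimal sketch, assuming `spark-submit` launches the workload and `my_app.py` is a placeholder application (`spark.executor.extraJavaOptions` is Spark's standard knob for executor JVM flags):

```shell
# Parallel Scavenge (the best-performing scheme in these experiments):
spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC" my_app.py

# Concurrent Mark Sweep, for comparison:
spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC" my_app.py

# G1, for comparison:
spark-submit --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" my_app.py
```

Keeping heap size and all other settings identical across the three runs isolates the collector as the only variable.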