Benchmarking Virtualized Hadoop Clusters
Todor Ivanov, Roberto V. Zicari
Big Data Lab, Goethe University Frankfurt
Alejandro Buchmann
Database and Distributed Systems, TU Darmstadt
5th Workshop on Big Data Benchmarking 2014
Outline
• Virtualizing Hadoop
• Measuring Performance
– Iterative Experimental Approach
– Platform Setup
– Experiments
– Summary of Results
• Lessons Learned
• Next Steps
Virtualizing Hadoop
• Motivation
– Hadoop-as-a-service (e.g. Amazon Elastic MapReduce)
– Automated deployment and cost-effective management
– Dynamically scalable cluster size (e.g. # of nodes, resource allocation)
• Challenges
– I/O overhead
– Network overhead (message communication and data transfer)
• Related Work: virtualized vs. physical Hadoop
→ Virtualized Hadoop has an estimated overhead of 2-10% (reported in [1], [2], [3])
[1] Buell, J.: A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5. Tech. White Pap. VMware Inc. (2011).
[2] Buell, J.: Virtualized Hadoop Performance with VMware vSphere® 5.1. Tech. White Pap. VMware Inc. (2013).
[3] Microsoft: Performance of Hadoop on Windows in Hyper-V Environments. Tech. White Pap. Microsoft. (2013).
Objectives of Our Research
Investigate and compare the performance of standard and separated data-compute cluster configurations.
• How does application performance change on a data-compute cluster?
• Which types of applications are better suited to data-compute clusters?
[Figure: Standard Cluster vs. Data-Compute Cluster]
Methodology: Iterative Experimental Approach
I. Choose a Big Data Benchmark
II. Configure Hadoop Cluster
III. Perform Experiments
IV. Evaluate Results
Step I: Intel HiBench
• Benchmark suite for Hadoop (developed by Intel in 2010) (Huang et al. [4])
• 4 categories, 10 workloads & 3 types
• Metrics: Time (Sec) & Throughput (Bytes/Sec)
Category           No  Workload                  Tools           Type
Micro Benchmarks    1  Sort                      MapReduce       IO Bound
                    2  WordCount                 MapReduce       CPU Bound
                    3  TeraSort                  MapReduce       Mixed
                    4  TestDFSIOEnhanced         MapReduce       IO Bound
Web Search          5  Nutch Indexing            Nutch, Lucene   Mixed
                    6  Page Rank                 Pegasus         Mixed
Machine Learning    7  Bayesian Classification   Mahout          Mixed
                    8  K-means Clustering        Mahout          Mixed
Analytical Query    9  Join                      Hive            Mixed
                   10  Aggregation               Hive            Mixed
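The two metrics listed above are related: throughput is the processed input size divided by the job duration. A minimal sketch, assuming binary gigabytes; the function name and the 600-second duration are illustrative and not values measured in this study:

```python
GIB = 1024 ** 3  # assumption: 1 GB is treated as 2**30 bytes


def throughput_bytes_per_sec(input_bytes: int, duration_sec: float) -> float:
    """Throughput metric: processed bytes divided by elapsed seconds."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    return input_bytes / duration_sec


# Hypothetical run: 10 GB of input processed in 600 seconds.
example = throughput_bytes_per_sec(10 * GIB, 600.0)
print(f"{example / 1024 ** 2:.1f} MB/s")
```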
[4] Huang, S. et al.: The HiBench benchmark suite: Characterization of the MapReduce-based data analysis.
Data Engineering Workshops (ICDEW), 2010
Step II: Platform Setup
• Platform layer (Hadoop Cluster)
– vSphere Big Data Extension integrating Serengeti Server (version 1.0)
– VM template hosting CentOS
– Apache Hadoop (version 1.2.1) with default parameters:
• 200 MB Java heap size
• 64 MB block size
• Replication factor of 3
• Management layer (Virtualization)
– VMware vSphere 5.1
– ESXi and vCenter Servers
• Hardware layer - Dell PowerEdge T420 server
– 2 x Intel Xeon E5-2420 (1.9 GHz), 6 core CPUs
– 32GB RAM
– 4 x 1 TB, WD SATA disks
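To get a feel for what the default block size and replication factor mean for the tested input sizes, the sketch below estimates block counts and raw storage. It assumes binary units and treats the input as one contiguous file; the helper name and the `data_nodes` parameter are illustrative. HDFS places at most one replica of a block per data node, so the effective replication is capped by the number of data nodes.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3  # assumption: binary units

BLOCK_SIZE = 64 * MB  # Hadoop default listed above
REPLICATION = 3       # Hadoop default listed above


def hdfs_footprint(input_bytes: int, data_nodes: int = 3) -> tuple[int, int]:
    """Return (number of HDFS blocks, raw bytes stored including replicas)."""
    blocks = math.ceil(input_bytes / BLOCK_SIZE)
    # At most one replica of a block can live on each data node.
    effective_replication = min(REPLICATION, data_nodes)
    return blocks, input_bytes * effective_replication


for size_gb in (10, 20, 50):
    blocks, raw = hdfs_footprint(size_gb * GB)
    print(f"{size_gb} GB -> {blocks} blocks, {raw // GB} GB raw")
```

Real inputs are usually split into many files, so actual block counts can be somewhat higher than this estimate.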
[Figure: platform layers: Application (HiBench Benchmark), Platform (Hadoop Cluster), Management (Virtualization), Hardware (CPUs, Memory, Storage)]
(Known) Limitations
• Single physical server (no physical network)
• VMware ESXi server hypervisor
• Testing with default configurations (Serengeti & Hadoop)
• Time constraints:
– Input data sizes: 10/20/50GB
– 3 test repetitions
Step II: Comparison Factors
The number of utilized VMs in the compared clusters should be equal.
• Each additional VM increases the hypervisor overhead (reported in [2], [5], [6])
• Utilizing more VMs may improve the overall system performance [2]
The utilized hardware resources in the compared clusters should be equal.
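The first comparison factor can be sketched as a simple pairing rule: only clusters with the same number of VMs are compared across cluster types. The VM counts below are the ones stated on the result slides of this deck; the function name is illustrative, and the hardware-resource check is left out because per-VM resources are not listed in this transcript.

```python
from itertools import product

# VM counts as stated on the result slides of this deck.
STANDARD = {"Standard1": 3, "Standard2": 6}
DATA_COMPUTE = {"Data-Comp1": 3, "Data-Comp2": 4, "Data-Comp3": 6}


def comparable_pairs(standard: dict, data_compute: dict) -> list:
    """Pairs of (standard, data-compute) clusters with equal VM counts."""
    return [
        (s_name, d_name)
        for (s_name, s_vms), (d_name, d_vms) in product(
            standard.items(), data_compute.items()
        )
        if s_vms == d_vms
    ]


print(comparable_pairs(STANDARD, DATA_COMPUTE))
```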
[5] Li, J. et al.: Performance Overhead Among Three Hypervisors: An Experimental Study using Hadoop Benchmarks.
Big Data (BigData Congress), 2013
[6] Ye, K. et al.: vHadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with
Performance Consideration. Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012
Step II: Comparison Standard1/Data-Compute1
[Figure: Standard Cluster vs. Data-Compute Cluster]
Equal:
1) utilized hardware resources
2) number of utilized VMs
∆ – difference in performance
Step II: Comparison Standard2/Data-Compute3
[Figure: Standard Cluster vs. Data-Compute Cluster]
Equal:
1) utilized hardware resources
2) number of utilized VMs
∆ – difference in performance
Step II: Comparison Data-Compute1/2/3
[Figure: Data-Compute Cluster vs. Data-Compute Cluster]
Equal:
1) utilized hardware resources
∆ – difference in performance
Step II: All Cluster Configurations
Step III & IV: CPU Bound - WordCount
• Configuration: 4 map/1 reduce tasks, 10/20/50 GB input data sizes
• Times normalized with respect to baseline Standard1
• 38-47% better performance for Data-Compute cluster
• Data-Compute1 (2CW & 1DW) ≈ Data-Compute2 (2CW & 2DW), where CW = compute worker and DW = data worker
Equal number of VMs
Data Size (GB)   Diff. (%) Standard1/Data-Comp1 (3 VMs)   Diff. (%) Standard2/Data-Comp3 (6 VMs)
10               -40                                      -38
20               -41                                      -42
50               -43                                      -47
[Chart: WordCount times as ratio to Standard1, by data size]
Data Size (GB)   Standard1   Standard2   Data-Comp1   Data-Comp2   Data-Comp3
10               1.00        1.75        0.71         0.71         1.26
20               1.00        1.74        0.71         0.71         1.22
50               1.00        1.74        0.70         0.70         1.19
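Normalizing to the Standard1 baseline can be sketched as follows. The slides state that each test was repeated three times but not how the repetitions were aggregated; the arithmetic mean used here is an assumption, and the timings are hypothetical numbers chosen only to illustrate the calculation.

```python
from statistics import mean


def normalize_to_baseline(
    times_sec: dict[str, list[float]], baseline: str
) -> dict[str, float]:
    """Average repeated runs per cluster, then divide by the baseline average."""
    averages = {name: mean(runs) for name, runs in times_sec.items()}
    base = averages[baseline]
    return {name: round(avg / base, 2) for name, avg in averages.items()}


# Hypothetical timings in seconds (three repetitions each); not measured data.
runs = {
    "Standard1": [400.0, 410.0, 390.0],
    "Data-Comp1": [284.0, 286.0, 282.0],
}
print(normalize_to_baseline(runs, "Standard1"))
```

A ratio below 1 means the cluster finished faster than Standard1; a ratio above 1 means it was slower.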
Step III & IV: Read I/O Bound – TestDFSIOEnh (1)
• Configuration: 100MB file size, 10/20/50 GB input data sizes
• Read times normalized with respect to baseline Standard1
• Standard1 (Standard Cluster) performs best
Equal number of VMs
Data Size (GB)   Diff. (%) Standard1/Data-Comp1 (3 VMs)   Diff. (%) Standard2/Data-Comp3 (6 VMs)
10               68                                       -18
20               71                                       -30
50               73                                       -46
[Chart: read times as ratio to Standard1, by data size]
Data Size (GB)   Standard1   Standard2   Data-Comp1   Data-Comp2   Data-Comp3
10               1.00        1.83        3.08         1.51         1.55
20               1.00        1.93        3.39         1.71         1.48
50               1.00        1.87        3.66         1.78         1.28
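The slides do not state how the Diff. (%) columns are computed. The sketch below is an inference: expressing the time difference as a percentage of the second cluster's time reproduces the Standard1/Data-Comp1 read column from the chart ratios. For other columns the result can be off by about one point, because the chart ratios are already rounded to two decimals.

```python
def diff_percent(ratio_a: float, ratio_b: float) -> float:
    """Difference between B and A as a percentage of B's time.

    Negative means B took less time than A; positive means B took longer.
    This formula is inferred from the slide values, not quoted from them.
    """
    return (ratio_b - ratio_a) / ratio_b * 100


# Read-time ratios to Standard1 from the chart above (10, 20, 50 GB).
standard1 = [1.00, 1.00, 1.00]
data_comp1 = [3.08, 3.39, 3.66]

print([round(diff_percent(a, b)) for a, b in zip(standard1, data_comp1)])
```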
Step III & IV: Read I/O Bound – TestDFSIOEnh (2)
• Configuration: 100MB file size, 10/20/50 GB input data sizes
• Read times normalized with respect to baseline Standard1
• Read times: Data-Comp1 (2CW & 1DW) > Data-Comp2 (2CW & 2DW) > Data-Comp3 (3CW & 3DW), except at 10 GB, where Data-Comp2 and Data-Comp3 are within 3%
→ More data nodes improve read performance in a Data-Compute cluster.
Different number of VMs
Data Size (GB)   Diff. (%) Data-Comp1/2 (3 VMs vs. 4 VMs)   Diff. (%) Data-Comp2/3 (4 VMs vs. 6 VMs)
10               -104                                       3
20               -99                                        -15
50               -106                                       -39
[Chart: read times as ratio to Standard1, by data size (10/20/50 GB), for Standard1, Standard2, Data-Comp1, Data-Comp2 and Data-Comp3; same data as on the TestDFSIOEnh (1) read slide]
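The ordering claim on this slide can be checked against the read-time ratios. In the sketch below, the assignment of chart values to clusters is inferred from the Diff. (%) tables rather than stated in the transcript, and the function name is illustrative.

```python
# Read-time ratios to Standard1, keyed by data size in GB (inferred series).
ratios = {
    "Data-Comp1": {10: 3.08, 20: 3.39, 50: 3.66},
    "Data-Comp2": {10: 1.51, 20: 1.71, 50: 1.78},
    "Data-Comp3": {10: 1.55, 20: 1.48, 50: 1.28},
}


def sizes_where_faster(slower: str, faster: str) -> list[int]:
    """Data sizes at which `faster` has a lower read-time ratio than `slower`."""
    return [
        size
        for size in ratios[slower]
        if ratios[faster][size] < ratios[slower][size]
    ]


print(sizes_where_faster("Data-Comp1", "Data-Comp2"))
print(sizes_where_faster("Data-Comp2", "Data-Comp3"))
```

The second call omits the 10 GB size, which matches the small positive value (3) in the Data-Comp2/3 column.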
Step III & IV: Write I/O Bound – TestDFSIOEnh (1)
• Configuration: 100MB file size, 10/20/50 GB input data sizes
• Write times normalized with respect to baseline Standard1
• Data-Compute cluster (Data-Comp1, Data-Comp3) performs better
Equal number of VMs
Data Size (GB)   Diff. (%) Standard1/Data-Comp1 (3 VMs)   Diff. (%) Standard2/Data-Comp3 (6 VMs)
10               -10                                      4
20               -21                                      -14
50               -24                                      -1
[Chart: write times as ratio to Standard1, by data size]
Data Size (GB)   Standard1   Standard2   Data-Comp1   Data-Comp2   Data-Comp3
10               1.00        0.84        0.91         0.73         0.87
20               1.00        1.08        0.83         0.86         0.95
50               1.00        1.00        0.81         0.95         0.99
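With a fixed 100 MB file size, the total data size determines how many files TestDFSIOEnh has to write or read. A minimal sketch, assuming decimal units (1 GB = 1000 MB); the slides do not state which unit convention was used, and the function name is illustrative.

```python
def dfsio_file_count(total_gb: int, file_size_mb: int = 100, mb_per_gb: int = 1000) -> int:
    """Number of fixed-size files needed to reach the total data size."""
    files, remainder = divmod(total_gb * mb_per_gb, file_size_mb)
    if remainder:
        raise ValueError("total size is not a multiple of the file size")
    return files


for size_gb in (10, 20, 50):
    print(f"{size_gb} GB -> {dfsio_file_count(size_gb)} files of 100 MB")
```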
Step III & IV: Write I/O Bound – TestDFSIOEnh (2)
• Configuration: 100MB file size, 10/20/50 GB input data sizes
• Write times normalized with respect to baseline Standard1
• Data-Comp1 (2CW & 1DW) < Data-Comp3 (3CW & 3DW)
→ Having 2 extra Data Worker nodes increases the write overhead by up to 19% in a Data-Compute cluster.
• Data-Comp3 (6 VMs) outperforms Standard1 (3 VMs)
Different number of VMs
Data Size (GB)   Diff. (%) Data-Comp1/3 (3 VMs vs. 6 VMs)   Diff. (%) Standard1/Data-Comp3 (3 VMs vs. 6 VMs)
10               -4                                         -15
20               13                                         -6
50               19                                         -1
[Chart: write times as ratio to Standard1, by data size (10/20/50 GB), for Standard1, Standard2, Data-Comp1, Data-Comp2 and Data-Comp3; same data as on the TestDFSIOEnh (1) write slide]
Summary of Results
• Compute-intensive (i.e. CPU bound) workloads are suitable for Data-Compute clusters (up to 47% faster).
• Read-intensive (i.e. read I/O bound) workloads are suitable for Standard clusters.
– For Data-Compute clusters, adding more data nodes improves the read performance (up to 39% better, e.g. Data-Compute2/Data-Compute3).
• Write-intensive (i.e. write I/O bound) workloads are suitable for Data-Compute clusters (up to 15% faster, e.g. Standard1/Data-Compute3).
– A lower number of data nodes results in better write performance.
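The summary can be read as a small lookup from workload type to the cluster type that performed better in these experiments. The sketch below encodes only what the summary states; the enum and function names are illustrative, and the mapping inherits the limitations of the single-server setup.

```python
from enum import Enum


class Workload(Enum):
    CPU_BOUND = "cpu"
    READ_IO_BOUND = "read"
    WRITE_IO_BOUND = "write"


# Cluster type that performed better per workload type, per the summary above.
RECOMMENDED = {
    Workload.CPU_BOUND: "Data-Compute",
    Workload.READ_IO_BOUND: "Standard",
    Workload.WRITE_IO_BOUND: "Data-Compute",
}


def recommend_cluster(workload: Workload) -> str:
    """Return the cluster type that the summarized results favor."""
    return RECOMMENDED[workload]


print(recommend_cluster(Workload.READ_IO_BOUND))
```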
Lessons Learned
• Factors influencing cluster performance*:
– Overall number of virtual nodes (VMs) in a cluster
– Choice of cluster type (Standard or Data-Compute Hadoop cluster)
– Number of nodes of each type (compute and data nodes) in a Data-Compute cluster
* Note: subject to the known limitations (see the "(Known) Limitations" slide)
Next Steps
• Repeat the experiments on a virtualized multi-node cluster
• Evaluate virtualized performance with other workloads
• Experiments with larger data sets
• Repeat the experiments using other virtualization platforms (e.g. OpenStack)
Thank you!
Questions & Feedback
are very welcome!
Contact info:
Todor Ivanov
todor@dbis.cs.uni-frankfurt.de
http://www.bigdata.uni-frankfurt.de/

More Related Content

What's hot

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureKyong-Ha Lee
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabadsreehari orienit
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementKyong-Ha Lee
 
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...Kyong-Ha Lee
 
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterGetting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterDaesu Chung
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...Rafael Ferreira da Silva
 
Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets robertlz
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...AM Publications
 
NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed F...
NameNode and DataNode Couplingfor a Power-proportional Hadoop Distributed F...NameNode and DataNode Couplingfor a Power-proportional Hadoop Distributed F...
NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed F...Hanh Le Hieu
 
An experimental evaluation of performance
An experimental evaluation of performanceAn experimental evaluation of performance
An experimental evaluation of performanceijcsa
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataeSAT Publishing House
 

What's hot (20)

Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
Database Research on Modern Computing Architecture
Database Research on Modern Computing ArchitectureDatabase Research on Modern Computing Architecture
Database Research on Modern Computing Architecture
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Unit 1
Unit 1Unit 1
Unit 1
 
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
 
Hadoop
HadoopHadoop
Hadoop
 
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark clusterGetting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
Getting the most out of multi-GPU on Inference stage using Hadoop-spark cluster
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
 
Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
 
NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed F...
NameNode and DataNode Couplingfor a Power-proportional Hadoop Distributed F...NameNode and DataNode Couplingfor a Power-proportional Hadoop Distributed F...
NameNode and DataNode Coupling for a Power-proportional Hadoop Distributed F...
 
Hadoop
HadoopHadoop
Hadoop
 
Google's Dremel
Google's DremelGoogle's Dremel
Google's Dremel
 
An experimental evaluation of performance
An experimental evaluation of performanceAn experimental evaluation of performance
An experimental evaluation of performance
 
Pig Experience
Pig ExperiencePig Experience
Pig Experience
 
Implementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big dataImplementation of p pic algorithm in map reduce to handle big data
Implementation of p pic algorithm in map reduce to handle big data
 

Viewers also liked

Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoopChiou-Nan Chen
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonRSD
 
7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoopTaldor Group
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop InnoTech
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastuctureDataWorks Summit
 
Gartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesGartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesPhilip Say
 
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveVMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveChris Wahl
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Nati Shalom
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...EMC
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...EMC
 

Viewers also liked (15)

Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Soyez Big Data ready avec Isilon
Soyez Big Data ready avec IsilonSoyez Big Data ready avec Isilon
Soyez Big Data ready avec Isilon
 
7. emc isilon hdfs enterprise storage for hadoop
7. emc isilon hdfs   enterprise storage for hadoop7. emc isilon hdfs   enterprise storage for hadoop
7. emc isilon hdfs enterprise storage for hadoop
 
EMC Hadoop Starter Kit
EMC Hadoop Starter KitEMC Hadoop Starter Kit
EMC Hadoop Starter Kit
 
Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop Emerging Big Data & Analytics Trends with Hadoop
Emerging Big Data & Analytics Trends with Hadoop
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Big data on virtualized infrastucture
Big data on virtualized infrastuctureBig data on virtualized infrastucture
Big data on virtualized infrastucture
 
Gartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud ServicesGartner IT Symposium 2014 - VMware Cloud Services
Gartner IT Symposium 2014 - VMware Cloud Services
 
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep DiveVMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
VMworld - vSphere Distributed Switch 6.0 Technical Deep Dive
 
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...
 
Cloud Management with vRealize Operations
Cloud Management with vRealize OperationsCloud Management with vRealize Operations
Cloud Management with vRealize Operations
 
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
Building Hadoop-as-a-Service with Pivotal Hadoop Distribution, Serengeti, & I...
 
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
Hadoop Analytics + Enterprise Class Storage: One-Stop Solution From EMC for H...
 

Similar to WBDB 2014 Benchmarking Virtualized Hadoop Clusters

BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBencht_ivanov
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systemshdhappy001
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systemshdhappy001
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsAmir Mahdi Akbari
 
Big Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeBig Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeROHIT KHARABE
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Dayprogrammermag
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopYahoo Developer Network
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopTony Ng
 
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)Robert Grossman
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsGábor Szárnyas
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
An introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud ApplicationsAn introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud ApplicationsRavi Yogesh
 

Similar to WBDB 2014 Benchmarking Virtualized Hadoop Clusters (20)

BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data Platforms
 
Big Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit KharabeBig Data Testing Approach - Rohit Kharabe
Big Data Testing Approach - Rohit Kharabe
 
F1803013034
F1803013034F1803013034
F1803013034
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing HadoopHadoop Summit 2010 Benchmarking And Optimizing Hadoop
Hadoop Summit 2010 Benchmarking And Optimizing Hadoop
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
 
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java ApplicationsTowards a Macrobenchmark Framework for Performance Analysis of Java Applications
Towards a Macrobenchmark Framework for Performance Analysis of Java Applications
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
An introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud ApplicationsAn introduction to Workload Modelling for Cloud Applications
An introduction to Workload Modelling for Cloud Applications
 

WBDB 2014 Benchmarking Virtualized Hadoop Clusters

  • 1. Benchmarking Virtualized Hadoop Clusters
      Todor Ivanov, Roberto V. Zicari (Big Data Lab, Goethe University Frankfurt)
      Alejandro Buchmann (Database and Distributed Systems, TU Darmstadt)
      5th Workshop on Big Data Benchmarking 2014
  • 2. Outline
      • Virtualizing Hadoop
      • Measuring Performance
          – Iterative Experimental Approach
          – Platform Setup
          – Experiments
          – Summary of Results
      • Lessons Learned
      • Next Steps
  • 3. Virtualizing Hadoop
      • Motivation
          – Hadoop-as-a-service (e.g. Amazon Elastic MapReduce)
          – Automated deployment and cost-effective management
          – Dynamically scalable cluster size (e.g. # of nodes, resource allocation)
      • Challenges
          – I/O overhead
          – Network overhead (message communication and data transfer)
      • Related Work: virtualized vs. physical Hadoop
          → Virtualized Hadoop has an estimated overhead ranging between 2-10% (reported in [1], [2], [3])
      [1] Buell, J.: A Benchmarking Case Study of Virtualized Hadoop Performance on VMware vSphere 5. Tech. White Pap. VMware Inc. (2011).
      [2] Buell, J.: Virtualized Hadoop Performance with VMware vSphere® 5.1. Tech. White Pap. VMware Inc. (2013).
      [3] Microsoft: Performance of Hadoop on Windows in Hyper-V Environments. Tech. White Pap. Microsoft. (2013).
  • 4. Objectives of Our Research
      Investigate and compare the performance of standard and separated data-compute cluster configurations.
      • How does application performance change on a data-compute cluster?
      • What types of applications are more suitable for data-compute clusters?
      [Figure: Standard Cluster vs. Data-Compute Cluster]
  • 5. Methodology: Iterative Experimental Approach
      I. Choose a Big Data Benchmark
      II. Configure Hadoop Cluster
      III. Perform Experiments
      IV. Evaluate Results
  • 6. Step I: Intel HiBench
      • Benchmark suite for Hadoop (developed by Intel in 2010) (Huang et al. [4])
      • 4 categories, 10 workloads & 3 types
      • Metrics: Time (Sec) & Throughput (Bytes/Sec)

      Category | No | Workload | Tools | Type
      Micro Benchmarks | 1 | Sort | MapReduce | IO Bound
      Micro Benchmarks | 2 | WordCount | MapReduce | CPU Bound
      Micro Benchmarks | 3 | TeraSort | MapReduce | Mixed
      Micro Benchmarks | 4 | TestDFSIOEnhanced | MapReduce | IO Bound
      Web Search | 5 | Nutch Indexing | Nutch, Lucene | Mixed
      Web Search | 6 | Page Rank | Pegasus | Mixed
      Machine Learning | 7 | Bayesian Classification | Mahout | Mixed
      Machine Learning | 8 | K-means Clustering | Mahout | Mixed
      Analytical Query | 9 | Join | Hive | Mixed
      Analytical Query | 10 | Aggregation | Hive | Mixed

      [4] Huang, S. et al.: The HiBench benchmark suite: Characterization of the MapReduce-based data analysis. Data Engineering Workshops (ICDEW), 2010
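The two metrics named on this slide can be related with a short sketch. It assumes that throughput is simply the input size divided by the elapsed time; the function name and the example run time are hypothetical and are not taken from the slides.

```python
def throughput_bytes_per_sec(input_bytes: int, duration_sec: float) -> float:
    """Throughput as input size over elapsed time (assumed definition)."""
    if duration_sec <= 0:
        raise ValueError("duration must be positive")
    return input_bytes / duration_sec


if __name__ == "__main__":
    # Hypothetical run: 10 GB of input (1 GB = 1024**3 bytes) processed in 100 s.
    ten_gb = 10 * 1024**3
    print(f"{throughput_bytes_per_sec(ten_gb, 100.0):.1f} bytes/sec")
```

Reporting both numbers is useful because time alone cannot be compared across the 10/20/50 GB input sizes, whereas throughput can.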
  • 7. Step II: Platform Setup
      • Platform layer (Hadoop Cluster)
          – vSphere Big Data Extensions integrating Serengeti Server (version 1.0)
          – VM template hosting CentOS
          – Apache Hadoop (version 1.2.1) with default parameters:
              • 200MB Java heap size
              • 64MB block size
              • Replication factor of 3
      • Management layer (Virtualization)
          – VMware vSphere 5.1
          – ESXi and vCenter Servers
      • Hardware layer: Dell PowerEdge T420 server
          – 2 x Intel Xeon E5-2420 (1.9 GHz), 6 core CPUs
          – 32GB RAM
          – 4 x 1 TB, WD SATA disks
      [Figure: layered stack with Application (HiBench Benchmark), Platform (Hadoop Cluster), Management (Virtualization) and Hardware (CPUs, Memory, Storage)]
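To make the default HDFS parameters above concrete, the following sketch estimates how many blocks and how much raw disk space the 10/20/50 GB inputs used in the experiments would occupy. It is a back-of-the-envelope calculation, not a measurement from the study: it assumes 1 GB = 1024 MB and that the input is laid out as full 64 MB blocks, and the function name is hypothetical.

```python
import math

BLOCK_SIZE_MB = 64   # Hadoop 1.2.1 default block size, as listed on the slide
REPLICATION = 3      # default replication factor, as listed on the slide


def hdfs_footprint(input_gb: int,
                   block_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> dict:
    """Estimate block count and raw storage for a given logical input size."""
    input_mb = input_gb * 1024                 # assumption: 1 GB = 1024 MB
    blocks = math.ceil(input_mb / block_mb)    # logical blocks
    return {
        "blocks": blocks,
        "block_replicas": blocks * replication,  # physical copies stored
        "raw_gb": input_gb * replication,        # disk space consumed
    }


if __name__ == "__main__":
    for size_gb in (10, 20, 50):
        print(size_gb, hdfs_footprint(size_gb))
```

Under these assumptions even the largest input needs 150 GB of raw storage, well within the 4 x 1 TB disks of the test server.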
  • 8. (Known) Limitations
      • Single physical server (no physical network)
      • VMware ESXi server hypervisor
      • Testing with default configurations (Serengeti & Hadoop)
      • Time constraints:
          – Input data sizes: 10/20/50GB
          – 3 test repetitions
  • 9. Step II: Comparison Factors
      The number of utilized VMs in the compared clusters should be equal.
      • Each additional VM increases the hypervisor overhead (reported in [2], [5], [6])
      • Utilizing more VMs may improve the overall system performance [2]
      The utilized hardware resources in a cluster should be equal.
      [2] Buell, J.: Virtualized Hadoop Performance with VMware vSphere® 5.1. Tech. White Pap. VMware Inc. (2013).
      [5] Li, J. et al.: Performance Overhead Among Three Hypervisors: An Experimental Study using Hadoop Benchmarks. Big Data (BigData Congress), 2013
      [6] Ye, K. et al.: vHadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration. Cluster Computing Workshops (CLUSTER WORKSHOPS), 2012
  • 10. Step II: Comparison Standard1/Data-Compute1
      Standard Cluster vs. Data-Compute Cluster
      Equal: 1) utilized hardware resources, 2) number of utilized VMs
      ∆ – difference in performance
  • 11. Step II: Comparison Standard2/Data-Compute3
      Standard Cluster vs. Data-Compute Cluster
      Equal: 1) utilized hardware resources, 2) number of utilized VMs
      ∆ – difference in performance
  • 12. Step II: Comparison Data-Compute1/2/3
      Data-Compute Cluster vs. Data-Compute Cluster
      Equal: 1) utilized hardware resources
      ∆ – difference in performance
  • 13. Step II: All Cluster Configurations
  • 14. Step III & IV: CPU Bound - WordCount
      • Configuration: 4 map/1 reduce tasks, 10/20/50 GB input data sizes
      • Times normalized with respect to baseline Standard1
      • 38-47% better performance for Data-Compute cluster
      • Data-Compute1 (2CW & 1DW) ≈ Data-Compute2 (2CW & 2DW)

      Equal number of VMs:
      Data Size (GB) | Diff. (%) Standard1/Data-Comp1 (3 VMs) | Diff. (%) Standard2/Data-Comp3 (6 VMs)
      10 | -40 | -38
      20 | -41 | -42
      50 | -43 | -47

      Chart data (ratio to Standard1):
      Data Size (GB) | Standard1 | Standard2 | Data-Comp1 | Data-Comp2 | Data-Comp3
      10 | 1.00 | 1.75 | 0.71 | 0.71 | 1.26
      20 | 1.00 | 1.74 | 0.71 | 0.71 | 1.22
      50 | 1.00 | 1.74 | 0.70 | 0.70 | 1.19
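The slides do not state how the Diff. (%) columns are computed. The sketch below shows one reading that reproduces the reported WordCount values to within rounding: for a comparison A/B, Diff = (B − A) / B × 100, applied to the normalized times. Both the formula and the assignment of the chart values to the five cluster configurations are inferences from the slide, not statements by the authors, and the function names are hypothetical.

```python
# Normalized WordCount times (ratio to Standard1), as read from the slide's chart.
# Assumption: values are listed per series in legend order.
WORDCOUNT_RATIOS = {
    "Standard1":  {10: 1.00, 20: 1.00, 50: 1.00},
    "Standard2":  {10: 1.75, 20: 1.74, 50: 1.74},
    "Data-Comp1": {10: 0.71, 20: 0.71, 50: 0.70},
    "Data-Comp3": {10: 1.26, 20: 1.22, 50: 1.19},
}


def normalize(times: dict, baseline: dict) -> dict:
    """Divide each run time by the baseline time for the same data size."""
    return {size: times[size] / baseline[size] for size in times}


def diff_percent(a: float, b: float) -> float:
    """Assumed definition of Diff. (%) for a comparison A/B: (B - A) / B * 100."""
    return (b - a) / b * 100.0


def compare(ratios: dict, first: str, second: str) -> dict:
    """Diff. (%) between two cluster configurations for every data size."""
    return {
        size: diff_percent(ratios[first][size], ratios[second][size])
        for size in sorted(ratios[first])
    }


if __name__ == "__main__":
    for first, second in [("Standard1", "Data-Comp1"), ("Standard2", "Data-Comp3")]:
        for size, diff in compare(WORDCOUNT_RATIOS, first, second).items():
            print(f"{first}/{second} at {size} GB: {diff:.1f}%")
```

Because the chart values are rounded to two decimals, the recomputed percentages can differ from the table by about one point. A negative value means the second configuration in the pair finished faster.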
  • 15. Step III & IV: Read I/O Bound – TestDFSIOEnh (1)
      • Configuration: 100MB file size, 10/20/50 GB input data sizes
      • Read times normalized with respect to baseline Standard1
      • Standard1 (Standard Cluster) performs best

      Equal number of VMs:
      Data Size (GB) | Diff. (%) Standard1/Data-Comp1 (3 VMs) | Diff. (%) Standard2/Data-Comp3 (6 VMs)
      10 | 68 | -18
      20 | 71 | -30
      50 | 73 | -46

      Chart data (ratio to Standard1):
      Data Size (GB) | Standard1 | Standard2 | Data-Comp1 | Data-Comp2 | Data-Comp3
      10 | 1.00 | 1.83 | 3.08 | 1.51 | 1.55
      20 | 1.00 | 1.93 | 3.39 | 1.71 | 1.48
      50 | 1.00 | 1.87 | 3.66 | 1.78 | 1.28
  • 16. Step III & IV: Read I/O Bound – TestDFSIOEnh (2)
      • Configuration: 100MB file size, 10/20/50 GB input data sizes
      • Read times normalized with respect to baseline Standard1
      • Data-Comp1 (2CW & 1DW) > DC2 (2CW & 2DW) > DC3 (3CW & 3DW)
          → More data nodes improve read performance in a Data-Compute cluster.

      Different number of VMs:
      Data Size (GB) | Diff. (%) Data-Comp1/2 (3 VMs vs. 4 VMs) | Diff. (%) Data-Comp2/3 (4 VMs vs. 6 VMs)
      10 | -104 | 3
      20 | -99 | -15
      50 | -106 | -39

      Chart data (ratio to Standard1):
      Data Size (GB) | Standard1 | Standard2 | Data-Comp1 | Data-Comp2 | Data-Comp3
      10 | 1.00 | 1.83 | 3.08 | 1.51 | 1.55
      20 | 1.00 | 1.93 | 3.39 | 1.71 | 1.48
      50 | 1.00 | 1.87 | 3.66 | 1.78 | 1.28
  • 17. Step III & IV: Write I/O Bound – TestDFSIOEnh (1)
      • Configuration: 100MB file size, 10/20/50 GB input data sizes
      • Write times normalized with respect to baseline Standard1
      • Data-Compute cluster (Data-Comp1, Data-Comp3) performs better

      Equal number of VMs:
      Data Size (GB) | Diff. (%) Standard1/Data-Comp1 (3 VMs) | Diff. (%) Standard2/Data-Comp3 (6 VMs)
      10 | -10 | 4
      20 | -21 | -14
      50 | -24 | -1

      Chart data (ratio to Standard1):
      Data Size (GB) | Standard1 | Standard2 | Data-Comp1 | Data-Comp2 | Data-Comp3
      10 | 1.00 | 0.84 | 0.91 | 0.73 | 0.87
      20 | 1.00 | 1.08 | 0.83 | 0.86 | 0.95
      50 | 1.00 | 1.00 | 0.81 | 0.95 | 0.99
  • 18. Step III & IV: Write I/O Bound – TestDFSIOEnh (2)
      • Configuration: 100MB file size, 10/20/50 GB input data sizes
      • Write times normalized with respect to baseline Standard1
      • Data-Comp1 (2CW & 1DW) < Data-Comp3 (3CW & 3DW)
          → Having 2 extra Data Worker nodes increases the write overhead up to 19% in a Data-Compute cluster.
      • Data-Comp3 (6 VMs) outperforms Standard1 (3 VMs)

      Different number of VMs:
      Data Size (GB) | Diff. (%) Data-Comp1/3 (3 VMs vs. 6 VMs) | Diff. (%) Standard1/Data-Comp3 (3 VMs vs. 6 VMs)
      10 | -4 | -15
      20 | 13 | -6
      50 | 19 | -1

      Chart data (ratio to Standard1):
      Data Size (GB) | Standard1 | Standard2 | Data-Comp1 | Data-Comp2 | Data-Comp3
      10 | 1.00 | 0.84 | 0.91 | 0.73 | 0.87
      20 | 1.00 | 1.08 | 0.83 | 0.86 | 0.95
      50 | 1.00 | 1.00 | 0.81 | 0.95 | 0.99
  • 19. Summary of Results
      • Compute-intensive (i.e. CPU bound) workloads are suitable for Data-Compute clusters. (up to 47% faster)
      • Read-intensive (i.e. read I/O bound) workloads are suitable for Standard clusters.
          – For Data-Compute clusters, adding more data nodes improves the read performance. (up to 39% better, e.g. Data-Compute2/Data-Compute3)
      • Write-intensive (i.e. write I/O bound) workloads are suitable for Data-Compute clusters. (up to 15% faster, e.g. Standard1/Data-Compute3)
          – A lower number of data nodes results in better write performance.
  • 20. Lessons Learned
      • Factors influencing cluster performance*:
          – Overall number of virtual nodes (VMs) in a cluster
          – Choosing cluster type (Standard or Data-Compute Hadoop cluster)
          – Number of nodes for each type (compute and data nodes) in a Data-Compute cluster
      * Note: subject to the known limitations (slide 8)
  • 21. Next Steps
      • Repeat the experiments on a virtualized multi-node cluster
      • Evaluate virtualized performance with other workloads
      • Experiments with larger data sets
      • Repeat the experiments using other virtualization platforms (e.g. OpenStack)
  • 22. Thank you!
      Questions & Feedback are very welcome!
      Contact info: Todor Ivanov, todor@dbis.cs.uni-frankfurt.de
      http://www.bigdata.uni-frankfurt.de/