BigDataBench: Benchmarking Big Data Systems

http://prof.ict.ac.cn/jfzhan

INSTITUTE OF COMPUTING TECHNOLOGY

Jianfeng Zhan
Computer Systems Research Center, ICT, CAS
CCF Big Data Technology Conference
2013-12-06
Why Big Data Benchmarking?

Measuring big data architectures and systems quantitatively

What is BigDataBench?

• An open source project on big data benchmarking
  – http://prof.ict.ac.cn/BigDataBench/
• 6 real-world data sets and 19 workloads
  – To be extended in the near future
• 4V characteristics
  – Volume, Variety, Velocity, and Veracity

Comparison of Big Data Benchmarking Efforts


Possible Users

BigDataBench targets users in three areas:
• Systems: OS for big data, file systems for big data, distributed systems, scheduling, programming systems, ...
• Architecture: processor, memory, networks, ...
• Data management: performance optimization, co-design, ...

Research Publications

• Characterizing data analysis workloads in data centers. Zhen Jia, Lei Wang, Jianfeng Zhan, Lixing Zhang, and Chunjie Luo. IISWC 2013. Best paper award.
• BigDataBench: a Big Data Benchmark Suite from Internet Services. Lei Wang, Jianfeng Zhan, et al. HPCA 2014, Industry Session.

Outline

1 Benchmarking Methodology and Decision
2 Case Study
3 How to Use
4 Future Work

BigDataBench Methodology

[Diagram: 4V of Big Data; BigDataBench]

Methodology (Cont’)

• Investigate typical application domains
• Representative data sets
  – Data types: structured, semi-structured, unstructured
  – Data sources: text data, graph data, table data, extended ...
• Diverse workloads
  – Application types: offline analytics, realtime analytics, online services
  – Basic and important operations and algorithms, extended ...
  – Representative software stack, extended ...
• Big data sets preserving 4V
  – Data generation tool preserving data characteristics
• Big data workloads
• BigDataBench

Methodology (Cont’)

[Diagram: 4V of Big Data; system and architecture characteristics; similarity analysis; BigDataBench]

Top Sites on the Web

More details at http://www.alexa.com/topsites/global;0

Search engines, social networks, and electronic commerce account for 80% of the page views of all Internet services.

Workloads Chosen

• Include different application types: online services, real-time analytics, offline analytics
• Cover representative data sources: text data, graph data, table data
• Pay attention to different application scenarios: search engine, e-commerce, social network
• Cover representative software stacks

19 Chosen Workloads

• Micro benchmarks
• Basic datastore operations
• Relational queries
• Application scenarios: search engines, social networks, e-commerce systems

Data Generation Tools

• Data sources
  – Text, graph and table
  – Six real raw data sets
• Synthetic data
  – Scale: from GB to PB
  – Features: preserve the characteristics of real-world data

Naïve Text Generator

• Each word of a document is selected randomly, following a multinomial distribution over the words
• Example words: machine, evaluate, big, system, data, mining, architecture, CPU, memory, benchmarking, learning
• Documents are modeled only at the word level

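The word-level generator described on this slide can be sketched as below. This is a minimal illustration, not the BigDataBench tool itself: the probabilities in `WORD_PROBS` and the function name `generate_document` are assumptions made up for the example, and only the words come from the slide.

```python
import random

# Hypothetical word probabilities (they sum to 1.0). A real generator would
# estimate them from a seed corpus; only the words are taken from the slide.
WORD_PROBS = {
    "data": 0.20, "big": 0.15, "system": 0.15, "machine": 0.10,
    "learning": 0.10, "mining": 0.10, "CPU": 0.05, "memory": 0.05,
    "architecture": 0.04, "benchmarking": 0.03, "evaluate": 0.03,
}

def generate_document(word_probs, length, rng):
    """Draw `length` words independently from one multinomial distribution."""
    words = list(word_probs)
    weights = [word_probs[w] for w in words]
    return rng.choices(words, weights=weights, k=length)

rng = random.Random(42)  # fixed seed so the run is repeatable
document = generate_document(WORD_PROBS, 10, rng)
print(" ".join(document))
```

Because every word is drawn independently, the output has realistic word frequencies but no topical coherence, which is the limitation the slide points out.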
Improved Text Generator

• A topic (topic1, topic2, topic3) is selected randomly, following a multinomial distribution over the topics
• A word is then selected randomly, following the multinomial distribution under the selected topic (topic2 in the example)
• Example words: machine, evaluate, big, CPU, data, mining, architecture, benchmarking, memory, system, learning
• Documents are modeled at both the topic level and the word level

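The two-level sampling described on this slide can be sketched as follows. It is an illustrative sketch under stated assumptions: the topic probabilities, the assignment of words to topics, and the helper names `sample` and `generate_document` are hypothetical, and the slide does not say how the real tool learns these distributions.

```python
import random

# Hypothetical topic distribution and per-topic word distributions.
TOPIC_PROBS = {"topic1": 0.5, "topic2": 0.3, "topic3": 0.2}
WORD_PROBS = {
    "topic1": {"machine": 0.4, "learning": 0.4, "data": 0.1, "mining": 0.1},
    "topic2": {"CPU": 0.4, "memory": 0.3, "architecture": 0.3},
    "topic3": {"benchmarking": 0.4, "evaluate": 0.3, "big": 0.15, "system": 0.15},
}

def sample(dist, rng):
    """Draw one key from a dict that maps outcomes to probabilities."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]

def generate_document(length, rng):
    """For each position, pick a topic first, then a word under that topic."""
    doc = []
    for _ in range(length):
        topic = sample(TOPIC_PROBS, rng)
        word = sample(WORD_PROBS[topic], rng)
        doc.append((topic, word))
    return doc

rng = random.Random(7)  # fixed seed so the run is repeatable
document = generate_document(8, rng)
print(" ".join(word for _, word in document))
```

Keeping the topic next to each word makes it easy to check that every word was drawn from the distribution of its own topic.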
BigDataBench Case Study

Users of BigDataBench and their research topics:
• Performance evaluation and diagnosis: SJTU and XJTU
• Workload characterization and evaluating big data hardware systems: ICT, CAS; SIAT, CAS; USTC and Florida International University
• Networks for big data: OSU
• Energy efficiency of big data systems: CNCERT

http://prof.ict.ac.cn/BigDataBench/#users

Testbed

Workloads Analyzed

http://prof.ict.ac.cn/BigDataBench

Floating Point Operation Intensity

[Chart panels: Data Analytics; Services]

• Operation intensity is the total number of (floating point or integer) instructions divided by the total number of memory access bytes in a run of a workload.
• Big data workloads have very low floating point operation intensities (0.009), two orders of magnitude lower than the theoretical value of a state-of-practice CPU (1.8).

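The definition above is a simple ratio, which the following sketch makes concrete. The counter values are invented for illustration and were chosen only so that the result matches the 0.009 quoted on the slide; the function name `operation_intensity` is likewise an assumption.

```python
import math

def operation_intensity(instruction_count, memory_access_bytes):
    """Instructions of one kind (floating point or integer) per byte of memory traffic."""
    if memory_access_bytes <= 0:
        raise ValueError("memory_access_bytes must be positive")
    return instruction_count / memory_access_bytes

# Hypothetical counter readings for one run of a workload.
fp_instructions = 9_000_000
memory_bytes = 1_000_000_000

fp_intensity = operation_intensity(fp_instructions, memory_bytes)
cpu_theoretical = 1.8  # theoretical value quoted on the slide
gap = cpu_theoretical / fp_intensity

print(f"floating point intensity: {fp_intensity:.3f}")
print(f"gap to the CPU value: {gap:.0f}x (about 10^{math.floor(math.log10(gap))})")
```

With these numbers the gap is a factor of about 200, which is what "two orders of magnitude" refers to.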
Instruction Breakdown

[Chart panels: Data Analytics; Services]

• Fewer floating point operations
• More integer operations

Ratio of Integer to Floating Point Operations

[Chart panels: Data Analytics; Services]

• The average ratio for big data workloads is 100
• The ratios for PARSEC, HPCC and SPECFP are 1.4, 1.0 and 0.67, respectively

Integer Operation Intensity

[Chart panels: Data Analytics; Services]

• The average integer operation intensity of big data workloads is 0.49
• Those of PARSEC, HPCC and SPECFP are 1.5, 0.38 and 0.23, respectively

Cache Behaviors

[Chart panels: Data Analytics; Services]

• Big data workloads have higher L1I misses than HPC workloads
• Data analysis workloads have better L2 cache behaviors than service workloads, except BFS
• Big data workloads have good L3 behaviors

TLB Behaviors

[Chart groups: 14 data analysis workloads; 5 service workloads]

• ITLB misses of big data workloads are higher than those of HPC workloads
• DTLB misses of big data workloads are higher than those of HPC workloads

Evaluating Big Data Hardware Systems

Experimental Platforms

• Xeon (common processor)
• Atom (low power processor)
• Tilera (many core processor)

Brief Comparison: Basic Information

               Xeon                 Atom                 Tilera
CPU Type       Intel Xeon E5310     Intel Atom D510      Tilera TilePro36
CPU Core       4 cores @ 1.6GHz     2 cores @ 1.66GHz    36 cores @ 500MHz
L1 I/D Cache   32KB                 24KB                 16KB/8KB
L2 Cache       4096KB               512KB                64KB

Experimental Platforms

Hadoop Cluster Information

• Comparison (the same logical core number)
  – Xeon VS Atom: [1 Xeon master + 7 Xeon slaves] VS [1 Atom master + 7 Atom slaves]
  – Xeon VS Tilera: [1 Xeon master + 7 Xeon slaves] VS [1 Xeon master + 1 Tilera slave]
• Hadoop setting
  – Following the guidance on the official Hadoop website

Benchmark Selection

BigDataBench 1.0

Application    Time Complexity    Characteristics
Sort           O(n log2 n)        Integer comparison
WordCount      O(n)               Integer comparison and calculation
Grep           O(n)               String comparison
Naïve Bayes    O(m*n)             Floating-point computation
SVM            O(n^3)             Floating-point computation

Metrics

• Performance: data processed per second (DPS)
• Energy efficiency: application performance power usage effectiveness (DPJ)

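Both metrics are simple ratios, sketched below. The slide does not expand DPJ, so reading it as "data processed per joule" is an assumption here, as are the function names, the measurement values, and the estimate of energy as average power multiplied by run time.

```python
def data_processed_per_second(data_bytes, seconds):
    """DPS: input size divided by run time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return data_bytes / seconds

def data_processed_per_joule(data_bytes, joules):
    """DPJ (assumed meaning): input size divided by the energy consumed."""
    if joules <= 0:
        raise ValueError("joules must be positive")
    return data_bytes / joules

# Hypothetical measurements for one benchmark run.
data_bytes = 10 * 10**9   # 10 GB of input
run_seconds = 100.0       # wall-clock run time
avg_power_watts = 200.0   # average power draw during the run

energy_joules = avg_power_watts * run_seconds  # 1 W for 1 s is 1 J
dps = data_processed_per_second(data_bytes, run_seconds)
dpj = data_processed_per_joule(data_bytes, energy_joules)

print(f"DPS: {dps / 1e6:.1f} MB/s")
print(f"DPJ: {dpj / 1e6:.2f} MB/J")
```

A slower platform can still score a higher DPJ if its power draw is proportionally lower, which is why the two metrics are reported separately.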
Xeon VS Atom – DPJ

Xeon VS Tilera – DPJ

Reference

Jing Quan (University of Science and Technology of China), Yingjie Shi (Chinese Academy of Sciences), Ming Zhao (Florida International University), Wei Yang (University of Science and Technology of China). "The Implications from Benchmarking Three Different Data Center Platforms." The First Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2013), in conjunction with the 2013 IEEE International Conference on Big Data (IEEE Big Data 2013).

BigDataBench Class

• For architecture
  – 19 of 19 workloads
• For OS
  – 19 of 19 workloads
• For runtime environment (Hadoop)
  – 9 of 19 workloads
  – Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes
• For data management
  – 6 of 19 workloads
  – Read, Write, Scan, Select Query, Aggregate Query, Join Query

BigDataBench Class: Data Sources

• Text related
  – 6 of 19 workloads
  – Sort, Grep, WordCount, Index, Collaborative Filtering and Naive Bayes
• Graph related
  – 4 of 19 workloads
  – BFS, PageRank, Kmeans, and Connected Components
• Table related
  – 9 of 19 workloads
  – Read, Write, Scan, Select Query, Aggregate Query, Join Query, Nutch Server, Olio Server and Rubis Server

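The data-source classes listed on this slide partition the 19 workloads, which can be checked with a small lookup table. The dictionary below copies the workload names from the slide; the names `DATA_SOURCE_CLASSES` and `class_of` are hypothetical and not part of BigDataBench.

```python
# Workloads grouped by data source, as listed on the slide.
DATA_SOURCE_CLASSES = {
    "text": ["Sort", "Grep", "WordCount", "Index",
             "Collaborative Filtering", "Naive Bayes"],
    "graph": ["BFS", "PageRank", "Kmeans", "Connected Components"],
    "table": ["Read", "Write", "Scan", "Select Query", "Aggregate Query",
              "Join Query", "Nutch Server", "Olio Server", "Rubis Server"],
}

def class_of(workload):
    """Return the data-source class that contains the given workload."""
    for source, workloads in DATA_SOURCE_CLASSES.items():
        if workload in workloads:
            return source
    raise KeyError(workload)

all_workloads = [w for ws in DATA_SOURCE_CLASSES.values() for w in ws]

# Each workload appears in exactly one class, and the class sizes add up to 19.
for source, workloads in DATA_SOURCE_CLASSES.items():
    print(f"{source}: {len(workloads)} of {len(all_workloads)} workloads")
print("PageRank uses", class_of("PageRank"), "data")
```

The same structure could hold the application-type or application-domain groupings, although the domain groups overlap and therefore do not form a partition.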
BigDataBench Class: Application Types

• Online services
  – 6 of 19 workloads
  – Read, Write, Scan, Nutch Server, Olio Server and Rubis Server
• Offline analytics
  – 10 of 19 workloads
  – Sort, Grep, WordCount, BFS, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes
• Realtime analytics
  – 3 of 19 workloads
  – Select Query, Aggregate Query and Join Query

BigDataBench Class: Application Domains

• Search engine related: Basic Operations + Search Engine
  – 7 of 19 workloads
  – Sort, Grep, WordCount, BFS, PageRank, Index and Nutch Server
• Social network related: Basic Cloud OLTP + Basic Relational Query + Social Network
  – 9 of 19 workloads
  – Read, Write, Scan, Select Query, Aggregate Query, Join Query, Olio Server, Kmeans and Connected Components
• E-commerce related: Basic Cloud OLTP + Basic Relational Query + E-commerce
  – 9 of 19 workloads
  – Read, Write, Scan, Select Query, Aggregate Query, Join Query, Rubis Server, Collaborative Filtering and Naive Bayes

Near Future Work

• Multi-media data
• Deep learning workloads
• HPC
• Refine BigDataBench

Related Resources

• BigDataBench project
  – http://prof.ict.ac.cn/BigDataBench
• BPOE workshop
  – http://prof.ict.ac.cn/bpoe
  – A series of workshops on Big Data Benchmarks, Performance Optimization, and Emerging Hardware
  – BPOE-4: interaction among OS, architecture, and data management
  – Co-located with ASPLOS 2014

BPOE-4 SC

• Christos Kozyrakis, Stanford
• Xiaofang Zhou, University of Queensland
• Dhabaleswar K Panda, Ohio State University
• Raghunath Nambiar, Cisco
• Lizy K John, University of Texas at Austin
• Xiaoyong Du, Renmin University of China
• H. Peter Hofstee, IBM Austin Research Laboratory
• Ippokratis Pandis, IBM Almaden Research Center
• Alexandros Labrinidis, University of Pittsburgh
• Bill Jia, Facebook
• Jianfeng Zhan, ICT, Chinese Academy of Sciences


THANKS

More Related Content

What's hot

Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Denodo
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Mitul Tiwari
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata Mk Kim
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationCesare Cugnasco
 
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsSpark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsAli Hodroj
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at TwitchImply
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsStavros Kontopoulos
 
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...DataWorks Summit
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopCloudera, Inc.
 
Unattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedUnattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedJazz Yao-Tsung Wang
 
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jScalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jNeo4j
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big AnalyticsAjay Ohri
 

What's hot (20)

Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integrationIndexing 3-dimensional trajectories: Apache Spark and Cassandra integration
Indexing 3-dimensional trajectories: Apache Spark and Cassandra integration
 
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data GridsSpark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids
 
Movie data analysis
Movie data analysisMovie data analysis
Movie data analysis
 
Self Service Analytics at Twitch
Self Service Analytics at TwitchSelf Service Analytics at Twitch
Self Service Analytics at Twitch
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
Promote the Good of the People of the United Kingdom by Maintaining Monetary ...
 
Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
 
Unattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseedUnattended Apache BigTop installer CD using preseed
Unattended Apache BigTop installer CD using preseed
 
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4jScalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Big data Big Analytics
Big data Big AnalyticsBig data Big Analytics
Big data Big Analytics
 

Viewers also liked

Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2hdhappy001
 
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍Sijia Lyu
 
徐萌:中国移动大数据应用实践
徐萌:中国移动大数据应用实践徐萌:中国移动大数据应用实践
徐萌:中国移动大数据应用实践hdhappy001
 
袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战hdhappy001
 
薛伟:腾讯广点通——大数据之上的实时精准推荐
薛伟:腾讯广点通——大数据之上的实时精准推荐薛伟:腾讯广点通——大数据之上的实时精准推荐
薛伟:腾讯广点通——大数据之上的实时精准推荐hdhappy001
 
刘书良:基于大数据公共云平台的Dsp技术
刘书良:基于大数据公共云平台的Dsp技术刘书良:基于大数据公共云平台的Dsp技术
刘书良:基于大数据公共云平台的Dsp技术hdhappy001
 
翟艳堂:腾讯大规模Hadoop集群实践
翟艳堂:腾讯大规模Hadoop集群实践翟艳堂:腾讯大规模Hadoop集群实践
翟艳堂:腾讯大规模Hadoop集群实践hdhappy001
 
Capital onehadoopintro
Capital onehadoopintroCapital onehadoopintro
Capital onehadoopintroDoug Chang
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsTrendProgContest13
 

Viewers also liked (11)

Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2Nicholas:hdfs what is new in hadoop 2
Nicholas:hdfs what is new in hadoop 2
 
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍
Ad network、ad exchange、dsp、ssp、rtb_和dmp介绍
 
徐萌:中国移动大数据应用实践
徐萌:中国移动大数据应用实践徐萌:中国移动大数据应用实践
徐萌:中国移动大数据应用实践
 
袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战袁晓如:大数据时代可视化和可视分析的机遇与挑战
袁晓如:大数据时代可视化和可视分析的机遇与挑战
 
薛伟:腾讯广点通——大数据之上的实时精准推荐
薛伟:腾讯广点通——大数据之上的实时精准推荐薛伟:腾讯广点通——大数据之上的实时精准推荐
薛伟:腾讯广点通——大数据之上的实时精准推荐
 
刘书良:基于大数据公共云平台的Dsp技术
刘书良:基于大数据公共云平台的Dsp技术刘书良:基于大数据公共云平台的Dsp技术
刘书良:基于大数据公共云平台的Dsp技术
 
翟艳堂:腾讯大规模Hadoop集群实践
翟艳堂:腾讯大规模Hadoop集群实践翟艳堂:腾讯大规模Hadoop集群实践
翟艳堂:腾讯大规模Hadoop集群实践
 
Zh tw cloud computing era
Zh tw cloud computing eraZh tw cloud computing era
Zh tw cloud computing era
 
Capital onehadoopintro
Capital onehadoopintroCapital onehadoopintro
Capital onehadoopintro
 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 

Similar to BigDataBench Benchmarking Big Data Systems

BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBencht_ivanov
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom IndustryCloudera, Inc.
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up ServerHow Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up ServerAhsan Javed Awan
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusterst_ivanov
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200xIBM Sverige
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveSumit Kalra
 
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...ChemAxon
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006raj_vij
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsAmir Mahdi Akbari
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 

Similar to BigDataBench Benchmarking Big Data Systems (20)

BDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBenchBDSE 2015 Evaluation of Big Data Platforms with HiBench
BDSE 2015 Evaluation of Big Data Platforms with HiBench
 
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
Hw09   Hadoop Based Data Mining Platform For The Telecom IndustryHw09   Hadoop Based Data Mining Platform For The Telecom Industry
Hw09 Hadoop Based Data Mining Platform For The Telecom Industry
 
disertation
disertationdisertation
disertation
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Poster (1)
Poster (1)Poster (1)
Poster (1)
 
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up ServerHow Data Volume Affects Spark Based Data Analytics on a Scale-up Server
How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop ClustersWBDB 2014 Benchmarking Virtualized Hadoop Clusters
WBDB 2014 Benchmarking Virtualized Hadoop Clusters
 
Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200x
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Big Data Analytics: Architectural Perspective
Big Data Analytics: Architectural PerspectiveBig Data Analytics: Architectural Perspective
Big Data Analytics: Architectural Perspective
 
Stream Processing
Stream Processing Stream Processing
Stream Processing
 
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...EUGM 2014 -  Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
EUGM 2014 - Brock Luty (Dart Neuroscience): A ChemAxon/KNIME based tool for ...
 
Internet data mining 2006
Internet data mining   2006Internet data mining   2006
Internet data mining 2006
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Comparison of In-memory Data Platforms
Comparison of In-memory Data PlatformsComparison of In-memory Data Platforms
Comparison of In-memory Data Platforms
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 

More from hdhappy001

俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkabanhdhappy001
 
杨少华:阿里开放数据处理服务
杨少华:阿里开放数据处理服务杨少华:阿里开放数据处理服务
杨少华:阿里开放数据处理服务hdhappy001
 
肖永红:科研数据应用和共享方面的实践
肖永红:科研数据应用和共享方面的实践肖永红:科研数据应用和共享方面的实践
肖永红:科研数据应用和共享方面的实践hdhappy001
 
肖康:Storm在实时网络攻击检测和分析的应用与改进
肖康:Storm在实时网络攻击检测和分析的应用与改进肖康:Storm在实时网络攻击检测和分析的应用与改进
肖康:Storm在实时网络攻击检测和分析的应用与改进hdhappy001
 
夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架hdhappy001
 
魏凯:大数据商业利用的政策管制问题
魏凯:大数据商业利用的政策管制问题魏凯:大数据商业利用的政策管制问题
魏凯:大数据商业利用的政策管制问题hdhappy001
 
王涛:基于Cloudera impala的非关系型数据库sql执行引擎
王涛:基于Cloudera impala的非关系型数据库sql执行引擎王涛:基于Cloudera impala的非关系型数据库sql执行引擎
王涛:基于Cloudera impala的非关系型数据库sql执行引擎hdhappy001
 
王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术hdhappy001
 
钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探hdhappy001
 
穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scalehdhappy001
 
罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群hdhappy001
 
刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sql刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sqlhdhappy001
 
刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台hdhappy001
 
李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略hdhappy001
 
冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展hdhappy001
 
堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensionshdhappy001
 
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测hdhappy001
 
查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统hdhappy001
 
Ted yu:h base and hoya
Ted yu:h base and hoyaTed yu:h base and hoya
Ted yu:h base and hoyahdhappy001
 
Raghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarksRaghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarkshdhappy001
 

More from hdhappy001 (20)

俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban俞晨杰:Linked in大数据应用和azkaban
俞晨杰:Linked in大数据应用和azkaban
 
杨少华:阿里开放数据处理服务
杨少华:阿里开放数据处理服务杨少华:阿里开放数据处理服务
杨少华:阿里开放数据处理服务
 
肖永红:科研数据应用和共享方面的实践
肖永红:科研数据应用和共享方面的实践肖永红:科研数据应用和共享方面的实践
肖永红:科研数据应用和共享方面的实践
 
肖康:Storm在实时网络攻击检测和分析的应用与改进
肖康:Storm在实时网络攻击检测和分析的应用与改进肖康:Storm在实时网络攻击检测和分析的应用与改进
肖康:Storm在实时网络攻击检测和分析的应用与改进
 
夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架夏俊鸾:Spark——基于内存的下一代大数据分析框架
夏俊鸾:Spark——基于内存的下一代大数据分析框架
 
魏凯:大数据商业利用的政策管制问题
魏凯:大数据商业利用的政策管制问题魏凯:大数据商业利用的政策管制问题
魏凯:大数据商业利用的政策管制问题
 
王涛:基于Cloudera impala的非关系型数据库sql执行引擎
王涛:基于Cloudera impala的非关系型数据库sql执行引擎王涛:基于Cloudera impala的非关系型数据库sql执行引擎
王涛:基于Cloudera impala的非关系型数据库sql执行引擎
 
王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术王峰:阿里搜索实时流计算技术
王峰:阿里搜索实时流计算技术
 
钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探钱卫宁:在线社交媒体分析型查询基准评测初探
钱卫宁:在线社交媒体分析型查询基准评测初探
 
穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale穆黎森:Interactive batch query at scale
穆黎森:Interactive batch query at scale
 
罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群罗李:构建一个跨机房的Hadoop集群
罗李:构建一个跨机房的Hadoop集群
 
刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sql刘诚忠:Running cloudera impala on postgre sql
刘诚忠:Running cloudera impala on postgre sql
 
刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台刘昌钰:阿里大数据应用平台
刘昌钰:阿里大数据应用平台
 
李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略李战怀:大数据背景下分布式系统的数据一致性策略
李战怀:大数据背景下分布式系统的数据一致性策略
 
冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展冯宏华:H base在小米的应用与扩展
冯宏华:H base在小米的应用与扩展
 
堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions堵俊平:Hadoop virtualization extensions
堵俊平:Hadoop virtualization extensions
 
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
陈跃国:Sql on-hadoop结构化大数据分析系统性能评测
 
查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统查礼 -大数据技术如何用于传统信息系统
查礼 -大数据技术如何用于传统信息系统
 
Ted yu:h base and hoya
Ted yu:h base and hoyaTed yu:h base and hoya
Ted yu:h base and hoya
 
Raghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarksRaghu nambiar:industry standard benchmarks
Raghu nambiar:industry standard benchmarks
 

BigDataBench Benchmarking Big Data Systems

  • 1. BigDataBench: Benchmarking Big Data Systems. Jianfeng Zhan, Computer Systems Research Center, Institute of Computing Technology (ICT), CAS. CCF Big Data Technology Conference, 2013-12-06. http://prof.ict.ac.cn/jfzhan
  • 2. Why Big Data Benchmarking? Measuring big data architecture and systems quantitatively.
  • 3. What is BigDataBench? An open source project on big data benchmarking: http://prof.ict.ac.cn/BigDataBench/. 6 real-world data sets and 19 workloads (to be extended in the near future). 4V characteristics: Volume, Variety, Velocity, and Veracity.
  • 4. Comparison of Big Data Benchmarking Efforts
  • 5. Possible Users of BigDataBench. Systems: OS for big data, file systems for big data, distributed systems, scheduling, programming systems, ... Architecture: processor, memory, networks, ... Data management. Performance optimization. Co-design.
  • 6. Research Publications. (1) Characterizing data analysis workloads in data centers. Zhen Jia, Lei Wang, Jianfeng Zhan, Lixing Zhang, and Chunjie Luo. IISWC 2013, Best paper award. (2) BigDataBench: a Big Data Benchmark Suite from Internet Services. Lei Wang, Jianfeng Zhan, et al. HPCA 2014, Industry Session.
  • 7. Outline: 1. Benchmarking Methodology and Decision; 2. Case Study; 3. How to Use; 4. Future Work
  • 8. BigDataBench Methodology (diagram: 4V of Big Data; BigDataBench)
  • 9. Methodology (Cont'). Representative data sets: investigate typical application domains; data types (structured, semi-structured, unstructured); data sources (text data, graph data, table data, extended ...). Big data sets preserving 4V: data generation tool preserving data characteristics. Diverse workloads: application types (offline analytics, realtime analytics, online services); basic and important operations and algorithms (extended ...); representative software stack (extended ...). Together these yield the big data workloads of BigDataBench.
  • 10. Methodology (Cont') (diagram: 4V of Big Data; system and architecture characteristics; similarity analysis; BigDataBench)
  • 11. Top Sites on the Web. Search engines, social networks, and electronic commerce hold 80% of the page views of all Internet services. More details at http://www.alexa.com/topsites/global;0
  • 13. 19 Chosen Workloads. Micro benchmarks; basic datastore operations; relational queries; application scenarios (search engines, social networks, e-commerce systems).
  • 14. Data Generation Tools. Data sources: text, graph, and table (six real raw data sets). Synthetic data: scale from GB to PB; features preserve the characteristics of real-world data.
  • 15. Naïve Text Generator. Words are selected randomly, with the words in documents following a multinomial distribution (example words: machine, evaluate, big, system, data, mining, architecture, CPU, memory, benchmarking, learning). Modeling only on the word level.
  • 16. Improved Text Generator. A topic is selected randomly (topic1, topic2, topic3), with topics following a multinomial distribution; a word is then selected randomly, with words following a multinomial distribution under the selected topic (topic2 in the slide's example), to build the document. Modeling on both the topic and word levels.
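The two generators on slides 15 and 16 can be sketched in Python as below. This is a minimal illustration and not the BigDataBench tool itself: the words come from the slide, but every probability, the assignment of words to topics, and the choice to draw a topic for each word position are assumptions made for the example.

```python
import random

# Words taken from the slide; all weights below are made-up illustrative values.
VOCAB = ["machine", "learning", "data", "mining", "CPU", "memory", "benchmarking"]
WORD_WEIGHTS = [0.2, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]

# Hypothetical topics: each maps words to weights (a per-topic multinomial).
TOPIC_WEIGHTS = {"topic1": 0.5, "topic2": 0.3, "topic3": 0.2}
TOPIC_WORDS = {
    "topic1": {"machine": 0.5, "learning": 0.5},
    "topic2": {"data": 0.6, "mining": 0.4},
    "topic3": {"CPU": 0.4, "memory": 0.3, "benchmarking": 0.3},
}


def naive_document(length, rng):
    """Word-level model: every word is an independent multinomial draw."""
    return rng.choices(VOCAB, weights=WORD_WEIGHTS, k=length)


def topic_document(length, rng):
    """Topic-level model: draw a topic first, then a word from that topic."""
    topics = list(TOPIC_WEIGHTS)
    topic_probs = [TOPIC_WEIGHTS[t] for t in topics]
    doc = []
    for _ in range(length):
        topic = rng.choices(topics, weights=topic_probs, k=1)[0]
        words = list(TOPIC_WORDS[topic])
        probs = [TOPIC_WORDS[topic][w] for w in words]
        doc.append(rng.choices(words, weights=probs, k=1)[0])
    return doc


rng = random.Random(0)  # fixed seed so repeated runs give the same text
print(" ".join(naive_document(8, rng)))
print(" ".join(topic_document(8, rng)))
```

In a real generator the two sets of weights would presumably be estimated from the raw data sets rather than written by hand, which is how the synthetic text could preserve the characteristics of the source data.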
  • 17. Outline: 1. Benchmarking Methodology and Decision; 2. Case Study; 3. How to Use; 4. Future Work
  • 18. BigDataBench Case Study. Performance evaluation and diagnosis (SJTU and XJTU); workload characterization (ICT, CAS); evaluating big data hardware systems (SIAT, CAS; USTC; Florida International University); networks for big data (OSU); energy efficiency of big data systems (CNCERT). http://prof.ict.ac.cn/BigDataBench/#users
  • 21. Floating Point Operation Intensity (figure groups: Data Analytics, Services). Operation intensity is the total number of (floating point or integer) instructions divided by the total number of memory access bytes in a run of a workload. Big data workloads have very low floating point operation intensities (0.009), two orders of magnitude lower than the theoretical number for a state-of-practice CPU (1.8).
  • 22. Instruction Breakdown (figure groups: Data Analytics, Services). Fewer floating point operations; more integer operations.
  • 23. Ratio of Integer to Floating Point Operations (figure groups: Data Analytics, Services). The average for big data workloads is 100; for PARSEC, HPCC, and SPECFP it is 1.4, 1.0, and 0.67.
  • 24. Integer Operation Intensity (figure groups: Data Analytics, Services). The average integer operation intensity of big data workloads is 0.49; that of PARSEC, HPCC, and SPECFP is 1.5, 0.38, and 0.23.
  • 25. Cache Behaviors (figure groups: Data Analytics, Services). Big data workloads have higher L1I misses than HPC workloads. Data analysis workloads have better L2 cache behaviors than service workloads, except BFS. Big data workloads have good L3 behaviors.
  • 26. TLB Behaviors (14 data analysis workloads, 5 service workloads). ITLB misses of big data workloads are higher than those of HPC workloads. DTLB misses of big data workloads are higher than those of HPC workloads.
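The operation intensity defined on slide 21 is a simple ratio, and the "two orders of magnitude" comparison can be checked with a few lines of Python. The sketch below uses hypothetical counter readings chosen only to reproduce the 0.009 quoted on the slide; the function names are illustrative and not part of BigDataBench.

```python
import math


def operation_intensity(instruction_count, memory_access_bytes):
    """Instructions executed per byte of memory traffic in one run."""
    if memory_access_bytes <= 0:
        raise ValueError("memory_access_bytes must be positive")
    return instruction_count / memory_access_bytes


def orders_of_magnitude_below(measured, reference):
    """Whole powers of ten by which `measured` falls below `reference`."""
    return math.floor(math.log10(reference / measured))


# Hypothetical readings: 9 million FP instructions over 1 GB of memory traffic.
fp_intensity = operation_intensity(9_000_000, 1_000_000_000)
print(fp_intensity)

# 1.8 is the theoretical CPU figure quoted on the slide.
print(orders_of_magnitude_below(fp_intensity, 1.8))
```

The same function applies to integer operation intensity by passing the integer instruction count instead.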
  • 27. BigDataBench Case Study. Performance evaluation and diagnosis (SJTU and XJTU); big data workload characterization (ICT, CAS); evaluating big data hardware systems (SIAT, CAS; USTC; Florida International University); networks for big data (OSU); energy efficiency of big data systems (CNCERT). http://prof.ict.ac.cn/BigDataBench/#users
  • 28. Evaluating Big Data Hardware Systems
  • 29. Experimental Platforms: brief comparison of basic information. Xeon (common processor): Intel Xeon E5310, 4 cores @ 1.6GHz, L1 I/D cache 32KB, L2 cache 4096KB. Atom (low power processor): Intel Atom D510, 2 cores @ 1.66GHz, L1 I/D cache 24KB, L2 cache 512KB. Tilera (many core processor): Tilera TilePro36, 36 cores @ 500MHz, L1 I/D cache 16KB/8KB, L2 cache 64KB.
  • 30. Experimental Platforms: Hadoop cluster information. Xeon vs. Atom: [1 Xeon master + 7 Xeon slaves] vs. [1 Atom master + 7 Atom slaves]. Xeon vs. Tilera (the same logical core number): [1 Xeon master + 7 Xeon slaves] vs. [1 Xeon master + 1 Tilera slave]. Hadoop setting: following the guidance on the Hadoop official website.
  • 31. Benchmark Selection (BigDataBench 1.0), listed as application, time complexity, characteristics. Sort: O(n*log2n), integer comparison. WordCount: O(n), integer comparison and calculation. Grep: O(n), string comparison. Naïve Bayes: O(m*n), floating-point computation. SVM: O(n^3), floating-point computation.
  • 32. Metrics Performance: Data processed per second (DPS) Energy Efficiency: Application Performance Power Usage Effectiveness(DPJ) 32/
  • 33. Xeon VS Atom – DPJ 33/
  • 34. Xeon VS Tilera – DPJ 34/
  • 35. Reference Jing Quan, University of Science and Technology of China, Yingjie Shi, Chinese Academy of Sciences, Ming Zhao, Florida International University, Wei Yang, University of Science and Technology of China. ”The Implications from Benchmarking Three Different Data Center Platforms” The First Workshop on Big Data Benchmarks, Performance Optimization, and Emerging hardware (BPOE 2013) in conjunction with 2013 IEEE International Conference on Big Data (IEEE Big Data 2013) 35/
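The two metrics from slide 32 can be computed as in the sketch below. It reads DPJ as data processed per joule, which is an assumption since the slide does not expand the abbreviation, and it derives energy from an average power draw. The cluster figures are made-up inputs for illustration and are not measurements from the study.

```python
def data_processed_per_second(input_bytes, runtime_seconds):
    """DPS: input size divided by wall-clock running time."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return input_bytes / runtime_seconds


def data_processed_per_joule(input_bytes, average_power_watts, runtime_seconds):
    """DPJ (assumed meaning): input size divided by energy consumed."""
    energy_joules = average_power_watts * runtime_seconds  # 1 W = 1 J/s
    if energy_joules <= 0:
        raise ValueError("energy must be positive")
    return input_bytes / energy_joules


# Hypothetical runs of one workload over the same 1 GB input.
INPUT_BYTES = 1_000_000_000
runs = {
    "cluster_a": {"seconds": 100.0, "watts": 2000.0},
    "cluster_b": {"seconds": 400.0, "watts": 250.0},
}

for name, run in runs.items():
    dps = data_processed_per_second(INPUT_BYTES, run["seconds"])
    dpj = data_processed_per_joule(INPUT_BYTES, run["watts"], run["seconds"])
    print(f"{name}: DPS={dps:.0f} B/s, DPJ={dpj:.0f} B/J")
```

With these invented inputs the faster cluster wins on DPS while the slower one wins on DPJ, which is why the two metrics are reported separately.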
  • 36. Outline: 1. Benchmarking Methodology and Decision; 2. Case Study; 3. How to Use; 4. Future Work
  • 37. BigDataBench Class. For architecture: 19 of 19 workloads. For OS: 19 of 19 workloads. For runtime environment (Hadoop): 9 of 19 workloads (Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering, and Naive Bayes). For data management: 6 of 19 workloads (Read, Write, Scan, Select Query, Aggregate Query, and Join Query).
  • 38. BigDataBench Class: Data Sources. Text related: 6 of 19 workloads (Sort, Grep, WordCount, Index, Collaborative Filtering, and Naive Bayes). Graph related: 4 of 19 workloads (BFS, PageRank, Kmeans, and Connected Components). Table related: 9 of 19 workloads (Read, Write, Scan, Select Query, Aggregate Query, Join Query, Nutch Server, Olio Server, and Rubis Server).
  • 39. BigDataBench Class: Application Types. Online services: 6 of 19 workloads (Read, Write, Scan, Nutch Server, Olio Server, and Rubis Server). Offline analytics: 10 of 19 workloads (Sort, Grep, WordCount, BFS, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering, and Naive Bayes). Realtime analytics: 3 of 19 workloads (Select Query, Aggregate Query, and Join Query).
  • 40. BigDataBench Class: Application Domains. Search engine related (Basic Operations + Search Engine): 7 of 19 workloads (Sort, Grep, WordCount, BFS, PageRank, Index, and Nutch Server). Social network related (Basic Cloud OLTP + Basic Relational Query + Social Network): 9 of 19 workloads (Read, Write, Scan, Select Query, Aggregate Query, Join Query, Olio Server, Kmeans, and Connected Components). E-commerce related (Basic Cloud OLTP + Basic Relational Query + E-commerce): 9 of 19 workloads (Read, Write, Scan, Select Query, Aggregate Query, Join Query, Rubis Server, Collaborative Filtering, and Naive Bayes).
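When picking a subset of workloads, the classifications on slides 38 and 39 can be kept as plain lookup tables. The sketch below transcribes those two slides into Python dictionaries and checks that each classification covers the same 19 workloads exactly once. The dictionary and function names are illustrative and are not part of the BigDataBench distribution.

```python
# Workload names transcribed from slides 38 and 39.
BY_APPLICATION_TYPE = {
    "Online Services": ["Read", "Write", "Scan", "Nutch Server",
                        "Olio Server", "Rubis Server"],
    "Offline Analytics": ["Sort", "Grep", "WordCount", "BFS", "PageRank",
                          "Index", "Kmeans", "Connected Components",
                          "Collaborative Filtering", "Naive Bayes"],
    "Realtime Analytics": ["Select Query", "Aggregate Query", "Join Query"],
}

BY_DATA_SOURCE = {
    "Text": ["Sort", "Grep", "WordCount", "Index",
             "Collaborative Filtering", "Naive Bayes"],
    "Graph": ["BFS", "PageRank", "Kmeans", "Connected Components"],
    "Table": ["Read", "Write", "Scan", "Select Query", "Aggregate Query",
              "Join Query", "Nutch Server", "Olio Server", "Rubis Server"],
}


def all_workloads(classification):
    """Flatten a classification into one list of workload names."""
    return [w for group in classification.values() for w in group]


def class_of(workload, classification):
    """Return the class a workload belongs to, or None if it is unknown."""
    for name, group in classification.items():
        if workload in group:
            return name
    return None


# Both classifications should partition the same 19 workloads.
for table in (BY_APPLICATION_TYPE, BY_DATA_SOURCE):
    names = all_workloads(table)
    assert len(names) == 19 and len(set(names)) == 19
assert set(all_workloads(BY_APPLICATION_TYPE)) == set(all_workloads(BY_DATA_SOURCE))

print(class_of("PageRank", BY_APPLICATION_TYPE), class_of("PageRank", BY_DATA_SOURCE))
```

The application domain classes on slide 40 overlap (the six basic datastore and query workloads appear under two domains), so they would need a many-to-many mapping rather than a partition like the two above.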
  • 41. Outline: 1. Benchmarking Methodology and Decision; 2. Case Study; 3. How to Use; 4. Future Work
  • 42. Near Future Work. Multi-media data; deep learning workloads; HPC; refine BigDataBench.
  • 43. Related Resources. BigDataBench project: http://prof.ict.ac.cn/BigDataBench. BPOE workshop: http://prof.ict.ac.cn/bpoe, a series of workshops on Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE-4: interaction among OS, architecture, and data management; co-located with ASPLOS 2014.
  • 44. BPOE-4 SC. Christos Kozyrakis, Stanford; Xiaofang Zhou, University of Queensland; Dhabaleswar K Panda, Ohio State University; Raghunath Nambiar, Cisco; Lizy K John, University of Texas at Austin; Xiaoyong Du, Renmin University of China; H. Peter Hofstee, IBM Austin Research Laboratory; Ippokratis Pandis, IBM Almaden Research Center; Alexandros Labrinidis, University of Pittsburgh; Bill Jia, Facebook; Jianfeng Zhan, ICT, Chinese Academy of Sciences.