A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
1. A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose
miguel.xavier@acad.pucrs.br
Faculty of Informatics, PUCRS
Porto Alegre, Brazil
February 13, 2014
3. Introduction
• Virtualization
  • Allows resources to be shared
  • Hardware independence, availability, isolation and security
  • Better manageability
  • Widely used in datacenters/cloud computing
• MapReduce clusters and virtualization
  • Usage scenarios
    • Better resource sharing
    • Cloud computing
  • However, hypervisor-based technologies have traditionally been avoided in MapReduce environments
4. Container-based Virtualization
• A group of processes on a Linux box, put together in an isolated environment
• A lightweight virtualization layer
• Non-virtualized drivers
• Shared operating system
[Diagram: container-based virtualization (guest processes share the host OS kernel through a thin virtualization layer) vs. hypervisor-based virtualization (guest processes run inside full guest OSes on top of a virtualization layer over the host OS and hardware)]
5. Container-based Virtualization
• Each container has:
  • Its own network interface (and IP address)
    • Bridged, routed, …
  • Its own filesystem
• Isolation (security)
  • Containers A and B can't see each other
• Isolation (resource usage)
  • RAM, CPU, I/O
• Current systems
  • Linux-VServer, OpenVZ, LXC
6. Container-based Virtualization
• Implements Linux namespaces (a minimal sketch follows this slide):
  • Mount – mounting/unmounting file systems
  • UTS – hostname, domain name
  • IPC – SysV message queues, semaphores, shared memory segments
  • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
  • PID – own set of PIDs
  • Note: chroot is, in effect, a filesystem namespace
• Current systems
  • Linux-VServer, OpenVZ, LXC
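To make the namespace idea concrete, here is a hedged sketch (not from the original slides) that uses Java's ProcessBuilder to start a shell inside new UTS and PID namespaces through the util-linux `unshare` tool; it assumes a Linux host with util-linux installed and root privileges, and the hostname `container-a` is purely illustrative.

```java
import java.io.IOException;

// Hedged sketch: show UTS and PID namespace isolation by launching a shell in
// fresh namespaces via the util-linux `unshare` tool. Assumes Linux, util-linux
// installed, and root privileges (creating namespaces needs CAP_SYS_ADMIN).
public class NamespaceDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Inside the new UTS namespace the hostname change is invisible to the host;
        // inside the new PID namespace the shell sees itself as PID 1.
        ProcessBuilder inside = new ProcessBuilder(
                "unshare", "--uts", "--pid", "--fork", "--mount-proc",
                "sh", "-c", "hostname container-a; hostname; echo \"PID inside: $$\"");
        inside.inheritIO();                      // print the child's output on our console
        int status = inside.start().waitFor();
        System.out.println("unshare exited with status " + status);

        // Back in the host namespaces: the hostname is unchanged.
        new ProcessBuilder("hostname").inheritIO().start().waitFor();
    }
}
```

Run as root, this should print `container-a` and a PID of 1 from inside the new namespaces, followed by the unchanged host hostname.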
7. Container-based Systems
• Linux-VServer
  • Implements its own features in the Linux kernel
  • Limits the scope of the file system for different processes through the traditional chroot
• OpenVZ
• Linux Containers (LXC)
  • Based on cgroups (see the sketch below)
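As a rough illustration of the cgroup mechanism LXC builds on, the hedged sketch below (mine, not the authors') caps the current JVM at about half a CPU core by writing to the cgroup v1 cpu controller; the mount point /sys/fs/cgroup/cpu, the group name `demo`, and the need for root privileges are assumptions about the host.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hedged sketch: a cgroup v1 CPU cap, the kind of per-container limit LXC uses.
// Assumes the cpu controller is mounted at /sys/fs/cgroup/cpu (cgroup v1) and
// that the program runs as root. The group name "demo" is arbitrary.
public class CgroupCpuCap {
    public static void main(String[] args) throws IOException {
        Path group = Paths.get("/sys/fs/cgroup/cpu/demo");
        Files.createDirectories(group);   // creating the directory creates the cgroup

        // Allow 50 ms of CPU time per 100 ms period, i.e. roughly half of one core.
        Files.write(group.resolve("cpu.cfs_period_us"), "100000".getBytes());
        Files.write(group.resolve("cpu.cfs_quota_us"), "50000".getBytes());

        // Move this JVM into the group; from now on its CPU usage is throttled.
        String pid = Long.toString(ProcessHandle.current().pid());
        Files.write(group.resolve("tasks"), pid.getBytes());

        // Busy-loop so the cap can be observed with `top` in another terminal (Ctrl-C to stop).
        while (true) {
            Math.sqrt(Math.random());
        }
    }
}
```

LXC wires the same controllers (cpu, memory, blkio) to whole containers rather than to single processes, which is what the isolation experiments later in the talk exercise.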
8. Hypervisor- vs. Container-based Systems

    Hypervisor                   Container
    Different kernel/OS          Single kernel
    Device emulation             System calls
    Many FS caches               Single FS cache
    Limits per machine           Limits per process
    High performance overhead    Low performance overhead
9. MapReduce
• MapReduce
  • A parallel programming model (a WordCount sketch follows this slide)
  • Simplicity, efficiency and high scalability
  • It has become a de facto standard for large-scale data analysis
• MapReduce has also attracted the attention of the HPC community
  • A simpler approach to address the parallelism problem
• Highly visible cases where MapReduce has been successfully used by companies like Google, Yahoo!, Facebook and Amazon
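To ground the programming model in code, here is a hedged sketch of the canonical WordCount job written against Hadoop's Java MapReduce API, the same kind of job used in the macro-benchmarks later; the class names and input/output paths are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Canonical WordCount: map emits (word, 1), reduce sums the counts per word.
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);       // emit (word, 1) for every token
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();                 // sum all counts for this word
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation on each node
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The map phase emits a (word, 1) pair per token and the reduce phase sums the pairs per word; the combiner applies the reducer locally on each node to reduce shuffle traffic.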
10. MapReduce and Containers
• Apache Mesos
  • Shares a cluster between multiple different frameworks
  • Creates another level of resource management
  • Management is taken away from the cluster's RMS
• Apache YARN
  • Hadoop Next Generation
  • Better job scheduling/monitoring
  • Uses virtualization to share a cluster among different applications
11. Evaluation
• Experimental environment
  • Hadoop cluster composed of 4 nodes
  • Two processors with 8 cores (no hyper-threading) per node
  • 16 GB of memory per node
  • 146 GB of disk per node
• Analysis of the best performance results
  • Through micro-benchmarks
    • HDFS evaluation (TestDFSIO)
    • NameNode evaluation (NNBench)
    • MapReduce evaluation (MRBench)
  • Through macro-benchmarks (WordCount, TeraSort)
• Analysis of the best isolation results
  • Through the IBS (Isolation Benchmark Suite)
• At least 50 executions were performed for each experiment
12. HDFS Evaluation
• Settings:
  • Block replication factor of 3
  • File sizes from 100 MB to 3000 MB
• All container-based systems have performance similar to native
• The OpenVZ results show a loss of 3 Mbps
  • This is due to the CFQ I/O scheduler
[Plot: HDFS throughput (Mbps) vs. file size for LXC, native, OpenVZ and VServer]
13. HDFS Evaluation
• All container-based systems obtained performance results similar to native
• Linux-VServer uses the physical (non-virtualized) network
[Plot: HDFS throughput (Mbps) vs. file size for LXC, native, OpenVZ and VServer]
14. NameNode Evaluation using NNBench
• The NNBench benchmark was chosen to evaluate the NameNode component
  • It generates operations on 1000 files on HDFS
• Linux-VServer reaches an average latency of 48 ms, while LXC obtained the worst result, with an average of 56 ms
• The differences are not very significant in absolute terms
• However, the strengths are that no exception was observed under the heavy HDFS management stress, and that all systems were able to respond as effectively as native

                       Native   LXC     OpenVZ   VServer
    Open/Read (ms)     0.51     0.52    0.51     0.49
    Create/Write (ms)  54.65    56.89   51.96    48.90
15. MapReduce Evaluation using MRBench
• The MRBench results show that the MapReduce layer suffers no substantial effect when running on the different container-based virtualization systems

                       Native   LXC     OpenVZ   VServer
    Execution time     14251    13577   14304    13614
16. Analyzing Performance with WordCount
[Bar chart: WordCount execution time (seconds) for Native, LXC, OpenVZ and VServer]
• 30 GB of input data
• The peak performance degradation, seen for OpenVZ, is explained by its I/O scheduler overhead
17. Analyzing Performance with TeraSort
[Bar chart: TeraSort execution time (seconds) for Native, LXC, OpenVZ and VServer]
• Standard MapReduce sort
• Steps:
  • Generate 30 GB of input data
  • Run the sort on that input data
• HDFS block size of 64 MB
18. Performance Isolation
• Methodology:
  • Run a baseline application in container A alone and record its execution time
  • Run the same baseline application in container A while container B runs a stress test, and record the execution time again
  • Report the performance degradation (%) between the two runs (formula below)
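A worked form of the metric implied by the diagram (my phrasing, not the authors'): the degradation is the relative increase in the baseline application's execution time when the neighbouring container runs the stress test.

```latex
% T_baseline: execution time of the baseline application with container B idle
% T_stress:   execution time of the same application while container B runs the stress test
\[
  \text{degradation}\ (\%) \;=\; \frac{T_{\text{stress}} - T_{\text{baseline}}}{T_{\text{baseline}}} \times 100
\]
% Example with a figure from the next slide: an 8.3% degradation under memory stress
% means T_stress is about 1.083 times T_baseline.
```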
19. Performance Isolation

           CPU    Memory   I/O    Fork bomb
    LXC    0%     8.3%     5.5%   0%

• We chose LXC as the representative container-based virtualization system to be evaluated
• The per-container CPU usage limits work well
  • No significant impact was noted
• A little performance degradation (memory and I/O) needs to be taken into account
• The fork bomb stress test reveals that LXC has a security subsystem that prevents the fork bomb from affecting the neighbouring container
20. Conclusions
• We found that all container-based systems reach near-native performance for MapReduce workloads
• The performance isolation results revealed that LXC has improved its capabilities for restricting resources among containers
• Although some works are already taking advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters
21. Future Work
• We plan to study performance isolation at the network level
• We plan to study scalability while increasing the number of nodes
• We plan to study aspects regarding green computing, such as the trade-off between performance and energy consumption