Administering a Hadoop cluster isn't easy. Many Hadoop clusters suffer from Linux configuration problems that can negatively impact performance. With vast and sometimes confusing config/tuning options, it can be tempting (and scary) for a cluster administrator to make changes to Hadoop when cluster performance isn't as expected. Learn how to improve Hadoop cluster performance and eliminate common problem areas, applicable across use cases, using a handful of simple Linux configuration changes.
5. Home sweet home. (Photo: CC BY 2.0 / Alex Moundalexis)
6. Tips from a Former SA Field Guy
Easy steps to take…
7. Tips from a Former SA Field Guy
Easy steps to take… that most people don’t.
8. What This Talk Isn’t About
• Deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• Sizing & Tuning
  • Depends heavily on data and workload
• Coding
  • Unless you count STDOUT redirection
• Algorithms
  • I suck at math, but we’ll try some multiplication later
9. “The answer to most Hadoop questions is it depends.”
10. So What ARE We Talking About?
• Seven simple things
  • Quick
  • Safe
  • Viable for most environments and use cases
• Identify issue, then offer solution
• Note: Commands run as root or sudo
12. Swapping
• A form of memory management
• When OS runs low on memory…
  • write blocks to disk
  • use now-free memory for other things
  • read blocks back into memory from disk when needed
• Also known as paging
13. Swapping
• Problem: Disks are slow, especially to seek
• Hadoop is about maximizing IO
  • spend less time acquiring data
  • operate on data in place
  • large streaming reads/writes from disk
• Memory usage is limited within JVM
  • we should be able to manage our memory
14. Disable Swap in Kernel
• Well, as much as possible.
• Immediate:
  # echo 0 > /proc/sys/vm/swappiness
• Persist after reboot:
  # echo "vm.swappiness = 0" >> /etc/sysctl.conf
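The two commands on this slide can be wrapped into a small idempotent helper. This is a sketch, assuming root on a Linux host; the file paths are passed as arguments so the logic can be dry-run against scratch files before touching the real ones:

```shell
#!/bin/sh
# Sketch: disable swapping as much as the kernel allows, both immediately
# and across reboots. Paths are parameters so the logic can be dry-run.
set_swappiness() {
    proc_knob=$1      # normally /proc/sys/vm/swappiness
    sysctl_conf=$2    # normally /etc/sysctl.conf
    echo 0 > "$proc_knob"                          # immediate effect
    grep -q '^vm\.swappiness' "$sysctl_conf" ||    # append only once
        echo 'vm.swappiness = 0' >> "$sysctl_conf"
}

# On a real host, as root:
# set_swappiness /proc/sys/vm/swappiness /etc/sysctl.conf
```

The grep guard matters because naively re-running the slide's `echo >>` on every deploy leaves duplicate `vm.swappiness` lines in sysctl.conf.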
15. Swapping Peculiarities
• Behavior varies based on Linux kernel
• CentOS 6.4+ / Ubuntu 10.10+
  • For you kernel gurus, that’s Linux 2.6.32-303+
• Prior
  • We don’t swap, except to avoid OOM condition.
• After
  • We don’t swap, ever.
• Details: http://tiny.cloudera.com/noswap
17. File Access Time
• Linux tracks access time
  • writes to disk even if all you did was read
• Problem
  • more disk seeks
• HDFS is write-once, read-many
  • NameNode tracks access information for HDFS
18. Don’t Track Access Time
• Mount volumes with noatime option
• In /etc/fstab:
  /dev/sdc /data01 ext3 defaults,noatime 0 0
• Note: noatime assumes nodiratime as well
• What about relatime?
  • Faster than atime but slower than noatime
• No reboot required
  # mount -o remount /data01
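To audit which data mounts are still tracking atime, you can scan the mounts table. A minimal sketch; the `/data` prefix and the mounts-file argument are illustrative (pass `/proc/mounts` on a real host):

```shell
#!/bin/sh
# Sketch: print mount points under a given prefix whose mount options
# lack noatime (or relatime), i.e. candidates for remounting.
missing_noatime() {
    mounts=$1     # normally /proc/mounts
    prefix=$2     # e.g. /data
    awk -v p="$prefix" \
        '$2 ~ "^"p && $4 !~ /(^|,)(no|rel)atime(,|$)/ { print $2 }' "$mounts"
}

# Example: missing_noatime /proc/mounts /data
```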
20. Root Reserved Space
• EXT3/4 reserve 5% of disk for root-owned files
• On an OS disk, sure
  • System logs, kernel panics, etc
21. Disks used to be much smaller, right? (Photo: CC BY 2.0 / Alex Moundalexis)
22. Do The Math
• Conservative
  • 5% of 1 TB disk = 46 GB
  • 5 data disks per server = 230 GB
  • 5 servers per rack = 1.15 TB
• Quasi-Aggressive
  • 5% of 4 TB disk = 186 GB
  • 12 data disks per server = 2.23 TB
  • 18 servers per rack = 40.1 TB
• That’s a LOT of unused storage!
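The slide's arithmetic as a quick shell calculator (integer math; it uses the 1 TB ≈ 931 GiB conversion the figures above imply, which is why 5% of a 1 TB disk comes out as 46 rather than 50):

```shell
#!/bin/sh
# Reserved space per rack, in GiB, at ext3/4's default 5% root reserve.
reserved_gib() {
    disk_tb=$1; disks_per_server=$2; servers_per_rack=$3
    # 1 TB (marketing) is about 931 GiB as the filesystem sees it
    echo $(( disk_tb * 931 * 5 / 100 * disks_per_server * servers_per_rack ))
}

reserved_gib 1 5 5      # conservative case: 1150 GiB, about 1.15 TB
reserved_gib 4 12 18    # quasi-aggressive: 40176 GiB, about 40.1 TB
```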
23. Root Reserved Space
• On a Hadoop data disk, no root-owned files
• When creating a partition:
  # mkfs.ext3 -m 0 /dev/sdc
• On existing partitions:
  # tune2fs -m 0 /dev/sdc
• 0 is safe, 1 is for the ultra-paranoid
25. Name Service Cache Daemon
• Daemon that caches name service requests
  • Passwords
  • Groups
  • Hosts
• Helps weather network hiccups
• Helps more with high latency LDAP, NIS, NIS+
• Small footprint
• Zero configuration required
26. Name Service Cache Daemon
• Hadoop nodes
  • largely a network-based application
  • on the network constantly
  • issue lots of DNS lookups, especially HBase & distcp
  • can thrash DNS servers
• Reducing latency of service requests? Smart.
• Reducing impact on shared infrastructure? Smart.
27. Name Service Cache Daemon
• Turn it on, let it work, leave it alone:
  # chkconfig --level 345 nscd on
  # service nscd start
• Check on it later:
  # nscd -g
• Unless using Red Hat SSSD; modify nscd config first!
  • Don’t use nscd to cache passwd, group, or netgroup
• Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
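Before enabling nscd alongside SSSD, you can sanity-check the config file. A sketch of such a check (a hypothetical helper; the real config lives at /etc/nscd.conf):

```shell
#!/bin/sh
# Sketch: return success only if the given nscd.conf disables caching
# for passwd, group, and netgroup (required when SSSD owns those maps).
sssd_safe_nscd() {
    conf=$1
    for map in passwd group netgroup; do
        grep -Eq "^[[:space:]]*enable-cache[[:space:]]+$map[[:space:]]+no" "$conf" \
            || return 1
    done
}

# Example: sssd_safe_nscd /etc/nscd.conf && service nscd start
```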
29. File Handle Limits
• Kernel refers to files via a handle
  • Also called descriptors
• Linux is a multi-user system
• File handles protect the system from
  • Poor coding
  • Malicious users
  • Pictures of cats on the Internet
30. java.io.FileNotFoundException: (Too many open files)
Microsoft Office EULA. Really.
31. File Handle Limits
• Linux defaults usually not enough
• Increase maximum open files (default 1024)
  # echo hdfs - nofile 32768 >> /etc/security/limits.conf
  # echo mapred - nofile 32768 >> /etc/security/limits.conf
  # echo hbase - nofile 32768 >> /etc/security/limits.conf
• Bonus: Increase maximum processes too
  # echo hdfs - nproc 32768 >> /etc/security/limits.conf
  # echo mapred - nproc 32768 >> /etc/security/limits.conf
  # echo hbase - nproc 32768 >> /etc/security/limits.conf
• Note: Cloudera Manager will do this for you.
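The six echo commands can be collapsed into a loop that also skips entries already present, so repeated runs don't pile up duplicate lines. A sketch; the limits.conf path is a parameter so the logic can be exercised safely:

```shell
#!/bin/sh
# Sketch: raise nofile and nproc to 32768 for the given users, appending
# to limits.conf only if an entry for that user/item pair isn't there.
raise_limits() {
    conf=$1; shift
    for user in "$@"; do
        for item in nofile nproc; do
            grep -q "^$user - $item" "$conf" ||
                echo "$user - $item 32768" >> "$conf"
        done
    done
}

# On a real host: raise_limits /etc/security/limits.conf hdfs mapred hbase
```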
32. 6. Dedicated Disk for OS and Logs
Don’t be tempted to share, even on monster disks.
33. The Situation in Easy Steps
1. Your new server has a dozen 1 TB disks
2. Eleven disks are used to store data
3. One disk is used for the OS
  • 20 GB for the OS
  • 980 GB sits unused
4. Someone asks “can we store data there too?”
5. Seems reasonable, lots of space… “OK, why not.”
Sound familiar?
34. I don’t understand it, there’s no consistency to these run times!
Microsoft Office EULA. Really.
35. No Love for Shared Disk
• Our quest for data gets interrupted a lot:
  • OS operations
  • OS logs
  • Hadoop logging, quite chatty
  • Hadoop execution
  • userspace execution
• Disk seeks are slow, remember?
36. Dedicated Disk for OS and Logs
• At install time
  • Disk 0, OS & logs
  • Disk 1-n, Hadoop data
• After install, more complicated effort, requires manual HDFS block rebalancing:
  1. Take down HDFS
    • If you can do it in under 10 minutes, just the DataNode
  2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
  3. Remove dir from HDFS config (dfs.data.dir)
  4. Start HDFS
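Step 2 might look like the following when consolidating onto a single remaining data directory. A sketch with illustrative paths: HDFS must be fully stopped first, and with multiple target directories you would distribute block subtrees across them instead of moving everything to one.

```shell
#!/bin/sh
# Sketch: move the DataNode block tree off the OS disk, preserving the
# directory layout, then delete the old copy. Run only with HDFS down.
move_blocks() {
    old_dir=$1    # e.g. /data00/dfs/dn (illustrative)
    new_dir=$2    # e.g. /data01/dfs/dn (illustrative)
    mkdir -p "$new_dir"
    cp -a "$old_dir"/. "$new_dir"/ && rm -rf "$old_dir"
}

# Example: move_blocks /data00/dfs/dn /data01/dfs/dn
```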
39. Name Resolution with Hosts File
• Set canonical names properly
• Right
  10.1.1.1 r01m01.cluster.org r01m01 master1
  10.1.1.2 r01w01.cluster.org r01w01 worker1
• Wrong
  10.1.1.1 r01m01 r01m01.cluster.org master1
  10.1.1.2 r01w01 r01w01.cluster.org worker1
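A quick audit for the “Wrong” ordering: the first name after each address should be the FQDN, i.e. contain a dot. A sketch (pass /etc/hosts on a real host); loopback and comment lines are skipped:

```shell
#!/bin/sh
# Sketch: print hosts-file lines whose first hostname contains no dot,
# meaning a short name sits where the canonical FQDN should be.
bad_hosts_lines() {
    awk '/^[[:space:]]*#/ { next }
         $1 ~ /^[0-9]/ && $1 !~ /^127\./ && NF >= 2 && $2 !~ /\./ { print }' "$1"
}

# Example: bad_hosts_lines /etc/hosts
```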
40. Name Resolution with Hosts File
• Set loopback address properly
• Ensure 127.0.0.1 resolves to localhost, NOT hostname
• Right
  127.0.0.1 localhost
• Wrong
  127.0.0.1 r01m01
41. Name Resolution with DNS
• Forward
• Reverse
• Hostname should MATCH the FQDN in DNS
43. Name Resolution Errata
• Mismatches? Expect odd results.
  • Problems starting DataNodes
  • Non-FQDN in Web UI links
• Security features are extra sensitive to FQDN
• Errors so common that link to FAQ is included in logs!
  • http://wiki.apache.org/hadoop/UnknownHost
• Get name resolution working BEFORE enabling nscd!
45. Summary
1. disable vm.swappiness
2. data disks: mount with noatime option
3. data disks: disable root reserved space
4. enable nscd
5. increase file handle limits
6. use dedicated OS/logging disk
7. sane name resolution
http://tiny.cloudera.com/7steps
50. Other Things to Check
• Disk IO
  • hdparm
    # hdparm -Tt /dev/sdc
  • Looking for at least 70 MB/s from 7200 RPM disks
  • Slower could indicate a failing drive, disk controller, array, etc.
  • dd
    • http://romanrm.ru/en/dd-benchmark
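A minimal dd write check in the spirit of the linked benchmark. A sketch: `conv=fdatasync` makes GNU dd's reported rate include the flush to disk, and the target path and size are arguments so you can point it at a scratch file on the disk under test:

```shell
#!/bin/sh
# Sketch: sequential write test; prints dd's summary line (rate included).
dd_write_test() {
    target=$1     # e.g. /data01/ddtest -- will be overwritten!
    mb=$2         # how many 1 MB blocks to write
    dd if=/dev/zero of="$target" bs=1M count="$mb" conv=fdatasync 2>&1 |
        tail -n 1
    rm -f "$target"
}

# Example: dd_write_test /data01/ddtest 1024   # roughly a 1 GB write
```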
51. Other Things to Check
• Disable Red Hat Transparent Huge Pages (RH6+ Only)
  • Can reduce elevated CPU usage
• In rc.local:
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
  echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
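The sysfs path above is Red Hat 6 specific; upstream kernels use `/sys/kernel/mm/transparent_hugepage` instead. A hedged rc.local fragment that covers both locations, writing only where the knobs actually exist:

```shell
# rc.local fragment: disable THP defrag and THP itself wherever the
# knobs live on this kernel (Red Hat 6 path first, then upstream path).
for d in /sys/kernel/mm/redhat_transparent_hugepage \
         /sys/kernel/mm/transparent_hugepage; do
    if [ -d "$d" ]; then
        echo never > "$d/defrag"
        echo never > "$d/enabled"
    fi
done
```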
52. Other Things to Check
• Enable Jumbo Frames
  • Only if your network infrastructure supports it!
  • Can easily (and arguably) boost throughput by 10-20%
53. Other Things to Check
• Enable Jumbo Frames (as on the previous slide)
• Monitor Everything
  • How else will you know what’s happening?
  • Nagios
  • Ganglia
54. Thank You!
Alex Moundalexis
@technmsg
We’re hiring, kids! Well, not kids.