2. About me
• Apache Foundation
  – Hadoop Committer and PMC member
  – Hadoop MR contributor ~ 4 years
  – Author of Hadoop Nextgen core
• Head of Technology Platforms @InMobi
  – Formerly Architect @Yahoo!
4. Current Limitations
• Scalability
  – Maximum cluster size – 4,000 nodes
  – Maximum concurrent tasks – 40,000
  – Coarse synchronization in JobTracker
• Single point of failure
  – Failure kills all queued and running jobs
  – Jobs need to be re-submitted by users
• Restart is very tricky due to complex state
• Hard partition of resources into map and reduce slots
5. Current Limitations
• Lacks support for alternate paradigms
  – Iterative applications implemented using Map-Reduce are 10x slower
  – Example: K-Means, PageRank
• Lack of wire-compatible protocols
  – Client and cluster must be of the same version
  – Applications and workflows cannot migrate to different clusters
6. Next Generation Map-Reduce Requirements
• Reliability
• Availability
• Scalability – clusters of 6,000 machines
  – Each machine with 16 cores, 48G RAM, 24TB disks
  – 100,000 concurrent tasks
  – 10,000 concurrent jobs
• Wire Compatibility
• Agility & Evolution – ability for customers to control upgrades to the grid software stack
7. Next Generation Map-Reduce Architecture
• Split up the two major functions of JobTracker
  – Cluster resource management
  – Application life-cycle management
• Map-Reduce becomes a user-land library
9. Architecture
• Resource Manager
  – Global resource scheduler
  – Hierarchical queues
• Node Manager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring
• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – E.g. Map-Reduce Application Master
(A sketch of submitting an application into this architecture follows below.)
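To make the split concrete, here is a minimal, hedged sketch of how a client hands an application to the Resource Manager, which then launches the per-application Application Master in its own container. Class and method names follow the YarnClient API of later Hadoop 2.x releases, which is an assumption relative to the 0.23-era API described in this talk; the application name, queue, and launch command are illustrative.

// Minimal sketch: submit an application to the Resource Manager.
// API shape from later Hadoop 2.x releases (an assumption for 0.23).
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitApp {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new YarnConfiguration());
    yarn.start();

    // Ask the Resource Manager for a new application id.
    YarnClientApplication app = yarn.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("demo-app");
    ctx.setQueue("default"); // one of the hierarchical queues

    // Command that starts the Application Master; "/bin/date" is a stand-in.
    ContainerLaunchContext am = ContainerLaunchContext.newInstance(
        null, null, Collections.singletonList("/bin/date"), null, null, null);
    ctx.setAMContainerSpec(am);
    ctx.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 core for the AM

    yarn.submitApplication(ctx);
    System.out.println("Submitted " + ctx.getApplicationId());
  }
}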
10. Improvements vis-à-vis current Map-Reduce
• Scalability
  – Application life-cycle management is very expensive
  – Partition resource management and application life-cycle management
  – Application management is distributed
  – Hardware trends – currently run clusters of 4,000 machines
    • 6,000 machines in 2012 outperform 12,000 machines from 2009
    • <8 cores, 16G, 4TB> v/s <16+ cores, 48/96G, 24TB>
11. Improvements vis-à-vis current Map-Reduce
• Availability
  – Application Master
    • Optional failover via application-specific checkpoint
    • Map-Reduce applications pick up where they left off
  – Resource Manager
    • No single point of failure – failover via ZooKeeper (configuration sketch below)
    • Application Masters are restarted automatically
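A hedged sketch of what wiring the Resource Manager's state to ZooKeeper looks like. The configuration keys below are from later Hadoop 2.x releases and are assumptions relative to the 0.23 timeframe of this talk; the quorum address is made up.

// Sketch: enable Resource Manager recovery with state kept in ZooKeeper,
// so a restarted RM can pick up running applications. Keys are from later
// Hadoop 2.x releases (an assumption here); the quorum address is invented.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmRecoveryConf {
  public static Configuration build() {
    Configuration conf = new YarnConfiguration();
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    conf.set("yarn.resourcemanager.zk-address",
        "zk1:2181,zk2:2181,zk3:2181"); // illustrative ZooKeeper quorum
    return conf;
  }
}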
12. Improvements vis-à-vis current Map-Reduce
• Wire Compatibility
  – Protocols are wire-compatible
  – Old clients can talk to new servers
  – Rolling upgrades
13. Improvements vis-à-vis current Map-Reduce
• Agility / Evolution
  – Map-Reduce now becomes a user-land library
  – Multiple versions of Map-Reduce can run in the same cluster (à la Apache Pig)
    • Faster deployment cycles for improvements
  – Customers upgrade Map-Reduce versions on their own schedule (sketched below)
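As an illustration of the user-land-library point: later Hadoop 2.x releases let a job ship against a specific Map-Reduce tarball in HDFS instead of whatever version the cluster nodes have installed. The property names below are from those releases, and the paths and archive alias are illustrative assumptions.

// Sketch: pin a job to a specific Map-Reduce framework version in HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class PinnedFramework {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.framework.name", "yarn");
    // Illustrative tarball location; "mrframework" is the archive alias.
    conf.set("mapreduce.application.framework.path",
        "hdfs:///apps/mapreduce/mr-framework.tar.gz#mrframework");
    conf.set("mapreduce.application.classpath",
        "mrframework/share/hadoop/mapreduce/*");
    Job job = Job.getInstance(conf, "word-count");
    // ... set mapper/reducer and submit as usual
  }
}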
14. Improvements vis-à-vis current Map-Reduce
• Utilization
  – Generic resource model (request sketch below)
    • Memory
    • CPU
    • Disk bandwidth
    • Network bandwidth
  – Remove fixed partition of map and reduce slots
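A minimal sketch of the generic resource model from the Application Master's side: containers are requested by memory and cores rather than map or reduce slots. The AMRMClient API shape is from later Hadoop 2.x releases, where only memory and CPU made it into the shipped resource records; treat the names as assumptions.

// Sketch: request a container sized by memory and cores, not by slot type.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RequestContainer {
  public static void request(AMRMClient<ContainerRequest> amrm) {
    Resource capability = Resource.newInstance(2048, 4); // 2 GB RAM, 4 cores
    ContainerRequest req = new ContainerRequest(
        capability,
        null,  // no node-locality constraint
        null,  // no rack-locality constraint
        Priority.newInstance(1));
    amrm.addContainerRequest(req);
  }
}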
15. Improvements vis-à-vis current Map-Reduce
• Support for programming paradigms other than Map-Reduce
  – MPI
  – Master-Worker
  – Machine Learning
  – Iterative processing
  – Enabled by allowing use of paradigm-specific Application Masters (skeleton below)
  – Run all on the same Hadoop cluster
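To show what a paradigm-specific Application Master means in practice, here is a hedged skeleton of one: it registers with the Resource Manager, heartbeats for containers, and deregisters when its paradigm's work is done. Class and method names follow the later Hadoop 2.x AMRMClient API, an assumption for the 0.23-era code described here; the loop bound is a placeholder.

// Skeleton of a paradigm-specific Application Master: the loop body is
// where an MPI launcher, master-worker pool, or iterative driver would go.
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MiniAppMaster {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(new YarnConfiguration());
    rm.start();
    rm.registerApplicationMaster("", 0, ""); // host, RPC port, tracking URL

    // Placeholder loop: a real AM runs until its paradigm's work completes.
    for (int heartbeat = 0; heartbeat < 10; heartbeat++) {
      // Heartbeat: reports progress, receives newly allocated containers.
      List<Container> allocated =
          rm.allocate(heartbeat / 10.0f).getAllocatedContainers();
      // Launch paradigm-specific tasks in the allocated containers (omitted).
      Thread.sleep(1000);
    }
    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
  }
}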
16. Summary
• The next generation of Map-Reduce takes Hadoop to the next level
  – Scale-out even further
  – High availability
  – Cluster utilization
  – Support for paradigms other than Map-Reduce
17. Status
• Apache Hadoop 0.23 release is out
  – HDFS Federation
  – MRv2
• Currently undergoing tests on small scale ~ 500 nodes
• Alpha
  – 2,000 nodes – Q1 2012
• Beta/Production
  – Variety of applications and loads
  – 4,000+ nodes – Q2 2012