Lecture: The Google MapReduce
http://research.google.com/archive/mapreduce.html
10/03/2014
Romain Jacotin
romain.jacotin@orange.fr
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Introduction
Abstract
• MapReduce is a programming model and an associated implementation for processing and generating large data sets
• Users specify a Map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a Reduce function that merges all intermediate values associated with the same intermediate key
• Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines
• The run-time system takes care of the details of partitioning the input data and scheduling the program's execution
Introduction
MapReduce
• Google designed this abstraction to let them express the simple computations they want to perform while hiding the messy details of parallelization, fault-tolerance, data distribution and load balancing in a library
• This abstraction is inspired by the "map" and "reduce" primitives present in LISP
• The major contribution of this work is a simple and powerful interface that enables automatic parallelization and distribution of large-scale computations
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Programming model
• The computation takes a set of input key/value pairs, and produces a set of output key/value pairs
• The user of the MapReduce library expresses the computation as two functions: Map and Reduce
  – Map function takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function
  – Reduce function accepts intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values. Typically just zero or one output value is produced per Reduce invocation
[Figure: MapReduce counts the bases in the input string AGTACATTAGACCGACTAACTGAT: Map emits an intermediate (base, 1) pair per character, and four Reduce tasks sum them to produce the output G,4 C,5 A,9 T,6]
Programming model
Examples:
• Distributed Grep, Count of URL access frequency, Reverse Web-Link graph, Term-Vector per Host, Inverted index, Distributed sort, …
[Figure: the same base-counting example as on the previous slide]
Map    (k1, v1)        → list(k2, v2)
Reduce (k2, list(v2))  → list(v2)
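The following standalone C++ sketch mirrors the base-counting figure above: it applies Map to every character of the input string, groups the intermediate pairs by key, and applies Reduce (a sum) to each group. It is a minimal in-memory simulation of the model, not the Google library; all names here are illustrative.

#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main() {
    const std::string input = "AGTACATTAGACCGACTAACTGAT";

    // Map: emit one intermediate (key, 1) pair per base.
    std::vector<std::pair<char, int>> intermediate;
    for (char base : input) intermediate.emplace_back(base, 1);

    // Group: collect all intermediate values that share the same key.
    std::map<char, std::vector<int>> grouped;
    for (const auto& kv : intermediate) grouped[kv.first].push_back(kv.second);

    // Reduce: merge each key's list of values into a single count.
    for (const auto& kv : grouped) {
        int sum = 0;
        for (int v : kv.second) sum += v;
        std::cout << kv.first << ", " << sum << "\n";  // prints A, 9  C, 5  G, 4  T, 6
    }
    return 0;
}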
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Implementation
Large clusters of COTS x86 servers connected together with switched Ethernet
• Dual-processor x86 running Linux, 2-4 GB RAM
• FastEthernet NIC or GigabitEthernet NIC
• Hundreds or thousands of machines (machine failures are common)
• Storage = inexpensive IDE disks (DAS) and a distributed file system (GFS = Google File System)
• Users submit jobs to a scheduling system: each job consists of a set of tasks, and is mapped by the scheduler to a set of available machines within the cluster
Implementation
Execution overview (from the User point of view)
• Map invocations are distributed across multiple machines by partitioning the input data into a set of M splits
  – Number of splits M is automatically calculated based on the input data size
• Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partition function: hash(key) mod R
  – Number of partitions R and the partitioning function are specified by the user
[Figure: MAP: input files split into 5 pieces (M = 5); REDUCE: 2 partitions (R = 2). The user says: "Let's run MapReduce with R = 2 on those input files with my C++ program that contains my custom Map and Reduce functions:"]
public: virtual void Map(const MapInput &input) {…}
public: virtual void Reduce(ReduceInput* input) {…}
Implementation
Execution overview (from the Cluster point of view)
• Map invocations are distributed across multiple machines by partitioning the input data into a set of M splits
  – Number of splits M is automatically calculated based on the total input data size
• Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partition function: hash(key) mod R
  – Number of partitions R and the partitioning function are specified by the user
[Figure: MAP: input files split into 5 pieces (M = 5), so 5 Map tasks run on 3 workers; REDUCE: 2 partitions (R = 2), so 2 Reduce tasks run on 2 workers]
Implementation
Execution overview
1) First the MapReduce library in the "user program" splits the input files into M pieces (16 MB to 64 MB per piece, controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines.
2) One of the copies of the program is special: the Master. The rest are workers that are assigned work by the master. There are M Map tasks and R Reduce tasks to assign: the Master picks idle workers and assigns each one a task.
3) A worker who is assigned a Map task reads the content of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.
4) Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these local files are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
5) The Master notifies Reduce tasks about the intermediate file locations. Each Reduce worker reads and sorts all its intermediate data so that all occurrences of the same key are grouped together.
6) The Reduce worker iterates over the sorted intermediate data and, for each unique intermediate key encountered, passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this Reduce partition.
7) When all Map tasks and Reduce tasks have been completed, the Master wakes up the "User Program".
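As a companion to steps 5 and 6, here is a hedged sketch of what a Reduce worker's inner loop could look like: sort the fetched intermediate pairs, then hand each unique key and its run of values to the user's Reduce function. The function and type names are assumptions for illustration, not Google's actual code.

#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, std::string>;
using ReduceFn = std::function<void(const std::string& key,
                                    const std::vector<std::string>& values)>;

void RunReducePhase(std::vector<KV> intermediate, const ReduceFn& reduce) {
    // Step 5: sort so that all occurrences of the same key are grouped together.
    std::sort(intermediate.begin(), intermediate.end());

    // Step 6: walk the sorted data one unique key at a time.
    size_t i = 0;
    while (i < intermediate.size()) {
        const std::string key = intermediate[i].first;
        std::vector<std::string> values;
        while (i < intermediate.size() && intermediate[i].first == key) {
            values.push_back(intermediate[i].second);
            ++i;
        }
        reduce(key, values);  // the output would be appended to this partition's final file
    }
}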
Implementation
Master Data Structures (for each job and each associated Map and Reduce sub-task)
• Task state: idle, in-progress, or completed
• Worker machine identity assigned to each task
• Locations and sizes of the R intermediate file regions produced by each Map task
  – Map tasks regularly update the Master with intermediate file locations and sizes
  – The Master then pushes the information incrementally to workers that have in-progress Reduce tasks
[Table: Task → Worker (x86 Linux server)]
Map task for split0        → worker000042.dcmoon.internal.google.com
Map task for split1        → worker000051.dcmoon.internal.google.com
Map task for split2        → worker000033.dcmoon.internal.google.com
Map task for split3        → worker001664.dcmoon.internal.google.com
Map task for split4        → worker001789.dcmoon.internal.google.com
Reduce task for partition0 → worker001515.dcmoon.internal.google.com
Reduce task for partition1 → worker001806.dcmoon.internal.google.com
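A rough C++ sketch of the bookkeeping listed above. The paper only says what the Master stores (task state, worker identity, locations and sizes of intermediate regions), so the field and type names here are invented for illustration.

#include <cstdint>
#include <string>
#include <vector>

enum class TaskState { kIdle, kInProgress, kCompleted };

struct FileRegion {              // one intermediate region produced by a Map task
    std::string location;        // where it lives on the Map worker's local disk
    uint64_t size_bytes = 0;
};

struct TaskInfo {
    TaskState state = TaskState::kIdle;
    std::string worker;                  // e.g. worker000042.dcmoon.internal.google.com
    std::vector<FileRegion> regions;     // R regions for a Map task, empty for a Reduce task
};

struct MasterState {                     // kept per job
    std::vector<TaskInfo> map_tasks;     // M entries
    std::vector<TaskInfo> reduce_tasks;  // R entries
};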
Implementation
Fault Tolerance
• The MapReduce library tolerates machine failures (= worker failures) gracefully
• Worker failure
  – The Master "pings" every worker periodically: if there is no response from a worker in a certain amount of time, the Master marks the worker as failed
  – Any Map task or Reduce task in-progress on a failed worker is reset to idle and becomes eligible for re-scheduling
  – Similarly, any completed Map tasks on the failed worker are reset back to their initial idle state and become eligible for scheduling on other workers (their output sits on the failed machine's local disk and is no longer reachable)
  – When a Map task is re-executed after a worker failure, all workers executing associated Reduce tasks are notified of the re-execution
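A hedged sketch of the worker-failure handling just described: a periodic Master check marks unresponsive workers as failed and resets their tasks. PingWorker() is a stand-in for the real RPC, and the data layout is an assumption.

#include <string>
#include <vector>

struct Task {
    enum State { kIdle, kInProgress, kCompleted } state = kIdle;
    bool is_map = false;
    std::string worker;        // machine currently (or last) assigned to this task
};

// Stand-in for the real ping RPC: returns false if the worker missed the deadline.
bool PingWorker(const std::string& worker) { (void)worker; return true; }

void CheckWorkers(const std::vector<std::string>& workers, std::vector<Task>& tasks) {
    for (const auto& w : workers) {
        if (PingWorker(w)) continue;                 // worker answered, nothing to do
        for (auto& t : tasks) {
            if (t.worker != w) continue;
            // In-progress Map or Reduce tasks go back to idle for re-scheduling;
            // completed Map tasks too, because their output sits on the failed
            // machine's local disk and is no longer reachable.
            if (t.state == Task::kInProgress || (t.state == Task::kCompleted && t.is_map)) {
                t.state = Task::kIdle;
                t.worker.clear();
            }
        }
    }
}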
Implementation
Fault Tolerance
• Master failure
  – The Master writes periodic checkpoints of the Master data structures
  – If the Master dies, a new copy can be started from the last checkpointed state
  – Google's current implementation (2004) aborts the MapReduce computation if the Master fails…!
  – Clients can check for this condition and retry the MapReduce operation if they desire…
Implementation
Fault Tolerance
• Semantics in the presence of failures
  – If the Map and Reduce functions are deterministic, a given input always produces the same output for a task: as a consequence, the distributed MapReduce implementation always produces the same output, even with failed and rescheduled tasks
  – MapReduce relies on atomic commits of Map and Reduce task outputs to achieve this property:
    • Each in-progress task writes its output to private temporary files. A Reduce task produces one such file, and a Map task produces R such files (one per Reduce task).
    • When a Map task completes, the worker sends a message to the Master that includes the names of the R temporary files
    • When a Reduce task completes, the worker atomically renames its temporary output file to the final output file
  – If the Master receives a completion message for an already completed Map task, it ignores the message
    • The first of the duplicate Map tasks to finish wins
  – MapReduce relies on the atomic rename operation of GFS to guarantee that a final Reduce output file contains just the data produced by only ONE execution of the task
    • The first of the duplicate Reduce tasks to finish wins
See backup tasks later… ;-)
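A minimal sketch of the commit step for a Reduce task, using a local filesystem as a stand-in for GFS: the task writes everything to a private temporary file, and a single atomic rename publishes it under the final name, so duplicate executions cannot mix their output. The file-naming scheme is an assumption.

#include <filesystem>
#include <fstream>
#include <string>

void CommitReduceOutput(const std::string& partition_data, int reduce_id, int attempt_id) {
    namespace fs = std::filesystem;
    const fs::path tmp = "reduce-" + std::to_string(reduce_id) + "." +
                         std::to_string(attempt_id) + ".tmp";       // private temporary file
    const fs::path final_name = "output-" + std::to_string(reduce_id);

    std::ofstream out(tmp);
    out << partition_data;        // all of this attempt's output goes to the temp file
    out.close();

    fs::rename(tmp, final_name);  // atomic publish: exactly one attempt ends up as the final file
}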
Implementation
Locality
• Network bandwidth is a relatively scarce resource
• MapReduce conserves network bandwidth:
  – The Master tries to choose a Map worker that is also the GFS chunkserver holding a replica of the chunk of input data (the Map task then runs on the same chunkserver that stores its input data)
  – Otherwise the Master attempts to schedule the Map task near a replica (worker and GFS chunkserver on the same switch)
  – Map workers don't use GFS to store intermediate files, but store them on local disks
[Figure: first choice (no network impact): Map task on the chunkserver storing the input data; second choice (same switch): Map task on the same switch as the input data; worst case (different switches): Map task and input data on different switches]
Implementation
Task Granularity
• Ideally M and R should be much larger than the number of worker machines
  – Improves dynamic load-balancing
  – Speeds up recovery when a worker fails
• Practical bounds
  – The Master must make O(M + R) scheduling decisions
  – The Master must keep O(M x R) state in memory
• In practice (2004)
  – Users choose M so that each individual task is roughly 16 MB to 64 MB of input data
  – Users make R a small multiple of the number of worker machines
  – Typical MapReduce computations: M = 200,000 and R = 5,000 using 2,000 worker machines
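A quick back-of-the-envelope check of these bounds with the 2004 numbers quoted above. The "roughly one byte of master state per map/reduce task pair" constant is the figure quoted in the MapReduce paper; the rest is plain arithmetic.

#include <cstdint>
#include <iostream>

int main() {
    const uint64_t M = 200000;   // map tasks (typical 2004 value above)
    const uint64_t R = 5000;     // reduce tasks

    std::cout << "scheduling decisions, O(M + R): " << (M + R) << "\n";          // 205,000
    std::cout << "master state, O(M x R) pairs:  " << (M * R) << "\n";           // 1,000,000,000
    std::cout << "at ~1 byte per pair: ~" << (M * R) / (1024 * 1024) << " MB\n"; // ~953 MB
    return 0;
}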
Implementation
Backup Tasks
• Problem: a straggler is a machine that takes an unusually long time to complete one of the last few Map or Reduce tasks in the computation: bad disk, competition for CPU, memory, local disk, …
• Solution: when a MapReduce operation is close to completion, the Master schedules backup executions of the remaining in-progress tasks (a task is marked as completed whenever either the primary or the backup execution completes)
• The mechanism is tuned so that backup tasks increase the computational resources used by the MapReduce operation by no more than a few percent of the total of M + R tasks
  – Example: the backup-task threshold for M = 200,000 and R = 5,000 at 1% is (200,000 + 5,000) / 100 = 2,050
• This significantly reduces the time to complete large MapReduce operations! ☺
• See performance benchmark later… ;-)
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Refinements
Partition function
• Users specify the number of reduce tasks/output files they desire (= R)
• Data gets partitioned across these tasks using a partition function on the intermediate key
  – The default partitioning function is hash(key) mod R, which gives fairly well-balanced partitions
• Users can provide a custom partitioning function for special cases
  – Example: the output keys are URLs and you want all entries for a single host to end up in the same output file
Ordering guarantees
• Within a partition the intermediate key/value pairs are processed in increasing key order
  – Makes it easy to generate a sorted output file per partition
  – Efficient random access lookups by key on the output file
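Below is a sketch of both partitioners discussed above. std::hash stands in for whatever hash the library actually uses, and Hostname() is a hypothetical helper that extracts the host part of a URL, shown only to illustrate the "all URLs of one host end up in the same output file" case.

#include <cstddef>
#include <functional>
#include <string>

std::size_t DefaultPartition(const std::string& key, std::size_t R) {
    return std::hash<std::string>{}(key) % R;        // hash(key) mod R
}

// Hypothetical helper: "http://a.com/x" -> "a.com".
std::string Hostname(const std::string& url) {
    const auto scheme = url.find("://");
    const std::size_t host_begin = (scheme == std::string::npos) ? 0 : scheme + 3;
    const auto host_end = url.find('/', host_begin);
    return url.substr(host_begin, host_end == std::string::npos ? std::string::npos
                                                                 : host_end - host_begin);
}

std::size_t UrlHostPartition(const std::string& url_key, std::size_t R) {
    return std::hash<std::string>{}(Hostname(url_key)) % R;   // all URLs of one host -> one partition
}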
Refinements
Combiner function
• The user is allowed to specify an optional combiner function that does partial merging of the intermediate data before it is sent over the network
• The combiner function is executed on each Map task (typically the same code is used to implement both the combiner and the reduce function)
[Figure: a standard MAP task reads a 64 MB GFS chunk and writes R intermediate files to local disks containing e.g. "Romain, 1", "Romain, 1", "Romain, 1"; a MAP task with a combiner function writes "Romain, 3" instead]
Refinements
Input and Output types
• The MapReduce library provides support for reading input data in several formats
  – "text" mode input type treats each line as a key/value pair (offset in the file / contents of the line)
  – Sequence of key/value pairs sorted by key
  – …
• A reader does not necessarily need to provide data read from a file
  – From a database (BigTable, Megastore?, Dremel?, Spanner?, …)
  – From data structures mapped in memory
  – …
• MapReduce also supports a set of output types for producing data in different formats
• Users can add support for new input or output types
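As an illustration of the "text" input type, this standalone reader produces one record per line, keyed by the line's byte offset in the file. It is a hedged sketch of the format, not the library's actual reader interface.

#include <fstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<std::streamoff, std::string>> ReadTextRecords(const std::string& path) {
    std::vector<std::pair<std::streamoff, std::string>> records;
    std::ifstream in(path);
    std::string line;
    std::streamoff offset = in.tellg();          // byte offset of the first line
    while (std::getline(in, line)) {
        records.emplace_back(offset, line);      // key = offset in the file, value = line contents
        offset = in.tellg();                     // position of the next line
    }
    return records;
}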
Refinements
Skipping Bad records
• Problem: bugs in user code can cause the Map or Reduce functions to crash deterministically on certain records: such bugs prevent a MapReduce operation from completing…
• Solution: sometimes it is acceptable to ignore a few records (e.g. statistical analysis on a large data set). There is an optional mode of execution where the MapReduce library detects which records cause deterministic crashes and skips these records in order to make forward progress
  – When the master has seen more than one failure on a particular record, it indicates that the record should be skipped when it issues the next re-execution of the corresponding Map or Reduce task
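A sketch of the master-side half of this mechanism: count crash reports per record and mark a record as skippable once it has failed more than once. The record identifier and the reporting path are assumptions; in the real system the worker catches the crash and tells the master which record it was processing.

#include <cstdint>
#include <set>
#include <unordered_map>

class BadRecordTracker {
 public:
    // Called when a worker reports that processing `record_id` crashed a task.
    void ReportFailure(uint64_t record_id) {
        if (++failures_[record_id] > 1) skip_.insert(record_id);
    }

    // Consulted when the corresponding Map or Reduce task is re-executed.
    bool ShouldSkip(uint64_t record_id) const { return skip_.count(record_id) > 0; }

 private:
    std::unordered_map<uint64_t, int> failures_;   // crash count per record
    std::set<uint64_t> skip_;                      // records to skip on re-execution
};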
Refinements
Local execution
• Problem: debugging MapReduce problems in a distributed system can be tricky…
• Solution: an alternative implementation of the MapReduce library that sequentially executes all of the work for a MapReduce operation on a local machine
  – To help facilitate debugging, profiling, and small-scale testing (GDB, …)
  – Computation can be limited to particular Map tasks
Refinements
Status information
• The Master runs an internal HTTP server and exports a set of status pages for human consumption
  – Progress of the computation
  – How many tasks completed / in progress
  – Bytes of input, bytes of intermediate data, bytes of output
  – Processing rates
  – …
• Pages also contain links to the standard error and standard output files generated for each task
• Users use this data to predict how long the computation will take, and whether or not more resources should be added to the computation
• Top level status page
  – Which workers have failed
  – Which Map and Reduce tasks they were processing when they failed
Refinements
Counters
• The MapReduce library provides a counter facility to count occurrences of various events
  – User code may want to count the total number of words processed, or other custom counters
• User code creates a named counter object and then increments the counter appropriately in the Map and/or Reduce function
• Counter values from individual worker machines are periodically propagated to the master (piggybacked on the heartbeat response). The Master aggregates the counter values from successful Map/Reduce tasks
• MapReduce default counter values
  – Number of input key/value pairs processed
  – Number of output key/value pairs produced
  – …
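A hedged sketch of the counter facility, loosely shaped after the counter usage shown in the MapReduce paper: user code obtains a named counter and increments it from Map or Reduce; per-worker values would then be piggybacked on heartbeat responses and aggregated by the Master from successful tasks only. The class below is illustrative, not the real API.

#include <cstdint>
#include <string>
#include <unordered_map>

class Counter {
 public:
    void Increment(uint64_t delta = 1) { value_ += delta; }
    uint64_t value() const { return value_; }
 private:
    uint64_t value_ = 0;
};

// Per-worker registry; values would be reported to the Master with heartbeats.
std::unordered_map<std::string, Counter> g_counters;

Counter* GetCounter(const std::string& name) { return &g_counters[name]; }

// Inside a user Map function one might write (hypothetical helper IsCapitalized):
//   Counter* uppercase = GetCounter("uppercase-words");
//   if (IsCapitalized(word)) uppercase->Increment();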
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Performance
Cluster configuration
• 1,800 machines
  – 2 x 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB RAM, 2 x 160 GB IDE disks, Gigabit NIC
• Terra-Search
  – Searching a rare three-character pattern through 10^10 100-byte records
  – Input split into 64 MB pieces (M = 15,000 and R = 1)
  – The pattern occurs in 92,337 records
• Terra-Sort
  – Sorts 10^10 100-byte records (= 1 terabyte of data)
  – M = 15,000 and R = 4,000
Performance (Terra-Search)
The Y-axis shows the rate at which input data is scanned
• The rate peaks at over 30 GB/s with 1,764 workers assigned
• Entire computation = 150 seconds, including 60 seconds of startup overhead:
  – Worker program propagation over the network
  – GFS delays to open the 1,000 GFS files
[Figure: data transfer rate over time (MAP reading phase)]
Performance (Terra-Sort)
[Figure: data transfer rates over time for three executions: normal execution, no backup tasks, and 200 tasks killed]
Performance (Terra-Sort)
Normal execution = 891 seconds
No backup tasks = 1,283 seconds
• After 960 seconds, all except 5 of the reduce tasks are completed
• The slow machines (= stragglers) don't finish until 300 seconds later
• An increase of 44% in elapsed time
200 tasks killed = 933 seconds
• 200 tasks killed out of 1,746 worker processes (the underlying scheduler immediately restarted new worker processes on these machines)
• An increase of 5% over the normal execution time
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Experience (Feb 2003 – Aug 2004)
Usage
• Large-scale machine learning problems
• Clustering for the Google News and Froogle products
• Extraction of data to produce reports of popular queries
• Extraction of properties of web pages
• Large-scale graph computations
[Figures: MapReduce instances over time; MapReduce jobs run in August 2004]
Agenda
• Introduction
• Programming model
• Implementation
• Refinements
• Performance
• Experience
• Conclusions
Conclusions (2004)
Success
• The MapReduce model is easy to use, even without experience with distributed systems
• It can address a large variety of problems (web search, sorting, data mining, machine learning, …)
• It scales to large clusters (thousands of machines)
Things learned
• Restricting the programming model makes it easy to parallelize and to be fault-tolerant
• Network bandwidth is a scarce resource (optimizations to reduce the amount of data sent across the network)
• Redundant execution reduces the impact of slow machines AND handles failures and data loss
Any questions?
Romain Jacotin
romain.jacotin@orange.fr
