17. 17
Do we have to give up on performance as well?
[Figure: traditional architecture (SAN/NAS holding the data, a separate RDBMS function, and App functions) versus Hadoop (each node pairing data with function).]
Geographically distributed as well?
22. 22
MapR Reliability
• Mean Time To Data Loss (MTTDL) is far better
– Improves as cluster size increases
• Does not require idle spare drives
– Rest of cluster has sufficient spare capacity to absorb one node's data
– On a 100-node cluster, 1 node's data == 1% of cluster capacity
• Utilizes all resources
– No wasted "master-slave" nodes
– No wasted idle spare drives … all spindles put to use
• Better reliability with fewer resources
– On commodity hardware!
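The 100-node figure is simple arithmetic, and it can be sketched for other cluster sizes. This is a minimal illustration, assuming equally sized nodes and an even spread of the failed node's data over the survivors; neither assumption is stated on the slide, and the function names are hypothetical.

```python
from fractions import Fraction


def failed_node_share(nodes: int) -> Fraction:
    """Fraction of total cluster capacity held by one node (equal-size nodes assumed)."""
    if nodes < 2:
        raise ValueError("need at least two nodes to absorb a failure")
    return Fraction(1, nodes)


def extra_load_per_survivor(nodes: int) -> Fraction:
    """Extra data each survivor absorbs, as a fraction of one node's data.

    Assumes the failed node's data is re-replicated evenly over the survivors.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to absorb a failure")
    return Fraction(1, nodes - 1)


for n in (10, 100, 1000):
    share = float(failed_node_share(n))
    extra = float(extra_load_per_survivor(n))
    print(f"{n:>5} nodes: one node = {share:.2%} of capacity, "
          f"each survivor absorbs {extra:.2%} of a node's data")
```

Under these assumptions the share each survivor must absorb shrinks as the cluster grows, which is the intuition behind not needing idle spare drives.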
25. 25
Container balancing
• As data size increases, writes spread more (like dropping a pebble in a pond)
• Larger pebbles spread the ripples farther
• Space balanced by moving idle containers
• Servers keep a bunch of containers "ready to go"
• Writes get distributed around the cluster
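The pebble-in-a-pond picture can be illustrated with a toy model. This is not MapR's placement algorithm, which the slide does not describe; it only assumes that a write is cut into container-sized pieces and that each piece goes to the next server in turn. The function name, the container size, and the round-robin rule are all hypothetical.

```python
from collections import Counter


def spread_write(size_units: int, servers: list[str], container_units: int = 4) -> Counter:
    """Return how many containers each server receives for one write.

    Toy model: the write is split into container-sized pieces and the pieces
    are handed out round-robin, assuming every server has a container
    "ready to go".
    """
    if size_units < 0 or container_units <= 0 or not servers:
        raise ValueError("invalid write size, container size, or server list")
    pieces = -(-size_units // container_units)  # ceiling division
    placement: Counter = Counter()
    for i in range(pieces):
        placement[servers[i % len(servers)]] += 1
    return placement


servers = [f"node{i}" for i in range(8)]
small = spread_write(6, servers)
large = spread_write(40, servers)
print(len(small), "servers touched by the small write")
print(len(large), "servers touched by the large write")
```

In this model a larger write touches more servers, up to the whole cluster, which mirrors the "larger pebbles spread the ripples farther" bullet.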
28. 28
MapR Does Automatic Re-sync Throttling
• Re-sync traffic is "secondary"
• Each node continuously measures RTT to all its peers
• More throttle to slower peers
– Idle system runs at full speed
• All automatically
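One way to picture RTT-based throttling is a rate that scales down as a peer's measured RTT rises above its idle baseline. The slide does not give MapR's actual formula, so the rule below (rate proportional to baseline RTT divided by measured RTT, capped at the maximum) is an assumption, as are the function name and the sample numbers.

```python
def resync_rate(max_rate_mbps: float, baseline_rtt_ms: float, measured_rtt_ms: float) -> float:
    """Allowed re-sync rate toward one peer (hypothetical rule).

    When the measured RTT equals the idle baseline, the peer gets the full
    rate; as the RTT grows, the rate shrinks proportionally.
    """
    if max_rate_mbps < 0 or baseline_rtt_ms <= 0 or measured_rtt_ms <= 0:
        raise ValueError("rates must be non-negative and RTTs positive")
    return max_rate_mbps * min(1.0, baseline_rtt_ms / measured_rtt_ms)


# Made-up RTT measurements (ms) for three peers; 0.5 ms is the assumed idle baseline.
measured = {"peerA": 0.5, "peerB": 2.0, "peerC": 5.0}
rates = {peer: resync_rate(100.0, 0.5, rtt) for peer, rtt in measured.items()}
for peer, rate in rates.items():
    print(f"{peer}: {rate:.1f} Mbps")
```

With this rule an idle peer is re-synced at full speed and a busy, slower peer is throttled harder, with no operator tuning involved.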
33. 33
MapR's No NameNode HA™ architecture

HDFS Federation
• Multiple single points of failure
• Limited to 50-200 million files
• Performance bottleneck
• Requires a commercial NAS

MapR (distributed metadata)
• HA with automatic failover
• Fast cluster restart
• Supports more than 1 trillion files (over 5,000x)
• 10-20x the performance
• 100% commodity hardware

[Figure: HDFS Federation with multiple NameNodes, a NAS appliance, and DataNodes, compared with MapR, where metadata pieces labeled A-F are distributed and replicated across all nodes.]
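The file-count figures on this slide can be cross-checked with a few lines of arithmetic. The constants below are taken directly from the slide (50 to 200 million files for HDFS Federation, 1 trillion for MapR); the variable names are only for illustration.

```python
HDFS_LIMIT_LOW = 50_000_000        # lower end of the quoted HDFS Federation limit
HDFS_LIMIT_HIGH = 200_000_000      # upper end of the quoted HDFS Federation limit
MAPR_FILES = 1_000_000_000_000     # 1 trillion files quoted for MapR

# Ratio against the most generous HDFS figure gives the conservative multiple.
ratio_vs_high = MAPR_FILES // HDFS_LIMIT_HIGH
ratio_vs_low = MAPR_FILES // HDFS_LIMIT_LOW

print(f"vs 200 million files: {ratio_vs_high:,}x")
print(f"vs  50 million files: {ratio_vs_low:,}x")
```

The "over 5,000x" claim corresponds to the comparison against the 200-million-file upper bound; against the lower bound the multiple is larger.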