2. Agenda
   · Typical Work flow
   · Writing file in to HDFS
   · Reading file from HDFS
   · Rack Awareness
   · Planning for a Cluster
   · Q&A
3. Hadoop Server Roles
   Client
   Masters:
     HDFS: {Name Node}, {Secondary Name Node}
     Map Reduce: {Job Tracker}
   Slaves:
     Each slave runs a Data Node (HDFS) and a Task Tracker (Map Reduce)
5. Sample HDFS Workflow
   · Write data in to the cluster (HDFS)
   · Analyze the data (Map Reduce)
   · Store the result in to the cluster (HDFS)
   · Read the result from the cluster (HDFS)
   Sample scenario: How many times did customers call customer care enquiring
   about a recently launched product? Compare this against the AD campaign on
   television, correlate the two, and find the best time to run the AD.
   Pipeline: CRM data entry -> SQOOP -> HDFS -> Map Reduce -> Result (HDFS)
6. Writing data in to cluster (HDFS)
   File size: 200 MB
   Client: "I want to write a file."
   Name Node: "Ok! Block size is 64 MB. Split the file into blocks and write
   them to nodes 1, 4, 5." (Strictly, a 200 MB file at a 64 MB block size
   yields 4 blocks, not 3; the diagram shows 3 for simplicity.)
   Flow:
   1. Client consults the Name node
   2. Client writes the data to one data node
   3. That data node replicates the block as per the replication factor and
      intimates the Name node
   4. The cycle repeats for every block
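The split arithmetic above can be sketched in plain Python (illustrative only, not the Hadoop client API). Note that a 200 MB file at a 64 MB block size strictly yields four blocks:

```python
# Minimal sketch: how a file is carved into fixed-size blocks before
# being handed to HDFS data nodes. Not Hadoop code; just the arithmetic.
import math

def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the blocks a file is split into."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * (n_blocks - 1)
    sizes.append(file_size_mb - block_size_mb * (n_blocks - 1))
    return sizes

print(split_into_blocks(200))  # -> [64, 64, 64, 8]
```

So the 200 MB example produces three full 64 MB blocks plus one 8 MB tail block.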
7. Rack Awareness
   The Name Node keeps a rack-aware mapping, e.g.:
     Rack 1: Data node 1, Data node 2, ...
     Rack 2: Data node 5, ...
   Blocks A, B and C are spread across data nodes DN 1-12 on different racks.
   · Never lose data when a rack is down
   · Keep bulky flows within a rack when possible
   · Assumption: in-rack traffic has higher bandwidth and lower latency
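HDFS's default rack-aware placement (first replica on the writer's node, second on a different rack, third on another node of that second rack) satisfies both bullets above: a rack failure never loses all replicas, yet two of the three copies share a rack. A minimal sketch of such a policy (illustrative; node names and rack map are made up, not Hadoop's actual implementation):

```python
# Sketch of a rack-aware replica placement policy in the spirit of the
# HDFS default: replica 1 on the writer's node, replica 2 on a different
# rack, replica 3 on another node in that second rack.
import random

def place_replicas(writer, rack_of, nodes):
    rng = random.Random(0)  # seeded only so this sketch is reproducible
    replicas = [writer]
    # second replica: any node on a different rack than the writer
    remote = [n for n in nodes if rack_of[n] != rack_of[writer]]
    second = rng.choice(remote)
    replicas.append(second)
    # third replica: another node on the same rack as the second replica
    same_rack = [n for n in nodes
                 if rack_of[n] == rack_of[second] and n not in replicas]
    replicas.append(rng.choice(same_rack))
    return replicas

# Hypothetical cluster map for the sketch
rack_of = {"DN1": "rack1", "DN2": "rack1", "DN5": "rack2",
           "DN6": "rack2", "DN9": "rack3", "DN10": "rack3"}
print(place_replicas("DN1", rack_of, list(rack_of)))
```

Whatever the random choices, the result spans exactly two racks, so losing one rack still leaves at least one replica.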
8. Multi Block Replication
   File.txt (200 MB), blocks A, B, C
   The client writes block A to data node 1; the Name Node instructs:
   "Replicate to nodes 3, 8." Replicas of A then live on DN 1, DN 3 and
   DN 8, spread across racks (out of data nodes DN 1-12).
9. Name Node
   · Data nodes send heartbeats
   · Every 10th heartbeat is a block report
   · The Name node builds its metadata from block reports
   · If the Name node is down, HDFS is down
   · Missing heartbeats signify lost nodes
   · The Name node consults its metadata and finds the affected data
   · The Name node consults the rack awareness script
   · The Name node tells data nodes to replicate
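The heartbeat bookkeeping above can be sketched as follows (illustrative only; the timeout value and class are invented for the sketch, not Hadoop's actual code):

```python
# Sketch of the Name Node's heartbeat bookkeeping: data nodes that miss
# heartbeats beyond a timeout are declared lost, and their blocks become
# candidates for re-replication.

HEARTBEAT_TIMEOUT = 30  # seconds; an assumption, real HDFS defaults differ

class NameNode:
    def __init__(self):
        self.last_heartbeat = {}   # data node -> time of last heartbeat
        self.blocks_on = {}        # data node -> set of block ids

    def heartbeat(self, datanode, now, block_report=None):
        self.last_heartbeat[datanode] = now
        if block_report is not None:   # every 10th heartbeat carries one
            self.blocks_on[datanode] = set(block_report)

    def blocks_to_replicate(self, now):
        """Blocks whose host has gone silent and need new replicas."""
        lost = set()
        for dn, t in self.last_heartbeat.items():
            if now - t > HEARTBEAT_TIMEOUT:
                lost |= self.blocks_on.get(dn, set())
        return lost

nn = NameNode()
nn.heartbeat("DN1", now=0, block_report=["A0", "A1"])
nn.heartbeat("DN5", now=0, block_report=["A2"])
nn.heartbeat("DN5", now=40)            # DN5 is still alive at t=40
print(nn.blocks_to_replicate(now=40))  # DN1 went silent; A0, A1 at risk
```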
10. Name Node & Secondary Name node
    File System Metadata (on the Primary Name Node):
      File.txt = A0 {1,5,7}
                 A1 {1,7,9}
                 A2 {5,10,15}
    Secondary Name node to Primary: "It's been 1 hr, give me your data."
    · Not a hot standby for the name node* (a hot standby needs e.g. Zookeeper)
    · Connects to the name node every one hour* (configurable)
    · Housekeeping, backup of Name node metadata
    · Saved metadata can be used to rebuild the name node
11. Understanding Secondary name node housekeeping
    · The Primary Name Node holds two files: "edits" (the log of recent
      namespace changes) and "fsimage" (the checkpoint of the namespace)
    · The Secondary Name Node fetches both, merges edits into fsimage to
      produce "Fsimage.ckpt", and ships the checkpoint back to the primary
    · Meanwhile the primary logs new changes to "edits-new", which becomes
      the new "edits" once the checkpoint is installed
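The checkpoint step can be sketched as a replay of the edits log over the last fsimage (illustrative; real fsimage/edits are binary files, and the dict-of-blocks model here is an assumption):

```python
# Sketch of the secondary name node's housekeeping: replay "edits" on top
# of "fsimage" to produce "fsimage.ckpt", which replaces the old fsimage.

def checkpoint(fsimage, edits):
    """fsimage: dict path -> block list; edits: list of (op, path, blocks)."""
    ckpt = dict(fsimage)               # start from the last checkpoint
    for op, path, blocks in edits:     # replay the edit log in order
        if op == "add":
            ckpt[path] = blocks
        elif op == "delete":
            ckpt.pop(path, None)
    return ckpt                        # becomes the new fsimage

fsimage = {"File.txt": ["A0", "A1", "A2"]}
edits = [("add", "new.txt", ["B0"]), ("delete", "File.txt", None)]
print(checkpoint(fsimage, edits))  # -> {'new.txt': ['B0']}
```

Because the merged checkpoint is equivalent to fsimage plus every logged edit, the saved copy is enough to rebuild the name node's metadata, which is the point of the previous slide.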
12. Reading data from HDFS Cluster
    Client: "I want to read file file.txt."
    Name Node: "Ok! File.txt = blck a {1,5,6}, blck b {8,1,2}, blck c {5,8,9}"
    (Blocks A, B and C are spread across data nodes 1-9.)
    Flow:
    1. Client consults the Name node
    2. Client receives a data node list for each block
    3. Client picks the first node of each list
    4. Client reads the data sequentially
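The four-step read flow can be sketched with the slide's own block map (illustrative only; `read_block` is a hypothetical transport stand-in, not the real DFSClient):

```python
# Sketch of the client-side read path: get the data node list per block
# from the name node, pick the first node of each list, read sequentially.

# Name node's answer for file.txt, as on the slide
block_locations = {
    "blck a": [1, 5, 6],
    "blck b": [8, 1, 2],
    "blck c": [5, 8, 9],
}

def read_file(block_locations, read_block):
    data = b""
    for block, datanodes in block_locations.items():  # sequential reads
        chosen = datanodes[0]    # first node in the list
        data += read_block(block, chosen)
    return data

# Hypothetical transport: tag each block with the node that served it
fake_read = lambda block, dn: f"<{block}@DN{dn}>".encode()
print(read_file(block_locations, fake_read))
```

With this map the client reads blck a from DN 1, blck b from DN 8 and blck c from DN 5, touching only the head of each list.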
13. Choosing the right hardware
    · Single Point of Failure: the master node
    · # Tasks per node: 1 core can run ~1.5 Mappers or Reducers
    · Dual power supply for redundancy on the master node
    · RAM thumb rule: 1 GB per million blocks of data
    · No commodity hardware for the master node
    · Regular backup of data
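The RAM thumb rule is easy to turn into a back-of-the-envelope calculator (a sketch under the slide's own assumptions; whether replicas count toward the million is glossed over here, so treat the result as a rough lower bound):

```python
# Sketch of the "1 GB of name node RAM per million blocks" thumb rule.
import math

def namenode_ram_gb(total_data_tb, block_size_mb=64):
    """Rough name node RAM estimate: 1 GB per million blocks."""
    total_mb = total_data_tb * 1024 * 1024
    blocks = math.ceil(total_mb / block_size_mb)
    return blocks / 1_000_000   # GB, by the thumb rule

# e.g. 1 PB of data stored as 64 MB blocks:
print(round(namenode_ram_gb(1024), 1))  # -> 16.8
```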
15. Practice at Yahoo!
    HDFS clusters at Yahoo! include about 3500 nodes.
    A typical cluster node has:
    · 2 quad core Xeon processors @ 2.5 GHz
    · Red Hat Enterprise Linux Server Release 5.1
    · Sun Java JDK 1.6.0_13-b03
    · 4 directly attached SATA drives (one terabyte each)
    · 16 GB RAM
    · 1-gigabit Ethernet
16. Practice at Yahoo!
    70 percent of the disk space is allocated to HDFS. The remainder is
    reserved for the operating system (Red Hat Linux), logs, and space to
    spill the output of map tasks. (MapReduce intermediate data are not
    stored in HDFS.)
    For each cluster, the NameNode and the BackupNode hosts are specially
    provisioned with up to 64 GB RAM; application tasks are never assigned
    to those hosts.
    In total, a cluster of 3500 nodes has 9.8 PB of storage available as
    blocks that are replicated three times, yielding a net 3.3 PB of
    storage for user applications. As a convenient approximation, one
    thousand nodes represent one PB of application storage.
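The 9.8 PB and 3.3 PB figures follow directly from the node specs on the previous slide, as this quick arithmetic check shows:

```python
# Checking the slide's arithmetic: 3500 nodes, four 1 TB drives each,
# 70% of disk given to HDFS, blocks replicated three times.

nodes = 3500
raw_tb_per_node = 4 * 1.0            # four 1 TB SATA drives
hdfs_fraction = 0.70

raw_pb = nodes * raw_tb_per_node / 1000   # total raw disk, ~14 PB
hdfs_pb = raw_pb * hdfs_fraction          # space HDFS may use, ~9.8 PB
user_pb = hdfs_pb / 3                     # net after 3x replication, ~3.3 PB

print(raw_pb, round(hdfs_pb, 1), round(user_pb, 1))  # -> 14.0 9.8 3.3
```

3.3 PB across 3500 nodes is just under 1 PB per thousand nodes, which is the slide's convenient approximation.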
17. Practice at Yahoo!: Durability of Data
    · Uncorrelated node failures: replicating data three times is a robust
      guard against loss of data due to uncorrelated node failures.
    · Correlated node failures (the failure of a rack or core switch):
      HDFS can tolerate losing a rack switch, since each block has a
      replica on some other rack.
    · Loss of electrical power to the cluster: a large cluster will lose a
      handful of blocks during a power-on restart.
20. Future work
    Automated failover
    · Plan: use Zookeeper, Yahoo's distributed consensus technology, to
      build an automated failover solution
    Scalability of the NameNode
    · Solution: our near-term solution to scalability is to allow multiple
      namespaces (and NameNodes) to share the physical storage within a
      cluster
    · Drawback: the main drawback of multiple independent namespaces is
      the cost of managing them
Notes:
· Yahoo! adopted Hadoop for internal use at the end of 2006, so the figures above are a little out of date.
· Why are MapReduce intermediate data not stored in HDFS? (1) HDFS reads and writes are expensive. (2) If intermediate data are lost, the computation can simply be re-run on the corresponding data node to regenerate them, since it is deterministic.
· On durability: at scale, failure is no longer uncommon. If one machine in a thousand dies, then with ten thousand machines roughly ten may die, which is a lot. So a replication factor of at least two is necessary.