2. Hadoop
Hadoop is an open-source software framework for distributed storage and distributed processing of large structured, semi-structured, and unstructured data sets across clusters of commodity servers.
3. Google MR
[Timeline, 2002-2009:]
2002: Doug Cutting & Mike Cafarella start Nutch
2003: Google publishes the GFS paper
2004: Google publishes the MapReduce paper
2005: NDFS & MR added to Nutch
2006: Doug Cutting joins Yahoo!; Hadoop spins out of Nutch as a subproject of Lucene
2007: NY Times converts 4 TB of image archives over 100 EC2 instances
2008: Hadoop becomes Apache's top-level project; Yahoo! posts the fastest sort of a TB (910 nodes, 3.5 mins); FB launches Hive; Cloudera founded
2009: Doug Cutting joins Cloudera; fastest sort of a TB: 62 secs over 1460 nodes; petabyte sort: 16.25 hrs on 3658 nodes; Yahoo!'s 4000-node cluster followed by Facebook's 2300-node cluster are the largest clusters
5. Design of HDFS
Designed for
Very Large Files
Streaming Data Access
Commodity Hardware
Not meant for
Low Latency data access
Lots of Small Files
Multiple Writers, arbitrary file modifications
6. Hadoop Storage: HDFS Architecture
Datanodes, Block Replication, Namenode [FsImage, Edits log]
Block/data replication determines how redundantly data is stored in HDFS
Replication factor determines the number of copies
factor 2: the second copy goes to a node on a different rack
factor 3: the second copy goes to a different rack, the third to another node on that same rack
Datanodes store the actual data
data is stored as blocks
the size of blocks can be tuned; default is usually 64 or 128 MB
the smaller the block size (the more blocks), the more metadata the Namenode has to manage
Namenode manages block locations
stores the "metadata" in RAM, so Namenode memory limits the number of files and blocks
the Namenode is a single point of failure (prior to HA)
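The block-size point can be made concrete. A minimal sketch in plain Java (not the Hadoop API); the per-block memory figure in the comment is a commonly cited rough estimate, not a guarantee:

```java
// Why block size matters for the Namenode: every block becomes an
// in-memory object on the Namenode (~150 bytes is a commonly cited
// rough figure), so halving the block size doubles the metadata load.
public class BlockMath {
    // Number of HDFS blocks needed for a file (ceiling division)
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }
    public static void main(String[] args) {
        long oneTb = 1L << 40;
        System.out.println(blocksFor(oneTb, 64L << 20));   // 16384 blocks at 64 MB
        System.out.println(blocksFor(oneTb, 128L << 20));  // 8192 blocks at 128 MB
    }
}
```

Twice the blocks per file means twice the Namenode RAM for the same data, which is why tiny blocks (and lots of small files) hurt.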
8. Hadoop 2.x Cluster Architecture
ResourceManager
Master that arbitrates all the available cluster resources
ApplicationMaster
Negotiates resources with the ResourceManager and works with the
NodeManagers to start the containers.
Is the middleman between NM and RM
Allows for greater scalability
NodeManager
Takes instructions from the ResourceManager and manages resources
available on a single node.
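These roles are wired together in yarn-site.xml. A minimal sketch, where rm-host and the resource sizes are example values (the property names are the standard YARN ones):

```xml
<!-- yarn-site.xml (sketch; rm-host and the sizes are examples) -->
<configuration>
  <!-- Where the ResourceManager, the cluster-wide master, runs -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host</value>
  </property>
  <!-- Resources each NodeManager offers from its single node -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
</configuration>
```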
9. Federation
allows for multiple namespaces
separation of namespace and storage
Namespace: manages directories, files and blocks. It supports file system
operations such as creation, modification, deletion and listing of files and
directories.
Block Storage: supports block-related operations such as creation,
deletion, modification and getting the location of blocks. It also takes care
of replica placement and replication, stores the blocks, and provides
read/write access to them.
improves scalability and isolation
without federation, the namespace does not scale as easily
10. HDFS Federation
[Diagram:]
Hadoop 1.0: a single Namenode owns both the namespace (NS1) and block management, serving all Datanodes (Datanode 1 ... Datanode n).
Hadoop 2.0: multiple Namenodes (NN 1 ... NN n), each owning its own namespace (NS1 ... NSn) and its own block pool (Pool1 ... Pooln); the Datanodes (Datanode 1, Datanode 2 ... Datanode n) form shared block storage holding blocks from every pool.
11. HDFS FED Example
[Diagram: Hadoop 2.0, each Namenode serving a different part of the tree over shared Datanodes:]
NN 1 (NS1): /user/data/etl
NN 2 (NS2): /user/data/xml
NN n (NS n): /home/streaming/data/weather
Datanode 1, Datanode 2 ... Datanode n store blocks for all namespaces.
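A federated layout like this is declared in hdfs-site.xml: list the nameservices, then give each Namenode an address. A minimal sketch (ns1/ns2 and the hostnames are example values; the property names are the standard HDFS ones):

```xml
<!-- hdfs-site.xml (sketch; hostnames are hypothetical) -->
<configuration>
  <!-- Two independent namespaces served by two Namenodes -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn1-host:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn2-host:8020</value>
  </property>
</configuration>
```

The Datanodes register with every Namenode in the list, which is what makes the block storage shared.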
12. HA
Prior to Hadoop 2.0 –
One NameNode for metadata management
Single point of failure
HDFS High Availability –
Two NameNodes in the same cluster
Active NameNode: responsible for all client operations
Standby NameNode: maintains enough state to provide a fast failover
Shared storage
Active NN writes edit log
Standby NN reads edit log and applies to its own namespace
During failover, Standby NN reads all the edits and transitions to Active state
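The active/standby pair and the shared edit log are also configured in hdfs-site.xml. A minimal sketch using the quorum journal manager for shared storage (mycluster, nn1/nn2, and the hostnames are example values):

```xml
<!-- hdfs-site.xml (sketch; cluster and host names are examples) -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- The two NameNodes in the same cluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1-host:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2-host:8020</value>
  </property>
  <!-- Shared storage for the edit log: a JournalNode quorum -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
  </property>
</configuration>
```

The active NN writes edits to the journal quorum; the standby tails the same journal, which is how it stays close enough to take over quickly.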
18. Coherency Model
Once more than a block's worth of data has been written, the first block
becomes visible to new readers
The block currently being written is not guaranteed to be visible to readers
(hflush()/hsync() on the output stream can force visibility)
20. MapReduce Data Flow
[Diagram: each map's output is split into partitions (p1, p2 ... pn) on its DataNode and shuffled to the reducers]
Map side (per DataNode):
Map takes <k,v> pairs from an HDFS InputSplit and applies map()
Output is written to an in-memory buffer (100 MB)
On spill, data is partitioned, sorted by key, optionally run through combine(), and written to disk
Spills are merged into one sorted file per partition; these are the intermediate map output files (e.g. part-m-00000)
Reduce side:
Shuffle: each reducer fetches its partition (p1, p2, ...) from every map's DataNode [map o/p is sorted by key]
Fetched runs are sorted/merged, reduce() is applied, and the output is written to HDFS
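The whole flow can be sketched end to end on a toy word count. A minimal in-memory sketch in plain Java (not the Hadoop API): map() emits <word, 1>, a TreeMap stands in for the partition/sort/merge step, and reduce() sums the values per key.

```java
import java.util.*;

// Toy, in-memory sketch of map -> group/sort by key -> reduce.
// Word count: map() emits <word,1>; reduce() sums per word.
public class MiniMR {
    // map(): emit <word, 1> for each word in a line
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.split("\\s+"))
            if (!w.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(w, 1));
        return out;
    }
    // reduce(): collapse all values sharing one key into a single value
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }
    static SortedMap<String, Integer> run(List<String> lines) {
        // "shuffle": group map output by key; TreeMap keeps keys sorted,
        // playing the role of the sort/merge step
        SortedMap<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines)
            for (Map.Entry<String, Integer> kv : map(line))
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                      .add(kv.getValue());
        // reduce() is called once per key, in key order
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet())
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        return result;
    }
    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a b a", "b c")));
        // {a=2, b=2, c=1}
    }
}
```

In real Hadoop the grouping happens across machines (spill files, per-partition merges, network shuffle), but the data movement is the same shape.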
21. MR gotchas
Map takes input splits as key-value pairs
Mapper output is always sorted, by key:
context.write(outKey, outValue);
the result will be sorted based on outKey
The default partitioner hashes the key
Reducer reduces the set of intermediate values that share a key to a smaller set
of values
the reduce() function is called once for each key
setNumReduceTasks(0) makes the job map-only: no shuffle or sort, mapper output is written directly to HDFS
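The default-partitioner point can be shown in plain Java: Hadoop's HashPartitioner computes (hash & Integer.MAX_VALUE) % numReduceTasks, and this standalone sketch reproduces that logic outside Hadoop.

```java
// Standalone sketch of Hadoop's default HashPartitioner logic
// (the real class lives under org.apache.hadoop.mapreduce.lib.partition).
public class HashPartition {
    // Maps a key to a reducer index in [0, numReduceTasks)
    static int partition(String key, int numReduceTasks) {
        // & Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode() can never produce a negative partition index
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
    public static void main(String[] args) {
        // The same key always lands on the same reducer,
        // which is what lets reduce() see all values for a key together
        System.out.println(partition("hadoop", 4) == partition("hadoop", 4)); // true
    }
}
```

Because the hash is deterministic, every <k,v> with the same key is routed to the same reducer, which is the property the "reduce() is called for each key" rule relies on.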