Map Reduce

Why do we need a grid?
Storage --> a single disk cannot host all the data
Computation --> a single CPU cannot provide all the computing needs
Parallel jobs --> serial execution is no longer a viable option

What we expect from a framework
Distributed storage
Job specification platform
Job splitting/merging
Job execution and monitoring
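
The split/execute/merge cycle listed above can be sketched on a single machine. This is only an illustration under stated assumptions, not how Hadoop implements it: the helper names (split_job, run_task, merge_results, run_job) are hypothetical, the task is a plain sum, and a thread pool stands in for the cluster's worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_job(data, num_tasks):
    """Split the input into at most num_tasks contiguous chunks (hypothetical helper)."""
    chunk_size = max(1, -(-len(data) // num_tasks))  # ceiling division
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def run_task(chunk):
    """One independent task: here it simply sums the numbers in its chunk."""
    return sum(chunk)


def merge_results(partials):
    """Merge the partial results of all tasks into the final answer."""
    return sum(partials)


def run_job(data, num_tasks=4):
    """Split the job, run the tasks in parallel, then merge their results."""
    chunks = split_job(data, num_tasks)
    # Assumption for illustration: a thread pool plays the role of worker nodes.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(run_task, chunks))
    return merge_results(partials)
```

Calling run_job(list(range(1, 101))) splits the hundred numbers into four chunks, sums each chunk independently, and merges the partial sums. A real framework would add the monitoring and retry logic that this sketch leaves out.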

Basic attributes expected

Hadoop Core

HDFS attributes
Distributed, reliable, scalable
Optimized for streaming reads and very large data sets
Assumes write once, read many times
No local caching possible due to large files and streaming reads
High data replication
Fits logically with MapReduce
Synchronized access to metadata --> namenode
Metadata (edit log, FsImage) stored in the namenode's local OS file system
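
To make "high data replication" concrete, the sketch below cuts a file into fixed-size blocks and assigns each block to several datanodes. It is a toy model, not the HDFS implementation: the function names are hypothetical, the block size and replication factor are parameters chosen by the caller, and round-robin placement is a simplification that stands in for the real placement policy.

```python
import itertools


def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size units is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct datanodes, round-robin.

    Round-robin is an assumption made for illustration; it is not the
    placement policy HDFS actually uses.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    ring = itertools.cycle(datanodes)
    return {
        block_id: [next(ring) for _ in range(replication)]
        for block_id in range(num_blocks)
    }


def blocks_still_readable(placement, failed_node):
    """True if every block keeps at least one replica after one node fails."""
    return all(
        any(node != failed_node for node in nodes)
        for nodes in placement.values()
    )
```

For example, split_into_blocks(200, 64) yields three full blocks and one partial block, and with a replication factor of 3 every block stays readable when any single datanode is lost.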

HDFS (figure copied from the HDFS design document)

MapReduce framework attributes
Fair isolation --> easy synchronization and failover ...

MapReduce (figure copied from the Yahoo! tutorial)
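
The MapReduce data flow can be illustrated with the classic word count, written here in plain Python rather than against the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and everything runs in one process; on a cluster the map and reduce calls would run as isolated tasks on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each map call sees only its own record and each reduce call sees only one key's values, the tasks share no state, which is what makes them straightforward to rerun on another node after a failure.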

Fault tolerance goal

Resource management goal

Scalability goal
Flat scalability --> adding or removing a node is fairly straightforward

Sub projects
ZooKeeper for small shared information (useful for synchronization, locks, leader election, and many other sharing problems in distributed systems)
HBase for semi-structured data (provides an implementation of the Google Bigtable design)
Hive for ad hoc query analysis (currently supports insertion into multiple tables, group by, and multiple-table selection; order by is under construction)
Avro for data serialization, applicable to MapReduce
