In this presentation I explain the basic differences between the two Hadoop architectures,
Hadoop architecture 1 and Hadoop architecture 2.
I used a website as a reference while preparing it.
Basic Hadoop Architecture V1 vs V2
1. Introduction
Hadoop is an open-source framework from the Apache Software
Foundation, written in the Java programming language.
Google published the GFS paper in October 2003 and the
MapReduce paper in December 2004.
GFS is Google's proprietary distributed file system, built to
store and manage data efficiently and reliably
on commodity hardware.
MapReduce is a parallel and distributed
programming model used for processing and
generating large datasets.
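To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It is plain Python rather than the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    mapped = []
    # On a real cluster each split would be mapped on a different node;
    # here the splits are simply processed one after another.
    for doc in documents:
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    splits = ["big data big cluster", "data at rest"]
    print(word_count(splits))
```

Because each call to `map_phase` depends only on its own split, the map step can run on many nodes at once, which is the property the model relies on for parallelism.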
10 December 2011
2. Advantages
• Open source: free license
• High availability: data is replicated across nodes
• High scalability: stores and distributes huge
data sets
• Better performance: work is distributed to different
nodes and tasks run in parallel, so it can process
petabytes (PB) of data
• Handles huge and varied types of data using
parallel computing
• Very flexible: new data sources can be integrated
• Solves complex problems
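The replication and distribution points above can be sketched roughly as follows. This is a simplified model, not HDFS code: the 64 MB block size, the replication factor of 3, and the round-robin placement are assumptions made for the example (real HDFS placement is rack-aware), and all function names are hypothetical.

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return how many fixed-size blocks are needed to hold a file."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one live replica after a node fails."""
    return [
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    ]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(split_into_blocks(200), nodes)
    print(placement)
    print(readable_blocks(placement, "n1"))
```

With three copies of every block on different nodes, losing any single node leaves every block readable, which is the sense in which replication gives high availability.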
3. Applications
• Recommendation systems
• Processing very big data
• Processing diverse types of data
• Best suited to processing data
at rest
• Log processing
4. Limitations
• Not suitable for small data sets
• Not suitable for executing complex
queries
• Somewhat difficult to process data that is in
motion
7. HADOOP V1
Fsimage - a file stored on the OS (local) file system
- contains the complete
directory structure of HDFS
Log file - a file that records events
occurring in an operating system or
other software.
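One way to picture how the two files work together is as a checkpoint plus a list of later changes. The sketch below assumes the log file plays the role of the NameNode's edit log, models the fsimage as a plain set of paths, and supports only two hypothetical operations, `create` and `delete`; the real on-disk formats are far richer.

```python
def replay(fsimage, edit_log):
    """Rebuild the current namespace from a checkpoint plus logged changes."""
    namespace = set(fsimage)  # copy, so the checkpoint itself is not modified
    for op, path in edit_log:
        if op == "create":
            namespace.add(path)
        elif op == "delete":
            namespace.discard(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


# Checkpointed directory structure (illustrative paths only).
fsimage = {"/", "/data", "/data/a.txt"}

# Changes recorded after the checkpoint was written.
edit_log = [("create", "/data/b.txt"), ("delete", "/data/a.txt")]

current = replay(fsimage, edit_log)
print(sorted(current))
```

Keeping the changes in a separate log means the full directory image does not have to be rewritten on every file operation.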
8. Limitations
• Designed only for batch processing of huge amounts of data
• Not suitable for real-time data
processing
• Not suitable for data streaming
• Supports up to 4,000 nodes per cluster
11. Differences between 1.x and 2.x
Hadoop 1.x | Hadoop 2.x
Manages only one namespace | Manages multiple namespaces
Supports only one programming model, MapReduce | Supports multiple programming models through the YARN component, such as MapReduce, streaming, graph processing, etc.
Has many limitations in scalability | Overcomes these limitations with a new architecture
Has no multi-tenancy support | Has multi-tenancy support
Uses a fixed-size slots mechanism for allocating resources to tasks | Uses variable-sized containers
Supports a maximum of 4K nodes per cluster | Supports more than 10K nodes per cluster
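The slots versus containers row can be illustrated with a small scheduling sketch. It is a deliberate simplification: a slot is modeled as a fixed amount of memory, a container as exactly the memory a task requests, and the node size, slot size, and task sizes are made-up numbers. The function names are hypothetical and are not part of any Hadoop API.

```python
def schedule_with_slots(task_memory_mb, num_slots, slot_size_mb):
    """Hadoop 1.x style: each task occupies one whole fixed-size slot."""
    running = []
    for mem in task_memory_mb:
        # A task runs only if a slot is free and the task fits inside it.
        if len(running) < num_slots and mem <= slot_size_mb:
            running.append(mem)
    return running


def schedule_with_containers(task_memory_mb, node_memory_mb):
    """Hadoop 2.x style: each task gets a container sized to its request."""
    running, free = [], node_memory_mb
    for mem in task_memory_mb:
        if mem <= free:
            running.append(mem)
            free -= mem
    return running


if __name__ == "__main__":
    tasks = [512, 512, 512, 512, 512, 4096]  # requested memory in MB
    # The same 8192 MB node, divided two different ways.
    print(schedule_with_slots(tasks, num_slots=4, slot_size_mb=2048))
    print(schedule_with_containers(tasks, node_memory_mb=8192))
```

In this example the slot-based node runs only four small tasks and leaves most of its memory idle, while the container-based node fits all six tasks, including the large one that no single slot could hold.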