Sunderdeep Engineering College
Department of Computer Science
Session-2017-18
Topic: Hadoop
Submitted to: Mr. Ashutosh Rao, H.O.D. (CSE) Dept.
Submitted by: Kamran Khan, B.Tech IIIrd Year
Contents
 Introduction
 What’s Big Data?
 3V's of Big Data
 Problem & Solution
 What’s Hadoop?
 HDFS
 MapReduce
 Architecture of Hadoop
 Applications of Hadoop
 Pros & Cons of Hadoop
 Conclusion
 References
Introduction
 Apache Hadoop is an open-source, scalable, fault-tolerant
framework written in Java. It efficiently processes large
volumes of data (Big Data) on a cluster of commodity hardware.
Hadoop is not only a storage system but a platform for both
large-scale data storage and processing.
 Created by Doug Cutting and Mike Cafarella in 2005.
 Doug Cutting named it after his son's toy elephant.
 Apache Hadoop is now a registered trademark of the Apache
Software Foundation.
What’s the Problem?
What is Big Data?
Data that is very large in size is called Big
Data. Normally we work with data measured in
MB (Word, Excel files) or at most GB (movies,
code), but data measured in petabytes, i.e. 10^15
bytes, is called Big Data. It is often stated that almost
90% of today's data has been generated in
the past 5 years.
3V's of Big Data
 Velocity: Data is generated at a very fast rate. It is
estimated that the volume of data doubles every 2
years.
 Variety: Nowadays data is not stored only in rows and columns.
Data is structured as well as unstructured. Log files and CCTV
footage are unstructured data. Data that can be saved in
tables, such as a bank's transaction data, is structured
data.
 Volume: The amount of data we deal with is very large,
on the order of petabytes.
So what is the problem??
Processing such large data is very
difficult in a relational database.
It would take too much time and cost
too much to process the data.
Traditional Approach
 In this approach, an enterprise has a computer to store and process
big data. The data is stored in an RDBMS such as Oracle Database, MS
SQL Server or DB2, and sophisticated software is written to interact
with the database, process the required data and present it to the users
for analysis.
 This approach works well when the volume of data is small enough to be
accommodated by standard database servers, or up to the limit of the
processor that is processing the data. But when it comes to dealing
with huge amounts of data, it is a slow and tedious task to process such data
through a traditional database server.
Google's Solution!!
 Google solved this problem using an algorithm called MapReduce.
This algorithm divides the task into small parts and assigns those
parts to many computers connected over the network, and collects
the results to form the final result dataset.
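The divide, assign and collect idea above can be sketched in a few lines of Python. This is only an illustration, not Google's or Hadoop's implementation: threads stand in for the networked computers, and the names `split_into_parts`, `process_part` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(items, num_parts):
    """Divide a list into num_parts contiguous, roughly equal parts."""
    size, extra = divmod(len(items), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def process_part(part):
    """Work done by one 'computer': here, summing its share of the numbers."""
    return sum(part)


def distributed_sum(numbers, num_workers=4):
    """Split the task, hand each part to a worker, then combine the results."""
    parts = split_into_parts(numbers, num_workers)
    # Assumption: threads play the role of machines connected over a network.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_part, parts))
    return sum(partial_results)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))  # prints 5050
```

Each worker sees only its own part, so adding more workers shortens the time per part, as long as the partial results are cheap to combine.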
Solution!!
What is Hadoop?
The Apache Hadoop software library is a framework
that allows for the distributed processing of large
data sets across clusters of computers using simple
programming models.
 Developed at the Apache Software Foundation; version 1.0
was released in 2011.
 Written in Java.
Hadoop is open-source software.
 Framework
 Massive storage
 Processing power
We can solve this problem with distributed
computing.
But distributed computing has problems of its own:
 Hardware failure
There is always a chance of hardware failure.
 Combining the data after analysis
Data from all the disks has to be combined, which is messy.
Hadoop was created to solve these problems.
It has two main parts:
 Hadoop Distributed File System (HDFS)
 Data processing framework (MapReduce)
Hadoop Distributed File System
 It ties many small, reasonably priced machines
together into a single cost-effective computer cluster.
 Data and application processing are protected against
hardware failure.
 If a node goes down, jobs are automatically redirected to
other nodes so that the distributed computation does
not fail.
 It automatically stores multiple copies of all data.
 It provides a simplified programming model that allows users
to quickly read from and write to the distributed system.
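A small Python sketch can show why storing multiple copies protects against hardware failure. It is a simplification under stated assumptions: the 128 MB block size and replication factor of 3 are used as typical HDFS settings, the round-robin placement is a stand-in for HDFS's real rack-aware policy, and all function and node names are hypothetical.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=128 * MB):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    return [
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)  # two full blocks plus a 44 MB block
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    # Even with node2 down, every block is still readable from another node.
    print(readable_blocks(placement, failed_nodes={"node2"}))
```

With three copies per block, a block becomes unreadable only if all three nodes holding it fail at the same time.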
HDFS Architecture
 The NameNode is also known as the master node. It stores metadata,
i.e. the number of data blocks, their replicas and other details. This metadata is
kept in memory on the master for faster retrieval. The NameNode maintains and
manages the slave nodes and assigns tasks to them. It should be deployed on reliable
hardware, as it is the centerpiece of HDFS.
 The DataNode is also known as the slave node. DataNodes store the
actual data in HDFS and perform read and write operations at the
request of clients. DataNodes can be deployed on commodity hardware.
 When the NameNode starts, it first reads the HDFS state from an image file, FsImage.
It then applies the edits from the edits log file, writes the new HDFS state
back to the FsImage, and starts normal operation with an empty edits file. Because the
NameNode merges the FsImage and edits files only at start-up, the edits log can grow very
large over time. A side effect of a larger edits file is that the next restart of the NameNode takes
longer.
 The Secondary NameNode solves this issue. It downloads the FsImage
and edit logs from the NameNode and merges the edit logs into the FsImage
(file system image), which keeps the edits log size within a limit. It stores the updated FsImage
in persistent storage, where it can be used in case of NameNode failure.
 The Secondary NameNode performs these checkpoints at regular intervals.
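The checkpoint described above amounts to replaying a log of changes onto a snapshot. The Python sketch below illustrates that idea only: the dictionary standing in for FsImage, the tuple format of the edits, and the function names are all assumptions made for this example, not the real HDFS file formats.

```python
def apply_edits(fsimage, edits):
    """Return a new namespace image with the logged operations applied in order."""
    image = dict(fsimage)  # leave the old image untouched
    for op, path, block_count in edits:
        if op == "create":
            image[path] = block_count
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


def checkpoint(fsimage, edits):
    """Merge the edits into the image and start over with an empty edits log."""
    return apply_edits(fsimage, edits), []


if __name__ == "__main__":
    fsimage = {"/logs/a.txt": 3}  # hypothetical path -> number of blocks
    edits = [("create", "/logs/b.txt", 2), ("delete", "/logs/a.txt", None)]
    fsimage, edits = checkpoint(fsimage, edits)
    print(fsimage, edits)  # prints {'/logs/b.txt': 2} []
```

The longer the edits list, the more work `apply_edits` has to do, which is why regular checkpoints keep NameNode restarts short.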
 The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific
nodes in the cluster, ideally the nodes that have the data, or at least nodes in the same rack.
 Client applications submit jobs to the JobTracker.
 The JobTracker talks to the NameNode to determine the location of the data.
 The JobTracker locates TaskTracker nodes with available slots at or near the data.
 The JobTracker submits the work to the chosen TaskTracker nodes.
 The TaskTracker nodes are monitored. If they do not submit heartbeat signals often
enough, they are deemed to have failed and the work is scheduled on a
different TaskTracker.
 A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to
do then: it may resubmit the job elsewhere, it may mark that specific record as something
to avoid, and it may even blacklist the TaskTracker as unreliable.
 When the work is completed, the JobTracker updates its status.
 Client applications can poll the JobTracker for information.
 The JobTracker is a single point of failure for the Hadoop MapReduce service. If it goes down, all
running jobs are halted.
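The heartbeat rule above can be sketched as follows. This is a toy model, not Hadoop's scheduler: the 30-second timeout, the tracker and task names, and both functions are hypothetical, and data locality is ignored when picking a new TaskTracker.

```python
def find_failed_trackers(last_heartbeat, now, timeout):
    """TaskTrackers whose last heartbeat is older than `timeout` seconds."""
    return sorted(t for t, seen in last_heartbeat.items() if now - seen > timeout)


def reschedule(assignments, failed, free_slots):
    """Move tasks from failed trackers to live trackers that have free slots."""
    new_assignments = dict(assignments)
    slots = {t: n for t, n in free_slots.items() if t not in failed}
    for task, tracker in sorted(assignments.items()):
        if tracker not in failed:
            continue
        # Pick the first live tracker (by name) that still has a free slot.
        target = next((t for t in sorted(slots) if slots[t] > 0), None)
        if target is None:
            raise RuntimeError(f"no free slot available for {task}")
        new_assignments[task] = target
        slots[target] -= 1
    return new_assignments


if __name__ == "__main__":
    last_heartbeat = {"tt1": 100.0, "tt2": 40.0, "tt3": 95.0}
    failed = find_failed_trackers(last_heartbeat, now=105.0, timeout=30.0)
    assignments = {"map-0": "tt1", "map-1": "tt2", "map-2": "tt2"}
    print(failed)
    print(reschedule(assignments, failed, {"tt1": 1, "tt2": 0, "tt3": 2}))
```

Here `tt2` has been silent for 65 seconds, so its two tasks are moved to `tt1` and `tt3`, while the task already on `tt1` stays where it is.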
MapReduce
MapReduce is a programming model for processing and
generating large data sets with a parallel, distributed
algorithm on a cluster.
It comes with an associated implementation that runs
programs written in this model.
The MAP function processes a key/value pair to generate a set
of intermediate key/value pairs.
The REDUCE function merges all intermediate values
associated with the same intermediate key.
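Word counting is the usual first example of this model. The sketch below runs the map, group-by-key and reduce steps in a single Python process to show the data flow; it does not use the Hadoop API, and the names `map_fn`, `shuffle`, `reduce_fn` and `run_mapreduce` are chosen for this example.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (line number, line text) -> list of intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in value.split()]


def shuffle(pairs):
    """Group all intermediate values by their intermediate key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: merge all values for one key, here by summing the counts."""
    return key, sum(values)


def run_mapreduce(records):
    """Run map over every record, group by key, then reduce each group."""
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    return dict(reduce_fn(k, vs) for k, vs in shuffle(intermediate).items())


if __name__ == "__main__":
    records = [(0, "big data needs big clusters"), (1, "Hadoop stores big data")]
    print(run_mapreduce(records))
```

In a real cluster the map calls run on different nodes and the grouping step moves data across the network, but the two user-written functions keep the same shape.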
Applications
Pros of Hadoop
 Computing power
 Flexibility
 Fault Tolerance
 Low Cost
 Scalability
Cons of Hadoop
 1. Integration with existing systems
Hadoop is not optimised for ease of use. Installing it and integrating it with existing
databases can prove difficult, especially since no software support is
provided.
 2. Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This
means significant training may be required to administer Hadoop clusters.
 3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment,
especially where sensitive data is concerned.
Conclusion:
Hadoop has been a very effective solution for companies
dealing with data in petabytes.
It has solved many industry problems related to
huge data management and distributed systems.
As it is open source, it has been widely adopted by
companies.
References
https://www.knowledgehut.com/blog/bigdata-hadoop/top-pros-and-cons-of-hadoop
https://data-flair.training/blogs/hadoop-hdfs-architecture/
https://www.dezyre.com/article/hadoop-architecture-explained-what-it-is-and-why-it-matters/317
https://www.tutorialspoint.com/hadoop/index.htm
https://www.edureka.co/blog/hadoop-tutorial/
THANK YOU!!!

Más contenido relacionado

La actualidad más candente

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1Abbas Maazallahi
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopFlavio Vit
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Edureka!
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesDaniel Abadi
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesDavid Tjahjono,MD,MBA(UK)
 

La actualidad más candente (20)

Big data concepts
Big data conceptsBig data concepts
Big data concepts
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
 
Hadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and OpportunitiesHadoop and Graph Data Management: Challenges and Opportunities
Hadoop and Graph Data Management: Challenges and Opportunities
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Harnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution TimesHarnessing Hadoop and Big Data to Reduce Execution Times
Harnessing Hadoop and Big Data to Reduce Execution Times
 

Similar a Hadoop by kamran khan

Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystemrohitraj268
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Ranjith Sekar
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Rupak Roy
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxUttara University
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Cognizant
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 

Similar a Hadoop by kamran khan (20)

Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
hadoop
hadoophadoop
hadoop
 
hadoop
hadoophadoop
hadoop
 
Cppt Hadoop
Cppt HadoopCppt Hadoop
Cppt Hadoop
 
Cppt
CpptCppt
Cppt
 
Cppt
CpptCppt
Cppt
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem
 
Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 

Último

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Último (20)

Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 

Hadoop by kamran khan

  • 1. Sunderdeep Engineering College Department of Computer Science Session-2017-18 Topic:- Submitted to Submitted by Mr.Ashutosh Rao Kamran Khan H.O.D. (CSE) Dept. B.tech IIIrd Year
  • 2. Contents  Introduction  What’s Big Data?  3’V of Big Data  Problem & Solution  What’s Hadoop?  HDFS  MapReduce  Architecture of Hadoop  Applications of Hadoop  Pros & Cons of Hadoop  Conclusion  Refrences
  • 3. Introduction  Apache Hadoop is an open source, Scalable, and Fault tolerant framework written in Java. It efficiently processes large volumes of data (BIG DATA) on a cluster of commodity hardware. Hadoop is not only a storage system but is a platform for large data storage as well as processing.  Created by Doug Cutting, Mike Cafarella in 2005.  Doug named it after his son's toy elephant  Now Apache Hadoop is a registered trademark of the Apache Software Foundation.
  • 5. What is Big Data? Data which are very large in size is called Big Data. Normally we work on data of size MB(Word ,Excel) or maximum GB(Movies, Codes) but data in Peta bytes i.e. 10^15 byte size is called Big Data. It is stated that almost 90% of today's data has been generated in the past 5 years.
  • 6.
  • 7.
  • 8. 3V's of Big Data  Velocity: The data is increasing at a very fast rate. It is estimated that the volume of data will double in every 2 years.  Variety: Now a days data are not stored in rows and column. Data is structured as well as unstructured. Log file, CCTV footage is unstructured data. Data which can be saved in tables are structured data like the transaction data of the bank.  Volume: The amount of data which we deal with is of very large size of Peta bytes.
  • 9. So what is the problem?? Processing that large data is very difficult in relational database. It would take too much time to process data and cost.
  • 10. Traditional Approach  In this approach, an enterprise will have a computer to store and process big data. Here data will be stored in an RDBMS like Oracle Database, MS SQL Server or DB2 and sophisticated softwares can be written to interact with the database, process the required data and present it to the users for analysis purpose.  This approach works well where we have less volume of data that can be accommodated by standard database servers, or up to the limit of the processor which is processing the data. But when it comes to dealing with huge amounts of data, it is really a tedious task to process such data through a traditional database server.
  • 11. ‘s Solution!!  Google solved this problem using an algorithm called MapReduce. This algorithm divides the task into small parts and assigns those parts to many computers connected over the network, and collects the results to form the final result dataset.
  • 13. What is Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.  It is made by apache software foundation in 2011.  Written in JAVA.
  • 14. Hadoop is open source software. Framework Massive Storage Processing Power
  • 15. Distributed computing can solve this problem, but it brings problems of its own. Hardware failure: with many machines, there is always a chance that some hardware will fail. Combining the data after analysis: results from all the disks have to be combined, which is messy.
  • 16. Hadoop was created to solve these problems. It has two main parts: the Hadoop Distributed File System (HDFS) for storage, and MapReduce, the data processing framework.
  • 17. Hadoop Distributed File System. It ties many small, reasonably priced machines together into a single cost-effective computer cluster. Data and application processing are protected against hardware failure. If a node goes down, jobs are automatically redirected to other nodes so that the distributed computation does not fail. It automatically stores multiple copies of all data. It provides a simplified programming model that lets users quickly read from and write to the distributed system.
  • 19. The NameNode in the HDFS architecture is also known as the master node. It stores metadata, i.e. the number of data blocks, their replicas and other details. This metadata is kept in memory on the master for faster retrieval. The NameNode maintains and manages the slave nodes and assigns tasks to them. It should be deployed on reliable hardware, as it is the centerpiece of HDFS. The DataNode in the HDFS architecture is also known as the slave. DataNodes store the actual data in HDFS and perform read and write operations at the request of clients. DataNodes can be deployed on commodity hardware. When the NameNode starts, it first reads the HDFS state from an image file, FsImage. It then applies the edits from the edits log file, writes the new HDFS state back to the FsImage, and starts normal operation with an empty edits file. Because the NameNode merges the FsImage and edits files only at start-up, the edits log can grow very large over time. A side effect of a larger edits file is that the next restart of the NameNode takes longer. The Secondary NameNode solves this issue. It downloads the FsImage and edit logs from the NameNode, merges the edit logs into the FsImage (file system image), and so keeps the size of the edits log within a limit. It stores the merged FsImage in persistent storage, where it can be used if the NameNode fails. In this way the Secondary NameNode performs regular checkpoints in HDFS.
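The checkpoint described above can be sketched in a few lines of Python. This is only an illustrative model, not HDFS code: the namespace is assumed to be a plain dictionary, and the `Edit` tuple, `apply_edits` and `checkpoint` are hypothetical names invented for the example. Real HDFS stores the FsImage and edits log in its own on-disk formats.

```python
from typing import Dict, List, Tuple

FsImage = Dict[str, int]      # hypothetical model: file path -> number of blocks
Edit = Tuple[str, str, int]   # hypothetical model: (operation, path, block count)


def apply_edits(fsimage: FsImage, edits: List[Edit]) -> FsImage:
    """Replay logged edits on top of a namespace snapshot and return the new snapshot."""
    merged = dict(fsimage)  # copy, so the old snapshot stays untouched
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = blocks
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


def checkpoint(fsimage: FsImage, edits: List[Edit]) -> Tuple[FsImage, List[Edit]]:
    """Merge the edits into the image and continue with an empty edits log."""
    return apply_edits(fsimage, edits), []


image = {"/logs/a.txt": 2}
edits = [("create", "/logs/b.txt", 3), ("delete", "/logs/a.txt", 0)]
image, edits = checkpoint(image, edits)
print(image)  # {'/logs/b.txt': 3}
print(edits)  # []
```

The longer the edits list grows between checkpoints, the more work `apply_edits` has to do at the next start-up, which is why a periodic checkpoint keeps restarts short.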
  • 20. The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that hold the data, or at least nodes in the same rack. Client applications submit jobs to the JobTracker. The JobTracker talks to the NameNode to determine the location of the data. The JobTracker locates TaskTracker nodes with available slots at or near the data. The JobTracker submits the work to the chosen TaskTracker nodes. The TaskTracker nodes are monitored; if they do not send heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker. A TaskTracker notifies the JobTracker when a task fails. The JobTracker then decides what to do: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable. When the work is completed, the JobTracker updates its status. Client applications can poll the JobTracker for information. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted.
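The heartbeat check and the "at or near the data" placement rule can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Hadoop's scheduler: the `Tracker` class, the `pick_tracker` function and the 10-second `HEARTBEAT_TIMEOUT` are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

HEARTBEAT_TIMEOUT = 10.0  # seconds; hypothetical value chosen for this example


@dataclass
class Tracker:
    name: str
    rack: str
    free_slots: int
    last_heartbeat: float  # time of the most recent heartbeat, in seconds


def is_alive(tracker: Tracker, now: float) -> bool:
    """A tracker counts as failed once its heartbeat is older than the timeout."""
    return now - tracker.last_heartbeat <= HEARTBEAT_TIMEOUT


def pick_tracker(trackers: List[Tracker], data_node: str, data_rack: str,
                 now: float) -> Optional[Tracker]:
    """Prefer the node holding the data, then the same rack, then any other node."""
    candidates = [t for t in trackers if is_alive(t, now) and t.free_slots > 0]
    if not candidates:
        return None

    def distance(t: Tracker) -> int:
        if t.name == data_node:
            return 0  # data is local to this node
        if t.rack == data_rack:
            return 1  # data is in the same rack
        return 2      # data is in another rack

    return min(candidates, key=distance)


trackers = [
    Tracker("node1", "rackA", free_slots=1, last_heartbeat=0.0),   # heartbeat too old
    Tracker("node2", "rackA", free_slots=2, last_heartbeat=95.0),
    Tracker("node3", "rackB", free_slots=1, last_heartbeat=99.0),
]
chosen = pick_tracker(trackers, data_node="node1", data_rack="rackA", now=100.0)
print(chosen.name)  # node2
```

In this run the data lives on node1, but node1 has missed its heartbeats, so the work goes to node2 in the same rack rather than to node3 in another rack.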
  • 21. MapReduce. MapReduce is a programming model, with an associated implementation, for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The MAP function processes a key/value pair to generate a set of intermediate key/value pairs. The REDUCE function merges all intermediate values associated with the same intermediate key.
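A word count is the usual way to illustrate the MAP and REDUCE functions. The sketch below runs in a single Python process and only mimics the model; it does not use Hadoop's API, and the names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(_key: int, line: str) -> Iterator[Tuple[str, int]]:
    """MAP: turn one (line number, line) pair into intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """REDUCE: merge all values that share the same intermediate key."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    intermediate = [pair for i, line in enumerate(lines) for pair in map_fn(i, line)]
    return dict(reduce_fn(key, values) for key, values in shuffle(intermediate).items())


print(word_count(["big data", "big cluster"]))  # {'big': 2, 'data': 1, 'cluster': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold each input split, and the grouping step moves data across the network; here everything happens in memory to keep the example short.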
  • 26. Pros of Hadoop  Computing power  Flexibility  Fault Tolerance  Low Cost  Scalability
  • 27. Cons of Hadoop. 1. Integration with existing systems: Hadoop is not optimised for ease of use. Installing it and integrating it with existing databases might prove difficult, especially since no software support is provided. 2. Administration and ease of use: Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This means significant training may be required to administer Hadoop clusters. 3. Security: Hadoop lacks the level of security functionality needed for safe enterprise deployment, especially where sensitive data is concerned.
  • 28. Conclusion: Hadoop has been a very effective solution for companies dealing with data in petabytes. It has solved many industry problems related to large-scale data management and distributed systems. Because it is open source, it has been widely adopted by companies.