Hadoop – Big Deal !!
Author : Abhishek Kumar
+1-323-806-5474
Contents
• What is Hadoop
• Hadoop Components
• Why Hadoop
• HDFS
• HDFS Advantages
• When Not to use Hadoop
• HDFS Components
• DFS and HDFS
• Hadoop & Big data – Relatives !
What is Hadoop
• Conventional Definition
• Framework for distributed processing of large datasets (usually unstructured data)
across clusters of commodity hardware.
• Well, I have always been bad with these bookish definitions and never understood the heavy
terms used in them. So, here are some explanations:
• Distributed Processing : Spreading a heavy task across several workers/resources
to reduce the time taken to complete it.
• Large Datasets (Unstructured data) : Data that does not have any defined
structure, format or size.
• Commodity Hardware : Inexpensive, easily available hardware, usually with modest
performance. These machines can fail at any time.
So, as of now we can say that Hadoop is nothing but a system that stores huge volumes of unstructured data in a way that lets the data
be read faster.
• Fun Fact: HDFS borrows many LINUX/UNIX conventions, such as the directory structure and
file permissions. Most details are easily available on the Apache Hadoop web site.
Hadoop Components
Level 0 - Hadoop
• HDFS : Hadoop Distributed File System
• MapReduce : Simple Programming Model
HDFS : HDFS is just a file system that stores the data, the Hadoop way.
MapReduce : Though written as one word, Map and Reduce are 2 separate
steps. Map runs on the nodes where the data is spread across the distributed
environment and turns each piece of input into intermediate key/value pairs.
Reduce combines those pairs, cutting down the volume of data that has to be
sent, received or processed further.
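To make the two steps concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It is not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the grouping step that the framework does between Map and Reduce is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key (done by the framework in real Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big deal", "hadoop handles big data"]))
```

On a real cluster the map calls would run in parallel on the nodes that hold the data, and only the much smaller (word, count) pairs would travel over the network.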
Why Hadoop !
• So if Hadoop is just another storage system, then why so much hype ?
• Yes, Hadoop is yet another distributed file processing system, but I see something
that makes it different, or in fact special: “Faster I/O processing using
commodity hardware”.
• We all know that this generation has no issue with storage
size. We have TBs of hard drives available at home too.
The only remaining problem is accessing a huge volume of
unstructured data using the low performance I/O devices we
have. This is where Hadoop comes to the rescue. How !! .. We
will find out in the following slides.
• Fun Fact: Hadoop is not a single piece of software that you download and install
like a desktop application. It is a set of tools organized to serve a specific
purpose.
HDFS
Conventional Definition : HDFS is a file system designed for storing very large files with streaming
data access patterns, running on clusters of commodity hardware.
As the name says, it is a Distributed File System following some specific protocols/standards or
techniques, which we will call the Hadoop way :)
[Diagram: the MapReduce engine layered on the HDFS cluster. The master node runs the Job Tracker (MapReduce) and the Name Node (HDFS). Each of the four worker nodes runs a Task Tracker (MapReduce) and a Data Node (HDFS).]
HDFS Advantages
• Fault Tolerance
• Since “using commodity hardware” is an important highlight of Hadoop's
definition, we can be certain that nodes will fail.
But Hadoop handles these failing nodes very effectively and
ensures that we do not lose any data. How? Replication: HDFS keeps
several copies of every block (three by default) on different data nodes.
• Handles large Datasets
• No wonder companies like Facebook and Yahoo prefer it, and its design follows
the GFS and MapReduce papers published by Google. So it is a proven system for handling large data sets.
• Streaming access to File system data
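• Data is written once and then read many times from start to end at high throughput, much like streaming a “youtube” video rather than jumping around in a file.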
• High Performance
• In the ideal case, processing data with Hadoop is up to
“n” times faster, where n is the “number of data nodes”. Real jobs gain somewhat less because of coordination and network overhead.
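The fault tolerance point can be sketched in a few lines of Python. This is only a toy model under loud assumptions: replicas are placed round-robin (real HDFS placement is rack-aware), and the names place_replicas and readable_blocks are invented for this example. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # HDFS default replication factor


def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Assign every block to `replication` distinct nodes, round-robin.

    Assumption: real HDFS uses a rack-aware policy; this is a simplification.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for start, block in enumerate(block_ids):
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    ]


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    placement = place_replicas(["b0", "b1", "b2", "b3"], nodes)
    # Two of four nodes fail, yet every block is still readable.
    print(readable_blocks(placement, failed_nodes={"dn1", "dn2"}))
```

With three copies of each block, data is lost only when all three holders of the same block fail before the system has re-created the missing copies.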
When Not to Use Hadoop/HDFS
• Many small files, such as those used in transactions
• Low latency data access
• When many people (multiple writers) modify the data/files
arbitrarily.
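Why are many small files a problem? The Name Node keeps an entry in memory for every file and every block. A rough back-of-the-envelope estimate is sketched below, assuming the commonly quoted rule of thumb of about 150 bytes per entry and a 128 MB block size. Both numbers are assumptions for illustration, not measurements, and the function name namenode_memory is made up.

```python
BYTES_PER_OBJECT = 150        # rule-of-thumb estimate per file or block entry (assumption)
BLOCK_SIZE = 128 * 1024 ** 2  # 128 MB block size (assumption)


def namenode_memory(num_files, file_size):
    """Estimate Name Node memory in bytes: one entry per file plus one per block."""
    blocks_per_file = -(-file_size // BLOCK_SIZE)  # integer ceiling division
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


if __name__ == "__main__":
    # The same 10,000 GB of data stored two different ways.
    small = namenode_memory(10_240_000, 1024 ** 2)  # 1 MB files
    large = namenode_memory(10_000, 1024 ** 3)      # 1 GB files
    print(f"1 MB files: {small / 1e9:.2f} GB of Name Node memory")
    print(f"1 GB files: {large / 1e6:.2f} MB of Name Node memory")
```

Under these assumptions the small-file layout needs a couple of hundred times more Name Node memory for the same volume of data.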
HDFS Components
Name Node (typically runs alongside the Job Tracker)
Data Nodes (typically run alongside the Task Trackers)
Name Node : This component of HDFS generally runs on a high performance machine.
In layman terms, it is a kind of “Index” for the data spread across several data
nodes. We can also call it the metadata storage process.
Data Node : This is responsible for storing the actual data. It runs as a daemon on each local
machine.
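The “Index” idea can be shown with a toy model in Python. The classes and helpers below (NameNode, DataNode, write_file, read_file) are hypothetical and only mimic the division of labour: the Name Node stores where each block lives, the Data Nodes store the bytes. The block size is shrunk to 128 bytes so the example stays small.

```python
BLOCK_SIZE = 128  # bytes, tiny on purpose; a real cluster uses far larger blocks


class NameNode:
    """Toy metadata index: knows where blocks live, stores no file data."""

    def __init__(self):
        self.index = {}  # path -> list of (block_id, data_node_name)


class DataNode:
    """Toy storage node: holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


def write_file(name_node, data_nodes, path, data, block_size=BLOCK_SIZE):
    """Split data into blocks, spread them over the nodes, record the locations."""
    entries = []
    for i, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#blk{i}"
        node = data_nodes[i % len(data_nodes)]
        node.blocks[block_id] = data[offset:offset + block_size]
        entries.append((block_id, node.name))
    name_node.index[path] = entries


def read_file(name_node, data_nodes, path):
    """Ask the index where the blocks are, then fetch them from the data nodes."""
    by_name = {node.name: node for node in data_nodes}
    return b"".join(
        by_name[node_name].blocks[block_id]
        for block_id, node_name in name_node.index[path]
    )
```

Notice that read_file never gets file contents from the Name Node. It only looks up block locations there and then reads from the Data Nodes.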
Fun Fact: A daemon is a resident program that runs in the background on your machine as a
process. Daemon is the terminology used in UNIX. In DOS the closest equivalent is a TSR (terminate-and-stay-resident program).
DFS and HDFS
• So, what is the difference between a regular Distributed File
System and Hadoop !!
• Hadoop processes the data on the local nodes and transmits just the
output to the client, while in a regular DFS the data is brought to a master node
from the various nodes for processing. So, quite obviously, Hadoop has to
transfer a smaller amount of data (just the output) over the network, while a
regular DFS has to transfer a huge volume of data. This
makes Hadoop the winner for faster processing !!
• This type of processing of data on the data nodes is called data
locality, which is one of the important super powers of
Hadoop ..
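The difference in network traffic can be sketched with a small Python comparison. Everything here is illustrative: the two functions, the sample log records and the idea of counting transferred characters are assumptions made for the example, not how Hadoop measures anything.

```python
def ship_data_to_master(node_data):
    """Regular-DFS style as described above: every record crosses the network."""
    transferred = sum(len(record) for records in node_data.values() for record in records)
    all_records = [record for records in node_data.values() for record in records]
    errors = sum(1 for record in all_records if "ERROR" in record)
    return errors, transferred


def process_on_nodes(node_data):
    """Hadoop style: each node counts locally and sends back one small number."""
    partials = {
        node: sum(1 for record in records if "ERROR" in record)
        for node, records in node_data.items()
    }
    transferred = sum(len(str(count)) for count in partials.values())
    return sum(partials.values()), transferred


if __name__ == "__main__":
    node_data = {
        "dn1": ["ERROR disk", "INFO ok"],
        "dn2": ["INFO ok", "ERROR net", "ERROR cpu"],
    }
    print(ship_data_to_master(node_data))  # (error count, characters moved)
    print(process_on_nodes(node_data))     # same count, far fewer characters moved
```

Both approaches give the same answer. The second one moves only the partial results, and the gap grows with the size of the data.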
Hadoop & Big Data – Relatives !
The relation is not very complex. It's just like a simple husband-wife relation,
where Hadoop comes in just to resolve the issues with Big data.
In other words, Big data provides the challenges for Hadoop to resolve.
Thanks !
I will probably provide more details in the
next presentation :)
