SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
Apache Hadoop is a Java software framework that allows for the distributed processing
of large data sets across clusters of computers spread across the world using a simple
programming model.
•  Distributed, scalable and
reliable
•  Fault‐tolerant storage
system
Hadoop Distributed
File System
•  High-performance parallel
data processing
•  Employs the divide-conquer
principle
Map-Reduce
Programming Model
A class teacher of class 5 needs to find out the name of the student with highest marks
for each subject.
Total students : 50
Total subjects : 5
Our Goal
To minimize the Total time spent
Time to process each
subject per student
: 1min
Total time spent : 250mins
Subject 1 : S1-98
Subject 2 : S13-95
Subject 3 : S1-97
Subject 4 : S23-100
Subject 5 : S8-99
Input
Output
HDFS: Distribute the
data into blocks across
multiple nodes
Distribute papers across 5 peons – Each
peon will have papers of 10 students for
each subject (50 papers each)
a)
Map Phase: Apply
business logic on
distributed data in parallel
Each peon will provide list of subjects
with student name and highest marks
from his data from a list of 10 students.
Total time spent: 50mins (in parallel)
b)
Reduce Phase: Iterate
over the map phase
output and get final result
Total records left: 5 students for 5
subjects only. Time to get subject list for
student name with highest marks: 25mins
c)
Total time spent: 50 + 25 = 75mins
Social Media Data
Analyzing Web Clickstream Data
Server Log Data
Machine and Sensor Data
HDFS Layer : --
Stores files across storage nodes
in a Hadoop cluster
Consists of :
•  Namenode & Datanodes
Map-Reduce Engine : --
Processes vast amounts of data in-
parallel on large clusters in a
reliable & fault-tolerant manner
Consists of :
•  Job Tracker & Task Trackers
Namenode
Datanode_1 Datanode_2 Datanode_3
HDFS
Block 1
HDFS
Block 2
HDFS
Block 3 Block 4
Storage & Replication of Blocks in HDFS
Filedividedintoblocks
Block 1
Block 2
Block 3
Block 4
HDFS Client
File write
request
Job
Tracker
Task Tracker 1 Task Tracker _2 Task Tracker _3
HDFS
Block 1
HDFS
Block 2
HDFS
Block 3 Block 4
Map-Reduce
job from
client
Executes individual
Map-Reduce tasks
assigned by Job
Tracker
Task Trackers retrieve data from HDFS which is stored on the
Data-node i.e. the same system where Task Tracker is running.
Task
Tracker
Data
Node
Slave
m/c
NameNode
Ø  Maps a block to the Datanodes
Ø  Controls read/write access to files
Ø  Manages Replication Engine for Blocks
DataNode
Ø  Responsible for serving read and write
requests (block creation, deletion, and
replication)
JobTracker
Ø  Accepts Map-Reduce tasks from the clients
Ø  Assigns tasks to the Task Trackers &
monitors their status
TaskTracker
Ø  Worker daemon, runs Map-Reduce tasks
Ø  Sends heart-beat to Job Tracker
Ø  Retrieves Job resources from HDFS
NameNode DataNode
JobTracker TaskTracker
Hadoop
Daemons
Hadoop
Services
HDFS MapReduce YARN
YARN stands for “Yet
Another Resource
Negotiator”, a framework
to provide generic
resource management
solution to Hadoop
clusters.
Allows easy integration of
multiple data processing
algorithms to the data stored in
HDFS
Query Language Pig Scripting
Coordination Service
Columnar Database
Log Management
Data Exchange
Designing Workflow
Machine Learning
Messaging System
a)  Apache Website
à http://hadoop.apache.org/
b)  Learning YARN
à https://www.packtpub.com/big-data-and-business-intelligence/learning-yarn
c)  Hadoop: The definitive guide
àhttp://shop.oreilly.com/product/0636920033448.do
Hadoop Introduction

Más contenido relacionado

La actualidad más candente

Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHanborq Inc.
 
Hadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesHadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesNitin Khattar
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Designsudhakara st
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsDataWorks Summit
 
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsKonstantin V. Shvachko
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemAnand Kulkarni
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksCloudera, Inc.
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
HDFS introduction
HDFS introductionHDFS introduction
HDFS introductioninjae yeo
 

La actualidad más candente (20)

Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
Hadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesHadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenes
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once Semantics
 
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File SystemFredrick Ishengoma -  HDFS+- Erasure Coding Based Hadoop Distributed File System
Fredrick Ishengoma - HDFS+- Erasure Coding Based Hadoop Distributed File System
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
HDFS introduction
HDFS introductionHDFS introduction
HDFS introduction
 

Destacado

Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverOCTO Technology Suisse
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceEdureka!
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analyticsEdureka!
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produitOCTO Technology Suisse
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?OCTO Technology Suisse
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka Edureka!
 
Control Transactions using PowerCenter
Control Transactions using PowerCenterControl Transactions using PowerCenter
Control Transactions using PowerCenterEdureka!
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Edureka!
 
하둡 (Hadoop) 및 관련기술 훑어보기
하둡 (Hadoop) 및 관련기술 훑어보기하둡 (Hadoop) 및 관련기술 훑어보기
하둡 (Hadoop) 및 관련기술 훑어보기beom kyun choi
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Edureka!
 
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...Edureka!
 
DevOps : mission [im]possible ?
DevOps : mission [im]possible ?DevOps : mission [im]possible ?
DevOps : mission [im]possible ?rfelden
 
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern Tales
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern TalesPolar Expeditions and Agility: the 1910 Race to the South Pole and Modern Tales
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern TalesOCTO Technology Suisse
 
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...OCTO Technology Suisse
 

Destacado (18)

Les Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à releverLes Business Analysts face à l'agilité : de nouveaux challenges à relever
Les Business Analysts face à l'agilité : de nouveaux challenges à relever
 
Agile & Top Management
Agile & Top ManagementAgile & Top Management
Agile & Top Management
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Apache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduceApache Spark beyond Hadoop MapReduce
Apache Spark beyond Hadoop MapReduce
 
Spark for big data analytics
Spark for big data analyticsSpark for big data analytics
Spark for big data analytics
 
De la pensée projet à la pensée produit
De la pensée projet à la pensée produitDe la pensée projet à la pensée produit
De la pensée projet à la pensée produit
 
Cloud : en 2017, sortez du stratus !
Cloud : en 2017, sortez du stratus !Cloud : en 2017, sortez du stratus !
Cloud : en 2017, sortez du stratus !
 
Démystifions l'API-culture!
Démystifions l'API-culture!Démystifions l'API-culture!
Démystifions l'API-culture!
 
Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?Afterwork Blockchain : la prochaine technologie disruptive ?
Afterwork Blockchain : la prochaine technologie disruptive ?
 
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka
 
Control Transactions using PowerCenter
Control Transactions using PowerCenterControl Transactions using PowerCenter
Control Transactions using PowerCenter
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 
하둡 (Hadoop) 및 관련기술 훑어보기
하둡 (Hadoop) 및 관련기술 훑어보기하둡 (Hadoop) 및 관련기술 훑어보기
하둡 (Hadoop) 및 관련기술 훑어보기
 
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
Hadoop Interview Questions and Answers | Big Data Interview Questions | Hadoo...
 
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
What Is Salesforce CRM? | Salesforce CRM Tutorial For Beginners | Salesforce ...
 
DevOps : mission [im]possible ?
DevOps : mission [im]possible ?DevOps : mission [im]possible ?
DevOps : mission [im]possible ?
 
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern Tales
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern TalesPolar Expeditions and Agility: the 1910 Race to the South Pole and Modern Tales
Polar Expeditions and Agility: the 1910 Race to the South Pole and Modern Tales
 
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
Afterwork Big Data - Data Science & Machine Learning : explorer, comprendre e...
 

Similar a Hadoop Introduction

Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaData Con LA
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemMilad Sobhkhiz
 
Big Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptxBig Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptxssuser8c3ea7
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorialvinayiqbusiness
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptxSwarnaSLcse
 

Similar a Hadoop Introduction (20)

HADOOP.pptx
HADOOP.pptxHADOOP.pptx
HADOOP.pptx
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop overview.pdf
Hadoop overview.pdfHadoop overview.pdf
Hadoop overview.pdf
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Big Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptxBig Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptx
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hdfs
HdfsHdfs
Hdfs
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data Analytics presentation.pptx
Data Analytics presentation.pptxData Analytics presentation.pptx
Data Analytics presentation.pptx
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 

Último

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 

Último (20)

A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 

Hadoop Introduction

  • 1.
  • 2. Apache Hadoop is a Java software framework that allows for the distributed processing of large data sets across clusters of computers spread across the world using a simple programming model.
  • 3.
  • 4. •  Distributed, scalable and reliable •  Fault‐tolerant storage system Hadoop Distributed File System •  High-performance parallel data processing •  Employs the divide-conquer principle Map-Reduce Programming Model
  • 5. A class teacher of class 5 needs to find out the name of the student with highest marks for each subject. Total students : 50 Total subjects : 5 Our Goal To minimize the Total time spent Time to process each subject per student : 1min Total time spent : 250mins Subject 1 : S1-98 Subject 2 : S13-95 Subject 3 : S1-97 Subject 4 : S23-100 Subject 5 : S8-99 Input Output
  • 6. HDFS: Distribute the data into blocks across multiple nodes Distribute papers across 5 peons – Each peon will have papers of 10 students for each subject (50 papers each) a) Map Phase: Apply business logic on distributed data in parallel Each peon will provide list of subjects with student name and highest marks from his data from a list of 10 students. Total time spent: 50mins (in parallel) b) Reduce Phase: Iterate over the map phase output and get final result Total records left: 5 students for 5 subjects only. Time to get subject list for student name with highest marks: 25mins c) Total time spent: 50 + 25 = 75mins
  • 7. Social Media Data Analyzing Web Clickstream Data Server Log Data Machine and Sensor Data
  • 8. HDFS Layer : -- Stores files across storage nodes in a Hadoop cluster Consists of : •  Namenode & Datanodes Map-Reduce Engine : -- Processes vast amounts of data in- parallel on large clusters in a reliable & fault-tolerant manner Consists of : •  Job Tracker & Task Trackers
  • 9. Namenode Datanode_1 Datanode_2 Datanode_3 HDFS Block 1 HDFS Block 2 HDFS Block 3 Block 4 Storage & Replication of Blocks in HDFS Filedividedintoblocks Block 1 Block 2 Block 3 Block 4 HDFS Client File write request
  • 10. Job Tracker Task Tracker 1 Task Tracker _2 Task Tracker _3 HDFS Block 1 HDFS Block 2 HDFS Block 3 Block 4 Map-Reduce job from client Executes individual Map-Reduce tasks assigned by Job Tracker Task Trackers retrieve data from HDFS which is stored on the Data-node i.e. the same system where Task Tracker is running. Task Tracker Data Node Slave m/c
  • 11. NameNode Ø  Maps a block to the Datanodes Ø  Controls read/write access to files Ø  Manages Replication Engine for Blocks DataNode Ø  Responsible for serving read and write requests (block creation, deletion, and replication) JobTracker Ø  Accepts Map-Reduce tasks from the clients Ø  Assigns tasks to the Task Trackers & monitors their status TaskTracker Ø  Worker daemon, runs Map-Reduce tasks Ø  Sends heart-beat to Job Tracker Ø  Retrieves Job resources from HDFS NameNode DataNode JobTracker TaskTracker Hadoop Daemons
  • 12.
  • 13. Hadoop Services HDFS MapReduce YARN YARN stands for “Yet Another Resource Negotiator”, a framework to provide generic resource management solution to Hadoop clusters.
  • 14.
  • 15. Allows easy integration of multiple data processing algorithms to the data stored in HDFS
  • 16.
  • 17. Query Language Pig Scripting Coordination Service Columnar Database Log Management Data Exchange Designing Workflow Machine Learning Messaging System
  • 18. a)  Apache Website à http://hadoop.apache.org/ b)  Learning YARN à https://www.packtpub.com/big-data-and-business-intelligence/learning-yarn c)  Hadoop: The definitive guide àhttp://shop.oreilly.com/product/0636920033448.do