Hadoop
Big Data
• Very large volumes of data
• The challenges include capture, storage, search, transfer, analysis and visualization.
• Systems and enterprises generate huge amounts of data, from terabytes to petabytes of information.
Characteristics of Big Data
The 3Vs are:
• Volume
• Variety
• Velocity
What is Hadoop?
• Apache Hadoop is a framework that allows for distributed processing of large datasets across clusters of commodity computers using a simple programming model.
• It is open-source data management software.
Hadoop System Principles
• Scale out rather than scale up
• Bring code to data rather than data to code
• Deal with failures – they are common
• Abstract away the complexity of distributed and concurrent applications
HDFS
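An HDFS cluster is managed by three types of processes:
• NameNode (holds the filesystem metadata and tracks which DataNodes store each block)
• DataNode (stores the actual data blocks)
• Secondary NameNode (periodically checkpoints the NameNode's metadata; it is not a hot standby)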
Files and Blocks
• Files are split into blocks (the single unit of storage).
• Blocks are replicated across machines at load time.
• The default replication factor is 3.
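To make the block and replication idea concrete, here is a minimal local sketch (no Hadoop involved) of how a file could be cut into fixed-size blocks and how each block could be assigned to three distinct DataNodes. It assumes a 128 MB block size (the HDFS default since Hadoop 2.x, configurable via dfs.blocksize). The function names and the round-robin placement are hypothetical simplifications: real HDFS uses a rack-aware placement policy.

```python
import math

# Assumption: 128 MB blocks (HDFS default since Hadoop 2.x) and the replication factor of 3 from the slide.
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller than block_size."""
    if file_size == 0:
        return []
    n_blocks = math.ceil(file_size / block_size)
    sizes = [block_size] * (n_blocks - 1)
    sizes.append(file_size - block_size * (n_blocks - 1))
    return sizes


def place_replicas(n_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Hypothetical helper: put each block on `replication` distinct DataNodes, round-robin.

    Simplification only; HDFS itself places replicas with a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(n_blocks):
        placement[block_id] = [datanodes[(block_id + i) % len(datanodes)]
                               for i in range(replication)]
    return placement
```

For example, a 300 MB file yields three blocks of 128 MB, 128 MB and 44 MB; with four DataNodes dn1 to dn4, block 0 lands on dn1, dn2, dn3 and block 2 on dn3, dn4, dn1, so losing any single node still leaves two copies of every block.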
Hadoop - MapReduce
• Model for processing large amounts of data in parallel.
• Derived from functional programming.
• Can be implemented in multiple languages.
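The "derived from functional programming" point refers to the higher-order functions map (apply a function to every element independently) and reduce, also called fold (combine the results into one value). A minimal single-machine illustration using Python's built-in map and functools.reduce; Hadoop's native MapReduce API is Java, and Python is used here only to show the idea.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# map: each element is processed independently, so this step could run in parallel.
squares = list(map(lambda x: x * x, numbers))

# reduce (fold): combine the mapped values into a single result, starting from 0.
total = reduce(lambda acc, x: acc + x, squares, 0)

print(squares)  # [1, 4, 9, 16]
print(total)    # 30
```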
MapReduce Model
• Imposes key-value input/output
• Defines map and reduce functions
map : (k1,v1) -> list (k2,v2)
reduce : (k2,list(v2)) -> list (k3,v3)
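The two signatures above can be traced with the classic word-count example. The sketch below is a local, in-memory simulation under stated assumptions: map_fn, reduce_fn and run_mapreduce are hypothetical names, k1 is taken to be a line offset and v1 the line text, and the shuffle step (grouping values by key and sorting keys, which the Hadoop framework performs between the two phases) is written out by hand.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(k1: int, v1: str) -> Iterator[tuple[str, int]]:
    """map: (k1, v1) -> list(k2, v2). Emits (word, 1) for every word in the line."""
    for word in v1.split():
        yield (word.lower(), 1)


def reduce_fn(k2: str, v2s: list[int]) -> Iterator[tuple[str, int]]:
    """reduce: (k2, list(v2)) -> list(k3, v3). Sums the counts for one word."""
    yield (k2, sum(v2s))


def run_mapreduce(records: Iterable[tuple[int, str]]) -> dict[str, int]:
    # Map phase: apply map_fn to every input record.
    intermediate = [pair for k1, v1 in records for pair in map_fn(k1, v1)]

    # Shuffle phase: group all intermediate values that share a key.
    groups: dict[str, list[int]] = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)

    # Reduce phase: one reduce call per distinct key, visited in sorted key order.
    output = {}
    for k2 in sorted(groups):
        for k3, v3 in reduce_fn(k2, groups[k2]):
            output[k3] = v3
    return output


print(run_mapreduce([(0, "the quick fox"), (14, "The lazy dog")]))
```

Because map_fn looks at one record at a time and reduce_fn at one key at a time, the framework is free to spread both phases across many machines; only the shuffle needs to move data between them.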
MapReduce Framework
• Takes care of distributed processing and coordination
• Scheduling
• Task localization with data (running tasks where the data lives)
• Error Handling
• Data Synchronization
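Two of the framework duties listed above, task localization and error handling, can be illustrated with a toy scheduler. This is a hypothetical sketch, not Hadoop's scheduler: schedule_task, the run_on callback and the max_attempts limit are all assumptions. It tries the nodes that hold a replica of the task's block first (so no data has to cross the network) and, if an attempt fails, retries the task on the next candidate node.

```python
from typing import Callable


def schedule_task(block_id: int,
                  replica_nodes: list[str],
                  all_nodes: list[str],
                  run_on: Callable[[str, int], bool],
                  max_attempts: int = 4) -> tuple[str, int]:
    """Run the task for block_id, preferring data-local nodes and retrying on failure.

    run_on(node, block_id) is a hypothetical callback that returns True on success.
    Returns (node that succeeded, number of attempts used).
    Raises RuntimeError if every allowed attempt fails.
    """
    # Data-local nodes first; any other node would have to read the block remotely.
    candidates = replica_nodes + [n for n in all_nodes if n not in replica_nodes]
    attempts = 0
    for node in candidates:
        if attempts >= max_attempts:
            break
        attempts += 1
        if run_on(node, block_id):
            return node, attempts
    raise RuntimeError(f"task for block {block_id} failed after {attempts} attempts")
```

For instance, if the task's block lives on dn1, dn2 and dn3 and dn1 crashes, the call returns ("dn2", 2): the task is re-run on another node that already holds a replica, which is the "deal with failures" principle and the "bring code to data" principle working together.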
YARN Daemons
- NodeManager
• Manages the resources of a single node
• There is one instance per node in the cluster
- ResourceManager
• Manages resources for the whole cluster
• Instructs NodeManagers to allocate resources
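The division of labour between the two daemons can be sketched as follows. This is a hypothetical, memory-only model: the classes, the container naming and the "most free memory wins" policy are assumptions for illustration; real YARN delegates the choice to a pluggable scheduler such as the CapacityScheduler or FairScheduler and tracks CPU as well as memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeManager:
    """Tracks the resources of a single node (one instance per node)."""
    name: str
    total_mb: int
    used_mb: int = 0
    containers: list[str] = field(default_factory=list)

    def free_mb(self) -> int:
        return self.total_mb - self.used_mb

    def launch_container(self, container_id: str, mb: int) -> None:
        # Invoked when the ResourceManager instructs this node to allocate resources.
        if mb > self.free_mb():
            raise ValueError(f"{self.name} lacks {mb} MB of free memory")
        self.used_mb += mb
        self.containers.append(container_id)


class ResourceManager:
    """Cluster-wide view of resources; decides which NodeManager gets each request."""

    def __init__(self, nodes: list[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, mb: int) -> Optional[tuple[str, str]]:
        # Simplified policy: pick the node with the most free memory that fits the request.
        candidates = [n for n in self.nodes if n.free_mb() >= mb]
        if not candidates:
            return None  # request must wait until resources are released
        node = max(candidates, key=lambda n: n.free_mb())
        container_id = f"container_{self._next_id}"
        self._next_id += 1
        node.launch_container(container_id, mb)
        return node.name, container_id
```

With one 8192 MB node and one 4096 MB node, the first two 2048 MB and 4096 MB requests both go to the larger node, the next 4096 MB request goes to the smaller one, and a further 4096 MB request returns None because no single node has that much memory left.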