SlideShare una empresa de Scribd logo
1 de 13
MapReduce
Introduction to MapReduce
• MapReduce is a programming model and an
associated implementation for processing
and generating large data sets with a
parallel, distributed algorithm on a cluster.
• Map() procedure that performs filtering and
sorting.
• Reduce() procedure that performs a
summary operation (such as statistical
operations)
What is MapReduce in Hadoop?
• Hadoop is a free, Java-based programming framework that supports
the processing of large data sets in a distributed computing
environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
• MapReduce is defined as the framework of Hadoop, which is used
to process a huge amount of data parallelly on large clusters of
commodity hardware in a reliable manner. It allows the application
to store the data in the distributed form and process large dataset
across groups of computers using simple programming models
Why use MapReduce?
MapReduce Architecture
How MapReduce in Hadoop works?
Steps of MapReduce in Hadoop
• Map stage − The map or mapper’s job is to process the input
data. Generally, the input data is in the form of file or directory
and is stored in the Hadoop file system (HDFS). The input file is
passed to the mapper function line by line. The mapper
processes the data and creates several small chunks of data.
• Reduce stage − This stage is the combination of
the Shuffle stage and the Reduce stage. The Reducer’s job is to
process the data that comes from the mapper. After processing,
it produces a new set of output, which will be stored in the
HDFS.
Algorithm Behind MapReduce
MapReduce Jobs
Application Master
• Responsible for the
execution of a single
application or MapReduce
job
• Divides job requests into
tasks and assigns them to
Node-Managers running on
the slave node
Node-Manager
• Has many dynamic
resource containers
which executes each
active map or reduce
task
• Communicates regularly
with the Application
Master
MapReduce jobs workflow
Advantage of MapReduce
• Fault tolerance: It can handle failures without downtime.
Speed: It splits, shuffles, and reduces the unstructured data in a short
time.
• Cost-effective: Hadoop MapReduce has a scale-out feature that enables
users to process or store the data in a cost-effective manner.
• Scalability: It provides a highly scalable framework. MapReduce allows
users to run applications from many nodes.
• Parallel processing: As Hadoop storage data in the distributed file system
and the MapReduce program’s working, it divides tasks task map and
reduce and that could execute in parallel. And again, because of the
parallel execution, it reduces the entire run time.
Limitations Of MapReduce
• MapReduce cannot cache the intermediate data in memory
for a further requirement which diminishes the performance
of Hadoop.
• It is only suitable for Batch Processing of a Huge amounts
of Data, and it is not flexible, means the MapReduce framework
is rigid.
• A lot of manual coding is required, even for common operations
such as join, filter, projection, aggregates, sorting, distinct...
• Semantics are hidden inside the map and reduce functions, so it
is difficult to maintain, extend and optimize them.
Mapreduce Hadop.pptx

Más contenido relacionado

Similar a Mapreduce Hadop.pptx

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
Nisanth Simon
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 

Similar a Mapreduce Hadop.pptx (20)

Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Anju
AnjuAnju
Anju
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Hadoop
HadoopHadoop
Hadoop
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
Apache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce OverviewApache hadoop, hdfs and map reduce Overview
Apache hadoop, hdfs and map reduce Overview
 
Analytics 3
Analytics 3Analytics 3
Analytics 3
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Hadoop map reduce
Hadoop map reduceHadoop map reduce
Hadoop map reduce
 
Getting started big data
Getting started big dataGetting started big data
Getting started big data
 
Learn what is Hadoop-and-BigData
Learn  what is Hadoop-and-BigDataLearn  what is Hadoop-and-BigData
Learn what is Hadoop-and-BigData
 
Hadoop live online training
Hadoop live online trainingHadoop live online training
Hadoop live online training
 
Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Hadoop 80hr v1.0
Hadoop 80hr v1.0Hadoop 80hr v1.0
Hadoop 80hr v1.0
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
MOD-2 presentation on engineering students
MOD-2 presentation on engineering studentsMOD-2 presentation on engineering students
MOD-2 presentation on engineering students
 

Último

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Último (20)

Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Your enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4jYour enemies use GenAI too - staying ahead of fraud with Neo4j
Your enemies use GenAI too - staying ahead of fraud with Neo4j
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 

Mapreduce Hadop.pptx

  • 2. Introduction to MapReduce • MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. • Map() procedure that performs filtering and sorting. • Reduce() procedure that performs a summary operation (such as statistical operations)
  • 3. What is MapReduce in Hadoop? • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation. • MapReduce is defined as the framework of Hadoop, which is used to process a huge amount of data parallelly on large clusters of commodity hardware in a reliable manner. It allows the application to store the data in the distributed form and process large dataset across groups of computers using simple programming models
  • 6. How MapReduce in Hadoop works?
  • 7. Steps of MapReduce in Hadoop • Map stage − The map or mapper’s job is to process the input data. Generally, the input data is in the form of file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data. • Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
  • 9. MapReduce Jobs Application Master • Responsible for the execution of a single application or MapReduce job • Divides job requests into tasks and assigns them to Node-Managers running on the slave node Node-Manager • Has many dynamic resource containers which executes each active map or reduce task • Communicates regularly with the Application Master
  • 11. Advantage of MapReduce • Fault tolerance: It can handle failures without downtime. Speed: It splits, shuffles, and reduces the unstructured data in a short time. • Cost-effective: Hadoop MapReduce has a scale-out feature that enables users to process or store the data in a cost-effective manner. • Scalability: It provides a highly scalable framework. MapReduce allows users to run applications from many nodes. • Parallel processing: As Hadoop storage data in the distributed file system and the MapReduce program’s working, it divides tasks task map and reduce and that could execute in parallel. And again, because of the parallel execution, it reduces the entire run time.
  • 12. Limitations Of MapReduce • MapReduce cannot cache the intermediate data in memory for a further requirement which diminishes the performance of Hadoop. • It is only suitable for Batch Processing of a Huge amounts of Data, and it is not flexible, means the MapReduce framework is rigid. • A lot of manual coding is required, even for common operations such as join, filter, projection, aggregates, sorting, distinct... • Semantics are hidden inside the map and reduce functions, so it is difficult to maintain, extend and optimize them.