SlideShare una empresa de Scribd logo
MapReduce
Programming Model
Adarsha Dhakal
Jaydeep Shah
Prakash Upadhyaya
Ritu Ratnam 1
MapReduce Programming Model
2
Introduction
• MapReduce is a programming model introduced by Google for processing
and generating large data sets on clusters of computers.
• Google first formulated the framework for the purpose of serving Google’s
Web page indexing, and the new framework replaced earlier indexing
algorithms.
• Beginner developers find the MapReduce framework beneficial because
library routines can be used to create parallel programs without any worries
about infra-cluster communication, task monitoring or failure handling
processes.
• MapReduce runs on a large cluster of commodity machines and is highly
scalable.
• It has several forms of implementation provided by multiple programming
languages, like Java, C# and C++.
• MapReduce is a general-purpose programming model for data-
intensive computing.
• It was introduced by Google in 2004 to construct its web index.
• It is also used at Yahoo, Facebook etc. It uses a parallel computing
model that distributes computational tasks to large number of
nodes(approximately 1000-10000 nodes.)
• It is fault-tolerable. It can work even when 1600 nodes among 1800
nodes fails.
• Hadoop framework from Apache Software Foundation is an
implementation of MapReduce Programming Model
Phases for MapReduce
1. Input Splits
2. Mapping
3. Shuffling
4. Sorting
5. Reducing
Steps for MapReduce
• Step 1: Transform raw data into key/value pairs in parallel.
• The mapper will get the data file and make the Rating the key and
the values will be the reviews. We will add number 1 for reviews.
• Step 2: Shuffle and sort by the MapReduce model.
• The process of transferring mappers’ intermediate output to the
reducer is known as shuffling. It will collect all the reviews(number
1s) together with the individual key and it will sort them. it will get
sorted by key.
• Step3: Process the data using Reduce.
• Reduce will count each value(number 1) for each key.
• Although, the map and reduce functions in MapReduce model is not
exactly same as in functional programming.
• Map and Reduce functions in MapReduce model:
• Map: It process a (key, value) pair and returns a list of
(intermediate key, value) pairs
map(k1, v1)→list(k2, v2)
• Reduce: It merges all intermediate values having the same
intermediate key
reduce(k2, list(v2))→list(v3)
Basic Concept
• In MapReduce model, user has to write only two functions map and
reduce.
• Few examples that can be easily expressed as MapReduce
computations:
• Distributed Grep ( is an efficient way to utilize a Hadoop cluster to
find log messages hidden within terabytes of log data)
• Count of URL Access Frequency
• Inverted Index
• Mining
Advantages
• MapReduce facilitates automatic parallelization and distribution,
reducing the time required to run the programs
• MapReduce provides fault tolerance by re-executing, writing map
output to a distributed file system, and restarting failed map or reducer
task
• MapReduce is a cost-effective solution for processing of data
• MapReduce processes large volume of unprocessed data very quickly
• MapReduce utilizes simple programming model to handle tasks more
efficiently and quickly and is easy to learn
• MapReduce is flexible and works with several Hadoop languages to
handle and store data
Limitations
• MapReduce is a low-level programming model which involves a lot of
writing code
• The batch-based processing nature of MapReduce makes it unsuitable for
real-time processing
• It does not support data pipelining or overlapping of Map and Reduce
functions
• Task initialization, coordination, monitoring, and scheduling take up a large
chunk of MapReduce's execution time and reduce its performance
• MapReduce cannot cache the intermediate data in memory, thereby
diminishing Hadoop’s performance
The data we have has 20491 rows and 2 columns, and
our task is to provide individual count of ratings.
MAPPING each rating with a shuffle and giving counter of 1.
Later sorting the ratings with the count.
REDUCING leads to giving lesser number of data.
Each rating has their total count from the data from Review of Hotel
Implementing MapReduce Programming
Model
• Hadoop, developed by Apache
• Spark, developed by AMPLab at UC Berkley
• Phoenix++, developed at Stanford University
• MARISSA (MApReduce Implementation for Streaming Science Application,
developed at SUNY Binghamton
• DRYAD and DRYADLINQ, developed by Microsoft
• MapReduce-MPI, Developed by Steve Plimpton (Sandia)
• Disco, developed by NOKIA
• Themis, developed by Rasmussen et al
• MR4C, developed by Skybox Imaging
Bibliography
• MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat Google, Inc.
• MapReduce Tutorial, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html
• Hadoop – MapReduce, https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm
• MapReduce-Implementation-in-Python, https://github.com/rshah204/MapReduce-Implementation-in-
Python/blob/master/MapReduce.ipynb
• Hotel Reviews, https://www.kaggle.com/datasets/yash10kundu/hotel-reviews?resource=download
• MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON, Zeba Khanam
and Shafali Agarwal, Department of Computer Application, JSSATE, Noida, IJCSIT Vol 7, No 4, August
2015

Más contenido relacionado

La actualidad más candente

Developing a Map Reduce Application
Developing a Map Reduce ApplicationDeveloping a Map Reduce Application
Developing a Map Reduce Application
Dr. C.V. Suresh Babu
 
Os structure
Os structureOs structure
Os structure
Mohd Arif
 
Principal source of optimization in compiler design
Principal source of optimization in compiler designPrincipal source of optimization in compiler design
Principal source of optimization in compiler design
Rajkumar R
 
Hardware and Software parallelism
Hardware and Software parallelismHardware and Software parallelism
Hardware and Software parallelism
prashantdahake
 
Common Standards in Cloud Computing
Common Standards in Cloud ComputingCommon Standards in Cloud Computing
Common Standards in Cloud Computing
mrzahidfaiz.blogspot.com
 
Migration into a Cloud
Migration into a CloudMigration into a Cloud
Migration into a Cloud
Divya S
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
Acad
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
JYoTHiSH o.s
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docx
kumari36
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
Dabbal Singh Mahara
 
Virtualization in cloud computing
Virtualization in cloud computingVirtualization in cloud computing
Virtualization in cloud computing
Mohammad Ilyas Malik
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
vampugani
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
kunj desai
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
huda2018
 
System models in distributed system
System models in distributed systemSystem models in distributed system
System models in distributed system
ishapadhy
 
Scheduling Definition, objectives and types
Scheduling Definition, objectives and types Scheduling Definition, objectives and types
Scheduling Definition, objectives and types
Maitree Patel
 
Implementation levels of virtualization
Implementation levels of virtualizationImplementation levels of virtualization
Implementation levels of virtualization
Gokulnath S
 
RPC: Remote procedure call
RPC: Remote procedure callRPC: Remote procedure call
RPC: Remote procedure call
Sunita Sahu
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Prashant Gupta
 
Ooad unit – 1 introduction
Ooad unit – 1 introductionOoad unit – 1 introduction
Ooad unit – 1 introduction
Babeetha Muruganantham
 

La actualidad más candente (20)

Developing a Map Reduce Application
Developing a Map Reduce ApplicationDeveloping a Map Reduce Application
Developing a Map Reduce Application
 
Os structure
Os structureOs structure
Os structure
 
Principal source of optimization in compiler design
Principal source of optimization in compiler designPrincipal source of optimization in compiler design
Principal source of optimization in compiler design
 
Hardware and Software parallelism
Hardware and Software parallelismHardware and Software parallelism
Hardware and Software parallelism
 
Common Standards in Cloud Computing
Common Standards in Cloud ComputingCommon Standards in Cloud Computing
Common Standards in Cloud Computing
 
Migration into a Cloud
Migration into a CloudMigration into a Cloud
Migration into a Cloud
 
Network Layer design Issues.pptx
Network Layer design Issues.pptxNetwork Layer design Issues.pptx
Network Layer design Issues.pptx
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docx
 
Temporal databases
Temporal databasesTemporal databases
Temporal databases
 
Virtualization in cloud computing
Virtualization in cloud computingVirtualization in cloud computing
Virtualization in cloud computing
 
Memory Management in OS
Memory Management in OSMemory Management in OS
Memory Management in OS
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
System models in distributed system
System models in distributed systemSystem models in distributed system
System models in distributed system
 
Scheduling Definition, objectives and types
Scheduling Definition, objectives and types Scheduling Definition, objectives and types
Scheduling Definition, objectives and types
 
Implementation levels of virtualization
Implementation levels of virtualizationImplementation levels of virtualization
Implementation levels of virtualization
 
RPC: Remote procedure call
RPC: Remote procedure callRPC: Remote procedure call
RPC: Remote procedure call
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Ooad unit – 1 introduction
Ooad unit – 1 introductionOoad unit – 1 introduction
Ooad unit – 1 introduction
 

Similar a MapReduce Programming Model

writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programs
jani shaik
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
WasyihunSema2
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
NelakurthyVasanthRed1
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
TSANKARARAO
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
Haripritha
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
KhanKhaja1
 
Hadoop
HadoopHadoop
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
Jakir Hossain
 
Mapreduce Hadop.pptx
Mapreduce Hadop.pptxMapreduce Hadop.pptx
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
Ahmad El Tawil
 
closd computing 4th updated MODULE-4.pptx
closd computing 4th updated MODULE-4.pptxclosd computing 4th updated MODULE-4.pptx
closd computing 4th updated MODULE-4.pptx
MaruthiPrasad96
 
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)
Jose Luis Lopez Pino
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Dr. C.V. Suresh Babu
 
Map reduce advantages over parallel databases
Map reduce advantages over parallel databases Map reduce advantages over parallel databases
Map reduce advantages over parallel databases
Ahmad El Tawil
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
Harisankar H
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
Girish Khanzode
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
Abhishek Singh
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
Csaba Toth
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
Jyotirmoy Dey
 

Similar a MapReduce Programming Model (20)

writing Hadoop Map Reduce programs
writing Hadoop Map Reduce programswriting Hadoop Map Reduce programs
writing Hadoop Map Reduce programs
 
Big Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdfBig Data Analytics Chapter3-6@2021.pdf
Big Data Analytics Chapter3-6@2021.pdf
 
Big Data.pptx
Big Data.pptxBig Data.pptx
Big Data.pptx
 
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdfmodule3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
module3part-1-bigdata-230301002404-3db4f2a4 (1).pdf
 
Mapreduce script
Mapreduce scriptMapreduce script
Mapreduce script
 
Hadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pigHadoop eco system with mapreduce hive and pig
Hadoop eco system with mapreduce hive and pig
 
Hadoop
HadoopHadoop
Hadoop
 
Map reducecloudtech
Map reducecloudtechMap reducecloudtech
Map reducecloudtech
 
Mapreduce Hadop.pptx
Mapreduce Hadop.pptxMapreduce Hadop.pptx
Mapreduce Hadop.pptx
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
 
closd computing 4th updated MODULE-4.pptx
closd computing 4th updated MODULE-4.pptxclosd computing 4th updated MODULE-4.pptx
closd computing 4th updated MODULE-4.pptx
 
High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)High-level languages for Big Data Analytics (Presentation)
High-level languages for Big Data Analytics (Presentation)
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Map reduce advantages over parallel databases
Map reduce advantages over parallel databases Map reduce advantages over parallel databases
Map reduce advantages over parallel databases
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
E031201032036
E031201032036E031201032036
E031201032036
 
Hadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User GroupHadoop and Mapreduce for .NET User Group
Hadoop and Mapreduce for .NET User Group
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 

Más de AdarshaDhakal

cloud_ch1.pptx
cloud_ch1.pptxcloud_ch1.pptx
cloud_ch1.pptx
AdarshaDhakal
 
Concealed Object Recognition
Concealed Object RecognitionConcealed Object Recognition
Concealed Object Recognition
AdarshaDhakal
 
Highway Networks
Highway NetworksHighway Networks
Highway Networks
AdarshaDhakal
 
An IoT based smart irrigation management system(SIMS) using machine learning ...
An IoT based smart irrigation management system(SIMS) using machine learning ...An IoT based smart irrigation management system(SIMS) using machine learning ...
An IoT based smart irrigation management system(SIMS) using machine learning ...
AdarshaDhakal
 
Concept Sorting in Knowledge Elicitation
Concept Sorting in Knowledge ElicitationConcept Sorting in Knowledge Elicitation
Concept Sorting in Knowledge Elicitation
AdarshaDhakal
 
Shape Preserving Interpolation Using C2 Rational Cubic Spline
Shape Preserving Interpolation Using C2 Rational Cubic SplineShape Preserving Interpolation Using C2 Rational Cubic Spline
Shape Preserving Interpolation Using C2 Rational Cubic Spline
AdarshaDhakal
 

Más de AdarshaDhakal (6)

cloud_ch1.pptx
cloud_ch1.pptxcloud_ch1.pptx
cloud_ch1.pptx
 
Concealed Object Recognition
Concealed Object RecognitionConcealed Object Recognition
Concealed Object Recognition
 
Highway Networks
Highway NetworksHighway Networks
Highway Networks
 
An IoT based smart irrigation management system(SIMS) using machine learning ...
An IoT based smart irrigation management system(SIMS) using machine learning ...An IoT based smart irrigation management system(SIMS) using machine learning ...
An IoT based smart irrigation management system(SIMS) using machine learning ...
 
Concept Sorting in Knowledge Elicitation
Concept Sorting in Knowledge ElicitationConcept Sorting in Knowledge Elicitation
Concept Sorting in Knowledge Elicitation
 
Shape Preserving Interpolation Using C2 Rational Cubic Spline
Shape Preserving Interpolation Using C2 Rational Cubic SplineShape Preserving Interpolation Using C2 Rational Cubic Spline
Shape Preserving Interpolation Using C2 Rational Cubic Spline
 

Último

原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 

Último (20)

原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 

MapReduce Programming Model

  • 1. MapReduce Programming Model Adarsha Dhakal Jaydeep Shah Prakash Upadhyaya Ritu Ratnam 1
  • 3. Introduction • MapReduce is a programming model introduced by Google for processing and generating large data sets on clusters of computers. • Google first formulated the framework for the purpose of serving Google’s Web page indexing, and the new framework replaced earlier indexing algorithms. • Beginner developers find the MapReduce framework beneficial because library routines can be used to create parallel programs without any worries about infra-cluster communication, task monitoring or failure handling processes. • MapReduce runs on a large cluster of commodity machines and is highly scalable. • It has several forms of implementation provided by multiple programming languages, like Java, C# and C++.
  • 4. • MapReduce is a general-purpose programming model for data- intensive computing. • It was introduced by Google in 2004 to construct its web index. • It is also used at Yahoo, Facebook etc. It uses a parallel computing model that distributes computational tasks to large number of nodes(approximately 1000-10000 nodes.) • It is fault-tolerable. It can work even when 1600 nodes among 1800 nodes fails. • Hadoop framework from Apache Software Foundation is an implementation of MapReduce Programming Model
  • 5.
  • 6. Phases for MapReduce 1. Input Splits 2. Mapping 3. Shuffling 4. Sorting 5. Reducing
  • 7.
  • 8. Steps for MapReduce • Step 1: Transform raw data into key/value pairs in parallel. • The mapper will get the data file and make the Rating the key and the values will be the reviews. We will add number 1 for reviews. • Step 2: Shuffle and sort by the MapReduce model. • The process of transferring mappers’ intermediate output to the reducer is known as shuffling. It will collect all the reviews(number 1s) together with the individual key and it will sort them. it will get sorted by key. • Step3: Process the data using Reduce. • Reduce will count each value(number 1) for each key.
  • 9. • Although, the map and reduce functions in MapReduce model is not exactly same as in functional programming. • Map and Reduce functions in MapReduce model: • Map: It process a (key, value) pair and returns a list of (intermediate key, value) pairs map(k1, v1)→list(k2, v2) • Reduce: It merges all intermediate values having the same intermediate key reduce(k2, list(v2))→list(v3)
  • 10.
  • 11. Basic Concept • In MapReduce model, user has to write only two functions map and reduce. • Few examples that can be easily expressed as MapReduce computations: • Distributed Grep ( is an efficient way to utilize a Hadoop cluster to find log messages hidden within terabytes of log data) • Count of URL Access Frequency • Inverted Index • Mining
  • 12.
  • 13. Advantages • MapReduce facilitates automatic parallelization and distribution, reducing the time required to run the programs • MapReduce provides fault tolerance by re-executing, writing map output to a distributed file system, and restarting failed map or reducer task • MapReduce is a cost-effective solution for processing of data • MapReduce processes large volume of unprocessed data very quickly • MapReduce utilizes simple programming model to handle tasks more efficiently and quickly and is easy to learn • MapReduce is flexible and works with several Hadoop languages to handle and store data
  • 14. Limitations • MapReduce is a low-level programming model which involves a lot of writing code • The batch-based processing nature of MapReduce makes it unsuitable for real-time processing • It does not support data pipelining or overlapping of Map and Reduce functions • Task initialization, coordination, monitoring, and scheduling take up a large chunk of MapReduce's execution time and reduce its performance • MapReduce cannot cache the intermediate data in memory, thereby diminishing Hadoop’s performance
  • 15. The data we have has 20491 rows and 2 columns, and our task is to provide individual count of ratings.
  • 16. MAPPING each rating with a shuffle and giving counter of 1. Later sorting the ratings with the count.
  • 17. REDUCING leads to giving lesser number of data. Each rating has their total count from the data from Review of Hotel
  • 18. Implementing MapReduce Programming Model • Hadoop, developed by Apache • Spark, developed by AMPLab at UC Berkley • Phoenix++, developed at Stanford University • MARISSA (MApReduce Implementation for Streaming Science Application, developed at SUNY Binghamton • DRYAD and DRYADLINQ, developed by Microsoft • MapReduce-MPI, Developed by Steve Plimpton (Sandia) • Disco, developed by NOKIA • Themis, developed by Rasmussen et al • MR4C, developed by Skybox Imaging
  • 19. Bibliography • MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat Google, Inc. • MapReduce Tutorial, https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html • Hadoop – MapReduce, https://www.tutorialspoint.com/hadoop/hadoop_mapreduce.htm • MapReduce-Implementation-in-Python, https://github.com/rshah204/MapReduce-Implementation-in- Python/blob/master/MapReduce.ipynb • Hotel Reviews, https://www.kaggle.com/datasets/yash10kundu/hotel-reviews?resource=download • MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON, Zeba Khanam and Shafali Agarwal, Department of Computer Application, JSSATE, Noida, IJCSIT Vol 7, No 4, August 2015