SlideShare una empresa de Scribd logo
1 de 25
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Agenda for today’s Session
 MapReduce Way
 Classes and Packages in MapReduce
 Explanation of a Complete MapReduce Program
 MapReduce Examples on Analytics
 MapReduce Example on Testing
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce Example on Word Count Process
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce Way
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce Way – Word Count Process
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Input/Output Classes in MapReduce
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Input Format – Class Hierarchy
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Output Format – Class Hierarchy
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Packages and Classes in Word Count
MapReduce Example
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Packages to Import
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
All these packages are present in
hadoop-common.jar
All these
packages are
present in
hadoop-mapreduce-
client-core.jar
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Mapper Class
public static class Map extends
Mapper<LongWritable, Text, Text, IntWritable> {
Name of the Mapper Class which
inherits Super Class Mapper
Mapper Class takes 4 Arguments i.e.
Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Reducer Class
public static class Reduce extends
Reducer<Text, IntWritable, Text, IntWritable> {
Name of the Reducer Class which
inherits Super Class Reducer
Reducer Class takes 4 Arguments i.e.
Reducer <KEYIN, VALUEIN, KEYOUT, VALUEOUT>
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Its Time to see some MapReduce Examples
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce is useful in a wide range of applications in multiple domains.
It is majorly used for 2 things:
 Analytics: Process the data and give the desired results
 Testing: Perform few test cases using MRUnit
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Let us see few MapReduce Examples
on Analytics
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce Temperature Example
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Weather Forecasting
 Problem Statement:
» Analysing weather data of Austin to determine Hot and Cold
Days.
We have weather data set of Austin by NCIE.
NOAA's National Centres for Environmental Information (NCEI)
(previously NCDC) is responsible for preserving, monitoring, assessing,
and providing public access to the Nation's treasure of climate and
historical weather data and information.
Refer -> ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01
Temperature Example
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Temperature Example - Weather Dataset
6th Column
Max Temp
6th Column
Min Temp
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce Example
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Last.fm Example
is an online music website where users listen to various tracks,
the data gets collected like shown below. Write a map reduce
program to get the Number of unique listeners.
The data is coming in log files and looks like as shown below:
UserId TrackId Shared Radio Skip
100001 150 1 1 0
100005 103 0 0 1
100142 78 1 0 0
110005 289 1 0 1
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Let us see a MapReduce Example
on Testing
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MRUnit Testing Framework
 Provides 4 drivers for separately testing MapReduce code
» MapDriver
» ReduceDriver
» MapReduceDriver
» PipelineMapReduceDriver
 Helps in filling the gap between MapReduce programs and JUnit*
 Better control on log messages with JUnit Integration
*JUnit is a simple framework
to write repeatable tests.
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
MapReduce MRUnit Example
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Learning Resources
 Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
 MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
 MapReduce Interview Questions:
www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Thank You …
Questions/Queries/Feedback

Más contenido relacionado

La actualidad más candente

Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Simplilearn
 

La actualidad más candente (20)

Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 
Why Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) ModelWhy Spark Is the Next Top (Compute) Model
Why Spark Is the Next Top (Compute) Model
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Hive+Tez: A performance deep dive
Hive+Tez: A performance deep diveHive+Tez: A performance deep dive
Hive+Tez: A performance deep dive
 
MapReduce
MapReduceMapReduce
MapReduce
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Introduction to HiveQL
Introduction to HiveQLIntroduction to HiveQL
Introduction to HiveQL
 
Hive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use CasesHive Training -- Motivations and Real World Use Cases
Hive Training -- Motivations and Real World Use Cases
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
 
Introduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop EcosystemIntroduction to Apache Hadoop Ecosystem
Introduction to Apache Hadoop Ecosystem
 
Hadoop Installation presentation
Hadoop Installation presentationHadoop Installation presentation
Hadoop Installation presentation
 

Similar a MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka

Final ProjectsData SetsData Sets in R Packages.docx
Final ProjectsData SetsData Sets in R Packages.docxFinal ProjectsData SetsData Sets in R Packages.docx
Final ProjectsData SetsData Sets in R Packages.docx
AKHIL969626
 
Manno_Larry_Resume
Manno_Larry_ResumeManno_Larry_Resume
Manno_Larry_Resume
Larry Manno
 

Similar a MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka (20)

Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
Data Science Training | Data Science Tutorial for Beginners | Data Science wi...
 
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
Hadoop Administration Training | Hadoop Administration Tutorial | Hadoop Admi...
 
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |EdurekaHadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training |Edureka
 
Final ProjectsData SetsData Sets in R Packages.docx
Final ProjectsData SetsData Sets in R Packages.docxFinal ProjectsData SetsData Sets in R Packages.docx
Final ProjectsData SetsData Sets in R Packages.docx
 
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
Apache Hadoop Tutorial | Hadoop Tutorial For Beginners | Big Data Hadoop | Ha...
 
Jonathan Hodge-SPEDDEXES-2014
Jonathan Hodge-SPEDDEXES-2014Jonathan Hodge-SPEDDEXES-2014
Jonathan Hodge-SPEDDEXES-2014
 
Enhancing the TIMES New User Experience - Second Step, a VEDA TIMES-Starter M...
Enhancing the TIMES New User Experience - Second Step, a VEDA TIMES-Starter M...Enhancing the TIMES New User Experience - Second Step, a VEDA TIMES-Starter M...
Enhancing the TIMES New User Experience - Second Step, a VEDA TIMES-Starter M...
 
Hawaii Utility Integration Efforts
Hawaii Utility Integration EffortsHawaii Utility Integration Efforts
Hawaii Utility Integration Efforts
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Manno_Larry_Resume
Manno_Larry_ResumeManno_Larry_Resume
Manno_Larry_Resume
 
Data Science Full Course | Edureka
Data Science Full Course | EdurekaData Science Full Course | Edureka
Data Science Full Course | Edureka
 
What's in a Map?
What's in a Map?What's in a Map?
What's in a Map?
 
Predictive Analytics Using R | Edureka
Predictive Analytics Using R | EdurekaPredictive Analytics Using R | Edureka
Predictive Analytics Using R | Edureka
 
Haley brown resume 2017
Haley brown resume 2017Haley brown resume 2017
Haley brown resume 2017
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Supply Chain Planning and SAP APO Overview
Supply Chain Planning and SAP APO OverviewSupply Chain Planning and SAP APO Overview
Supply Chain Planning and SAP APO Overview
 
Data analytics and downscaling for climate research in a big data world
Data analytics and downscaling for climate research in a big data worldData analytics and downscaling for climate research in a big data world
Data analytics and downscaling for climate research in a big data world
 
SC5 Hangout2 pilot 1 description
SC5 Hangout2  pilot 1 descriptionSC5 Hangout2  pilot 1 description
SC5 Hangout2 pilot 1 description
 
The road to continuous deployment: a case study (DPC16)
The road to continuous deployment: a case study (DPC16)The road to continuous deployment: a case study (DPC16)
The road to continuous deployment: a case study (DPC16)
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 

Más de Edureka!

Más de Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

MapReduce Example | MapReduce Programming | Hadoop MapReduce Tutorial | Edureka