This Hadoop tutorial on MapReduce examples (MapReduce Tutorial Blog Series: https://goo.gl/w0on2G) will help you understand how to write a MapReduce program in Java. It also walks through several MapReduce examples on analytics and testing.
Check our complete Hadoop playlist here: https://goo.gl/ExJdZs
Below are the topics covered in this tutorial:
1) MapReduce Way
2) Classes and Packages in MapReduce
3) Explanation of a Complete MapReduce Program
4) MapReduce Examples on Analytics
5) MapReduce Example on Testing - MRUnit
2. www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING
Agenda for today’s Session
MapReduce Way
Classes and Packages in MapReduce
Explanation of a Complete MapReduce Program
MapReduce Examples on Analytics
MapReduce Example on Testing
Packages to Import
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
The org.apache.hadoop.fs, conf and io packages are present in hadoop-common.jar.
The org.apache.hadoop.mapreduce packages are present in hadoop-mapreduce-client-core.jar.
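Mapper Class
public static class Map extends
Mapper<LongWritable, Text, Text, IntWritable> {
Map is the name of the Mapper class, which
inherits from the superclass Mapper.
The Mapper class takes 4 type parameters, i.e.
Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
In the declaration above, KEYIN is LongWritable, VALUEIN is Text, KEYOUT is Text and VALUEOUT is IntWritable.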
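To make the four type parameters concrete, here is a minimal plain-Java sketch of a map step. It assumes a word-count style job, which is consistent with the (Text, IntWritable) output types on the slide but is not stated there. It does not use Hadoop: SimpleMapper is a hypothetical stand-in for Mapper, with Long, String and Integer standing in for LongWritable, Text and IntWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Plain-Java sketch of a word-count map step; no Hadoop classes are used.
public class MapStepSketch {

    // Hypothetical stand-in for Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
    public interface SimpleMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        List<Map.Entry<KEYOUT, VALUEOUT>> map(KEYIN key, VALUEIN value);
    }

    // Input: (line offset, line text). Output: one (word, 1) pair per token.
    public static final SimpleMapper<Long, String, String, Integer> WORD_COUNT_MAP =
        (offset, line) -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                out.add(new SimpleEntry<String, Integer>(tokens.nextToken(), 1));
            }
            return out;
        };

    public static void main(String[] args) {
        // prints [hadoop=1, map=1, hadoop=1]
        System.out.println(WORD_COUNT_MAP.map(0L, "hadoop map hadoop"));
    }
}
```

In a real job the pairs would be written through the Mapper's context object rather than returned as a list; the list is used here only to keep the sketch runnable on its own.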
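Reducer Class
public static class Reduce extends
Reducer<Text, IntWritable, Text, IntWritable> {
Reduce is the name of the Reducer class, which
inherits from the superclass Reducer.
The Reducer class takes 4 type parameters, i.e.
Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
In the declaration above, KEYIN is Text, VALUEIN is IntWritable, KEYOUT is Text and VALUEOUT is IntWritable. The Reducer's input types must match the Mapper's output types.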
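The reduce side can be sketched in the same spirit. This is a plain-Java illustration, assuming a word-count style job (not stated on the slide) in which the reducer sums the counts it receives for one key. SimpleReducer is a hypothetical stand-in for Reducer and is not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;

// Plain-Java sketch of a summing reduce step; no Hadoop classes are used.
public class ReduceStepSketch {

    // Hypothetical stand-in for Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>.
    public interface SimpleReducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
        Map.Entry<KEYOUT, VALUEOUT> reduce(KEYIN key, Iterable<VALUEIN> values);
    }

    // Input: a word and all counts emitted for it. Output: (word, total).
    public static final SimpleReducer<String, Integer, String, Integer> SUM_REDUCE =
        (word, counts) -> {
            int sum = 0;
            for (int count : counts) {
                sum += count;
            }
            return new SimpleEntry<String, Integer>(word, sum);
        };

    public static void main(String[] args) {
        // prints hadoop=3
        System.out.println(SUM_REDUCE.reduce("hadoop", Arrays.asList(1, 1, 1)));
    }
}
```

The framework calls the reducer once per key, handing it every value the mappers emitted for that key, which is why the sketch takes an Iterable rather than a single value.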
MapReduce is useful in a wide range of applications across multiple domains.
This tutorial looks at two kinds of examples:
Analytics: process the data and produce the desired results
Testing: run a few test cases using MRUnit
Weather Forecasting
Problem Statement:
» Analysing weather data of Austin to determine hot and cold days.
We have a weather data set of Austin from NCEI.
NOAA's National Centers for Environmental Information (NCEI)
(previously NCDC) is responsible for preserving, monitoring, assessing,
and providing public access to the Nation's treasure of climate and
historical weather data and information.
Refer -> ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01
Temperature Example
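A minimal plain-Java sketch of the hot/cold classification a mapper could perform for this problem. Several things here are assumptions, not taken from the slides: the thresholds (above 35.0 for hot, below 10.0 for cold, in degrees Celsius), the simplified record layout "date maxTemp minTemp" (the real daily01 files carry more columns), and the sample records in main, which are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of hot/cold day detection; no Hadoop classes are used.
public class HotColdDaysSketch {

    // Assumed thresholds in degrees Celsius; the slides do not state them.
    static final double HOT_ABOVE = 35.0;
    static final double COLD_BELOW = 10.0;

    // Map-style step: one simplified record in, zero or more labels out.
    // Assumed record layout: "date maxTemp minTemp", whitespace separated.
    public static List<String> classify(String record) {
        String[] fields = record.trim().split("\\s+");
        String date = fields[0];
        double max = Double.parseDouble(fields[1]);
        double min = Double.parseDouble(fields[2]);

        List<String> out = new ArrayList<>();
        if (max > HOT_ABOVE) {
            out.add("Hot Day " + date);
        }
        if (min < COLD_BELOW) {
            out.add("Cold Day " + date);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        System.out.println(classify("20150101 12.5 3.1"));  // prints [Cold Day 20150101]
        System.out.println(classify("20150715 38.2 24.0")); // prints [Hot Day 20150715]
    }
}
```

Because each day is classified independently, this kind of job needs little or no work on the reduce side.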
Last.fm Example
Last.fm is an online music website where users listen to various tracks,
and the listening data is collected in log files. Write a MapReduce
program to get the number of unique listeners.
The data looks like this:
UserId   TrackId  Shared  Radio  Skip
100001   150      1       1      0
100005   103      0       0      1
100142   78       1       0      0
110005   289      1       0      1
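One way to approach this problem, sketched in plain Java without Hadoop: the map step emits a (TrackId, UserId) pair per log line, and the reduce step counts the distinct UserIds seen for each TrackId. Counting listeners per track is an assumption about what the exercise asks for, as are the whitespace-separated fields and the header row having been removed beforehand.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java sketch of "unique listeners per track"; no Hadoop classes are used.
public class UniqueListenersSketch {

    public static Map<Integer, Integer> uniqueListenersPerTrack(List<String> lines) {
        // "Map" and "shuffle": group user ids by track id.
        Map<Integer, Set<Integer>> usersByTrack = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            int userId = Integer.parseInt(fields[0]);
            int trackId = Integer.parseInt(fields[1]);
            usersByTrack.computeIfAbsent(trackId, k -> new HashSet<>()).add(userId);
        }

        // "Reduce": the set removes duplicates, so its size is the answer.
        Map<Integer, Integer> result = new TreeMap<>();
        for (Map.Entry<Integer, Set<Integer>> entry : usersByTrack.entrySet()) {
            result.put(entry.getKey(), entry.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        // The four sample rows from the slide, without the header.
        List<String> log = Arrays.asList(
            "100001 150 1 1 0",
            "100005 103 0 0 1",
            "100142 78 1 0 0",
            "110005 289 1 0 1");
        // prints {78=1, 103=1, 150=1, 289=1}
        System.out.println(uniqueListenersPerTrack(log));
    }
}
```

A user who plays the same track several times is counted once, because the per-track set discards repeated UserIds.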
MRUnit Testing Framework
Provides 4 drivers for separately testing MapReduce code
» MapDriver
» ReduceDriver
» MapReduceDriver
» PipelineMapReduceDriver
Helps fill the gap between MapReduce programs and JUnit*
Better control over log messages with JUnit integration
*JUnit is a simple framework for writing repeatable tests.
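The idea behind a driver such as MapDriver is to feed a mapper one input, declare the expected output, and let the driver compare the two. The plain-Java sketch below only mimics that style so it can run without the Hadoop and MRUnit jars. TinyMapDriverSketch and trackAndUser are hypothetical names, not part of MRUnit, and the real drivers work with Hadoop key/value types rather than strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical, MRUnit-inspired test driver in plain Java (not the MRUnit API).
public class TinyMapDriverSketch {

    private final Function<String, List<String>> mapper;
    private String input;
    private final List<String> expected = new ArrayList<>();

    public TinyMapDriverSketch(Function<String, List<String>> mapper) {
        this.mapper = mapper;
    }

    public TinyMapDriverSketch withInput(String line) {
        this.input = line;
        return this;
    }

    public TinyMapDriverSketch withOutput(String record) {
        expected.add(record);
        return this;
    }

    // True when the mapper's output equals the declared expected output.
    public boolean passes() {
        return mapper.apply(input).equals(expected);
    }

    public void runTest() {
        if (!passes()) {
            throw new AssertionError("expected " + expected + " but got " + mapper.apply(input));
        }
    }

    // Example mapper under test: emits "trackId<TAB>userId" for one log line.
    public static List<String> trackAndUser(String line) {
        String[] fields = line.trim().split("\\s+");
        return Collections.singletonList(fields[1] + "\t" + fields[0]);
    }

    public static void main(String[] args) {
        new TinyMapDriverSketch(TinyMapDriverSketch::trackAndUser)
            .withInput("100001 150 1 1 0")
            .withOutput("150\t100001")
            .runTest();
        System.out.println("map test passed");
    }
}
```

With the real framework, the same pattern is wrapped in a JUnit test method, so a failing expectation shows up as an ordinary JUnit failure.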