Topic 1
HDFS – Hands On (Part – 1)
Class 2 – Hadoop Distributed File System
AGENDA
• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce-Hands On
Pre-requisites
HDFS – Hands On
• The Virtual Machine is up and running.
• You are connected to the Virtual Machine using PuTTY as ‘hduser’.
Command Syntax
hadoop fs -ls / (To list directory contents)
hadoop fs -<command> <args>
hadoop: The binary executable.
fs: Invokes the Hadoop file system shell, which operates on HDFS in this setup.
<command>: Indicates the purpose of the statement; always preceded by a ‘-’.
<args>: The arguments that apply to the command.
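As a rough illustration of the syntax above, the following sketch assembles a command line from its parts and only prints it, so it can be tried without a running cluster. The helper name hdfs_cmd is hypothetical and is not part of Hadoop.

```shell
# hdfs_cmd is a hypothetical helper for illustration only; it prints the
# command line instead of running it.
hdfs_cmd() {
  command_name=$1   # e.g. ls, cat, mkdir
  shift             # the remaining parameters are the <args>
  echo "hadoop fs -${command_name} $*"
}

hdfs_cmd ls /
hdfs_cmd copyFromLocal NOTICE.txt noticehdfs.txt
```

The first call prints the listing command from the slide; dropping the echo would run the command on a machine where hadoop is on the PATH.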
Where do DataNodes store data?
hadoop.tmp.dir = /tmp/hadoop
dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
= /tmp/hadoop/dfs/data
VERSION >> Java properties file
blk_********* >> Raw data of a block
blk_******.meta >> Metadata of the block
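The substitution above can be mimicked with ordinary shell variables. This is only a sketch of how the value resolves; the variable names are illustrative, and Hadoop itself performs this expansion when it reads its configuration.

```shell
# Value of hadoop.tmp.dir as configured on the slide.
hadoop_tmp_dir=/tmp/hadoop

# dfs.data.dir is defined relative to hadoop.tmp.dir.
dfs_data_dir="${hadoop_tmp_dir}/dfs/data"

echo "$dfs_data_dir"
```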
Why is there already a block when we have not loaded any file?
The block belongs to jobtracker.info, a file that the JobTracker creates in HDFS when it starts.
fsck
Generates a summary report that lists the overall health of the filesystem.
Total size: The size of the directory checked (the root directory in our case). Does not account for replication.
Total dirs: The number of directories in HDFS.
Total files: The number of files in HDFS.
Total blocks: The number of blocks.
Default replication factor:
Average replication factor:
Corrupt blocks:
Missing replicas:
Number of data nodes:
Number of racks:
Edit .bashrc
Navigate to the home directory.
cd
List hidden files.
ls -a
Edit the .bashrc file.
vi .bashrc
Update HADOOP paths using ‘export’ command.
export HADOOP_CONF=/home/hduser/hadoop/conf
export HADOOP_PREFIX=/home/hduser/hadoop
# Add Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_PREFIX/bin
Execute the updated contents of the .bashrc file.
source ~/.bashrc
copyFromLocal
Copies a file from the local file system to HDFS.
hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>
hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt
The copyFromLocal command internally results in:
• the file being split into one or more blocks.
• the client contacting the NameNode to find out where each block should be stored in the cluster.
• replication of the blocks to the DataNodes assigned by the NameNode.
How many blocks were created?
RECAP
HDFS Commonly used commands
HDFS Concepts
Topic 2
HDFS – Hands On (Part – 2)
Class 2 – Hadoop Distributed File System
Load a file larger than the block size
Load a 200 MB file and see how many blocks are created.
Command to generate a 200 MB dummy file:
dd if=/dev/zero of=file.txt count=1024 bs=204800
Copy the file into HDFS:
hadoop fs -copyFromLocal file.txt file.txt
List the block files in the DataNode data directory:
cd /tmp/hadoop/dfs/data/current
ls -lrt
Block 1 = 64 MB
Block 2 = 64 MB
Block 3 = 8 MB
Block 4 = 64 MB
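The block count can be checked with a little shell arithmetic. The sketch assumes the 64 MB block size shown on the slide and the file size produced by the dd command; the variable names are illustrative.

```shell
block_size=$((64 * 1024 * 1024))   # 64 MB, assumed from the slide
file_size=$((1024 * 204800))       # count * bs from the dd command

full_blocks=$((file_size / block_size))
remainder=$((file_size % block_size))

total_blocks=$full_blocks
if [ "$remainder" -gt 0 ]; then
  total_blocks=$((total_blocks + 1))   # the last, partially filled block
fi

echo "full 64 MB blocks: $full_blocks"
echo "last block: $((remainder / 1024 / 1024)) MB"
echo "total blocks: $total_blocks"
```

Three full blocks plus one 8 MB block account for the four block files listed above.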
fsck
fsck after loading 2 additional files.
Total size has increased.
Total dirs: 7. Additions - /user and /user/hduser directories.
Total files: 3. Additions - 2 newly loaded files.
Total blocks: 6. Additions - 1 block of the 1st file and 4 blocks of the 2nd file.
cat
Displays the contents of a file on the command prompt.
hadoop fs -cat <Path of file in HDFS>
hadoop fs -cat noticehdfs.txt
copyToLocal
Copies a file from HDFS to the local file system.
hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>
hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt
mkdir
Creates a directory inside HDFS.
Relative HDFS paths resolve against the current user’s home directory (/user/hduser here).
Creates a directory in the current user’s home directory:
hadoop fs -mkdir newdir
Creates a new directory under root:
hadoop fs -mkdir /newdir
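How relative and absolute paths resolve can be sketched as a small shell function. resolve_hdfs_path is a hypothetical helper, not a Hadoop command, and it assumes the home directory is /user/<username>, as with /user/hduser in this setup.

```shell
# Hypothetical helper: shows which HDFS path a given argument refers to.
resolve_hdfs_path() {
  user=$1
  path=$2
  case "$path" in
    /*) echo "$path" ;;                  # absolute path: used as given
    *)  echo "/user/${user}/${path}" ;;  # relative path: under the home directory
  esac
}

resolve_hdfs_path hduser newdir
resolve_hdfs_path hduser /newdir
```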
rm
Removes files and empty directories.
hadoop fs -rm <File Name>
hadoop fs -rm noticehdfs.txt
Trash feature
Prevents accidental deletion of files and directories.
Disabled by default.
To enable it, set the fs.trash.interval property (in minutes) to a value greater than zero in the core-site.xml file.
RECAP
HDFS Commonly used commands
HDFS Concepts
Topic 3
HDFS – Web UI
Class 2 – Hadoop Distributed File System
NameNode Web Interface
HDFS Web Interface URL:
http://<namenode_host>:50070/
From the Virtual Machine:
http://localhost:50070/
From outside the Virtual Machine:
http://<IP Address of VM or Hostname of VM>:50070/
Example: http://192.168.234.135:50070/
The NameNode Web Interface shows:
• Server name and port
• Last start time of the NameNode
• Hadoop version, followed by the Subversion source code repository
• Link to browse the files in HDFS
• Link to view the NameNode log files
• Number of files, directories and blocks; heap memory utilized and available
• Storage capacity of the machines in the cluster
• Space utilized in HDFS
• Space utilized by the O/S, applications etc.
• Space available on HDFS
• Number of blocks with fewer replicas than the replication factor
• Nodes that are active and in contact with the NameNode
• Nodes that are NOT in contact with the NameNode
• Nodes administratively removed from the cluster
RECAP
HDFS Web UI
Topic 4
Class 2 – Hadoop Distributed File System
MapReduce – Hands On (Part – 1)
How does MapReduce work?
MapReduce
Mapping Phase: Map Input List -> Mapper -> Map Output List
Reducing Phase: Reduce Input List -> Reducer -> Reduce Output List
Hadoop MapReduce
Input:
King Queen King
Minister King Soldier
Queen Soldier King
Splitting:
<1, King Queen King>
<2, Minister King Soldier>
<3, Queen Soldier King>
Map (Map Output):
<King, 1> <Queen, 1> <King, 1>
<Minister, 1> <King, 1> <Soldier, 1>
<Queen, 1> <Soldier, 1> <King, 1>
Shuffling:
<King, 1> <King, 1> <King, 1> <King, 1>
<Minister, 1>
<Queen, 1> <Queen, 1>
<Soldier, 1> <Soldier, 1>
Reduce:
<King, (1,1,1,1)>
<Minister, 1>
<Queen, (1,1)>
<Soldier, (1,1)>
Result:
<King, 4>
<Minister, 1>
<Queen, 2>
<Soldier, 2>
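The same word-count flow can be imitated on a single machine with standard Unix tools. This is only an analogy for the stages in the diagram, not how Hadoop runs a job; the variable names are illustrative.

```shell
input='King Queen King
Minister King Soldier
Queen Soldier King'

# Map: emit one "<word> 1" pair per word.
map_output=$(printf '%s\n' "$input" | tr -s ' ' '\n' | awk '{ print $1, 1 }')

# Shuffle: sort so that pairs with the same key become adjacent.
shuffled=$(printf '%s\n' "$map_output" | LC_ALL=C sort)

# Reduce: sum the values of each run of identical keys.
result=$(printf '%s\n' "$shuffled" | awk '
  $1 != key { if (NR > 1) print key, sum; key = $1; sum = 0 }
  { sum += $2 }
  END { if (NR > 0) print key, sum }')

printf '%s\n' "$result"
```

The reducer relies on the sorted order produced by the shuffle step, which is why it only needs to remember the current key and its running sum.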
Hadoop MapReduce – Roles: User vs. Framework
Responsibilities labeled on the word-count flow (Input, Splitting, Map, Shuffling, Reduce, Result):
• Load data into HDFS
• Specify Path & Input Format
• Create ‘Input Splits’
• Create individual Records
• User Defined Logic (labeled twice on the slide)
• Specify Path & Output format
• Replication, Rack Awareness etc.
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Calculates
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
MapReduce Execution Framework
MapReduce
Reduce Process
Mapper Process
Input HDFS File - inputFile.txt
Block A Block B Block C
Driver
Mapper
Reducer
Record
Reader
Input Split 1 Input Split 2 Input Split 3 Input Split 4
Writer
InputFormat
Output Data
Reduce Process
Mapper Process
Mapper
Reducer
Record
Reader
Writer
Output Data
Reads
Passes <K,V> pairs
Writes
Reads
Passes <K,V> pairs
Writes
OutputFormat
Defines
Defines
Calculates
Defines
Defines
Defines
Defines
Defines
Passes <K,V> pairs
Passes <K,V> pairs
<K, V> pairs <K, V> pairs
Partition Shuffle
Sort
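The Partition, Shuffle and Sort steps between the Mapper and the Reducer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and CRC32 stands in for whatever hash function the framework's partitioner really uses. The point it shows is that every value for a given key ends up at the same reducer, and that each reducer sees its keys in sorted order.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer index for a key (CRC32 is a stand-in hash, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(map_outputs, num_reducers):
    """Route every <K,V> pair to its reducer, then group and sort by key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_pairs in map_outputs:
        for key, value in mapper_pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives (key, [values]) entries ordered by key.
    return [sorted(bucket.items()) for bucket in buckets]


# Output of two hypothetical mapper processes.
map_outputs = [
    [("King", 1), ("Queen", 1), ("King", 1)],
    [("Minister", 1), ("King", 1), ("Soldier", 1)],
]
reducer_inputs = shuffle_and_sort(map_outputs, num_reducers=2)
for index, pairs in enumerate(reducer_inputs):
    print("reducer", index, pairs)
```

Which reducer a key lands on depends on the hash, so the sketch makes no claim about the exact assignment, only that it is consistent for a given key.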
RECAP
MapReduce Execution Framework
BUMPER
BUMPER
Topic 5
Class 2 – Hadoop Distributed File System
MapReduce – Hands On (Part – 2)
AGENDA
• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce-Hands On
Java MapReduce Programming
MapReduce
Hello World of MapReduce >> Word Count program
Eclipse – Integrated Development Environment (IDE)
https://www.eclipse.org/downloads/
RECAP
Part two of Java MapReduce program
BUMPER

Bd class 2 complete

  • 3. Topic 1 HDFS – Hands On (Part – 1) Class 2 – Hadoop Distributed File System
  • 4. AGENDA • What is Big Data? • Hadoop Distributed File System • MapReduce • Understanding Hadoop Ecosystem • Setting up a Hadoop Cluster • HDFS – Hands On • MapReduce-Hands On
  • 5. Pre-requisites HDFS – Hands On Virtual Machine is up and running. Connected to your Virtual Machine using putty as ‘hduser’.
  • 11. Command Syntax HDFS – Hands On
    hadoop fs -ls / (lists directory contents)
    General form: hadoop fs -<command> <args>
    hadoop: The binary executable.
    fs: Invokes the Hadoop file system shell, which operates on HDFS here.
    <command>: States the purpose of the statement; it is always preceded by a '-'.
    <args>: The arguments that apply to the command.
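The command syntax can also be assembled programmatically. The sketch below is a minimal illustration, not part of the deck: the helper name hadoop_fs_argv is hypothetical, and it only builds the argument list rather than running anything. Note that the flag uses a plain ASCII hyphen; a long dash pasted from a slide is not accepted by the shell.

```python
import shlex


def hadoop_fs_argv(command, *args):
    """Build the argument list for: hadoop fs -<command> <args>."""
    if command.startswith("-"):
        raise ValueError("pass the command name without the leading '-'")
    return ["hadoop", "fs", "-" + command, *args]


argv = hadoop_fs_argv("ls", "/")
print(shlex.join(argv))  # hadoop fs -ls /

copy_argv = hadoop_fs_argv("copyFromLocal", "NOTICE.txt", "noticehdfs.txt")
print(shlex.join(copy_argv))
```

On a machine where the hadoop executable is on the PATH, such a list could be passed to subprocess.run; that step is left out here because it needs a running cluster.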
  • 16. Where do DataNodes store data? HDFS – Hands On
    hadoop.tmp.dir = /tmp/hadoop
    dfs.data.dir = ${hadoop.tmp.dir}/dfs/data = /tmp/hadoop/dfs/data
    Files found in the data directory:
    VERSION >> Java properties file
    blk_********* >> Raw data of a file
    blk_******.meta >> Metadata of the block
    How come there is a block when we have not loaded any file?
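How dfs.data.dir ends up as /tmp/hadoop/dfs/data can be shown with a small sketch of property substitution. Hadoop configuration files write a reference to another property as ${property.name}. The resolve function below is a hypothetical helper written for illustration, not Hadoop's own implementation, and the two property values are the ones used in this class.

```python
import re

# Matches a reference such as ${hadoop.tmp.dir} inside a property value.
_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(name, props, _depth=0):
    """Expand ${other.property} references inside the value of `name`."""
    if _depth > 20:
        raise ValueError("too many nested references")
    value = props[name]
    return _VAR.sub(lambda m: resolve(m.group(1), props, _depth + 1), value)


props = {
    "hadoop.tmp.dir": "/tmp/hadoop",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
print(resolve("dfs.data.dir", props))  # /tmp/hadoop/dfs/data
```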
  • 18. fsck HDFS – Hands On Generates a summary report that lists the overall health of the filesystem.
  • 19. fsck HDFS – Hands On
    Total size: The size of the directory (the root directory in our case). Does not account for replication.
    Total dirs: The number of directories in HDFS.
    Total files: The number of files in HDFS.
    Total blocks: The number of blocks.
    The report also lists: Default replication factor, Average replication factor, Corrupt blocks, Missing replicas, Number of data nodes, Number of racks.
  • 20. Edit .bashrc HDFS – Hands On
    Navigate to the home directory.
        cd
    List hidden files.
        ls -a
    Edit the .bashrc file.
        vi .bashrc
    Update the HADOOP paths using the 'export' command.
        export HADOOP_CONF=/home/hduser/hadoop/conf
        export HADOOP_PREFIX=/home/hduser/hadoop
        # Add Hadoop bin/ directory to path
        export PATH=$PATH:$HADOOP_PREFIX/bin
    Execute the updated contents of the .bashrc file.
        source ~/.bashrc
  • 21. copyFromLocal HDFS – Hands On
    Copies a file from the local file system to HDFS.
        hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>
        hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt
  • 22. copyFromLocal HDFS – Hands On
    The copyFromLocal command internally results in:
    • the file getting split into multiple blocks.
    • the client contacting the NameNode to find out where each block should be copied in the cluster.
    • replication of the blocks to the nodes assigned by the NameNode.
  • 23. How many blocks were created? HDFS – Hands On
  • 24. RECAP HDFS Commonly used commands HDFS Concepts
  • 28. Topic 2 HDFS – Hands On (Part – 2) Class 2 – Hadoop Distributed File System
  • 29. AGENDA • What is Big Data? • Hadoop Distributed File System • MapReduce • Understanding Hadoop Ecosystem • Setting up a Hadoop Cluster • HDFS – Hands On • MapReduce-Hands On
  • 30. Load a file larger than the block size HDFS – Hands On
    Load a 200 MB file and see how many blocks are created.
    Command to generate a 200 MB dummy file:
        dd if=/dev/zero of=file.txt count=1024 bs=204800
    Load it into HDFS and list the block files on the DataNode:
        hadoop fs -copyFromLocal file.txt file.txt
        cd /tmp/hadoop/dfs/data/current
        ls -lrt
  • 31. Load a file larger than the block size HDFS – Hands On Block 1 = 64 MB Block 2 = 64 MB Block 3 = 8 MB Block 4 = 64 MB
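The block count for the 200 MB file follows from simple arithmetic, sketched here in Python. The function name split_into_blocks is hypothetical, and the 64 MB block size is the one shown on the slide. The sketch returns the sizes in file order; the slide may list the 8 MB block in a different position simply because of how the directory listing is ordered.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return the sizes (in bytes) of the blocks a file of file_size bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left over
    return sizes


file_size = 1024 * 204800  # dd count=1024 bs=204800
blocks = split_into_blocks(file_size)
print(file_size // MB, "MB ->", [size // MB for size in blocks])
# 200 MB -> [64, 64, 64, 8]
```

The last block is not padded to 64 MB: it takes up only the remaining 8 MB.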
  • 32. fsck HDFS – Hands On
    fsck after loading 2 additional files:
    Total size has increased.
    Total dirs: 7. Additions: the /user and /user/hduser directories.
    Total files: 3. Additions: the 2 newly loaded files.
    Total blocks: 6. Additions: 1 block of the 1st file and 4 blocks of the 2nd file.
  • 33. cat HDFS – Hands On
    Displays the contents of a file on the command prompt.
        hadoop fs -cat <Path of file in HDFS>
        hadoop fs -cat noticehdfs.txt
  • 34. copyToLocal HDFS – Hands On
    Copies a file from HDFS to the local file system.
        hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>
        hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt
  • 35. mkdir HDFS – Hands On
    Creates a directory inside HDFS. Relative HDFS paths are resolved against the current user's home directory.
    Creates the directory in the current user's home directory:
        hadoop fs -mkdir newdir
    Creates the directory under root:
        hadoop fs -mkdir /newdir
  • 36. rm HDFS – Hands On
    Removes file(s). Removes files and empty directories.
        hadoop fs -rm <File Name>
        hadoop fs -rm noticehdfs.txt
  • 37. Trash feature HDFS – Hands On Prevents accidental deletion of files and directories. Disabled by default. To enable, configure the fs.trash.interval property in core-site.xml file.
  • 38. RECAP HDFS Commonly used commands HDFS Concepts
  • 42. Topic 3 HDFS – Web UI Class 2 – Hadoop Distributed File System
  • 43. AGENDA • What is Big Data? • Hadoop Distributed File System • MapReduce • Understanding Hadoop Ecosystem • Setting up a Hadoop Cluster • HDFS – Hands On • MapReduce-Hands On
  • 44. NameNode Web Interface HDFS – Hands On
    HDFS Web Interface URL: http://<namenode_host>:50070/
    From the Virtual Machine: http://localhost:50070/
    From outside the Virtual Machine: http://<IP Address of VM or Hostname of VM>:50070/
    Example: http://192.168.234.135:50070/
  • 45. NameNode Web Interface HDFS – Hands On
    The page shows:
    • Server name and port
    • Last start time of the NameNode
    • Hadoop version, followed by the Subversion source code repository
    • Link to browse the files in HDFS
    • Link to view the NameNode log files
    • Number of files, directories and blocks
    • Heap memory utilized/available
    • Storage capacity of the machines in the cluster
    • Space utilized in HDFS
    • Space utilized by the O/S, applications etc.
    • Amount of space available on HDFS
    • Number of blocks that have fewer replicas than the replication factor
    • Nodes that are active and in contact with the NameNode
    • Nodes that are NOT in contact with the NameNode
    • Nodes administratively removed from the cluster
  • 50. Topic 4 Class 2 – Hadoop Distributed File System MapReduce – Hands On (Part – 1)
  • 51. AGENDA • What is Big Data? • Hadoop Distributed File System • MapReduce • Understanding Hadoop Ecosystem • Setting up a Hadoop Cluster • HDFS – Hands On • MapReduce-Hands On
  • 61. How does MapReduce work? MapReduce
    [Diagram, built up over slides 52 to 61.]
    Mapping Phase: the Mapper turns the Map Input List into the Map Output List.
    Reducing Phase: the Reducer turns the Reduce Input List into the Reduce Output List.
  • 62. Hadoop MapReduce MapReduce
    [Word count diagram, repeated on slides 62 to 70.]
    Input: King Queen King / Minister King Soldier / Queen Soldier King
    Splitting: <1, King Queen King> <2, Minister King Soldier> <3, Queen Soldier King>
    Map (Map Output): <King, 1> <Queen, 1> <King, 1> / <Minister, 1> <King, 1> <Soldier, 1> / <Queen, 1> <Soldier, 1> <King, 1>
    Shuffling: <King, 1> <King, 1> <King, 1> <King, 1> / <Minister, 1> / <Queen, 1> <Queen, 1> / <Soldier, 1> <Soldier, 1>
    Reduce: <King, (1,1,1,1)> <Minister, 1> <Queen, (1,1)> <Soldier, (1,1)>
    Result: <King, 4> <Minister, 1> <Queen, 2> <Soldier, 2>
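The word count flow on this slide can be replayed in plain Python, without Hadoop, to check the numbers. This is a teaching sketch under stated assumptions: the function names map_phase, shuffle and reduce_phase are hypothetical, everything runs in a single process, and the record key is the line number as on the slide.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit <word, 1> for every word in one input record."""
    _line_no, text = record
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Framework step: group the values by key and sort the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


lines = ["King Queen King", "Minister King Soldier", "Queen Soldier King"]
records = list(enumerate(lines, start=1))  # <1, King Queen King>, ...
map_output = [pair for record in records for pair in map_phase(record)]
reduce_input = shuffle(map_output)  # <King, (1,1,1,1)>, ...
result = [reduce_phase(key, values) for key, values in reduce_input]
print(result)
# [('King', 4), ('Minister', 1), ('Queen', 2), ('Soldier', 2)]
```

Only map_phase and reduce_phase correspond to code a user writes; the grouping in shuffle is done by the framework.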
  • 71-86. Hadoop MapReduce – Roles: User vs. Framework. Progressive build over the word count diagram of slide 70 (stages: Input, Splitting, Map, Shuffling, Reduce, Result). Annotations added one at a time across the build: Load data into HDFS; Specify Path & Input Format; Create ‘Input Splits’; Create individual Records; User Defined Logic (shown twice); Specify Path & Output format; Replication, Rack Awareness etc.
  • 90-122. MapReduce Execution Framework. Progressive build; the final state is summarized here. Components: Driver, InputFormat, Record Reader, Mapper, Reducer, OutputFormat, Writer, grouped into Mapper Processes and Reduce Processes. Flow: Input HDFS File - inputFile.txt (Block A, Block B, Block C); InputFormat calculates Input Split 1 to Input Split 4; Record Reader reads a split and passes <K,V> pairs to the Mapper; Mapper emits <K, V> pairs; Partition, Shuffle, Sort; Reducer passes <K,V> pairs to the Writer; Writer writes Output Data. Arrows labelled "Defines" mark which component configures another.
  • 124. BUMPER
  • 127. Topic 5: MapReduce – Hands On (Part – 2). Class 2 – Hadoop Distributed File System
  • 128. AGENDA: What is Big Data?; Hadoop Distributed File System; MapReduce; Understanding Hadoop Ecosystem; Setting up a Hadoop Cluster; HDFS – Hands On; MapReduce – Hands On
  • 129. Java MapReduce Programming. The "Hello World" of MapReduce: the Word Count program. IDE: Eclipse (Integrated Development Environment), https://www.eclipse.org/downloads/
  • 130. RECAP: Part two of the Java MapReduce program
  • 131. BUMPER
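
The word count walk-through on the slides (Splitting, Map, Shuffling, Reduce) can be followed in a few lines of plain Java. This is only a sketch of the data flow, not the Hadoop program the deck goes on to build: it uses no Hadoop classes, runs in a single JVM, and the class and method names (`WordCountFlowSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the word count data flow; no Hadoop classes are used.
class WordCountFlowSketch {

    // Map phase: turn one input record (a line of text) into <word, 1> pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all values by key; TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of counts collected for one key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map over every line, shuffles the combined map output, then reduces per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String line : lines) {
            mapOutput.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapOutput).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The three input records from the slide example.
        List<String> lines = List.of("King Queen King", "Minister King Soldier", "Queen Soldier King");
        System.out.println(wordCount(lines)); // {King=4, Minister=1, Queen=2, Soldier=2}
    }
}
```

In a real job the framework performs the grouping step and runs many map and reduce calls in parallel; only the bodies of `map` and `reduce` correspond to the "User Defined Logic" in the slides.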

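The Partition step in the execution framework slides decides which reducer receives each map output key. A minimal sketch, assuming hash partitioning with the same formula as Hadoop's default `HashPartitioner` (mask the sign bit of the key's hash code, then take it modulo the number of reducers). Here `String.hashCode()` stands in for the hash of a Hadoop key type, and `PartitionSketch`, `partitionFor` and `partition` are illustrative names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of hash partitioning; not a Hadoop Partitioner subclass.
class PartitionSketch {

    // Mask off the sign bit so the result is never negative, then take modulo.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Distribute keys into one bucket per reducer.
    static List<List<String>> partition(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Map output keys from the slide example, in emission order.
        List<String> mapOutputKeys = List.of(
                "King", "Queen", "King", "Minister", "King", "Soldier", "Queen", "Soldier", "King");
        List<List<String>> buckets = partition(mapOutputKeys, 2);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
    }
}
```

Because the partition depends only on the key, every occurrence of a given word lands on the same reducer, which is what lets a single reduce call see the complete list of counts for that word.
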
Editor's Notes

  1. Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.  
  2. Do we know the topic number for this?