3. Topic 1
HDFS – Hands On (Part – 1)
Class 2 – Hadoop Distributed File System
4. AGENDA
• What is Big Data?
• Hadoop Distributed File System
• MapReduce
• Understanding Hadoop Ecosystem
• Setting up a Hadoop Cluster
• HDFS – Hands On
• MapReduce-Hands On
7. Command Syntax
HDFS – Hands On
hadoop fs -<command> <args>
hadoop fs -ls / (To list directory contents)
hadoop: This is the binary executable.
fs: Invokes the Hadoop file system shell, which here operates on HDFS.
<command>: Indicates the purpose of the statement; it is always preceded by a '-'.
<args>: Indicates the arguments that apply to the command.
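The syntax above can be sketched as a small helper that assembles the pieces in order. This is only an illustration: the function name `hadoop_fs_command` is hypothetical, and the helper builds the argument list without running Hadoop.

```python
from typing import List


def hadoop_fs_command(command: str, *args: str) -> List[str]:
    """Assemble `hadoop fs -<command> <args>` as an argument list."""
    if command.startswith("-"):
        raise ValueError("pass the command name without the leading '-'")
    # binary executable, file system shell, '-'-prefixed command, then its arguments
    return ["hadoop", "fs", f"-{command}", *args]


print(" ".join(hadoop_fs_command("ls", "/")))
```

A list in this form could be handed to `subprocess.run` on a machine where the `hadoop` binary is on the PATH.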
14. Where do DataNodes store data?
HDFS – Hands On
hadoop.tmp.dir = /tmp/hadoop
dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
= /tmp/hadoop/dfs/data
VERSION >> Java properties file
blk_********* >> Raw data of a block of a file
blk_******.meta >> Metadata of the block
How come there is a block when we have not loaded any file?
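How dfs.data.dir resolves to /tmp/hadoop/dfs/data can be sketched as plain property substitution. The snippet assumes the `${name}` reference syntax and uses a hypothetical `resolve` helper; it mimics the expansion only and does not read any real Hadoop configuration file.

```python
import re
from typing import Dict


def resolve(name: str, props: Dict[str, str]) -> str:
    """Return props[name] with every ${other.property} reference expanded."""
    def expand(match: "re.Match[str]") -> str:
        # Recursively resolve the referenced property (no cycle detection in this sketch).
        return resolve(match.group(1), props)

    return re.sub(r"\$\{([^}]+)\}", expand, props[name])


props = {
    "hadoop.tmp.dir": "/tmp/hadoop",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}
print(resolve("dfs.data.dir", props))
```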
18. fsck
HDFS – Hands On
Generates a summary report that lists the overall health of the filesystem.
19. fsck
HDFS – Hands On
Total size: Indicates the size of the directory (root directory in our case).
Does not account for replication.
Total dirs: Indicates the number of directories in HDFS
Total files: Indicates the number of files in HDFS
Total blocks: Indicates the number of blocks
Default replication factor:
Average replication factor:
Corrupt blocks:
Missing replicas:
Number of data nodes:
Number of racks:
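Because Total size does not account for replication, the raw disk space consumed across the DataNodes is larger than the reported figure. A rough sketch of the relationship, with hypothetical file sizes and replication factors and ignoring the small .meta files:

```python
from typing import Iterable, Tuple


def raw_bytes_used(files: Iterable[Tuple[int, int]]) -> int:
    """Sum size * replication over (size_in_bytes, replication_factor) pairs."""
    return sum(size * replication for size, replication in files)


MIB = 1024 * 1024
files = [(200 * MIB, 3), (1 * MIB, 3)]  # hypothetical sizes and replication factors
logical = sum(size for size, _ in files)  # what "Total size" would count
print(logical // MIB, raw_bytes_used(files) // MIB)
```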
20. Edit .bashrc
HDFS – Hands On
Navigate to the home directory.
cd
List hidden files.
ls -a
Edit the .bashrc file.
vi .bashrc
Update HADOOP paths using ‘export’ command.
export HADOOP_CONF=/home/hduser/hadoop/conf
export HADOOP_PREFIX=/home/hduser/hadoop
# Add Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_PREFIX/bin
Execute the updated contents of the .bashrc file.
source ~/.bashrc
21. copyFromLocal
HDFS – Hands On
Copies a file from the local file system to HDFS.
hadoop fs -copyFromLocal <Path to source file on Local File System> <Target path in HDFS>
hadoop fs -copyFromLocal NOTICE.txt noticehdfs.txt
22. copyFromLocal
HDFS – Hands On
The copyFromLocal command internally results in:
the file being split into multiple blocks.
the client contacting the NameNode to find out where each block should be copied in the cluster.
replication of the blocks to the nodes assigned by the NameNode.
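The three steps above can be imitated with a toy planner. Everything here is an assumption made for illustration: the `plan_copy` name, the DataNode names, and above all the round-robin placement, which is not the policy a real NameNode applies.

```python
from typing import Dict, List


def plan_copy(file_size: int, block_size: int, datanodes: List[str],
              replication: int) -> Dict[int, List[str]]:
    """Map each block index to the DataNodes that would hold its replicas.

    Round-robin placement is a stand-in chosen for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    num_blocks = -(-file_size // block_size)  # ceiling division: split into blocks
    plan: Dict[int, List[str]] = {}
    for block in range(num_blocks):
        # "Ask the NameNode" where this block and its replicas should go.
        plan[block] = [datanodes[(block + r) % len(datanodes)]
                       for r in range(replication)]
    return plan


MIB = 1024 * 1024
for block, nodes in plan_copy(200 * MIB, 64 * MIB, ["dn1", "dn2", "dn3"], 2).items():
    print(block, nodes)
```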
28. Topic 2
HDFS – Hands On (Part – 2)
Class 2 – Hadoop Distributed File System
30. Load a file larger than the block size
HDFS – Hands On
Load a 200 MB file and see how many blocks were created.
Command to generate a 200 MB dummy file.
dd if=/dev/zero of=file.txt count=1024 bs=204800
hadoop fs -copyFromLocal file.txt file.txt
cd /tmp/hadoop/dfs/data/current
ls -lrt
31. Load a file larger than the block size
HDFS – Hands On
Block 1 = 64 MB
Block 2 = 64 MB
Block 3 = 8 MB
Block 4 = 64 MB
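The four block sizes follow from simple arithmetic: the dd command writes 1024 x 204800 bytes (200 MB), and at a 64 MB block size that is three full blocks plus an 8 MB remainder, in whatever order the directory listing shows them. A minimal sketch, assuming the 64 MB block size shown on this slide (the helper name `block_sizes` is hypothetical):

```python
from typing import List


def block_sizes(file_size: int, block_size: int) -> List[int]:
    """Return the size of each block a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


MIB = 1024 * 1024
file_size = 1024 * 204800  # count * bs from the dd command
print([size // MIB for size in block_sizes(file_size, 64 * MIB)])
```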
32. fsck
HDFS – Hands On
fsck after loading 2 additional files.
Total size has increased.
Total dirs: 7. Additions - /user and /user/hduser directories.
Total files: 3. Additions - 2 newly loaded files.
Total blocks: 6. Additions - 1 block of the 1st file and 4 blocks of the 2nd file.
33. cat
HDFS – Hands On
Displays the contents of a file on the command prompt.
hadoop fs -cat <Path of file in HDFS>
hadoop fs -cat noticehdfs.txt
34. copyToLocal
HDFS – Hands On
Copies a file from HDFS to the local file system.
hadoop fs -copyToLocal <Path of file in HDFS> <Path of file in Local File System>
hadoop fs -copyToLocal noticehdfs.txt noticelocal.txt
35. mkdir
HDFS – Hands On
Creates a directory inside HDFS.
HDFS paths without a leading '/' are relative to the current user's home directory.
Creates a directory in the current user's home directory
hadoop fs -mkdir newdir
Creates a new directory under root
hadoop fs -mkdir /newdir
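The difference between the two mkdir calls comes down to how the path is resolved. A sketch of that rule, assuming the home directory is /user/<user name> (as with /user/hduser in this course's setup); `resolve_hdfs_path` is a hypothetical helper, not part of Hadoop:

```python
import posixpath


def resolve_hdfs_path(path: str, user: str) -> str:
    """Resolve a path relative to /user/<user> unless it is already absolute."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join("/user", user, path))


print(resolve_hdfs_path("newdir", "hduser"))   # relative path
print(resolve_hdfs_path("/newdir", "hduser"))  # absolute path
```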
36. rm
HDFS – Hands On
Removes file(s).
hadoop fs -rm <File Name>
Removes files and empty directories.
hadoop fs -rm noticehdfs.txt
37. Trash feature
HDFS – Hands On
Prevents accidental deletion of files and directories.
Disabled by default.
To enable it, configure the fs.trash.interval property in the core-site.xml file.
44. NameNode Web Interface
HDFS – Hands On
HDFS Web Interface URL.
http://<namenode_host>:50070/
From the Virtual Machine:
http://localhost:50070/
From outside the Virtual Machine:
http://<IP Address of VM or Hostname of VM>:50070/
Example: http://192.168.234.135:50070/
45. NameNode Web Interface
HDFS – Hands On
Server Name and Port
Last start time of the NameNode
Hadoop Version, followed by subversion source code repository
To browse the files in HDFS
View NameNode log files
Number of files, directories and blocks. Heap memory utilized/available.
Storage capacity of machines in the cluster
How much space utilized in HDFS
Space utilized by O/S, Applications etc.
Amount of space available on HDFS
How many blocks have fewer replicas than the Replication Factor
Nodes that are active and in contact with NameNode
Nodes that are NOT in contact with NameNode
Nodes administratively removed from the cluster
50. Topic 4
Class 2 – Hadoop Distributed File System
MapReduce – Hands On (Part – 1)
53. How does MapReduce work?
MapReduce
Mapping Phase: Map Input List -> Mapper -> Map Output List
Reducing Phase: Reduce Input List -> Reducer -> Reduce Output List
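The two phases can be illustrated with ordinary list processing. This is a sketch of the idea only, not Hadoop code: the function names are hypothetical, the mapper and reducer shown are arbitrary examples, and the reducing phase here folds its input list down to a single value.

```python
from functools import reduce
from typing import Callable, List


def mapping_phase(mapper: Callable[[str], str], map_input: List[str]) -> List[str]:
    """Apply the mapper to each element independently, giving the map output list."""
    return [mapper(element) for element in map_input]


def reducing_phase(reducer: Callable[[int, int], int], reduce_input: List[int]) -> int:
    """Fold the whole reduce input list into one aggregate value."""
    return reduce(reducer, reduce_input)


print(mapping_phase(str.upper, ["king", "queen"]))
print(reducing_phase(lambda total, value: total + value, [1, 1, 1, 1]))
```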
62. Hadoop MapReduce
MapReduce
Input: King Queen King / Minister King Soldier / Queen Soldier King
Splitting: <1, King Queen King>, <2, Minister King Soldier>, <3, Queen Soldier King>
Map: <King, 1> <Queen, 1> <King, 1>; <Minister, 1> <King, 1> <Soldier, 1>; <Queen, 1> <Soldier, 1> <King, 1>
Map Output: <King, 1> <King, 1> <King, 1> <King, 1> <Minister, 1> <Queen, 1> <Queen, 1> <Soldier, 1> <Soldier, 1>
Shuffling: <King, (1,1,1,1)>, <Minister, 1>, <Queen, (1,1)>, <Soldier, (1,1)>
Reduce, Result: <King, 4>, <Minister, 1>, <Queen, 2>, <Soldier, 2>
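The word-count flow on this slide can be reproduced in a few lines of plain Python. This is a single-process sketch of the data flow, not a Hadoop job; the three function names are hypothetical, and the input is the three-line text from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(records: Iterable[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """Emit <word, 1> for every word of every <line number, line> record."""
    return [(word, 1) for _, line in records for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key and order the keys, e.g. King -> [1, 1, 1, 1]."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the list of ones for each word."""
    return {key: sum(values) for key, values in grouped.items()}


splits = [(1, "King Queen King"), (2, "Minister King Soldier"), (3, "Queen Soldier King")]
print(reduce_phase(shuffle_phase(map_phase(splits))))
```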
71. Hadoop MapReduce – Roles: User vs. Framework
MapReduce
[Figure: the word-count data flow (Input, Splitting, Map, Shuffling, Reduce, Result), annotated with the following steps]
Load data into HDFS
Specify Path & Input Format
Create 'Input Splits'
Create individual Records
User Defined Logic (Map)
User Defined Logic (Reduce)
Specify Path & Output format
Replication, Rack Awareness etc.
97. MapReduce Execution Framework
MapReduce
[Figure: execution flow of a MapReduce job]
Components: Driver, InputFormat, Input HDFS File - inputFile.txt (Block A, Block B, Block C), Input Splits 1 to 4, Record Reader, Mapper, Partition, Shuffle, Sort, Reducer, OutputFormat, Writer, Output Data
Arrow labels: Defines, Calculates, Reads, Passes <K,V> pairs, Writes
Mapper Process: the Record Reader reads an Input Split and passes <K,V> pairs to the Mapper, which emits <K, V> pairs.
Partition, Shuffle, Sort
Reduce Process: the Reducer passes <K,V> pairs to the Writer, which writes the Output Data.
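The Partition, Shuffle, and Sort steps between the Mapper and the Reducer can be sketched as follows. The sketch makes several assumptions: the function names are hypothetical, CRC32 stands in for the key's hash code, and everything runs in one process rather than across nodes.

```python
import zlib
from typing import Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer for a key; CRC32 is a stand-in for the key's hash code."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]],
                     num_reducers: int) -> List[List[Tuple[str, int]]]:
    """Route each <K,V> pair to its reducer, then sort each reducer's input by key."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return [sorted(bucket) for bucket in buckets]


mapper_output = [("King", 1), ("Queen", 1), ("King", 1), ("Minister", 1), ("Soldier", 1)]
for reducer_id, reducer_input in enumerate(shuffle_and_sort(mapper_output, 2)):
    print(reducer_id, reducer_input)
```

Because the partition depends only on the key, every pair with the same key reaches the same reducer, which is what lets a reducer see all values for that key.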
127. Topic 5
Class 2 – Hadoop Distributed File System
MapReduce – Hands On (Part – 2)
Editor's Notes
Welcome to the Big Data Course, jointly presented by Jigsaw Academy and Wiley. Through this course, we hope to create a new international breed of versatile Big Data analysts.