Set up and run a Hadoop Distributed File System (HDFS) example with Hadoop 2.2
Setup Hadoop 2.x (2.2.0) on Ubuntu
In this tutorial I am going to guide you through setting up a Hadoop 2.2.0 environment on
Ubuntu.
Prerequisites
$ sudo apt-get install openjdk-7-jdk
$ java -version
java version "1.7.0_25"
OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
$ cd /usr/lib/jvm
$ sudo ln -s java-7-openjdk-amd64 jdk
$ sudo apt-get install openssh-server
Add Hadoop Group and User
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
$ sudo adduser hduser sudo
After the user is created, log out and log back into Ubuntu as hduser
Setup SSH Keys (for passwordless login)
$ ssh-keygen -t rsa -P ''
...
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost
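The steps below assume the Hadoop 2.2.0 tarball has already been unpacked to /usr/local/hadoop. If you haven't done that yet, here is a sketch; the download URL points at the Apache archive and is an assumption, so verify it before use. The download and extraction steps are shown commented out because they need network access and root privileges.

```shell
# Sketch: fetch and unpack Hadoop 2.2.0 (the archive URL is an assumption;
# verify it against the Apache mirrors before running).
HADOOP_VERSION=2.2.0
HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
echo "$HADOOP_URL"
# wget "$HADOOP_URL"
# sudo tar xzf "hadoop-${HADOOP_VERSION}.tar.gz" -C /usr/local
# sudo mv "/usr/local/hadoop-${HADOOP_VERSION}" /usr/local/hadoop
# sudo chown -R hduser:hadoop /usr/local/hadoop
```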
Setup Hadoop Environment Variables
$ cd ~
$ vi .bashrc
Paste the following at the end of the file:
#Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
###end of paste
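Instead of re-logging in, you can also apply the variables to the current shell and sanity-check them; a minimal sketch using the same paths as the paste above:

```shell
# Apply the same variables as .bashrc in the current shell and verify them.
# (Equivalent to: source ~/.bashrc)
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
echo "$HADOOP_INSTALL"
```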
$ cd /usr/local/hadoop/etc/hadoop
$ vi hadoop-env.sh
#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/
Log out and back into Ubuntu as hduser (so the new variables take effect) and check the Hadoop version
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
At this point, Hadoop is installed.
Configure Hadoop
$ cd /usr/local/hadoop/etc/hadoop
$ vi core-site.xml
#Paste the following between the <configuration> tags
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
$ vi yarn-site.xml
#Paste the following between the <configuration> tags
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml
#Paste the following between the <configuration> tags
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
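The three config edits above can also be applied non-interactively with heredocs. This sketch writes into a scratch directory so it is safe to try; point conf_dir at /usr/local/hadoop/etc/hadoop to apply the changes for real.

```shell
# Sketch: generate core-site.xml, yarn-site.xml and mapred-site.xml with heredocs.
# conf_dir is a scratch directory here; use /usr/local/hadoop/etc/hadoop for real.
conf_dir=$(mktemp -d)

cat > "$conf_dir/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > "$conf_dir/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF

cat > "$conf_dir/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

grep -h "<name>" "$conf_dir"/*.xml
```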
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /usr/local/hadoop/etc/hadoop
$ vi hdfs-site.xml
Paste the following between the <configuration> tags
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
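The storage directories and hdfs-site.xml above can be scripted the same way. This sketch uses a scratch base directory so it has no side effects; the tutorial itself uses /home/hduser and the real config directory.

```shell
# Sketch: create the NameNode/DataNode directories and write hdfs-site.xml.
# base is a scratch directory here; the tutorial uses /home/hduser.
base=$(mktemp -d)
mkdir -p "$base/mydata/hdfs/namenode" "$base/mydata/hdfs/datanode"

cat > "$base/hdfs-site.xml" <<EOF
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$base/mydata/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$base/mydata/hdfs/datanode</value>
  </property>
</configuration>
EOF

ls "$base/mydata/hdfs"
```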
Format Namenode
hduser@ubuntu40:~$ hdfs namenode -format
Start Hadoop Service
$ start-dfs.sh
....
$ start-yarn.sh
....
hduser@ubuntu40:~$ jps
If everything is successful, you should see the following services running
2583 DataNode
2970 ResourceManager
3461 Jps
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode
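A quick way to confirm all five daemons came up is to grep the jps listing. The check_daemons function and the sample listing below are illustrative assumptions, not Hadoop tooling; in practice you would feed it the real output of jps.

```shell
# Sketch: verify that the five expected daemons appear in a jps listing.
# check_daemons and the hard-coded sample output are illustrative only.
check_daemons() {
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode"
    echo "$1" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}

# In practice: check_daemons "$(jps)"
check_daemons "2583 DataNode
2970 ResourceManager
3177 NodeManager
2361 NameNode
2840 SecondaryNameNode"
```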
Run Hadoop Example
hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ hadoop jar
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at
/0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process :
2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated.
Instead, use mapreduce.job.user.name
...
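What the "pi 2 5" job computes is a Monte Carlo estimate of pi: 2 maps, each drawing 5 sample points in the unit square and counting how many fall inside the quarter circle. This awk one-liner mirrors the idea locally with many more samples; it is only a sketch of the principle, since the real example distributes the sampling across maps and uses a quasi-random (Halton) sequence rather than rand().

```shell
# Local Monte Carlo sketch of the pi example: sample points in the unit
# square and estimate pi as 4 * (fraction inside the quarter circle).
est=$(awk 'BEGIN { srand(42); n = 100000; hits = 0
  for (i = 0; i < n; i++) { x = rand(); y = rand(); if (x*x + y*y <= 1) hits++ }
  printf "%.3f", 4 * hits / n }')
echo "$est"
```

With 100,000 samples the estimate lands close to 3.14; the Hadoop job with only 2x5 samples is far coarser, which is why its reported value is rough.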
Hadoop FileSystem (HDFS) Tutorial
In this tutorial I will show some common commands for HDFS operations.
If you don't have Hadoop set up on your Linux machine, you can follow the Hadoop Setup Guide above.
Log into Linux; "hduser" is the login used in the following examples.
Start Hadoop if it's not running
$ start-dfs.sh
....
$ start-yarn.sh
Create someFile.txt in your home directory
hduser@ubuntu:~$ vi someFile.txt
Paste any text you want into the file and save it.
Create Home Directory In HDFS (If it doesn't exist)
hduser@ubuntu:~$ hadoop fs -mkdir -p /user/hduser
Copy the file someFile.txt from the local disk to the user's home directory in HDFS.
hduser@ubuntu:~$ hadoop fs -copyFromLocal someFile.txt someFile.txt
Get a directory listing of the user’s home directory in HDFS
hduser@ubuntu:~$ hadoop fs -ls
Found 1 items
-rw-r--r--   1 hduser supergroup          5 2013-10-27 17:57 someFile.txt
Display the contents of the HDFS file /user/hduser/someFile.txt
hduser@ubuntu:~$ hadoop fs -cat /user/hduser/someFile.txt
Get a directory listing of the HDFS root directory
hduser@ubuntu:~$ hadoop fs -ls /
Copy the file back to the local disk as someFile2.txt
hduser@ubuntu:~$ hadoop fs -copyToLocal /user/hduser/someFile.txt someFile2.txt
Delete the file from HDFS
hduser@ubuntu:~$ hadoop fs -rm someFile.txt
Deleted someFile.txt