4. If you use PuTTY to access your Linux
box remotely, install OpenSSH by running
the command below; this also makes it
easier to configure SSH access in a later
part of the installation:
sudo apt-get install openssh-server
8. Adding a dedicated Hadoop
system user.
a. Add the group:
sudo addgroup hadoop
b. Create a user and add it to the
group:
sudo adduser --ingroup hadoop hduser
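The SSH access mentioned in step 4 is configured for this new account later on; as a sketch of what that typically involves for a single-node setup (run as hduser, e.g. after `su - hduser`; the empty passphrase is the usual single-node choice, and the key path is the OpenSSH default):

```shell
# Create a passphrase-less RSA key for hduser and authorise it for
# logins to localhost (Hadoop's start scripts ssh into localhost).
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Afterwards, `ssh localhost` should log in without prompting for a password.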
13. i. Run the following command to download
Hadoop version 2.2.0:
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
ii. Unpack the compressed Hadoop archive with
this command:
tar -xvzf hadoop-2.2.0.tar.gz
iii. Rename the hadoop-2.2.0 directory to hadoop
with the following command:
mv hadoop-2.2.0 hadoop
14. iv. Move the hadoop package to the installation
directory of your choice (here /usr/local):
sudo mv hadoop /usr/local/
v. Change the owner of all the files
to the hduser user and the hadoop group with
these commands:
cd /usr/local/
sudo chown -R hduser:hadoop hadoop
16. The following files need to be edited to
configure the single-node Hadoop cluster:
a. yarn-site.xml
b. core-site.xml
c. mapred-site.xml
d. hdfs-site.xml
e. Update $HOME/.bashrc
The configuration files live in the Hadoop
configuration directory, which you can reach with:
cd /usr/local/hadoop/etc/hadoop
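For item (e), a minimal sketch of the typical $HOME/.bashrc additions, assuming the /usr/local/hadoop location from step 14 (the JAVA_HOME path is only an example of a common Ubuntu OpenJDK location; point it at your own JDK):

```shell
# Append the Hadoop environment variables to ~/.bashrc so that the
# hadoop binaries and daemon scripts are on hduser's PATH.
cat >> "$HOME/.bashrc" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
```

Open a new shell (or run `source ~/.bashrc`) for the changes to take effect.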
25. i. The first step in starting up your Hadoop
installation is formatting the Hadoop
filesystem, which is implemented on top of
the local filesystem of your cluster. You
only need to do this the first time you set up a
Hadoop cluster. Do not format a running
Hadoop filesystem, as you will lose all the
data currently in the cluster (in HDFS).
hadoop namenode -format
(In Hadoop 2.x this command is deprecated in
favour of hdfs namenode -format, which
performs the same operation.)
26. ii. Start the Hadoop daemons by running the
following commands:
Name node:
hadoop-daemon.sh start namenode
Data node:
hadoop-daemon.sh start datanode
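Once the daemons are started they can be verified with `jps`, the JDK tool that lists running Java processes; NameNode and DataNode should both appear in its output. A small check along those lines (the `daemons_running` helper is hypothetical, not part of Hadoop):

```shell
# Hypothetical helper: given jps-style output ("pid Name" per line),
# succeed only if every named daemon appears in it.
daemons_running() {
  out=$1; shift
  for d in "$@"; do
    printf '%s\n' "$out" | grep -qw "$d" || return 1
  done
}
# After the start commands above, something like:
#   daemons_running "$(jps)" NameNode DataNode && echo "HDFS daemons up"
```

If a daemon is missing, its log under $HADOOP_HOME/logs is the place to look.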