RHive Tutorial – Installing Hadoop
This tutorial is for beginning users without much prior knowledge of Hadoop. It
gives a simple explanation of how to install Hadoop before installing Hive and RHive.

RHive depends on Hive, and Hive in turn depends on Hadoop.
Thus Hadoop and Hive must already be installed in order to install RHive.

The installation method introduced in this tutorial sets up a small Hadoop
environment for RHive.
This basic installation is useful for quickly building a small-scale
distributed environment using VMs or just a few servers.
For large, well-structured environments it may not be appropriate.

Installing Hadoop
Work Environment

The environment used in this tutorial is set up like the following:

   •   Server cluster environment: Cloud Service
   •   Number of servers: 4 virtual machines in total
   •   Server specs: virtual machine, 1 core, 1 GB main memory, 25 GB hard disk for
       the OS, 2 TB additional hard disk
   •   OS: CentOS 5
   •   Network: 10.x.x.x IP addresses

Pre-installation Checklist

Checking the root account, firewall, and SELinux

You must be able to connect to the servers prepared for the Hadoop installation
with the root account, or have sudo permission that grants root-level access.
Each server should also be free of special firewall or security settings.
If you are using a Linux system with such settings, you must have the authority
to change them or already know how to work with them.
If SELinux or a firewall is running with strict rules in place for security
purposes, you must manually open the Hadoop-related ports or configure ACLs
(Access Control Lists), or simply disable SELinux and the firewall altogether.
This tutorial installs Hadoop on isolated VMs with no external access. Since
they cannot be reached from outside, SELinux and the firewall are disabled
entirely.
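
If you choose to simply turn them off, a minimal sketch for CentOS 5 looks like the following (run it on every server; these exact commands are not part of the original tutorial, and the permanent SELinux change only applies after a reboot):

service iptables stop       # stop the firewall for the current session
chkconfig iptables off      # keep the firewall disabled after reboot
setenforce 0                # put SELinux into permissive mode immediately
# For a permanent change, set SELINUX=disabled in /etc/selinux/config and reboot.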

Check Server IP Address
You must know the IP addresses of the servers you will be using.
The servers used in this tutorial each have the following IP addresses:

10.1.1.1
10.1.1.2
10.1.1.3
10.1.1.4

This tutorial will use 10.1.1.1 as the Hadoop namenode,
and 10.1.1.2, 10.1.1.3, and 10.1.1.4 as Hadoop's job (worker) nodes.

Preliminary preparations before installing Hadoop

Setting hosts file

You need to edit each server's /etc/hosts file.
As you may already know, this file manually maps hostnames to IP addresses.
Doing this makes configuring Hadoop more convenient.

Connect to all four servers and add the following lines to each server's
/etc/hosts file.

10.1.1.1  node0
10.1.1.2  node1
10.1.1.3  node2
10.1.1.4  node3

node0 through node3 are arbitrary hostnames: any memorable names will do.
But keep in mind that changing them after Hadoop has been installed and run
is quite risky.
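
If you prefer to script this step, a small sketch like the following appends the mappings on a server (run it once on each of the four machines; this is just one way to do it and is not shown in the original):

cat >> /etc/hosts <<'EOF'
10.1.1.1  node0
10.1.1.2  node1
10.1.1.3  node2
10.1.1.4  node3
EOF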

Installing Java

As Hadoop is written in Java, a JVM is naturally required.
Java is often installed along with Linux, and even if it isn't, it can be
installed easily.
If your servers do not have Java installed, use the following command to
install it on all of them.

yum install java
Assigning JAVA_HOME environment variable

The JAVA_HOME environment variable must be set.
Assign the directory where the Java SDK or JRE is installed to JAVA_HOME; if
your OS is CentOS, you can use the following command to find it.

update-alternatives --display java

In the environment used in this tutorial, JAVA_HOME is
"/usr/lib/jvm/jre-1.6.0-openjdk.x86_64".
The JAVA_HOME path can vary depending on your environment and the installed
Java version, so you must find the exact JAVA_HOME for your servers.
For details, refer to your Linux distribution's documentation or another
document on installing Java.

Once you have found JAVA_HOME, register the environment variable in
/etc/profile, ~/.bashrc, or a similar file.

JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64

Of course, installing Java and setting JAVA_HOME must be done on all servers
in the same way.
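
A quick way to verify on each server that Java is present and JAVA_HOME is picked up (a simple check, not shown in the original):

java -version          # prints the installed JVM version
source /etc/profile    # reload the profile if you just edited it
echo $JAVA_HOME        # should print the path you registered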

Downloading Hadoop

Now we'll start installing Hadoop.
As Hadoop is written in Java, simply decompressing the downloaded archive
completes the installation.
Hadoop 1.0.0 is also packaged as rpm and deb, so you can install it with rpm
or dpkg, but since Hive does not yet support Hadoop 1.0.0, it is not wise to
use that version with Hive.

Before installing, Hadoop needs a directory to install into. In other words,
you must decide upon and create a suitable directory to decompress the archive
in, and it must be a location with sufficient disk space.
Once Hadoop starts up it uses a lot of space, writing log files and storing
HDFS data.
So check that the location where Hadoop will be installed has enough disk
space, and if a large additional hard disk is installed somewhere, check where
it is mounted before installing.
In this tutorial, a hard disk of at least 2TB is mounted at each server's
"/mnt", and a "/mnt/srv" directory was created beneath it to hold the Hadoop
installation.
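
To confirm where the additional disk is mounted and how much free space it has, a quick check with df is enough (a simple check, not shown in the original):

df -h          # list mounted filesystems and their free space
df -h /mnt     # check the mount point used in this tutorial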

It's good to use the same directory structure on all the other servers as well.

Make an arbitrary directory called srv, like below.

mkdir /mnt/srv
cd /mnt/srv

We will install Hadoop under the base directory chosen above.

Now we are going to download Hadoop from its official website.
This tutorial recommends using version 0.20.203.
You can download any Hadoop version from the following site:
http://www.apache.org/dyn/closer.cgi/hadoop/common/

The same version must be installed on all the servers. One way to do this is to
copy the downloaded file to every server.

Download the Hadoop archive from a mirror as shown below.

wget http://apache.tt.co.kr//hadoop/common/hadoop-0.20.203.0/hadoop-0.20.203.0rc1.tar.gz

You can also use whichever mirror site is closest to you.

Decompress the downloaded file.

tar xvfz hadoop-0.20.203.0rc1.tar.gz

Once you have downloaded it onto one server, you can use shell commands like
the following to create the same directory on the other servers and copy the
file to them in one pass.
If you are not comfortable with shell programming, simply do the same work
manually on every other server.
$ for I in `seq 3`; do ssh node$I 'mkdir /mnt/srv'; done
$ for I in `seq 3`; do scp hadoop*.gz node$I:/mnt/srv/; done
$ for I in `seq 3`; do ssh node$I 'cd /mnt/srv/; tar xvfz hadoop*.gz'; done


Making SSH Key

In order for the Hadoop namenode to control each node, you must create and
install a key with a null passphrase.
Hadoop connects from the namenode to each server to run the tasktracker or
datanode, and to do this it must be able to connect to each node without a
password.
This tutorial creates a key that allows connecting to all servers with the
root account.
With the command below, create a private/public key pair that does not ask
for a passphrase.

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Now register the public key in authorized_keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Now see if you can use the command below to connect to localhost via ssh
without entering a password.

ssh localhost

If you can log in without being asked for a password, it is done.
Now exit from the localhost session.

exit	
  

If you cannot connect, or you still see a password prompt despite having
created the keys as described above, you may need to check and change the
sshd settings.
The sshd configuration file is usually "/etc/ssh/sshd_config"; edit it with
any editor you are familiar with.
vi /etc/ssh/sshd_config

There are many configuration values in the file, but the items to focus on are
listed below.
If any of these lines are disabled (commented out with a leading #) or missing,
edit or add them as follows, then quit the editor. After saving the changes,
restart sshd (for example, with "service sshd restart") so that they take
effect.

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys

If you still cannot connect to localhost via ssh without being asked for a
password despite having modified the settings file, then consult the system
administrator or refer to relevant documents on configuring sshd.

Now the public key must be added to the other servers'
~/.ssh/authorized_keys.
Normally you would copy ~/.ssh/id_rsa.pub to each of the other servers and
then append it to their authorized_keys; for convenience, this tutorial copies
the key file to every server in one pass.
Copy it like below.


$ for I in `seq 3`; do scp ~/.ssh/id_rsa.pub node$I:~/.ssh/; done
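
The command above only places id_rsa.pub in each node's ~/.ssh directory; for passwordless login to work, the key still has to end up in each node's authorized_keys. A minimal sketch of one way to finish the job (not from the original; you will be asked for the password one last time per node):

for I in `seq 3`; do
  ssh node$I 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys; chmod 600 ~/.ssh/authorized_keys'
done

# Afterwards, confirm that passwordless login works from the namenode:
for I in `seq 3`; do ssh node$I hostname; done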
  


Fixing Hadoop Configurations

Once Hadoop is installed, its settings need to be configured.
Head over to the Hadoop conf directory.
This tutorial modifies four files: hadoop-env.sh, core-site.xml,
mapred-site.xml, and hdfs-site.xml.

Move to Hadoop conf Directory

First, go to the conf directory of the Hadoop installation.

cd /mnt/srv/hadoop-0.20.203.0/conf

Modify hadoop-env.sh

Open a text editor and modify hadoop-env.sh.
vi hadoop-env.sh

Look for the lines shown below and edit them to suit your environment.

export JAVA_HOME=/usr/java/default
export HADOOP_LOG_DIR=/mnt/srv/hadoopdata/data/logs

Set JAVA_HOME to the same value found earlier in this tutorial.
HADOOP_LOG_DIR is where Hadoop's logs will be saved, so choose a location with
sufficient space.
We will use the directory /mnt/srv/hadoopdata/data/logs.
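
Hadoop will normally create this directory itself, but if you want to create it ahead of time on every server, a small optional sketch (not in the original) is:

mkdir -p /mnt/srv/hadoopdata/data/logs
for I in `seq 3`; do ssh node$I 'mkdir -p /mnt/srv/hadoopdata/data/logs'; done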

Editing core-site.xml

Open core-site.xml with a text editor.

vi core-site.xml

In here, adjust hadoop.tmp.dir and fs.default.name to appropriate values.


<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node0:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/srv/hadoopdata/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>


Editing hdfs-site.xml
There is no need to edit hdfs-site.xml.
But if you do need to change something, you can open it and adjust its values
with a text editor, just as with core-site.xml.
Open hdfs-site.xml with a text editor.

vi hdfs-site.xml

Should you want to increase the number of files Hadoop will simultaneously
open, adjust the values like below:

<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
</configuration>

The above is optional and not obligatory.

Editing mapred-site.xml

Open mapred-site.xml with a text editor like vi.

vi mapred-site.xml

When you open the file, you may find contents like the following.
Edit the value of mapred.job.tracker to suit your environment.
Use the defaults for the rest, or customize them to your liking.

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>node0:9001</value>
  </property>
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048M</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>16</value>
  </property>
  <property>
    <name>mapred.task.timeout</name>
    <value>3600000</value>
  </property>
</configuration>
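
These files were edited on node0, but every node needs the same configuration. The original does not show this step; following the copy pattern used earlier, a sketch like this pushes the edited files to the other servers:

cd /mnt/srv/hadoop-0.20.203.0/conf
for I in `seq 3`; do scp hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml node$I:/mnt/srv/hadoop-0.20.203.0/conf/; done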
  

Activating Hadoop
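
The original jumps straight to checking the web UI and does not show the start-up commands themselves. Assuming the standard Hadoop 0.20 layout and scripts, starting the cluster from node0 would look roughly like the following sketch (listing node1 to node3 in conf/slaves is an assumption; adjust paths to your installation):

cd /mnt/srv/hadoop-0.20.203.0

# conf/slaves should list the worker nodes, one hostname per line:
#   node1
#   node2
#   node3

bin/hadoop namenode -format    # format HDFS once, before the very first start
bin/start-all.sh               # start the namenode, jobtracker, datanodes and tasktrackers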

Checking whether Hadoop is Running

After installing and starting Hadoop, you can use a web browser to open a page
that shows Hadoop's status.
It is normally served on port 50030.

http://node0:50030/

If you see Hadoop's state shown as "RUNNING" like below, then Hadoop is running
normally.
node0 Hadoop Map/Reduce Administration

Quick Links
State: RUNNING
Started: Thu Jan 05 17:24:18 EST 2012
Version: 0.20.203.0, r1099333
Compiled: Wed May 4 07:57:50 PDT 2011 by oom
Identifier: 201201051724


Naturally, you cannot reach the page above if the Hadoop namenode is behind a
firewall and port 50030 is not open.

Trying to Run MRbench

Hadoop provides several useful utilities by default.
Among them, the hadoop-test-* jar gives you an easy way to exercise a
Map/Reduce job.
As the Hadoop version used in this tutorial is 0.20.203.0, the Hadoop home
directory contains the file hadoop-test-0.20.203.0.jar.
You can check whether Hadoop's Map/Reduce is working with the following
command:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-test-0.20.203.0.jar mrbench

The results of executing the above command are as follows.

MRBenchmark.0.0.2
11/12/07 13:15:36 INFO mapred.MRBench: creating control file: 1 numLines, ASCENDING sortOrder
11/12/07 13:15:36 INFO mapred.MRBench: created control file: /benchmarks/MRBench/mr_input/input_-1026698718.txt
11/12/07 13:15:36 INFO mapred.MRBench: Running job 0: input=hdfs://node0:9000/benchmarks/MRBench/mr_input output=hdfs://node0:9000/benchmarks/MRBench/mr_output/output_1220591687
11/12/07 13:15:36 INFO mapred.FileInputFormat: Total input paths to process : 1
11/12/07 13:15:37 INFO mapred.JobClient: Running job: job_201112071314_0001
11/12/07 13:15:38 INFO mapred.JobClient:  map 0% reduce 0%
11/12/07 13:15:55 INFO mapred.JobClient:  map 50% reduce 0%
11/12/07 13:15:58 INFO mapred.JobClient:  map 100% reduce 0%
11/12/07 13:16:10 INFO mapred.JobClient:  map 100% reduce 100%
11/12/07 13:16:15 INFO mapred.JobClient: Job complete: job_201112071314_0001
11/12/07 13:16:15 INFO mapred.JobClient: Counters: 26
11/12/07 13:16:15 INFO mapred.JobClient:   Job Counters
11/12/07 13:16:15 INFO mapred.JobClient:     Launched reduce tasks=1
11/12/07 13:16:15 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=22701
11/12/07 13:16:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/12/07 13:16:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/12/07 13:16:15 INFO mapred.JobClient:     Launched map tasks=2
11/12/07 13:16:15 INFO mapred.JobClient:     Data-local map tasks=2
11/12/07 13:16:15 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=15000
11/12/07 13:16:15 INFO mapred.JobClient:   File Input Format Counters
11/12/07 13:16:15 INFO mapred.JobClient:     Bytes Read=4
11/12/07 13:16:15 INFO mapred.JobClient:   File Output Format Counters
11/12/07 13:16:15 INFO mapred.JobClient:     Bytes Written=3
11/12/07 13:16:15 INFO mapred.JobClient:   FileSystemCounters
11/12/07 13:16:15 INFO mapred.JobClient:     FILE_BYTES_READ=13
11/12/07 13:16:15 INFO mapred.JobClient:     HDFS_BYTES_READ=244
11/12/07 13:16:15 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63949
11/12/07 13:16:15 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=3
11/12/07 13:16:15 INFO mapred.JobClient:   Map-Reduce Framework
11/12/07 13:16:15 INFO mapred.JobClient:     Map output materialized bytes=19
11/12/07 13:16:15 INFO mapred.JobClient:     Map input records=1
11/12/07 13:16:15 INFO mapred.JobClient:     Reduce shuffle bytes=19
11/12/07 13:16:15 INFO mapred.JobClient:     Spilled Records=2
11/12/07 13:16:15 INFO mapred.JobClient:     Map output bytes=5
11/12/07 13:16:15 INFO mapred.JobClient:     Map input bytes=2
11/12/07 13:16:15 INFO mapred.JobClient:     Combine input records=0
11/12/07 13:16:15 INFO mapred.JobClient:     SPLIT_RAW_BYTES=240
11/12/07 13:16:15 INFO mapred.JobClient:     Reduce input records=1
11/12/07 13:16:15 INFO mapred.JobClient:     Reduce input groups=1
11/12/07 13:16:15 INFO mapred.JobClient:     Combine output records=0
11/12/07 13:16:15 INFO mapred.JobClient:     Reduce output records=1
11/12/07 13:16:15 INFO mapred.JobClient:     Map output records=1
DataLines    Maps    Reduces    AvgTime (milliseconds)
1            2       1          39487
  

If the job ran without errors, Hadoop is working properly.
Now you can write your own Map/Reduce implementations and use Hadoop to
perform distributed processing.
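
As a quick follow-up, the bundled examples jar can run another sample job; assuming it is named hadoop-examples-0.20.203.0.jar in the Hadoop home directory (an assumption, mirroring the test jar's naming), the pi estimator can be run like this:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-0.20.203.0.jar pi 4 1000    # 4 maps, 1000 samples per map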
