Hadoop Cluster Configuration on AWS EC2

-----------------------------------------------------------------------------------------------------------
Launch instances on AWS EC2: one master and 10 slaves.

ec2-50-17-21-209.compute-1.amazonaws.com master
ec2-54-242-251-124.compute-1.amazonaws.com slave1
ec2-23-23-17-15.compute-1.amazonaws.com slave2
ec2-50-19-79-241.compute-1.amazonaws.com slave3
ec2-50-16-49-229.compute-1.amazonaws.com slave4
ec2-174-129-99-84.compute-1.amazonaws.com slave5
ec2-50-16-105-188.compute-1.amazonaws.com slave6
ec2-174-129-92-105.compute-1.amazonaws.com slave7
ec2-54-242-20-144.compute-1.amazonaws.com slave8
ec2-54-243-24-10.compute-1.amazonaws.com slave9
ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
     • Designate one instance as the master and the remaining 10 as slaves
----------------------------------------------------------------------------------------------------------------------------
     • Make sure SSH works from the master to all slaves (a key-setup sketch follows)
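
         A minimal sketch for enabling passwordless SSH from the master to the slaves, assuming the ec2-user account (the key path and loop are illustrative, not from the original; use the public DNS names instead of slaveN until /etc/hosts is set up in the next step):

         # on the master: generate a key pair if one does not exist yet
         ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

         # append the master's public key to each slave's authorized_keys
         # (requires an initial way in, e.g. the EC2 key pair used at launch)
         for i in $(seq 1 10); do
             cat ~/.ssh/id_rsa.pub | ssh ec2-user@slave$i 'cat >> ~/.ssh/authorized_keys'
         done

         # verify
         ssh slave1 hostname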
----------------------------------------------------------------------------------------------------------------------------
     • On the master, add each instance's private IP, its public DNS name, and its alias (master / slaveN) to /etc/hosts
----------------------------------------------------------------------------------------------------------------------------
     • The master's /etc/hosts file looks like this:

        127.0.0.1 localhost localhost.localdomain
        10.155.245.153       ec2-50-17-21-209.compute-1.amazonaws.com master
        10.155.244.83        ec2-54-242-251-124.compute-1.amazonaws.com slave1
        10.155.245.185       ec2-23-23-17-15.compute-1.amazonaws.com slave2
        10.155.244.208       ec2-50-19-79-241.compute-1.amazonaws.com slave3
        10.155.244.246       ec2-50-16-49-229.compute-1.amazonaws.com slave4
        10.155.245.217       ec2-174-129-99-84.compute-1.amazonaws.com slave5
        10.155.244.177       ec2-50-16-105-188.compute-1.amazonaws.com slave6
        10.155.245.152       ec2-174-129-92-105.compute-1.amazonaws.com slave7
        10.155.244.145       ec2-54-242-20-144.compute-1.amazonaws.com slave8
        10.155.244.71        ec2-54-243-24-10.compute-1.amazonaws.com slave9
        10.155.244.46        ec2-204-236-205-227.compute-1.amazonaws.com slave10

----------------------------------------------------------------------------------------------------------------------------
     • Each slave's /etc/hosts file looks like this (remove the 127.0.0.1 line on all slaves):

        10.155.245.153             ec2-50-17-21-209.compute-1.amazonaws.com master
        10.155.244.83              ec2-54-242-251-124.compute-1.amazonaws.com slave1
        10.155.245.185             ec2-23-23-17-15.compute-1.amazonaws.com slave2
        10.155.244.208             ec2-50-19-79-241.compute-1.amazonaws.com slave3
        10.155.244.246             ec2-50-16-49-229.compute-1.amazonaws.com slave4
        10.155.245.217             ec2-174-129-99-84.compute-1.amazonaws.com slave5
        10.155.244.177             ec2-50-16-105-188.compute-1.amazonaws.com slave6
        10.155.245.152             ec2-174-129-92-105.compute-1.amazonaws.com slave7
        10.155.244.145             ec2-54-242-20-144.compute-1.amazonaws.com slave8
        10.155.244.71              ec2-54-243-24-10.compute-1.amazonaws.com slave9
        10.155.244.46              ec2-204-236-205-227.compute-1.amazonaws.com slave10
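
        To push the edited host entries from the master to every slave (stripping the 127.0.0.1 line as noted above), a minimal sketch assuming passwordless SSH as ec2-user and sudo rights on the slaves:

        for i in $(seq 1 10); do
            grep -v '^127\.0\.0\.1' /etc/hosts | ssh ec2-user@slave$i 'sudo tee /etc/hosts > /dev/null'
        done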

---------------------------------------------------------------------------------------------------------------------------
     • Download a Hadoop release tarball from the Apache Hadoop releases page and unpack it on the master
         (e.g. /usr/local/hadoop-1.0.4), for example as sketched below.
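
         One way to fetch and unpack the 1.0.4 release on the master; the tarball URL below is the Apache archive path and should be verified before use:

         cd /usr/local
         sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
         sudo tar -xzf hadoop-1.0.4.tar.gz
         sudo chown -R ec2-user:ec2-user /usr/local/hadoop-1.0.4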
----------------------------------------------------------------------------------------------------------------------------
     • Open the hadoop-env.sh file in the conf/ folder (hadoop-1.0.4/conf/)
----------------------------------------------------------------------------------------------------------------------------
     • Set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH, HADOOP_OPTS,
         and HADOOP_HEAPSIZE:

         export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
         export HADOOP_HOME=/usr/local/hadoop-1.0.4/
         export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
         export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
         export HADOOP_HEAPSIZE=400000   # heap size is given in MB
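
         A quick sanity check after editing hadoop-env.sh, assuming the paths above:

         source /usr/local/hadoop-1.0.4/conf/hadoop-env.sh
         $JAVA_HOME/bin/java -version
         /usr/local/hadoop-1.0.4/bin/hadoop version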
----------------------------------------------------------------------------------------------------------------------------
     • Open the hdfs-site.xml file and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.log.dir</name>
        <value>/media/ephemeral0/logs</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/media/ephemeral0/tmp-${user.name}</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/media/ephemeral0/data-${user.name}</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/media/ephemeral0/name-${user.name}</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Default block replication.
        The actual number of replicas can be specified when the file is created.
        The default is used if replication is not specified at create time.
        </description>
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>536870912</value>
        <description>Default HDFS block size in bytes (536870912 = 512 MB).</description>
    </property>
</configuration>
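
         Before formatting HDFS, the directories referenced above must exist and be writable by the Hadoop user; a sketch, assuming the ephemeral disk is mounted at /media/ephemeral0 and the daemons run as ec2-user:

         sudo mkdir -p /media/ephemeral0/logs
         sudo chown -R ec2-user:ec2-user /media/ephemeral0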

----------------------------------------------------------------------------------------------------------------------------
     • Open the mapred-site.xml file and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.log.dir</name>
        <value>/media/ephemeral0/logs</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx400m</value>
    </property>
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>60000</value>
    </property>
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>/media/ephemeral0/system-${user.name}</value>
        <description>System directory used to run map and reduce tasks.</description>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>
    <property>
        <name>mapred.create.symlink</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.child.ulimit</name>
        <value>unlimited</value>
    </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
     • Open the core-site.xml file and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
      <property>
            <name>dfs.data.dir</name>
            <value>/media/ephemeral0/data-${user.name}</value>
      </property>
      <property>
            <name>hadoop.tmp.dir</name>
            <value>/media/ephemeral0/tmp-${user.name}</value>
      </property>
      <property>
            <name>dfs.name.dir</name>
            <value>/media/ephemeral0/name-${user.name}</value>
      </property>
      <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
      </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
     • Open the masters file and add the following entry:
         master
----------------------------------------------------------------------------------------------------------------------------
     • Open the slaves file and add the following entries:

        slave1
        slave2
        slave3
        slave4
        slave5
        slave6
        slave7
        slave8
        slave9
        slave10
----------------------------------------------------------------------------------------------------------------------------

    •    Give ownership of the /media folders (all folders used for Hadoop) to ec2-user on all slaves,
         for example as shown below.
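
         A sketch of the ownership change run from the master against each slave, using the ec2-user account and the /media/ephemeral0 path from the configuration above:

         for i in $(seq 1 10); do
             ssh ec2-user@slave$i 'sudo chown -R ec2-user:ec2-user /media/ephemeral0'
         done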
----------------------------------------------------------------------------------------------------------------------------
     • From the master, copy the full hadoop-1.0.4 folder to each slave, e.g.:
         scp -r /usr/local/hadoop-1.0.4 ec2-50-17-21-209.compute-1.amazonaws.com:/usr/local/hadoop-1.0.4
----------------------------------------------------------------------------------------------------------------------------
     • Repeat the copy from the master for every slave (a loop is sketched below).
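
         A possible loop that pushes the installation to all ten slaves at once, assuming the slaveN aliases from /etc/hosts and passwordless SSH as ec2-user (/usr/local must be writable by ec2-user on the slaves, otherwise copy to a temporary directory and move it with sudo):

         for i in $(seq 1 10); do
             scp -r /usr/local/hadoop-1.0.4 ec2-user@slave$i:/usr/local/
         done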
----------------------------------------------------------------------------------------------------------------------------
     • Open ports 50000-50100 in the instances' security group in the AWS console.
         Then format the NameNode from the master (hadoop namenode -format)
         and start the cluster with start-all.sh, verifying as sketched below.
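
         The format/start commands and a quick health check from the master, assuming the installation path used above:

         /usr/local/hadoop-1.0.4/bin/hadoop namenode -format
         /usr/local/hadoop-1.0.4/bin/start-all.sh

         # list the running daemons and the datanodes that joined
         jps
         /usr/local/hadoop-1.0.4/bin/hadoop dfsadmin -report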
----------------------------------------------------------------------------------------------------------------------------
