Hadoop Cluster Configuration on AWS EC2

-----------------------------------------------------------------------------------------------------------
Launch instances on AWS EC2: one master and 10 slaves.

ec2-50-17-21-209.compute-1.amazonaws.com master
ec2-54-242-251-124.compute-1.amazonaws.com slave1
ec2-23-23-17-15.compute-1.amazonaws.com slave2
ec2-50-19-79-241.compute-1.amazonaws.com slave3
ec2-50-16-49-229.compute-1.amazonaws.com slave4
ec2-174-129-99-84.compute-1.amazonaws.com slave5
ec2-50-16-105-188.compute-1.amazonaws.com slave6
ec2-174-129-92-105.compute-1.amazonaws.com slave7
ec2-54-242-20-144.compute-1.amazonaws.com slave8
ec2-54-243-24-10.compute-1.amazonaws.com slave9
ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
     • Designate one instance as the master and the remaining 10 as slaves
----------------------------------------------------------------------------------------------------------------------------
     • Make sure SSH works from the master to all slaves (a key-setup sketch follows)
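
         A minimal sketch for enabling passwordless SSH from the master to the slaves, assuming the ec2-user account (the key path and loop are illustrative, not from the original; use the public DNS names instead of slaveN until /etc/hosts is set up in the next step):

         # on the master: generate a key pair if one does not exist yet
         ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

         # append the master's public key to each slave's authorized_keys
         # (requires an initial way in, e.g. the EC2 key pair used at launch)
         for i in $(seq 1 10); do
             cat ~/.ssh/id_rsa.pub | ssh ec2-user@slave$i 'cat >> ~/.ssh/authorized_keys'
         done

         # verify
         ssh slave1 hostname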
----------------------------------------------------------------------------------------------------------------------------
     • On the master, add each instance's private IP, its public DNS name, and its alias (master / slaveN) to /etc/hosts
----------------------------------------------------------------------------------------------------------------------------
     • The master's /etc/hosts file looks like this:

        127.0.0.1 localhost localhost.localdomain
        10.155.245.153       ec2-50-17-21-209.compute-1.amazonaws.com master
        10.155.244.83        ec2-54-242-251-124.compute-1.amazonaws.com slave1
        10.155.245.185       ec2-23-23-17-15.compute-1.amazonaws.com slave2
        10.155.244.208       ec2-50-19-79-241.compute-1.amazonaws.com slave3
        10.155.244.246       ec2-50-16-49-229.compute-1.amazonaws.com slave4
        10.155.245.217       ec2-174-129-99-84.compute-1.amazonaws.com slave5
        10.155.244.177       ec2-50-16-105-188.compute-1.amazonaws.com slave6
        10.155.245.152       ec2-174-129-92-105.compute-1.amazonaws.com slave7
        10.155.244.145       ec2-54-242-20-144.compute-1.amazonaws.com slave8
        10.155.244.71        ec2-54-243-24-10.compute-1.amazonaws.com slave9
        10.155.244.46        ec2-204-236-205-227.compute-1.amazonaws.com slave10

----------------------------------------------------------------------------------------------------------------------------
     • Each slave's /etc/hosts file looks like this (remove the 127.0.0.1 line on all slaves):

        10.155.245.153             ec2-50-17-21-209.compute-1.amazonaws.com master
        10.155.244.83              ec2-54-242-251-124.compute-1.amazonaws.com slave1
        10.155.245.185             ec2-23-23-17-15.compute-1.amazonaws.com slave2
        10.155.244.208             ec2-50-19-79-241.compute-1.amazonaws.com slave3
        10.155.244.246             ec2-50-16-49-229.compute-1.amazonaws.com slave4
        10.155.245.217             ec2-174-129-99-84.compute-1.amazonaws.com slave5
        10.155.244.177             ec2-50-16-105-188.compute-1.amazonaws.com slave6
        10.155.245.152             ec2-174-129-92-105.compute-1.amazonaws.com slave7
        10.155.244.145             ec2-54-242-20-144.compute-1.amazonaws.com slave8
        10.155.244.71              ec2-54-243-24-10.compute-1.amazonaws.com slave9
        10.155.244.46              ec2-204-236-205-227.compute-1.amazonaws.com slave10
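
        To push the edited host entries from the master to every slave (stripping the 127.0.0.1 line as noted above), a minimal sketch assuming passwordless SSH as ec2-user and sudo rights on the slaves:

        for i in $(seq 1 10); do
            grep -v '^127\.0\.0\.1' /etc/hosts | ssh ec2-user@slave$i 'sudo tee /etc/hosts > /dev/null'
        done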

---------------------------------------------------------------------------------------------------------------------------
     • Download a Hadoop release tarball from the Apache Hadoop releases page and unpack it on the master
         (e.g. /usr/local/hadoop-1.0.4), for example as sketched below.
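
         One way to fetch and unpack the 1.0.4 release on the master; the tarball URL below is the Apache archive path and should be verified before use:

         cd /usr/local
         sudo wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz
         sudo tar -xzf hadoop-1.0.4.tar.gz
         sudo chown -R ec2-user:ec2-user /usr/local/hadoop-1.0.4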
----------------------------------------------------------------------------------------------------------------------------
     • Open the hadoop-env.sh file in the conf/ folder (hadoop-1.0.4/conf/)
----------------------------------------------------------------------------------------------------------------------------
     • Set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH, HADOOP_OPTS,
         and HADOOP_HEAPSIZE:

         export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
         export HADOOP_HOME=/usr/local/hadoop-1.0.4/
         export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
         export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
         export HADOOP_HEAPSIZE=400000   # heap size is given in MB
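
         A quick sanity check after editing hadoop-env.sh, assuming the paths above:

         source /usr/local/hadoop-1.0.4/conf/hadoop-env.sh
         $JAVA_HOME/bin/java -version
         /usr/local/hadoop-1.0.4/bin/hadoop version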
----------------------------------------------------------------------------------------------------------------------------
     • Open the hdfs-site.xml file and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.log.dir</name>
        <value>/media/ephemeral0/logs</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/media/ephemeral0/tmp-${user.name}</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/media/ephemeral0/data-${user.name}</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/media/ephemeral0/name-${user.name}</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Default block replication.
        The actual number of replicas can be specified when the file is created.
        The default is used if replication is not specified at create time.
        </description>
    </property>
    <property>
        <name>dfs.block.size</name>
        <value>536870912</value>
        <description>Default HDFS block size in bytes (536870912 = 512 MB).</description>
    </property>
</configuration>
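
         Before formatting HDFS, the directories referenced above must exist and be writable by the Hadoop user; a sketch, assuming the ephemeral disk is mounted at /media/ephemeral0 and the daemons run as ec2-user:

         sudo mkdir -p /media/ephemeral0/logs
         sudo chown -R ec2-user:ec2-user /media/ephemeral0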

----------------------------------------------------------------------------------------------------------------------------
     • Open the mapred-site.xml file and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.log.dir</name>
        <value>/media/ephemeral0/logs</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx400m</value>
    </property>
    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>60000</value>
    </property>
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>/media/ephemeral0/system-${user.name}</value>
        <description>System directory used to run map and reduce tasks.</description>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map
        and reduce task.
        </description>
    </property>
    <property>
        <name>mapreduce.map.output.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.map.output.compress.codec</name>
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>
    <property>
        <name>mapred.create.symlink</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.child.ulimit</name>
        <value>unlimited</value>
    </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
     • Open the core-site.xml file and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
      <property>
            <name>dfs.data.dir</name>
            <value>/media/ephemeral0/data-${user.name}</value>
      </property>
      <property>
            <name>hadoop.tmp.dir</name>
            <value>/media/ephemeral0/tmp-${user.name}</value>
      </property>
      <property>
            <name>dfs.name.dir</name>
            <value>/media/ephemeral0/name-${user.name}</value>
      </property>
      <property>
            <name>fs.default.name</name>
            <value>hdfs://master:9000</value>
      </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
     • Open the masters file and add the following entry:
         master
----------------------------------------------------------------------------------------------------------------------------
     • Open the slaves file and add the following entries:

        slave1
        slave2
        slave3
        slave4
        slave5
        slave6
        slave7
        slave8
        slave9
        slave10
----------------------------------------------------------------------------------------------------------------------------

    •    Give ownership of the /media folders (all folders used for Hadoop) to ec2-user on all slaves,
         for example as shown below.
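
         A sketch of the ownership change run from the master against each slave, using the ec2-user account and the /media/ephemeral0 path from the configuration above:

         for i in $(seq 1 10); do
             ssh ec2-user@slave$i 'sudo chown -R ec2-user:ec2-user /media/ephemeral0'
         done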
----------------------------------------------------------------------------------------------------------------------------
     • From the master, copy the full hadoop-1.0.4 folder to each slave, e.g.:
         scp -r /usr/local/hadoop-1.0.4 ec2-50-17-21-209.compute-1.amazonaws.com:/usr/local/hadoop-1.0.4
----------------------------------------------------------------------------------------------------------------------------
     • Repeat the copy from the master for every slave (a loop is sketched below).
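
         A possible loop that pushes the installation to all ten slaves at once, assuming the slaveN aliases from /etc/hosts and passwordless SSH as ec2-user (/usr/local must be writable by ec2-user on the slaves, otherwise copy to a temporary directory and move it with sudo):

         for i in $(seq 1 10); do
             scp -r /usr/local/hadoop-1.0.4 ec2-user@slave$i:/usr/local/
         done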
----------------------------------------------------------------------------------------------------------------------------
     • Open ports 50000-50100 in the instances' security group in the AWS console.
         Then format the NameNode from the master (hadoop namenode -format)
         and start the cluster with start-all.sh, verifying as sketched below.
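
         The format/start commands and a quick health check from the master, assuming the installation path used above:

         /usr/local/hadoop-1.0.4/bin/hadoop namenode -format
         /usr/local/hadoop-1.0.4/bin/start-all.sh

         # list the running daemons and the datanodes that joined
         jps
         /usr/local/hadoop-1.0.4/bin/hadoop dfsadmin -report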
----------------------------------------------------------------------------------------------------------------------------
