SlideShare una empresa de Scribd logo
1 de 8
Descargar para leer sin conexión
Apache Hama (v0.2) : User Guide
                          a BSP-based distributed computing framework

                                              Edward J. Yoon
                                         <edwardyoon@apache.org>

* Thanks to reviewers (Tommaso, Franklin, Filipe, and all the Hama developer team members).




Introduction
HAMA is a distributed computing framework based on BSP (Bulk Synchronous Parallel) [1] computing
techniques for massive scientific computations (e.g., matrix, graph, network, ..., etc), it is currently being
incubated as one of the incubator project by the Apache Software Foundation.

The greatest benefit of the adoption of BSP computing techniques is the opportunity to speed up iteration
loops during the iterative process which requires several passes of messages before the final processed
output is available, such as finding the shortest path, and so on. (See “Random Communication
Benchmark” results at http://wiki.apache.org/hama/Benchmarks)

Hama also provides an easy-to-program, as well as a flexible programming model, as compared with
traditional models of Message Passing [2], and is also compatible with any distributed storage (e.g.,
HDFS, HBase, …, etc), so you can use the Hama BSP on your existing Hadoop clusters.




Hama Architecture
Hama consists of three major components: BSPMaster, GroomServers and Zookeeper. It is very similar
to Hadoop architecture, except for the portion of communication and synchronization mechanisms.
BSPMaster

BSPMaster is responsible for the following:

    ●   Maintaining groom server status.
    ●   Controlling super steps in a cluster.
    ●   Maintaining job progress information.
    ●   Scheduling Jobs and Assigning tasks to groom servers
    ●   Disseminating execution class across groom servers.
    ●   Controlling fault.
    ●   Providing users with the cluster control interface.

A BSP Master and multiple grooms are started by the script. Then, the BSP master starts up with a RPC
server for groom servers. Groom servers start up with a BSPPeer instance and a RPC proxy to contact
the BSP master. After it’s started, each groom periodically sends a heartbeat message that encloses its
groom server status, including maximum task capacity, unused memory, and so on.

Each time the BSP master receives a heartbeat message, it brings the groom server status up-to-date.
Then, the BSP master makes use of the groom servers' status in order to effectively assign tasks to idle
groom servers and returns a heartbeat response that contains assigned tasks and others actions that
a groom server has to do. For now, we have a FIFO [3] job scheduler and very simple task assignment
algorithms.



GroomServer

A Groom Server (shortly referred to as groom) is a process that performs BSP tasks assigned by the
BSPMaster. Each groom contacts the BSPMaster and takes assigned tasks and reports its status by
means of periodical piggybacks with the BSPMaster. Each groom is designed to run with HDFS or other
distributed storages. Basically, a groom server and a data node should be run on one physical node.
Zookeeper

A Zookeeper is used to manage the efficient barrier synchronisation of the BSPPeers. (Later, it will also
be used for the area of a fault tolerance system.)




BSP Programming Model
Hama BSP is based on the Bulk Synchronous Parallel Model (BSP), in which a computation involves a
number of supersteps, each having many distributed computational peers that synchronize at the end of
the superstep. Basically, a BSP program consists of a sequence of supersteps. Each superstep consists
of the following three phases:

    ●    Local computation
    ●    Process communication
    ●    Barrier synchronization

NOTE that these phases should be always be in sequential order. (To see more detailed information
about “Bulk Synchronous Parallel Model” - http://en.wikipedia.org/wiki/Bulk_synchronous_parallel)

Hama provides a user-defined function “bsp()” that can be used to write your own BSP program. The
bsp() function handles the whole parallel part of the program. (This means that the bsp() function is not a
iteration part of the program.)
In the 0.2 version, it only takes one argument, which is a communication protocol interface. Later, more
arguments such as input or reporter could also be included.



Communication

Within the bsp() function, you can use the powerful communication functions for many purposes using
BSPPeerProtocol. We tried to follow the standard library of BSP world as much as possible. The following
table describes all the functions you can use:


Function                                        Description

send(String peerName, BSPMessage msg)           Sends a message to another peer.


put(BSPMessage msg)                             Puts a message to local queue.

getCurrentMessage()                             Returns a received message.

getNumCurrentMessages()                         Returns the number of received messages.

sync()                                          Barrier synchronization.
getPeerName()                                    Returns a peer’s hostname.

getAllPeerNames()                                Returns all peer’s hostname.

getSuperstepCount()                              Returns the count of supersteps.


The send(), put() and all the other functions are very flexible. For example, you can send a lot of
messages to any of processes in bsp() function:


        public void bsp(BSPPeerProtocol bspPeer) throws IOException,
                           KeeperException, InterruptedException {
                 for (String otherPeer : bspPeer.getAllPeerNames()) {
                           String peerName = bspPeer.getPeerName();
                           BSPMessage msg =
                                    new BSPMessage(Bytes.toBytes(peerName), Bytes.toBytes(“Hi”));
                           bspPeer.send(peerName, mgs);
                 }
                 bspPeer.sync();
        }




Synchronization

When all the processes have entered the barrier via the sync() function, the Hama proceeds to the next
superstep. In the previous example, the BSP job will be finished by one synchronization after sending a
message “Hi” to all peers.

But, keep in mind that the sync() function is not the end of the BSP job. As was previously mentioned, all
the communication functions are very flexible. For example, the sync() function also can be called in a for
loop so that you can use to program the iterative methods sequentally:


        public void bsp(BSPPeerProtocol bspPeer) throws IOException,
                             KeeperException, InterruptedException {
                 for (int i = 0; i < 100; i++) {
                             ….
                             bspPeer.sync();
                 }
        }


The BSP job will be finished only when all the processes have no more local and outgoing queue entries
and all processes are done, or killed by the user.




Example
Let’s take one more practical example. We may want to estimate PI using Hama BSP.
The value of PI can be calculated in a number of ways. In this example, we are estimating PI by using the
following method:

    ●   Each task locally executes its portion of the loop a number of times.

        iterations = 10000
        circle_count = 0

        do j = 1,iterations
                 generate 2 random numbers between 0 and 1
                 xcoordinate = random1
                 ycoordinate = random2
                 if (xcoordinate, ycoordinate) inside circle
                 then circle_count = circle_count + 1
        end do

        PI = 4.0*circle_count/iterations

    ●   One task acts as master and collects the results through the BSP communication interface.

        PI = pi_sum / n_processes

1) Each process computes the value of Pi locally, and 2) sends it to a master task using the send()
function. Then, 3) the master task can recieve the messages using the sync() function. Finally, we can
calculate the average value of the sum of PI values from each peer, shown below:


        public void bsp(BSPPeerProtocol bspPeer) throws IOException,
                            KeeperException, InterruptedException {
                int in = 0, out = 0;
                for (int i = 0; i < iterations; i++) {
                            double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
                            if ((Math.sqrt(x * x + y * y) < 1.0)) {
                                       in++;
                            } else {
                                       out++;
                            }
                }

                byte[] tagName = Bytes.toBytes(bspPeer.getPeerName());
                byte[] myData = Bytes.toBytes(4.0 * (double) in / (double) iterations);
                BSPMessage estimate = new BSPMessage(tagName, myData);

                bspPeer.send(masterTask, estimate);
                bspPeer.sync();

                if (bspPeer.getPeerName().equals(masterTask)) {
                         double pi = 0.0;
                         int numPeers = bspPeer.getNumCurrentMessage();
                         BSPMessage received;
                         while ((received = bspPeer.getCurrentMessage()) != null) {
pi += Bytes.toDouble(received.getData());
                       }

                       pi = pi / numPeers;
                       writeResult(pi);
               }
       }




BSP Job Configuration

The BSP job configuration interface is almost the same as the MapReduce job configuration:


       HamaConfiguration conf = new HamaConfiguration();
       BSPJob bsp = new BSPJob(conf, MyBSP.class);
       bsp.setJobName("My BSP program");
       bsp.setBspClass(MyEstimator.class);
       …
       bsp.waitForCompletion(true);


If you are familiar with MapReduce programming, you can skim this section.


Getting Started With Hama

Requirements

Current Hama requires JRE 1.6 or higher and ssh to be set up between nodes in the cluster:

   ●   hadoop-0.20.x for HDFS (distributed mode only)
   ●   Sun Java JDK 1.5.x or higher version


Startup Scripts and Run examples

The {$HAMA_HOME/bin} directory contains some script used to start up the Hama daemons.

   ●   start-bspd.sh - Starts all Hama daemons, the BSPMaster, GroomServers and Zookeeper.

To start the Hama, run the following command:

       $ bin/start-bspd.sh

This will startup a BSPMaster, GroomServers and Zookeeper on your machine.
To run the examples, run the following command:
$ bin/hama jar build/hama-0.2.0-dev-examples.jar pi or test

Then, everything is OK if you see the following:


[root@test1 hama-trunk]# bin/start-bspd.sh
test1: starting zookeeper, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-zookeeper-test1.out
starting bspmaster, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-bspmaster-test1.out
test2: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test2.out
test3: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test3.out
test4: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test4.out
test1: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test1.out
[root@test1 hama-trunk]#
[root@test1 hama-trunk]#
[root@test1 hama-trunk]# bin/hama jar build/hama-0.2.0-dev-examples.jar pi
11/02/15 15:49:44 INFO bsp.BSPJobClient: Running job: job_201102151549_0001
11/02/15 15:49:51 INFO bsp.BSPJobClient: The total number of supersteps: 1
Estimated value of PI is 3.1445
Job Finished in 6.683 seconds
[root@test1 hama-trunk]# bin/hama jar build/hama-0.2.0-dev-examples.jar test
11/02/15 15:51:50 INFO bsp.BSPJobClient: Running job: job_201102151549_0002
11/02/15 15:52:17 INFO bsp.BSPJobClient: The total number of supersteps: 4
Each task printed the "Hello World" as below:
Tue Feb 15 15:51:51 KST 2011: Hello BSP from 1 of 4: test1:61000
Tue Feb 15 15:52:00 KST 2011: Hello BSP from 2 of 4: test2:61000
Tue Feb 15 15:52:06 KST 2011: Hello BSP from 3 of 4: test3:61000
Tue Feb 15 15:52:11 KST 2011: Hello BSP from 4 of 4: test4:61000


(To see more detailed information about “Getting Started With Hama” - http://wiki.apache.org/hama/
GettingStarted)


Command Line Interfaces

In previous section, the jar command has introduced. In addition to this command, Hama provides several
command for BSP job administration:

              Usage: hama job [-submit <job-file>] | [-status <job-id>] | [-kill <job-id>] | [-list [all]] | [-list-active-
              grooms] | [-kill-task <task-id>] | [-fail-task <task-id>]


Command                         Description

-submit <job-file>              Submits the job.

-status <job-id>                Prints the job status.

-kill <job-id>                  Kills the job.

-list [all]                     -list all displays all jobs. -list displays only jobs which are yet to be completed.

-list-active-grooms             Displays the list of active groom server in the cluster.
-list-attempt-ids      Displays the list of tasks for a given job currently in a particular state (running or
<jobId> <task-state>   completed).

-kill-task <task-id>   Kills the task. Killed tasks are NOT counted against failed attempts.

-fail-task <task-id>   Fails the task. Failed tasks are counted against failed attempts.




References
    1. Bulk Synchronous Parallel, http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
    2. Message Passing, http://en.wikipedia.org/wiki/Message_Passing
    3. FIFO, http://en.wikipedia.org/wiki/FIFO

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

The Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphXThe Pregel Programming Model with Spark GraphX
The Pregel Programming Model with Spark GraphX
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
 
Unit 2 part-2
Unit 2 part-2Unit 2 part-2
Unit 2 part-2
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)
 
Unit 2
Unit 2Unit 2
Unit 2
 
maXbox Starter78 PortablePixmap
maXbox Starter78 PortablePixmapmaXbox Starter78 PortablePixmap
maXbox Starter78 PortablePixmap
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Unit 4-apache pig
Unit 4-apache pigUnit 4-apache pig
Unit 4-apache pig
 
Lec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptxLec_4_1_IntrotoPIG.pptx
Lec_4_1_IntrotoPIG.pptx
 
An Introduction To Map-Reduce
An Introduction To Map-ReduceAn Introduction To Map-Reduce
An Introduction To Map-Reduce
 
Task Parallel Library Data Flows
Task Parallel Library Data FlowsTask Parallel Library Data Flows
Task Parallel Library Data Flows
 
Unit 3
Unit 3Unit 3
Unit 3
 

Similar a Apache hama 0.2-userguide

Writing a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdfWriting a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdfRomanKhavronenko
 
Hacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationHacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationOWASP Hacker Thursday
 
Jackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVJackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVTatu Saloranta
 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPIyaman dua
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopEdward Yoon
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойSigma Software
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014MongoDB
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
 
JIT compilation for CPython
JIT compilation for CPythonJIT compilation for CPython
JIT compilation for CPythondelimitry
 
Analyzing Firebird 3.0
Analyzing Firebird 3.0Analyzing Firebird 3.0
Analyzing Firebird 3.0PVS-Studio
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Eveguest91855c
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Evel xf
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPIntel IT Center
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationGlobalLogic Ukraine
 

Similar a Apache hama 0.2-userguide (20)

Writing a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdfWriting a TSDB from scratch_ performance optimizations.pdf
Writing a TSDB from scratch_ performance optimizations.pdf
 
MPI
MPIMPI
MPI
 
Hacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitationHacker Thursdays: An introduction to binary exploitation
Hacker Thursdays: An introduction to binary exploitation
 
Jackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSVJackson beyond JSON: XML, CSV
Jackson beyond JSON: XML, CSV
 
Introduction to MPI
Introduction to MPIIntroduction to MPI
Introduction to MPI
 
Flow control in node.js
Flow control in node.jsFlow control in node.js
Flow control in node.js
 
Go on!
Go on!Go on!
Go on!
 
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on HadoopApache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
Apache HAMA: An Introduction toBulk Synchronization Parallel on Hadoop
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай Мозговой
 
Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014Hadoop - MongoDB Webinar June 2014
Hadoop - MongoDB Webinar June 2014
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...
 
JIT compilation for CPython
JIT compilation for CPythonJIT compilation for CPython
JIT compilation for CPython
 
Analyzing Firebird 3.0
Analyzing Firebird 3.0Analyzing Firebird 3.0
Analyzing Firebird 3.0
 
Analyzing Firebird 3.0
Analyzing Firebird 3.0Analyzing Firebird 3.0
Analyzing Firebird 3.0
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Eve
 
Parallel computing(2)
Parallel computing(2)Parallel computing(2)
Parallel computing(2)
 
Stackless Python In Eve
Stackless Python In EveStackless Python In Eve
Stackless Python In Eve
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
 
Java 8 Lambda and Streams
Java 8 Lambda and StreamsJava 8 Lambda and Streams
Java 8 Lambda and Streams
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 

Más de Edward Yoon

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learningEdward Yoon
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링Edward Yoon
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스Edward Yoon
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQLEdward Yoon
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big dataEdward Yoon
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyEdward Yoon
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Edward Yoon
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introductionEdward Yoon
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsEdward Yoon
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time applicationEdward Yoon
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear AlgebraEdward Yoon
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And HbaseEdward Yoon
 

Más de Edward Yoon (13)

(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
(소스콘 2015 발표자료) Apache HORN, a large scale deep learning
 
K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링K means 알고리즘을 이용한 영화배우 클러스터링
K means 알고리즘을 이용한 영화배우 클러스터링
 
차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스차세대하둡과 주목해야할 오픈소스
차세대하둡과 주목해야할 오픈소스
 
Quick Understanding of NoSQL
Quick Understanding of NoSQLQuick Understanding of NoSQL
Quick Understanding of NoSQL
 
The evolution of web and big data
The evolution of web and big dataThe evolution of web and big data
The evolution of web and big data
 
Apache hama @ Samsung SW Academy
Apache hama @ Samsung SW AcademyApache hama @ Samsung SW Academy
Apache hama @ Samsung SW Academy
 
Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011Introduction of Apache Hama - 2011
Introduction of Apache Hama - 2011
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 
Monitoring and mining network traffic in clouds
Monitoring and mining network traffic in cloudsMonitoring and mining network traffic in clouds
Monitoring and mining network traffic in clouds
 
Usage case of HBase for real-time application
Usage case of HBase for real-time applicationUsage case of HBase for real-time application
Usage case of HBase for real-time application
 
Understand Of Linear Algebra
Understand Of Linear AlgebraUnderstand Of Linear Algebra
Understand Of Linear Algebra
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
Heart Proposal
Heart ProposalHeart Proposal
Heart Proposal
 

Último

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Último (20)

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 

Apache hama 0.2-userguide

  • 1. Apache Hama (v0.2) : User Guide a BSP-based distributed computing framework Edward J. Yoon <edwardyoon@apache.org> * Thanks to reviewers (Tommaso, Franklin, Filipe, and all the Hama developer team members). Introduction HAMA is a distributed computing framework based on BSP (Bulk Synchronous Parallel) [1] computing techniques for massive scientific computations (e.g., matrix, graph, network, ..., etc), it is currently being incubated as one of the incubator project by the Apache Software Foundation. The greatest benefit of the adoption of BSP computing techniques is the opportunity to speed up iteration loops during the iterative process which requires several passes of messages before the final processed output is available, such as finding the shortest path, and so on. (See “Random Communication Benchmark” results at http://wiki.apache.org/hama/Benchmarks) Hama also provides an easy-to-program, as well as a flexible programming model, as compared with traditional models of Message Passing [2], and is also compatible with any distributed storage (e.g., HDFS, HBase, …, etc), so you can use the Hama BSP on your existing Hadoop clusters. Hama Architecture Hama consists of three major components: BSPMaster, GroomServers and Zookeeper. It is very similar to Hadoop architecture, except for the portion of communication and synchronization mechanisms.
  • 2. BSPMaster BSPMaster is responsible for the following: ● Maintaining groom server status. ● Controlling super steps in a cluster. ● Maintaining job progress information. ● Scheduling Jobs and Assigning tasks to groom servers ● Disseminating execution class across groom servers. ● Controlling fault. ● Providing users with the cluster control interface. A BSP Master and multiple grooms are started by the script. Then, the BSP master starts up with a RPC server for groom servers. Groom servers start up with a BSPPeer instance and a RPC proxy to contact the BSP master. After it’s started, each groom periodically sends a heartbeat message that encloses its groom server status, including maximum task capacity, unused memory, and so on. Each time the BSP master receives a heartbeat message, it brings the groom server status up-to-date. Then, the BSP master makes use of the groom servers' status in order to effectively assign tasks to idle groom servers and returns a heartbeat response that contains assigned tasks and others actions that a groom server has to do. For now, we have a FIFO [3] job scheduler and very simple task assignment algorithms. GroomServer A Groom Server (shortly referred to as groom) is a process that performs BSP tasks assigned by the BSPMaster. Each groom contacts the BSPMaster and takes assigned tasks and reports its status by means of periodical piggybacks with the BSPMaster. Each groom is designed to run with HDFS or other distributed storages. Basically, a groom server and a data node should be run on one physical node.
  • 3. Zookeeper A Zookeeper is used to manage the efficient barrier synchronisation of the BSPPeers. (Later, it will also be used for the area of a fault tolerance system.) BSP Programming Model Hama BSP is based on the Bulk Synchronous Parallel Model (BSP), in which a computation involves a number of supersteps, each having many distributed computational peers that synchronize at the end of the superstep. Basically, a BSP program consists of a sequence of supersteps. Each superstep consists of the following three phases: ● Local computation ● Process communication ● Barrier synchronization NOTE that these phases should be always be in sequential order. (To see more detailed information about “Bulk Synchronous Parallel Model” - http://en.wikipedia.org/wiki/Bulk_synchronous_parallel) Hama provides a user-defined function “bsp()” that can be used to write your own BSP program. The bsp() function handles the whole parallel part of the program. (This means that the bsp() function is not a iteration part of the program.) In the 0.2 version, it only takes one argument, which is a communication protocol interface. Later, more arguments such as input or reporter could also be included. Communication Within the bsp() function, you can use the powerful communication functions for many purposes using BSPPeerProtocol. We tried to follow the standard library of BSP world as much as possible. The following table describes all the functions you can use: Function Description send(String peerName, BSPMessage msg) Sends a message to another peer. put(BSPMessage msg) Puts a message to local queue. getCurrentMessage() Returns a received message. getNumCurrentMessages() Returns the number of received messages. sync() Barrier synchronization.
  • 4. getPeerName() Returns a peer’s hostname. getAllPeerNames() Returns all peer’s hostname. getSuperstepCount() Returns the count of supersteps. The send(), put() and all the other functions are very flexible. For example, you can send a lot of messages to any of processes in bsp() function: public void bsp(BSPPeerProtocol bspPeer) throws IOException, KeeperException, InterruptedException { for (String otherPeer : bspPeer.getAllPeerNames()) { String peerName = bspPeer.getPeerName(); BSPMessage msg = new BSPMessage(Bytes.toBytes(peerName), Bytes.toBytes(“Hi”)); bspPeer.send(peerName, mgs); } bspPeer.sync(); } Synchronization When all the processes have entered the barrier via the sync() function, the Hama proceeds to the next superstep. In the previous example, the BSP job will be finished by one synchronization after sending a message “Hi” to all peers. But, keep in mind that the sync() function is not the end of the BSP job. As was previously mentioned, all the communication functions are very flexible. For example, the sync() function also can be called in a for loop so that you can use to program the iterative methods sequentally: public void bsp(BSPPeerProtocol bspPeer) throws IOException, KeeperException, InterruptedException { for (int i = 0; i < 100; i++) { …. bspPeer.sync(); } } The BSP job will be finished only when all the processes have no more local and outgoing queue entries and all processes are done, or killed by the user. Example Let’s take one more practical example. We may want to estimate PI using Hama BSP.
  • 5. The value of PI can be calculated in a number of ways. In this example, we are estimating PI by using the following method: ● Each task locally executes its portion of the loop a number of times. iterations = 10000 circle_count = 0 do j = 1,iterations generate 2 random numbers between 0 and 1 xcoordinate = random1 ycoordinate = random2 if (xcoordinate, ycoordinate) inside circle then circle_count = circle_count + 1 end do PI = 4.0*circle_count/iterations ● One task acts as master and collects the results through the BSP communication interface. PI = pi_sum / n_processes 1) Each process computes the value of Pi locally, and 2) sends it to a master task using the send() function. Then, 3) the master task can recieve the messages using the sync() function. Finally, we can calculate the average value of the sum of PI values from each peer, shown below: public void bsp(BSPPeerProtocol bspPeer) throws IOException, KeeperException, InterruptedException { int in = 0, out = 0; for (int i = 0; i < iterations; i++) { double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0; if ((Math.sqrt(x * x + y * y) < 1.0)) { in++; } else { out++; } } byte[] tagName = Bytes.toBytes(bspPeer.getPeerName()); byte[] myData = Bytes.toBytes(4.0 * (double) in / (double) iterations); BSPMessage estimate = new BSPMessage(tagName, myData); bspPeer.send(masterTask, estimate); bspPeer.sync(); if (bspPeer.getPeerName().equals(masterTask)) { double pi = 0.0; int numPeers = bspPeer.getNumCurrentMessage(); BSPMessage received; while ((received = bspPeer.getCurrentMessage()) != null) {
  • 6. pi += Bytes.toDouble(received.getData()); } pi = pi / numPeers; writeResult(pi); } } BSP Job Configuration The BSP job configuration interface is almost the same as the MapReduce job configuration: HamaConfiguration conf = new HamaConfiguration(); BSPJob bsp = new BSPJob(conf, MyBSP.class); bsp.setJobName("My BSP program"); bsp.setBspClass(MyEstimator.class); … bsp.waitForCompletion(true); If you are familiar with MapReduce programming, you can skim this section. Getting Started With Hama Requirements Current Hama requires JRE 1.6 or higher and ssh to be set up between nodes in the cluster: ● hadoop-0.20.x for HDFS (distributed mode only) ● Sun Java JDK 1.5.x or higher version Startup Scripts and Run examples The {$HAMA_HOME/bin} directory contains some script used to start up the Hama daemons. ● start-bspd.sh - Starts all Hama daemons, the BSPMaster, GroomServers and Zookeeper. To start the Hama, run the following command: $ bin/start-bspd.sh This will startup a BSPMaster, GroomServers and Zookeeper on your machine. To run the examples, run the following command:
  • 7. $ bin/hama jar build/hama-0.2.0-dev-examples.jar pi or test Then, everything is OK if you see the following: [root@test1 hama-trunk]# bin/start-bspd.sh test1: starting zookeeper, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-zookeeper-test1.out starting bspmaster, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-bspmaster-test1.out test2: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test2.out test3: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test3.out test4: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test4.out test1: starting groom, logging to /usr/local/src/hama-trunk/bin/../logs/hama-root-groom-test1.out [root@test1 hama-trunk]# [root@test1 hama-trunk]# [root@test1 hama-trunk]# bin/hama jar build/hama-0.2.0-dev-examples.jar pi 11/02/15 15:49:44 INFO bsp.BSPJobClient: Running job: job_201102151549_0001 11/02/15 15:49:51 INFO bsp.BSPJobClient: The total number of supersteps: 1 Estimated value of PI is 3.1445 Job Finished in 6.683 seconds [root@test1 hama-trunk]# bin/hama jar build/hama-0.2.0-dev-examples.jar test 11/02/15 15:51:50 INFO bsp.BSPJobClient: Running job: job_201102151549_0002 11/02/15 15:52:17 INFO bsp.BSPJobClient: The total number of supersteps: 4 Each task printed the "Hello World" as below: Tue Feb 15 15:51:51 KST 2011: Hello BSP from 1 of 4: test1:61000 Tue Feb 15 15:52:00 KST 2011: Hello BSP from 2 of 4: test2:61000 Tue Feb 15 15:52:06 KST 2011: Hello BSP from 3 of 4: test3:61000 Tue Feb 15 15:52:11 KST 2011: Hello BSP from 4 of 4: test4:61000 (To see more detailed information about “Getting Started With Hama” - http://wiki.apache.org/hama/ GettingStarted) Command Line Interfaces In previous section, the jar command has introduced. In addition to this command, Hama provides several command for BSP job administration: Usage: hama job [-submit <job-file>] | [-status <job-id>] | [-kill <job-id>] | [-list [all]] | [-list-active- grooms] | [-kill-task <task-id>] | [-fail-task <task-id>] Command Description -submit <job-file> Submits the job. -status <job-id> Prints the job status. -kill <job-id> Kills the job. -list [all] -list all displays all jobs. -list displays only jobs which are yet to be completed. -list-active-grooms Displays the list of active groom server in the cluster.
  • 8. -list-attempt-ids Displays the list of tasks for a given job currently in a particular state (running or <jobId> <task-state> completed). -kill-task <task-id> Kills the task. Killed tasks are NOT counted against failed attempts. -fail-task <task-id> Fails the task. Failed tasks are counted against failed attempts. References 1. Bulk Synchronous Parallel, http://en.wikipedia.org/wiki/Bulk_synchronous_parallel 2. Message Passing, http://en.wikipedia.org/wiki/Message_Passing 3. FIFO, http://en.wikipedia.org/wiki/FIFO