CIS 210 February 2013
Sun/Oracle Grid Engine is:
• A quick and easy way to set up a multi-cluster system using existing hardware
• Oracle Grid Engine is the most widely deployed workload management solution in the industry and offers unmatched scalability. On top of a rich set of advanced scheduling capabilities and the flexibility to adapt to any computing environment and application workload, Oracle Grid Engine offers comprehensive support for the cloud computing model.
How to Install
• Via webappl.blogspot.com:
  http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
Install SGE on master node:
    mpiuser@ub0:~$ sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon gridengine-exec
    # Remove gridengine-exec from the list if the master node is not supposed to run jobs.
    # During the installation, we need to set the cluster CELL name (such as 'default').
Install SGE on other nodes:
    mpiuser@ub1:~$ sudo apt-get install gridengine-client gridengine-exec
• Set the CELL name to the same value as on the master node.
Set SGE_ROOT and SGE_CELL
• Set the SGE_ROOT and SGE_CELL environment variables:
    $SGE_ROOT refers to the installation path of SGE.
    $SGE_CELL is the cell name, which is 'default' on our machine.
• Edit /etc/profile and /etc/bash.bashrc and add the following two lines:
    export SGE_ROOT=/var/lib/gridengine   # this is the path on our machines
    export SGE_CELL=default
• Source the script:
    source /etc/profile
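Editing the profile by hand works, but running the edit twice leaves duplicate lines. The sketch below shows one way to make it repeatable. The helper name add_sge_env is made up for illustration, and the demo writes to a scratch file rather than to /etc/profile.

```shell
#!/bin/sh
# add_sge_env is a hypothetical helper, not part of SGE.
# It appends each export line to the given file only if that exact
# line is not already present.
add_sge_env() {
    profile="$1"
    for line in \
        'export SGE_ROOT=/var/lib/gridengine' \
        'export SGE_CELL=default'
    do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
    done
}

# Demonstrate on a scratch file instead of /etc/profile.
demo_profile=$(mktemp)
add_sge_env "$demo_profile"
add_sge_env "$demo_profile"   # the second call adds nothing
cat "$demo_profile"
```

To apply it to the real files, the same function could be called as root with /etc/profile and /etc/bash.bashrc as arguments.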
Configure SGE with qmon
• This section is modified from a note by Junjun Mao.
• Invoke qmon as superuser:
    mpiuser@ub0:~$ sudo qmon
• On our machine, qmon failed to start due to missing fonts '-adobe-helvetica-…'
• To solve the fonts problem:
    mpiuser@ub0:~$ sudo apt-get install xfs xfstt
    mpiuser@ub0:~$ sudo apt-get install t1-xfree86-nonfree ttf-xfree86-nonfree ttf-xfree86-nonfree-syriac xfonts-75dpi xfonts-100dpi
    mpiuser@ub0:~$ sudo reboot   # after the reboot, the problem is gone
Configure hosts
• "Host Configuration" -> "Administration Host" -> Add the master node and other administrative nodes
• "Host Configuration" -> "Submit Host" -> Add the master node and other submit nodes
• "Host Configuration" -> "Execution Host" -> Add the slave nodes
• Click on "Done" to finish
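The same host registration can probably be done without the GUI, since qconf has -ah (add administrative host) and -as (add submit host) options. The sketch below only prints the commands so they can be reviewed first. The helper name print_host_cmds is hypothetical, and the host names are the ones used on these slides.

```shell
#!/bin/sh
# print_host_cmds is a hypothetical helper. It prints, but does not run,
# one qconf command per host for the given option.
print_host_cmds() {
    flag="$1"; shift          # -ah (administrative host) or -as (submit host)
    for host in "$@"; do
        echo "qconf $flag $host"
    done
}

print_host_cmds -ah ub0
print_host_cmds -as ub0 ub1
```

Execution hosts are a little different: qconf -ae opens an editor on a host template instead of taking the new host name as an argument, so it is left out of this sketch.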
Configure the user
• Add or delete users that are allowed to access SGE here. In this example, a user is added to an existing group, and this group is later allowed to submit jobs. Everything else is left at its default value.
• "User Configuration" -> "Userset" -> Highlight userset "arusers" and click on "Modify" -> Enter the user name in the "User/Group" field
• Click "Done" to finish
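For reference, qconf also offers -au to add users to a userset from the command line. The sketch below builds that command for several users at once and prints it rather than running it. The helper name print_userset_cmd and the user name alice are made up for illustration.

```shell
#!/bin/sh
# print_userset_cmd is a hypothetical helper. It prints the qconf command
# that would add the given users to a userset.
print_userset_cmd() {
    userset="$1"; shift
    users=$(IFS=,; echo "$*")     # join the remaining arguments with commas
    echo "qconf -au $users $userset"
}

print_userset_cmd arusers mpiuser alice
```

After running the printed command on the master node, qconf -su arusers should show the members of the userset.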
Configure the queue
• While Host Configuration deals with what computing resources are available and User Configuration defines who has access to those resources, Queue Control defines how hosts and users are connected.
Queue Control
• "Queue Control" -> "Hosts" -> Confirm that the execution hosts show up there.
• "Queue Control" -> "Cluster Queues" -> Click on "Add" -> Name the queue and add the execution nodes to Hostlist; then:
    "User Access" -> Allow access to the user group arusers;
    "General Configuration" -> Field "Slots" -> Raise the number to the total number of CPU cores on the slave nodes (it is OK to use a bigger number than the actual CPU cores).
• "Queue Control" -> "Queue Instances" -> This is the place to manually assign hosts to queues and to control the state (active, suspend ...) of hosts.
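To pick a value for the "Slots" field, the core counts of the slave nodes need to be added up. A minimal sketch, assuming the per-node counts have already been collected (for example by running nproc on each node) into lines of the form "host cores". The helper name total_slots and the core counts shown are invented for the example.

```shell
#!/bin/sh
# total_slots is a hypothetical helper. It sums the second column of
# "host cores" lines read from standard input.
total_slots() {
    awk '{ sum += $2 } END { print sum + 0 }'
}

# Made-up core counts for three slave nodes.
printf '%s\n' 'ub1 4' 'ub2 4' 'ub3 8' | total_slots
```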
Configure parallel environment
• "Queue Control" -> "Cluster Queues" -> Select a queue that will run parallel jobs -> Click on "Modify" -> "Parallel Environment" -> Click on the icon "PE" below the right and left arrows -> Click on "Add" -> Name the PE and set:
    slots = 999
    start_proc_args = $SGE_ROOT/mpi/startmpi.sh $pe_hostfile
    stop_proc_args = $SGE_ROOT/mpi/stopmpi.sh
    allocation_rule = $fill_up
    "Control slaves" checked
• Make sure the configured PE is moved from "Available PE" to "Referenced PE".
• Confirm and close all config windows, then open "Queue Control" -> "Cluster Queues" -> "Parallel Environment" again; the named PE should show up.
• Once created and linked to a queue, the PE can also be edited from "Queue Control" -> "PE".
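The PE settings above can also be kept in a text file, which is handy for recreating the same PE on another cluster. The sketch below writes only the fields named on this slide. The helper name write_pe_file and the PE name mpi are assumptions, and the fallback path is the SGE_ROOT used on our machines. Loading the file with qconf -Ap may require the remaining PE attributes as well; qconf -sp on an existing PE shows the full list.

```shell
#!/bin/sh
# write_pe_file is a hypothetical helper. It writes a partial parallel
# environment definition to the given file.
write_pe_file() {
    pe_name="$1"
    out="$2"
    root="${SGE_ROOT:-/var/lib/gridengine}"
    # \$pe_hostfile and \$fill_up are escaped so they stay literal in the file.
    cat > "$out" <<EOF
pe_name            $pe_name
slots              999
start_proc_args    $root/mpi/startmpi.sh \$pe_hostfile
stop_proc_args     $root/mpi/stopmpi.sh
allocation_rule    \$fill_up
control_slaves     TRUE
EOF
}

pe_file=$(mktemp)
write_pe_file mpi "$pe_file"
cat "$pe_file"
```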
Check whether SGE hosts are running properly
    mpiuser@ub0:~$ qhost   # it should list the system info from all nodes
    mpiuser@ub0:~$ qconf -sel   # it should list the hostnames of nodes
    mpiuser@ub0:~$ qconf -sql   # it should list the queues
    mpiuser@ub0:~$ ps aux | grep sge_qmaster | grep -v grep   # check master daemon
    mpiuser@ub0:~$ ps aux | grep sge_execd | grep -v grep   # check execute daemon
    mpiuser@ub1:~$ ps aux | grep sge_execd | grep -v grep   # check execute daemon
• If the sge_qmaster or sge_execd daemon is not running, try starting it as a service:
    mpiuser@ub1:~$ sudo service gridengine-master start
    mpiuser@ub1:~$ sudo service gridengine-exec start
    …
• Reboot the node(s) if sge_qmaster or sge_execd fails to start.
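The ps and grep pipelines above can be wrapped in a small function that gives a yes/no answer per daemon. This is only a sketch: daemon_status is a made-up name, and it matches on the exact command name reported by ps.

```shell
#!/bin/sh
# daemon_status is a hypothetical helper. It reports whether a process
# with exactly the given command name is currently running.
daemon_status() {
    if ps -e -o comm= 2>/dev/null | grep -qxF "$1"; then
        echo "$1: running"
    else
        echo "$1: not running"
    fi
}

daemon_status sge_qmaster   # meaningful on the master node
daemon_status sge_execd     # meaningful on execution nodes
```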
Run a test script
• Make a script named 'test' with the content:
    #!/bin/bash
    ### Request bash as the shell for the job
    #$ -S /bin/bash
    ### Use the current directory as the working directory
    #$ -cwd
    ### Name the job:
    #$ -N test
    echo "Running environment:"
    env
    echo "============================="
    ### end of script
Job Submission
• To submit the job:
    qsub test   # a job ID is returned if successful
• Query the job status:
    qstat
• If the job runs successfully, two output files are produced in the current working directory: test.oXXX (the standard output) and test.eXXX (the standard error), where test is the job name and XXX is the job ID.
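Since the output file names depend on the job ID, it can help to capture the ID at submission time. The sketch below parses it out of the confirmation line that qsub prints and derives the two file names. Both helper names are hypothetical, and the sample message is typed in by hand rather than taken from a real run.

```shell
#!/bin/sh
# parse_job_id is a hypothetical helper. It extracts the numeric job ID
# from a qsub confirmation line such as:
#   Your job 42 ("test") has been submitted
parse_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# job_output_files is a hypothetical helper. It prints the expected
# standard output and standard error file names for a job name and ID.
job_output_files() {
    echo "$1.o$2 $1.e$2"
}

msg='Your job 42 ("test") has been submitted'   # sample text, not real output
id=$(parse_job_id "$msg")
job_output_files test "$id"
```

On a working cluster the message would come from the real command, for example msg=$(qsub test).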
Always check your logs
• Check the log messages if an error occurs:
    mpiuser@ub0:~$ less /var/spool/gridengine/qmaster/messages   # master node
    mpiuser@ub0:~$ less /var/spool/gridengine/execd/ub0/messages   # exec node
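Long message files are easier to scan when only the error lines are shown. The sketch below assumes each line is pipe-separated with a one-letter severity (such as E or C) in the fourth field; that layout is an assumption here and should be checked against your own messages file. The helper name errors_only and the sample lines are made up.

```shell
#!/bin/sh
# errors_only is a hypothetical helper. It keeps lines whose fourth
# pipe-separated field is E (error) or C (critical).
# The field layout is an assumption, not taken from the slides.
errors_only() {
    awk -F'|' '$4 == "E" || $4 == "C"'
}

# Made-up sample lines in the assumed format.
printf '%s\n' \
    '01/01/2013 10:00:00|worker|ub0|I|sample info line' \
    '01/01/2013 10:00:05|worker|ub0|E|sample error line' | errors_only
```

Against a real log it would be used as errors_only < /var/spool/gridengine/qmaster/messages.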
Possible Errors
• Question: My output file has the warning "Warning: no access to tty (Bad file descriptor). Thus no job control in this shell."
• Answer: This warning appears if you use tcsh or csh as the shell for submitting jobs. It is safe to ignore. Alternatively, run your program in a different shell with qsub -S /bin/bash, or add the line '#$ -S /bin/bash' to the job script.
Possible Errors
• Question: The master host failed to respond properly. The error message is "error: commlib error: access denied (client IP resolved to host name 'ub0…'. This is not identical to clients host name 'ub0') error: unable to contact qmaster using port 6444 on host 'ub0'"
• Answer: Reboot the master node or install SGE from source code on the master node (solutions not confirmed yet). It could also be that the gethostname utility (full path '/usr/lib/gridengine/gethostname' on our machines) returns a different hostname from the one returned by 'hostname -f'. If this is the case (e.g., the host has multiple network interfaces), create a file named 'host_aliases' under '$SGE_ROOT/$SGE_CELL/common' and populate it as follows:
    # cat host_aliases
    ub0 ub0.my.com ub0-grid
    ub1 ub1.my.com ub1-grid
    ub2 ub2.my.com ub2-grid
    ub3 ub3.my.com ub3-grid
• Then restart the gridengine daemon (see the man page of sge_host_aliases for details) and check the aliases:
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0-grid
    mpiuser@ub0:~$ /usr/lib/gridengine/gethostname -aname ub0
    # both of them should return ub0
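In a host_aliases file the first name on each line is the name SGE uses, and the remaining names on that line are aliases for it. The sketch below mimics that lookup so a file can be checked before the daemons are restarted. The helper name resolve_alias is made up, and the demo uses a scratch copy of the first two lines shown above.

```shell
#!/bin/sh
# resolve_alias is a hypothetical helper. It prints the first name on the
# line of the aliases file that contains the given host name.
resolve_alias() {
    awk -v name="$1" '
        /^#/ { next }                      # skip comment lines
        {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$2"
}

# Scratch copy of part of the example file.
aliases=$(mktemp)
cat > "$aliases" <<'EOF'
ub0 ub0.my.com ub0-grid
ub1 ub1.my.com ub1-grid
EOF

resolve_alias ub0-grid "$aliases"
resolve_alias ub1.my.com "$aliases"
```

A name that does not appear in the file produces no output, which is a quick way to spot a missing alias.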
Sources
• http://manpages.ubuntu.com/manpages//jaunty/man5/sge_conf.5.html
• http://webappl.blogspot.com/2011/05/install-sun-grid-engine-sge-on-ubuntu.html
• http://pka.engr.ccny.cuny.edu/~jmao/node/49
• http://webappl.blogspot.com/2011/05/setting-up-mpich2-cluster-with-ubuntu.html

Working with Oracle/Sun Grid Engine
