happily stolen from:
https://wiki.rocksclusters.org/wiki/index.php/Sun_GridEngine
cause I keep forgetting it...
Add Frontend as a SGE Execution Host in Rocks
To set up the frontend node to also be an SGE execution host on which queued jobs can run (like the compute nodes), do the following:
Quick Setup
# cd /opt/gridengine
# ./install_execd
(accept all of the default answers)
# qconf -mq all.q
(if needed, adjust the number of slots for [frontend.local=4] and other parameters)
# /etc/init.d/sgemaster.frontend stop
# /etc/init.d/sgemaster.frontend start
# /etc/init.d/sgeexecd.frontend stop
# /etc/init.d/sgeexecd.frontend start
Detailed Setup
1. As root, make sure $SGE_ROOT, etc. are set up correctly on the frontend:
# env | grep SGE
It should return something like:
SGE_CELL=default
SGE_ARCH=lx26-amd64
SGE_EXECD_PORT=537
SGE_QMASTER_PORT=536
SGE_ROOT=/opt/gridengine
If not, source the file /etc/profile.d/sge-binaries.[c]sh or check whether the SGE Roll is properly installed and enabled:
# rocks list roll
NAME  VERSION  ARCH    ENABLED
sge:  5.2      x86_64  yes
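The check above can also be scripted. This is only a rough sketch: the function name missing_sge_vars is made up for illustration, and the variable list is simply the five names from the sample output, not an official list.

```shell
#!/bin/sh
# Report which of the expected SGE variables are missing from the environment.
# The list mirrors the sample "env | grep SGE" output; it is an assumption,
# not an authoritative list of what SGE requires.
missing_sge_vars() {
    missing=""
    for name in SGE_ROOT SGE_CELL SGE_ARCH SGE_QMASTER_PORT SGE_EXECD_PORT; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Print the names without the leading space (empty line if none are missing).
    echo "${missing# }"
}

missing=$(missing_sge_vars)
if [ -n "$missing" ]; then
    echo "Missing: $missing (try: . /etc/profile.d/sge-binaries.sh)"
else
    echo "SGE environment looks complete"
fi
```

If anything is reported missing, source the profile file in the current shell and run the check again.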
2. Run the install_execd script to set up the frontend as an SGE execution host:
# cd $SGE_ROOT
# ./install_execd
Accept all of the default answers as suggested by the script.
- NOTE: In the following examples, the text <frontend> should be substituted with the actual "short hostname" of your frontend (as reported by the command hostname -s). For example, if running hostname on your frontend returns the "FQDN long hostname" of:
# hostname
mycluster.mydomain.org
then hostname -s should return just:
# hostname -s
mycluster
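If you only have the long hostname at hand (in a script, say), the short name can be derived with plain shell parameter expansion. A minimal sketch, where short_hostname is a made-up helper name and mycluster.mydomain.org is the sample value from the example:

```shell
#!/bin/sh
# Strip everything from the first dot onward to get the short hostname.
short_hostname() {
    echo "${1%%.*}"
}

fqdn="mycluster.mydomain.org"          # sample value from the example
frontend=$(short_hostname "$fqdn")
echo "$frontend.local"                 # the .local name used in the SGE configuration
```

On the frontend itself, hostname -s gives the same short name directly.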
3. Verify that the number of job slots for the frontend equals the number of physical processors/cores on your frontend that you wish to make available for queued jobs by checking the value of the slots parameter of the queue configuration for all.q:
# qconf -sq all.q | grep slots
slots 1,[compute-0-0.local=4],[<frontend>.local=4]
The [<frontend>.local=4] means that SGE can run up to 4 jobs on the frontend. Be aware that since the frontend is normally used for other tasks besides running compute jobs, it is recommended that not all of the installed physical processors/cores on the frontend be made available to SGE, to avoid overloading the frontend.
For example, on a 4-core frontend, to configure SGE to use only up to 3 of the 4 cores, you can modify the slots for <frontend>.local from 4 to 3 by typing:
# qconf -mattr queue slots '[<frontend>.local=3]' all.q
If there are additional queues besides the default all.q one, repeat the above for each queue.
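For scripting the slot check, here is a rough sketch that pulls one host's slot count out of a slots line and works out a "leave one core free" value. Assumptions: host_slots and reserved_slots are made-up helper names, mycluster.local stands in for your frontend, the slots line is on a single line as in the sample output, and reserving exactly one core is just one way to follow the advice about not handing every core to SGE. The final line only prints the qconf command, it does not run it.

```shell
#!/bin/sh
# Print the slot count configured for one host in a "slots" line,
# or nothing if the host has no per-host entry.
host_slots() {
    # Escape dots so the hostname is matched literally by sed.
    host_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    # Split the comma-separated list and pick the "[host=N]" entry.
    echo "$2" | tr ',' '\n' | sed -n "s/^.*\[$host_re=\([0-9][0-9]*\)].*$/\1/p"
}

# Suggest a slot count that leaves one core free (never below 1).
reserved_slots() {
    if [ "$1" -gt 1 ]; then
        echo $(($1 - 1))
    else
        echo 1
    fi
}

line='slots 1,[compute-0-0.local=4],[mycluster.local=4]'   # sample output line
current=$(host_slots mycluster.local "$line")
wanted=$(reserved_slots "$current")
echo "qconf -mattr queue slots '[mycluster.local=$wanted]' all.q"
```

In real use the line would come from qconf -sq all.q | grep slots; if your output wraps across several lines, this sketch would need adjusting.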
Read "man queue_conf" for a list of resource limit parameters such as s_cpu, h_cpu, s_vmem, and h_vmem that can be adjusted to prevent jobs from overloading the frontend.
- NOTE: For Rocks 5.2 or older, the frontend may have been configured by default during installation with only 1 job slot ([<frontend>.local=1]) in the default all.q queue, which allows only 1 queued job at a time to run on the frontend. To check the value of the slots parameter of the queue configuration for all.q, type:
# qconf -sq all.q | grep slots
slots 1,[compute-0-0.local=4],[<frontend>.local=1]
If needed, modify the slots for <frontend>.local from 1 to 4 (or up to the maximum number of physical processors/cores on your frontend that you wish to use) by typing:
# qconf -mattr queue slots '[<frontend>.local=4]' all.q
- NOTE: For Rocks 5.3 or older, create the file /opt/gridengine/default/common/host_aliases to contain both the .local hostname and the FQDN long hostname of your frontend:
# vi $SGE_ROOT/default/common/host_aliases
<frontend>.local <frontend>.mydomain.org
- NOTE: For Rocks 5.3 or older, edit the file /opt/gridengine/default/common/act_qmaster to contain the .local hostname of your frontend:
# vi $SGE_ROOT/default/common/act_qmaster
<frontend>.local
- NOTE: For Rocks 5.3 or older, edit the file /etc/init.d/sgemaster.<frontend>:
# vi /etc/init.d/sgemaster.<frontend>
and comment out the line:
/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
by inserting a # character at the beginning, so it becomes:
#/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
This prevents the file /opt/gridengine/default/common/act_qmaster from being overwritten with incorrect data every time sgemaster.<frontend> is run during bootup.
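If you would rather not do that edit by hand, the same change can be sketched with sed. This is an illustration only: comment_out_fqdn_line is a made-up helper name, the demo runs on a scratch file rather than the real init script, and the function prints the result to standard output instead of editing in place, so you can review it first.

```shell
#!/bin/sh
# Comment out the line that rewrites act_qmaster.
# Lines that are already commented do not match, so re-running is harmless.
comment_out_fqdn_line() {
    sed 's|^\([[:space:]]*\)\(/bin/hostname --fqdn .*act_qmaster\)|\1#\2|' "$1"
}

# Demo on a scratch file, not on /etc/init.d/sgemaster.*
demo=$(mktemp)
printf '%s\n' 'echo starting' \
    '/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster' > "$demo"
comment_out_fqdn_line "$demo"
rm -f "$demo"
```

Keep a backup copy of the init script before replacing it with the modified version.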
4. Restart both qmaster and execd for SGE on the frontend:
# /etc/init.d/sgemaster.<frontend> stop
# /etc/init.d/sgemaster.<frontend> start
# /etc/init.d/sgeexecd.<frontend> stop
# /etc/init.d/sgeexecd.<frontend> start
And everything will start working. :)