
Friday, February 7, 2014

adding the head node of rocks as compute node

happily stolen from:


https://wiki.rocksclusters.org/wiki/index.php/Sun_GridEngine

cause I keep forgetting it...


Add Frontend as an SGE Execution Host in Rocks

To set up the frontend node to also be an SGE execution host on which queued jobs can run (like the compute nodes), do the following:

Quick Setup

# cd /opt/gridengine
# ./install_execd    (accept all of the default answers)
# qconf -mq all.q    (if needed, adjust the number of slots for [frontend.local=4] and other parameters)
# /etc/init.d/sgemaster.frontend stop
# /etc/init.d/sgemaster.frontend start
# /etc/init.d/sgeexecd.frontend stop
# /etc/init.d/sgeexecd.frontend start

Detailed Setup

1. As root, make sure $SGE_ROOT, etc. are set up correctly on the frontend:
# env | grep SGE
It should return back something like:
SGE_CELL=default
SGE_ARCH=lx26-amd64
SGE_EXECD_PORT=537
SGE_QMASTER_PORT=536
SGE_ROOT=/opt/gridengine
If not, source the file /etc/profile.d/sge-binaries.[c]sh or check if the SGE Roll is properly installed and enabled:
# rocks list roll
NAME          VERSION ARCH   ENABLED
sge:          5.2     x86_64 yes

2. Run the install_execd script to set up the frontend as an SGE execution host:
# cd $SGE_ROOT
# ./install_execd 
Accept all of the default answers as suggested by the script.


  • NOTE: In the following examples, the text <frontend> should be substituted with the actual "short hostname" of your frontend (as reported by the command hostname -s).
For example, if running the command hostname on your frontend returns back the "FQDN long hostname" of:
# hostname
mycluster.mydomain.org
then hostname -s should return back just:
# hostname -s
mycluster

3. Verify that the number of job slots for the frontend is equal to the number of physical processors/cores on your frontend that you wish to make available for queued jobs by checking the value of the slots parameter of the queue configuration for all.q:
# qconf -sq all.q | grep slots
slots                 1,[compute-0-0.local=4],[<frontend>.local=4]
The [<frontend>.local=4] means that SGE can run up to 4 jobs on the frontend. Be aware that since the frontend is normally used for other tasks besides running compute jobs, it is recommended that not all of the installed physical processors/cores on the frontend be made available for scheduling by SGE, to avoid overloading the frontend.
For example, on a 4-core frontend, to configure SGE to use only up to 3 of the 4 cores, you can modify the slots for <frontend>.local from 4 to 3 by typing:
# qconf -mattr queue slots '[<frontend>.local=3]' all.q
If there are additional queues besides the default all.q one, repeat the above for each queue.
Read "man queue_conf" for a list of resource limit parameters such as s_cpu, h_cpu, s_vmem, and h_vmem that can be adjusted to prevent jobs from overloading the frontend.


  • NOTE: For Rocks 5.2 or older, the frontend may have been configured by default during installation with only 1 job slot ([<frontend>.local=1]) in the default all.q queue, which will only allow up to 1 queued job to run on the frontend. To check the value of the slots parameter of the queue configuration for all.q, type:
# qconf -sq all.q | grep slots
slots                 1,[compute-0-0.local=4],[<frontend>.local=1]
If needed, modify the slots for <frontend>.local from 1 to 4 (or up to the maximum number of physical processors/cores on your frontend that you wish to use) by typing:
# qconf -mattr queue slots '[<frontend>.local=4]' all.q


  • NOTE: For Rocks 5.3 or older, create the file /opt/gridengine/default/common/host_aliases to contain both the .local hostname and the FQDN long hostname of your frontend:
# vi $SGE_ROOT/default/common/host_aliases
<frontend>.local <frontend>.mydomain.org


  • NOTE: For Rocks 5.3 or older, edit the file /opt/gridengine/default/common/act_qmaster to contain the .local hostname of your frontend:
# vi $SGE_ROOT/default/common/act_qmaster
<frontend>.local


  • NOTE: For Rocks 5.3 or older, edit the file /etc/init.d/sgemaster.<frontend>:
# vi /etc/init.d/sgemaster.<frontend>
and comment out the line:
/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
by inserting a # character at the beginning, so it becomes:
#/bin/hostname --fqdn > $SGE_ROOT/default/common/act_qmaster
in order to prevent the file /opt/gridengine/default/common/act_qmaster from getting overwritten with incorrect data every time sgemaster.<frontend> is run during bootup.

4. Restart both qmaster and execd for SGE on the frontend:
# /etc/init.d/sgemaster.<frontend> stop
# /etc/init.d/sgemaster.<frontend> start
# /etc/init.d/sgeexecd.<frontend> stop
# /etc/init.d/sgeexecd.<frontend> start
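To verify that everything came up, qhost should now list the frontend as an execution host, and qstat -f should show an all.q queue instance on it:
# qhost
# qstat -f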


And everything will start working. :)

Thursday, July 18, 2013

X-Forwarding in qlogin on rocks linux > 5

to enable X forwarding from your nodes to your original remote session:

In Rocks 4.3, I get the output:
# qconf -sconf | grep qlogin
qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
qlogin_daemon                /usr/sbin/sshd -i
But in Rocks 5.3, I get:
# qconf -sconf | grep qlogin
qlogin_command               builtin
qlogin_daemon                builtin
So I changed it in Rocks 5.3:
# qconf -mconf global
qlogin_command               /opt/gridengine/bin/rocks-qlogin.sh
qlogin_daemon                /usr/sbin/sshd -i

And modify the script:

/opt/gridengine/bin/rocks-qlogin.sh

to include this part

/usr/bin/ssh -Y -p $PORT $HOST
The -Y option basically allows trusted (non-secured) X authentication.
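
For reference, a minimal sketch of what the modified rocks-qlogin.sh boils down to, assuming SGE invokes the qlogin_command with the target host and port as arguments (the wrapper shipped with your Rocks release may differ):

#!/bin/bash
# sketch: SGE calls this as <script> <host> <port>
HOST=$1
PORT=$2
# -Y enables trusted X11 forwarding back to the submitting session
/usr/bin/ssh -Y -p $PORT $HOST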
 

Wednesday, October 10, 2012

submitting all files in a directory to qsub

recently we needed a small script to submit all files in a directory to another script, executed by qsub.

#!/bin/bash
# submit every file in the input directory as a qsub job

if [ $# -lt 2 ]; then
    echo "Missing arguments..."
    echo "Use: process.sh 'input dir' 'output dir'"
    exit 1
fi

if [ -d "$1" ]; then
    for file in "$1"/*
    do
        qsub -cwd -p -512 run.sh "$file" "$2"
    done
else
    echo "Missing or incorrect input directory..."
    exit 1
fi

and the actual run.sh script just invokes a small Java program, which takes our two parameters.

java -Xmx1024m -jar $HOME/data/jars/DataExtractor-0.1.jar $1 $2
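
For completeness, a minimal sketch of what run.sh could look like as a full SGE job script (the #$ -S directive is my addition; SGE uses it to pick the shell for the job):

#!/bin/bash
#$ -S /bin/bash
# $1 = input file, $2 = output directory
java -Xmx1024m -jar $HOME/data/jars/DataExtractor-0.1.jar $1 $2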

Thursday, May 3, 2012

Basic Rocks linux stuff.

Some basic Rocks Linux Cluster stuff... just some stuff I need to remember and always forget:

list all host interfaces:


rocks list host interface

changing the mac address of a network card: 


rocks set host interface mac HOSTNAME iface=eth1 mac=00:00:00:00:00:02



please be aware that you also have to remove the MAC address definition in the following file:


/etc/sysconfig/networking/devices/ifcfg-eth*


you should be able to just comment out the MAC address (HWADDR) line without any ill effects, and this will simplify the process next time.
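
For illustration, such a file typically looks something like this (values made up); it is the HWADDR line you want to get rid of:

DEVICE=eth1
BOOTPROTO=static
HWADDR=00:00:00:00:00:01
ONBOOT=yes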


and possibly in the dhcpd.conf file


vim /etc/dhcpd.conf


in case you modified the eth0 interface, to ensure that the dhcpd configuration still works as intended.
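
What you are looking for in there is the host entry of the node in question; illustratively (hostname and addresses made up), the hardware ethernet line has to match the new MAC:

host compute-0-0 {
	hardware ethernet 00:00:00:00:00:02;
	option host-name "compute-0-0";
	fixed-address 10.1.255.254;
}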





Friday, November 4, 2011

rocks linux - virtual hosts with apache

currently I have a cluster set up in the office with about 5 nodes and a bit over 50 CPUs, so this morning I decided to rebuild some of the nodes, since I needed to make some changes to the cluster.

Twenty minutes into this procedure, I kept getting odd error messages, like "file 'update.img' not found" and so on.

So while backtracking the latest changes I made to the server, I realized that I had set up 20 virtual hosts in the Apache configuration, which ended up interfering with the kickstart configuration. It turns out that the order of the virtual hosts is quite important: the kickstart virtual host always needs to be defined first, and only then can you define the other virtual hosts.

Example of a virtual host configuration which allows the kickstart configuration to work:


vim /etc/httpd/conf.d/rocks.conf


actual file:


<IfModule mod_mime.c>
AddHandler cgi-script .cgi
</IfModule>

UseCanonicalName Off


DirectoryIndex index.cgi

<Directory "/var/www/html">
Options FollowSymLinks Indexes ExecCGI
AllowOverride None
Order allow,deny
Allow from all
</Directory>

<Directory "/var/www/html/proc">
Options FollowSymLinks Indexes ExecCGI
AllowOverride None
Order deny,allow
Allow from 10.1.0.0/255.255.0.0
Allow from 127.0.0.1
Deny from all
</Directory>

<Directory "/var/www/html/pxelinux">
Options FollowSymLinks Indexes ExecCGI
AllowOverride None
Order deny,allow
Allow from 10.1.0.0/255.255.0.0
Allow from 127.0.0.1
Deny from all
</Directory>

<VirtualHost *:80>
ServerName kickstart.host.com
DocumentRoot "/var/www/html"
</VirtualHost>

<VirtualHost *:80>
ServerName virtual.host.com
DocumentRoot "/var/www/cts/html"
</VirtualHost>
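
After changing the file, you can double-check the resulting virtual host order and syntax before reloading Apache:

httpd -S
service httpd restart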

Thursday, October 27, 2011

rocks 5.4 cluster - installing subversion

sometimes the most trivial things turn out to be rather interesting.

For example, my latest drama with Rocks was that I just needed subversion to convert one of my BinBase images into a development image for BinBase.

Little did I know that in Rocks 5.4 the installer fails to provide you with the base repository. Quality control, anyone?

Anyway, to get yum to work as expected, just open your


/etc/yum.conf


file and add the following section



[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os



at the end of the file, and then you can install subversion as always using:



yum -y install subversion
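
To confirm that yum actually picked up the new repository, the base repo should now show up in:

yum repolist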

Thursday, March 4, 2010

rocks linux cluster - adding a new parallel environment

by default Rocks ships with a couple of environments, which execute stuff on different nodes. But sometimes you just want to have a node all to yourself and take over all its slots.

To do this you can just create a new environment which gives you a defined number of CPUs for a specified job.


  1. create a file which describes the parallel environment, like this:

    pe_name threaded
    slots 999
    user_lists NONE
    xuser_lists NONE
    start_proc_args /bin/true
    stop_proc_args /bin/true
    allocation_rule $pe_slots
    control_slaves FALSE
    job_is_first_task TRUE
    urgency_slots min
    accounting_summary FALSE

  2. register it on the head node:

    qconf -Ap file.txt

  3. add it to the list of available environments:

    qconf -mq all.q
    pe_list make mpich mpi orte threaded

  4. test it with qlogin:

    qlogin -pe threaded 4
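
The same environment works for batch jobs too; for example (run.sh being a placeholder for your own job script):

qsub -pe threaded 8 -cwd run.sh

reserves 8 slots on a single node, since the allocation_rule $pe_slots forces all slots of a job onto one host.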

Monday, March 1, 2010

scala/groovy on rocks linux

well, since there is no scala/groovy roll for Rocks, we need to install them the traditional way.

  • go into the directory /share/apps on the frontend
  • if apps doesn't exist create it
  • copy your scala/groovy tgz there
  • gunzip and untar it
  • edit your extend-compute.xml as shown here
  • add a new file modification section like this


<file name="/etc/profile" mode="append">

GROOVY_HOME=/share/apps/groovy
SCALA_HOME=/share/apps/scala

export GROOVY_HOME
export SCALA_HOME

PATH=$GROOVY_HOME/bin:$PATH
PATH=$SCALA_HOME/bin:$PATH

export PATH

</file>


  • rebuild your dist as shown here
  • reinstall your nodes as shown here
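
Once the nodes are back, a quick sanity check (assuming the groovy/scala binaries live in the bin directories added to PATH above):

ssh compute-0-0 'groovy -version'
ssh compute-0-0 'scala -version'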

Wednesday, February 24, 2010

rocks linux cluster - mounting an nfs share on all nodes

after setting up the latest cluster I had to provide a couple of NFS shares to all nodes, since users demanded this.

Well, in Rocks Linux it's rather simple once you understand the concept behind it.

So, a step-by-step tutorial.

  • go to the profile directory
  • cd /export/rocks/install/site-profiles/5.3/nodes/
  • make a copy of the skeleton file
  • cp skeleton.xml extend-compute.xml
  • edit the file to tell it that we need to create a directory and add a line to the fstab. The right place for this is in the post section:


    <post>
    mkdir -p /mnt/share

    <file name="/etc/fstab" mode="append">
    server:/mount /mnt/share nfs defaults 0 0
    </file>
    </post>

  • change back to the main install dir
  • cd /export/rocks/install
  • rebuild the rocks distribution
  • rocks create distro
  • rebuild nodes
  • ssh compute-0-0 '/boot/kickstart/cluster-kickstart'

Congratulations, if you did everything right your node should now boot up and have the directory mounted.
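
A quick way to check the mount on a rebuilt node:

ssh compute-0-0 'df -h /mnt/share'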