Torque Resource Manager with Maui Cluster Scheduler on Ubuntu 12.04 LTS

Ubuntu Packages Prerequisites

apt-get install g++ gpp kcc
apt-get install libssl-dev
apt-get install libxml2-dev
apt-get install libtool
apt-get install openssh-server

Torque & Maui Downloads

Torque Resource Manager: http://www.adaptivecomputing.com/support/download-center/torque-download/
Maui Cluster Scheduler (registration required): http://www.adaptivecomputing.com/support/download-center/maui-cluster-scheduler/


Node Details

Master Node + Compute Node -> n1.test.com
Compute Node -> n2.test.com

Torque Installation on Master node (n1.test.com)
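The configure step below has to be run from inside the unpacked Torque source directory. This is the step the first commenter ran into. Here is a minimal sketch. It assumes the download is a gzipped tarball named like torque-<version>.tar.gz, and that its top-level directory has the same name. unpack_source is a hypothetical helper and is not part of Torque.

```shell
# unpack_source is a hypothetical helper, not part of Torque or Maui.
# Usage: unpack_source TARBALL [DEST]
# Extracts TARBALL into DEST (default: current directory) and prints the
# source directory, assuming it is named after the tarball minus ".tar.gz".
unpack_source() {
  local tarball="$1" dest="${2:-.}"
  tar -xzf "$tarball" -C "$dest" || return 1
  echo "$dest/$(basename "$tarball" .tar.gz)"
}
```

For example, run `cd "$(unpack_source torque-<version>.tar.gz)"` with your actual file name, then run configure from there. The same applies to the Maui tarball.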

./configure --prefix=/opt/torque --with-server-home=/opt/torque/spool --enable-server \
--enable-clients --with-scp --enable-mom
make
make install

# Register the Torque libraries with the dynamic linker
echo "/opt/torque/lib" > /etc/ld.so.conf.d/torque.conf
ldconfig
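To confirm that ldconfig picked up the new path, search the linker cache. This small sketch assumes only that the installed library file names start with libtorque. torque_lib_registered is a hypothetical helper name.

```shell
# torque_lib_registered is a hypothetical helper.
# Succeeds if the dynamic linker cache lists a library named libtorque*.
torque_lib_registered() {
  ldconfig -p | grep -q 'libtorque'
}
```

Usage: `torque_lib_registered && echo "Torque libraries registered"`.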

make packages

./torque-package-server-linux-x86_64.sh --install
./torque-package-clients-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
./torque-package-devel-linux-x86_64.sh --install
./torque-package-doc-linux-x86_64.sh --install

If the master node does not also act as a compute node, install only the following packages (everything except the mom package):

./torque-package-server-linux-x86_64.sh --install
./torque-package-clients-linux-x86_64.sh --install
./torque-package-devel-linux-x86_64.sh --install
./torque-package-doc-linux-x86_64.sh --install

Export Environment Path

export PATH=$PATH:/opt/torque/sbin:/opt/torque/bin
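The export above only lasts for the current shell. One way to make it permanent is sketched here. It assumes that login shells on this Ubuntu machine source /etc/profile.d/*.sh, which the stock /etc/profile does. write_path_profile and the torque.sh file name are hypothetical.

```shell
# write_path_profile is a hypothetical helper; run it as root.
# Writes a profile.d snippet that appends the Torque directories to PATH.
write_path_profile() {
  local target="${1:-/etc/profile.d/torque.sh}"
  # Quoted 'EOF' keeps $PATH literal so it is expanded at login, not now
  cat > "$target" <<'EOF'
export PATH="$PATH:/opt/torque/sbin:/opt/torque/bin"
EOF
}
```

Run `write_path_profile` once as root. The Maui directories exported later can be appended to the same line.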

Initialize the server database (serverdb)

pbs_server -t create
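With TORQUE 4.x, the client commands (qmgr, qstat and so on) authorize through trqauthd. Both trqauthd and pbs_server therefore need to be running before the qmgr queue configuration will work. A "Connection refused" from qmgr, as in the third comment, can mean one of them is down, although hostname resolution is another common cause. The following is a hedged sketch. ensure_torque_daemons is a hypothetical helper, and it assumes /opt/torque/sbin is on PATH.

```shell
# ensure_torque_daemons is a hypothetical helper; run it as root with
# /opt/torque/sbin on PATH.
ensure_torque_daemons() {
  # Start trqauthd if no process with exactly that name is running
  pgrep -x trqauthd >/dev/null || trqauthd
  # pbs_server should already be running after "pbs_server -t create"
  if ! pgrep -x pbs_server >/dev/null; then
    echo "pbs_server is not running" >&2
    return 1
  fi
}
```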

Enable TORQUE services

cd contrib/init.d    # from the top of the Torque source tree
cp debian.pbs_mom /etc/init.d/pbs_mom
cp debian.pbs_server /etc/init.d/pbs_server
cp debian.pbs_trqauthd /etc/init.d/pbs_trqauthd

update-rc.d pbs_mom defaults
update-rc.d pbs_trqauthd defaults
update-rc.d pbs_server defaults

Queue Configuration

qmgr -c "set server scheduling=true"
qmgr -c "create queue batch queue_type=execution"
qmgr -c "set queue batch started=true"
qmgr -c "set queue batch enabled=true"
qmgr -c "set queue batch resources_default.walltime=3600"
qmgr -c "set server default_queue=batch"
qmgr -c "set server keep_completed = 0"
qmgr -c "set queue batch resources_default.ncpus = 1"
qmgr -c "set queue batch resources_default.nodect = 1"
qmgr -c "set queue batch resources_default.nodes = 1"
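A quick check confirms that the queue settings took effect. This sketch assumes that qmgr's "print server" output lists the queue as "create queue batch" and the default queue as "set server default_queue = batch". That is the format I would expect from TORQUE, but verify it on your version. check_batch_queue is a hypothetical helper. `qstat -Q` is another way to list the queues.

```shell
# check_batch_queue is a hypothetical helper.
# Succeeds if qmgr reports a "batch" queue that is also the server default.
check_batch_queue() {
  local config
  config="$(qmgr -c 'print server')" || return 1
  printf '%s\n' "$config" | grep -q '^create queue batch' &&
    printf '%s\n' "$config" | grep -q '^set server default_queue = batch'
}
```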

Optional Configurations

# qmgr -c "set queue batch max_running = 4"
# qmgr -c "set queue batch resources_max.ncpus = 4"
# qmgr -c "set queue batch resources_min.ncpus = 1"
# qmgr -c "set queue batch resources_max.nodes = 2"

If the master node also acts as a compute node, make sure /opt/torque/spool/mom_priv/config contains the following lines:

cat /opt/torque/spool/mom_priv/config
$pbsserver n1 # note: hostname running pbs_server
$logevent 255 # bitmap of which events to log
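Instead of editing the file by hand, you can write the two lines with a small sketch like this one. write_mom_config is a hypothetical helper that overwrites the existing config file. Its default path assumes the --with-server-home=/opt/torque/spool setting used above.

```shell
# write_mom_config is a hypothetical helper.
# Usage: write_mom_config SERVER_HOST [CONFIG_PATH]
# Overwrites CONFIG_PATH (default assumes the /opt/torque/spool server home)
write_mom_config() {
  local server="$1"
  local config="${2:-/opt/torque/spool/mom_priv/config}"
  mkdir -p "$(dirname "$config")"
  # $pbsserver: host running pbs_server; $logevent: bitmap of events to log
  printf '$pbsserver %s\n$logevent 255\n' "$server" > "$config"
}
```

For example, run `write_mom_config n1`, then `/etc/init.d/pbs_mom restart` so the mom rereads its config.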

Maui Installation

./configure --prefix=/opt/maui --with-pbs=/opt/torque --with-spooldir=/opt/maui/spool
make
make install
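Maui reads its settings from maui.cfg. This sketch assumes the file was generated under the spool directory given to configure, /opt/maui/spool/maui.cfg here; check where make install actually put it on your system. The file should name the scheduler host and a PBS resource manager. check_maui_cfg is a hypothetical helper that only greps for SERVERHOST and for an RMCFG[...] line with TYPE=PBS.

```shell
# check_maui_cfg is a hypothetical helper.
# Succeeds if maui.cfg sets SERVERHOST and defines a PBS resource manager.
check_maui_cfg() {
  local cfg="${1:-/opt/maui/spool/maui.cfg}"
  grep -q '^SERVERHOST' "$cfg" && grep -q '^RMCFG.*TYPE=PBS' "$cfg"
}
```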

Export Maui Environment Path

export PATH=$PATH:/opt/maui/sbin:/opt/maui/bin

Enable Maui services

Install a Maui init.d script as /etc/init.d/maui (see https://rravikumar.wordpress.com/2013/08/02/maui-init-d-script-for-ubuntu/ for one written for Ubuntu), then enable it:
update-rc.d maui defaults
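Once the init script is in place, you can start Maui and query it. showq is Maui's queue listing command. start_maui_and_check is a hypothetical helper, and the two-second pause before querying is an arbitrary assumption.

```shell
# start_maui_and_check is a hypothetical helper; run it as root.
start_maui_and_check() {
  service maui start || return 1
  # Give the scheduler a moment to contact pbs_server (arbitrary delay)
  sleep 2
  # showq fails if Maui is not answering
  showq >/dev/null
}
```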

Torque Installation on Compute node (n2.test.com)

Copy the clients and mom package scripts to an NFS shared folder and install them on n2.test.com as shown below:

./torque-package-clients-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
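The staging step on the master might look like this. The share path /shared/torque-packages is a placeholder for whatever NFS export the nodes mount. stage_node_packages is a hypothetical helper, meant to be run from the Torque build directory where make packages wrote the scripts.

```shell
# stage_node_packages is a hypothetical helper; run it on the master from the
# Torque build directory. /shared/torque-packages is a placeholder NFS path.
stage_node_packages() {
  local share="${1:-/shared/torque-packages}"
  mkdir -p "$share"
  cp torque-package-clients-linux-x86_64.sh \
     torque-package-mom-linux-x86_64.sh "$share"/
}
```

On n2.test.com, run the two install commands from the mounted shared directory.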

Enable TORQUE services

# The debian.* init scripts are in contrib/init.d of the Torque source tree
cp debian.pbs_mom /etc/init.d/pbs_mom
cp debian.pbs_trqauthd /etc/init.d/pbs_trqauthd

update-rc.d pbs_trqauthd defaults
update-rc.d pbs_mom defaults

Adding Compute Nodes to PBS

On the master node, run the command below to add n2.test.com to PBS, replacing [ncpus] with the number of processor cores on n2.test.com:
qmgr -c 'create node n2.test.com np=[ncpus]' 
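After you add a node, pbsnodes shows whether pbs_server can see it. This sketch assumes that the node's pbsnodes output contains a "state = free" line when the node is idle and reachable, which is the usual TORQUE format. node_is_free is a hypothetical helper.

```shell
# node_is_free is a hypothetical helper; run it on the master.
# Succeeds if pbsnodes reports the node's state as "free".
node_is_free() {
  pbsnodes "$1" | grep -q 'state = free'
}
```

If `node_is_free n2.test.com` succeeds, submit a test job as an ordinary user, for example `echo "sleep 30" | qsub`, and follow it with qstat or Maui's showq. Use an ordinary user because TORQUE normally refuses jobs submitted by root.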

4 thoughts on “Torque Resource Manager with Maui Cluster Scheduler on Ubuntu 12.04 LTS”

  1. Hello!
    I work at the Institute of Biochemistry and Genetics.
    We now need a server with the Torque/Maui scheduling system installed, so I decided to try installing it myself.
    We have an Ubuntu 12 server with 5 cores and 2 GB of RAM, and I want to install Torque/Maui on it.
    Could you please help me in the name of science 🙂 ?
    We need it for processing the nucleotide sequences of genes.
    ———————————
    I started doing everything as you wrote.
    I connected to the server with PuTTY.
    Then I entered the first command:
    apt-get install g++ gpp kcc
    The system asked me:
    Blah blah blah
    ………..
    After this operation, 82.0 MB of additional disk space will be used. Do you want to continue [Y/n]?
    I entered "Y" and the package was installed.
    After that I also installed the remaining packages.
    apt-get install libssl-dev
    apt-get install libxml2-dev
    apt-get install libtool
    apt-get install openssh-server
    —————————–

    Then I could not understand what to do next.
    In PuTTY I entered the command:
    ./configure --prefix=/opt/torque --with-server-home=/opt/torque/spool --enable-server --enable-clients --with-scp --enable-mom
    The system replied:
    -bash: ./configure: No such file or directory

    I have probably missed some simple steps.
    However, I am a newbie and do not understand. Could you tell me what to do next?

  2. Thanks Ravikumar, the instructions worked great. However, if the master node is also to be used as a compute node, I had to make sure the /etc/hosts file maps the hostname to 127.0.0.1, not Ubuntu’s default of 127.0.1.1.

    My working /etc/hosts has something like this:

    127.0.0.1 localhost
    127.0.0.1 myhostname myhostname.myserver.ca

    Wondering whether you had to do the same thing?

  3. Thanks Ravikumar,

    I am following your instructions for my Ubuntu cluster. I got it right up to the pbs_server create step. When I invoke qmgr, I get this error:

    Unable to communicate with stokes(127.0.0.1)
    Cannot connect to specified server host ‘stokes’.
    qmgr: cannot connect to server (errno=111) Connection refused

    Here stokes is my hostname. My /etc/hosts file looks like this:

    127.0.0.1 localhost
    127.0.0.1 stokes master-node

    192.168.1.10 master-node
    192.168.1.11 compute-node-11 node001
    192.168.1.12 compute-node-12 node002
    192.168.1.13 compute-node-13 node003

    Could you or anyone suggest what may be wrong?
