Torque
Torque allows users to queue their jobs and schedules them for completion. Each command has its own man page. The commands are outlined below.
| Command | Description |
| qsub | Submit job to the queue |
| qdel | Remove job from the queue |
| qhold | Place job on hold |
| qrls | Release hold on the job |
| qstat | Show the status of the queued jobs |
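The commands in the table are typically used together over the life of a job. The following is a minimal sketch of that sequence, not a site-specific recipe. The helper `step`, the `DRYRUN` switch, and the job identifier `12345` are hypothetical names introduced for illustration; by default the commands are only printed, so the sketch can be read through on a machine without Torque.

```shell
#!/bin/sh
# Print each command, and run it only when DRYRUN=0.
# "step" and "DRYRUN" are illustrative helpers, not part of Torque.
step() {
    echo "+ $*"
    [ "${DRYRUN:-1}" = 1 ] || "$@"
}

# Placeholder identifier; qsub prints the real one when a job is submitted.
JOBID=${JOBID:-12345}

step qstat            # show the status of the queued jobs
step qhold "$JOBID"   # place the job on hold
step qrls "$JOBID"    # release the hold on the job
step qdel "$JOBID"    # remove the job from the queue
```

Setting `DRYRUN=0` and `JOBID` to an identifier returned by qsub would run the commands for real.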
To run a batch job on Njal, submit a PBS script using qsub. For example,
the PBS script for a simple serial (non-MPI) job might look like this:
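# Request 2 nodes with 2 processors per node
#PBS -l nodes=2:ppn=2
# Name the job
#PBS -N myjob
# Merge standard error into the standard output file
#PBS -j oe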
# Send me mail on job start, job end and if job aborts
#PBS -M mymail.account@sunysb.edu
#PBS -m bea
cd ~/mycode
./myprogram
This will start a job called myjob whose executable is named myprogram and located in ~/mycode. The job requests 2 nodes with 2 processors per node (ppn). For instructions on starting parallel (MPI) batch jobs, see the mpiexec section.
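One way to put this into practice is to save the script to a file and hand that file to qsub. The sketch below assumes the file name myjob.pbs, which is a placeholder and not something Njal prescribes. It submits the job only if qsub is found on the PATH.

```shell
#!/bin/sh
# Write the example PBS script to a file.
# The file name myjob.pbs is an assumption; any name works.
script="${TMPDIR:-/tmp}/myjob.pbs"
cat > "$script" <<'EOF'
#PBS -l nodes=2:ppn=2
#PBS -N myjob
#PBS -j oe
# Send me mail on job start, job end and if job aborts
#PBS -M mymail.account@sunysb.edu
#PBS -m bea
cd ~/mycode
./myprogram
EOF

# Submit only when qsub is available; otherwise report where the file is.
if command -v qsub >/dev/null 2>&1; then
    qsub "$script"
else
    echo "qsub not found; script written to $script"
fi
```

On submission, qsub prints the job identifier, which can then be passed to qstat, qhold, qrls, or qdel.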
qsub -I starts an interactive job, which is useful for debugging your code.
For example, to start an interactive session for a small
job using 1 node and 1 processor, use
qsub -q small -I -l nodes=1:ppn=1
Torque will give you a shell on one of the compute nodes
without using rsh or rlogin, which saves resources and, most importantly,
obeys Torque's system policies for node allocation.
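Once inside an interactive session, it can help to confirm what Torque actually allocated. The sketch below relies on the PBS_NODEFILE environment variable, which Torque sets inside a job and which lists one line per allocated processor slot. The function name `show_allocation` is a hypothetical helper, not a Torque command.

```shell
#!/bin/sh
# Summarise the nodes Torque allocated to the current job.
# An optional argument overrides PBS_NODEFILE (handy for trying it out).
show_allocation() {
    nodefile=${1:-${PBS_NODEFILE:-}}
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "not inside a Torque job (PBS_NODEFILE is unset)" >&2
        return 1
    fi
    # Count how many processor slots were granted on each node.
    sort "$nodefile" | uniq -c | awk '{ print $2 ": " $1 " slot(s)" }'
}

# Report the allocation only when running inside a job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    show_allocation
fi
```

For the nodes=1:ppn=1 request above, this should report a single node with one slot.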
Njal has both Myrinet and Ethernet nodes. To request only nodes with Myrinet, use the Myrinet attribute (myri) on the command line. For example,
qsub -I -l nodes=40:myri:ppn=2
will allocate 80 CPUs on Myrinet nodes.
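The CPU count follows from multiplying the number of nodes by the processors per node. As a small sketch, the snippet below builds the resource string from its parts and prints the resulting total; the variable names are illustrative only.

```shell
#!/bin/sh
# Build a nodes specification and report how many CPUs it requests.
nodes=40     # number of nodes
ppn=2        # processors per node
attr=myri    # node attribute: Myrinet nodes only

spec="nodes=${nodes}:${attr}:ppn=${ppn}"
total=$((nodes * ppn))

echo "qsub -I -l ${spec}"
echo "requests ${total} CPUs"
```

Changing `nodes` or `ppn` shows how large a request is before it is submitted.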
For more information on Torque see the Torque website.