Tutorial: Getting Started

1. SERIAL JOBS:
~~~~~~~~~~~~~~~

    i.  Running

        Serial jobs have to be started via a PBS script.  Assuming
        that the program called myprogram is in the directory
        ~/mycode, the script looks like this:

        #PBS -l nodes=1:ppn=1
        #PBS -N myjob15
        #PBS -j oe
        cd ~/mycode
        ./myprogram

        -N myjob15 specifies that the job will be named myjob15.
        -l nodes=1:ppn=1 specifies that the job will use 1 node and
           1 processor per node.
        -j oe merges the job's standard error into its standard
           output, so only one output file is produced.
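
        As a rough sketch of the whole workflow, the block below writes
        the script above to a file and submits it with qsub.  The file
        name serial.pbs is an arbitrary choice for illustration, and the
        submission step is skipped on machines where qsub is not
        installed.

```shell
#!/bin/sh
# Write the serial job script to a file.
# "serial.pbs" is a placeholder name chosen for this sketch.
cat > serial.pbs <<'EOF'
#PBS -l nodes=1:ppn=1
#PBS -N myjob15
#PBS -j oe
cd ~/mycode
./myprogram
EOF

# Submit only where qsub exists, so the sketch is harmless elsewhere.
if command -v qsub >/dev/null 2>&1; then
    qsub serial.pbs        # prints the job ID on success
    qstat -u "$USER"       # lists your queued and running jobs
else
    echo "qsub not found; serial.pbs written but not submitted"
fi
```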

    ii. Debugging

        To spare master from CPU- and I/O-intensive applications, debug
        your code on one of the compute nodes.  This can be done by
        starting an interactive PBS session with

        qsub -I

        For example, to start an interactive PBS session for a small
        job using 1 node and 1 processor, use

        qsub -q small -I -l nodes=1:ppn=1

	PBS will give you a shell on one of the compute nodes, but
	without using rsh or rlogin, thus saving resources.  Note
	that PBS allocates and reserves the node for you!

    Check the qsub man page for more information.
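
    Once the interactive shell appears, a quick check like the following
    sketch can confirm that you are inside a PBS job.  It relies on the
    PBS_JOBID and PBS_NODEFILE environment variables that PBS sets
    inside a job; pbs_session_info is a hypothetical helper name, not a
    PBS command.

```shell
# Report whether this shell is inside a PBS job and, if so, what it holds.
# PBS_JOBID and PBS_NODEFILE are set by PBS inside a job.
# pbs_session_info is a hypothetical helper name used for this sketch.
pbs_session_info() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "not inside a PBS job"
        return 1
    fi
    echo "job: $PBS_JOBID"
    echo "running on: $(hostname)"
    if [ -r "${PBS_NODEFILE:-}" ]; then
        # The node file has one line per allocated processor.
        echo "processors: $(grep -c . "$PBS_NODEFILE")"
    fi
}

# Outside a job this prints "not inside a PBS job" and carries on.
pbs_session_info || true
```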


2. PARALLEL JOBS:
~~~~~~~~~~~~~~~~~

    Parallel jobs are run and debugged through PBS as described in
    Section 1.  However, parallel jobs using MPI should be run using
    mpiexec.

    Mpiexec uses the task manager library of PBS to spawn copies of
    the executable on the nodes in a PBS allocation.  It helps clean
    up the slave MPI processes when the job is aborted on the master
    node.  Another benefit is that resources used by the spawned
    processes are accounted for correctly and reported in the PBS
    logs.

    To use mpiexec, place something like this in your PBS script:

    #PBS -l nodes=78:ppn=2
    #PBS -N myjob15
    #PBS -j oe
    cd ~/mycode
    mpiexec [-n cpu] [-comm=type] program

    where -n cpu is _optional_: the number of processes is normally
    inherited from the PBS -l nodes= request, but setting it explicitly
    can be useful for debugging.  type states whether the program is an
    MPI program or not and should match the type of program: 'p4' for
    an MPI program, 'none' for a non-MPI program.
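
    A concrete version of the script above might look like the following
    sketch.  The node count (4), the file name parallel.pbs and the
    program name ./myprogram are placeholders chosen for illustration,
    and -comm=p4 assumes that myprogram is an MPI program.

```shell
#!/bin/sh
# Write a parallel job script to a file.
# Placeholders for this sketch: parallel.pbs, nodes=4, ./myprogram.
cat > parallel.pbs <<'EOF'
#PBS -l nodes=4:ppn=2
#PBS -N myjob15
#PBS -j oe
cd ~/mycode
# No -n given: the process count is taken from the PBS allocation.
mpiexec -comm=p4 ./myprogram
EOF

# The script is only written here; submitting it is left to you.
echo "submit with: qsub parallel.pbs"
```

    While debugging, adding -n 2 to the mpiexec line would limit the run
    to two processes without changing the PBS request.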

    To test mpiexec, try the following (using an interactive PBS session):

        [joseph@master joseph]$ qsub -q small -I -l nodes=5:ppn=2
        qsub: waiting for job 1175.master to start

    At this point you have to wait a few seconds until PBS starts
    to run your interactive session.  To see exactly what is going
    on, open a new terminal on master and type

        [joseph@master joseph]$ qstat -an

        master:
                                                                    Req'd  Req'd   Elap
        Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
        --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
        1175.master     joseph   small    STDIN         --    5  --    --    --  Q   --
            --
        [joseph@master joseph]$

    Here you can see that master has queued your interactive session;
    qsub is waiting because PBS has not yet started your job.  Back in
    your interactive terminal, wait a moment longer until PBS allocates
    the compute nodes for your job.  Once it does, qsub gives you a
    prompt on one of the compute nodes (the one that controls the other
    compute nodes reserved for you).  It looks like this:

        [joseph@master joseph]$ qsub -q small -I -l nodes=5:ppn=2
        qsub: waiting for job 1175.master to start
        qsub: job 1175.master ready

        [joseph@a9 ~]$

    Back in the other terminal, where you typed qstat, you can run
    qstat again to see exactly which nodes have been allocated to your
    interactive session.  This might look something like this:

        [joseph@master joseph]$ qstat -an

        master:
                                                                    Req'd  Req'd   Elap
        Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
        --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
        1175.master     joseph   small    STDIN        5341   5  --    --    --  R   --
            a9/1+a9/0+a8/1+a8/0+a7/1+a7/0+a6/1+a6/0+a5/1+a5/0
        [joseph@master joseph]$

    Now you can see the nodes and CPUs you have been allocated.  To
    test mpiexec, do the following:

        [joseph@a9 ~]$ mpiexec -comm=none hostname
        a9
        a9
        a8
        a6
        a7
        a7
        a8
        a5
        a5
        a6
        [joseph@a9 ~]$
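
    When many processes are started, it can be easier to count the
    lines per host than to read them one by one.  The helper below is a
    small sketch of that idea; count_hosts is a hypothetical name, not
    part of PBS or mpiexec.

```shell
# Count how many processes ran on each host.
# Reads one hostname per line on stdin, prints "host count" per line.
# count_hosts is a hypothetical helper name used for this sketch.
count_hosts() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Inside a PBS session with mpiexec available, pipe the real output in.
if command -v mpiexec >/dev/null 2>&1 && [ -n "${PBS_JOBID:-}" ]; then
    mpiexec -comm=none hostname | count_hosts
fi
```

    For the output shown above, this would list each of a5 through a9
    with a count of 2, which matches the ppn=2 request.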

    That is it!  Remember that the point of this example was mpiexec.
    I thought I would try to squeeze an interactive PBS session (qsub -I)
    in for fun!  Anyway, everything that is amazing about mpiexec happens
    behind the scenes: cleanup and accounting.  But combined with an
    interactive PBS session it can help you to debug your parallel code.
    See the Njal website for more info on this.

    By default, mpiexec assumes that the program will use both
    processors on each node.  If you want to run a single instance of
    the program on each node, add -ppn 1 to the list of options.


