
Using LSF to access the MINOS FNAL Batch Nodes


The LSF batch system from Platform Computing is used to submit and monitor batch jobs. All the batch nodes have access to AFS and to the full range of UPS products installed on FNALU. You can use the MINOS Offline software installed on FNALU; instructions on setting up the software are available. LSF copies your current environment into the submitted job. This can cause problems if you rely on UPS products, because the job inherits the UPS settings of the shell you submitted from. It is therefore recommended that your script contain the lines:

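# Clear the UPS settings inherited from the submitting shell
unset SETUPS_DIR
unset SETUP_UPS
unset PRODUCTS
# Re-initialise UPS from AFS
if [ -f "/afs/fnal.gov/ups/etc/setups.sh" ] ; then
  source /afs/fnal.gov/ups/etc/setups.sh
fi
# Set up the MINOS offline software (here: the latest snapshot)
export MINOS_SETUP_DIR=/afs/fnal.gov/files/code/e875/general/minossoft/setup
source $MINOS_SETUP_DIR/setup_minossoft_FNALU.sh -r snapshot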
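The same reset can be packaged at the top of a job script. This is a minimal sketch assuming bash; the function names reset_ups_env and setup_minossoft are hypothetical, while the paths and the -r option are the ones given on this page. The only addition is a check that fails early, with a message, when the AFS setup area is not visible on the node.

```bash
#!/bin/bash
# Sketch of a job-script preamble. Function names are hypothetical;
# the paths are the MINOS/UPS ones quoted on this page.

reset_ups_env() {
  # Drop UPS state that LSF copied over from the submission environment.
  unset SETUPS_DIR SETUP_UPS PRODUCTS
  # Re-initialise UPS from scratch if the AFS setup file is visible.
  if [ -f /afs/fnal.gov/ups/etc/setups.sh ]; then
    source /afs/fnal.gov/ups/etc/setups.sh
  fi
}

setup_minossoft() {
  # Release name defaults to "snapshot", as in the lines above.
  local release=${1:-snapshot}
  export MINOS_SETUP_DIR=/afs/fnal.gov/files/code/e875/general/minossoft/setup
  if [ ! -f "$MINOS_SETUP_DIR/setup_minossoft_FNALU.sh" ]; then
    echo "setup_minossoft: cannot see $MINOS_SETUP_DIR (is AFS available?)" >&2
    return 1
  fi
  source "$MINOS_SETUP_DIR/setup_minossoft_FNALU.sh" -r "$release"
}

# Typical use at the top of a job script:
#   reset_ups_env
#   setup_minossoft snapshot || exit 1
#   loon -bq MyMacro.C ...
```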

If your shell is tcsh, use the .csh versions of these setup scripts instead. Replace "snapshot" with a particular release name if you want something other than the latest snapshot. Note that the development release at FNAL is rebuilt EVERY night at about 10pm, so a batch job using it that is still running at that time will probably crash. To avoid this, long-running jobs should use a frozen or snapshot release.
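Whether a job built against the development release is at risk comes down to whether it would still be running at the rebuild. A rough check, as a hypothetical helper: it assumes whole-hour start times and takes 22:00 as the rebuild hour, which is only an approximation of "about 10pm", so pad the run time generously.

```bash
#!/bin/bash
# Hypothetical check: would a job starting at hour START (0-23) and running
# for HOURS whole hours still be running at the nightly rebuild (~22:00)?
spans_rebuild() {
  local start=$1 hours=$2 rebuild=22
  # Strip a leading zero (as produced by `date +%H`), which the shell
  # would otherwise treat as an octal number.
  start=${start#0}
  start=${start:-0}
  # Hours from the start until the next 22:00 (0 if starting at 22:00).
  local to_rebuild=$(( (rebuild - start + 24) % 24 ))
  # Success (exit 0) means the job overlaps the rebuild.
  [ "$hours" -gt "$to_rebuild" ]
}

# Example:
#   if spans_rebuild "$(date +%H)" 6; then
#     echo "use a frozen or snapshot release"
#   fi
```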

In general, batch jobs should not compile code. Doing so can cause conflicts when several jobs run in parallel, or when the code ends up being compiled on a 2.6 kernel machine. To keep test releases as compatible and consistent as possible, take the following steps:

  • Build the code once on a 2.4 kernel machine (e.g. minos11) before any processing is done.
  • Make the following symlinks (once), so that jobs landing on 2.6 kernel nodes use the 2.4 build:

# Point the 2.6 kernel areas at the 2.4 build products
for subdir in bin lib tmp ; do
  cd "$SRT_PRIVATE_CONTEXT/${subdir}" || continue
  ln -s Linux2.4-GCC_3_4 Linux2.6-GCC_3_4
  ln -s Linux2.4-GCC_3_4-maxopt Linux2.6-GCC_3_4-maxopt
done
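The loop is meant to be run once: if Linux2.6-GCC_3_4 already exists as a link to a directory, a second plain `ln -s` follows it and creates a stray link inside the 2.4 directory. A slightly more defensive sketch, assuming bash and GNU-style `ln -sfn`; the function name link_26_to_24 is hypothetical. It only links areas that actually have a 2.4 build and can be re-run safely.

```bash
#!/bin/bash
# Hypothetical helper: link_26_to_24 <test-release-dir>
# Creates Linux2.6-* -> Linux2.4-* links in bin, lib and tmp.
link_26_to_24() {
  local base=$1 subdir flavour link
  if [ -z "$base" ] || [ ! -d "$base" ]; then
    echo "link_26_to_24: no such directory: $base" >&2
    return 1
  fi
  for subdir in bin lib tmp; do
    for flavour in GCC_3_4 GCC_3_4-maxopt; do
      # Skip areas that have no 2.4 build.
      [ -d "$base/$subdir/Linux2.4-$flavour" ] || continue
      link="$base/$subdir/Linux2.6-$flavour"
      # Never clobber a real (non-link) 2.6 build directory.
      if [ -d "$link" ] && [ ! -L "$link" ]; then
        echo "link_26_to_24: skipping real directory $link" >&2
        continue
      fi
      # -n: do not follow an existing link; -f: replace it.
      ln -sfn "Linux2.4-$flavour" "$link"
    done
  done
  return 0
}

# Usage: link_26_to_24 "$SRT_PRIVATE_CONTEXT"
```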



Some of the MINOS cluster nodes (minos14-minos25) have been configured to run LSF batch jobs; these are accessible only through the minos queue (1 day CPU) [2008-02: this is currently disabled]. In addition, there are 30 Linux batch nodes in the general FNALU cluster (flxb01-flxb30). Nodes flxi06, flxb11 and flxb35 run 64-bit installations without compatibility libraries and should be avoided; the instructions below show how to do so. The Linux nodes flxi02 and flxi03 are for interactive use only, and long-running jobs on them will be killed, but they are acceptable machines from which to submit LSF jobs.
 
To use LSF, you first need to run:

setup lsf

The following batch queues exist for the Linux nodes (as listed by bqueues; the job counts are only a snapshot):

    QUEUE_NAME      PRIO STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN  SUSP 
    30min            10  Open:Active       -    -    1    -     0     0     0     0
    4hr               8  Open:Active       -    -    1    -     0     0     0     0
    12hr              6  Open:Active       -    -    1    -    68    53    15     0
    1day              4  Open:Active       -    -    1    -    73    39    34     0
    minos             4  Open:Active       -    -    1    -    20     0    20     0
    1day_ex           4  Open:Active       -    4    1    -     0     0     0     0
    4day              2  Open:Active       -    5    1    -     0     0     0     0
    8day              1  Open:Active       -    2    1    -     0     0     0     0
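Shorter queues have higher PRIO in the table, so the usual choice is the shortest general queue that fits the job. A hypothetical helper, assuming the queue names reflect their time limits (the minos queue is described above as "1 day CPU", so the limits may be CPU time rather than wall-clock time). It leaves out minos and 1day_ex; remember also the per-user limits (JL/U) on the 4day and 8day queues.

```bash
#!/bin/bash
# Hypothetical helper: print the shortest queue whose (assumed) limit
# covers an estimated job length given in minutes.
pick_queue() {
  local minutes=$1
  if   [ "$minutes" -le 30 ];    then echo 30min
  elif [ "$minutes" -le 240 ];   then echo 4hr
  elif [ "$minutes" -le 720 ];   then echo 12hr
  elif [ "$minutes" -le 1440 ];  then echo 1day
  elif [ "$minutes" -le 5760 ];  then echo 4day
  elif [ "$minutes" -le 11520 ]; then echo 8day
  else
    echo "pick_queue: $minutes minutes exceeds the longest queue" >&2
    return 1
  fi
}

# Example: bsub -q "$(pick_queue 180)" -R linux24 myscript.csh
```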

To submit a batch job to one of the Linux nodes in the 4 hour queue (excluding some nodes):

export XCLUDE="hname!=flxi06 & hname!=flxb10 & hname!=flxb11 & hname!=flxb35"
bsub -q 4hr -R "linux24 & ${XCLUDE}" myscript.csh

This runs the script myscript.csh in the 4hr queue on one of the Linux nodes with a 2.4 kernel, avoiding the hosts listed in XCLUDE. [Note: due to what appears to be a misconfiguration, the linux24 requirement does not actually exclude 2.6 kernel nodes, although it does restrict jobs to Linux nodes.]
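The XCLUDE string is just one hname!=host term per host, joined with "&". If the list of nodes to avoid changes, it can be generated rather than edited by hand; exclude_hosts below is a hypothetical helper, sketched in bash.

```bash
#!/bin/bash
# Hypothetical helper: build an LSF resource-requirement fragment that
# excludes every host named on the command line, e.g.
#   exclude_hosts flxi06 flxb10  ->  hname!=flxi06 & hname!=flxb10
exclude_hosts() {
  local expr="" host
  for host in "$@"; do
    # Add the " & " separator only once the expression is non-empty.
    expr="${expr:+$expr & }hname!=$host"
  done
  printf '%s\n' "$expr"
}

# Example:
#   bsub -q 4hr -R "linux24 & $(exclude_hosts flxi06 flxb10 flxb11 flxb35)" myscript.csh
```

With no arguments the helper prints an empty string, which would leave a dangling "&" in the -R expression, so always pass at least one host.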


 
By default, the standard output from the batch job is emailed back to you. Any other output (ntuples etc.) is written to whatever location your job specifies. If the output is large and you don't want it emailed to you, use the -o option to write it to a file instead. If you also use -N, you will get an email telling you when the job has finished.

For example:

bsub -q 4hr -R linux24 -o $HOME/out.log -N "loon -bq MyMacro.C dcache:dcap://fndca1.fnal.gov:24125/pnfs/fnal.gov/usr/minos/caldet_data/2002-09/C00031440_0000.mdaq.root"


Other useful commands (run man lsf for a complete list, and man <command> for more information on each one):

  • bpeek <jobid> - view the standard output of job <jobid> as it is written
  • bjobs - list your jobs and their status
  • bqueues - list the available queues
  • bmod <jobid> - modify the submitted job <jobid>
  • bkill <jobid> - kill job <jobid>
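All of these commands take a job ID. When scripting, the ID can be captured from the confirmation message bsub prints, which normally has the form "Job <12345> is submitted to queue <4hr>." A minimal sketch assuming that message format; bsub_job_id is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: read bsub's output on stdin and print the job ID.
# Assumes the usual confirmation line, e.g.
#   Job <12345> is submitted to queue <4hr>.
# Prints nothing if no such line is found.
bsub_job_id() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Example:
#   jobid=$(bsub -q 4hr -R linux24 myscript.csh | bsub_job_id)
#   bpeek "$jobid"
```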

Send suggestions or comments to the Pagemaster.


Fermi National Accelerator Laboratory