Condor at MINOS

A guide to running jobs on the MINOS cluster and the FNAL grid nodes


IMPORTANT FILESYSTEM INFORMATION

The BlueArc system that serves /minos/data and /minos/app will overload if accessed directly from Grid worker nodes. Thus:

All read and write access to /minos/data and /minos/app from Condor jobs must go through the cpn or ifdhc commands.
Analysis jobs MUST NOT read or write directly from these areas. If you are using minos_jobsub, here's one possible approach for your job script:
#!/bin/sh

CPN=/grid/fermiapp/common/tools/cpn

# Stage everything on the worker node's local scratch disk
mkdir ${_CONDOR_SCRATCH_DIR}/input
mkdir ${_CONDOR_SCRATCH_DIR}/output

# Copy the input from BlueArc via cpn (never read it directly)
${CPN} /minos/data/users/yourname/some_input_file.root ${_CONDOR_SCRATCH_DIR}/input

export INPUTFILES=${_CONDOR_SCRATCH_DIR}/input/*
export OUTPUTFILE=${_CONDOR_SCRATCH_DIR}/output/my_output.root

loon -qb "MyScript.C(\"$INPUTFILES\",\"$OUTPUTFILE\")"

# Copy the result back to BlueArc, again via cpn
${CPN} ${OUTPUTFILE} /minos/data/some_shared_directory/
If you use the minos_jobsub -dTAG argument, note that the temporary directory $CONDOR_DIR_TAG lives in affected space, so your output must be created locally on the worker node in $_CONDOR_SCRATCH_DIR and then copied to $CONDOR_DIR_TAG at the end of the job. Cumbersome, I know. Thank the BlueArc system's poor scalability for this. If I get some free time (or if someone wants to volunteer?), I might reimplement the -d feature to put the staging directory on local disk during the job and then copy it back in a protected way at job's end.
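
For the time being, a job submitted with -dTAG might follow a pattern like the sketch below. This is only an illustration: MyMacro.C and the file names are placeholders, and the final copy into $CONDOR_DIR_TAG goes through cpn because that directory lives in the affected BlueArc space.

#!/bin/sh

CPN=/grid/fermiapp/common/tools/cpn

# Do all the work on the worker node's local scratch disk ...
mkdir ${_CONDOR_SCRATCH_DIR}/out
loon -b -q "MyMacro.C(\"${_CONDOR_SCRATCH_DIR}/out/hists.root\")"

# ... and only at the very end copy the results into the -dTAG staging area,
# using cpn since $CONDOR_DIR_TAG is in BlueArc-served space.
${CPN} ${_CONDOR_SCRATCH_DIR}/out/hists.root ${CONDOR_DIR_TAG}/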


Overview

Condor is a batch computing system. MINOS users can submit analysis jobs to various computing clusters via Condor. The computing that MINOS has available through Condor consists of

  1. 24 cores with AFS at MINOS
  2. 8000 cores without AFS on the general purpose Fermigrid
  3. 8000 opportunistic cores in Fermigrid (mainly CDF)
The first item in the list is the MINOS cluster. The rest are referred to in these pages as the grid.


Getting set up to use Condor

Add the following line to the bottom of your .bashrc or .cshrc login script:

bash users:
    source /afs/fnal.gov/files/code/e875/general/condor/scripts/setup_minos_condor.sh

tcsh users:
    source /afs/fnal.gov/files/code/e875/general/condor/scripts/setup_minos_condor.csh

That's all you need to do to submit to the MINOS cluster. To submit to the grid, you'll also need to set up a proxy to authenticate you on the grid nodes. You can still run on the MINOS cluster without setting up a proxy. Other than this extra proxy setup needed for grid running, the grid and the MINOS cluster should appear more or less the same from the user's perspective.

You can interact with Condor from any MINOS cluster machine. In the past, Condor interaction was limited to minos25.fnal.gov. This is no longer the case, and users are discouraged from ever logging in to minos25.fnal.gov.


Summary of commands

Here is a terse summary of the most important MINOS Condor commands. They typically have many options which I'm not showing just yet. These commands and others are more thoroughly documented below and elsewhere. You can usually run a command with the option -h to see full usage information.

minos_jobsub         Submit a job.
minos_harvestfiles   Collect job output, if necessary (see docs for when).
minos_q              Print a short summary of all jobs in the queue. This calls condor_q and pares the output down to a raw overview.
condor_q             List all jobs in the queue. (See note 1.)
condor_rm            Remove job. (See note 1.)
enstore_get          Get dCache files.
enstore_cleanup      Clean up dCache files you're finished with.
Note 1: Commands of the form condor_* are direct Condor commands (as opposed to MINOS scripts), so very detailed documentation is available from the Condor Project if you need it.
Two more commands (documented later) are relevant for advanced users, especially those that use the old-style condor_submit method of submitting jobs.
choose_dcache_port   Return a load-balanced dCache port (if you wish to use dccp directly for your dCache needs).
choose_db_server     Return a load-balanced database server (if you need to set up the minossoft environment on your own for some reason).


Your first job

Have no fear. Type the command shown at the prompt below to submit a simple job:

<minos04.fnal.gov> minos_jobsub sleep 120
Created /minos/data/condor-tmp/rbpatter/sleep_20080924_013207_1.cmd
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 195793.
(I've shown an example of the screen output here.) You can see your job in the queue by typing:
<minos04.fnal.gov> condor_q

-- Submitter: minos25.fnal.gov : <131.225.193.25:64961> : minos25.fnal.gov
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
190161.0   rhatcher        9/15 16:54   0+01:27:08 R  0   415.0 gen_charm.sh
190161.1   rhatcher        9/15 16:54   0+01:26:18 R  0   415.0 gen_charm.sh
 ... snipped many jobs from the list ...
195792.0   kreymer         9/24 01:30   0+00:00:00 I  0   0.0  probe
195793.0   rbpatter        9/24 01:32   0+00:00:00 I  0   0.0  sleep_20080924_013

2248 jobs; 2018 idle, 219 running, 11 held
There's our job -- the last one in the list. Often there are so many jobs listed that condor_q clobbers your screen. There are command line arguments to deal with this (for example, condor_q <your_name>), but you can also just look at a compact summary table instead by typing:
<minos04.fnal.gov> minos_q
 ... Output not shown here.  You'll see it when you run it. ...
The totals from condor_q and minos_q differ because the latter separates out the "fake" jobs (Glideins) that are present only to provide access to the grid nodes.

Now, type:

<minos04.fnal.gov> ls -l $CONDOR_TMP
total 8
-rw-r--r--  1 rbpatter 5468 567 Sep 24 01:32 sleep_20080924_013207_1.cmd
-rw-r--r--  1 rbpatter 5468   0 Sep 24 01:37 sleep_20080924_013207_1.err
-rw-r--r--  1 rbpatter 5468 689 Sep 24 01:39 sleep_20080924_013207_1.log
-rw-r--r--  1 rbpatter 5468   0 Sep 24 01:37 sleep_20080924_013207_1.out
$CONDOR_TMP is an area that you can write to regardless of where your job is sent. Each user has his/her own $CONDOR_TMP. If you submitted the example job, you should see four files like those above (with the date/time stamp being different). These files are:
*.cmd      Condor submission file
*.log      Condor log file
*.out      the job's stdout stream
*.err      the job's stderr stream
The Condor submission file is created when you run minos_jobsub and it tells Condor all it needs to know about the job. You should never have to edit this file. (If you think you do, let me know.) The .out and .err files are obvious (and empty here, since sleep produces no output). The log file tells you what the job is up to. It looks like this:
<minos04.fnal.gov> cat $CONDOR_TMP/sleep_20080924_013207_1.log
000 (195793.000.000) 09/24 01:32:07 Job submitted from host: <131.225.193.25:64961>
...
001 (195793.000.000) 09/24 01:37:14 Job executing on host: <131.225.193.12:61671>
...
006 (195793.000.000) 09/24 01:37:22 Image size of job updated: 3644
...
005 (195793.000.000) 09/24 01:39:14 Job terminated.
        (1) Normal termination (return value 0)
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Remote Usage
                Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job
        0  -  Total Bytes Sent By Job
        0  -  Total Bytes Received By Job
...
The job has finished. It didn't use any significant CPU time (since sleep doesn't), and it ended two minutes after it started. We can also see that the job ran on node 131.225.193.12. Converting that into a domain name:
<minos04.fnal.gov> host 131.225.193.12
12.193.225.131.in-addr.arpa domain name pointer minos12.fnal.gov.
We see that the job ran on minos12.fnal.gov, part of the MINOS cluster.


The job environment

Directories. A few directories are accessible everywhere. The most relevant one is $CONDOR_TMP. Generally, all your job output (ROOT files, etc.) should go under here. This area is group writable (MINOS group) to facilitate grid running. (All grid jobs run as a single MINOS user.) $CONDOR_TMP is an output staging area only, not permanent storage. Old files in $CONDOR_TMP are periodically purged (after a warning email) to keep free space available. See the -d option of minos_jobsub for a way to simplify the process of moving your output files to better locations.

The other important area that you can see from everywhere is /minos/app. If you don't have a subdirectory with your user name here (like: /minos/app/rbpatter), just make one with mkdir. This area is not writable from the grid nodes; your output should all go to $CONDOR_TMP. The area is mounted with execute permissions, however. You can put your minossoft test releases and other scripts, if needed, under here.

The working directory of your job will be the directory from which you ran minos_jobsub. (If possible, that is. Remember that user AFS areas aren't accessible from most Grid nodes.)

Cluster, process, and job IDs. Each running job has a job ID that looks like: cluster.process. (See the condor_q output for examples.) If you do nothing special, process always equals 0. However, you can submit multiple instances of a job with the -N option of minos_jobsub:

<minos04.fnal.gov> minos_jobsub -N 5 sleep 120
This example submits five (5) copies of the job. The individual jobs will be assigned process numbers from 0 to N-1, and each running job will know its own process number via the environment variable $PROCESS. You might use $PROCESS to choose an input file from a list or to seed a random number generator. Condor commands like condor_rm understand cluster and cluster.process identifiers. (That is, you can refer to a whole cluster or just one of its processes.)
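
For instance, a wrapper script might use $PROCESS to vary each of the N copies (a sketch only; generate.C is a hypothetical macro that takes a seed):

#!/bin/sh

# Each submitted copy sees its own $PROCESS value (0 to N-1); here it is
# passed to the macro as a random-number seed.
loon -b -q "generate.C(${PROCESS})"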


More complex jobs

To see a full list of minos_jobsub options, type:

<minos04.fnal.gov> minos_jobsub -h
(The full -h output is not reproduced here, since it could easily go out of date; run the command yourself for the current list.)

I do not describe every option here. Refer to the "-h" output above for the complete list. A few options deserve mention, though.

-r <rel>
Set up minossoft release <rel> before running the job. An example usage:
<minos04.fnal.gov> minos_jobsub -r S07-12-22-R1-26 loon myscript.C

-t <dir>
Set up the test release located at <dir> before running the job. You do not need -r <rel> if you use -t.

-c <ClassAd>
A ClassAd is a requirement you can attach to your job. Condor respects these requirements when assigning a job to a node. Consult the Condor manual for details on what you can specify. There is one application of this that is worth noting here. Sometimes, a node will be messed up and your jobs will keep dying on that machine. If this happens, you can eliminate that machine (say, minos99.fnal.gov) from consideration via a ClassAd:
<minos04.fnal.gov> minos_jobsub -c 'Machine != "minos99.fnal.gov"' sleep 120
(You should also, of course, notify someone that a bad node is present.)
Again, use "minos_jobsub -h" to see many other useful options that are available.

Finally, here are a few example uses of some basic Condor commands to get you started with manipulating the queue.

condor_q                     List all jobs in the queue
condor_q -help               Show all the options possible
condor_q -run                List all running jobs
condor_q rbpatter            List all jobs submitted by user rbpatter
condor_q 12345               List jobs in cluster 12345
condor_q 12345.3             List job 12345.3
condor_q -analyze 12345.3    Give information about why job 12345.3 isn't running
condor_q -long 12345.3       Print everything Condor knows about job 12345.3
condor_rm 12345.3            Remove job 12345.3 from queue
condor_rm -all               Remove all your jobs
condor_hold 12345.3          Hold job 12345.3
condor_release 12345.3       Release job 12345.3
Arguments can be truncated to the shortest non-ambiguous form, e.g. "condor_rm -a" instead of "condor_rm -all".


Enstore (dCache) use

Your job may need to access files through FNAL's dCache system. A general write up is available on the dCache at MINOS page. If your job uses dCache, keep reading this section, as there are some important considerations.

The Enstore/dCache system has a limit to the number of files it can serve at once. A single user could in principle block all other MINOS jobs if care is not taken. In particular, there are two things you must avoid:

  1. Do not keep a dCache file stream open. Always copy first, then use. This means you cannot use loon's built-in dcap:// filename handling.
  2. Do not hard code dCache door numbers into your script.
Don't worry: The following tools should help. There are several choices for how to get to dCache. Note that the first two schemes (-y and enstore_get) allow you to specify an Enstore file in either of two ways:
/pnfs/fnal.gov/usr/minos/fardet_data/2003-01/F00011732_0000.mdaq.root
/pnfs/minos/fardet_data/2003-01/F00011732_0000.mdaq.root
The first example is what you are used to. The second is rather useful because it's the path that is actually visible from the MINOS machines.

Now, on to the three approaches:

(1) You can use the -y option of minos_jobsub:

<minos04.fnal.gov> minos_jobsub -y /pnfs/fnal.gov/usr/minos/fardet_data/2003-01/F00011732_0000.mdaq.root loon -b -q myscript.C
Your job will have access to the requested file(s) through the variable $FROM_ENSTORE. You can have multiple -y filename combinations. You may specify non-dCache files as well (perhaps for testing). Clean up is automatic.
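
For example, a job script submitted with -y might pick up the staged copies like this (a sketch only; it assumes $FROM_ENSTORE expands to the local copies of the requested files, and do_stuff.C is a placeholder macro that knows how to find them):

#!/bin/sh

# See what the -y option delivered, then run over it.
ls -l $FROM_ENSTORE
loon -b -q do_stuff.C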

(2) You can use the enstore_get and enstore_cleanup commands. In an interactive shell, these commands should already be available for you to use. In a script, though, you'll need to first include:

bash script:
    source $MINOS_ENSTORE/setup_aliases.sh

tcsh script:
    source $MINOS_ENSTORE/setup_aliases.csh

Using these in a script might look like this:
#!/bin/sh

source $MINOS_ENSTORE/setup_aliases.sh

enstore_get /pnfs/fnal.gov/usr/minos/fardet_data/2003-01/F00011732_0000.mdaq.root
loon -b -q do_stuff.C
enstore_cleanup

enstore_get can take multiple file arguments and can be called multiple times in succession to supplement the list. You may specify non-dCache files just as well. The job has access to the requested file(s) through the variable $FROM_ENSTORE.
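
For instance, inside a script (after sourcing the setup_aliases file as above) the list can be built up over several calls; the additional subrun paths here are illustrative only:

enstore_get /pnfs/minos/fardet_data/2003-01/F00011732_0000.mdaq.root \
            /pnfs/minos/fardet_data/2003-01/F00011732_0001.mdaq.root
enstore_get /pnfs/minos/fardet_data/2003-01/F00011732_0002.mdaq.root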

(3) You can use dccp directly. If you do, you must use the choose_dcache_port tool to pick a load-balanced server and port. Running this command at the prompt shows you what it does:

<minos04.fnal.gov> choose_dcache_port
fndca1.fnal.gov:24137
In a script, you might use it like this:
#!/bin/sh

dccp dcap://`choose_dcache_port`/pnfs/fnal.gov/usr/minos/fardet_data/2003-01/F00011732_0000.mdaq.root my_local_copy.root
loon -b -q do_stuff.C
rm my_local_copy.root

As you can see, clean up is up to you with this approach.


Database use

If you use the -t or -r options of minos_jobsub to set up your minossoft environment, then you need to do nothing further for proper database access. If you set up a minossoft release manually within your script (which you probably don't need to do), then read this section.

The large number of farm jobs running simultaneously requires us to spread the database load among several servers. If you set up minossoft by hand in your scripts, you should do the following just before running your loon (or other) command:

bash users:
    export ENV_TSQL_URL=`/grid/fermiapp/minos/griddb/choose_db_server`

tcsh users:
    setenv ENV_TSQL_URL `/grid/fermiapp/minos/griddb/choose_db_server`

The choose_db_server command (which is usually available in your path) returns the URL for the server with the lightest load, like so:
<minos04.fnal.gov> choose_db_server
mysql:odbc://minos-db1.fnal.gov/temp;mysql:odbc://minos-db1.fnal.gov/offline
This load sharing is critical. You should put this near whatever will actually use the database so that the load information is not stale. In particular, you don't want a lengthy dCache copy between the server selection and the actual job. Again, if you let minos_jobsub set up your release for you (-t or -r option), you don't have to worry about any of this.
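
To make the ordering concrete, a hand-configured script might look like the sketch below (do_stuff.C is a placeholder; the point is that the server is chosen after the lengthy dCache copy, immediately before the command that uses the database):

#!/bin/sh

source $MINOS_ENSTORE/setup_aliases.sh

# ... set up your minossoft release by hand here ...

# The (possibly slow) dCache copy happens first,
enstore_get /pnfs/minos/fardet_data/2003-01/F00011732_0000.mdaq.root

# then the database server is chosen right before it is needed,
# so the load information is still fresh.
export ENV_TSQL_URL=`/grid/fermiapp/minos/griddb/choose_db_server`

loon -b -q do_stuff.C

enstore_cleanup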


Output collection

NOTE: See the important filesystem information at the top of this page! It affects the -d feature described here.

If your output files are written to $CONDOR_TMP, you will need to move them from $CONDOR_TMP to a more permanent location once the job finishes. This is annoying. The -d option streamlines this process. Here's the idea.

Imagine a job that is designed to run locally. Perhaps it writes some output under your home area in AFS space. Here's a simple ROOT (CINT) script that does just this:

{
  TFile f("/afs/fnal.gov/files/home/room3/rbpatter/my_output/rootfile.root","recreate");
  TH1D *h = new TH1D;
  h->Write();
  f.Close();
}
If I try to run this ROOT script on the grid, it isn't going to work since the output directory /afs/fnal.gov/files/home/room3/rbpatter/my_output/ is not available. I could just edit the script so that it writes to $CONDOR_TMP:
{
  TFile f("$CONDOR_TMP/rootfile.root","recreate");
  TH1D *h = new TH1D;
  h->Write();
  f.Close();
}
but then I will later have to move rootfile.root to my desired location (namely, /afs/fnal.gov/files/home/room3/rbpatter/my_output/) by hand. Here's the way to avoid this...

I first edit the script so that it writes my output file to $CONDOR_DIR_FOO (where FOO is an arbitrary string) --

{
  TFile f("$CONDOR_DIR_FOO/rootfile.root","recreate");
  TH1D *h = new TH1D;
  h->Write();
  f.Close();
}
I then submit the job with a -d option:
<minos04.fnal.gov> minos_jobsub -dFOO /afs/fnal.gov/files/home/room3/rbpatter/my_output -r S08-08-28-R1-30 -g -a root -b -q rootexample.C
The -dFOO part says that I want the job to create a temporary directory and point to it with the variable $CONDOR_DIR_FOO. The second part of the -d argument (/afs/fnal.gov/files/home/room3/rbpatter/my_output) indicates which directory $CONDOR_DIR_FOO is standing in for. This can be any directory that you can see from a local MINOS machine (including AFS directories). Relative paths are okay. For example, if my_output/ is a directory directly below my current directory, I could just do this:
<minos04.fnal.gov> minos_jobsub -dFOO my_output -r S08-08-28-R1-30 -g -a root -b -q rootexample.C
Or, if I'm sitting somewhere else and my_output/ sits under my AFS home area, I could do this:
<minos04.fnal.gov> minos_jobsub -dFOO ~rbpatter/my_output -r S08-08-28-R1-30 -g -a root -b -q rootexample.C

Now, at any point after the job has finished, I run minos_harvestfiles:

<minos04.fnal.gov> minos_harvestfiles
root_20080905_121540_1_dir_FOO: 1 files (4.0K bytes) moving to /afs/fnal.gov/files/home/room3/rbpatter/my_output
Voilà! minos_harvestfiles scours $CONDOR_TMP for any completed jobs that were submitted with a -d directory mapping. For each one it finds, it moves the output to the directory specified at submission time, cleaning up the temporary staging area. Type minos_harvestfiles -h for some useful options.

The minos_harvestfiles utility is intended to be robust, and you can execute it at any time, even while jobs are running. If you identify any non-robustness, please let me know!

NOTE: See the important filesystem information at the top of this page! It affects the -d feature described here.


Grid running

Unless you specify otherwise, jobs submitted with minos_jobsub run on the MINOS cluster. The Fermilab grid provides many more CPUs (eventually 4000+ cores when they are not being used by CDF). Here's how to run on the grid.

You first need a valid proxy. MINOS jobs on the grid do not run under your username. Rather, they all run under a single MINOS account ("minosana"). You must set up a proxy file so that the actual job owner can be identified and authenticated. See the separate grid proxy setup instructions to create your proxy.

I'll now assume you have a valid proxy in place. To submit a job to the grid, simply add -g:

<minos04.fnal.gov> minos_jobsub -g sleep 120
Here's the log file while the job is still running:
<minos04.fnal.gov> more $CONDOR_TMP/sleep_20080905_111131_1.log
000 (185420.000.000) 09/05 11:11:31 Job submitted from host: <131.225.193.25:64961>
...
001 (185420.000.000) 09/05 11:11:36 Job executing on host: <131.225.166.120:61269>
...
The job is running on 131.225.166.120, which resolves to fnpc341.fnal.gov -- a grid node!

AFS on the grid. Partial access to AFS on grid worker nodes is provided by the filesystem-mimicking utility Parrot. The areas that Parrot provides to your jobs include all those needed to set up a full minossoft release, namely:

/afs/fnal.gov/files/code/e875/general/minossoft
/afs/fnal.gov/files/code/e875/general/ups
/afs/fnal.gov/files/code/e875/sim
See the Parrot documentation if you are interested in how it works, but the goal is to have things set up such that you don't need to know how it works. Everything is supposed to appear as if you ran the job on a machine with "true" AFS. If it doesn't feel seamless, let me know so we can improve the system.

Having said that, it is worth knowing the basic idea of Parrot. The AFS areas listed above are accessed via HTTP (via a proxy server). This means that any AFS file your job uses is pulled across the network. (There is caching in place to minimize network traffic.) Thus, you may notice delays in execution as files are shipped around the network. This is normal and may add several minutes in extreme cases.

It is possible to restrict your grid job to one of the sixty-four grid nodes that have full AFS available by specifying -a. However, if Parrot does not suffice and you must have full AFS access, it is better to submit to the MINOS cluster (by leaving out -g) than to use -a.


Example submissions

These few examples do not touch on all minos_jobsub options, but they should help with things like quotation marks, etc.

<minos04.fnal.gov> minos_jobsub sleep 120
Submits to MINOS cluster machines.

<minos04.fnal.gov> minos_jobsub -g sleep 120
Submits to the Grid.

<minos04.fnal.gov> minos_jobsub -g -r S08-08-28-R1-30 loon my_script.C
Submits to the Grid and sets up minossoft release S08-08-28-R1-30 before running loon.

<minos04.fnal.gov> minos_jobsub -g -r S08-08-28-R1-30 loon 'my_script.C(\"$CONDOR_TMP/outfile.root\",100)'
Loon script now takes arguments. Note -- the outermost single quotes will be stripped by the shell, so it is important that you don't have any spaces in the script specification and that you escape the interior quotes with backslashes.
     Wrong:  'my_script.C(\"$CONDOR_TMP/outfile.root\", 100)'
     Wrong:  'my_script.C("$CONDOR_TMP/outfile.root",100)'

<minos04.fnal.gov> minos_jobsub -g -r S08-08-28-R1-30 do_stuff.sh
If things are complicated enough, it is easiest just to make a script that does everything internally.

<minos04.fnal.gov> minos_jobsub -g -t ../.. do_stuff.sh
The minossoft test release two directories above the current directory will be set up before the job runs.

<minos04.fnal.gov> minos_jobsub -g -t ../.. -dLOGS logs -dROOTFILES ../root_output do_stuff.sh
The job will have access to the two directories pointed to by $CONDOR_DIR_LOGS and $CONDOR_DIR_ROOTFILES where, perhaps, log and root files will be written. Later, minos_harvestfiles will automatically move the output to the logs and root_output directories, the first of which sits beneath the current directory and the second of which is a sibling directory.

<minos04.fnal.gov> minos_jobsub -N 10 -g -r S08-08-28-R1-30 -dOUT . loon 'my_script.C(\"\$CONDOR_DIR_OUT/blah_\$PROCESS.root\")'
This example violates the new filesystem policy described at the top of this page. Do you see how?
The loon argument here is complex enough that you should just use a wrapper script. All the annoying backslashes would then no longer be needed. (The subtle thing on this example is that $CONDOR_DIR_OUT and $PROCESS aren't defined until runtime.) Here's what the wrapper version would look like:
     <minos04.fnal.gov> minos_jobsub -N 10 -g -r S08-08-28-R1-30 -dOUT . my_wrapper.sh
with my_wrapper.sh containing two simple lines:
     #!/bin/sh
     loon 'my_script.C("$CONDOR_DIR_OUT/blah_$PROCESS.root")'
Remember to make my_wrapper.sh executable (with chmod a+x my_wrapper.sh or similar).



Ryan Patterson
< rbpatter at caltech dot edu >

Last modified: Tue Dec 20 14:58:19 GMT 2011