Minos SAM data loading



Status

For all the really ugly details, see the detailed LOG.

2005 10 19

Reco data for R1.18 for fardet_data 2003 has been declared.

This completes the R1.18 reprocessing!

2005 10 04

Reco data for R1.18 is now up to date.

It is now being updated daily via the predator cron job.

2005 08 31

Reco data for R1.16 is now up to date (picked up July and August).

2005 08 06

Evaluation SAM datasets are defined for BEAM, DCS, RAW and RECO data files.
These are probably too specific for general use, but should be fine for initial evaluation.
Dataset names look like:
   
zeval-beam

zeval-far-dcs
zeval-near-dcs

zeval-far-raw
zeval-near-raw

zeval-far-cand-physics-alldata-r1_16 2383
zeval-far-cand-physics-spill-r1_16 2383
zeval-near-cand-physics-cosmic-r1_16
zeval-near-cand-physics-spill-r1_16

See the full list here, or via the SAM data browser.
See also this discussion of some dataset name convention considerations.
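The names above appear to be hyphen-separated fields: a "zeval" prefix, the detector (absent for beam data), the data type, and for reco data a stream and a release written with underscores. A minimal sketch that rebuilds them, assuming that inferred pattern; the helper and its field names are hypothetical, not an official SAM naming rule:

```python
from typing import Optional

PREFIX = "zeval"  # evaluation prefix shared by all the datasets listed above


def dataset_name(kind: str, detector: Optional[str] = None,
                 stream: Optional[str] = None,
                 release: Optional[str] = None) -> str:
    """Build a dataset name from its fields (inferred pattern, hypothetical helper).

    kind:     data type, e.g. "beam", "dcs", "raw" or "cand"
    detector: "far" or "near"; omitted for beam data
    stream:   e.g. "physics-spill", used for reco (cand) data
    release:  e.g. "R1.16", which appears as "r1_16" in the name
    """
    parts = [PREFIX]
    if detector:
        parts.append(detector)
    parts.append(kind)
    if stream:
        parts.append(stream)
    if release:
        parts.append(release.lower().replace(".", "_"))
    return "-".join(parts)


print(dataset_name("beam"))        # zeval-beam
print(dataset_name("dcs", "far"))  # zeval-far-dcs
print(dataset_name("cand", "near", "physics-cosmic", "R1.16"))
# zeval-near-cand-physics-cosmic-r1_16
```

Generating names from fields like this keeps the convention consistent when new releases are added.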

2005 07 12

Corrected raw data metadata for about 25K files.
These had slightly wrong event counts and timestamps,
due to a bug in the Minos framework in May and June.

Loading of reco files, which clone much metadata from their raw parents,
can now proceed.
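Since reco metadata is cloned from the raw parent, any error in the raw metadata (such as the event counts and timestamps corrected above) would be copied into every reco child, which is presumably why the raw fix had to come first. A minimal sketch of the cloning step, assuming metadata is held as a Python dict; the field names and file names are hypothetical, not the actual Minos/SAM schema:

```python
from typing import Any, Dict

# Hypothetical fields a reco file copies unchanged from its raw parent.
INHERITED_FIELDS = ("run", "subrun", "event_count", "start_time", "end_time")


def reco_metadata(parent: Dict[str, Any], file_name: str,
                  release: str) -> Dict[str, Any]:
    """Clone the inherited fields from the raw parent, then add reco-only fields."""
    child = {key: parent[key] for key in INHERITED_FIELDS if key in parent}
    child.update({
        "file_name": file_name,          # the reco file itself
        "parent": parent["file_name"],   # link back to the raw parent
        "release": release,              # reconstruction release, e.g. "R1.16"
    })
    return child


# Hypothetical raw parent record.
raw = {"file_name": "raw_0001.root", "run": 12345, "subrun": 0,
       "event_count": 5000, "start_time": "2005-06-01 00:00:00",
       "end_time": "2005-06-01 01:00:00"}
print(reco_metadata(raw, "reco_0001.root", "R1.16"))
```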

Loading of Beam and DCS files will also proceed now.

2005 05 27

Reco metadata loading is fully tested in development.
We will skip about half the files, from these releases, which are likely to be deleted soon:
   R0.8.0
   R1.0
   R1.0.0
   R1.0.0a
Still, there are about 100K files to declare.
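Skipping those releases amounts to a simple filter before declaring. A minimal sketch, assuming each candidate file's release is already known (how it is derived from the path or filename is not shown here, and the function name is hypothetical):

```python
from typing import Iterable, List, Tuple

# Releases whose reco files are likely to be deleted soon, so they are not declared.
SKIP_RELEASES = {"R0.8.0", "R1.0", "R1.0.0", "R1.0.0a"}


def files_to_declare(files: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep the names from (file_name, release) pairs whose release is not skipped."""
    return [name for name, release in files if release not in SKIP_RELEASES]


candidates = [("a.root", "R1.0.0a"), ("b.root", "R1.16"), ("c.root", "R1.0")]
print(files_to_declare(candidates))  # ['b.root']
```

The match is on exact release strings, so R1.0, R1.0.0 and R1.0.0a each have to be listed separately.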

Ready to go as soon as the recent R1.16 filenames are adjusted,
per the Thursday 26 May Offline meeting,
to include a new iteration field that handles file rewrites.

Declaring them should take about a day.
Then we can keep up hourly, as part of the Predator job
which presently declares raw data files.

2005 05 13

Started hourly cron job, declaring near and far metadata.
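Each hourly pass presumably only needs to declare files that appeared since the last run. A minimal sketch of that bookkeeping, assuming the job can list both the files in storage and the files already declared to SAM (both listings, and the function name, are hypothetical):

```python
from typing import Iterable, List


def pending_files(on_disk: Iterable[str], declared: Iterable[str]) -> List[str]:
    """Return files present in storage but not yet declared, in sorted order."""
    return sorted(set(on_disk) - set(declared))


print(pending_files(["f3", "f1", "f2"], ["f1"]))  # ['f2', 'f3']
```

Working from a set difference rather than "files since the last run" means a missed or failed hour is simply picked up on the next pass.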

2005 04 22

neardetector_data and fardetector_data file metadata is up to date through today.
Will start a daily cron job next week, to keep it so.

2005 04 18

Phase II actually completed today in development, not last Thursday.

Used the Python scripts, which load about 1K files/minute (versus about 20 previously).


setup sam -q dev
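To put the 1K versus 20 files/minute figures in perspective, here is the arithmetic for the raw detector files counted in the dataset table of the 2005 03 15 entry (9436 neardet_data plus 37619 fardet_data files). The rates are the approximate figures quoted above, treated as steady:

```python
RAW_FILES = 9436 + 37619  # neardet_data + fardet_data, from the 2005 03 15 table


def minutes_to_declare(n_files: int, files_per_minute: float) -> float:
    """Wall-clock minutes to declare n_files at a steady rate."""
    return n_files / files_per_minute


script = minutes_to_declare(RAW_FILES, 1000)  # Python scripts
cli = minutes_to_declare(RAW_FILES, 20)       # previous rate
print(f"script: {script:.0f} min, previous: {cli / 60:.1f} h")
# script: 47 min, previous: 39.2 h
```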


Phase III - recon files - starting inventory and planning
    This needs some caution; there seem to be over 280K files under /pnfs/minos/reco_far.
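A first inventory step could be counting files per top-level subdirectory, to see where those 280K files sit before deciding what to load or skip. A minimal sketch, assuming /pnfs/minos/reco_far can be walked like an ordinary filesystem; what each subdirectory represents is not assumed, and the function name is hypothetical:

```python
import os
from collections import Counter
from pathlib import Path


def count_files_by_subdir(root: str) -> Counter:
    """Count files under each top-level subdirectory of root ("." = root itself)."""
    root_path = Path(root)
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root_path):
        rel = Path(dirpath).relative_to(root_path)
        top = rel.parts[0] if rel.parts else "."
        counts[top] += len(filenames)
    return counts


if __name__ == "__main__":
    counts = count_files_by_subdir("/pnfs/minos/reco_far")
    for subdir, n in sorted(counts.items()):
        print(f"{n:8d}  {subdir}")
    print(f"{sum(counts.values()):8d}  total")
```

Note that os.walk silently skips directories it cannot read, so a short total may mean permission or mount problems rather than missing files.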

2005 04 14

Phase II  is completing today in development
    setup sam -q dev

Starting to work on python scripting for much faster loading.

Phase III - recon files - starting inventory and planning
    This needs some caution; there seem to be over 280K files under /pnfs/minos/reco_far.



2005 04 11

Phase I is complete
    missing file lists are prepared for
    neardet_data
        only one near detector file is missing
    fardet_data
       285 files cannot be read, mostly from 2002 and July 2003; here is a count
Phase II
    need to add 2005-02 and 2005-03
 

2005 04 04

Phase I still nearly complete
    historical neardet_data and fardet_data are done
    catching up on Feb/Mar data for 2005
    need to reload 2003 data for months 03, 04, 05 and 06
        there were DBU and dCache problems when these were run before

Phase II - sam declarations
    fardet_data is being verified prior to loading


2005 03 31

Phase I nearly complete:
    neardet_data is done
    fardet_data is complete thru fardet_data/2004-05
        should be done by 1 April (actually finished 3 April)
    then will catch up with 2005-02 and 2005-03

Phase II - sam loading
    neardet_data is loaded into SAM development, thru 2005-01
    fardet_data will get loaded the week of 8 April

2005 03 15

Phase I first pass complete for neardet_data

Half of the 2004-11 files remain to be processed; we hit some files on which DBU gets stuck in a CPU-bound loop.

Started Phase I for fardet_data

Areas to be loaded were:

    DATASET          FILES   SIZE (MB)
    neardet_data      9436      313798
    fardet_data      37619     1103099
    near_dcs_data      412         145
    far_dcs_data      1082        1513
    beam_data          204        1777

(File counts and sizes as of 2005 04 03)

Getting metadata for SAM requires running each data file through DBU.
The rate seems to be about 6 hours per 1000 files.
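At that rate the detector data alone is a long job. A rough estimate for the two detector areas in the table above, assuming the quoted 6 hours per 1000 files holds steadily (the DCS and beam areas are left out, since their rate is not stated):

```python
DBU_HOURS_PER_1000 = 6.0  # approximate DBU rate quoted above

# Detector file counts from the table above (as of 2005 04 03).
FILES = {"neardet_data": 9436, "fardet_data": 37619}


def dbu_hours(n_files: int) -> float:
    """Estimated DBU processing time in hours at the quoted rate."""
    return n_files * DBU_HOURS_PER_1000 / 1000


for area, n in FILES.items():
    h = dbu_hours(n)
    print(f"{area:14s} {n:6d} files  {h:7.1f} h  ({h / 24:4.1f} days)")
```

fardet_data alone comes to roughly 226 hours, about 9.4 days of continuous running.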

Work is being done under the kreymer account on minos06.fnal.gov,
under ~kreymer/minos and /local/scratch06/kreymer/...



Phase I - generate the .py SAM metadata descriptions

Phase II - run 'sam declare' and 'sam add location' using each of these *.py files.
                This should be relatively fast, a few seconds per file,
                but we should speed it up by doing this from a single Python script:
                almost all the cost is Python startup overhead when each command is run from the command line.
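A minimal sketch of the two phases driven from one Python process, so the interpreter starts once rather than once per file. It assumes each Phase I description is a .py file defining a dict named `metadata` (an assumption; the real format is not shown here). `declare` and `add_location` are hypothetical hooks standing in for whatever performs 'sam declare' and 'sam add location'; no real SAM API is implied:

```python
import runpy
from pathlib import Path
from typing import Any, Callable, Dict, List

Metadata = Dict[str, Any]


def write_description(path: Path, metadata: Metadata) -> None:
    """Phase I (sketch): save one file's metadata as a small Python module."""
    path.write_text(f"metadata = {metadata!r}\n")


def declare_all(meta_dir: Path,
                declare: Callable[[Metadata], None],
                add_location: Callable[[Metadata], None]) -> List[Path]:
    """Phase II (sketch): declare every *.py description in one process.

    declare and add_location are hypothetical hooks for the two SAM steps.
    Returns the description files that failed, so they can be retried.
    """
    failed: List[Path] = []
    for path in sorted(meta_dir.glob("*.py")):
        try:
            # run_path executes the file and returns its global namespace
            metadata = runpy.run_path(str(path))["metadata"]
            declare(metadata)
            add_location(metadata)
        except Exception as exc:  # keep going; report failures at the end
            print(f"FAILED {path.name}: {exc}")
            failed.append(path)
    return failed
```

Failures are collected rather than aborting the run, so one bad description does not stop thousands of others; the returned list can be fed back in for a retry.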