Reco data for R1.18 for fardet_data 2003 has been declared
This completes the R1.18 reprocessing!
Reco data for R1.18 is now up to date.
It is now being updated daily via the predator cron job.
Reco data for R1.16 is now up to date (picked up July and August).
Evaluation SAM datasets are defined for
BEAM, DCS, RAW and RECO data files.
These are probably too explicit for general use, but should be fine for initial evaluation.
Dataset names are like
zeval-beam
zeval-far-dcs
zeval-near-dcs
zeval-far-raw
zeval-near-raw
zeval-far-cand-physics-alldata-r1_16
2383
zeval-far-cand-physics-spill-r1_16
2383
zeval-near-cand-physics-cosmic-r1_16
zeval-near-cand-physics-spill-r1_16
See
the full list here, or via the SAM
data browser.
See also this discussion of some dataset
name convention considerations.
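As an illustration of how the names listed above could be taken apart, here is a minimal Python sketch. It assumes only what the examples show: hyphen-separated fields, with an optional trailing release tag such as r1_16. The function name, the returned keys, and the mapping of r1_16 to R1.16 are assumptions for this sketch, not part of any SAM convention.

```python
import re


def parse_eval_dataset(name):
    """Split an evaluation dataset name into its hyphen-separated fields.

    A trailing field shaped like r1_16 is treated as a release tag and
    rewritten as R1.16 (an assumption based on the names listed above).
    """
    fields = name.split("-")
    release = None
    if re.fullmatch(r"r\d+_\d+", fields[-1]):
        release = "R" + fields[-1][1:].replace("_", ".")
        fields = fields[:-1]
    return {"prefix": fields[0], "fields": fields[1:], "release": release}


for example in ("zeval-beam", "zeval-far-cand-physics-spill-r1_16"):
    print(example, parse_eval_dataset(example))
```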
Corrected raw data metadata for about 25K files.
These had slightly wrong event counts and timestamps, due to a bug in the Minos framework in May and June.
Loading of reco files, which clone much metadata from raw parents, can now proceed.
Loading of Beam and DCS files will also proceed now.
Reco metadata loading is fully tested in
development.
We will skip about half the files, which are likely
to be deleted soon.
R0.8.0
R1.0
R1.0.0
R1.0.0a
Still, there are about 100K files to declare.
Ready to go as soon as the recent R1.16 filenames are adjusted, per the Thursday 26 May Offline meeting, to have a new iteration field to handle file rewrites.
Declaring them should take about a day.
Then we can keep up hourly, as part of the Predator job which presently declares raw data files.
Started hourly cron job, declaring near and far metadata.
neardetector_data and fardetector_data
file metadata is up to date through today.
Will start a daily cron
job next week, to keep it so.
Phase II actually completed today in development, not last Thursday
Used the Python scripts, which load about 1K files/minute (versus 20)
setup sam -q dev
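To put the two loading rates in perspective, the arithmetic can be sketched as follows. This reads "versus 20" as 20 files/minute and uses the fardet_data file count from the 2005 04 03 table as a sample workload; both are assumptions made for illustration. Under them, the load drops from roughly 31 hours to under 40 minutes.

```python
def load_time_minutes(n_files, files_per_minute):
    """Minutes needed to load n_files at a steady rate."""
    return n_files / files_per_minute


FARDET_FILES = 37619  # fardet_data file count from the 2005 04 03 table
OLD_RATE = 20         # files/minute, assumed meaning of "versus 20"
NEW_RATE = 1000       # files/minute, "about 1K" with the Python scripts

old_minutes = load_time_minutes(FARDET_FILES, OLD_RATE)
new_minutes = load_time_minutes(FARDET_FILES, NEW_RATE)
print(f"old: {old_minutes / 60:.1f} h")
print(f"new: {new_minutes:.0f} min")
print(f"speedup: {NEW_RATE / OLD_RATE:.0f}x")
```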
Phase III -
recon files - starting inventory and planning
This needs some caution; there seem to be over 280K files under
/pnfs/minos/reco_far
Phase II is completing today in
development
setup sam -q dev
Starting
to work on python scripting for much faster loading.
Phase
III - recon files - starting inventory and planning
This needs some caution; there seem to be over 280K files under
/pnfs/minos/reco_far
Phase I is complete
missing file lists are prepared for
neardet_data
only one near detector file is missing
fardet_data
285 files cannot be read, mostly from 2002 and July 2003; here is a count
Phase II
need to add 2005 02/03
Phase I still nearly complete
historical neardet_data and fardet_data are done
catching up on Feb/Mar data for 2005
need to
reload 2003 data for 03 04 05 06
there were dbu and DCache problems when these were run before
Phase
II - sam declarations
fardet_data is being
verified prior to loading
Phase I nearly complete,
neardet_data is done
fardet_data is complete thru
fardet_data/2004-05
should be done by 1 April ( actually finished 3 April )
Then will catch up with 2005-02 and 2005-03
Phase II - sam
loading
neardet_data is loaded into SAM
development, thru 2005-01
fardet_data will get loaded the week of 8 April
Phase I first pass complete for neardet_data
Half of 2004-11 files remain to be processed - encountered some files for which dbu loops CPU bound
Started Phase I for fardet_data
| DATASET | FILES | SIZE (MB) |
| neardet_data | 9436 | 313798 |
| fardet_data | 37619 | 1103099 |
| near_dcs_data | 412 | 145 |
| far_dcs_data | 1082 | 1513 |
| beam_data | 204 | 1777 |
(File counts and sizes as of 2005 04 03)
Getting metadata for SAM requires running each data file through DBU.
The rate seems to be about 6 hours per 1000 files.
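A rough estimate of the total DBU time follows directly from this rate and the file counts in the table. The sketch below assumes a single processing stream at a constant rate, and covers only neardet_data and fardet_data; the function and variable names are illustrative.

```python
HOURS_PER_1000_FILES = 6  # DBU rate quoted above

# File counts from the 2005 04 03 table
FILE_COUNTS = {
    "neardet_data": 9436,
    "fardet_data": 37619,
}


def dbu_hours(n_files, hours_per_1000=HOURS_PER_1000_FILES):
    """Estimated hours to run n_files through DBU in one stream."""
    return n_files * hours_per_1000 / 1000.0


for name, count in FILE_COUNTS.items():
    hours = dbu_hours(count)
    print(f"{name:14s} {count:6d} files  {hours:7.1f} h  ({hours / 24:.1f} days)")
```

On these assumptions neardet_data takes a bit over two days and fardet_data a bit over nine.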
Work is being done under the kreymer account on minos06.fnal.gov,
under ~kreymer/minos and /local/scratch06/kreymer/...
Phase I - generate the .py SAM metadata descriptions.
Phase II - run 'sam declare' and 'sam add location' using each of these *.py files.
This should be relatively fast, a few seconds per file.
But we should speed this up by doing this from a Python script.
Almost all the cost resides in Python script startup overheads when run from the command line.
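The idea of paying the startup overhead once, rather than once per file, can be sketched as a single Python process that loops over the metadata files. This is a minimal sketch, not the actual loading script: declare and add_location are hypothetical caller-supplied callables standing in for whatever performs 'sam declare' and 'sam add location', and no SAM API is assumed.

```python
import glob
import os


def find_metadata_files(directory):
    """Return the *.py metadata descriptions in directory, sorted."""
    return sorted(glob.glob(os.path.join(directory, "*.py")))


def load_batch(metadata_files, declare, add_location):
    """Declare and locate every file from inside one Python process.

    declare and add_location are hypothetical callables taking a path;
    a failure on one file is recorded and the loop continues.
    """
    loaded, failed = [], []
    for path in metadata_files:
        try:
            declare(path)
            add_location(path)
            loaded.append(path)
        except Exception as exc:  # record the failure for a later retry pass
            failed.append((path, str(exc)))
    return loaded, failed
```

A caller would pass the result of find_metadata_files for a given directory, then inspect the failed list to decide what to rerun.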