Last modified: Tue Nov 11 09:04:12 GMT 2008
Nick West
Several times a day a cron job takes these master catalogues and places them in a web-visible directory. A DCM instance running at a site that does not have access to the master catalogues instead uses slave copies, which it refreshes from the web when they are more than a few hours old. DCM names Storage Elements (SEs) using the following syntax:-
<site>-<type>-<service>   e.g. ral_t1-castor-test_d0t1
Where:-
  <site>      Site name e.g. ral_t1 (RAL Tier 1) or fnal
  <type>      The storage technology e.g. dcache or castor
  <service>   The individual service e.g. tape
When DCM runs, its initial output includes a list of the SEs it can access. For example:-
Local Storage Elements (in search order):-
  ral_t1_ui-nfs             Local NFS Disks
  ral_t1-castor-prod_d0t1   RAL T1 CASTOR disk0tape1 Production Service
  ral_t1-castor-test_d0t1   RAL T1 CASTOR disk0tape1 Test Service
  ral_t1-dcache-disk        RAL T1 dCache Disk Store
  ral_t1-dcache-tape        RAL T1 dCache Tape Store
  fnal-dcache-enstore       FNAL dCache interface to Enstore
The order reflects the default search order used to retrieve files. Note how the local disk is treated as an SE.
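As a rough illustration of the naming convention, the Python sketch below splits an SE name into its components. It assumes that '-' only ever appears as the separator (the components use '_' internally, as in all the names listed) and that the local-disk SE has no <service> part. SEName and parse_se_name are hypothetical names, not part of DCM.

```python
from typing import NamedTuple, Optional


class SEName(NamedTuple):
    site: str               # e.g. "ral_t1" or "fnal"
    storage_type: str       # e.g. "castor", "dcache" or "nfs"
    service: Optional[str]  # e.g. "test_d0t1"; None for the local disks


def parse_se_name(name: str) -> SEName:
    """Split an SE name of the form <site>-<type>-<service>.

    Assumes '-' is only used as the separator; the local-disk SE
    (e.g. "ral_t1_ui-nfs") has just <site>-<type>.
    """
    parts = name.split("-", 2)
    if not all(parts):
        raise ValueError(f"empty component in SE name: {name!r}")
    if len(parts) == 3:
        return SEName(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return SEName(parts[0], parts[1], None)
    raise ValueError(f"not an SE name: {name!r}")


# The SEs from the listing above, in default search order.
SEARCH_ORDER = [
    "ral_t1_ui-nfs",
    "ral_t1-castor-prod_d0t1",
    "ral_t1-castor-test_d0t1",
    "ral_t1-dcache-disk",
    "ral_t1-dcache-tape",
    "fnal-dcache-enstore",
]

if __name__ == "__main__":
    for se in SEARCH_ORDER:
        print(parse_se_name(se))
```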
DCM uses the SE names as the basis for a "DCM URL". The syntax is:-
dcm://<SE_name>/<SE_dir>/<File_name>#<byte_size>
e.g. dcm://fnal-dcache-enstore/pnfs/fs/usr/minos/reco_near/R1_18/snts_data/2005-04/N00007148_0008.spill.snts.R1_18.0.root#129262
When requested to transfer files, DCM first converts the file names into URLs, which are then used to determine the appropriate commands to perform the operation.
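A minimal sketch of packing and unpacking a DCM URL, assuming the SE directory is non-empty and the size is a plain decimal byte count. DcmUrl, pack_dcm_url and unpack_dcm_url are illustrative names only; DCM's own Perl routine for this job is sei_dcm_url_pack, whose interface may well differ.

```python
import re
from typing import NamedTuple

# dcm://<SE_name>/<SE_dir>/<File_name>#<byte_size>
_DCM_URL = re.compile(r"^dcm://([^/]+)/(.+)/([^/#]+)#(\d+)$")


class DcmUrl(NamedTuple):
    se_name: str
    se_dir: str     # stored without leading or trailing '/'
    file_name: str
    byte_size: int


def pack_dcm_url(se_name: str, se_dir: str, file_name: str, byte_size: int) -> str:
    """Build a DCM URL from its four components."""
    return f"dcm://{se_name}/{se_dir.strip('/')}/{file_name}#{byte_size}"


def unpack_dcm_url(url: str) -> DcmUrl:
    """Split a DCM URL; the last '/' before '#' ends the SE directory."""
    m = _DCM_URL.match(url)
    if m is None:
        raise ValueError(f"not a DCM URL: {url!r}")
    se_name, se_dir, file_name, size = m.groups()
    return DcmUrl(se_name, se_dir, file_name, int(size))
```

Because the regular expression treats the final '/' before '#' as the end of the SE directory, the sketch assumes file names never contain '/' or '#'.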
Users may specify requests in the form of DCM URLs, and for this purpose a basic form of wildcarding is allowed:-
"dcm://ral_t1-castor-prod_d0t1/user/nwest/grid_tests/.*\.root"
i.e. any .root file in the directory user/nwest/grid_tests/
The advantage of using DCM URLs is that they bypass the catalogue look-up, so latency issues associated with the catalogues, which could make them over a day out of date, are avoided. The cost is that the user has to know the exact location of the files.
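The sketch below shows one way a wildcard DCM URL could be expanded against a single SE's listing, treating only the last path component as a regular expression. The catalogue is modelled as hypothetical (path, byte_size) pairs, and two points are assumptions rather than documented DCM behaviour: the pattern must match the whole file name, and sub-directories are not searched.

```python
import re
from typing import Iterable, List, Tuple


def expand_wildcard_url(url: str, catalogue: Iterable[Tuple[str, int]]) -> List[str]:
    """Expand a wildcard DCM URL against one SE's catalogue entries.

    url       -- dcm://<SE_name>/<SE_dir>/<regex>; only the last path
                 component is treated as a regular expression.
    catalogue -- (path, byte_size) pairs, path being "<SE_dir>/<File_name>".
    Returns complete DCM URLs, including the #<byte_size> suffix.
    """
    if not url.startswith("dcm://"):
        raise ValueError(f"not a DCM URL: {url!r}")
    se_name, _, path = url[len("dcm://"):].partition("/")
    se_dir, _, pattern = path.rpartition("/")
    regex = re.compile(pattern)
    urls = []
    for entry, size in catalogue:
        entry_dir, _, file_name = entry.rpartition("/")
        # Assumption: the pattern must match the whole file name and the
        # directory must match exactly (no descent into sub-directories).
        if entry_dir == se_dir and regex.fullmatch(file_name):
            urls.append(f"dcm://{se_name}/{entry}#{size}")
    return urls
```

With the pattern .*\.root and the directory user/nwest/grid_tests, this returns every .root file held directly in that directory, each as a complete DCM URL.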
dcm_cache/
which is where DCM will place files, although it can also manage files
that users have placed elsewhere on these disks. On the first disk in the list there must also be a top-level directory:-
dcm_catalogue/
this is the "soft links catalogue" which is where DCM places soft
links to data files on all the disks it manages. That directory has
the sub-directory
DCM/
where DCM maintains its text catalogues and which also holds a series
of Global Log files of the form
global_log_YYYY-MM-DD e.g. global_log_2007-05-25
together with 2 soft links:-
global_log_current
global_log_previous
which point to the latest two. These log files record file transfers,
both successes and failures.
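The naming scheme lends itself to a simple rotation. The following sketch is hypothetical (DCM's own Perl code may maintain the links differently). It derives a log name from a date and repoints global_log_current and global_log_previous at the two newest logs, relying on ISO dates sorting lexically.

```python
import datetime
import os
from pathlib import Path


def global_log_name(day: datetime.date) -> str:
    """Name of the Global Log for a given day, e.g. global_log_2007-05-25."""
    return f"global_log_{day.isoformat()}"


def refresh_log_links(catalogue_dir: Path) -> None:
    """Point global_log_current/_previous at the two newest Global Logs."""
    # YYYY-MM-DD names sort lexically in date order; the glob pattern
    # does not match the two soft links themselves.
    logs = sorted(
        p.name for p in catalogue_dir.glob("global_log_????-??-??") if p.is_file()
    )
    for link_name, target in (("global_log_current", logs[-1:]),
                              ("global_log_previous", logs[-2:-1])):
        link = catalogue_dir / link_name
        if link.is_symlink() or link.exists():
            link.unlink()
        if target:
            # Relative link, so it stays valid if the directory is moved.
            os.symlink(target[0], link)
```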
When DCM is run it starts by listing the disks it is managing. For example:-
DCM configuration:-
  List of DCM-managed disks:     /stage/minos-data1
  List of excluded directories:
  Ownership group:               minos
  Soft links catalogue:          /stage/minos-data1/dcm_catalogue
  DCM catalogue directory:       /stage/minos-data1/dcm_catalogue/DCM
  Scratch directory              /tmp/dcm_scratch_area_9321
Access to a local (scratch) disk isn't a problem; its name ($work_dir) is communicated via the environment (see get_site_info.pm), and on start-up DCM can create the standard subdirectories that it requires.
At least in the current LCG implementation there is a shared disk where files can be held permanently - the software disk - and as the logs and catalogues are small, it's possible to have a group-writable directory on the software disk and store them there. However, that only solves part of the problem: this disk is not the same as the one that holds the catalogues and logs on the User Interface (UI), so there has to be a system to keep the two synchronised.
The system works as follows:-
dcm_data/catalogues
dcm_data/resource_locks
and make them group writable.
[July 2007: The system to propagate the catalogues to the software disk is not yet in place so for now the system only runs on RAL T1 and gets the catalogues from the UI disk.]
Although initially we only expect to run on RAL T1 and T2, this model would also work if operations were extended to other nearby T2s. In the longer term we should start to use the LCG File Catalog (LFC), as this is accessible from Worker Nodes (WNs). However, this catalogue is not searchable in the same way as the current DCM catalogue (it's more like a dCache directory structure), so we may have to change the way we form queries and be explicit about the directories to be searched before we can take this step.
$MINOS_TOOLS/dcm.sh {global options} command {command options} {command args}
--debug n    Switch on debug level n (=0 off)
--expt e     Selected experiment. Allowed values: minos [default] and sno
--site s     Select site. CAUTION: use for testing only!!
catalogue {<file>...<file>...} { --all}
Example: catalogue /stage/minos-data1/d4/C00080277_0000.mdaq.root
This adds the file both to the text catalogue and, as a soft link to the file, to the soft links catalogue:-
dcm_catalogue/
directory which must be the top level directory on the first disk
managed by DCM. This command is useful when adding a file that is
not within the set of directories managed by DCM.
Example: catalogue --all
This uses the results of the last disk scan (see the survey command) and checks that all the data files that it found are in the text and soft links catalogues.
directory_ownership {mode}
where
mode (optional):-
  "full"       [default] show every directory
  "compress"   suppress sub-directories wholly owned by a single user
This command uses the results of the last disk scan (see the survey command) and reports, for each
data directory, the users who own files in it, including its
sub-directories.
disk_usage {command option} ...
--user_lists <user_list_dir>   Produce a set of files_owned_by_<user>.txt and disk_usage.txt in <user_list_dir>
This uses the results of the last disk scan (see the survey command) to produce a summary of usage, both by disk and by user.
DCM classifies all files into 1 of 4 types:-
get {command options} file-query file-query ...
Transfer one or more files from an SE (Storage Element).
--accept_dcm_url Return files as DCM URLs; doesn't attempt any transfers
--accept_root_url Return files as ROOT URLs if supported; otherwise transfer.
--demand_complete_set
Quit without getting any files unless able to get them all
Default: return whatever files can be located.
--file_list f If command succeeds, record list of files (or URLs) in file f.
Will include all files i.e. even those already on disk.
Caution: On input f must not exist.
--force_local Force a copy to local dir (see --local_dir) unless already
there.
--local_dir d Copy files to specified directory.
Default: the dcm_cache directory of the disk with most space
--max_files n Set upper limit on number of files to transfer.
Default 10. Hard upper limit of 1000 files.
Used to prevent misplaced wildcard from transferring
huge amounts of data!
--num_get_jobs n Run up to n transfer jobs at once.
Default 1. Hard upper limit of 10 jobs.
--names_not_unique Use this option if the file names are not definitely unique. This prevents DCM
from checking whether it can find a copy already on the local disk rather than getting
it from the SE.
USE FOR FLUX FILES: Without it DCM could find the wrong version (it has happened!)
--preserve_rel_dir d
Preserve relative directory structure: Directory d in the SE maps to the top of
local dir. It's useful when downloading flux files e.g.:-
--remote_se "ral_t1-castor-prod_d0t1/flux/gnumi/v19/fluka05_le010z185i/job[0-3]"
--preserve_rel_dir flux/gnumi (or even --preserve_rel_dir gnumi)
--local_dir /some/local/dir
Files would be written to /some/local/dir/v19/fluka05_le010z185i/...
Local directories will be created as necessary.
If the remote SE directory path does not start with d the file is
placed in the top level directory.
--remote_se se_name{/se_dir}
Only copy files from selected SE {and within selected /se_dir}
e.g. --remote_se ral_t1-castor-test_d0t1/gnumi/v19/fluka05_le010z185i
Only look in SE ral_t1-castor-test_d0t1 within directory gnumi/v19/fluka05_le010z185i
e.g. --remote_se 'ral_t1-dcache-disk/gnumi/v19/fluka05_le010z185i/job1.*'
Only look in SE ral_t1-dcache-disk within directory sub-tree gnumi/v19/fluka05_le010z185i/job1.*
Note: . - any single char; .* - any char string
Use the se_name "not_nfs" to just exclude the local disk.
--test Determine what files have to be transferred and from where but
don't transfer files
file-query
Either: File name
e.g. F00030574_0002.mdaq.root
or an 'egrep' wildcard regular expression: 'F000256.*.cand.R1.14.root'
Note: . - any single char; .* - any char string
CAUTION: Once a match is found in any SE, DCM stops searching.
Or: A database query for SAM enclosed in square brackets
e.g. [ file_name like N00008695_002%.cosmic.sntp.R1_18.0.root ]
e.g. [ "run_type physics%
and data_tier sntp-near
and physical_datastream_name spill%
and start_time < to_date('2006-02-18','yyyy-mm-dd')
and end_time > to_date('2006-02-17','yyyy-mm-dd')
and version cedar" ]
e.g. [ dataset_def_name gemma3-Cedar-near-all-sntp-2007-5-w2 ]
Make sure there is a space after the leading '[' or the shell
command parser may treat it as a wildcard construction.
Enclose in double quotes if query includes parentheses.
Or: A DCM URL e.g. dcm://fnal-dcache-enstore/pnfs/fs/usr/minos/rec ... .snts.R1_18.0.root#129234
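The --preserve_rel_dir example accepts both "flux/gnumi" and plain "gnumi", which suggests that d is matched as a run of whole directory components anywhere in the SE path. The sketch below assumes exactly that; local_path_for is a hypothetical helper and does not create any directories.

```python
from pathlib import PurePosixPath


def local_path_for(se_path: str, preserve_rel_dir: str, local_dir: str) -> PurePosixPath:
    """Map a file's SE path to a local path in the style of --preserve_rel_dir.

    Assumption: preserve_rel_dir is matched as whole directory components
    anywhere in the SE directory. Whatever follows it is kept below
    local_dir; if it is not found, the file goes to the top of local_dir.
    """
    parts = PurePosixPath(se_path.strip("/")).parts
    dir_parts, file_name = parts[:-1], parts[-1]
    keep = PurePosixPath(preserve_rel_dir.strip("/")).parts
    n = len(keep)
    for i in range(len(dir_parts) - n + 1):
        if dir_parts[i:i + n] == keep:
            return PurePosixPath(local_dir, *dir_parts[i + n:], file_name)
    return PurePosixPath(local_dir, file_name)
```

For a file under flux/gnumi/v19/fluka05_le010z185i/job0 and --local_dir /some/local/dir, both "flux/gnumi" and "gnumi" give /some/local/dir/v19/fluka05_le010z185i/job0/<file>, matching the documented example.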
All 3 types of command arg may be mixed in the same invocation.
DCM first executes all SAM queries to resolve them into file names.
Then, for file names that are not already DCM URLs, it searches the
SE catalogues and converts them to DCM URLs. It then transfers any
that it locates that are not already on local disk.
Note that the 2-stage approach allows users to have a dataset defined by a SAM query and yet retrieve files from the closest SE.
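The two-stage resolution can be sketched as follows. run_sam_query and search_catalogues are hypothetical stand-ins for the SAM database query and the SE catalogue search, passed in as plain callables; the sketch stops at producing DCM URLs and leaves the transfer step out.

```python
from typing import Callable, List


def classify_query(arg: str) -> str:
    """Return 'sam', 'dcm_url' or 'file_name' for one 'get' argument."""
    text = arg.strip()
    if text.startswith("[") and text.endswith("]"):
        return "sam"
    if text.startswith("dcm://"):
        return "dcm_url"
    return "file_name"  # a plain name or an egrep-style pattern


def resolve_queries(
    args: List[str],
    run_sam_query: Callable[[str], List[str]],
    search_catalogues: Callable[[str], List[str]],
) -> List[str]:
    """Turn mixed 'get' arguments into DCM URLs in two stages."""
    urls: List[str] = []
    names: List[str] = []
    # Stage 1: SAM queries become file names; DCM URLs are kept as they are.
    for arg in args:
        kind = classify_query(arg)
        if kind == "sam":
            names.extend(run_sam_query(arg))
        elif kind == "dcm_url":
            urls.append(arg)
        else:
            names.append(arg)
    # Stage 2: names (and patterns) are looked up in the SE catalogues.
    for name in names:
        urls.extend(search_catalogues(name))
    return urls
```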
Note that, for a given file-query, DCM stops searching SE catalogues as soon as it finds any match. The logic is that a dataset should always be defined by applying a search to a single SE and not by the logical OR of all SEs. So if you want to copy some data set, say a group of files matching a wildcard, and some are already on the local disk, then, by default DCM will only find them and not copy the rest. The solution is to use the --remote_se option to force DCM to look at the SE which has the full set; it will still check the local disk so there is no risk that it will copy files it already has.
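The following is a sketch of the first-match rule and of how --remote_se changes it. Catalogues are modelled as a hypothetical dictionary from SE name to file names, patterns use unanchored egrep-style matching, and remote_se here takes only an SE name (the real option also accepts a directory).

```python
import re
from typing import Dict, List, Optional


def search_first_match(
    pattern: str,
    catalogues: Dict[str, List[str]],
    search_order: List[str],
    remote_se: Optional[str] = None,
    local_se: Optional[str] = None,
) -> Dict[str, str]:
    """Map each matching file name to the SE it would be taken from.

    The search stops at the first SE with any match, so a dataset always
    comes from a single SE. If remote_se is given, only that SE is searched,
    but files also present on local_se are still attributed to local_se so
    that they are not copied again.
    """
    regex = re.compile(pattern)
    ses = [remote_se] if remote_se is not None else search_order
    for se in ses:
        # egrep-style: the pattern may match anywhere in the name.
        found = [name for name in catalogues.get(se, []) if regex.search(name)]
        if found:
            local = set(catalogues.get(local_se, [])) if local_se else set()
            return {name: (local_se if name in local else se) for name in found}
    return {}
```

In the usage below, the local disk holds one file of a three-file set. Without remote_se only that file is found. With remote_se, the full set is found, but the file already held locally is still reported as local.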
If using the --file_list option, be sure that the name of the file you pass is unique. The normal way to do that is to include the process ID (available in the shell as $$) in the file name. Otherwise, on a system with multiple jobs all getting files via DCM, there is a danger that two might use the same name to return their file list. As an additional precaution, DCM will reject the command if it is passed a pre-existing file.
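A minimal sketch of both precautions: naming the list after the process ID, and creating it in a way that fails if the file already exists. The "dcm_file_list" prefix is an arbitrary, hypothetical choice. Using O_CREAT | O_EXCL is one way to get DCM-like rejection of a pre-existing file; it is not necessarily how DCM performs its check.

```python
import os
from typing import List


def unique_file_list_path(directory: str, prefix: str = "dcm_file_list") -> str:
    """Name a --file_list file after this process's ID (the shell's $$)."""
    return os.path.join(directory, f"{prefix}_{os.getpid()}.txt")


def write_file_list(path: str, lines: List[str]) -> None:
    """Create path and write lines, refusing to touch a pre-existing file.

    O_CREAT | O_EXCL makes the open fail with FileExistsError if path
    already exists.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    with os.fdopen(fd, "w") as fh:
        for line in lines:
            fh.write(line + "\n")
```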
The --accept_dcm_url option can be useful to see what files would satisfy a request without doing any transfer. The --test option only shows you what files would have to be transferred, whereas the URL request also shows files already on local disks and so lets you see whether any transfers would have to take place. The resultant URLs can later be passed to DCM for transfer, so long as they are still valid. This might be useful when running a job on a Worker Node where no catalogue is available.
put {command options} file_name file_name ...
Transfer one or more files to an SE (Storage Element).
--create_remote_dir
If necessary create remote directory
--file_list f If command succeeds, record list of files transferred
Each line of file is:-
Either: Name of file successfully written
Or: Error message starting with the character '?'
Caution: On input f must not exist.
--local_dir d Copy files from specified directory. Default: current directory
--overwrite Overwrite existing file. Default: don't overwrite
--remote_se se_name/se_dir
Directory on SE. Compulsory
--test Just test, don't transfer files
file-name File name relative to --local_dir.
No wild-cards permitted and no check that file is recognisable as a data file.
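Because each line of a put --file_list file is either a written file name or an error message starting with '?', a calling script can separate the two. The following is a minimal sketch; read_put_file_list is a hypothetical helper, and skipping blank lines is an added tolerance rather than documented behaviour.

```python
from typing import List, Tuple


def read_put_file_list(path: str) -> Tuple[List[str], List[str]]:
    """Split a put --file_list file into (written_files, error_messages).

    Each line is either the name of a file successfully written or an
    error message starting with the character '?'.
    """
    written: List[str] = []
    errors: List[str] = []
    with open(path) as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if not line:
                continue  # tolerate a trailing blank line
            if line.startswith("?"):
                errors.append(line[1:].strip())
            else:
                written.append(line)
    return written, errors
```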
survey {<se>...<se>...}
Example: survey ral_t1-castor-test_d0t1 fnal-dcache-enstore
This command rebuilds the catalogues for the selected SEs, or for all available SEs if none are specified. Each resulting catalogue is stored in
dcm_catalogue/DCM/<SE name>.cat
For most SEs the scan is carried out using the appropriate commands for the SE concerned, but there are two special cases:-
DCM treats the local set of disks as the best SE and gives it the name
<site-name>-nfs
DCM does a recursive scan of all the directories that it manages and stores the result in the catalogue
dcm_catalogue/DCM/<site-name>-nfs.cat
This catalogue is used as the basis for the following commands:-
catalogue (when given the --all option)
directory_ownership
disk_usage
Once the scan is complete the survey command then executes:-
catalogue --all
disk_usage
DCM doesn't scan Enstore. Instead it copies the latest version of such a scan:-
http://www-stken.fnal.gov/enstore/tape_inventory/$FNAL::COMPLETE_FILE_LISTING_minos
and then converts it into a DCM catalogue.
test <sub-command> <arg> ...
This is used to test and debug DCM. Typing the test command without further arguments will list the tests that are currently available.
uncatalogue <file> {<file>...}
Example: uncatalogue /stage/minos-data1/d4/C00080277_0000.mdaq.root
This removes the file both from the disk-based catalogue and, as a soft link to the file, from the soft links catalogue:-
dcm_catalogue/
directory which must be the top level directory on the first disk
managed by DCM.
DataCacheManager/config subdirectory.
This file identifies all the SEs used by the experiment, the services each provides and the way to access these services.
This file specifies which of the experiment's SEs can be accessed from the local site, the mean transfer rate (used to calculate a timeout) and which interfaces to use to access them.
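The exact formula DCM uses to turn the mean transfer rate into a timeout is not given here. The sketch below is one plausible form, assuming the rate is expressed in MB/s and adding a hypothetical safety factor and fixed overhead to the expected transfer time.

```python
def transfer_timeout(size_bytes: int, rate_mb_per_s: float,
                     safety_factor: float = 3.0,
                     overhead_s: float = 60.0) -> float:
    """Hypothetical timeout in seconds for transferring one file.

    expected time = size / mean rate; the safety factor and fixed overhead
    (for connection set-up, tape mounts, etc.) are illustrative guesses.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("mean transfer rate must be positive")
    expected_s = size_bytes / (rate_mb_per_s * 1_000_000)
    return overhead_s + safety_factor * expected_s


if __name__ == "__main__":
    # A 0.3 GB file at a mean rate of 10 MB/s: 60 + 3 * 30 = 150 seconds.
    print(transfer_timeout(300_000_000, 10.0))
```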
This file specifies the local disk setup at the site.
Prepare the configuration files and the associated directories as follows.
<site><tier-level><ui or wn>
For example:-
ral_t1_wn
oxford_t2_ui
sussex_ui
This file remains local to the site and is included in the .cvsignore list.
cd DataCacheManager/config
local_name=     (whatever name you have chosen)
echo $local_name > minos.site_name
cp minos.site_oxford_t2_ui.se_access minos.site_$local_name.se_access
minos.site_ral_t1_ui.local_disks
and renaming entries to match the local disks, creating directories as required with group write permission.
Alternatively, start from scratch: define 'data_dir' to be the top directory of your data and then use that to fill out the file:-
data_dir=     (the top directory of your data disk)
rm -f minos.site_$local_name.local_disks     (should not exist, but just in case)
echo Group minos >> minos.site_$local_name.local_disks
echo Scratch_dir /tmp >> minos.site_$local_name.local_disks
echo @Disks $data_dir >> minos.site_$local_name.local_disks
echo @Exclude_dirs $data_dir >> minos.site_$local_name.local_disks
echo Soft_links_dir $data_dir/dcm_catalogue >> minos.site_$local_name.local_disks
echo Catalogue_dir $data_dir/dcm_catalogue/DCM >> minos.site_$local_name.local_disks
echo Resource_lock_dir $data_dir/dcm_resource_locks >> minos.site_$local_name.local_disks
DCM is capable of surveying everything below @Disks and providing a catalogue, but we assume that you don't need this feature, which is why @Exclude_dirs is set to the same directory.
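To check the resulting file before running dcm, a small reader can help. The following is a minimal sketch that assumes the format produced by the echo commands: one "<key> <value>" pair per line, with '@' marking list-valued keys. It also assumes repeated '@' lines extend the list. Both helpers are hypothetical and are not DCM's own parser.

```python
import os
from typing import Dict, List, Tuple


def read_local_disks(path: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Read a minos.site_<name>.local_disks file into (settings, lists).

    Assumed format: one "<key> <value>" pair per line; keys that start with
    '@' (e.g. @Disks, @Exclude_dirs) are lists, and repeated lines extend them.
    """
    settings: Dict[str, str] = {}
    lists: Dict[str, List[str]] = {}
    with open(path) as fh:
        for raw in fh:
            key, _, value = raw.strip().partition(" ")
            if not key:
                continue  # skip blank lines
            if key.startswith("@"):
                lists.setdefault(key[1:], []).extend(value.split())
            else:
                settings[key] = value.strip()
    return settings, lists


def missing_directories(settings: Dict[str, str]) -> List[str]:
    """Directories named in the file that do not exist yet."""
    keys = ("Scratch_dir", "Soft_links_dir", "Catalogue_dir", "Resource_lock_dir")
    return [settings[k] for k in keys
            if k in settings and not os.path.isdir(settings[k])]
```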
Now create all the required directories giving group write access.
mkdir --mode 0775 $data_dir/dcm_catalogue
mkdir --mode 0775 $data_dir/dcm_cache
mkdir --mode 0775 $data_dir/dcm_catalogue/DCM
mkdir --mode 0775 $data_dir/dcm_resource_locks
Confirm that dcm runs:-
dcm
It should print its help and, near the top, list 'host_name' (the name you chose), the SEs it can see and the local disk setup.
dcm survey
It will take no time to survey the local disk, because everything was excluded, but it will then take about 15 minutes to download a ~0.3 GB file from FNAL and reformat it for DCM usage.
Note: DCM does not automatically refetch this file, as doing so takes a while, so the catalogue will slip out of date. One way to prevent this is to have a nightly cron job that just executes this command.
dcm get --accept_dcm_url N00006771_cat0.spill.sntp.R1_18_2.0.root
  [ should locate one file in ral_t1-dcache-tape ]
dcm get --accept_dcm_url AnaNue-N00009062_0018.spill.sntp.cedar.0.root
  [ should locate a file in ral_t1_ui-nfs ]
For example, for the server "ral_t1-castor-prod_d0t1;rfio"
/castor/ads.rl.ac.uk/prod/grid/hep/disk0tape1/minos;
export STAGE_SVCCLASS=minosDisk0Tape1
export STAGE_HOST=castorstager.ads.rl.ac.uk
export RFIO_USE_CASTOR_V2=YES
Handling of DCM URLs (which encode the SE name, SE directory and file size) is done by sei_dcm_url_pack.
SE directory creation is done by sei_prepare_directory and file overwriting is done by sei_prepare_file.
Catalogue handling is provided by sei_survey, which can scan an SE and build a text catalogue; searching such a catalogue for a file name, and hence inferring the DCM URL (which encodes the SE name, SE directory and file size), is done by sei_search_catalogue.
/pnfs/fs/usr/minos
/pnfs/minos
/pnfs/fnal.gov/usr/minos
In such cases sei_search_catalogue prefers
/pnfs/minos
over the other forms.
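The preference can be illustrated by collapsing catalogue hits that differ only in their pnfs prefix. The sketch below keeps the /pnfs/minos form when it is present and otherwise the first form seen. prefer_pnfs_minos is a hypothetical helper, not the code in sei_search_catalogue.

```python
from typing import Dict, Iterable, List, Optional, Tuple

PREFERRED = "/pnfs/minos"
EQUIVALENT_PREFIXES = ("/pnfs/fs/usr/minos", "/pnfs/fnal.gov/usr/minos", PREFERRED)


def _split_prefix(path: str) -> Tuple[Optional[str], str]:
    """Return (pnfs_prefix, remainder), or (None, path) for other paths."""
    for prefix in EQUIVALENT_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return prefix, path[len(prefix):]
    return None, path


def prefer_pnfs_minos(paths: Iterable[str]) -> List[str]:
    """Collapse catalogue hits that differ only in their pnfs prefix.

    For each file seen under several equivalent prefixes keep the
    /pnfs/minos form if present, otherwise the first form seen.
    """
    chosen: Dict[Tuple[str, str], str] = {}
    for path in paths:
        prefix, rest = _split_prefix(path)
        key = ("pnfs", rest) if prefix else ("other", path)
        if key not in chosen or prefix == PREFERRED:
            chosen[key] = path
    return list(chosen.values())
```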
Having the transfer as a separate script allows FRS, when transferring multiple files, to run multiple jobs in parallel.
After a successful transfer FRS updates the SEI catalogues.
dcm/minos
dcm/sno
apart from
init_minos.pm
init_sno.pm
After parsing any global switches, DCM knows which experiment it is dealing with and then executes the appropriate experiment initialisation.
Calls from the generic to the experiment specific code constitute the experiment API.
Parameters:-
==========
$file_name Name of file to be identified (can contain directory)
Return:-
======
$data_name MINOS: Currently this is returned as the component
between the sub-run and the data type
SNO: The module name e.g. Reconstruct
$data_type MINOS: Data type i.e. the extension e.g. mdaq.root
SNO: Data type e.g. sno_root
$detector MINOS: The detector. One of "CalDet", "Far" or "Near".
SNO: The phase e.g. salt
$run_no Run number
$sub_run_no Sub-run number (or -1 if n/a)
$version MINOS: Release (or "" if n/a)
SNO: Pass number (or "" if n/a)
Parameters:-
==========
Either: $file_name File name whose access info is required.
Or: $db_query A database query for SAM (MINOS) or Ral (SNO)
Return:-
======
A list of file_access_size variables, each consisting of:-
$file_name:$access_info:$estimated_file_size
where:-
$file_name File name.
$access_info MINOS: ENSTORE directory
SNO: Tape name:file number
$estimated_file_size Estimated size in GB
In the case of an error a single entry is returned: "? Error message"
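Since the SNO access info itself contains a colon (tape name:file number), a caller cannot simply split each entry on ':'. A minimal sketch of a safe split, taking the file name up to the first colon and the size after the last one, follows. It assumes file names contain no colon; FileAccess and parse_file_access_list are hypothetical names.

```python
from typing import List, NamedTuple


class FileAccess(NamedTuple):
    file_name: str
    access_info: str          # MINOS: ENSTORE directory; SNO: "tape:file_no"
    estimated_size_gb: float


def parse_file_access_list(entries: List[str]) -> List[FileAccess]:
    """Parse "$file_name:$access_info:$estimated_file_size" strings.

    The file name runs to the first ':' and the size follows the last ':',
    so access info containing ':' (as for SNO) is kept intact.
    A single entry starting with '?' is reported as an error.
    """
    if len(entries) == 1 and entries[0].startswith("?"):
        raise RuntimeError(entries[0][1:].strip())
    result = []
    for entry in entries:
        file_name, sep1, rest = entry.partition(":")
        access_info, sep2, size = rest.rpartition(":")
        if not (sep1 and sep2):
            raise ValueError(f"malformed entry: {entry!r}")
        result.append(FileAccess(file_name, access_info, float(size)))
    return result
```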