A code release often needs accompanying ancillary data files that
contain non-source-code information.
Examples include PDFs, histograms, and b-field maps.
Since the beginning of MINOS we have facilitated the distribution of such
auxiliary files by using CVS. This document describes the new,
preferred means of distributing these files for MINOS.
Using CVS and SRT for code management is relatively straightforward
from the ordinary user's point of view: there is a $SRT_PUBLIC_CONTEXT and
possibly a $SRT_PRIVATE_CONTEXT, each of which represents a release.
Under each of these is a set of packages that appears, to the user,
as a complete release (base) or as a selection of packages overriding
the base (test). Behind the scenes this means that if package
MyPkg has a file mystuff.root, then completely
independent copies of it will be found in locations of the form:
$SRT_DIST/packages/MyPkg/HEAD/mystuff.root
$SRT_DIST/packages/MyPkg/R1-24/mystuff.root
$SRT_DIST/packages/MyPkg/R1-24-0/mystuff.root
...
$SRT_DIST/packages/MyPkg/R1-24-4/mystuff.root
$SRT_DIST/packages/MyPkg/R1-28/mystuff.root
$SRT_DIST/packages/MyPkg/S08-02-24-R1-28/mystuff.root
...
$SRT_PRIVATE_CONTEXT/MyPkg/mystuff.root
If a binary-identical file gets committed into both MyPkg and
TheirPkg separate copies are made for those as well.
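The amount of duplication in an existing installation can be estimated by grouping files by content hash. The following is a minimal sketch, not part of SRT or the MINOS tools; `file_digest` and `find_duplicates` are hypothetical helper names introduced here for illustration.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map digest -> sorted paths, for contents found more than once under root."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # symlinks already share storage
            by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Running `find_duplicates` on a directory such as $SRT_DIST/packages would list every group of byte-identical files, whatever their package or release.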
What this new mechanism does is allow releases and packages to share
copies. A master copy of each unique file is kept at FNAL and made
available via the web. Where previously in a package there was
a file mystuff.root it is removed and a new file
mystuff.root.proxy created.
The contents of .proxy tell the system which
copy of the master file is needed. Master files that go into the system
must be uniquely named even if the target (e.g.
mystuff.root) is generic. So, the
first file into the system might be named mystuff.v1.root,
and the next mystuff.v2.root, etc. The text of each
.proxy file names one file in this series. Different
copies of mystuff.root.proxy might have different contents
for different releases or packages, but each is a small text file
that can comfortably be handled by CVS.
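The naming convention above can be illustrated with a small helper that proposes the next name in a series. This is only a sketch: `next_master_name` is a hypothetical function, it assumes the version tag `.vN` sits just before the final extension as in the examples, and it does not replace the uniqueness check made when a master file is installed.

```python
import os
import re


def next_master_name(target, existing_names):
    """Suggest the next 'stem.vN.ext' name not already in existing_names."""
    stem, ext = os.path.splitext(target)  # "mystuff", ".root"
    pattern = re.compile(r"^%s\.v(\d+)%s$" % (re.escape(stem), re.escape(ext)))
    versions = [int(m.group(1)) for m in map(pattern.match, existing_names) if m]
    return "%s.v%d%s" % (stem, max(versions, default=0) + 1, ext)
```

For example, with no existing master files the suggestion for mystuff.root is mystuff.v1.root.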
The sharing can then be accomplished in any of four ways.
| flag | action |
|------|--------|
| --cache | download copies to a site-specific cache area and symlink from the release/package to that location |
| --afs | symlink to AFS at /afs/fnal.gov/files/data/minos/release_data/ (the site machine must run AFS) |
| --minosdata | symlink to /minos/data/release_data/ (not always possible due to limits on mount permissions) |
| --local | download copies into the same directories as the .proxy files; this leaves one in a state no different from before this method (and is not really "sharing") |
In general the first option, a site-specific cache, is probably the right approach
for most non-FNAL installations.
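The idea behind the cache option can be sketched as follows. This is not the code of proxy_resolver.py; `link_from_cache` is a hypothetical helper, and the downloader is passed in as a caller-supplied `fetch` callable because the location of the master copies is not assumed here. The sketch relies on POSIX symlinks.

```python
import os


def link_from_cache(proxy_path, remote_name, cache_dir, fetch):
    """Point the proxy's target at a shared cache copy, fetching once if absent.

    fetch(remote_name, dest_path) is a caller-supplied downloader (hypothetical).
    """
    if not proxy_path.endswith(".proxy"):
        raise ValueError("not a .proxy file: %s" % proxy_path)
    target = proxy_path[:-len(".proxy")]  # mystuff.root.proxy -> mystuff.root
    cached = os.path.join(cache_dir, remote_name)
    if not os.path.exists(cached):
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        fetch(remote_name, cached)
    if os.path.lexists(target):
        os.remove(target)  # drop a stale link or an old local copy
    os.symlink(cached, target)
    return target
```

Because every release and package links to the same cached file, a master file is downloaded at most once per site, however many releases refer to it.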
Base Release:
The simplest way to resolve all the .proxy files
(assuming that the site has chosen to go with a local
cache and configured a .proxyrc to handle that) is to issue the command:
$ $SRT_DIST/setup/proxy_resolver.py
This should be done after any update to the local minossoft installation.
It will correctly handle .proxy files that are newer than their target
on the local machine and automatically get the new file and remake the link.
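A check of the "proxy newer than target" kind could look like the sketch below. How proxy_resolver.py actually makes this decision is not documented here; `needs_refresh` is a hypothetical helper, and comparing against the modification time of the link itself (rather than the file it points to) is an assumption.

```python
import os


def needs_refresh(proxy_path, target_path):
    """True if the target is missing, dangling, or older than its .proxy file."""
    if not os.path.exists(target_path):  # missing, or a symlink to nothing
        return True
    # lstat: for a symlink, use the time the link itself was made
    return os.path.getmtime(proxy_path) > os.lstat(target_path).st_mtime
```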
At some point this will probably become the default behaviour when using
msrt update to work on a base release.
If a site is using AFS or /minos/data, one adds the command
line flag --afs or --minosdata. If instead one wishes
to mimic the old multiple-instance case, use --local.
Test Release:
$ setup_minos
$ cd /path/to/testrel
$ srt_setup -a
$ $SRT_DIST/setup/proxy_resolver.py -r test
BField Maps:
$ $SRT_DIST/setup/proxy_resolver.py -r bmap
Commandline Flags:
$SRT_DIST/setup/proxy_resolver.py --help
usage: proxy_resolver.py
-h, --help this message
-q, --quiet don't show actions
-v increase debug verbosity level
-f, --force force refetch/relink of file
-t, --test print actions, but don't do them
-p, --package limit search to just package pattern [*]
-r, --release limit search to just release pattern [*]
if "test" act on $SRT_PRIVATE_CONTEXT
otherwise HEAD, R1-NN, SYY-MM-DD-R1-NN
( also super-special case: "bmap" )
The following determine the action taken:
--cache Set symlink to site cache copy.
Fetch remote file to site cache if necessary.
Default action, but if not specified on the cmd line
the cache location may be resolved by a
$SITE_PROXY_CACHE env variable or a line in
a .proxyrc starting with SITE_PROXY_CACHE:
--afs symlink via AFS
--minosdata symlink to /minos/data
--local fetch copy to pkg directory
--unlink remove target
special flags:
--no-proxyrc ignore all .proxyrc lists
--no-sys-proxyrc ignore .proxyrc except in dir w/ .proxy
--source alternative master source directory
The transition from the old scheme to the new takes a few easy steps.
One must choose a site cache that is visible to all local instances
of the minossoft installation (i.e. if the base release is on an NFS
filesystem visible to many nodes, the cache should be as well).
The initial steps, when using a site cache, are:
$ setup_minos
$ cd /path/to/site/cache
$ mkdir release_data
$ echo "SITE_PROXY_CACHE:/path/to/site/cache/release_data" > $SRT_DIST/setup/.proxyrc
$ $SRT_DIST/setup/proxy_resolver.py --unlink # remove any stale files
$ $SRT_DIST/setup/proxy_resolver.py --force
If a site-wide decision is made to exclude particular files (or patterns
of files), one can use the .proxyrc
file to accomplish that. Make that decision, and the corresponding changes
to the .proxyrc, before the last step to avoid downloading unnecessary files.
Here is an example .proxy file:
# This is an example .proxy file. It is named "mytarget.root.proxy"
# and gets resolved locally with the addition of a file "mytarget.root"
# in the same directory, which is a symlink to (or copy of) the real file.
#
# The real contents of the .proxy is a single named file which must be on
# the first non-blank, non-comment line. The remote file must be absolutely
# unique in its name, so should incorporate a version number and be
# as descriptive as possible. The remote file line can include a subdir
# path to facilitate clustering of related files (not generally recommended
# unless a long series is anticipated or sharing between packages
# is unlikely). Note that the remote file name needn't be a simple
# transformation of the target file name -- though that would generally
# be a wise choice.
#
# The comments can serve to describe the file and provide extra metadata
# e.g. "the v3 version of the file has the correct blah-blah PDF"
# This allows one to keep track of why changes were made to the file
# and provide other helpful hints.
#
mysubdir/myremotefile.v3.root
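The rule stated in the example (the remote file is named on the first non-blank, non-comment line) can be expressed in a few lines. This is an illustrative sketch, not the parser used by proxy_resolver.py; `read_proxy` is a hypothetical name.

```python
def read_proxy(path):
    """Return the remote file name: the first non-blank, non-comment line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                return line
    raise ValueError("no remote file named in %s" % path)
```

Applied to the example file above, this would return the string mysubdir/myremotefile.v3.root.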
The role of the .proxyrc file is to allow local control over
the proxy resolution. These files should never be committed
back to the repository -- they are strictly for local site configuration.
Here is an example .proxyrc file:
# This is an example .proxyrc file. It serves two purposes:
# * provide a list of file names and patterns that aren't desired
# locally at this site.
# * provide a place for specifying where the local site cache is located
#
SITE_PROXY_CACHE:/path/to/where/actual/files/live
excluded-file.dat
excluded-pattern*.dat
These can be located in individual package directories with the .proxy
files or in $SRT_DIST/setup, ~, or
$SRT_PUBLIC_CONTEXT,
as well as $SRT_PRIVATE_CONTEXT when using the "test"
release, and $BMAPPATH when using "bmap" release.
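A .proxyrc file of the form shown in the example could be read as in the following sketch. The function names are hypothetical, and two points are assumptions rather than documented behaviour: that every line other than the SITE_PROXY_CACHE: line is an exclusion entry, and that patterns follow shell-style wildcard matching as provided by Python's `fnmatch`.

```python
import fnmatch


def read_proxyrc(path):
    """Return (cache_dir or None, list of exclusion patterns) from a .proxyrc."""
    cache_dir, patterns = None, []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("SITE_PROXY_CACHE:"):
                cache_dir = line.split(":", 1)[1].strip()
            else:
                patterns.append(line)
    return cache_dir, patterns


def is_excluded(filename, patterns):
    """True if filename matches any exclusion pattern (shell-style wildcards)."""
    return any(fnmatch.fnmatch(filename, p) for p in patterns)
```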
There are now two steps involved in adding or updating a file. First, the
master copy must be put in the FNAL repository. Second, a .proxy
file must be created or modified to point to that copy. The steps should
be done in this order, allowing sufficient time for the master copy
to be made available before the proxy file is committed to CVS.
Users desiring to distribute a large or .root file should
copy it to FNAL (either AFS or /minos/scratch). They then
notify Robert and Arthur where it is located (with the core software group
as backup during furloughs, vacations, or sick days). Robert or Arthur verifies that
the name is unique and makes appropriate copies into both the
master AFS and /minos/data areas. That person then informs
the user that the file has been installed.
An example of a .proxy is shown
above. If the desired target file is named mystuff.root
then the proxy file must be named mystuff.root.proxy. The
user puts the new/updated proxy file into the CVS repository.
Recently the number and size of the auxiliary files
has grown out of proportion and is starting to cause problems.
As of 2008-03-07 there were 114 files in the primary minossoft packages
that were either binary .root files or over 1 MB; counting
revisions, there are 140 distinct files. Over 330 MB of this
data is checked out for every recent release. Many of these files are then
duplicated when individuals make test releases.
There are also another 55 b-field maps.
The CVS backup process is having a hard
time handling this and code checkouts are taking up too much space on disk,
often with duplicate copies of identical files located in different directories
and/or different releases.
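A survey of this kind can be reproduced on a checked-out tree with a short scan. The sketch below is not the script used to obtain the numbers above; `large_or_root_files` is a hypothetical helper, and taking 1 MB to mean 1024*1024 bytes is an assumption.

```python
import os

ONE_MB = 1024 * 1024


def large_or_root_files(root, size_limit=ONE_MB):
    """Yield (path, size) for .root files and files larger than size_limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "CVS" in dirnames:
            dirnames.remove("CVS")  # skip CVS bookkeeping directories
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            size = os.path.getsize(path)
            if name.endswith(".root") or size > size_limit:
                yield path, size
```

Summing the sizes it yields gives the amount of such data in one checkout.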
- cvs pros:
  - familiar interactions
  - automatic versioning
  - integrated with current procedure
  - comes for "free"
- cvs cons:
  - size of "code repository" incompatible with backup tools
  - cvs wasn't designed for large/binary files and isn't that good at handling them
  - multiple copies in parallel directories (Mad vs. NCUtil)
  - same version not shared between releases
  - no straightforward way to exclude files on a local basis
The new approach has several advantages, though also a few weaknesses:
- pro:
  - sharing of identical files under different packages and/or releases
  - a means for individual sites to exclude files under their own control
  - simple AFS access if so desired
  - reduced size of the CVS repository
- con:
  - more effort to do version control (must verify name uniqueness)
  - time delay in request for insertion of new/updated master file
Nick's message (2008-02-28):
Dear All,
We hope finally to bring closure to an item back last June:-
Latest snapshot release is 3 times the size of R1.24!
http://listserv.fnal.gov/scripts/wa.exe?A2=ind0706&L=MINOS_SOFTWARE_DISCUSSION&D=0&I=-3&P=2068
We need to do this both to dramatically reduce the size of the CVS
Repository, which is important both for backing it up and for checkout
and to cut down on disk space when multiple releases are installed, as
frequently releases have identical versions of the files.
IF YOU HAVE ANY OBJECTION TO THE SCHEME BELOW, PLEASE SEND EMAIL
TO THIS LIST ASAP
We want to do this in a way that minimises disruption to users by moving
all large data files to a public directory leaving behind symlinks of
the same name in the releases. In order to support multiple versions of
the same files, the reason they ended up in CVS in the first place, the
public directory files will be qualified by a version number with the
symlinks pointing to the appropriate version.
Naturally this will complicate installation in two respects: access to
the data and setting the symlinks.
Access to the data
------------------
The data will be available both via AFS and the web. Sites for which
AFS access is not acceptable can maintain their own copy by using rsync
or some web incremental access method e.g. wget -N as the first step in
the installation procedure.
Setting the symlinks
--------------------
For each file that is removed we will leave behind a text file with the
same name + a suffix .proxy whose contents is the qualified name of the
data file. Then an installation script, which will also be invoked as
required by msrt, will hunt out all proxy files and generate the
corresponding symlink using the information they contain and some global
environmental variable that points to the public data area.
In order to clear the files from the Repository, it is not only
necessary to remove the current version of the data files; all earlier
versions must also be removed. The first step will be to perform a
sweep through the Repository looking for candidates to be removed.
Mostly these are .root files but any file statistically much larger than
the average source file will be considered. Once these have been
identified all versions will be extracted into the public directory.
At this stage we announce that the directory is available and allow
sites time to set up their own copies.
Then we develop a script that for each target data file, 'cvs removes'
the current version and for each tagged release moves the tag forward to
the new revision (which removes it from the release) and adds and tags
a proxy that holds the qualified name of the version that the release
did hold and updates/creates a .cvsignore to ignore the symlink. Once
that is accomplished any release can be cvs updated to replace the data
files it contains by their proxies and can then run the script to set up
the symlinks.
In future people will be asked to consider carefully before adding large
data files to the Repository and will instead be encouraged to
contribute them to the public area and commit a .proxy file and an
entry in .cvsignore.
Cheers,
Nick.
Last Modified: $Date: 2009/01/20 19:46:28 $
Contact:
rhatcher@fnal.gov
Page viewed from
http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/release_data.html