Tutorial: Accessing Storage Elements

Last modified: Wed Nov 12 15:08:56 GMT 2008
Nick West

Introduction

In the tutorial GRID Overview we introduced 3 key activities: Authorisation, Job Submission, and Data retrieval and storage. In this tutorial we concentrate on Data retrieval and storage.

Data files are held in SEs (Storage Elements) and made available through a variety of protocols, ranging from the local and technology-specific all the way up to the remote and technology-neutral. An overview of the various levels is given in the next section, Storage Element Protocols. This immediately gives rise to the question: which should MINOS use? We attempt to address this in the following section, The MINOS UK Data Model. This page ends with some examples.

Storage Element Protocols

Storage elements are where we actually store our data. You can see what SEs are available to MINOS by typing:-
  lcg-infosites --vo minos.vo.gridpp.ac.uk se
All SEs support GSIFTP (Grid Security Infrastructure File Transfer Protocol), and this is the way files are copied between SEs. It is also one way to use any SE, i.e. first use GSIFTP to make a local copy and then use that copy. However, it's clearly not very satisfactory to duplicate data like that, so most SEs offer other secure protocols that allow data to be read from or written to an SE directly by a local CE (Computing Element).

The table below gives examples of some of the protocols that can be used to access data, roughly ordered from highest level to lowest, so each row provides a service based on the layers (if any) below it:-

  Level     Description                              Service/Protocol
  --------  ---------------------------------------  -------------------------------
  Logical   As Grid, but treating distributed SEs    LFC/LCG (LCG File Catalog and
            as a single logical file space.          LHC Computing Grid)
  Reliable  As Grid, but with error recovery.        FTS (File Transfer Service)
                                                     RFT (Reliable File Transfer)
  Grid      Grid-wide, using a technology-neutral    GridFTP
            API. Several people have warned          GFAL (Grid File Access Library)
            against using srm - it's too low         srm utilities
            level and complex.
  Local     Local access using a                     rfio (Remote File IO) to CASTOR
            technology-specific API.                 dcap (dCache API) to dCache

File Ownership

When a job, or a user in an interactive session, writes a file to an SE, who owns it?

The answer depends on how the file was written. If written locally via a native protocol, for example dcap for dCache or rfio for CASTOR, then the file is owned by the user's account. For example, this file was written locally using dccp while logged in as nwest:-

  -rw-r-----    1 nwest    minos      403597 Nov 25 09:36 LVJ_F00034638_0000.mdaq.root
This second version was again written locally using dccp, but via the GRID, so now the username is a Pool Account:-
  -rw-r-----    1 minos003 minos      403597 Nov 25 09:37 LVJ_F00034638_0000.mdaq.root
Finally, an example running interactively but using lcg-cr:-
  -rw-r--r--    1 minos001 minos      403597 Nov 25 09:43 LVJ_F00034638_0000.mdaq.root
As Jens Jenssen explained:-

  1. The first minos001 you see is the username just like normal Unix names, all Minos users will be mapped to this name. It looks like a pool account but it doesn't walk or quack like a pool account, so it isn't.

  2. The second one is the group. There is only one of these for each VO unless you have requested special magic like a production subgroup or something. Since you're a relatively small VO there should be no need for such complications.

  3. The name of the top level directory under which all our files get written is decided by the SE admins. For both dCache and Castor, for us, the top level is minos:-
      /pnfs/gridpp.rl.ac.uk/data/minos
      /castor/ads.rl.ac.uk/prod/grid/hep/disk0tape1/minos
    

The result would have been the same if the file had been written by a job submitted to the GRID.

All files written to our SEs via GRID middleware are owned by minos001. This is NOT a pool account: we all share all files. Be very careful when using GRID middleware to overwrite or delete files.
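
Since every middleware-written file has the same owner, it can be worth checking the owner column of a listing before overwriting or deleting anything. The following is a minimal sketch, assuming the ls -l style layout shown above (owner in the third column). The function name is_shared_owner is hypothetical and not part of any MINOS or GRID tool.

```shell
# Sketch: report whether a long-listing line belongs to the shared
# middleware account. Assumes the `ls -l` column layout shown above,
# where the third field is the owner. The function name is illustrative.
is_shared_owner() {
    # $1 = one line of `ls -l` style output
    owner=$(printf '%s\n' "$1" | awk '{print $3}')
    [ "$owner" = "minos001" ]
}

# Example using the lcg-cr listing line from the text.
line='-rw-r--r--    1 minos001 minos      403597 Nov 25 09:43 LVJ_F00034638_0000.mdaq.root'
if is_shared_owner "$line"; then
    echo "shared owner: take care before overwriting or deleting"
fi
```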
 

The MINOS UK Data Model

Introduction

In this section we set out our strategy for data access within the UK, including the possibility of running GRID jobs at Tier 2's away from RAL. We start by formally identifying a 3 tier system for data storage.
  1. Tier 0 FNAL

    Enstore, with its dCache interface, is our primary data store. We use SAM to catalogue our data, but also make use of a flat ASCII file dump of all the MINOS files, which is generated at FNAL nightly.
  2. Tier 1 RAL

    The primary storage technology now is CASTOR, although there is some dCache, which is due to be phased out by the end of 2008.

    Both these technologies provide a unix-style directory structure interface to the database holding the catalogue.

    As both of these offer an SRM interface it is possible to use an LCG File Catalog to combine all their data into a single catalogue. Again it provides a unix-style directory structure that is available to any interface or worker node machine running GRID middleware.

  3. Tier 2 Local site

    The storage technology at a local Tier 2 varies from site to site. In principle it could be dCache or CASTOR, but DPM (Disk Pool Manager) is a popular, more lightweight alternative.
In order to support distributed computing we require:-
  1. Read access to T0
  2. Read/write access to T1
  3. Read/write access to local SE.

Frontend choice

If MINOS UK computing were wholly within the LCG computing sphere then the LFC catalogue would be the logical choice. However, Tier 0 lies outside it, which means that the LFC cannot handle our primary data source, so:

We choose not to use the LFC (LCG File Catalog)

Instead we will continue to use DCM (Data Cache Manager), which presents a uniform interface to all of our SEs and in which all file requests pass through a 4 stage process:-

  1. Resolution - Here a SAM query is resolved into a list of file names.
  2. Location - The SE catalogues are searched to find the best SE from which to get each file.
  3. Cache check - The local NFS disk catalogue is checked to see if the file is already on local disk.
  4. Access - This gives file access to the client. This can mean copying the file from an SE to disk and then passing back the name of the local copy or, if acceptable, could mean passing back an access URL to allow the client to read directly from the SE.
The model has another advantage over the hierarchical LFC catalogue model: raw FNAL dCache files are organised by run date; you cannot find a specific run until you know when it was taken. However, raw data files do have unique names so can easily be looked up in the flat ASCII catalogues DCM uses.
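
The lookup by unique file name can be pictured with a few lines of shell. This is only a sketch: it assumes a catalogue with one full path per line, which may not be the format DCM actually uses, and the function name catalogue_lookup is hypothetical.

```shell
# Sketch: find a file by its unique name in a flat ASCII catalogue.
# Assumption: the catalogue holds one full path per line; the real DCM
# catalogue format may differ. The function name is illustrative.
catalogue_lookup() {
    # $1 = catalogue file, $2 = bare file name
    # Prints every path whose last component equals the name and
    # returns non-zero if nothing matched.
    awk -F/ -v name="$2" '
        $NF == name { print; found = 1 }
        END         { exit !found }
    ' "$1"
}

# Intended use (catalogue path is a placeholder):
#   catalogue_lookup /path/to/catalogue.txt N00006956_0000.spill.snts.R1_18.0.root
```

Because the match is on the last path component only, the date-based directory a file lives in never needs to be known in advance.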

For GRID computing, where individual worker nodes have no permanent local storage, the model is

  1. Run DCM to get local copies of the required files from SEs
  2. Run job and produce local output
  3. Run DCM to put the output to SEs.

Backend choice

DCM is designed to be extensible, and configurable to support different backends depending on the site.
  1. Tier 0 FNAL Read Access

    Use wget to get a local copy.
  2. Tier 1 RAL Read/Write Access

    Locally use the native protocol (rfio for CASTOR, dcap for dCache). For remote access use lcg-utils.
  3. Tier 2 Local site Read/Write Access

    Use native protocol.

Catalogue generation and distribution

DCM requires a flat ASCII catalogue for each SE it supports, so fresh copies have to be generated and distributed daily.
  1. Tier 0 FNAL

    As explained in the Introduction, FNAL generates a flat ASCII file dump nightly, which DCM can convert into a catalogue.
  2. Tier 1 RAL

    A cron job runs nightly on the RAL T1, scans the local SEs there and produces catalogues that it then publishes on the web.
  3. Tier 2 Local site

    For each T2 there needs to be a cron job running nightly to scan the local SE and also get a fresh copy of the Tier 0 catalogue. The local copies of the Tier 1 catalogues are updated on the fly by DCM, using wget, whenever they are more than 8 hours old.
This approach gives DCM another advantage over the LFC. It is "a fact of life" that the LFC will get out of sync with the SEs it represents, and the standard approach is to treat the LFC as the authority. The DCM catalogues, by contrast, are re-synced with the SEs each day.

However, a major disadvantage relative to the LFC is latency: the catalogues DCM uses may be up to 32 hours out of date. There is therefore a proposal to allow the client to specify a file in a DCM supported form (the DCM URL) and have DCM consult the SE directly instead of the local catalogues. This will mean that the user has to specify the SE directory in which the file is placed.
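
The 8 hour refresh rule for the local copies of the Tier 1 catalogues, described above, could be checked along the following lines. This is a sketch of the idea only, not a description of how DCM does it. The function name catalogue_is_stale and the variable CATALOGUE_URL are hypothetical, and find -mmin is a widely available extension rather than strict POSIX.

```shell
# Sketch: decide whether a local catalogue copy needs refreshing.
# The 8 hour limit comes from the text; testing the file's
# modification time is an assumption about how such a check could be
# done, not a description of DCM internals.
catalogue_is_stale() {
    # $1 = local catalogue file
    # Stale if the file is missing or was modified more than 480 minutes ago.
    [ ! -f "$1" ] && return 0
    [ -n "$(find "$1" -mmin +480)" ]
}

# Intended use (CATALOGUE_URL is a placeholder for the published catalogue):
#   if catalogue_is_stale "$local_copy"; then
#       wget -q -O "$local_copy" "$CATALOGUE_URL"
#   fi
```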

Examples

dCache

In our case, the SE at RAL is a dCache server which allows us to read and write directly either by preloading the dcap library and using the standard (posix) I/O or by using ROOT's TDCacheFile, for example:-
  TFile* f = TFile::Open("dcache:/pnfs/gridpp.rl.ac.uk/data/minos/nwest/test/LVJ_F00034638_0000.mdaq.root")

It's possible to directly list files, for example:-

  ls -l /pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test 
Please do not use wildcard ls commands to list large numbers of files. Although such commands only access the Postgres database used by dCache, they do place a load on it.

Strictly speaking, before any type of posix access, including ls, you need to:-

  setenv LD_PRELOAD libpdcap.so 
to ensure that dcap handles dCache files, but on the UI (User Interface) at RAL it appears to work without it.

It's also possible to copy files between disk and dCache by directly using the dcap utility dccp. For example:-

  dccp dcm_t1_LVJ_F00034638_0000.mdaq.root /pnfs/gridpp.rl.ac.uk/tape/minos/minosmc/test/
In this case do not have LD_PRELOAD set, as it will break dccp. For more details see Reference dCache.
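
One way to avoid the clash between posix access (which wants LD_PRELOAD) and dccp (which must not have it) is to set the variable for a single command only, rather than with setenv for the whole session. The wrapper below is a sketch. The name with_dcap is hypothetical, and libpdcap.so is the library named above.

```shell
# Sketch: apply the dcap preload to one command only, so that it is
# never left set in the session when dccp is run later.
# The wrapper name is illustrative.
with_dcap() {
    # Runs "$@" with LD_PRELOAD set in that command's environment only.
    env LD_PRELOAD=libpdcap.so "$@"
}

# Intended use (paths as in the examples above):
#   with_dcap ls -l /pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test
#   dccp dcm_t1_LVJ_F00034638_0000.mdaq.root /pnfs/gridpp.rl.ac.uk/tape/minos/minosmc/test/
```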

CASTOR

Introduction

CASTOR provides a UNIX-like directory hierarchy of file names. For MINOS at RAL there is a single directory tree:-
  /castor/ads.rl.ac.uk/prod/minos/tape
The data won't stay permanently on disk but instead will migrate to tape. The file names remain visible to listing commands and any attempt to access a file that has migrated will stage it back onto disk.

Local Access

Locally at RAL you can use the rf* and ns* command sets to perform directory and file operations. These commands don't use GRID certificates; access is all down to UNIX permissions. You can use the Stager and Tape Commands to check on the status of files, the stager and tapes.

Setting the environment

Before using CASTOR set the following:-
  export STAGE_SVCCLASS=minosTape
  export STAGE_HOST=genstager.ads.rl.ac.uk
  export RFIO_USE_CASTOR_V2=YES
or the equivalent setenv commands.

Also make sure that:-

 which rfdir
gives:-
  /usr/bin/rfdir 
and not
  /opt/lcg/bin/rfdir 
which is a version that works with DPM but not CASTOR. If you have the wrong version, put /usr/bin ahead of /opt/lcg/bin in your path or give the full path explicitly when using any rf* command.
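
The environment settings and the rfdir check can be combined in a small setup script. The sketch below uses only the values given above. The function name classify_rfdir is hypothetical.

```shell
# Sketch: set the CASTOR variables from the text and report which
# flavour of rfdir would be picked up. The function name is illustrative.
export STAGE_SVCCLASS=minosTape
export STAGE_HOST=genstager.ads.rl.ac.uk
export RFIO_USE_CASTOR_V2=YES

classify_rfdir() {
    # $1 = path reported by `which rfdir` (or `command -v rfdir`)
    case "$1" in
        /usr/bin/rfdir)     echo "castor" ;;
        /opt/lcg/bin/rfdir) echo "dpm" ;;
        *)                  echo "unknown" ;;
    esac
}

# Prints "unknown" on a machine where rfdir is not installed.
classify_rfdir "$(command -v rfdir 2>/dev/null)"
```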

Listing and Creating Directories

Try:-
  nsls -l /castor/ads.rl.ac.uk/prod
which should list the experiments with production allocations.

The equivalent rf* command is rfdir

 rfdir /castor/ads.rl.ac.uk/prod
You will see minos listed and looking in that directory:-
 rfdir /castor/ads.rl.ac.uk/prod/minos
you should see 2 entries:-
 drwxrwxr-x   0 minos001 minos                       0 Jul 07 16:22 tape
 drwxrwx--x   0 minosmc  minos                       0 Jul 14 09:01 test
Anything written below the 'tape' directory may eventually go to tape, but anything written to 'test' will not. So 'test' is a good place to try things out without filling up tape, but please don't leave much data there, precisely because it never migrates off disk. Also BE WARNED: if the disk becomes full the data may get deleted and, as it is not written to tape, WILL BE LOST!

If rfdir gives

 No such file or directory
then you are using the wrong version of the rf* commands, see Setting the environment.

Now create a directory:-

 rfmkdir /castor/ads.rl.ac.uk/prod/minos/test/nwest
I've used my username, but you should use yours!

Copying, Accessing and Deleting Files

Now copy in a data file:-
 rfcp  $DCM_DATA/F00035853_0022.mdaq.root  /castor/ads.rl.ac.uk/prod/minos/test/nwest
You should see something like this
  11506790 bytes in 3 seconds through local (in) and eth0 (out) (3745 KB/sec)
To copy files out, simply reverse the source and the destination. However, there is no need to copy out ROOT files if running on the RAL UI (or the RAL csf farm), just use the syntax
  rfio://host/?path=FILEPATH
where
  host       CASTOR stager host = genstager.ads.rl.ac.uk
  FILEPATH   Path to file         e.g. /castor/ads.rl.ac.uk/ ... /F00035853_0022.mdaq.root
e.g.
 loon -bq reco_all.C "rfio://genstager.ads.rl.ac.uk/...
  ...?path=/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root"
To remove a file you no longer need:-
 rfrm  /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root
To remove a directory and recursively all the files it contains use the -r option (just like rm) but do it with extreme care (just like rm)!

As you might suspect, there is also rfchmod (and equivalent nschmod) to change file permissions.

Checking on File Migration and Stager Status

nsls has a -T option (list tape residence) to list files that have migrated to tape, but if you have just created the directory then:-
  nsls -lT /castor/ads.rl.ac.uk/prod/minos/test
will be empty. You can also use the Stager and Tape Commands to check on file migration, stager and tape status:-
  stager_qry -M /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root
which should show it as STAGED i.e. on disk.

To see how much of our tape allocation we have used, type

  vmgrlisttape -P minos
which should show something like:-
CS1003   CS1003 STK_RAL1 500GC    aul minos            453.38GB 20080805 RDONLY
CS1017   CS1017 STK_RAL1 500GC    al  minos            334.27GB 20080819 RDONLY
CS1018   CS1018 STK_RAL1 500GC    al  minos            348.43GB 20080820 RDONLY
...
CS3254   CS3254 STK_RAL1 500GC    aul minos            450.16GB 20080822
CS3257   CS3257 STK_RAL1 500GC    aul minos            384.69GB 20080925
CS3260   CS3260 STK_RAL1 500GC    aul minos                  0B 20080818 FULL
At the time of writing we had 33 0.5TB tapes, i.e. a nominal capacity of about 16.5TB.
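
To total the size column of this listing rather than reading it by eye, a short awk filter will do. This is a sketch with stated assumptions: the size is taken to be the seventh field, written as <number>GB or as 0B, exactly as in the sample above. Judging from the FULL tape showing 0B, that column looks like remaining free space rather than space used, but that reading is an assumption. The function name sum_tape_column is hypothetical.

```shell
# Sketch: count tapes and total the size column of vmgrlisttape output.
# Assumptions: the seventh field is the size column and is written as
# <number>GB or 0B, as in the sample above; other units are not handled.
sum_tape_column() {
    awk '
        $7 ~ /^[0-9.]+GB$/ { v = $7; sub(/GB$/, "", v); total += v; tapes++ }
        $7 == "0B"         { tapes++ }
        END                { printf "%d tapes, %.2f GB\n", tapes, total }
    '
}

# Intended use:
#   vmgrlisttape -P minos | sum_tape_column
```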

To see the stager disk space status for MINOS you have to know that our Service Class is minosTape and then use the command

stager_qry -s -d minosTape
which produces output of the form:-
POOL minosTape        CAPACITY 8.18T      FREE   7.69T(93%)  RESERVED       0( 0%)
  DiskServer gdss336.gridpp.rl.ac.uk DISKSERVER_PRODUCTION   CAPACITY 8.18T      FREE   7.69T(93%)  RESERVED       0( 0%)
     FileSystems                       STATUS                  CAPACITY   FREE          RESERVED       GCBOUNDS
     /exportstage/castor1/             FILESYSTEM_PRODUCTION   2.73T        2.56T(93%)        0( 0%)   0.20, 0.30
     /exportstage/castor2/             FILESYSTEM_PRODUCTION   2.73T        2.56T(93%)        0( 0%)   0.20, 0.30
     /exportstage/castor3/             FILESYSTEM_PRODUCTION   2.72T        2.56T(93%)        0( 0%)   0.20, 0.30
The first thing to notice is that we appear to have 3 file systems, each with 2.7TB capacity. That's not quite as good as it sounds: although there is a total of 3*2.7TB, there is only one server, divided into 3 RAIDed partitions to allow CASTOR to distribute load efficiently. GCBOUNDS is the (badly named) Garbage Collection policy: when free space drops below 20%, files are moved to tape until free space increases to 30%.
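
To compare the pool's free space with the lower garbage collection bound, the percentage can be pulled out of the POOL line. This is a sketch that assumes the output layout shown above. The function name pool_free_percent is hypothetical.

```shell
# Sketch: extract the pool-level free percentage from
# `stager_qry -s -d minosTape` output. Assumes the POOL line layout
# shown above: "... FREE <size>(<percent>%) ...".
pool_free_percent() {
    # Reads the query output on stdin and prints the integer percentage.
    sed -n 's/^POOL .*FREE *[^(]*( *\([0-9]*\)%).*/\1/p'
}

# Intended use, with 20 being the lower GCBOUNDS value (0.20) above:
#   free=$(stager_qry -s -d minosTape | pool_free_percent)
#   [ "$free" -lt 20 ] && echo "below the lower garbage collection bound"
```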

Remote Access

In order to access our CASTOR service from a remote UI you must first have prepared a short term proxy. Then you can use lcg-utils to list directories and copy files in and out of CASTOR.

The CASTOR srm end point is

  srm://srm-minos.gridpp.rl.ac.uk:8443
You need to add this prefix to any CASTOR file reference, e.g.:-
  srm://srm-minos.gridpp.rl.ac.uk:8443/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root
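
Adding the prefix by hand is error-prone, so a small helper can do it. This is a sketch. The function name castor_srm_url and the variable CASTOR_SRM_ENDPOINT are hypothetical, and the end point is the one quoted above.

```shell
# Sketch: turn a CASTOR path into the SRM URL expected by lcg-utils.
# The end point is the one given in the text; names are illustrative.
CASTOR_SRM_ENDPOINT="srm://srm-minos.gridpp.rl.ac.uk:8443"

castor_srm_url() {
    # $1 = absolute CASTOR path, e.g. /castor/ads.rl.ac.uk/prod/minos/test
    case "$1" in
        /castor/*) printf '%s%s\n' "$CASTOR_SRM_ENDPOINT" "$1" ;;
        *)         echo "not an absolute CASTOR path: $1" >&2; return 1 ;;
    esac
}

# Intended use:
#   lcg-ls -l "$(castor_srm_url /castor/ads.rl.ac.uk/prod/minos)"
```
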
To list a directory or file use the lcg-ls command:-
lcg-ls -l srm://srm-minos.gridpp.rl.ac.uk:8443/castor/ads.rl.ac.uk/prod/minos
To copy a file into or out of CASTOR use the lcg-cp command.

To refer to a disk file you have to use the notation:-

file:<absolute or relative directory><file-name>
For example, to copy a file out:-
  lcg-cp -v \
    srm://srm-minos.gridpp.rl.ac.uk:8443/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root \
    file:./F00035853_0022.mdaq.root
should produce something like:-
  Source SE type: SRMv2
  Source SRM Request Token: 213622
  Source URL: srm://srm-minos.gridpp.rl.ac.uk:8443/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root
  File size: 37583068
  Source URL for copy: gsiftp://gdss336.gridpp.rl.ac.uk:2811//castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root
  Destination URL: file:/data/minos/software/mc_production/daikon_scripts/./F00035853_0022.mdaq.root
  # streams: 1
  # set timeout to  0 (seconds)
       30408704 bytes    781.46 KB/sec avg   7678.59 KB/sec inst
  Transfer took 39330 ms
Without the -v option, there should be no terminal output if everything is O.K. To copy back in, it is just the inverse:-
  lcg-cp -v \
    file:./F00035853_0023.mdaq.root \
    srm://srm-minos.gridpp.rl.ac.uk:8443/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0023.mdaq.root
but of course you need write access to the destination directory. As this is the GRID, access is determined by the Pool Accounts.

For more details see Reference CASTOR.

LFC/LCG_UTILS

Caution: We do not use the LFC (see The MINOS UK Data Model: Frontend choice), so you should only read the remainder of this section if you want some background on how the LHC experiments use a global catalogue to interface to their data. For remote access to CASTOR we just use a small subset of lcg-utils; see Remote Access.

The LCG File Catalog (LFC) represents files in a single unix-like directory structure which is independent of the physical locations of the files it represents. Indeed a file may actually be in multiple SEs (multiple replicas) but is still represented by a single entry in the LFC.

A file is considered to be a Grid file if it is both physically present in an SE and registered in the catalogue. Associated with the LFC is a set of utilities called lcg-utils that provide methods to interact with an SE (read/write, list, directory creation/destruction etc.) and keep the LFC in sync.

Currently we don't use the LFC as much of our data (at FNAL and on NFS disk) is invisible to it. This used to mean that lcg-utils, which was dependent on it, was not much use to us, but they are being decoupled and then we may use lcg-utils to access SEs without an LFC.

We do have a few entries so that you can get a feel for the system.

  1. To start you need a grid proxy
      voms-proxy-init  -voms minos.vo.gridpp.ac.uk
    

  2. Next you need the environment variable LFC_HOST set to our LFC server. This is normally done by asking the IS (Information System):-
      setenv LFC_HOST `lcg-infosites --vo minos.vo.gridpp.ac.uk lfc`
    

  3. The catalogue is rooted in /grid under which the directory minos.vo.gridpp.ac.uk/ can be found. Directories are listed using the lfc-ls command e.g.:-
      lfc-ls -l /grid/minos.vo.gridpp.ac.uk/nwest/test/LVJ_F00034638_0000.mdaq.root
    
    which should show:-
      -rw-rw-r--   1 104      103                  403597 May 31  2006 /grid/minos.vo.gridpp.ac.uk/nwest/test/LVJ_F00034638_0000.mdaq.root
    

  4. To see which physical files that corresponds to, use the lcg-lr (list replicas) command, e.g.:-
      lcg-lr --vo minos.vo.gridpp.ac.uk lfn:/grid/minos.vo.gridpp.ac.uk/nwest/test/LVJ_F00034638_0000.mdaq.root
    
    which gives
      srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
    
    which tells you that it's actually a dCache file.

  5. You can copy the file out of an SE to local disk (file:) using the lcg-cp command, e.g.:-
      lcg-cp -v --vo minos.vo.gridpp.ac.uk                                      \
        lfn:/grid/minos.vo.gridpp.ac.uk/nwest/test/LVJ_F00034638_0000.mdaq.root \
        file:/home/tier1/nwest/LVJ_F00034638_0000.mdaq.root
    
    Do not use relative directories (e.g. file:../) for the local file.
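
To avoid relative directories in the local file: reference, the path can be made absolute before it is handed to lcg-cp. This is a minimal sketch. The function name local_file_url is hypothetical, and it assumes the directory part of the path already exists.

```shell
# Sketch: build an absolute file: reference from a possibly relative
# local path. Assumes the directory part exists; name is illustrative.
local_file_url() {
    # $1 = local path, relative or absolute
    # Resolve the directory in a subshell so the caller's cwd is untouched.
    dir=$(cd "$(dirname "$1")" >/dev/null && pwd) || return 1
    [ "$dir" = "/" ] && dir=""
    printf 'file:%s/%s\n' "$dir" "$(basename "$1")"
}

# Intended use:
#   lcg-cp -v --vo minos.vo.gridpp.ac.uk \
#     lfn:/grid/minos.vo.gridpp.ac.uk/nwest/test/LVJ_F00034638_0000.mdaq.root \
#     "$(local_file_url ../LVJ_F00034638_0000.mdaq.root)"
```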

DCM - Data Cache Manager

DCM is a purpose written tool that provides a uniform interface to all of our storage areas.

Start by reading the DCM introductory sections.

To run DCM you need to set up GridTools as described in Using MINOS Software at Oxford and RAL: Preparation. Then, as a demonstration of how DCM can locate simple file names, try:-
  dcm get --test N00006956_0000.spill.snts.R1_18.0.root
It should locate it at FNAL and report:-
    Locating N00006956_0000.spill.snts.R1_18.0.root ...
    ... dcm://fnal-dcache-enstore/reco_near/R1_18/snts_data/2005-03/N00006956_0000.spill.snts.R1_18.0.root#74104
unless, of course, someone has made a replica since this document was written. The file won't get copied because of the --test option.

Next try a SAM query:-

  dcm get --test [ "run_type physics% and data_tier sntp-near           \
                   and physical_datastream_name spill%                  \
                   and start_time < to_date('2006-02-18','yyyy-mm-dd')  \
                   and end_time   > to_date('2006-02-17','yyyy-mm-dd')  \
                   and version cedar" ]
This time you should see it resolve the query to 22 files:-
    N00009816_0014.spill.sntp.cedar.0.root
    N00009816_0003.spill.sntp.cedar.0.root
    ...
    N00009825_0000.spill.sntp.cedar.0.root
    N00009813_0023.spill.sntp.cedar.0.root
which it subsequently locates:-
   Locating N00009816_0014.spill.sntp.cedar.0.root ...
    ... dcm://fnal-dcache-enstore/reco_near/cedar/sntp_data/2006-02/N00009816_0014.spill.sntp.cedar.0.root#301736
    Locating N00009816_0003.spill.sntp.cedar.0.root ...
    ... dcm://fnal-dcache-enstore/reco_near/cedar/sntp_data/2006-02/N00009816_0003.spill.sntp.cedar.0.root#72427638
...
    Locating N00009825_0000.spill.sntp.cedar.0.root ...
    ... dcm://fnal-dcache-enstore/reco_near/cedar/sntp_data/2006-02/N00009825_0000.spill.sntp.cedar.0.root#22680840
    Locating N00009813_0023.spill.sntp.cedar.0.root ...
    ... dcm://fnal-dcache-enstore/reco_near/cedar/sntp_data/2006-02/N00009813_0023.spill.sntp.cedar.0.root#22463430
In this case the SAM query was used to get a list of file names and then the catalogues were used to locate the files. This is particularly handy with datasets; the SAM query defines them but the actual files could be local.
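
The located names above all follow the same shape, which can be split into its parts with plain shell parameter expansion. This is a sketch based only on the examples shown. It assumes the first component after dcm:// names the SE, the rest is the path, and an optional trailing #<number> is the byte size. The function name parse_dcm_url is hypothetical.

```shell
# Sketch: split a DCM URL of the form dcm://<se-name>/<path>[#<byte-size>].
# The layout is read off the examples in the text, not from a DCM
# specification. Variables are prefixed dcm_ because POSIX sh has no
# local variables.
parse_dcm_url() {
    dcm_rest=${1#dcm://}
    [ "$dcm_rest" != "$1" ] || return 1          # not a dcm:// URL
    dcm_size=""
    case "$dcm_rest" in
        *"#"*) dcm_size=${dcm_rest##*"#"}        # text after the last '#'
               dcm_rest=${dcm_rest%"#"*} ;;      # drop the '#<size>' suffix
    esac
    dcm_se=${dcm_rest%%/*}                       # up to the first '/'
    dcm_path=${dcm_rest#*/}                      # everything after it
    printf 'se=%s\npath=%s\nsize=%s\n' "$dcm_se" "$dcm_path" "$dcm_size"
}

# Intended use:
#   parse_dcm_url "dcm://fnal-dcache-enstore/reco_near/R1_18/snts_data/2005-03/N00006956_0000.spill.snts.R1_18.0.root#74104"
```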

The next example will bypass the catalogues by using a DCM URL:-

  dcm get --test "dcm://ral_t1-castor-prod_d0t1/user/nwest/grid_tests/.*\.root"
Because it is a wildcard (no trailing #<byte-size>), DCM has to run a query on the SE, and for this you need a GRID proxy. Without one, DCM will report
Cannot perform action ...; no LCG GRID proxy
Note that you need to enclose the URL in double quotes because of the wildcard
  .*\.root
which means any number of characters followed by '.' (escaped with the \) and then 'root'.
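
Note that this is a regular expression, not a shell glob such as *.root. To get a feel for which names it selects, it can be tried out with grep. This is a sketch only: it assumes the pattern behaves like an anchored extended regular expression, which is an assumption about DCM rather than something stated here. The function name matches_root_pattern is hypothetical.

```shell
# Sketch: test a file name against the pattern .*\.root using grep.
# Assumption: DCM treats the pattern as a regular expression applied
# to the whole name; the anchors ^ and $ are added here to model that.
matches_root_pattern() {
    # $1 = file name to test; returns 0 if it matches
    printf '%s\n' "$1" | grep -Eq '^.*\.root$'
}

# Example: the escaped dot only matches a literal '.'.
for name in F00035853_0022.mdaq.root F00035853_0022_mdaq_root; do
    if matches_root_pattern "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```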

At least for now, you can only use wildcard DCM URLs on the LCG SEs; you cannot use them on the FNAL Enstore.

So far you haven't copied any files. You can do that by omitting --test, but then it's a good idea to add the following:-

  --ignore_cache                    Prevents it looking on local disk or updating the local catalogue
  --local_dir </home/my/test/dir>   Places files on /home/my/test/dir
                                      - by default it places on the disk with most space that is known to DCM.
For example
   dcm get --ignore_cache --local_dir /data/minos/west/temp \
    "dcm://ral_t1-castor-prod_d0t1/user/nwest/grid_tests/.*\.root"
You can also experiment with
   --num_get_jobs n
to have DCM launch up to n copy jobs at once.

For more detail see Data Cache Manager.

