Reference: Data Access: dCache

Last modified: Fri Dec 7 15:47:57 GMT 2007
Nick West

Introduction

For an introduction to GRID data access please see Tutorial: Accessing Storage Elements.

dCache is one of the standard data storage and access systems on the GRID. A file is identified by a URL of the form:-

     dcap://?host:?port/?path/?file
e.g. dcap://dcache-head.gridpp.rl.ac.uk:22125/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
A dCache server can just be a pool of disks but often has tape storage attached. In the latter case files migrate to tape transparently; they appear to be on disk to commands that list them but are retrieved automatically when access to their contents is required.
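To make the URL layout above concrete, here is a minimal Python sketch that splits a dcap URL into host, port and path using only the standard library. The function name split_dcap_url is hypothetical and is not part of any dCache tool; it only illustrates the structure of the URL.

```python
from urllib.parse import urlsplit


def split_dcap_url(url):
    """Split a dcap URL into (host, port, path). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme != "dcap":
        raise ValueError("not a dcap URL: " + url)
    return parts.hostname, parts.port, parts.path


# The example URL quoted above.
example = ("dcap://dcache-head.gridpp.rl.ac.uk:22125"
           "/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/"
           "LVJ_F00034638_0000.mdaq.root")

host, port, path = split_dcap_url(example)
print(host)
print(port)
print(path)
```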

The native protocol is dcap, but dCache can also be accessed using several other protocols.

Local Access

In this section we shall use an example from the RAL Tier 1 SE:-
/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
and consider access from within the RAL firewall, i.e. from RAL Tier 1 and Tier 2.

There are two ways to access a data file locally using dcap:

  1. Using the POSIX I/O API
    e.g.
    /pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
    
    As of September 2006, with the then-current ROOT (5.13/03) and
      setenv LD_PRELOAD /opt/d-cache/dcap/lib/libpdcap.so
    
    this method works from the RAL Tier 1 UI, but the job aborts after starting on RAL Tier 1 WNs.

    The file appears as if on an NFS-mounted disk, so this method cannot be used from RAL Tier 2.

  2. Using the dcap library via TDCacheFile
    Derek Ross explains: port 22125 on dcache-head.gridpp.rl.ac.uk should be accessible to anything that doesn't pass through the RAL external firewall, so it should be accessible from the Tier 2.
    e.g.
    dcap://dcache-head.gridpp.rl.ac.uk:22125/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
    
    ROOT version 5 provides TDCacheFile, a plug-in for the standard TFile, so this method should work with ROOT. Chris Brew suspects that TDCacheFile is quicker than LD_PRELOAD because it calls dcap directly rather than intercepting standard POSIX calls. TDCacheFile need not be called explicitly; instead the generic TFile::Open can be called, passing it the file name as a supported URL:-
        dcache:/pnfs/<path>/<file>.root 
    or  dcap://<nodename.org>/<path>/<file>.root
    
    and have it automatically load the required plug-in. For example:-
      TFile* f = TFile::Open("dcache:/pnfs/gridpp.rl.ac.uk/data/minos/nwest/test/LVJ_F00034638_0000.mdaq.root");
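The two URL forms accepted by TFile::Open can be built from a /pnfs path as in the following Python sketch. The helper names are hypothetical. The open_with_root function assumes a PyROOT installation built with the dCache plug-in, which has not been checked here, so it is defined but not called.

```python
def dcache_url(pnfs_path):
    """Build the 'dcache:' form, e.g. dcache:/pnfs/<path>/<file>.root."""
    if not pnfs_path.startswith("/pnfs/"):
        raise ValueError("expected a /pnfs/ path: " + pnfs_path)
    return "dcache:" + pnfs_path


def dcap_url(pnfs_path, host, port):
    """Build the 'dcap://' form with an explicit host and port."""
    if not pnfs_path.startswith("/pnfs/"):
        raise ValueError("expected a /pnfs/ path: " + pnfs_path)
    return "dcap://{}:{}{}".format(host, port, pnfs_path)


def open_with_root(url):
    """Open the URL through the generic TFile::Open (assumes PyROOT is available)."""
    import ROOT  # imported lazily so the URL helpers work without ROOT
    return ROOT.TFile.Open(url)


path = "/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root"
print(dcache_url(path))
print(dcap_url(path, "dcache-head.gridpp.rl.ac.uk", 22125))
```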
    

Remote Access

Steve Traylen explains: dcap access will work from Tier 2 to Tier 1, but it is unlikely to work from any other site. gsi(dcap) is a non-passive (active) protocol, so it would require the clients to accept incoming traffic from RAL, which is unlikely ever to be acceptable to sites.

From outside the firewall only GSIFTP is available. I haven't tried using that yet.

Improving Performance

If using the POSIX interface, e.g. the cp and mv commands, setting the environment variable
  DCACHE_USE_UNSAFE=true
dramatically improves writing speed, but is only recommended when copying files and not for unreproducible sources (e.g. direct output of an application). See "Q: Is it possible to tune library parameters of the preload library?" of Questions and Answers about dCache, and void dc_unsafeWrite(int dest) of C - API to the dCache Access Protocol (dcap). It has no effect if using dccp.
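As a sketch of how this setting might be applied to a single copy rather than to a whole session, the Python below builds an environment with the preload library and DCACHE_USE_UNSAFE set and passes it to cp. The helper names are hypothetical, and the library path is the RAL one quoted under Local Access; other sites may install it elsewhere. The copy helper is defined but not called.

```python
import os
import subprocess

# Preload library path as quoted for RAL; an assumption for any other site.
PRELOAD_LIB = "/opt/d-cache/dcap/lib/libpdcap.so"


def unsafe_copy_env(base_env, preload_lib=PRELOAD_LIB):
    """Return a copy of base_env with the dcap preload and unsafe writes enabled."""
    env = dict(base_env)
    env["LD_PRELOAD"] = preload_lib
    env["DCACHE_USE_UNSAFE"] = "true"
    return env


def copy_to_dcache(src, dest):
    """Copy a reproducible file with cp; only this one command sees the setting."""
    subprocess.run(["cp", src, dest], env=unsafe_copy_env(os.environ), check=True)


env = unsafe_copy_env({"PATH": "/usr/bin"})
print(env["DCACHE_USE_UNSAFE"])
```

Limiting the variable to one subprocess keeps it away from applications that write their output directly, which is the case the text warns against.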

Site Specific

In this section we look at specifics of the MINOS dCache sites. See also Sites

The RAL Tier 1 dCache

On the RAL Tier 1 dCache server there are two distinct data "pools": a disk pool and a tape pool. The tape pool is attached to the ATLAS Datastore, and its files are flushed to tape on a least-recently-accessed basis. Each file is written individually, but various mechanisms inside dCache allow for waiting until a threshold has been reached before starting to store the files. Currently the virtual tape size is 48GB. For individual data files, 1GB is a reasonable size, but really anything over a few hundred MB is okay.

Service endpoints:-

  https://dcache.gridpp.rl.ac.uk:8443/srm/managerv1.wsdl?/pnfs/gridpp.rl.ac.uk/data/<vo>
  https://dcache-tape.gridpp.rl.ac.uk:8443/srm/managerv1.wsdl?/pnfs/gridpp.rl.ac.uk/tape/<vo>

Paths:

      /pnfs/gridpp.rl.ac.uk/data/<vo>/   (disk pool owned by a single experiment)
      /pnfs/gridpp.rl.ac.uk/tape/<vo>/   (disk pool shared by all experiments) 

For MINOS:-

      /pnfs/gridpp.rl.ac.uk/tape/minos   10TB   multiple servers (but shared by all experiments)
                                         Files are staged to /exportstage1/minos-dcache1 on csfnfs58
      /pnfs/gridpp.rl.ac.uk/data/minos   0.1TB  one server

The SURL (Storage URL) of a file can be derived from the path by prepending the service endpoint machine and port, i.e.
     srm://dcache.gridpp.rl.ac.uk:8443
and  srm://dcache-tape.gridpp.rl.ac.uk:8443

so for example

        /pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
becomes srm://dcache-tape.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
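The derivation above can be sketched in Python as follows. The function name path_to_surl and the prefix table are illustrative only; the table simply pairs the two RAL Tier 1 paths with the two endpoints listed above.

```python
# Path prefix -> SRM endpoint, taken from the RAL Tier 1 values listed above.
ENDPOINTS = {
    "/pnfs/gridpp.rl.ac.uk/data/": "srm://dcache.gridpp.rl.ac.uk:8443",
    "/pnfs/gridpp.rl.ac.uk/tape/": "srm://dcache-tape.gridpp.rl.ac.uk:8443",
}


def path_to_surl(path):
    """Prepend the matching service endpoint to a /pnfs path. Hypothetical helper."""
    for prefix, endpoint in ENDPOINTS.items():
        if path.startswith(prefix):
            return endpoint + path
    raise ValueError("no known endpoint for path: " + path)


surl = path_to_surl(
    "/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root")
print(surl)
```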
The SURL is useful with the srm-get-metadata command to check the state of a file, and in particular, for dcache-tape, whether it is on disk:-
  srm-get-metadata srm://dcache-tape.gridpp.rl.ac.uk:8443/pnfs/gridpp.rl.ac.uk/tape/minos/nwest/test/LVJ_F00034638_0000.mdaq.root

returns:-

   ...
  isPinned :false
  isPermanent :true
  isCached :false

There doesn't appear to be any way to force a file out of the cache.

The RAL Tier 2 dCache

I have not tried this yet, but gsidcap should work locally, i.e. from Tier 2, and from RAL Tier 1.
