Getting Data offsite using SAM

Introduction

A package of web services is available that allows remote users to run SAM commands to query the database and retrieve filenames and pnfs storage locations. You can also define datasets and retrieve the metadata for files. To simplify retrieving files we provide a Python script that takes a query string and then uses FTP to copy the matching files to local disk. This is the preferred method of retrieving data offsite. Please do not run repeated 'ls' commands on the dCache FTP door, as listing is very resource intensive.

Installing the Web Services package

  1. Download the sam_web_services_client package as a gzipped tar file (you can ignore the .table file). Version 0_9_2 is the current version, but this may be updated from time to time.
  2. Choose an appropriate location to unpack it; your $INSTALLATION directory is a good choice. Create a directory there to put the product in. A name such as sam_web_services_client-v0_9_2 is a good choice, but you can call it what you like.
  3. cd into this directory and unpack the tar file. You will see the following files and directories:
    CVS           GNUmakefile    bin  sam_setup.csh  src  ups
    DEPENDENCIES  RELEASE_NOTES  lib  sam_setup.sh   tmp
    
  4. Add the bin directory to your PATH.
  5. You will also need a recent version of Python. We are using 2.4 at FNAL.
  6. You will probably need to add some modules to Python for this to work (the client output shown below includes SOAPpy types, so SOAPpy is likely one of them). The instructions vary depending on your platform.
  7. Copy the minosGetFiles.py script. This script uses the web services to generate a list of files from SAM and then fetches them through the dCache FTP interface. It needs the password for the mindata FTP account: edit the script and replace xxxxx in the following line with that password.
        passwd = 'xxxxx'
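If the commands in the next section are not found, the usual cause is step 4. The following is a minimal sketch for checking that, written for Python 3 and independent of the client itself. The function names are hypothetical, and install_dir stands for the directory you created in step 2.

```python
import os
import shutil

# Commands described on this page; they are installed in the package's bin directory.
SAM_COMMANDS = ["samTranslateDimensions", "samLocate", "samGetMetadata", "samDefineDataset"]


def bin_dir_on_path(install_dir, path=None):
    """Return True if <install_dir>/bin is one of the entries in PATH (default: $PATH)."""
    if path is None:
        path = os.environ.get("PATH", "")
    wanted = os.path.abspath(os.path.join(install_dir, "bin"))
    return any(os.path.abspath(entry) == wanted
               for entry in path.split(os.pathsep) if entry)


def missing_commands(commands=SAM_COMMANDS, path=None):
    """Return the commands that cannot be found by searching PATH (default: $PATH)."""
    return [cmd for cmd in commands if shutil.which(cmd, path=path) is None]


if __name__ == "__main__":
    for cmd in missing_commands():
        print("not on PATH:", cmd)
```

An empty result from missing_commands() means all four commands resolve through PATH.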
    

Using Web Services

The following web services commands are supported. Note that different commands use different WSDL files, passed with the --wsdl option.

  • samTranslateDimensions. This does the same as sam translate constraints. It takes two arguments,
    --dim="query"
    
    and
    --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DimensionsService.wsdl.xml"
    
    which is the URL of the WSDL file describing the web service to contact. The query uses the same syntax, and the same dimensions, as the sam translate constraints command. For example:
    samTranslateDimensions \
      --dim="run_type physics% and data_tier sntp-near and physical_datastream_name cosmic and start_time < to_date('2005-10-02','yyyy-mm-dd') and end_time > to_date('2005-10-01','yyyy-mm-dd')" \
      --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DimensionsService.wsdl.xml"
    
    This returns a list of files. The output is currently just a dump of the Python list, which is not very readable; this may improve.
  • samLocate. This returns all the locations of a file that SAM knows about. These include the pnfs location and also "pseudo-dcache" locations, which are created when a project that used the file has been run. You can ignore the pseudo locations.
    samLocate --file=N00008695_0023.cosmic.sntp.R1_18.0.root \
      --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DataFileService.wsdl.xml"
    
    This returns
    Replica locations for N00008695_0023.cosmic.sntp.R1_18.0.root:
    <SOAPpy.Types.typedArrayType locationList at -1214966644>:
    ["'/pnfs/minos/reco_near/R1_18/sntp_data/2005-10,69@vo8804'"]
    
  • samDefineDataset. This allows you to create a dataset definition and store it in the SAM database. You can also look at the datasets other people have created and use an existing definition; there are test datasets created from production. To find out which dataset definitions already exist, use the web interface: clicking the Submit request button without entering a query returns all datasets in the database.
    samDefineDataset --defName=test-dataset-for-web-services-v1 \
      --defdesc="Near detector test dataset" \
      --group=minos \
      --dim="run_type physics% and data_tier sntp-near and physical_datastream_name cosmic and start_time < to_date('2005-10-02','yyyy-mm-dd') and end_time > to_date('2005-10-01','yyyy-mm-dd')" \
      --user=buckley \
      --desc="Test dataset for web services" \
      --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DatasetService.wsdl.xml"
    
    The dataset definition is assigned a unique ID number in the database. You can take an existing dataset definition and add constraints to create your own dataset. For example, suppose you are interested in Far Detector physics raw data for the first half of June 2005. You can start from the dataset zeval-far-raw-physics and add a constraint on the date range:
    samDefineDataset --defName=my-dataset-v1 --desc="My dataset" \
      --group=minos \
      --dim="dataset_def_name zeval-far-raw-physics and start_time <= to_date('2005-06-15','yyyy-mm-dd') and end_time >= to_date('2005-06-01','yyyy-mm-dd')" \
      --user=buckley \
      --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DatasetService.wsdl.xml"
    
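    This creates a dataset of 332 files. Note how the date range is expressed: requiring a file's start_time to be no later than the end of the period, and its end_time to be no earlier than the start of the period, picks up every file whose time span overlaps the period of interest, including files that began before it or finished after it.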
  • samGetMetadata. This retrieves the complete SAM metadata entry for a file.
    samGetMetadata --file=N00008695_0023.cosmic.sntp.R1_18.0.root \
      --wsdl="http://www-numi.fnal.gov/sam_web_services/wsdl/DataFileService.wsdl.xml"
    
    This produces the following entry:
    Metadata for N00008695_0023.cosmic.sntp.R1_18.0.root:
    ImportedDetectorFile({
                       'fileName' : 'N00008695_0023.cosmic.sntp.R1_18.0.root',
                         'fileId' : 1311069L,
                       'fileType' : 'importedDetector',
                     'fileFormat' : 'root',
                       'fileSize' : SamSize('73.10MB'),
                            'crc' : CRC('2199668325L', 'adler 32 crc type'),
              'fileContentStatus' : 'good',
                     'eventCount' : 49345L,
                       'dataTier' : 'sntp-near',
                     'firstEvent' : 1074345L,
                      'lastEvent' : 1120113L,
                      'startTime' : SamTime('01-Oct-2005 10:44:36 (UTC)','%d-%b-%Y %H:%M:%S (UTC)'),
                        'endTime' : SamTime('01-Oct-2005 11:44:12 (UTC)','%d-%b-%Y %H:%M:%S (UTC)'),
              'applicationFamily' : ApplicationFamily(appFamily='reco', appName='loon', appVersion='r1.18'),
                          'group' : 'minos',
                        'parents' : NameOrIdList(['N00008695_0023.mdaq.root']),
                     'datastream' : 'cosmic',
              'runDescriptorList' : RunDescriptorList([RunDescriptor(runType='physics;m', runNumber=8695)]),
        })
    
  • There are other commands, for starting projects and retrieving files, but we are not using them.
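The date-range technique used in the samDefineDataset example works for any period. The following minimal Python 3 sketch builds such a query string and the matching samTranslateDimensions argument list. The helper names are hypothetical; the WSDL URL and query syntax are the ones given above.

```python
from datetime import date

# WSDL URL as given on this page for samTranslateDimensions.
DIMENSIONS_WSDL = "http://www-numi.fnal.gov/sam_web_services/wsdl/DimensionsService.wsdl.xml"


def sam_date(d):
    """Format a date as in the example queries: to_date('YYYY-MM-DD','yyyy-mm-dd')."""
    return "to_date('%s','yyyy-mm-dd')" % d.isoformat()


def add_period(base_query, first_day, last_day):
    """Append a constraint keeping every file whose time span overlaps the period.

    A file overlaps the period if it starts no later than the period ends
    and ends no earlier than the period starts.
    """
    return "%s and start_time <= %s and end_time >= %s" % (
        base_query, sam_date(last_day), sam_date(first_day))


def overlaps(file_start, file_end, first_day, last_day):
    """The same overlap test, evaluated locally in Python."""
    return file_start <= last_day and file_end >= first_day


def translate_args(dim):
    """Argument list for samTranslateDimensions, e.g. for subprocess.run()."""
    return ["samTranslateDimensions", "--dim=" + dim, "--wsdl=" + DIMENSIONS_WSDL]


query = add_period("dataset_def_name zeval-far-raw-physics",
                   date(2005, 6, 1), date(2005, 6, 15))
print(query)
```

This prints the same --dim string used in the my-dataset-v1 example. Passing translate_args() as a list to subprocess avoids having to quote the query for the shell.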

Retrieving Files

You can use the following methods to retrieve files:

  • Using the minosGetFiles.py script. This script takes one argument,
    --dim="query"
    
    The query uses the same syntax as the samTranslateDimensions command.
    minosGetFiles.py --dim="run_type physics% and data_tier sntp-near and physical_datastream_name cosmic and start_time < to_date('2005-10-02','yyyy-mm-dd') and end_time > to_date('2005-10-01','yyyy-mm-dd')"
    
    You can also give the name of an existing dataset as the query. The script runs samTranslateDimensions to get the file list. For each file it runs samLocate to get the pnfs path, then contacts the dCache FTP door and retrieves the file. Run it from the directory where you want the files to go. Suggestions for enhancements are welcome; contact buckley@fnal.gov.
  • You can also use the commands to write your own script. Since the web services are all written in Python, Python is the simplest language to use.
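As a starting point for your own script, here is a minimal Python 3 sketch of the per-file steps minosGetFiles.py is described as performing: locate each file, then fetch it by its exact path over FTP without listing the door. It is not the actual minosGetFiles.py code, and the function names are hypothetical. It assumes that samLocate prints its location list as a Python list literal, as in the samLocate example; that each location has the form '<directory>,<tape position>'; that pseudo-dcache locations do not start with /pnfs/; and that the FTP door accepts the /pnfs path directly. The door's host name is not given on this page and must be supplied.

```python
import ast
import ftplib
import os
import subprocess

# WSDL URL as given on this page for samLocate / samGetMetadata.
DATAFILE_WSDL = "http://www-numi.fnal.gov/sam_web_services/wsdl/DataFileService.wsdl.xml"


def pnfs_dirs(locate_output):
    """Extract /pnfs directories from samLocate output.

    Assumptions: the location list is printed as a Python list literal, each entry
    looks like '<directory>,<tape position>', and pseudo-dcache locations do not
    start with /pnfs/ (so they are skipped).
    """
    dirs = []
    for line in locate_output.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        try:
            entries = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        for entry in entries:
            # Drop the extra quoting, then everything after the first comma.
            directory = str(entry).strip("'\"").split(",", 1)[0]
            if directory.startswith("/pnfs/"):
                dirs.append(directory)
    return dirs


def locate(filename):
    """Run samLocate for one file and return its pnfs directories."""
    result = subprocess.run(
        ["samLocate", "--file=" + filename, "--wsdl=" + DATAFILE_WSDL],
        capture_output=True, text=True, check=True)
    return pnfs_dirs(result.stdout)


def connect(host, passwd, user="mindata"):
    """Log in to the dCache FTP door (host name must be supplied by the caller)."""
    ftp = ftplib.FTP(host)
    ftp.login(user, passwd)
    return ftp


def fetch(ftp, directory, filename, dest_dir="."):
    """Retrieve one file by its exact path; no 'ls' (LIST) is sent to the door."""
    target = os.path.join(dest_dir, filename)
    with open(target, "wb") as fh:
        ftp.retrbinary("RETR %s/%s" % (directory, filename), fh.write)
    return target
```

A loop over the file names from samTranslateDimensions would call locate(name), take the first directory, and pass it to fetch() on a single connection opened once with connect(), which keeps the load on the door low.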

Fermi National Accelerator Laboratory