Grid: Glossary
Last modified: Thu Jan 29 06:36:36 GMT 2009
Nick West
Under construction!
There are any number of glossaries about, so here I will only list
terms with direct relevance to us.
A CA (Certification Authority) is a centrally trusted authority that
issues Digital Certificates. For MINOS our CA in the UK is at
Rutherford Appleton Laboratory.
There is an interesting circular argument of trust here. How do we
know that sites claiming to offer these Digital Certificates are
themselves trustworthy? For LCG/EGEE there is a list maintained by
IGTF (International Grid Trust Federation) that can be checked at
http://www.gridpma.org/
For example
- Click on European Grid PMA
- Click on Getting your own certificate: find your national or regional authority
- Click on UK
and you should end up at Rutherford Appleton Laboratory
Castor (CERN Advanced STORage manager) is a Mass Storage System
Storage Element consisting of a set of front-end disks and a back-end
tape storage system. It can be accessed via RFIO (Remote File I/O)
and provides an SRM (Storage Resource Manager) interface.
See Reference: CASTOR
A CE (Computing Element) is a batch queue to a centrally managed farm
of computers (WN - Worker Nodes) that can run GRID jobs. VO Software
Managers can install code centrally, where it is then available to all
the nodes, so that grid jobs that can be satisfied by the CE will run
in a suitable environment.
The queue is identified by a string of the form
<hostname>:<port>/<batch_queue_name> e.g. adc0015.cern.ch:2119/jobmanager-lcgpbs-long
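The following is a minimal Python sketch that splits a CE identifier into its host, port and batch queue parts. The regular expression and the function name parse_ce_id are assumptions. They are based only on the hostname:port/batch_queue_name form given here, not on any grid middleware code.

```python
import re

# Assumed pattern for <hostname>:<port>/<batch_queue_name>.
CE_ID_PATTERN = re.compile(r"^(?P<host>[^:/\s]+):(?P<port>\d+)/(?P<queue>\S+)$")


def parse_ce_id(ce_id: str) -> tuple[str, int, str]:
    """Split a CE identifier into (hostname, port, batch queue name)."""
    match = CE_ID_PATTERN.match(ce_id)
    if match is None:
        raise ValueError(f"not a CE identifier: {ce_id!r}")
    return match["host"], int(match["port"]), match["queue"]


host, port, queue = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-long")
print(host, port, queue)  # adc0015.cern.ch 2119 jobmanager-lcgpbs-long
```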
Besides the Worker Nodes a CE also requires:-
- A Gate Keeper, which acts as the front-end two-way connection to the
Grid. WNs can make connections outside the farm but cannot be
accessed from outside.
- A Local Resource Management System (LRMS), which for us is one of:-
  - Portable Batch System (PBS)
  - The Load Sharing Facility (LSF)
  - Torque
  - Condor
The Core Infrastructure Centres (CIC) have two fundamental roles:-
- to operate essential grid infrastructure services that are not
required at each Resource Centre
- to act as a Grid Operations Centre providing monitoring and
operational troubleshooting services.
In addition they play a crucial role as second-level support to the
ROCs (Regional Operations Centres, e.g. ROC-UKI) for operational
problems.
See What is CIC?
dCache is a Mass Storage System Storage Element (SE) consisting of a
pool of disks and a disk pool manager. The server is the single point
of access to the SE and presents the files on the pool disks under a
single virtual filesystem tree. Nodes can be dynamically added to the
pool. File transfer is managed through GSIFTP, while the native
GSIDCAP protocol allows POSIX-like data access. It provides an SRM
(Storage Resource Manager) interface.
A Digital Certificate is a way to authenticate that the public key of
a Public/Private key pair belongs to an identified individual. Both
the authenticator and that individual have to trust a third-party
Certification Authority. The individual establishes his identity with
the Certification Authority, which then supplies him with a Digital
Certificate that contains both his public key and information about
him. The certificate carries the Certification Authority's Digital
Signature. If the individual subsequently presents this certificate,
its authenticity can be checked using the Digital Signature, and hence
his identity and public key can be established.
See Install and Check Your Grid User Certificate on a UI
for information about examining the contents of a certificate.
Also see slides 43 to 46 of
http://egee-docs.web.cern.ch/egee-docs/support/documentation/pdf/9300_GGUS_Presentation.pdf
and Basic Certificate and Proxy Concepts.
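The trust chain described above can be illustrated with a toy Python sketch. A CA signs the pairing of a DN and a public key, and anyone holding the CA's public key can check that signature. The textbook RSA values (n=3233, e=17, d=2753) are far too small to be secure. The user's key (3127, 3) and the "DN|key" body layout are hypothetical, and real certificates are X.509 structures, not dictionaries.

```python
import hashlib

# Toy CA key pair: textbook RSA with p=61, q=53 (n=3233, e=17, d=2753).
# Hopelessly insecure; it only illustrates the mechanism.
CA_N, CA_E, CA_D = 3233, 17, 2753


def digest(text: str) -> int:
    """One-way hash of the certificate contents, reduced below the modulus."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % CA_N


def issue_certificate(subject_dn: str, public_key: tuple[int, int]) -> dict:
    """CA side: once identity is established, bind DN and public key and sign."""
    body = f"{subject_dn}|{public_key}"
    return {"subject": subject_dn, "public_key": public_key,
            "signature": pow(digest(body), CA_D, CA_N)}


def check_certificate(cert: dict) -> bool:
    """Authenticator side: verify the CA signature with the CA's public key."""
    body = f"{cert['subject']}|{cert['public_key']}"
    return pow(cert["signature"], CA_E, CA_N) == digest(body)


cert = issue_certificate("/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=nick west", (3127, 3))
print(check_certificate(cert))  # True
```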
A Digital Signature is a method of authenticating, using Public/Private
keys, that a message comes unmodified from a trusted source, without
the need to send a challenge to that source. At the trusted source a
checksum of the message is derived using a one-way hash and then
encrypted using the private key. The result is called a digital
signature and is appended to the message. On receipt the end user
computes the message checksum again and compares it with the digital
signature decrypted using the trusted source's public key. If they
match then the message is genuine.
See slides 41 and 42 of
http://egee-docs.web.cern.ch/egee-docs/support/documentation/pdf/9300_GGUS_Presentation.pdf
and Basic Certificate and Proxy Concepts.
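The hash-then-encrypt steps above can be sketched with toy textbook RSA in Python. The key is hopelessly small, and reducing the hash modulo N means unrelated messages often share a checksum here. Real signatures use large keys and a padding scheme, which this sketch deliberately leaves out. The message text is made up.

```python
import hashlib

# Toy RSA key pair for the trusted source (textbook p=61, q=53); insecure.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def checksum(message: bytes) -> int:
    """One-way hash of the message, reduced so it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Trusted source: encrypt the checksum with the private key."""
    return pow(checksum(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: decrypt with the public key and compare with a fresh checksum."""
    return pow(signature, E, N) == checksum(message)


msg = b"run 34638 calibration constants"
sig = sign(msg)
print(verify(msg, sig))  # True
```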
The GRID operates in a global context, so names, for example of
individuals and organisations, have to be globally unique. These
names, called Distinguished Names (DNs), consist of a hierarchical set
of components that start globally and become progressively more local.
For example, my DN is:-
/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=nick west
and is composed of:-
/C=UK          Country
/O=eScience    Organisation
/OU=Oxford     Organisational Unit
/L=OeSC        Locality
/CN=nick west  Common Name
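The following Python sketch splits a slash-separated DN into its components, from the most global to the most local. It assumes no component value itself contains a "/". Some real DNs break that assumption, so treat this as an illustration only.

```python
def parse_dn(dn: str) -> list[tuple[str, str]]:
    """Split a slash-separated DN into (attribute, value) pairs, global first."""
    if not dn.startswith("/"):
        raise ValueError(f"expected a DN starting with '/': {dn!r}")
    pairs = []
    for component in dn[1:].split("/"):
        attribute, sep, value = component.partition("=")
        if not sep:
            raise ValueError(f"malformed DN component: {component!r}")
        pairs.append((attribute, value))
    return pairs


for attribute, value in parse_dn("/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=nick west"):
    print(f"/{attribute}={value}")
```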
DPM (Disk Pool Manager) is an SE (Storage Element) similar to dCache
but easier to install and maintain.
Grid storage interactions today require using several existing
software components:
- The replica catalog services to locate valid replicas of files.
- The SRM software to ensure:
  - files exist on disk (they are recalled from mass storage if
    necessary) or
  - space is allocated on disk for new files (they are possibly
    migrated to mass storage later)
- A file access mechanism to access files from the storage system
  on the worker node.
The GFAL (Grid File Access Library) hides these interactions and
presents a POSIX interface for the I/O operations. The currently
supported protocols are file (local access), dcap (dCache access
protocol) and rfio (CASTOR access protocol). GFAL is very handy when
an application needs to access part of a large Grid file but does not
want to copy the whole file locally.
When talking to an SE advertising RFIO, GFAL needs to know which type
(secure or insecure) to use, and determines this by examining the
environment variable LCG_RFIO_TYPE. If its value is dpm, the secure
version of RFIO will be used; if its value is castor, or the variable
is undefined, insecure RFIO will be chosen.
This summary has been taken from the GFAL man pages and the
LCG-2-UserGuide Appendix F.
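The LCG_RFIO_TYPE rule above can be modelled in a few lines of Python. This is a sketch, not GFAL code. The function name rfio_flavour is hypothetical, and raising an error for other values is an assumption, because the summary does not say what they do. In a shell you would select the secure flavour with "export LCG_RFIO_TYPE=dpm" before running the job.

```python
import os
from collections.abc import Mapping


def rfio_flavour(environ: Mapping[str, str] = os.environ) -> str:
    """Which RFIO flavour GFAL is described as choosing, given the environment."""
    value = environ.get("LCG_RFIO_TYPE")
    if value == "dpm":
        return "secure"
    if value is None or value == "castor":
        return "insecure"
    # The man page summary does not say what other values do.
    raise ValueError(f"unrecognised LCG_RFIO_TYPE: {value!r}")


print(rfio_flavour({"LCG_RFIO_TYPE": "dpm"}))  # secure
print(rfio_flavour({}))                        # insecure
```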
GGUS (Global Grid User Support) is a central service for GRID
training, documentation, operational status and problem tracking. We
use it for Reporting Problems.
The data published in the Information Service conforms to the GLUE
(Grid Laboratory for a Uniform Environment) Schema, which defines a
common data model to be used for resource monitoring and discovery.
The 3 main components of the GLUE Schema are:-
- Attributes and values of Computing Elements
- Attributes and values of Storage Elements
- Binding information for Computing and Storage Elements
We have a list of the most frequently used GLUE attributes.
GridFTP is the standard protocol for GRID-based file transfers. For
historical reasons the protocol specifier is gsiftp://
See Reference: GridFTP
The gsidcap protocol is the GSI-secure version of the dCache access
protocol, dcap. Being GSI-secure, gsidcap can be used for inter-site
remote file access. However, dcap is an active (non-passive) protocol,
so it requires clients to accept incoming traffic from a dcap server.
That is unlikely ever to be acceptable to remote sites on a WAN, but
would be O.K. within a firewall, for example between the RAL Tier 1
and Tier 2.
GSIFTP is a predecessor to GridFTP, which retains its protocol
specifier gsiftp://
A GUID (Grid Unique IDentifier), which identifies a file uniquely, is of the form:
guid: 40_bytes_unique_string
e.g. guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d
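This Python sketch checks and creates strings in the guid: form. It assumes, as the example suggests, that the part after "guid:" is a standard UUID. The example happens to be a version 1 (time-based) UUID, so uuid.uuid1() is used for new ones. Whether the grid tools generate GUIDs this way is not stated here.

```python
import re
import uuid

GUID_PATTERN = re.compile(
    r"^guid:([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def parse_guid(text: str) -> uuid.UUID:
    """Check a 'guid:...' string and return the UUID it carries."""
    match = GUID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a GUID: {text!r}")
    return uuid.UUID(match.group(1))


def new_guid() -> str:
    """Make a fresh GUID string from a time- and host-based (version 1) UUID."""
    return f"guid:{uuid.uuid1()}"


example = parse_guid("guid:38ed3f60-c402-11d7-a6b0-f53ee5a37e1d")
print(example.version)  # 1
```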
The Information Service (IS) provides information about the LCG-2
Grid resources and their status. It is used to locate both Computing
Elements on which to run jobs and Storage Elements holding replicas of
data files and the catalogs. It is also used to monitor the
performance of the grid and to provide accounting information about
resources consumed by the VOs. IS data conforms to the GLUE Schema.
See GRID Status
For more details see LCG-2 User Guide: Information Service
JDL (Job Description Language) is the language used to specify the
resources that a job requires. To submit a job to the GRID a JDL file
is created and passed to an RB (Resource Broker) or a WMS (Workload
Management System), which examines the JDL and determines, with the
help of the IS (Information Service), the best CE (Computing Element)
to run the job.
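As an illustration, this Python sketch writes a small JDL file. The attribute names Executable, StdOutput, StdError, OutputSandbox, Requirements and Rank are standard JDL attributes. The GLUE attribute names in the expressions come from the GLUE schema. The helper names to_jdl and Expr, and all the values, are hypothetical, and only strings, string lists and raw expressions are handled. The resulting text would then be passed to one of the glite-wms-job-* commands.

```python
class Expr(str):
    """Marks a JDL value that must be written unquoted (an expression)."""


def to_jdl(attributes: dict) -> str:
    """Render attributes as 'Name = value;' lines in JDL syntax."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, Expr):
            rendered = str(value)                  # expression: no quotes
        elif isinstance(value, str):
            rendered = f'"{value}"'                # plain string: quoted
        elif isinstance(value, (list, tuple)):
            rendered = "{" + ", ".join(f'"{item}"' for item in value) + "}"
        else:
            rendered = str(value)                  # numbers
        lines.append(f"{name} = {rendered};")
    return "\n".join(lines) + "\n"


jdl = to_jdl({
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
    "Requirements": Expr("other.GlueCEPolicyMaxCPUTime > 720"),
    "Rank": Expr("-other.GlueCEStateEstimatedResponseTime"),
})
print(jdl, end="")
```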
The File Catalog is a service that provides mappings between Logical
File Names, Grid Unique IDentifiers and Storage URLs. In LCG-2, two
types of file catalogs are currently deployed: the old Replica
Location Server (RLS) and the new LCG File Catalog (LFC). MINOS uses
LFC.
Our LFC server is:-
lfc.gridpp.rl.ac.uk
An LFN (Logical File Name), also called a User Alias, can be used to
refer to a file in place of the GUID (and should be the normal way for
a user to refer to a file). It has this format:
lfn: anything_you_want
e.g. lfn:importantResults/Test1240.dat
When the LCG File Catalog is used, LFNs are organised in a
hierarchical directory-like structure and have the following format:
lfn:/grid/<MyVO>/<MyDirs>/<MyFile>
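This Python sketch builds and unpacks LFC-style LFNs of the form above. The VO directory name "minos" used in the example is hypothetical, since the actual directory under /grid/ for MINOS is not given here. The check that rejects "/" inside a single component is a design choice of the sketch.

```python
def make_lfn(vo: str, *path_parts: str) -> str:
    """Build an LFC-style LFN: lfn:/grid/<MyVO>/<MyDirs>/<MyFile>."""
    parts = (vo, *path_parts)
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"bad path component: {part!r}")
    return "lfn:/grid/" + "/".join(parts)


def lfc_path(lfn: str) -> str:
    """Drop the 'lfn:' prefix, leaving the path in the LFC directory tree."""
    if not lfn.startswith("lfn:/grid/"):
        raise ValueError(f"not an LFC-style LFN: {lfn!r}")
    return lfn[len("lfn:"):]


name = make_lfn("minos", "importantResults", "Test1240.dat")  # hypothetical VO dir
print(name)            # lfn:/grid/minos/importantResults/Test1240.dat
print(lfc_path(name))  # /grid/minos/importantResults/Test1240.dat
```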
Eventually, through the wonders of GRID job submission, a user's job
ends up running somewhere on a machine with a conventional operating
system, which for MINOS is always Scientific Linux. Then the question
is: what account name does the job run under? This involves mapping a
user's DN (Distinguished Name) onto a local account name. In the past
this mapping was held in a gridmapfile, but for VOs supported by VOMS
the VOMS provides mapping services.
As with interactive access, not all accounts require the same access
privileges, and VOMS recognises this through the concept of roles. If
a user creates a Proxy requesting a special role, say one as an
administrator, then the mapping is to an account with appropriate
privileges. However, for normal job submission no special role is
required and VOMS uses pool accounts. These are a series of numbered
accounts; for MINOS they are minos001, minos002, minos003 ... The
first user to run a job with a newly created VO will be assigned
minos001, the second minos002 and so on. However, the mapping isn't
permanent: should there be more users than there are pool accounts,
the one that has been inactive the longest will be reassigned; the
accounts really form a loan pool. So there is no guarantee that a user
will always come in with the same account name, although, as far as
job submission goes, this should not matter as all locally used
resources are returned at the end of the job.
What about files written by these accounts? Here we have to
differentiate between
- Disk e.g. NFS/AFS files
- SE files.
For SE files see SE File Ownership.
Disk files are ordinary UNIX files with normal permissions, so they
are owned by the pool account. To know who wrote a file we therefore
need to know the mapping to each pool account. As explained above this
isn't static, but to check your own, with a valid grid proxy, simply
do:-
globus-job-run lcgce02.gridpp.rl.ac.uk /usr/bin/whoami
For other mappings it is necessary to examine the directory:-
gridmapdir
that is usually found in
/etc/grid-security
As of October 2008 this directory is protected against general read
access, so the remaining notes in this section cannot be used unless
that is changed.
In that directory there are pairs of files: one gives the pool account
name and the other a mangled version of the Distinguished Name. What
pairs them is that they have the same inode (they are hard links to
the same file). For example, to look at the mapping on the RAL T1 CE:-
globus-job-run lcgce02.gridpp.rl.ac.uk /bin/ls -li /etc/grid-security/gridmapdir
Doing that when these notes were written gave output that includes:-
1116569 -rw-r--r-- 2 root root 0 May 6 11:28 minos019
1116569 -rw-r--r-- 2 root root 0 May 6 11:28 %2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dnick%20west%3aminos
so I (Nick West) am currently mapped to minos019.
Note that the full output of the command is very long, so your best
bet is to first filter it through grep looking for the account you are
interested in, and then again to look for the matching inode, e.g.:-
globus-job-run lcgce02.gridpp.rl.ac.uk /bin/ls -li /etc/grid-security/gridmapdir | grep minos019
globus-job-run lcgce02.gridpp.rl.ac.uk /bin/ls -li /etc/grid-security/gridmapdir | grep 1116569
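The two grep steps can also be done offline on saved "ls -li" output. This Python sketch does that. The mangling rule is inferred from the single example above: lower-case the text and write each non-alphanumeric character as %xx. The example only confirms this for "/", "=", space and ":". The escaping of other punctuation, the assumption of an ASCII-only DN, and treating the ":minos" suffix as a group name are all assumptions. The function names mangle and account_for are hypothetical.

```python
from typing import Optional


def mangle(dn: str, group: str) -> str:
    """Encode '<dn>:<group>' as the gridmapdir file names above appear:
    lower-cased, with each non-alphanumeric character written as %xx."""
    text = f"{dn}:{group}".lower()
    return "".join(c if c.isalnum() else f"%{ord(c):02x}" for c in text)


def account_for(dn: str, group: str, ls_li_lines: list[str]) -> Optional[str]:
    """Find the pool account sharing an inode with the mangled DN file."""
    by_inode: dict[str, list[str]] = {}
    for line in ls_li_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():  # skip 'total ...' lines
            by_inode.setdefault(fields[0], []).append(fields[-1])
    target = mangle(dn, group)
    for names in by_inode.values():
        if target in names:
            others = [name for name in names if name != target]
            return others[0] if others else None
    return None


lines = [
    "1116569 -rw-r--r-- 2 root root 0 May 6 11:28 minos019",
    "1116569 -rw-r--r-- 2 root root 0 May 6 11:28 "
    "%2fc%3duk%2fo%3descience%2fou%3doxford%2fl%3doesc%2fcn%3dnick%20west%3aminos",
]
print(account_for("/C=UK/O=eScience/OU=Oxford/L=OeSC/CN=nick west", "minos", lines))
# minos019
```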
A proxy (short for Proxy Certificate) is a short-term (typically 12
hours) Digital Certificate designed to act remotely on behalf of a
user. To do this it has to be able to prove it is acting on the
user's behalf using a Public/Private key pair. However, to avoid the
risk of sending the user's permanent private key, a temporary
Public/Private key pair is created and a Proxy Certificate is
constructed using the public key of this temporary pair. The
certificate is signed using the permanent private key. When a job is
submitted the proxy certificate is sent out together with the private
key of the temporary pair. The remote machine can confirm that the
proxy is legitimate and can then use it to temporarily prove, by
using the temporary private key, that it is acting on behalf of the
individual.
The problem with this scheme is that the proxy lifetime may be too
short for production purposes, so a system has been developed to
store a long-term proxy certificate on a dedicated server (Proxy
Server). The WMS (Workload Management System) is then able to use
this long-term proxy to periodically renew the proxy for a submitted
job before it expires, until the job ends (or the long-term proxy
expires). Only trusted services, for example a batch job scheduler,
are allowed to apply for proxy renewal, so this scheme is safer than
simply allowing proxies to have longer lifetimes.
See Preparing and managing proxies, Basic Certificate and Proxy
Concepts and LCG-2 User Guide 4.4.2 Virtual Organisation Membership
Service.
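The renewal scheme can be pictured with a small Python simulation. This is a sketch under stated assumptions, not WMS or Proxy Server code. Renewal is assumed to happen a hypothetical one hour before expiry, and a renewed proxy is assumed never to outlive the long-term proxy it comes from. The function returns the renewal times and the final expiry of the job's proxy.

```python
from datetime import datetime, timedelta


def plan_renewals(start: datetime, job_end: datetime, long_term_expiry: datetime,
                  lifetime: timedelta = timedelta(hours=12),
                  margin: timedelta = timedelta(hours=1)
                  ) -> tuple[list[datetime], datetime]:
    """Simulate renewing a job's short proxy from a long-term proxy."""
    renewals = []
    expiry = start + lifetime
    while expiry < job_end:
        renew_at = expiry - margin
        # Assumption: a renewed proxy cannot outlive the long-term proxy.
        new_expiry = min(renew_at + lifetime, long_term_expiry)
        if new_expiry <= expiry:
            break  # the long-term proxy can no longer extend the job's proxy
        renewals.append(renew_at)
        expiry = new_expiry
    return renewals, expiry


t0 = datetime(2009, 1, 29, 6, 0)
times, expiry = plan_renewals(t0, t0 + timedelta(hours=30), t0 + timedelta(days=7))
print([(t - t0) / timedelta(hours=1) for t in times])  # [11.0, 22.0]
```

If the long-term proxy expires first, the loop stops renewing. The job's proxy then runs out before the job ends, which is the failure the Proxy Server is meant to make rare.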
Public/Private keys provide a method for the secure exchange of
information between two users A and B. B has two encryption/decryption
keys, a private one and a public one. The public one is derived from
the private one, but it is computationally infeasible to derive the
private one from the public one. Each key is the only key that will
decrypt what the other has encrypted.
B keeps his private key secure and password protected but makes his
public key available. Any user A wanting to communicate with him uses
it to encrypt a random challenge and sends it to the user claiming to
be B. If genuine, B will be able to decrypt the challenge with his
private key and return it to A as proof of identity. After that the
keys can be used to encrypt the information they exchange.
See slide 40 of
http://egee-docs.web.cern.ch/egee-docs/support/documentation/pdf/9300_GGUS_Presentation.pdf
and Basic Certificate and Proxy Concepts.
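This toy Python sketch shows the challenge exchange described above using textbook RSA. The numbers (n=3233, e=17, d=2753) are far too small for real use. The function names apply_key and challenge_response are hypothetical. With RSA, encrypting and decrypting are the same modular exponentiation with different exponents, which is why each key undoes the other.

```python
import secrets

# Toy RSA key pair for user B (textbook values p=61, q=53); far too small to
# be secure, it only shows the mechanism.
B_PUBLIC = (3233, 17)     # (modulus, exponent): published by B
B_PRIVATE = (3233, 2753)  # (modulus, exponent): kept secret by B


def apply_key(key: tuple[int, int], value: int) -> int:
    """Encrypt or decrypt: with RSA both are modular exponentiation."""
    modulus, exponent = key
    return pow(value, exponent, modulus)


def challenge_response(claimant_key: tuple[int, int]) -> bool:
    """A checks that the claimant really holds B's private key."""
    challenge = secrets.randbelow(B_PUBLIC[0])  # A's random challenge
    sent = apply_key(B_PUBLIC, challenge)       # encrypted with B's public key
    reply = apply_key(claimant_key, sent)       # claimant decrypts and replies
    return reply == challenge


print(challenge_response(B_PRIVATE))  # True
```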
An RA (Registration Authority) is an authority to which an individual
must report, with photo identification, as part of the process of
obtaining a Digital Certificate, which gives them access to the GRID.
A Resource Broker (RB) examines the Requirements and Rank expressions
(and also any data requirements) in the JDL. All CEs are filtered
against the Requirements, the Rank is calculated for every CE that
matches, and on the basis of Rank the RB determines which CE to submit
the job to.
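The filter-then-rank matchmaking can be sketched in Python. Requirements and Rank are modelled as plain Python functions rather than JDL expressions. The CE names and attribute values are invented, although GlueCEPolicyMaxCPUTime and GlueCEStateFreeCPUs are GLUE attribute names. Ties in Rank go to the first CE listed, which is a choice of this sketch and not necessarily what a real RB does.

```python
from typing import Callable, Optional

Attributes = dict[str, float]


def select_ce(ces: dict[str, Attributes],
              requirements: Callable[[Attributes], bool],
              rank: Callable[[Attributes], float]) -> Optional[str]:
    """Filter CEs on Requirements, then pick the highest Rank among the rest."""
    matching = {name: attrs for name, attrs in ces.items() if requirements(attrs)}
    if not matching:
        return None
    return max(matching, key=lambda name: rank(matching[name]))


ces = {  # hypothetical CEs and published values
    "ce-a.example.org:2119/jobmanager-pbs-long":
        {"GlueCEPolicyMaxCPUTime": 2880, "GlueCEStateFreeCPUs": 4},
    "ce-b.example.org:2119/jobmanager-pbs-short":
        {"GlueCEPolicyMaxCPUTime": 60, "GlueCEStateFreeCPUs": 50},
    "ce-c.example.org:2119/jobmanager-lcgpbs-long":
        {"GlueCEPolicyMaxCPUTime": 1440, "GlueCEStateFreeCPUs": 12},
}
best = select_ce(
    ces,
    requirements=lambda a: a["GlueCEPolicyMaxCPUTime"] > 720,
    rank=lambda a: a["GlueCEStateFreeCPUs"],
)
print(best)  # ce-c.example.org:2119/jobmanager-lcgpbs-long
```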
The Resource Broker a job is sent to is determined, in the first
instance by the file:-
/opt/edg/etc/<vo>/edg_wl_ui.conf
so for MINOS this is:-
/opt/edg/etc/minos.vo.gridpp.ac.uk/edg_wl_ui.conf
although this can be overridden in the JDL (Job Description Language).
The service is becoming obsolete and in the current middleware (gLite)
is replaced by the WMS (Workload Management System).
See FAQs
RFIO (Remote File I/O) is a protocol for GRID-based file access. There
are both secure and insecure forms of RFIO. The fact that both forms
are published as "rfio" by the IS can cause problems for GFAL.
GRID services in the UK and Ireland Region (ROC-UKI) are provided by
three federations: GridPP, Grid-Ireland and NGS, the National Grid
Service. See UKI ROC Home page.
It has its own problem tracking system, although the preferred error
reporting centre is GGUS (Global Grid User Support); see Reporting
Problems.
A Storage Element (SE) provides uniform access to storage resources.
It could be simply a disk server, large disk arrays or a Mass Storage
System such as dCache or Castor. A typical API to the resources is
SRM (Storage Resource Manager).
GridPP keep a page showing UK Disk Status.
Data on storage elements should be considered permanent, and it is the
user's responsibility to manage the available space.
The SRM (Storage Resource Manager) middleware module provides
management services for the storage resource, with capabilities such
as transparent migration from disk to tape, file pinning, reservations,
etc. It includes movement of data from one SRM to another but does not
include file access or file transfer. For these tasks the client
application directly accesses the appropriate file access or transfer
server, e.g. GSIFTP, GSIDCAP and RFIO.
Unfortunately, at the moment, not all SEs implement the same version
of the SRM interface, and none of these versions offers all of the
functionalities that the SRM standard defines.
A SURL (Storage URL), also known as a Physical File Name (PFN),
identifies a replica in an SE and is of the general form:
<sfn | srm>://<SE_hostname>/<some_string>
where the prefix will be sfn for files located in SEs without SRM
interface and srm for SRM-managed SEs.
In the case of SRM-managed SEs, one cannot assume that the SURL will
have any particular format, other than the srm prefix and the
hostname. In general, SRM-managed SEs can use virtual file systems and
the name a file receives may have nothing to do with its physical
location (which may also vary with time). An example of this kind of
SURL follows:
srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/minos/nwest/test/LVJ_F00034638_0000.mdaq.root
Note: if using RFIO access there
has to be a double slash between the host of the SE and the path of
the file.
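Following the point that only the prefix and hostname of an SRM SURL can be relied on, this Python sketch extracts just those two and returns the rest as an opaque string. The function name split_surl is hypothetical. Any port in the SURL is dropped from the returned hostname, and a query string such as ?SFN=... is kept in the opaque part. Both of these are choices of the sketch.

```python
from urllib.parse import urlsplit


def split_surl(surl: str) -> tuple[str, str, str]:
    """Return (prefix, SE hostname, rest); 'rest' is opaque for SRM SEs."""
    parts = urlsplit(surl)
    if parts.scheme not in ("srm", "sfn"):
        raise ValueError(f"expected an srm:// or sfn:// SURL: {surl!r}")
    if not parts.hostname:
        raise ValueError(f"SURL has no SE hostname: {surl!r}")
    rest = parts.path + (f"?{parts.query}" if parts.query else "")
    return parts.scheme, parts.hostname, rest


prefix, host, rest = split_surl(
    "srm://dcache.gridpp.rl.ac.uk/pnfs/gridpp.rl.ac.uk/data/minos/nwest/test/"
    "LVJ_F00034638_0000.mdaq.root")
print(prefix, host)  # srm dcache.gridpp.rl.ac.uk
```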
A TURL (Transport URL) is a valid URI with the necessary information
to access a file in an SE, and has the following form:
<protocol>://<some_string>
e.g. gsiftp://tbed0101.cern.ch/flatfiles/SE00/dteam/generated/2004-02-26/file3596e86f-c402-11d7-a6b0-f53ee5a37e1d
where <protocol> must be a valid protocol (supported by the SE) to
access the contents of the file (GSIFTP, RFIO, GSIDCAP), and the
string after the double slash may have any format that can be
understood by the SE serving the file.
Note: if using RFIO access there
has to be a double slash between the host of the SE and the path of
the file.
While SURLs are in principle invariable (they are entries in the file
catalog), TURLs are obtained dynamically from the SURL through the
Information System or the SRM interface (for SRM-managed SEs). A TURL
can therefore change with time and should be considered valid only for
a relatively short period after it has been obtained.
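In practice TURLs come from the SRM interface or the IS rather than being assembled by hand. Still, this Python sketch shows the TURL form and the RFIO double-slash rule noted above. The function name make_turl and the RFIO host and path are hypothetical.

```python
def make_turl(protocol: str, se_host: str, path: str) -> str:
    """Build <protocol>://<host><path>, doubling the slash for RFIO."""
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute path: {path!r}")
    protocol = protocol.lower()
    separator = "/" if protocol == "rfio" else ""  # RFIO needs host//path
    return f"{protocol}://{se_host}{separator}{path}"


print(make_turl("gsiftp", "tbed0101.cern.ch", "/flatfiles/SE00/dteam/file1"))
# gsiftp://tbed0101.cern.ch/flatfiles/SE00/dteam/file1
print(make_turl("rfio", "castor.example.org", "/castor/example.org/minos/file1"))
# rfio://castor.example.org//castor/example.org/minos/file1
```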
A UI (User Interface) is a computer that has the set of user-level
(typically Unix command-line) client tools and API libraries installed
on it. To use it a user must have an account on the machine and have
their user certificate installed. Besides dealing with all aspects of
job submission and job output recovery, the UI can also be used to
check on resources and to copy, replicate and delete files on the Grid.
Ask your computing support for the nearest UI. Any UI that supports us
should have a gLite/WMS interface configuration for us, which is
defined by the first of:-
1. the file specified by the -config option of any glite-wms-job-* command.
2. the file pointed to by the $GLITE_WMS_CLIENT_CONFIG environment variable.
3. the file $HOME/.glite/<vo>/glite_wms.conf, where <vo> is the user's VO name in lowercase.
4. the file $GLITE_LOCATION/etc/<vo>/glite_wms.conf.
5. the file $GLITE_LOCATION/etc/glite_wms.conf.
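The five-step precedence can be expressed as a Python sketch. It lists the candidate glite_wms.conf paths in order and returns the first one that exists. Reading "the first of" as "the first that exists" is an assumption, and the function names are hypothetical. The real lookup is done inside the glite-wms-job-* clients.

```python
import os
from collections.abc import Mapping
from pathlib import Path
from typing import Optional


def wms_config_candidates(vo: str, config_option: Optional[str] = None,
                          environ: Mapping[str, str] = os.environ) -> list[Path]:
    """Candidate glite_wms.conf files in the order of precedence listed above."""
    vo = vo.lower()
    candidates = []
    if config_option:                                # 1. -config option
        candidates.append(Path(config_option))
    if environ.get("GLITE_WMS_CLIENT_CONFIG"):       # 2. environment variable
        candidates.append(Path(environ["GLITE_WMS_CLIENT_CONFIG"]))
    if environ.get("HOME"):                          # 3. per-user file
        candidates.append(Path(environ["HOME"], ".glite", vo, "glite_wms.conf"))
    if environ.get("GLITE_LOCATION"):                # 4. and 5. installation files
        glite = environ["GLITE_LOCATION"]
        candidates.append(Path(glite, "etc", vo, "glite_wms.conf"))
        candidates.append(Path(glite, "etc", "glite_wms.conf"))
    return candidates


def find_wms_config(vo: str, config_option: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first candidate that exists (assumed meaning of 'first')."""
    for path in wms_config_candidates(vo, config_option, environ):
        if path.is_file():
            return path
    return None


env = {"HOME": "/home/nwest", "GLITE_LOCATION": "/opt/glite"}  # hypothetical
for candidate in wms_config_candidates("minos.vo.gridpp.ac.uk", environ=env):
    print(candidate)
```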
See GridPP User Interface
A VO (Virtual Organisation) is an organisation, typically an
experiment, whose members collectively run jobs on the grid. It is
managed using VOMS (Virtual Organisation Membership Service).
VOMS (Virtual Organisation Membership Service) provides administration
services for the users and administrators of a VO. VOMS is essentially
an authorization service: the list of VO users authorized to use VO
resources comes from the VOMS and is propagated to the resources (RB,
CE, SE ...).
For MINOS, the administrators are:-
See the web VOMS interface for MINOS:
Welcome to the minos.vo.gridpp.ac.uk VO
The VOMS Core Services documentation explains:-
- The client (voms-proxy-init) / server (vomsd) architecture
- The command line interface VOMS_PROXY_* to create (INIT), examine (DISPLAY)
and destroy (DESTROY) a proxy.
- The C++ and C APIs, showing how to program to it.
- X.509 AC (Attribute Certificates) Formats: How attributes such as membership, role and
security clearance are bound to an AC.
The WMS (Workload Management System) examines the Requirements and
Rank expressions (and also any data requirements) in the JDL. All CEs
are filtered against the Requirements, the Rank is calculated for every
CE that matches, and on the basis of Rank the WMS determines which CE
to submit the job to. For that purpose the WMS must retrieve
information from the IS and the File Catalog.
The Workload Management System a job is sent to is determined, in the first
instance by the file:-
/opt/glite/etc/<vo>/glite_wms.conf
so for MINOS this is:-
/opt/glite/etc/minos.vo.gridpp.ac.uk/glite_wms.conf
although this can be overridden in the JDL (Job Description Language).
The WMS replaces the RB (Resource Broker) of the obsolete EDG
middleware.
A WN (Worker Node) is a back-end batch machine in a Computing Element.