

Database Interface

Last significant change: 2008/01/30
This chapter is organised into the following sections.

Concepts
is an introduction to the basic concepts behind the DatabaseInterface. You can skip this section if you are in a hurry, but reading it will help you understand the package.

Installing and Running
provides a few tips on building and running programs that use the DatabaseInterface.

Accessing Existing Tables
tells you how to write code to retrieve data from existing tables.

Creating New Tables
describes how new tables are added to the database and how the corresponding classes that serve the data are designed.

Filling Tables
explains how new data is added to existing tables in the database.

MySQL Crib
gives the bare minimum necessary to use MySQL to manage a database. The DatabaseInterface runs directly on top of ROOT, under which MySQL and flat ASCII files are used to implement a hierarchical database.


Concepts

Types of Data

Besides the data from the detector itself, off-line software requires two additional types of data:-

Detector Description
i.e. data that describes the construction of the detector and how it responds to the passage of particles through it. The geometry, the cabling plex and calibration constants are all examples of this type of data.

Algorithm Configuration
i.e. data used to control the way an algorithm operates. This includes such things as software switches to control flow and tunable cuts.

It is the purpose of the DatabaseInterface to provide simple and efficient access to such data and to provide a framework in which new types of data can be added with minimal effort.


Simple, Compound and Aggregated

Within the database, data is organised into tables. When the user requests data from a table, the DatabaseInterface collects rows of data from the appropriate table. From the perspective of the interface, there are three types of organisation:-

Simple
A single row is retrieved. Algorithm Configuration data is always simple; even if multiple configurations are possible, only one can be selected at a time. Detector Description, on the other hand, is almost never Simple.

Compound
Multiple rows are retrieved. Each row represents a single sub-system and the request retrieves data for a complete set of sub-systems. For example a request for plane positions will produce a set of rows, one for each plane.

Aggregated
A special form of Compound in which the rows are written to the database in independent units, called aggregates, rather than as a single complete set:-

There are two types of aggregation:-

Complete
In this type the number of aggregates present at any time is constant, with the possible exception of detector construction periods during which the number increases with time. This is the normal form and is used to describe a set of sub-systems that are permanently present e.g. the set of steel planes.

Sparse
In this type the number of aggregates present at any time is variable; there could even be none. This form is used to describe abnormal conditions such as alarms.

Tables of Data

The DatabaseInterface provides a simple, uniform concept regardless of the data being accessed. Each request for data produces a pointer giving read access to a results table, which is effectively a slice of the underlying database table. Each row of the results table is an object, the type of which is table-specific. These table row objects give access to the data from one row but can hide the way the database table is organised, so changes to the physical layout of a database table should only affect its table row object, not the end users of the data. Note that a single request only ever accesses a single table; the interface does not support retrieval of data from multiple database tables simultaneously.

If the request for data fails for some reason, then the resulting table will be empty, otherwise it will have a single row for Simple organisation and more than one row for Compound and Aggregated. The user can ask how many rows the table has and can directly access any of them. The physical ordering of the rows in the table reflects the way the data was originally written, so for Aggregated data, the ordering is not optimised for retrieval. To deal with this, each table row object can declare a natural index, independent of its physical position, and this natural index can be used to retrieve data.

A Cascade of Databases

The DatabaseInterface can access data from more than one database. During initialisation it is given a list of database URLs. The list order reflects priority; the interface first looks for data in the first database in the list, but if that fails, tries the others in turn until all have been tried or data is found. This scheme allows a user to override parts of the official database by creating a mini-database with their own data and then placing it in the list ahead of the official database. The concept of a set of overlaying databases is called a cascade.

Context Sensitive

In principle, any of the data retrieved by the interface could depend on the current event being processed. Clearly Detector Descriptions, such as calibration constants, will change with time and the interface has to retrieve the right ones for the current event. Although less likely, it's possible that Algorithm Configurations too could change to reflect changes in the detector. For this reason, all requests for data through the interface must supply information about the type of data (real or Monte Carlo), the detector and the date and time of the current event.

Collectively this information is called the Context and is represented by the VldContext class of the Validity package. Note that, in common with event dates and times:-

All database dates and times are in UTC.

In the database all data is tagged by a Validity Range which identifies the types of data and detector, and the range of dates and times, for which it is valid. This is represented by the VldRange class of the Validity package. Some data, typically Algorithm Configuration, is universal; the same database data can be used for any event. Other data may be very specific to a single type of data and detector and to a limited range of dates and times. Physically this is implemented as a pair of tables:-

Validity Range Table

This table consists of rows of VldRange objects, each with a unique sequence number which is used as a key into the Main Data Table.

Main Data Table

Each row has a sequence number corresponding to an entry in the Validity Range Table.

The interface first finds a match in the Validity Range Table for the current context and then retrieves all rows in the Main Data Table that match its sequence number. The reasons for this two step approach are:-

Extended Context

The primary function of DatabaseInterface is to provide the best information for a specific context, but it can also retrieve information for much more general queries. The query is still broken into two parts: the ``context'' which is matched to the Validity Range Table and then the data from the main table is taken for the selected sequence number(s). However the user can supply a context such as ``All ranges that start between this time and that time'', hence the term ``Extended Context''. Further, during the retrieval of data from the main table additional restrictions can be imposed. The result of an Extended Context query is a collection of rows that will not normally represent the state of the detector at a single moment in time and it is up to the user to interpret the results meaningfully.


SimFlag Association

As explained in the preceding section, the interface finds the database data that best matches the context of the data. There are occasions when this matching needs to be changed; for example there can be times when Monte Carlo data needs to be treated exactly as if it were event data, and this includes the way it retrieves data from the database. To support this the user can specify, for any type of data, an associated list of data types. If this is done then, instead of using the current type, each of the alternative types is tried until a match is found. This matching takes precedence over the cascade i.e. all associated types are tried on the first database in the cascade before moving on to the second and subsequent cascade members. This ensures that higher members, which might even refer back to the ORACLE database at FNAL, are only tried as a last resort.

Authorising Databases and Global Sequence Numbers

As explained in the previous section, sequence numbers in the Validity Range Table are unique. However this can present a problem if the same type of data is being entered into several different databases. For example calibration constants will be created in the Near, Far and Calibration detectors. Eventually the tables will be merged, but it is essential that there is no conflict in the sequence numbers. To solve this problem, certain databases are special: they are able to produce globally unique sequence numbers. Each is allocated a unique block of 10,000,000 sequence numbers (which is enough to allow a new entry to be made every minute for 20 years!). These blocks are recorded in a special table, GLOBALSEQNO, that holds the last used sequence number for each table. The block 1..9,999,999 is used for local sequence numbers i.e. ones that are only guaranteed unique within the current database table.

By default permanent data written to an authorising database will be written with global sequence numbers. For temporary data, or if writing to a non-authorising database, local sequence numbers are used and in this case a LOCALSEQNO table is generated automatically if required.

Important:-

Validity Management

For constants that change with time (if that is not a contradiction in terms!) it makes sense to have overlapping Validity Ranges. For example, suppose we know that a certain sort of calibration constant drifts with time and that, once determined, it is only satisfactory for the next week's worth of data. A sensible procedure would be to limit its validity to a week when writing to the database, but to determine new constants every few days to ensure that the constants are always ``fresh'' and that there is no danger that there will be a gap. However, this means that the interface has to perform two types of Validity Management:-

Ambiguity Resolution
When faced with two or more sets of data the interface has to pick the best. It does this simply by picking the one with the latest creation date time.

Validity Range Trimming

Having found the best set, the interface wants to know how long it will remain the best. Any set whose creation date is later will be better according to the above rule, and so the retrieved data has its validity trimmed so as not to overlap it. This reduced Validity Range is called the Effective Validity Range. This only happens in memory; the database itself is not modified, but it does mean that the interface does not need to check the database again for this set of data until the Effective Validity Range has expired. This trimming also applies between databases in a cascade, with sets in higher priority databases trimming those in lower ones.

Overlay Creation Dates

As explained above, creation dates play a crucial role in resolving which set of data to use; later creation dates take priority over earlier ones. This scheme assumes that constants from earlier runs are created before constants from later runs, but this isn't always true. When improving e.g. calibration constants, it's quite normal to recalibrate recent runs before going back and fixing earlier ones; simply using the date when the constants were created would then mean that the constants from earlier runs would take priority over any later runs they overlapped. To allow constants to be created in any order, the interface provides a system for deducing the best creation date for any set of constants as follows:-

The scheme means that creation dates always follow the dates of the runs they correspond to, rather than the dates when the constants were actually created. When using the scheme it's probably better to consider the ``dates'' to be version numbers.

Rollback

The database changes almost constantly to reflect the state of the detector, particularly with regard to the calibration constants. However this can mean that running the same job twice produces different results if database updates have occurred between the two runs. For certain tasks, e.g. validation, it's necessary to decouple jobs from recent updates and this requires database rollback i.e. restoring the database to a previous state. Rollback works by exploiting the fact that data is not, in general, ever deleted from the database. Instead new data is added and, by the rules of Ambiguity Resolution (see the previous section), supersedes the old data.

Data carry two dates: the date of insertion into the local database and the creation date, which is typically the date when the data was originally created and is common to all databases. Rollback can use either date. Rollback by insert date undoes recent changes to the local database and the effect is therefore specific to the local database. On the other hand, rollback by creation date removes recently created data in the same way from all databases.

Lightweight Pointers to Heavyweight Data

One of the interface's responsibilities is to minimise I/O. Some requests, particularly for Detector Configuration, can pull in large amounts of data but users must not load it once at the start of the job and then use it repeatedly; it may not be valid for all the data they process. Also multiple users may want access to the same data and it would be wasteful for each to have their own copy.

To deal with both of the above, the interface reuses the concept of a handle, or proxy, that appears in other packages such as Candidate. The system works as follows:-

  1. When the user wants to access a particular table they construct a table-specific pointer object. This object is very small and is suitable to be stack based and passed by value, thus reducing the risk of a memory leak.

  2. During construction of the pointer, a request for data is passed down through the interface and the results table, which could be large, is created on the heap. The interface places the table in its cache and the user's pointer is attached to the table, but the table is owned by the interface, not the user.

  3. Each request for data is first sent to the cache and if already present then the table is reused.

  4. Each table knows how many user pointers are connected to it. As each pointer is discarded by its owner, it disconnects itself from the table it points to.

  5. Once a table has no pointers left it is a candidate for being dropped by its cache. However this is not done at once as, between events, there are likely to be no user pointers, so just because a table is not currently being pointed to, it doesn't mean that it won't be needed again.
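The following minimal sketch illustrates the scheme, using the DbiDemoData1 demonstration table described later in this chapter and assuming vc is the VldContext of the current event: the pointers are cheap stack objects while the results table they share lives in the interface's cache.

   DbiResultPtr<DbiDemoData1> ptrA(vc);   // first query: table built on the heap and cached
   DbiResultPtr<DbiDemoData1> ptrB(vc);   // same context: satisfied from the cache, no new I/O
   // Both pointers can go out of scope; the cached table is only dropped
   // later, when the interface decides it is no longer needed.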

Natural Table Index

For Detector Description data, tables can be large and the user will require direct access to every row. However, the way the table is arranged in memory reflects the way the data was originally written to the database. For Simple and Compound data the table designer can control this organisation as complete sets are written as a single unit. For Aggregated data, the layout reflects the way aggregates are written; for example, for pulser data, all the strip ends illuminated by an LED will appear together in the table. This allows the interface to replace individual aggregates as their validity expires. However it means that the physical layout may not be convenient for access. To deal with this, table row objects, which all inherit from DbiTableRow, are obliged to return a Natural Table Index if the physical ordering is not a natural one for access. Tables can then be accessed by this index.

Task

Task will provide a way to further select the type of data retrieved. For example:-

The aim is that Task will allow a particular database table to be sub-divided according to the mode of use. Currently Task is a data type defined in Dbi i.e. Dbi::Task and is implemented as an integer. The default value is zero.

Level 2 (disk) Cache

Loading a large table from the database is a lot of work:-

  1. The query has to be applied and the raw data loaded.

  2. The row objects have to be individually allocated on the heap.

  3. Each data word of each row object has to be individually converted from the raw data through several layers of the supporting database software.

As the detector configuration changes only slowly with time, exactly the same process outlined above is repeated many times, in many jobs that process the data, so the obvious solution is to cache the results on disk in a form that can be reloaded rapidly when required. The technique essentially involves making an image copy of the table to disk. It can only be applied to some tables, but these include the Calibration tables, which represent the largest database I/O load, and for these tables loading times can be reduced by an order of magnitude.

Running

Setting up the Environment

The interface needs a list of Database URLs, a user name and a password. These can be communicated via a set of environment variables of the form ENV_TSQL*. Here is the set with some example settings:-

setenv ENV_TSQL_URL "mysql://pplx2.physics.ox.ac.uk/offline"
setenv ENV_TSQL_USER "west"
setenv ENV_TSQL_PSWD ""

The meanings are as follows.

ENV_TSQL_URL
This is a semi-colon separated list of URLs. Each URL takes the form:-
protocol://host[:port]/[database][?options]
where:
   protocol - DBMS type , e.g. mysql etc.
   host - host name or IP address of database server
   port - port number
   database - name of database
   options - string key=value's separated by ';' or '&'     
 Example: 
 "mysql://myhost:3306/test?Trace=Yes;TraceFile=qq.log"

ENV_TSQL_USER
Your database user name. If you require different names for different databases in the cascade then this can be a semi-colon separated list in the same order as ENV_TSQL_URL. If this list is shorter than the URL list, then the first entry is used for the missing entries.

ENV_TSQL_PSWD
You could put your password here, but it is a security risk to leave that set up in the environment. Leaving it null will mean that it will be prompted for when the interface initialises. As with ENV_TSQL_USER it can be a semi-colon separated list with the first entry providing the default if the list is shorter than ENV_TSQL_URL.

These variables should be set for the standard read-only configuration. Any job that uses the JobControl package can override any or all of these variables using the switches:-

for example:-

  demojob -d mysql://pplx2.physics.ox.ac.uk/offline -u reader ...

replaces ENV_TSQL_URL and ENV_TSQL_USER but takes the password from ENV_TSQL_PSWD. Use of these switches is the recommended way to configure a framework job when write-access to the database is required.

Configuring

Once the job is running you can configure the DatabaseInterface. To change any configurable aspect of the interface get a reference to its CfgConfigurable interface, set parameters as required and update:-

   // DbiTableProxyRegistry is a CfgConfigurable 
   DbiTableProxyRegistry& dbiCfg = DbiTableProxyRegistry::Instance();
   dbiCfg.Set("...");
   dbiCfg.Set("...");
   dbiCfg.Set("...");
   dbiCfg.Update();

On startup the interface checks the environmental variable ENV_DBI which can contain a semi-colon separated list of configuration requests. For example:-

   setenv ENV_DBI "Level2Cache = '/home/minos/temp/dbi_cache';Rollback:* = '2000-01-01'"

see below for an explanation of what these requests mean.

In the above example the environment string contained a *. To prevent csh and derivative shells (e.g. tcsh) from attempting to expand the * as a set of file names, temporarily set the shell variable noglob:-

   set noglob
   setenv ENV_DBI " ...
   unset noglob

Rollback

To impose a global rollback date of, say, September 27th 2002:-

   dbiCfg.Set("Rollback:* = '2002-09-27 00:00:00'");

This will ensure that the interface ignores data inserted after this date for all future queries. The hours, minutes and seconds can be omitted and default to 00:00:00.

Rollback can be more selective, specifying either a single table or a group of tables with a common prefix. For example:-

   dbiCfg.Set("Rollback:*            =  '2002-09-01'");
   dbiCfg.Set("Rollback:CAL*         =  '2002-08-01'");
   dbiCfg.Set("Rollback:CALPINTOADC  =  '2002-07-01'");

Now the table CALPINTOADC is frozen at July 2002, other CAL tables at August and all other tables at September. The ordering of the commands is not important; the interface always picks the most specific one to apply to each table.

By default rollback is by insert date, but it can be changed, on a table by table basis, to creation date using the RollbackType configuration request e.g.

   dbiCfg.Set("RollbackType:CAL*   =  'INSERTDATE'");
   dbiCfg.Set("RollbackType:UGLI*  =  'CREATIONDATE'");

The first of these two lines is redundant as this is the default but serves to illustrate how to select both types of date.

Rollback only applies to future queries; it does not invalidate any existing query results in the cache, which remain available to satisfy future requests. So impose rollback conditions at the start of the program to ensure they apply consistently.

SimFlag Association

By default the DatabaseInterface matches data using the context of the current event, and this includes the type of data e.g. Monte Carlo or real. As explained in the Concepts section 8.1.7 this can be overridden. For example:-

   dbiCfg.Set("SimFlagAssociation:MC = 'Reroot,MC,Data'");

would mean that for MC data, the interface would try in order: Reroot, then MC and finally Data for the first database cascade member and then the second and so on until a match was found.

Multiple associations can be set up, here is a really silly example:-

   dbiCfg.Set("SimFlagAssociation:MC = 'Data'");
   dbiCfg.Set("SimFlagAssociation:Data = 'MC'");

Shutdown

By default the DatabaseInterface does not clean up its cache at the end of the program; to do so would waste time and gain nothing. However, when looking for memory leaks the cache then appears to be a massive leak, so a full shutdown with cache clean-up can be enabled by:-

   dbiCfg.Set("Shutdown = 1");


MakeConnectionsPermanent

By default the DatabaseInterface closes connections to the database between queries to minimise the use of resources - see section 8.8.1. If the job is doing a lot of database I/O, for example creating calibration constants, this may degrade performance, in which case all connections can be made permanent by:-

   dbiCfg.Set("MakeConnectionsPermanent = 1");

alternatively:-

   dbiCfg.Set("MakeConnectionsPermanent = -1");

does the reverse and makes all database connections temporary. A third alternative is:-

   dbiCfg.Set("MakeConnectionsPermanent = 0");

which leaves ASCII flat file database connections open as reconnecting involves reloading all ASCII files even if they are not needed to satisfy the current query.

Ordering Context Query Results

By default, when the DatabaseInterface retrieves the data for a Context Query, it does not impose an order on the data beyond requiring that it be in sequence number order. When an ordering is not imposed, the database server is under no obligation to return data in a particular order. This means that the same job run twice against the same database could end up with result sets that contain the same data but in a different order. Normally this doesn't matter; the ordering of rows is not significant. However, results from two such jobs may not be identical, as floating point calculations can change at machine-level precision if their ordering is changed. There are situations where it is required that the results be identical. For example:-

and for such occasions it is possible to completely specify the ordering of rows within a sequence number by forcing sub-ordering by ROW_COUNTER, a column that should be present in all Main Data tables:-

   dbiCfg.Set("OrderContextQuery = 1");

Level 2 Cache

Enabling the Level 2 Cache allows the query results for certain large tables to be written to disk, from which they can be reloaded by subsequent jobs, saving as much as an order of magnitude in load time. Data in the cache will not prevent changes in the database from taking effect, because the DatabaseInterface does an initial (lightweight) query of the database to confirm that the data in the cache is not stale. To enable the cache, the user specifies a directory to which they have read/write access. For example, to make the current working directory the cache:-

   dbiCfg.Set("Level2Cache = './'");

Cache files all have the extension .dbi_cache. Not all tables are suitable for Level 2 caching; the DatabaseInterface will only cache the ones that are.

Cache files can be shared between users at a site to maximise the benefit. In this case the local Database Manager must set up a directory to which the group has read/write access. Management is trivial: should the cache become too large, it can simply be erased and the next few jobs that run will re-populate it with the currently hot queries.

Note that cache performance is achieved by doing raw binary I/O, which makes the cache files platform specific, so if running on a heterogeneous cluster the Database Manager should designate a platform-specific directory for each platform. To simplify this, the name of the directory used by the cache can include environment variables e.g.:-

dbiCfg.Set("Level2Cache = '$DBI_L2CACHE/$SRT_ARCH'");

Accessing Existing Tables

Introduction

The DatabaseInterface gives you access to two types of data:-

Detector Description
i.e. data that describes the construction of the detector and how it responds to the passage of particles through it. The geometry, the cabling plex and calibration constants are all examples of this type of data.

Algorithm Configuration
i.e. data used to control the way an algorithm operates. This includes such things as software switches to control flow and tunable cuts.

In all cases the user specifies the database table to be accessed and supplies a ``context'' for the query. The context describes the type and date time of the current event. This is stored in a Validity package VldContext object. Candidate and RawData have:-

   VldContext   GetVldContext() const

methods to get their context. The DatabaseInterface uses the context to extract all the rows from the database table that are valid for this event. It forms the result into a table in memory and returns an object that acts like a pointer to it.

You are NOT responsible for deleting the table; the Database Interface will do that when the table is no longer needed

You have random access to any row of the results table. Each row is an object which is specific to that table. The key to understanding how to get data from a database table is to study the class that represents a row of its results table.

Accessing Detector Descriptions

Making the Query

As explained above, the key to getting data is to locate the class that represents one row in a database table. To understand how this all works look at one of the sample tables included in the test directory of the DatabaseInterface package, imaginatively called DbiDemoData1, DbiDemoData2 and DbiDemoData3. For purposes of illustration we will pick the first of these. Its header can be found in:-

DatabaseInterface/test/DbiDemoData1.h

To make a query you create a DbiResultPtr object. Its header can be found in:-

DatabaseInterface/DbiResultPtr.h

This is a class that is templated on the table row class, so in this case the instantiated class is:-

DbiResultPtr<DbiDemoData1>

and to instantiate an object of this class you just need a VldContext object. Suppose vc is such an object, then this creates the pointer:-

DbiResultPtr<DbiDemoData1> myResPtr(vc);

This statement creates a DbiResultPtr for the DbiDemoData1 class. First it searches through the database for all DbiDemoData1 objects that are valid for vc, then it assembles them into a table and finally passes back a pointer to it. Not bad for one statement! The constructor can take a second argument:-

 DbiResultPtr(VldContext vc,Dbi::Task task=0);

Dbi::Task offers a way to sub-divide a table according to the mode of operation. For example Detector Configuration data could have two modes, one for raw calibration and another for refined calibration.

If the concept of a task is not relevant for a particular database table, then it should be left at its default value of 0. Otherwise data for each task should be allocated a unique positive number and then selection will only pick rows with the required value of task.
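As an illustration (the task value 1 here is purely hypothetical), a query selecting data for a refined-calibration task might look like:-

   Dbi::Task refinedCalib = 1;                         // hypothetical task number
   DbiResultPtr<DbiDemoData1> myResPtr(vc, refinedCalib);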

The constructor can take further arguments which can normally be left at their default values: a Dbi::AbortTest (see section 8.3.6) and a Bool_t findFullTimeWindow (see section 8.8.2).


Accessing the Results Table

Having got a pointer to the table the first thing you will want to know is how many rows it has. Do this using the method:-

UInt_t GetNumRows() const;

If the query failed then the number of rows returned will be zero. This could either be the result of some catastrophic failure, for example the database could not be opened, or simply that no appropriate data exists for the current event. If you want to know which of these it is you can use the method:-

const DbiValidityRec* GetValidityRec() const;

If this returns a null pointer, then the failure was a major one, see 8.3.6. If not then the DbiValidityRec tells you about the validity of the gap. Its method:-

 const VldRange& GetVldRange() const;

returns a Validity package VldRange object that can yield the start and end times of the gap. Due to the way the DatabaseInterface forms the query, this may be an underestimate, but never an overestimate.

If the table has rows then the GetVldRange() will give you an object that tells you the validity range of the data. Again, the range may be an underestimate. To get to the data itself, use the method:-

const T* GetRow(UInt_t i) const;

where T = DbiDemoData1 in this case. This gives you a const pointer to the i-th row, where i is in the range 0 <= i < GetNumRows().

Putting this all together, suppose you have a CandDigitListHandle object cdlh, and you want to loop over all DbiDemoData1 objects that are valid for it, the code is:-

#include "DatabaseInterface/test/DbiDemoData1.h"
#include "DatabaseInterface/DbiResultPtr.h"

...

DbiResultPtr<DbiDemoData1> myResPtr(cdlh.GetVldContext());

for ( UInt_t irow = 0; irow < myResPtr.GetNumRows(); ++irow ) {
  const DbiDemoData1* ddd1 = myResPtr.GetRow(irow);

  // Process row.

}

GetRow is guaranteed to return a non-zero pointer if the row number is within range, otherwise it returns zero. The ordering of rows reflects the way the data was written to the database. For some types of data this layout is not well suited for access. For example, for pulser data, all the strip ends illuminated by an LED will appear together in the table. To deal with this, table row objects are obliged to return a Natural Table Index if the physical ordering is not a natural one for access. You get rows from a table according to their index using the method:-

const T* GetRowByIndex(UInt_t index) const;

You should always check the returned pointer to ensure that it's non-zero when using this method, unless you are absolutely certain that the entry must be present.
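For example (someIndex here is just a placeholder for whatever natural index you are after):-

   const DbiDemoData1* ddd1 = myResPtr.GetRowByIndex(someIndex);
   if ( ! ddd1 ) {
     // No row with this natural index is present; handle the gap here.
   }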

Getting Data from a Row

Having got to the table row you want, the last job is to get its data. It's up to the table row objects themselves to determine how they will present the database table row they represent. In our example, the DbiDemoData1 is particularly dumb. Its internal state is:-

    Int_t    fSubSystem;
    Float_t  fPedestal;
    Float_t  fGain1;
    Float_t  fGain2;

which it is content to expose fully:-

    Int_t GetSubSystem() const { return fSubSystem; }
  Float_t GetPedestal() const { return fPedestal; }
  Float_t GetGain1() const { return  fGain1; }
  Float_t GetGain2() const { return fGain2; }

It's worth pointing out, though, that it is the job of the table row object to hide the physical layout of the database table and so shield its clients from changes to the underlying database. It's just another example of data encapsulation.

Making Further Queries

Even though a DbiResultPtr is lightweight it is also reusable; you can make a fresh query using the NewQuery method:-

UInt_t NewQuery(VldContext vc, Dbi::Task task=0);

which returns the number of rows found in the new query. For example:-

DbiResultPtr<DbiDemoData1> myResPtr(vc);
...
VldContext newVc;
...
myResPtr.NewQuery(newVc);
...

Having made a query you can also step forwards or backwards to the adjacent validity range using the method:-

UInt_t NextQuery(Bool_t forwards = kTRUE);

Supply a false value to step backwards. This method can be used to ``scan'' through a database table, for example to study how calibration constants change as a function of time. To use this efficiently you need to request accurate validity ranges for your initial query, although this is the default (see section 8.8.2). For aggregated data, the result after stepping to a neighbouring range will almost certainly have some rows in common with the previous one, unless all component aggregates have validity ranges that end on the boundary you are crossing. See the next section for a way to detect changes to data using the DbiResult::GetID() method.
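For example, a minimal sketch of such a scan; it simply walks forward through successive validity ranges until a query returns no rows:-

   DbiResultPtr<DbiDemoData1> myResPtr(vc);

   while ( myResPtr.GetNumRows() ) {
     const DbiValidityRec* vrec = myResPtr.GetValidityRec();
     // Examine vrec->GetVldRange() and the rows valid for this range here.
     if ( ! myResPtr.NextQuery() ) break;   // step forward to the next range
   }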

Simple Optimisation

The first, and most important, level of optimisation is done within the DatabaseInterface itself. Each time it retrieves data from the database it places the data in an internal cache. This is then checked during subsequent queries and reused as appropriate. So the first request for a large table of calibration constants may require a lot of I/O. However the constants may remain valid for an entire job, in which case there is no further I/O for this table.

Although satisfying repeat requests for the same data is quick, it still requires locating the appropriate cache and then searching through it for a result that is suitable for the current event. There are situations when even this overhead can be a burden: when processing many rows in a single event. Take for example the procedure of applying calibration. Here every digitization needs to be calibrated using its corresponding row in the database. The naive way to do this would be to loop over the digits, instantiating a DbiResultPtr for each, extracting the appropriate row and applying the calibration. However it would be far more efficient to create a little calibration object something like this:-

class MyCalibrator {
public:
  MyCalibrator(const VldContext vc): fResPtr(vc) {}
  Float_t Calibrate(const CandDigitHandle cdh) {
     /*  Use fResPtr to calibrate cdh  */
  }

private:
  DbiResultPtr<DbiDemoData1> fResPtr;
};

MyCalibrator is a lightweight object holding only a pointer to a results table. It is created with a VldContext object which it uses to prime its pointer. After that it can be passed CandDigitHandle objects for which it returns calibrated results using its Calibrate method. Now the loop over all digitizations can use this object without any calls to the DatabaseInterface at all. Being lightweight MyCalibrator is fine as a stack object, staying in scope just long enough to do its job.
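In outline, the usage is simply (the loop over the digits is only sketched):-

   MyCalibrator calibrator(cdlh.GetVldContext());

   // Loop over the digits; no DatabaseInterface calls are made inside the loop.
   //   Float_t calValue = calibrator.Calibrate(cdh);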

Another optimisation strategy involves caching results derived from a query. In this case it is important to identify changes in the query results so that the cached data can be refreshed. To aid this, each DbiResult is given a key which uniquely identifies it. This key can be obtained and stored as follows:-

DbiResultKey MyResultKey(myResPtr.GetKey());

This should be stored by value (the DbiResultKey pointed to by GetKey will be deleted when the results expire) as part of the cache and checked each time a change is possible:-

if ( ! MyResultKey.IsEqualTo(myResPtr.GetKey()) ) {

  // recreate the cache data ...

  MyResultKey = *myResPtr.GetKey();
}

Caution: This tests to see that the current DbiResult has exactly the same data as that used when the cache was filled, but not that it is physically the same object. If there have been intervening queries the original object may have been deleted, but this should not matter unless the cache holds pointers back to the DbiResult. In this case the result ID should be used. Initialise with:-

Int_t MyResultID(myResPtr.GetResultID());

and then check as follows:-

if ( MyResultID != (myResPtr.GetResultID()) ) {

  // recreate the cache data ...

   MyResultID = myResPtr.GetResultID();
}


Accessing Algorithm Configuration

In all probability you will not need to know much about the DatabaseInterface when it comes to Algorithm Configuration; all such data ends up inside AlgConfig objects. A helper class DbiConfigStream provides a simple way to fill and store Registry objects, and as AlgConfig objects inherit from Registry it can fill and store them too. All configuration data is tagged with a pair of names:-

Software Name
This is the name of the software (normally an Algorithm) that is being configured.

Configuration Name
Each configurable software system may have multiple configuration sets, each with its own name. If there is only a single set then the name will be ``default''.

To create a Registry object and fill it from the set ``MyConfig'' for the software ``MyAlgorithm'':-

Registry reg;
DbiConfigStream stream("MyAlgorithm","MyConfig");
stream >> reg;

DbiConfigStream is a lightweight object that can be discarded as soon as the Registry object has been filled.

You can stop reading this section now unless you are curious as to what is happening inside DbiConfigStream. For Detector Description data, every database table has its own table row class. With Algorithm Configuration, the reverse is true. All data is loaded from a single table (SOFTWARE_CONFIGURATION) using the same table row class: DbiConfigSet, and a DbiConfigStream object provides a simple way to read and write these objects. Also, in contrast to Detector Configuration data, Algorithm Configuration queries yield a simple (i.e. single row) result. The single DbiConfigSet object gives access to all of its parameters i.e. columns:-

       UInt_t GetNumParams() const;
       string GetParamName(UInt_t parNo) const;
 DbiFieldType GetParamType(UInt_t parNo) const;
       string GetParamValue(UInt_t parNo) const;

Each parameter ends up as an item in the Registry.

By default DbiConfigSet is loaded from the SOFTWARE_CONFIGURATION table which has the following columns:-

SEQNO
The sequence number.

SOFTW_NAME
The software name.

CONFIG_NAME
The configuration set name.

CONFIG_DATA
The configuration data as a single string as produced by Registry::PrintStream(). For example:-

  ['' 'mydouble'=(double)1.234e+100 'myint'=(int)12345 'mystring'=(string)'This is a string' '']

In the above case the Registry would be filled with 3 values:-

    mydouble = 1.234e+100
    myint = 12345
    mystring = This is a string
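The values can then be read back from the filled Registry in the usual way. The following is only a sketch; it assumes the Registry::Get(key,value) overloads, so check Registry.h for the exact interface:-

   int    myint    = 0;
   double mydouble = 0.;
   reg.Get("myint",    myint);     // assumed Registry::Get overloads
   reg.Get("mydouble", mydouble);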

Normally Algorithm Configuration data will be universally valid and only use the default task so the constructor takes a series of default arguments, but the full set is:-

   DbiConfigStream(const std::string& SoftwName,
                   const std::string& ConfigName="default",
                   VldContext vc = Dbi::fgDefaultContext,
                   Dbi::Task task=0,
                   const std::string& tableName="SOFTWARE_CONFIGURATION");

So it is possible to specify a VldContext and a task and even an alternative table name.


Writing Algorithm Configuration

Although database writing is dealt with later in this chapter, this is a convenient place to deal with configuration output. To create new configuration data, or overlay existing ones:-

  1. First create a DbiConfigStream object as described above e.g.:-
    DbiConfigStream stream("MyAlgorithm","MyNewConfig");
    

  2. Next fill it from a Registry:-

    Registry reg;
    // Fill reg from somewhere...
    stream << reg;
    
    At this stage only the DbiConfigStream has been updated; no database output has been performed.

  3. By default the data will be universally valid but you can adjust that by:-

    DbiValidityRec& vrec = stream.GetValidityRec();
    
    and modifying/replacing as required.

  4. Finally write out the new data using:-

    Bool_t DbiConfigStream::Write(UInt_t dbNo = 0,
                 const std::string& logComment = "");
    
    specifying the cascade entry number and the comment used to create a new database log record.
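Putting the steps together, a minimal sketch (the configuration name and log comment are just examples):-

   DbiConfigStream stream("MyAlgorithm","MyNewConfig");

   Registry reg;
   // ... fill reg with the required configuration parameters ...
   stream << reg;

   // Accept the default (universal) validity and write to the first
   // cascade member, recording a comment in the database log.
   stream.Write(0,"First version of MyNewConfig");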

Extended Context Queries

Making the Query

The constructor of a DbiResultPtr for an Extended Context Query is:-

   DbiResultPtr(const string& tableName,
                const DbiSqlContext& context,
                const Dbi::Task& task = Dbi::kAnyTask,
                const string& data = "",
                const string& fillOpts = "");

Dealing with each of these arguments in turn:-

const string& tableName
The name of the table that is to be accessed. This allows any type of DbiTableRow to be loaded from any type of table, but see section 8.5 on filling if you are going to play tricks!

const DbiSqlContext& context

This argument provides the extended context through the utility class DbiSqlContext. Consider the following code:-

  // Construct the extended context: FarDet data that starts on Sept 1 2003. 
  // (note: the end time stamp is exclusive)
  VldTimeStamp tsStart(2003,9,1,0,0,0);
  VldTimeStamp   tsEnd(2003,9,2,0,0,0);
  DbiSqlContext context(DbiSqlContext::kStarts,tsStart,
                        tsEnd,DetectorType::kFar,SimFlag::kData);

You supply the type of context (in this case DbiSqlContext::kStarts), the date range and the detector type and sim flag. Other types of context are kEnds and kThroughout. See

  DatabaseInterface/DbiSqlContext.h

for the complete list.

You are not limited to the contexts that DbiSqlContext provides. If you know the SQL string you want to apply then you can create a DbiSqlContext with the WHERE clause you require e.g.:-

  DbiSqlContext myContext("DETECTORMASK & 4")

which would access every row that is suitable for the CalDet detector.

const Dbi::Task& task

The task is as for other queries but with the default value of:-

  Dbi::kAnyTask

which results in the task being omitted from the context query and also allows for more general queries: anything that is valid after the WHERE is permitted. For example:-

  DbiSqlContext myContext("creationdate > '2004-01-01 00:00:00' order by creationdate limit 1")

The SQL must have a where condition, but if you don't need one, create a dummy that is always true e.g.:-

  DbiSqlContext myContext("1 = 1 order by timeend desc limit 1 ")

const string& data

This is an SQL fragment that, if not empty (the default is empty), is used to extend the WHERE clause applied when querying the main table. For example consider:-

  DbiSqlContext context(DbiSqlContext::kStarts,tsStart,tsEnd,
                        DetectorType::kFar,SimFlag::kData);
  DbiResultPtr<DbuSubRunSummary> 
                        runs("DBUSUBRUNSUMMARY",context,
                              Dbi::kAnyTask,"RUNTYPENAME = 'NormalData'");

This query reads the DBUSUBRUNSUMMARY table, and besides imposing the context query also demands that the data rows satisfy a constraint on RUNTYPENAME.

const string& fillOpts

This is a string that can be retrieved from DbiResultSet when filling each row, so it could be used to program the way an object fills itself e.g. by only filling certain columns. The DatabaseInterface plays no part here; it merely provides this way to communicate between the query maker and the author of the class that is being filled.

Accessing the Results Table

Accessing the results of an Extended Context query is essentially the same as for a standard query, but with the following caveats:-


Error Handling

Response to Errors

All DbiResultPtr constructors, except the default constructor, have an optional argument:-

 Dbi::AbortTest abortTest = Dbi::kTableMissing

Left at its default value, any query that attempts to access a non-existent table will abort the job. The other values that can be supplied are:-

kDisabled
Never abort. This value is used for the default constructor.

kDataMissing
Abort if the query returns no data. Use this option with care and only if further processing is impossible.

Currently aborting means just that; there is no graceful shut down and saving of existing results. You have been warned!
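For example, a hedged sketch of a query that never aborts (the task is left at its default of 0):-

   DbiResultPtr<DbiDemoData1> myResPtr(vc, 0, Dbi::kDisabled);
   if ( myResPtr.GetNumRows() == 0 ) {
     // Missing table or missing data; deal with it without aborting the job.
   }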


Error Logging

Errors from the database are recorded in a DbiExceptionLog. There is a global version that records all errors. The contents can be printed as follows:-

  #include "DatabaseInterface/DbiExceptionLog.h"

  ...

  MSG("MyPackage",Msg::kInfo) << "Contents of the Global Exception Log: \n"
             <<  DbiExceptionLog::GetGELog();

Query results are held in a DbiResult and each of these also holds a DbiExceptionLog of the errors (if any) recorded when the query was made. If myResPtr is a DbiResultPtr, then to check and print associated errors:-

  const DbiExceptionLog& el(myResPtr.GetResult()->GetExceptionLog());
  if ( el.Size() == 0 ) MSG("MyPackage",Msg::kInfo) << "No errors found" << endl;
  else                  MSG("MyPackage",Msg::kInfo) << "Following errors found" << el << endl;

Creating New Tables

Choosing Table Names

The general rule is that a table name should match the DbiTableRow subclass object that it is used to fill. For example the table PLEXRAWCHANNELTOPINDIODE corresponds to the class PlexRawChannelToPinDiode. Beyond this the following are required as we need to remain compatible with ORACLE:-

These restrictions also apply to column names.

Creating Detector Descriptions

A Simple Example

Creating new Detector Descriptions involves the creation of a database table and the corresponding table row class. The main features can be illustrated using the example we have already studied: DbiDemoData1. Recall that its state data is:-

      Int_t  fSubSystem;
    Float_t  fPedestal;
    Float_t  fGain1;
    Float_t  fGain2;

Its database table, which bears the same name, is defined, in MySQL, as:-

CREATE TABLE DBIDEMODATA1(
         SEQNO INTEGER not null,
         ROW_COUNTER INTEGER not null,
         SUBSYSTEM  INT,
         PEDESTAL   FLOAT,
         GAIN1      FLOAT,
         GAIN2      FLOAT,
         primary key(SEQNO,ROW_COUNTER));

As you can see there is a simple 1:1 correspondence between them, except that the database table has two additional leading entries:-

         SEQNO INTEGER not null,
         ROW_COUNTER INTEGER not null,

and a trailing entry:-

         primary key(SEQNO,ROW_COUNTER));

ROW_COUNTER is a column whose value is generated by the interface; it isn't part of the table row class. Its sole purpose is to ensure that every row in the table is unique, an important design constraint for any database. This is achieved by ensuring that, for a given SEQNO, each row has a different value of ROW_COUNTER. This allows the combination of these two values to form a primary (unique) key, which is declared in the trailing entry.

All database tables supported by the DatabaseInterface have an auxiliary Validity Range Table that defines validity ranges for them. Each validity range is given a unique sequence number that acts as a key and corresponds to SEQNO. In our case (indeed in every case, apart from the table name) the definition is:-

create table  DBIDEMODATA1VLD(
         SEQNO integer not null primary key,
         TIMESTART datetime not null,
         TIMEEND datetime not null,
         DETECTORMASK tinyint(4),
         SIMMASK tinyint(4),
         TASK integer,
         AGGREGATENO integer,
         CREATIONDATE datetime not null,
         INSERTDATE datetime not null,
         key TIMESTART (TIMESTART), 
         key TIMEEND (TIMEEND));

When the DatabaseInterface looks for data that is acceptable for a given validity it:-

  1. Matches the validity to an entry in the appropriate Validity Range Table and gets its SeqNo.

  2. Uses SeqNo as a key into the main table to get all the rows that match that key.

So, as a designer, you need to be aware that the sequence number and the row counter must be the first two columns in the database table, but they are not reflected in the table row class.

Filling a table row object from the database is done using the class's Fill method. For our example:-

void DbiDemoData1::Fill(DbiResultSet& rs,
                        const DbiValidityRec* vrec) {

  rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;

}

The table row object is passed a DbiResultSet which acts rather like an input stream. The sequence number has already been stripped off; the class just has to fill its own data members. The DatabaseInterface does type checking (see the next section) but does not fail if there is a conflict; it just produces a warning message and puts default data into the variable to be filled.

The second argument is a DbiValidityRec which can, if required, be interrogated to find out the validity of the row. For example:-

  const VldRange& range = vrec->GetVldRange();

vrec may be zero, but only when filling DbiValidityRec objects themselves. On all other occasions vrec should be set.


Creating a Database Table

The previous section gave a simple MySQL example of how a database table is defined. There is a bit more about MySQL in section 8.7. The table name normally must match the name of the table row class that it corresponds to, with the exception of case: the class may be mixed case e.g. DbiDemoData1 but the table must be upper case e.g. DBIDEMODATA1. There is a strict mapping between database column types and table row data members, although in a few cases one column type can be used to load more than one type of table row member. Table 8.1 gives the recommended mapping between table row, MySQL column type and ORACLE column type.


Table 8.1: Recommended table row and database column type mappings

Table Row Type   MySQL Type       ORACLE Type      Comments
Bool_t           CHAR             CHAR
Char_t           CHAR             CHAR
Char_t*          CHAR(n) n<4      CHAR(n) n<4
Char_t*          TEXT             VARCHAR(4000)    n>3
string           TEXT             VARCHAR(4000)
Short_t          TINYINT          NUMBER(4)        8 bit capacity
Short_t          SMALLINT         NUMBER(6)        16 bit capacity
Int_t            TINYINT          NUMBER(4)        8 bit capacity
Int_t            SMALLINT         NUMBER(6)        16 bit capacity
Int_t            INT or INTEGER   NUMBER(11)       32 bit capacity
Float_t          FLOAT            FLOAT(32)
Double_t         DOUBLE           FLOAT(64)
VldTimeStamp     DATETIME         DATE


Notes

  1. To save table space, select CHAR(n) for character strings with 3 or fewer characters and select the smallest capacity for integers.

  2. The long (64 bit) integer forms are not supported as on Intel processors they are only 4 bytes long.

  3. Although MySQL supports unsigned values, we banned them when attempting to get a previous interface to work with ORACLE, so unsigned database column types should be avoided. It is allowed to have unsigned values in the table row when a signed value is not appropriate, and the interface will correctly handle I/O to the signed value in the database even if the most significant bit is set, i.e. the signed value in the database is negative. It is unfortunate that the signed value in the database will look odd in such cases.


Designing a Table Row Class

Here is a list of the requirements for a table row class.

Must inherit from DbiTableRow
All table row objects must publicly inherit from the abstract class DbiTableRow. DbiTableRow does provide some default methods even though it is abstract.

Must provide a public default constructor
e.g.:-
DbiDemoData1::DbiDemoData1() { }
The DatabaseInterface needs to keep an object of every type of table row class.

Must implement CreateTableRow method
e.g.:-
  virtual DbiTableRow* CreateTableRow() const { 
                              return new DbiDemoData1; }
The DatabaseInterface uses this method to populate results tables.

May overload the GetIndex method
As explained in section 8.3.2 the ordering of rows in a table is determined by the way data is written to the database. Where that does not form a natural way to access it, table row objects can declare their own index using:-

UInt_t GetIndex(UInt_t defIndex) const

DbiDemoData2 provides a rather artificial example:-

   UInt_t GetIndex(UInt_t defIndex) const { return fSubSystem/10; }

and is just meant to demonstrate how a unique index could be extracted from some packed identification word.

The following is required of an index:-

GetIndex returns an unsigned integer as the sign bit has no special significance, but it's O.K. to derive the index from a signed value, for example:-

Int_t PlexStripEndId::GetEncoded() const

would be a suitable index for tables indexed by strip end.

Must implement Fill method
This is the way table row objects get filled from a DbiResultSet that acts like an input stream. We have seen a simple example in DbiDemoData1:-

void DbiDemoData1::Fill(DbiResultSet& rs,
                        const DbiValidityRec* vrec) {

  rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;
   
}

However, filling can be more sophisticated. DbiResultSet provides the following services:-

      string DbiResultSet::CurColName() const;  
      UInt_t DbiResultSet::CurColNum() const;   
      UInt_t DbiResultSet::NumCols() const;     
DbiFieldType DbiResultSet::CurColFieldType() const;

The first 3 give you the name of the current column, its number (numbering starts at one), and the total number of columns in the row. DbiFieldType can give you information about the type, concept and size of the data in this column. In particular you can see if two are compatible i.e. of the same type:-

Bool_t DbiFieldType::IsCompatible(DbiFieldType& other) const;

and if they are of the same capacity i.e. size:-

Bool_t DbiFieldType::IsSmaller(DbiFieldType& other) const;

You can create DbiFieldType objects e.g:-

DbiFieldType myFldType(Dbi::kInt)

see enum Dbi::DataTypes for a full list, to compare with the one obtained from the current row.

In this way filling can be controlled by the names, numbers and types of the columns. The Fill method of DbiDemoData1 contains both ``dumb'' (take the data as it comes) and ``smart'' (look at the column name) versions of the code. Here is the latter:-

Int_t numCol = rs.NumCols();

//  The first column (SeqNo) has already been processed.
for (Int_t curCol = 2; curCol <= numCol; ++curCol) {
  string colName = rs.CurColName();
  if (      colName == "SubSystem" ) rs >> fSubSystem;
  else if ( colName == "Pedestal" )  rs >> fPedestal;
  else if ( colName == "Gain1" )     rs >> fGain1;
  else if ( colName == "Gain2" )     rs >> fGain2;
  else {
    MSG("Dbi",Msg::kDebug) << "Ignoring column " << curCol 
                           << "(" << colName << ")"
 		           << "; not part of DbiDemoData1" << endl;
    rs.IncrementCurCol();
  }
}

Being ``smart'' comes at a price; if your table has many rows valid at at time, defensive programming like this can cost performance!

In such cases, and if the table only exists in a few variants, it's better to determine the variant and then branch to code that hardwires that form.

Other services that DbiResultSet offers are:-

      UInt_t DbiResultSet::CurRowNum() const;    
      Bool_t DbiResultSet::IsExhausted() const;  
      string DbiResultSet::TableName();

These tell you the current row number, whether there is no data left and the name of the table.

Also note that it is not a rule that database columns and class data members have to be in a 1:1 correspondence. So long as the table row can satisfy its clients (see below) it can store information derived from the database table rather than the data itself.

May overload the CanL2Cache method
As explained in section 8.1 the Level 2 cache allows table loading to be speeded up by caching the query results as disk files. Only certain tables support this option which by default is disabled. To enable it the table row object overrides this method as follows:-

Bool_t CanL2Cache() const { return kTRUE; }

Only table row classes whose data members are built-in data types (ints, floats and chars) should do this. Table rows having objects or dynamic data e.g. strings or pointers must not claim to support L2 caching. Note that the table row doesn't need code to save/restore to the cache; this is handled by the DbiTableProxy.

Must Provide Services to its Clients
There would not be much point in its existence otherwise, would there? However it's not necessarily the case that all it does is to provide direct access to all the data that came from the table. This subject is explored in the next section.

The LinkDef.h file

For each Table Row class you must add an entry to the package's LinkDef.h file so that rootcint can supply all the additional methods, such as Streamer, needed to integrate the class into the ROOT framework. For example, DatabaseInterface/test/LinkDef.h contains the lines:-

#pragma link C++ class DbiResultPtr<DbiDemoData1>;
#pragma link C++ class DbiResultPtr<DbiDemoData2>;
#pragma link C++ class DbiResultPtr<DbiDemoData3>;

Caution: if you have a class Yyy that has a DbiResultPtr<Xxx> data member, and Yyy and Xxx are in the same package, the line:-

#pragma link C++ class DbiResultPtr<Xxx>;

must precede:-

#pragma link C++ class Yyy;

The GNUmakefile file

The additions to the LinkDef.h file tell rootcint what new classes you have introduced to access the database, but you also have to ensure that the make system supplies rootcint with the DatabaseInterface template class headers. This is actually a generic problem that occurs whenever one package uses template headers from another. The solution is to add the required header to the CINTLIST and to use the make vpath command to tell make where to find it. For example, to add DbiResultPtr.h:-

ROOTCINT := YES
vpath DatabaseInterface/DbiResultPtr.h $(SRT_PRIVATE_CONTEXT)/include/ \
                                       $(SRT_PUBLIC_CONTEXT)/include/
CINTLIST += DatabaseInterface/DbiResultPtr.h

The first line is as normal and switches on rootcint, supplying it with all the package's headers, which are stored in the variable CINTLIST. The vpath command tells make to look for DatabaseInterface/DbiResultPtr.h first in the SRT private and then in the public context. That ensures that searching is in the correct order; private supersedes public. The last line adds the header to the list of headers passed to rootcint.

Data Encapsulation

A table row object is the gateway between a database table and the end users who want to use the data it contains. As with any good OO design, the aim should be to hide implementation and only expose the abstraction. There is nothing wrong in effectively giving a 1:1 mapping between the columns of the database table and the getters in the table row object if that is appropriate. For example, a table that gives the position of each plane in a detector is going to have an X, Y and Z both in the database and in the getters. However, at the other extreme there is calibration. It is going to be well into detector operation before the best form of calibration has been found, but it would be bad design to constantly change the table row getters. It is far better to keep the data in the database table very generic, for example:-

    SeqNo      int,
    SubSystem  int,
    CalibForm  int,
    parm0      float,
    parm1      float,
    parm2      float,
    ...

The significance of parm0,... depends on CalibForm. The table row object could then provide a calibration service:-

Float_t Calibrate(Float_t rawValue) const;
rather than expose parm0,... Calibrate() would have code that tests the value of CalibForm and then uses the appropriate formula involving parm0,... Of course some validation code will want to look at the quality of the calibration by looking at the calibration constants themselves, but this too could be abstracted into a set of values that hide the details of the form of the calibration.
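As a sketch of what such a service might look like (the class name MyCalibRow, the data members fCalibForm, fParm0, fParm1, fParm2 and the formulae are purely illustrative):-

  Float_t MyCalibRow::Calibrate(Float_t rawValue) const {
    // fCalibForm records which formula the parameters describe.
    switch ( fCalibForm ) {
    case 0:  return fParm0 + fParm1*rawValue;                      // linear
    case 1:  return fParm0 + rawValue*(fParm1 + rawValue*fParm2);  // quadratic
    default: return rawValue;                                      // unknown form: leave uncalibrated
    }
  }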


Filling Tables

Overview

DatabaseInterface can be used to write back into any table from which it can read. To do this you need the services of a DbiWriter. In the case of Algorithm Configuration there is a convenient interface using the DbiConfigStream object - see section 8.3.4.

For all other cases you will need to explicitly use a DbiWriter which is a templated class like DbiResultPtr, so for example, to write DbiDemoData1 rows you need an object of the class:-

DbiWriter<DbiDemoData1>

DbiWriter only fills tables, it does not create them

Always create new tables with mysql before attempting to fill them

If you want to create the tables within the same job as the one that fills it then you can do so as follows:-

  int  dbNo = 0; // or whatever cascade entry you need
  DbiStatement* s = DbiTableProxyRegistry::Instance()
                     .GetCascader()
                       .CreateStatement(dbNo);
  s->ExecuteUpdate("create table if not exists xxxVLD ("
                   " SEQNO         int not null primary key auto_increment,"
                   " TIMESTART     datetime not null,"
                   " TIMEEND       datetime not null,"
                   " DETECTORMASK  tinyint,"
                   " SIMMASK       tinyint,"
                   " TASK          int,"
                   " AGGREGATENO   int,"
                   " CREATIONDATE  datetime not null,"
                   " INSERTDATE    datetime not null,"
                   " key TIMESTART (TIMESTART),"
                   " key TIMEEND (TIMEEND) )");

  s->ExecuteUpdate("create table if not exists xxx ("
                   " SEQNO integer not null,"
                   " ROW_COUNTER INTEGER not null,"
                     ....
                   " primary key(SEQNO,ROW_COUNTER))");

  delete s;  //Drop statement to release connection.
  s = 0;
 
   where  xxx = name of your table

In outline the filling procedure is as follows:-

  1. Decide the validity range of the data to be written and store it in a VldRange object.

  2. Instantiate a DbiWriter object using this VldRange object together with an aggregate number and task. Aggregate numbers are discussed below.

  3. Pass filled DbiTableRow sub-class objects (e.g. DbiDemoData1) to the DbiWriter. It in turn will send these objects their Store message that performs the inverse of the Fill message. DbiWriter caches the data but performs no database I/O at this stage.

  4. Finally send the DbiWriter its Close message which triggers the output of the data to the database.

The fact that I/O does not occur until all data has been collected has a couple of consequences:-

Aggregate Numbers

As explained in Concepts (see section 8.1) some types of data are written for the entire detector as a single logical block. For example the way PMT pixels map to electronics channels might be written this way. On the other hand, if it is written in smaller, sub-detector, chunks then it is Aggregated. For example light injection constants come from pulser data and it is quite possible that a calibration run will only pulse some LEDs and so only part of a full detector set of constants gets written to the database for the run. Each chunk is called an aggregate and given an aggregate number which defines the sub-section of the detector it represents. For pulser data, the aggregate number will probably be the logical (positional) LED number. A single DbiWriter can only write a single aggregate at a time, for every aggregate can in principle have a different validity range. For unaggregated data the aggregate number is -1; for aggregated data the numbers start at 0,1,2...

The way that the DatabaseInterface assembles all valid data for a given context is as follows:-

This has two consequences:-

It is perfectly possible that a calibration procedure might produce database data for multiple aggregates at a single pass. If you are faced with this situation and want to write all aggregates in parallel, then simply have a vector of DbiWriters indexed by aggregate number and pass rows to the appropriate one, as sketched below. See DbiValidate::Test_6() for an example of this type of parallel processing.
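A minimal sketch of that pattern, assuming purely for illustration 4 aggregates that share a single VldRange range and Dbi::Task task, and a filled row object row:-

  // One DbiWriter per aggregate, indexed by aggregate number.
  vector< DbiWriter<DbiDemoData1>* > writers;
  for ( Int_t aggNo = 0; aggNo < 4; ++aggNo )
    writers.push_back(new DbiWriter<DbiDemoData1>(range,aggNo,task));

  // Route each filled row to the writer for its aggregate.
  *writers[row.GetAggregateNo()] << row;

  // Finally close (and delete) every writer to trigger the output.
  for ( UInt_t i = 0; i < writers.size(); ++i ) {
    writers[i]->Close();
    delete writers[i];
  }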

Simple Example

We will use the class DbiDemoData1 to illustrate each of the above steps.

  1. Set up VldRange object.
    Typically the VldRange will be based on the VldContext for the event data that was used to generate the database data that is to be stored. For our example we will assume that DbiDemoData1 represents calibration data derived from event data. It will be valid for 1 week from the date of the current event and be suitable for the same type of data.

    VldContext now;  // Event context e.g. CandHandle::GetVldContext()
    VldTimeStamp start = now.GetTimeStamp();
    // Add 7 days (in secs) to get end date.
    time_t vcSec = start.GetSec() + 7*24*60*60;
    VldTimeStamp    end(vcSec,0);
    // Construct the VldRange.
    VldRange      range(now.GetDetector(),
                        now.GetSimFlag(),
                        start,
                        end,
                        "Demo");
    

  2. Instantiate a DbiWriter.
    Create a DbiDemoData1 writer for unaggregated data, task 0.

    Int_t aggNo = -1;
    Dbi::Task task = 0;
    // Decide a creation date (default value is now)
    VldTimeStamp create;
    DbiWriter<DbiDemoData1> writer(range,aggNo,task,create);
    

  3. Pass filled DbiDemoData1 objects.

    // Create some silly data.
    DbiDemoData1 row0(0,10.,20.,30.); 
    DbiDemoData1 row1(0,11.,21.,31.);
    DbiDemoData1 row2(0,12.,22.,32.);
    
    // Store the silly data.
    writer << row0;
    writer << row1;
    writer << row2;
    

    The DbiWriter will call DbiDemoData1's Store method which is rather similar to its Fill method:-

    void DbiDemoData1::Store(DbiOutRowStream& ors,
                             const DbiValidityRec* vrec) const {
      ors << fSubSystem << fPedestal << fGain1 << fGain2;
    }
    

    Again notice that the SeqNo, which is part of the table row, but not part of the class data, is silently handled by the system.

  4. Send the DbiWriter its Close message.

    writer.Close();
    

Using DbiWriter

Table Row Responsibilities

All DbiTableRow sub-class objects must support the input interface accessed through DbiResultPtr. The responsibilities that this implies are itemised in section 8.4.2. The output interface is optional; the responsibilities listed here apply only if you want to write data to the database using this interface.

Must override GetAggregateNo method if aggregated

DbiTableRow supplies a default that returns -1. The GetAggregateNo method is used to check that table row objects passed to a particular DbiWriter have the right aggregate number.
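For example, if each row's aggregate corresponds to the sub-system it describes, the override could be as simple as (the choice of fSubSystem is purely illustrative):-

  Int_t GetAggregateNo() const { return fSubSystem; }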

Must override Store Method

The Store method is the inverse to Fill although it is passed a DbiOutRowStream reference:-

void Store(DbiOutRowStream& ors, const DbiValidityRec* vrec) const;
rather than a DbiResultSet reference. Both these classes inherit from DbiRowStream so the same set of methods:-

      string DbiResultSet::CurColName() const;  
      UInt_t DbiResultSet::CurColNum() const;   
      UInt_t DbiResultSet::NumCols() const;     
DbiFieldType DbiResultSet::CurColFieldType() const;
      UInt_t DbiResultSet::CurRowNum() const;    
      string DbiResultSet::TableName();

are available. So, as with the Fill method, there is scope for Store to be ``smart''. The quotes are there because it often does not pay to be too clever! Also, like the Fill method, it is passed a DbiValidityRec pointer (which is only zero when filling DbiValidityRec objects) so that the validity of the row can be accessed if required.

Creating and Writing Temporary Tables

It is possible to create and write temporary tables during execution. Temporary tables have the following properties:-

Temporary tables are a good way to try out new types of table, or different types of data for an existing table, without modifying the database. Writing data is as normal, by means of a DbiWriter; however before you write data you must locate a database in the cascade that will accept temporary tables and pass it a description of the table. This is done using the DbiCascader method CreateTemporaryTable. You can access the cascader by first locating the singleton DbiTableProxyRegistry which is in overall charge of the DatabaseInterface. The following code fragment shows how you can define a new table for DbiDemoData1:-

#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"

...

//  Ask the singleton DbiTableProxyRegistry for the DbiCascader.
  const DbiCascader& cascader 
               = DbiTableProxyRegistry::Instance().GetCascader();

//  Define the table.
  string tableDescr = "(SEQNO INT,   SUBSYSTEM  INT, PEDESTAL FLOAT,"
                      " GAIN1 FLOAT, GAIN2 FLOAT )";
//  Ask the cascader to find a database that will accept it.
  Int_t dbNoTemp = cascader.CreateTemporaryTable("DbiDemoData1",
                                                 tableDescr);

  if ( dbNoTemp < 0 ) {
     cout << "No database will accept temporary tables." << endl;
  }

You pass CreateTemporaryTable the name of the table and its description. The description is a parenthesised, comma-separated list. It follows the syntax of the MySQL CREATE TABLE command, see section 8.7.

In principle not every database in the cascade will accept temporary tables, so the cascader starts with the highest priority one and works down until it finds one, returning its number in the cascade. It returns -1 if it fails. For this to work properly the first entry in the cascade must accept it so that it will be taken in preference to the true database. It is recommended that the first entry be the temp database, for everyone has write-access to that and write-access is needed to create even temporary tables. So a suitable cascade might be:-

setenv ENV_TSQL_URL "mysql://pplx2.physics.ox.ac.uk/temp;\
                     mysql://pplx2.physics.ox.ac.uk/offline"

Having found a database and defined the new or replacement table, you can now create a DbiWriter and start writing data as described in section 8.5. You have to make sure that the DbiWriter will output to the correct database, which you can either do by specifying it using the 5th argument of its constructor:-

   DbiWriter(const VldRange& vr,
             Int_t aggNo,
             Dbi::Task task = 0,
             VldTimeStamp creationDate = VldTimeStamp(),
             UInt_t dbNo = 0);

or alternatively you can set it after construction:-

  DbiWriter<DbiDemoData1> writer(range,aggNo);
  writer.SetDbNo(dbNoTemp);

As soon as the table has been defined it will, as explained above, completely replace any permanent table in the same database with the same name. However, if there is already data in the cache for the permanent table then it may satisfy further requests for data. To prevent this from happening you can clear the cache as described in the next section.

Do NOT write permanent data to any temporary database for it could end up being used by anyone who includes the database for temporary tables. Database managers may delete any permanent tables in temporary databases without warning in order to prevent such problems.

Clearing the Cache

Normally you would not want to clear the cache, after all it is there to improve performance. However if you have just created a temporary table as described above, and it replaces an existing table, then clearing the cache is necessary to ensure that future requests for data are not satisfied from the now out of date cache. Another reason why you may want to clear the cache is to study database I/O performance.

Although this section is entitled Clearing the Cache, you cannot actually do that as the data in the cache may already be in use and must not be erased until its clients have gone away. Instead the data is marked as stale, which is to say that it will be ignored for all future requests. Further, you don't clear the entire cache, just the cache associated with the table that you want to refresh. Each table is managed by a DbiTableProxy that owns a DbiCache. Both DbiWriter and DbiResultPtr have a TableProxy method to access the associated DbiTableProxy. The following code fragment shows how to set up a writer and mark its associated cache as stale:-

  DbiWriter<DbiDemoData1> writer(range,aggNo);
  writer.SetDbNo(dbNoTemp);
  writer.TableProxy().GetCache()->SetStale();

ASCII Flat Files and Catalogues

Overview

ASCII flat files and catalogues provide a convenient way to temporarily augment a database with additional tables under your control. A flat file is a file that contains, in human readable form, the definition of a table and its data. It can be made an entry in a cascade and, by placing it before other entries, allows you to effectively modify the database just for the duration of a single job. As has already been explained, for each Main Data Table there is also an auxiliary Validity Range Table, so you need 2 entries in the cascade for each table you want to introduce. The problem with this scheme is that, if introducing a number of tables, the cascade could get rather large. To avoid this, catalogues are used. A catalogue is actually nothing more than a special ASCII flat file, but each row of its data is a URL for another ASCII flat file that becomes part of the same cascade entry. In this way a single cascade entry can consist of an arbitrary number of files.

Flat Files

An ASCII flat file defines a single database table.

Format

The format is sometimes referred to as Comma Separated Value (CSV). Each line in the file corresponds to a row in the table. As you might suspect, values are separated by commas, although you can add additional white space (tabs and spaces) to improve readability (but heed the caution in section 8.6.4). The first row is special: it contains the column names and types. The types must be valid MySQL types, see table 8.1 for some examples. If the special row is omitted or is invalid then the column names are set to C1, C2, ... etc. and all types are set to string (TEXT). Here is a simple example of a CSV file:-

SeqNo int, Pedestal float, SubSystem int, Gain1 float, Gain2 float
1,    1.0,    0,    10.,    100.
1,    1.1,    1,    11.,    110.
1,    1.2,    2,    12.,    120.
1,    1.3,    3,    13.,    130.

It is conventional to use the file extension .csv, but it is not compulsory.

If any value is a string or a date, it must be delimited by double quotes.
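For example, a flat file with a string and a date column might look like this (the Label and Updated columns are invented for illustration; note that there is no white space between a comma and a leading double quote, see also the caution in section 8.6.4):-

SeqNo int, Label text, Updated datetime
1,"first entry","2001-12-31 04:05:06"
2,"second entry","2002-01-15 12:00:00"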

URL

The database URL is based on the standard one extended by adding the suffix

#absolute-path-to-file

For example:-

mysql://fnpcsrv1.fnal.gov:3307/temp#/farm/minsoft2/Minossoft/minossoft/releases/R1.24.3/Production/asciidb/cedar_daikon/bhcurv/BFLDDBIPLANEMAP.csv

The table name is derived from the file name after stripping off the extension. In this example, the table name will be BFLDDBIPLANEMAP.

Catalogues

These are special types of ASCII Flat File. Their data are URLs to other flat files. You cannot nest them, i.e. one catalogue cannot contain a URL that is itself a catalogue.

Format

The first line of the file just contains the column name ``name''. The remaining lines are URLs of the flat files. Here is a simple example:-

name
file:/home/minos/work/MyData.csv
file:/home/minos/work/MyDataVld.csv
file:$MY_ENV/MyDataToo.csv
file:$MY_ENV/MyDataTooVld.csv

This catalogue defines two tables MyData and MyDataToo, each with its associated auxiliary validity range table. Note that file names must be absolute but can begin with an environment variable, including one defined by the DBI:-

DBI_CATALOGUE_PATH

which is always the directory holding the catalogue currently being processed. If the above example is written:-

name
file:$DBI_CATALOGUE_PATH/MyData.csv
file:$DBI_CATALOGUE_PATH/MyDataVld.csv
file:$DBI_CATALOGUE_PATH/MyDataToo.csv
file:$DBI_CATALOGUE_PATH/MyDataTooVld.csv

and all the files and the catalogue are held in the same directory, then they can be moved en bloc somewhere else and only the catalogue URL needs to be updated.

URL

The URL is identical to any other flat file with one additional constraint: the extension must be .cat or .db. Here are some valid examples:-

mysql://#/home/minos/work/MyCatalogue.db
file:/home/minos/work/MyCatalogue.cat


Example

The stand-alone testing of the Database Interface includes an example of an ASCII Catalogue. The URL of the cascade entry is:-

mysql://localhost/test#$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DemoASCIICatalogue.db

If you look at the file:-

   $SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DemoASCIICatalogue.db

you will see it contains 4 lines, defining the tables DEMOASCIIDATA (a Detector Description table) and DEMOASCIICONFIG (an Algorithm Configuration table):-

file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIIDATA.csv
file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIIDATAVLD.csv
file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIICONFIG.csv 
file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIICONFIGVLD.csv

In both cases, the auxiliary validity range table defines a single validity range, although there is no reason why it could not have defined any number. For the DEMOASCIIDATA, there are 5 rows, a header row followed by 4 rows of data:-

SEQNO INT, UNWANTED INT, PEDESTAL FLOAT, SUBSYSTEM INT, GAIN1 FLOAT, GAIN2 FLOAT
1,99,1.0,0,10.,100.
1,99,1.1,1,11.,110.
1,99,1.2,2,12.,120.
1,99,1.3,3,13.,130.

For the DEMOASCIICONFIG table, there are only two rows:-

SEQNO INT, CONFIGSTRING TEXT
1,"mybool=1 mydouble=1.23456789012345678e+200 mystring='This is a string' myint=12345"

Caution: do not put any white space between the comma and the leading double quote of the configuration string.


MySQL Crib

This provides the absolute bare minimum to install, manage and use a MySQL database in the context of the DatabaseInterface.

Introduction

The following are useful URLs:-

A good book on MySQL is:-

MySQL by Paul DuBois, Michael Widenius. New Riders Publishing; ISBN: 0-7357-0921-1

Installing

See:-

http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/external_products.html

Running mysql

mysql is a utility, used both by system administrators and users, to interact with a MySQL database. The command syntax is:-

mysql [-h host_name] [-u user_name] [-pyour_pass]

If you are running on the server machine, with your Unix login name and no password, then:-

mysql

is sufficient. To exit type:-

\q

Note: most mysql commands are terminated with a semi-colon. If nothing happens when you type a command, the chances are that mysql is still waiting for it, so type it and press return again.

System Administration

This also has to be done as root. MySQL allows the system administrator to control access to databases on a user by user basis. Here are some example commands:-

create database Minos_db1;
grant all on    Minos_db1.*       to  smart@pplx2.physics.ox.ac.uk;
grant all on    Minos_db1.*       to  smart@"%";
grant select on Minos_db1.Boring  to  dumb@pplx2.physics.ox.ac.uk;
\q

Selecting a Database

Before you can use mysql to create, fill or examine a database table you have to tell it what database to use. For example:-

use Minos_db1

`use' is one of the few commands that does not have a trailing semi-colon.

Creating Tables

The following commands create, or recreate, a table and display a description of it:-

drop table if exists MyTable;
create table MyTable(
    MyInt      int,
    MyFloat    float,
    MyText     text
 );
describe MyTable;

See table 8.1 for a list of MySQL types that the DatabaseInterface currently supports.

Filling Tables

The following commands add data from the file DemoData1.dat to an existing table:-

load data local infile 'DemoData1.dat' into table DbiDemoData1;

Each line of the file corresponds to a row in the table. Columns should be separated with tabs. Table 8.2 shows typical formats of the various data types.


Table 8.2: Example data formats.

MySQL Type        Example value
CHAR              a
TINYINT           -128
SMALLINT          -32768
INT or INTEGER    -2147483647
FLOAT             -1.234567e-20
DOUBLE            1.23456789012345e+200
TEXT              'This is a string'
DATETIME          '2001-12-31 04:05:06'
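As an illustration, a data file for the DbiDemoData1 table used elsewhere in this chapter (columns SEQNO, SUBSYSTEM, PEDESTAL, GAIN1 and GAIN2) might contain tab-separated lines such as (values invented):-

1	0	1.0	10.	100.
1	1	1.1	11.	110.
1	2	1.2	12.	120.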


Making Queries

Here is a sample query:-

select * from DbiDemoData2Validity where
    TimeStart <= '2001-01-11 12:00:00'
and TimeEnd    > '2000-12-22 12:00:00'
and DetectorMask & 4
order by TimeStart desc
;

Performance


Holding Open Connections

Connections to the database are either permanent, i.e. open all the time, or temporary, i.e. closed as soon as an I/O operation is complete. A connection is made permanent if:-

[Aside: Before January 2009 connections to an ASCII flat file database were also made permanent, as re-opening such a database involves re-loading all the data. However that overhead is low and is outweighed by the cost of the large number of open connections that would otherwise be required on farms running many jobs.]

In all other cases the connection is temporary so as to minimise resources. For normal operations this adds little overhead as typically there are several major database reads at the start of a production job after which little or no further database I/O occurs. However if you require the connection to remain open throughout the job then you can force any entry in the cascade to be permanent. The following code sets entry 0 in the cascade to have a permanent connection:-

#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"


//  Ask the singleton DbiTableProxyRegistry for the DbiCascader.
  const DbiCascader& cascader 
               = DbiTableProxyRegistry::Instance().GetCascader();
// Request that entry 0 is permanently open.
  cascader.SetPermanent(0);

Note that this won't open the connection but will prevent it from closing after its next use.

If you want all connections to remain open this can be set through the configuration parameter MakeConnectionsPermanent. See section 8.2.2.


Truncated Validity Ranges

Standard context specific queries are first trimmed to a time window to limit the number of VLD records that have to be analysed. Having established the best data, a further 4 calls to query the VLD table are made to determine the full validity. For data with long validities, these extra calls are worthwhile as they can significantly increase the lifetime of the results. However there are two cases where they should not be used:-

The following DbiResultPtr methods support this request:-

  DbiResultPtr(const VldContext& vc,
               Dbi::Task task = Dbi::kDefaultTask,
               Dbi::AbortTest abortTest = Dbi::kTableMissing,
               Bool_t findFullTimeWindow = true);

  DbiResultPtr(const string& tableName,
               const VldContext& vc = Dbi::fgDefaultContext,
               Dbi::Task task = Dbi::kDefaultTask,
               Dbi::AbortTest abortTest = Dbi::kTableMissing,
               Bool_t findFullTimeWindow = true);
  UInt_t NewQuery(VldContext vc, 
               Dbi::Task task=0,
               Bool_t findFullTimeWindow = true);

It is selected by passing in the value false for findFullTimeWindow.
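For example, to make an otherwise standard context query for DbiDemoData1 while skipping the extra VLD calls (a minimal sketch; vc is a VldContext obtained as in section 8.5):-

  DbiResultPtr<DbiDemoData1> myResPtr(vc,
                                      Dbi::kDefaultTask,
                                      Dbi::kTableMissing,
                                      false);  // findFullTimeWindow = false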

Timing

DbiTimerManager is a static object that provides performance printout when enabled. By default it is enabled but can be disabled by:-

   DbiTimerManager::gTimerManager.Enable(false);

