Last significant change: 2008/01/30This chapter is organised into the following sections.
It is the purpose of the DatabaseInterface to provide simple and efficient access to such data and to provide a framework in which new types of data can be added with minimal effort.
Within the database, data is organised into tables. When the user requests data from a table, the DatabaseInterface collect rows of data from the appropriate table. From the perspective of the interface, there are 3 types of organisation:-
There are two types of aggregation:-
The DatabaseInterface provides a simple, uniform concept regardless of the data being accessed. Each request for data produces a pointer giving read access to a results table, which is effectively a slice of the underlying database table. Each row of the results table is an object, the type of which is table-specific. These table row objects give access to the data from one row but can hide the way the database table is organised. So changes to the physical layout of a database table should only effect its table row object, not the end users of the data. Note that a single request only ever accesses a single table; the interface does not support retrieval of data from multiple database tables simultaneously.
If the request for data fails for some reason, then the resulting table will be empty, otherwise it will have a single row for Simple organisation and more than one row for Compound and Aggregated. The user can ask how many rows the table has and can directly access any of them. The physical ordering of the rows in the table reflects the way the data was originally written, so for Aggregated data, the ordering is not optimised for retrieval. To deal with this, each table row object can declare a natural index, independent of its physical position, and this natural index can be used to retrieve data.
The DatabaseInterface can access data for more than one database. During initialisation it is given a list of database URLs. The list order reflects priority; the interface first looks for data in the first database in the list, but if that fails, tries the others in turn until all have been tried or data is found. This scheme allows a user to override parts of the official database by creating a mini-database with their own data and then placing it in the list ahead of the official database. The concept of a set of overlaying databases is called a cascade.
In principle, any of the data retrieved by the interface could depend on the the current event being processed. Clearly Detector Descriptions, such as calibration constants, will change with time and the interface has to retrieve the right ones for the current event. Although less likely, its possible that Algorithm Configurations too could change to reflect changes in the detector. For this reason, all requests for data through the interface must supply information about the:-
Collectively this information is called the Context and is represented by the VldContext class of the Validity package. Note that in common with event data and times
All Database date and times are in UTC.
In the database all data is tagged by a Validity Range which identifies the types of data and detector and the ranges of date times for which it is valid. This is represented by the VldRange class of the Validity package. Some data, typically Algorithm Configuration, is universal; the same database data can be used for any event. Others may be very specific to a single type of data and detector and a limited date time range. Physically the way this is done is to have a pair of tables:-
This table consists of rows of VldRange objects, each with a unique sequence number which is used as a key into the Main Data Table.
Each row has a sequence number corresponding to an entry in the Validity Range Table.
The interface first finds a match in the Validity Range Table for the current context and then retrieves all rows in the Main Data Table that match its sequence number. The reasons for this two step approach are:-
The primary function of DatabaseInterface is to provide the best information for a specific context, but it can also retrieve information for much more general queries. The query is still broken into two parts: the ``context'' which is matched to the Validity Range Table and then the data from the main table is taken for the selected sequence number(s). However the user can supply a context such as ``All ranges that start between this time and that time'' hence the term ``Extended Context''. Further, during the retrieval of data from the main table addition restrictions can be imposed. The result of an Extended Context query is a collection of rows that will not normally represent the state of the detector at a single moment in time and it is up to the user to interpret the results meaningfully.
As explained in the preceding section, the interface finds the database data that best matches the context of the data. There are occasions when this matching needs to be changed, for example there can be times when Monte Carlo data needs to be treated exactly as if it were event data and this includes the way it retrieves from the database. To support this the user can specify, for any type of data, an associated list of data types. If this is done then, instead of using the current type, each of the alternative types are tried until a match is found. This matching takes precedence over the cascade i.e. all associated types are tried on the first database in the cascade before moving on to the second and subsequent cascade members. This ensures that higher members, which might even refer back to the ORACLE database at FNAL, are only tried as a last resort.
As explained in the previous section, sequence numbers in the Validity Range Table are unique. However this can present a problem if the same type of data is being entered into several different databases. For example calibration constants will be created in the Near, Far and Calibration detectors. Eventually the tables will be merged but it is essential that there is no conflict in the sequence numbers. To solve this problem, certain databases are special: they are able to produce globally unique sequences numbers. They do this as each is allocated a unique block of 10,000,000 sequence numbers ( which is enough to allow a new entry to be made every minute for 20 years!). These blocks are recorded in a special table: GLOBALSEQNO that holds the last used sequence number for each table. The block 1..9,999,999 is used for local sequence numbers i.e. ones that are only guaranteed unique within the current database table.
By default permanent data written to an authorising database will be written with global sequence numbers. For temporary data, or if writing to a non-authorising database, local sequence numbers are used and in this case a LOCALSEQNO table is generated automatically if required.
Important:-
For constants that change with time (if that is not a contradiction in terms!) it makes sense to have overlapping Validity Ranges. For example, suppose we know that a certain sort of calibration constants drifts with time and that, once determined, is only satisfactory for the next week's worth of data. A sensible procedure would be to limit its validity to a week when writing to the database but to determine new constants every few days to ensure that the constants are always ``fresh'' and that there is no danger that there will be a gap. However, this means that the interface has to perform two types of Validity Management:-
Having found the best set, the interface wants to know how long it will remain the best. Any set whose creation date is later will be better according to the above rule and so the retrieved data has its validity trimmed so as not to overlap it. This reduced Validity Range is called the Effective Validity Range. This only happens in memory; the database itself is not modified, but it does mean that the interface does not need to check the database again for this set of data until the Effective Validity Ranges has expired. This trimming also applies between databases in a cascade, with sets in higher priority databases trimming those in lower ones.
As explained above, creation dates play a crucial role in resolving which set of data to use; later creation dates take priority over earlier ones. This scheme assumes that constants from earlier runs are created before constants from later runs, but this isn't always true. When improving e.g. calibration constants, it's quite normal to recalibrate recent runs before going back and fixing earlier ones and then, simply to use the date when the constants were created would mean that the constants from earlier runs would take priority over any later runs they overlapped. To allow constants to be created in any order the interface provides a system for deducing the best creation dates for any constants as follows:-
The scheme means that creation dates always follow that dates of the runs that they correspond to rather than the dates when their constants were created. When using the scheme its probably better to consider the ``dates'' to be version numbers.
The database changes almost constantly to reflect the state of the detector, particularly with regard to the calibration constants. However this can mean that running the same job twice can produce different results if database updates that have occurred between the two runs. For certain tasks, e.g. validation, its necessary to decouple jobs from recent updates and this requires database rollback i.e. restoring the database to a previous state. Rollback works by exploiting the fact that data is not, in general, ever deleted from the database. Instead new data is added and, by the rules of Ambiguity Resolution (see the previous section) supersede the old data.
Data carry two dates: date of insertion into local database and global creation date, which is typically the date when it was originally created and is universal to all databases. Rollback can use either date. Using insert date rollback undoes recent changes to the local database and the effect will therefore be specific to the local database. On the other hand rollback by creation date removes recently created data in the same way from all databases.
One of the interface's responsibilities is to minimise I/O. Some requests, particularly for Detector Configuration, can pull in large amounts of data but users must not load it once at the start of the job and then use it repeatedly; it may not be valid for all the data they process. Also multiple users may want access to the same data and it would be wasteful for each to have their own copy.
To deal with both of the above, the interface reuses the concept of a handle, or proxy, that appears in other packages such as Candidate. The system works as follows:-
For Detector Description data, tables can be large and the user will require direct access to every row. However, the way the table is arranged in memory reflects the way the data was originally written to the database. For Simple and Compound data the table designer can control this organisation as complete sets are written as a single unit. For Aggregated data, the layout reflects the way aggregates are written, for example, for pulser data, all the strip ends illuminated by an LED will appear together in the table. This allows the interface to replace individual aggregates as their validity expires. However this means that the physical layout may not be convenient for access. To deal with this table row objects, which all inherit from DbiTableRow are obliged to return a Natural Table Index, if the physical ordering is not a natural one for access. Tables can then be accessed by this index.
Task will provide a way to further select the type of data retrieved. For example:-
The aim is that Task will allow a particular database table to be sub-divided according to the mode of use. Currently Task is a data type defined in Dbi i.e. Dbi::Task and is implemented as an integer. The default value is zero.
Loading a large table from the database is a lot of work:-
Now as the detector configuration changes slowly with time identically the same process outlined above is repeated many times, in many jobs that process the data, so the obvious solution is to cache the results to disk in some way that can be reloaded rapidly when required. The technique essentially involves making an image copy of the table to disk. It can only be applied to some tables, but these include the Calibration tables which represent the largest database I/O load, and for these tables loading times can be reduced by an order of magnitude.
The interface needs a list of Database URLs, a user name and a password. These can be communicated via a set environmental variables of the form ENV_TSQL*. Here is the set with some example setting:-
setenv ENV_TSQL_URL "mysql://pplx2.physics.ox.ac.uk/offline" setenv ENV_TSQL_USER "west" setenv ENV_TSQL_PSWD ""
The meanings are as follows.
protocol://host[:port]/[database][?options] where: protocol - DBMS type , e.g. mysql etc. host - host name or IP address of database server port - port number database - name of database options - string key=value's separated by ';' or '&' Example: "mysql://myhost:3306/test?Trace=Yes;TraceFile=qq.log"
These variable should be set for the standard read-only configuration. Any job that uses the JobControl package can override any or all of these variables using the switches:-
for example:-
demojob -d mysql://pplx2.physics.ox.ac.uk/offline -u reader ...
replaces ENV_TSQL_URL and ENV_TSQL_USER but takes the password from ENV_TSQL_PSWD. Use of these switches is the recommended way to configure a framework job when write-access to the database is required.
Once the job is running you can configure the DatabaseInterface. To
change any configurable aspect of the interface get a reference to its
CfgConfigurable interface, set parameters as required and
update:-
// DbiTableProxyRegistry is a CfgConfigurable
DbiTableProxyRegistry& dbiCfg = DbiTableProxyRegistry::Instance();
dbiCfg.Set("...");
dbiCfg.Set("...");
dbiCfg.Set("...");
dbiCfg.Update();
On startup the interface checks the environmental variable ENV_DBI
which can contain a semi-colon separated list of configuration requests.
For example:-
setenv ENV_DBI "Level2Cache = '/home/minos/temp/dbi_chache';Rollback:* = '2000-01-01'"
see below for an explanation of what these requests mean.
In the above example the environment string contained a *. To prevent csh and derivative shells (e.g.tcsh) for attempting to expand the * as a set of file names, temporarily define the shell variable nogob:-
set noglob setenv ENV_DBI " ... unset noglob
To impose a global rollback date to say September 27th 2002:-
dbiCfg.Set("Rollback:* = '2002-09-27 00:00:00'");
This will ensure that the interface ignores data inserted after this date for all future queries. The hours, minutes and seconds can be omitted and default to 00:00:00.
Rollback can be more selective, specifying either a single table or a group of tables with a common prefix. For example:-
dbiCfg.Set("Rollback:* = '2002-09-01'");
dbiCfg.Set("Rollback:CAL* = '2002-08-01'");
dbiCfg.Set("Rollback:CALPINTOADC = '2002-07-01'");
Now the table CALPINTOADC is frozen at July 2002, other CAL tables at August and all other tables at September. The ordering of the commands is not important; the interface always picks the most specific one to apply to each table.
By default rollback is by insert date, but it can be changed, on a table by table basis, to creation date using the RollbackType configuration request e.g.
dbiCfg.Set("RollbackType:CAL* = 'INSERTDATE'");
dbiCfg.Set("RollbackType:UGLI* = 'CREATIONDATE'");
The first of these two lines is redundant as this is the default but serves to illustrate how to select both types of date.
Rollback only applies to future queries, it does not invalidate any existing query result in the cache which are still available to satisfy future requests. So impose rollback conditions at the start of the program to ensure they apply consistently.
By default the DatabaseInterface matches data using the context of the current event, and this includes the type of data e.g. Monte Carlo or real. As explained in the Concepts section 8.1.7 this can be overridden. For example:-
dbiCfg.Set("SimFlagAssociation:MC = 'Reroot,MC,Data'");
would mean that for MC data, the interface would try in order: Reroot, then MC and finally Data for the first database cascade member and then the second and so on until a match was found.
Multiple associations can be set up, here is a really silly example:-
dbiCfg.Set("SimFlagAssociation:MC = 'Data'");
dbiCfg.Set("SimFlagAssociation:Data = 'MC'");
By default the DatabaseInterface does not clean up its cache at the end of the program; for to do so would waste time and gain nothing. However, if looking for leaks this then appears as a massive one, so a full shutdown with cache clean up can be enabled by:-
dbiCfg.Set("Shutdown = 1");
By default the DatabaseInterface closes connection to the database between queries, to minimise use of resources - see section 8.8.1. If the job is doing a lot of database I/O, for example creating calibration constants then this may degrade performance in which case all connections can be made permanent by:-
dbiCfg.Set("MakeConnectionsPermanent = 1");
alternatively:-
dbiCfg.Set("MakeConnectionsPermanent = -1");
does the reverse and makes all database connections temporary. A third alternative is:-
dbiCfg.Set("MakeConnectionsPermanent = 0");
which leaves ASCII flat file database connections open as reconnecting involves reloading all ASCII files even if they are not needed to satisfy the current query.
By default when the DatabaseInterface retrieves the data for a Context Query, it does not impose an order on the data beyond requiring that it be in sequence number order. When an ordering is not imposed, the database server is under no obligation to return data in a particular order. This means that the same job running twice connected to the same database could end up with result sets that contain the same data but with different ordering. Normally this doesn't matter, the ordering of rows is not significant. However, results from two such jobs may not be identical as floating point calculations can change at machine level precision if their ordering is changed. There are situations where it is required that the results be identical. For example:-
and for such occasions it is possible to completely specify the ordering of rows within a sequence number by forcing sub-ordering by ROW_COUNTER, a column that should be present in all Main Data tables:-
dbiCfg.Set("OrderContextQuery = 1");
Enabling the Level 2 Cache allows certain large tables query results to be written to disk from which they can be reloaded by subsequent jobs saving as much as an order of magnitude in load time. Data in the cache will not prevent changes in the database from taking affect for the DatabaseInterface does an initial (lightweight) query of the database to confirm that the data in the cache is not stale. To enable the cache, the user specifies a directory to which they have read/write access. For example, to make the current working directory the cache:-
dbiCfg.Set("Level2Cache = './'");
Cache files all have the extension .dbi_cache. Not all tables are suitable for Level 2 caching; the DatabaseInterface will only cache the ones that are.
Cache files can be shared between users at a site to maximise the benefit. In this case the local Database Manager must set up a directory to which the group has read/write access. Management is trivial, should the cache become too large, it can simply be erased and then the next few jobs that run will re-populate it with the currently hot queries.
Note that Cache performance is achieved by doing raw binary I/O so the cache files are platform specific, so if running in a heterogeneous cluster the Database Manager should designate a platform specific directory. To simplify this, the name of the directory used by the cache can include environmental variables e.g.:-
dbiCfg.Set("Level2Cache = '$DBI_L2CACHE/$SRT_ARCH'");
The DatabaseInterface gives you access to two types of data:-
In all cases the user specifies the database table to be accessed and supplies a ``context'' for the query. The context describes the type and date time of the current event. This is stored in a Validity package VldContext object. Candidate and RawData have:-
VldContext GetVldContext() const
methods to get their context. The DatabaseInterface uses the context to extract all the rows from the database table that are valid for this event. It forms the result into a table in memory and returns a object that acts like a pointer to it.
You are NOT responsible for deleting the table; the Database Interface will do that when the table is no longer needed
You have random access to any row of the results table. Each row is an object which is specific to that table. The key to understanding how to get data from a database table is study the class that represent a row of it results table.
As explained above, the key to getting data is to locate the class that represents one row in a database table. To understand how this all works look at one of the sample tables included in the test directory of the DatabaseInterface package and imaginatively called DbiDemoData1, DbiDemoData2 and DbiDemodata3. For purposes of illustration we will pick the first of these. Its header can be found in:-
DatabaseInterface/test/DbiDemoData1.h
To make a query you create a DbiResultPtr object. Its header can be found in:-
DatabaseInterface/DbiResultPtr.h
This is a class that is templated on the table row class, so in this case the instantiated class is:-
DbiResultPtr<DbiDemoData1>
and to instantiate an object of this class you just need a VldContext object. Suppose vc is such an object, then this creates the pointer:-
DbiResultPtr<DbiDemoData1> myResPtr(vc);
This statement creates a DbiResultPtr for DbiDemoData1 class. First it searches through the database for all DbiDemoData1 objects that are valid for vc, then it assembles them into a table and finally passes back a pointer to it. Not bad for one statement! The constructor can take a second argument:-
DbiResultPtr(VldContext vc,Dbi::Task task=0);
Dbi::Task offers a way to sub-divided a table according to the mode of operation. For example a Detector Configuration data could have two modes, one for raw calibration and another for refined calibration.
If the concept of a task is not relevant for a particular database table, then it should be left at its default value of 0. Otherwise data for each task should be allocated a unique positive number and then selection will only pick rows with the required value of task.
The constructor can take further arguments which can normally be left at their default values - a Dbi::AbortTest see section 8.3.6 and a Bool_t findFullTimeWindow see section 8.8.2.
UInt_t GetNumRows() const;
If the query failed then the number of rows returned will be zero. This could either be the result of some catastrophic failure, for example the database could not be opened, or simply that no appropriate data exists for the current event. If you want to know which of these it is you can use the:-
const DbiValidityRec* GetValidityRec() const;
If this returns a null pointer, then the failure was a major one, see 8.3.6. If not then the DbiValidityRec tells you about the validity of the gap. Its method:-
const VldRange& GetVldRange() const;
returns a Validity package VldRange object that can yield the start and end times of the gap. Due to the way the DatabaseInterface forms the query, this may be an underestimate, but never an overestimate.
If the table has rows then the GetVldRange() will give you an object that tells you the validity range of the data. Again, the range may be an underestimate. To get to the data itself, use the method:-
const T* GetRow(UInt_t i) const;
where T = DbiDemoData1 in this case. This gives you a const pointer to
the
row where i is in the range
.
Putting this all together, suppose you have a CandDigitListHandle object cdlh, and you want to loop over all DbiDemoData1 objects that are valid for it, the code is:-
DatabaseInterface/test/DbiDemoData1.h
DatabaseInterface/DbiResultPtr.h
...
DbiResultPtr<DbiDemoData1> myResPtr(cdlh.GetVldContext());
for ( UInt_t irow = 0; irow < myResPtr.GetNumRows(); ++ires) {
const DbiDemoData1* ddd1 = myResPtr.GetRow(irow);
// Process row.
}
GetRow is guaranteed to return a non-zero pointer if the row number is within range, otherwise it returns zero. The ordering of rows reflects the way the data was written to the database. For some types of data this layout is not well suited for access. For example, for pulser data, all the strip ends illuminated by an LED will appear together in the table. To deal with this table row object are obliged to return a Natural Table Index, if the physical ordering is not a natural one for access. You get rows from a table according to their index using the method:-
const T* GetRowByIndex(UInt_t index) const;
You should always check the return to ensure that its non-zero when using this method unless you are absolutely certain that the entry must be present.
Having got to the table row you want, the last job is to get its data. Its up to the table row objects themselves to determine how they will present the database table row they represent. In our example, the DbiDemoData1 is particularly dumb. Its internal state is:-
Int_t fSubSystem;
Float_t fPedestal;
Float_t fGain1;
Float_t fGain2;
which it is content to expose fully:-
Int_t GetSubSystem() const { return fSubSystem; }
Float_t GetPedestal() const { return fPedestal; }
Float_t GetGain1() const { return fGain1; }
Float_t GetGain2() const { return fGain2; }
Its worth pointing out though that it is the job of the table row object to hide the physical layout of the database table and so shield its clients from changes to the underlying database. Its just another example of data encapsulation.
Even though a DbiResultPtr is lightweight it is also reusable;
you can make a fresh query using the NewQuery method:-
UInt_t NewQuery(VldContext vc, Dbi::Task task=0);
which returns the number of rows found in the new query. For example:-
DbiResultPtr<DbiDemoData1> myResPtr(vc); ... VldContext newVc; ... myResPtr.NewQuery(newVc); ...
Having made a query you can also step forwards or backwards to the adjacent validity range using the method:-
UInt_t NextQuery(Bool_t forwards = kTRUE);supply a false value to step backwards. This method can be used to ``scan'' through a database table, for example to study calibration constants changes as a function of time. To use this efficiently you need to request accurate validity ranges for your initial query, although this is the default see section 8.8.2. For aggregated data stepping to a neighbouring range will almost certainly contain some rows in common unless all component aggregates have validity ranges that end on the boundary you are crossing. See the next section for a way to detect changes to data using the DbiResult::GetID() method.
The first, and most important, level of optimisation is done within the DatabaseInterface itself. Each time it retrieves data from the database it places the data in an internal cache. This is then checked during subsequent queries and reused as appropriate. So the first request for a large table of calibration constants may require a lot of I/O. However the constants may remain valid for an entire job and in which case there is no further I/O for this table.
Although satisfying repeat requests for the same data is quick it still requires the location of the appropriate cache and then a search through it looking for a result that it is suitable for the current event. There are situations when even this overhead can be a burden: when processing many rows in a single event. Take for example the procedure of applying calibration. Here every digitization needs to be calibrated using its corresponding row in the database. The naive way to do this would be to loop over the digits, instantiating a DbiResultPtr for each, extracting the appropriate row and applying the calibration. However it would be far more efficient to create a little calibration object something like this:-
class MyCalibrator {
public:
MyCalibrator(const VldContext vc): fResPtr(vc) {}
Float_t Calibrate(const CandDigitHandle cdh) {
/* Use fResPtr to calibrate cdh */
}
private
DbiResultPtr<DbiDemoData1> fResPtr;
};
MyCalibrator is a lightweight object holding only a pointer to a results table. It is created with a VldContex object which it uses to prime its pointer. After that it can be passed CandDigitHandle objects for which it returns calibrated results using its Calibrate method. Now the loop over all digitizations can use this object without any calls to the DatabaseInterface at all. Being lightweight MyCalibrator is fine as a stack object, staying in scope just long enough to do its job.
Another optimisation strategy involves caching results derived from a query. In this case it is important to identify changes in the query results so that the cached data can be refreshed. To aid this, each DbiResult is given an key which uniquely identifies it. This key can be obtained and stored as follows:-
DbiResultKey MyResultKey(myResPtr.GetKey());
This should be stored by value (the DbiResultKey pointed to by GetKey will be deleted when the results expire) as part of the cache and checked each time a change is possible:-
if ( ! MyResultKey.IsEqualTo(myResPtr.GetKey()) ) {
// recreate the cache data ...
MyResultKey = *myResPtr.GetKey();
}
Caution: This tests to see that the current DbiResult has exactly the same data as that used when the cached was filled, but not that it is physically the same object. If there have been intervening queries the original object may have been deleted but this should not matter unless the cache holds pointers back to the DbiResult. In this case the result ID should be used. Initialise with:-
Int_t MyResultID(myResPtr.GetResultID());
and then check as follows:-
if ( MyResultID != (myResPtr.GetResultID()) ) {
// recreate the cache data ...
MyResultID = myResPtr.GetResultID();
}
To create a Registry object and fill it from the set ``MyConfig'' for the software ``MyAlgorithm'':-
Registry reg;
DbiConfigStream stream("MyAlgorithm","MyConfig");
stream >> reg;
DbiConfigStream is a lightweight object that can be discarded as soon as the Registry object has been filled.
You can stop reading this section now unless you are curious as to what is happening inside DbiConfigStream. For Detector Description data, every database table has its own table row class. With Algorithm Configuration, the reverse is true. All data is loaded from a single table (SOFTWARE_CONFIGURATION) using the same table row class: DbiConfigSet, and a DbiConfigStream object provides a simple way to read and write these objects. Also, in contrast to Detector Configuration data, Algorithm Configuration queries yield a simple (i.e. single row) result. The single DbiConfigSet object give access to all of its parameters i.e. columns:-
UInt_t GetNumParams() const;
string GetParamName(UInt_t parNo) const;
DbiFieldType GetParamType(UInt_t parNo) const;
string GetParamValue(UInt_t parNo) const;
Each parameter ends up as an item in the Registry.
By default DbiConfigSet is loaded from the SOFTWARE_CONFIGURATION table which has the following columns:-
['' 'mydouble'=(double)1.234e+100 'myint'=(int)12345 'mystring'=(string)'This is a string' '']
In the above case the Registry would be filled with 3 values:-
mydouble = 1.234e+100
myint = 12345
mystring = This is a string
Normally Algorithm Configuration data will be universally valid and only use the default task so the constructor takes a series of default arguments, but the full set is:-
DbiConfigStream(const std::string& SoftwName,
const std::string& ConfigName="default",
VldContext vc = Dbi::fgDefaultContext,
Dbi::Task task=0,
const std::string& tableName="SOFTWARE_CONFIGURATION");
So it is possible to specify a VldContext and a task and even an alternative table name.
Although database writing is dealt with later in this chapter, this is a convenient place to deal with configuration output. To create new configuration data, or overlay existing ones:-
DbiConfigStream stream("MyAlgorithm","MyNewConfig");
Registry reg; // Fill reg from somewhere... stream << reg;At this stage only the DbiConfigStream has been updated; no database output has been performed.
DbiValidityRec& vrec = stream.GetValidityRec();and modifying/replacing as required.
Bool_t DbiConfigStream::Write(UInt_t dbNo = 0,
const std::string& logComment = "");
specifying the cascade entry number and the comment used
to create a new database log record.
The constructor of a DbiResultPtr for an Extended Context Query is:-
DbiResultPtr(const string& tableName,
const DbiSqlContext& context,
const Dbi::Task& task = Dbi::kAnyTask,
const string& data = "",
const string& fillOpts = "",
Dealing with each of these arguments in turn:-
This argument provides the extended context through the utility class DbiSqlContext. Consider the following code:-
// Construct the extended context: FarDet data that starts on Sept 1 2003.
// (note: then end time stamp is exclusive)
VldTimeStamp tsStart(2003,9,1,0,0,0);
VldTimeStamp tsEnd(2003,9,2,0,0,0);
DbiSqlContext context(DbiSqlContext::kStarts,tsStart,
tsEnd,DetectorType::kFar,SimFlag::kData);
You supply the type of context (in this case DbiSqlContext::kStarts), the date range and the detector type and sim flag. Other types of context are kEnds and kThroughout. See
DatabaseInterface/DbiSqlContext.h
for the complete list.
You are not limited to the contexts that DbiSqlContext provides. If you know the SQL string you want to apply then you can create a DbiSqlContext with the WHERE clause you require e.g.:-
DbiSqlContext myContext("DETECTORMASK & 4")
which would access every row that is suitable for the CalDet detector.
The task is as for other queries but with the default value of:-
Dbi::kAnyTask
which results in the task being omitted from the context query and also allows for more general queries: anything that is is valid after the where is permitted. For example:-
DbiSqlContext myContext("creationdate > '2004-01-01 00:00:00' order by creationdate limit 1")
The SQL must have a where condition, but if you don't need one, create a dummy that is always true e.g.:-
DbiSqlContext myContext("1 = 1 order by timeend desc limit 1 ")
This is an SQL fragment, that if not empty (the default value) is used to extend the WHERE clause that is applied when querying the main table. For example consider:-
DbiSqlContext context(DbiSqlContext::kStarts,tsStart,tsEnd,
DetectorType::kFar,SimFlag::kData);
DbiResultPtr<DbuSubRunSummary>
runs("DBUSUBRUNSUMMARY",context,
Dbi::kAnyTask,"RUNTYPENAME = 'NormalData'");
This query reads the DBUSUBRUNSUMMARY table, and besides imposing the context query also demands that the data rows satisfies a constraint on RUNTYPENAME.
This is a string that can be retrieved from DbiResultSet when filling each row so could be used to program the way an object fills itself e.g. by only filling certain columns. The DatabaseInterface plays no part here; it merely provides this way to communicate between the query maker and the the author of the class that is being filled.
Accessing the results of an Extended Context query are essentially the same as for a standard query but with following caveats:-
const DbiValidityRec* GetValidityRec(const DbiTableRow* row=0) const;
is used with the default argument then the ``global validity'' of the set i.e. the overlap of all the rows is returned. Given the nature of Extended Queries there may be no overlap at all. In general it is far better to call this method and pass a pointer to a specific row for in this case you will get that validity of that particular row.
const T* GetRowByIndex(UInt_t index) const;
will not be able to access all the data in the table if two or more rows have the same Natural Index. This is prohibited in a standard query but extended ones break all the rules and have to pay a price!
All DbiResultPtr constructors, except the default constructor, have a optional argument:-
Dbi::AbortTest abortTest = Dbi::kTableMissing
Left at its default value any query that attempts to access a non-existent table will abort the job. The other values that can be supplied are:-
Currently aborting means just that; there is no graceful shut down and saving of existing results. You have been warned!
Errors from the database are recorded in a DbiExceptionLog. There is a global version of that records all errors. The contents can be printed as follows:-
#include "DatabaseInterface/DbiExceptionLog.h"
...
MSG("MyPackage",Msg::kInfo) << "Contents of the Global Exception Log: \n"
<< DbiExceptionLog::GetGELog();
Query results are held in a DbiResult and each of these also holds a DbiExceptionLog of the errors (if any) recorded when the query was made. If myResPtr is a DbiResultPtr, then to check and print associated errors:-
const DbiExceptionLog& el(myResPtr.GetResult()->GetExceptionLog());
if ( el.Size() == 0 ) MSG("MyPackage",Msg::kInfo) << "No errors found" << endl;
else MSG("MyPackage",Msg::kInfo) << "Following errors found" << el << endl;
The general rule is that a table name should match the DbiTableRow subclass object that it is used to fill. For example the table PLEXRAWCHANNELTOPINDIODE corresponds to the class PlexRawChannelToPinDiode. Beyond this the following are required as we need to remain compatible with ORACLE:-
http://home.fnal.gov/%7Edbox/SQL_API_Portability.html
These restrictions also apply to column names.
Creating new Detector Descriptions involves the creation of a database table and the corresponding table row Class. The main features can be illustrated using the example we have already studied: DbiDemoData1. Recall that its state data is:-
Int_t fSubSystem;
Float_t fPedestal;
Float_t fGain1;
Float_t fGain2;
Its database table, which bears the same name, is defined, in MySQL, as:-
CREATE TABLE DBIDEMODATA1(
SEQNO INTEGER not null,
ROW_COUNTER INTEGER not null,
SUBSYSTEM INT,
PEDESTAL FLOAT,
GAIN1 FLOAT,
GAIN2 FLOAT,
primary key(SEQNO,ROW_COUNTER));
as you can see there is a simple 1:1 correspondence between them except that the database table has two additional leading entries:-
SEQNO INTEGER not null,
ROW_COUNTER INTEGER not null,
and a trailing entry:-
primary key(SEQNO,ROW_COUNTER));
ROW_COUNTER is a column whose value is generated by the interface, it isn't part of table row class. Its sole purpose is to ensure that every row in the table is unique; an import design constraint for any database. This is achieved by ensuring that, for a given SEQNO, each row has a different value of ROW_COUNTER. This allows the combination of these two values to form a primary (unique) key, which is declared in the trailing entry.
All database tables supported by the DatabaseInterface have an auxiliary Validity Range Tables that defines validity ranges for them. Each validity range is given a unique sequence number that acts as a key and corresponds to SeqNo. In our case, indeed every case apart from the table name, the definition is:-
create table DBIDEMODATA1VLD(
SEQNO integer not null primary key,
TIMESTART datetime not null,
TIMEEND datetime not null,
DETECTORMASK tinyint(4),
SIMMASK tinyint(4),
TASK integer,
AGGREGATENO integer,
CREATIONDATE datetime not null,
INSERTDATE datetime not null,
key TIMESTART (TIMESTART),
key TIMEEND (TIMEEND));
When the DatabaseInterface looks for data that is acceptable for a give validity it:-
So, as a designer, you need to be aware of the sequence number, and the row counter must be the first two columns in the database table, but are not reflected in the table row class.
Filling a table row object from the database is done using the class's Fill method. For our example:-
void DbiDemoData1::Fill(DbiResultSet& rs,
const DbiValidityRec* vrec) {
rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;
}
the table row object is passed a DbiResultSet which acts rather like an input stream. The sequence number has already been stripped off; the class just has to fill its own data member. The DatabaseInterface does type checking (see the next section) but does not fail if there is a conflict; it just produces a warning message and puts default data into the variable to be filled.
The second argument is a DbiValidityRec which can, if required, be interrogated to find out the validity of the row. For example:-
const VldRange& range = vrec->GetVldRange();
vrec may be zero, but only when filling DbiValidityRec objects themselves. On all other occasions vrec should be set.
Notes
To save table space, select CHAR(n) for characters strings with 3 or less characters and select the smallest capacity for integers.
Although MySQL supports unsigned values we banned them when attempting to get a previous interface to work with ORACLE, so unsigned in database column type should be avoided. It is allowed to have unsigned in the table row when a signed value is not appropriate and the interface will correctly handle I/O to the signed value in the database even if the most significant bit is set i.e. the signed value in the database is negative. It is unfortunate that the signed value in the database will look odd in such cases.
DbiDemoData1::DbiDemoData1() { }
The DatabaseInterface needs to keep a object of every type of table
row class.
virtual DbiTableRow* CreateTableRow() const {
return new DbiDemoData1; }
The DatabaseInterface uses this method to populate results tables.
UInt_t GetIndex(UInt_t defIndex) const
DbiDemoData2 provides a rather artificial example:-
UInt_t GetIndex(UInt_t defIndex) const { return fSubSystem/10; }
and is just meant to demonstrate how a unique index could be extracted from some packed identification word.
The following is required of an index:-
GetIndex returns an unsigned integer as the sign bit has no special significance, but its O.K. to derive the index from a signed value, for example:-
Int_t PlexStripEndId::GetEncoded() const
would be a suitable index for tables indexed by strip end.
void DbiDemoData1::Fill(DbiResultSet& rs,
const DbiValidityRec* vrec) {
rs >> fSubSystem >> fPedestal >> fGain1 >> fGain2;
}
However, filling can be more sophisticated. DbiResultSet provides the following services:-
string DbiResultSet::CurColName() const;
UInt_t DbiResultSet::CurColNum() const;
UInt_t DbiResultSet::NumCols() const;
DbiFieldType DbiResultSet::CurColFieldType() const;
The first 3 give you the name of the current column, its number (numbering starts at one), and the total number of columns in the row. DbiFieldType can give you information about the type, concept and size of the data in this column. In particular you can see if two are compatible i.e. of the same type:-
Bool_t DbiFieldType::IsCompatible(DbiFieldType& other) const;
and if they are of the same capacity i.e. size:-
Bool_t DbiFieldType::IsSmaller(DbiFieldType& other) const;
You can create DbiFieldType objects e.g:-
DbiFieldType myFldType(Dbi::kInt)
see enum Dbi::DataTypes for a full list, to compare with the one obtained from the current row.
In this way filling can be controlled by the names, numbers and types of the columns. The Fill method of DbiDemoData1 contains both a ``dumb'' (take the data as it comes) and a ``smart'' (look at the column name) code. Here is the latter:-
Int_t numCol = rs.NumCols();
// The first column (SeqNo) has already been processed.
for (Int_t curCol = 2; curCol <= numCol; ++curCol) {
string colName = rs.CurColName();
if ( colName == "SubSystem" ) rs >> fSubSystem;
else if ( colName == "Pedestal" ) rs >> fPedestal;
else if ( colName == "Gain1" ) rs >> fGain1;
else if ( colName == "Gain2" ) rs >> fGain2;
else {
MSG("Dbi",Msg::kDebug) << "Ignoring column " << curCol
<< "(" << colName << ")"
<< "; not part of DbiDemoData1" << endl;
rs.IncrementCurCol();
}
}
Being ``smart'' comes at a price; if your table has many rows valid at at time, defensive programming like this can cost performance!
In such cases, and if the table only exists is a few variants, its better to determine the variant and then branch to code that hardwires that form
Other services that DbiResultSet offers are:-
UInt_t DbiResultSet::CurRowNum() const;
Bool_t DbiResultSet::IsExhausted() const;
string DbiResultSet::TableName();
These tell you the current row number, whether there is no data left and the name of the table.
Also note that it is not a rule that database columns and class data members have to be in a 1:1 correspondence. So long as the table row can satisfy its clients (see below) it can store information derived from the database table rather than the data itself.
Bool_t CanL2Cache() const { return kTRUE; }
Only table row classes who data members are built-in data types (ints, floats and chars) should do this. Table rows having objects or dynamic data e.g. string or pointers must not claim to support L2 caching. Note the table row doesn't need code to save/restore to the cache, this is handled by the DbiTableProxy
For each Table Row class you must add an entry to the package's LinkDef.h file so that rootcint can supply all the additional methods such as Streamer to integrate the class into the ROOT framework. For example DatabaseInterface/test/LinkDef.h are the lines:-
#pragma link C++ class DbiResultPtr<DbiDemoData1>; #pragma link C++ class DbiResultPtr<DbiDemoData2>; #pragma link C++ class DbiResultPtr<DbiDemoData3>;
Caution: if you have class Yyy that has a DbiResultptr<Xxx>
data member. and Yxx and Xxx are in the same package the line:-
#pragma link C++ class DbiResultPtr<Xxx>;
must precede:-
#pragma link C++ class Yyy;
The additions to the LinkDef.h file tells rootcint what new classes you have introduced to access the database, but you also have to ensure that the make system supplies rootcint with the DatabaseInterface template class headers. This is actually a generic problem that occurs whenever one package uses template headers from another. The solution is to add the required header to the CINTLIST and to use the make vpath command to tell make where to find it. For example, to add DbiResultPtr.h:-
ROOTCINT := YES
vpath DatabaseInterface/DbiResultPtr.h $(SRT_PRIVATE_CONTEXT)/include/ \
$(SRT_PUBLIC_CONTEXT)/include/
CINTLIST += DatabaseInterface/DbiResultPtr.h
The first line is as normal and switches on rootcint, supplying it with all the packages headers, which are stored in the variable CINTLIST. The next line tells make to look for DatabaseInterface/DbiResultPtr.h first the SRT private and then public contexts. That ensures that searching is in the correct order; private supersedes public. The last line adds the header to the list of headers passed to rootcint.
A table row object is the gateway between a database table and the end users who want to use the data it contains. Like any good OO design, the aim should be to hide implementation and only expose the abstraction. There is nothing wrong in effectively giving a 1:1 mapping between the columns of the database table and the getters in the table row object if that is appropriate. For example, a table that gives the position of each plane in a detector is going to have an X, Y and Z both in the database and in the getter. However at the other extreme there is calibration. Its going to be well into detector operation before the best form of calibration has been found, but it would be bad design to constantly change the table row getters. Its far better to keep the data in the database table very generic, for example:-
SeqNo int,
SubSystem int,
CalibForm int,
parm0 float,
parm1 float,
parm2 float,
...
The significance of parm0,... depends on CalibForm. The table row object could then provide a calibration service:-
Float_t Calibrate(Float_t rawValue) const;rather than expose parm0,.. Calibrate() would have code that tests the value of CalibForm and then uses the appropriate formula involving parm0... Of course some validation code will want to look at the quality of the calibration by looking at the calibration constants themselves, but this too could be abstracted into a set of values that hide the details of the form of the calibration.
DatabaseInterface can be used to write back into any table from which it can read. To do this you need the services of a DbiWriter. In the case of Algorithm Configuration there is a convenient interface using the DbiConfigStream object - see section 8.3.4.
For all other cases you will need to explicitly use a DbiWriter which is a templated class like DbiResultPtr, so for example, to write DbiDemoData1 rows you need an object of the class:-
DbiWriter<DbiDemoData1>
DbiWriter only fills tables, it does not create them
Always create new tables with mysql before attempting to fill them
If you want to create the tables within the same job as the one that fills it then you can do so as follows:-
int dbno = 0; // or whatever cascade entry you need
DbiStatement* s = DbiTableProxyRegistry::Instance()
.GetCascader()
.CreateStatement(dbNo);
s->ExecuteUpdate("create table if not exists xxxVLD ("
" SEQNO int not null primary key auto_increment,"
" TIMESTART datetime not null,"
" TIMEEND datetime not null,"
" DETECTORMASK tinyint,"
" SIMMASK tinyint,"
" TASK int,"
" AGGREGATENO int,"
" CREATIONDATE datetime not null,"
" INSERTDATE datetime not null,"
" key TIMESTART (TIMESTART),"
" key TIMEEND (TIMEEND) )");
s->ExecuteUpdate("create table if not exists xxx ("
" SEQNO integer not null,"
" ROW_COUNTER INTEGER not null,"
....
" primary key(SEQNO,ROW_COUNTER))");
delete s; //Drop statement to release connection.
s = 0;
where xxx = name of your table
In outline the filling procedure is as follows:-
The fact that I/O does not occur until all data has been collected has a couple of consequences:-
Be very sure that you delete the DbiWriter once you have finished with it or you will have a serious memory leak!
To cut down the risk of a memory leak, you cannot copy construct or assign to DbiWriter objects.
The way that the DatabaseInterface assembles all valid data for a given context is as follows:-
This has two consequences:-
Its perfectly possible that a calibration procedure might produce database data for multiple aggregates at a single pass. If you are faced with this situation and want to write all aggregates in parallel, then simply have a vector of DbiWriter's indexed by aggregate number and pass rows to the appropriate one. See DbiValidate::Test_6() for an example of this type of parallel processing.
We will use the class DbiDemoData1 to illustrate each of the above steps.
VldContext now; // Event context e.g. CandHandle::GetVldContext()
VldTimeStamp start = now.GetTimeStamp();
// Add 7 days (in secs) to get end date.
time_t vcSec = start.GetSec() + 7*24*60*60;
VldTimeStamp end(vcSec,0);
// Construct the VldRange.
VldRange range(now.GetDetector(),
now.GetSimFlag(),
start,
end,
"Demo");
Int_t aggNo = -1; Dbi::Task task = 0; // Decide a creation date (default value is now) VldTimeStamp create; DbiWriter<DbiDemoData1> writer(range,aggNo,task,create);
// Create some silly data. DbiDemoData1 row0(0,10.,20.,30.); DbiDemoData1 row1(0,11.,21.,31.); DbiDemoData1 row2(0,12.,22.,32.); // Store the silly data. writer << row0; writer << row1; writer << row2;
The DbiWriter will call DbiDemoData1's Store method which is rather similar to its Fill method:-
void DbiDemoData1::Store(DbiOutRowStream& ors,
const DbiValidityRec* vrec) const {
ors << fSubSystem << fPedestal << fGain1 << fGain2;
}
Again notice that the SeqNo, which is part of the table row, but not part of the class data, is silently handled by the system.
writer.Close();
DbiWriter(const VldRange& vr,
Int_t aggNo,
Dbi::Task task = 0,
VldTimeStamp creationDate = VldTimeStamp(0,0),
UInt_t dbNo = 0,
const std::string& LogComment = "",
const std::string& tableName = ""
);
void SetOverlayCreationDate();
to ensure that DbiWriter uses Overlay Creation Dates.
Caution: Supplying the entry number assumes that at execution time the cascade is defined in a way that is consistent with the code that is using the DbiWriter. As an alternative, you can supply the database name (e.g. offline) if you know it and are certain it will appear in the cascade.
void SetLogComment(const std::string& LogComment)
Update comments are ignored unless writing to a Master database (i.e. one used as a source database e,g. the database at FNAL), and in this case a non-blank comment is mandatory unless the table is exempt. Currently only DBI, DCS and PULSER tables are exempt.
If the first character on the string is the '@' character then the rest of the string will be treated as the name of a file that contains the comment. If using DbiWriter to write multiple records to the same table as part of a single update then only create a single DbiWriter and use the Open method to initialise for the second and subsequent records. That way a single database log entry will be written to cover all updates.
DbiWriter<T>& operator<<(const T& row);
for example:-
writer << row0; writer << row1; writer << row2;
DbiWriter calls the table row's Store method, see the next section. It also performs some basic sanity checks:-
If either check fails then an error message is output and the data marked as bad and the subsequent Close method will not produce any output.
Bool_t Close();
which returns true if the data is successfully output.
Alternatively, you can write out the data as a DBMauto update file by passing the name of the file to the Close command:-
Close("my_dbmauto_update_file.dbm");
void SetRequireGlobalSeqno(Int_t requireGlobal) Where requireGlobal > 0 Must be global = 0 Must be global if writing permanent data to an authorising database < 0 Must be local
void Abort();
Bool_t Open(const VldRange& vr,
Int_t aggNo,
Dbi::Task task = 0,
VldTimeStamp creationDate = VldTimeStamp(),
UInt_t dbNo = 0);
The arguments have the same meaning as for the constructor. An alternative form of the Open statement allows the database name to be supplied instead of its number. If the DbiWriter is already assembling data then the Close method is called internally to complete the I/O. The method returns true if successful. As explained above, the Open method must be used if writing multiple records to the same table as part of a single update for then a single database log entry will be written to cover all updates.
All DbiTableRow sub-class objects must support the input interface accessed through DbiResultPtr. The responsibilities that this implies are itemised in section 8.4.2. The output interface is optional; the responsibilities listed here apply only if you want to write data to the database using this interface.
DbiTableRow supplies a default that returns -1. The GetAggregateNo method is used to check that table row objects passed to a particular DbiWriter have the right aggregate number.
The Store method is the inverse to Fill although it is passed a DbiOutRowStream reference:-
void Store(DbiOutRowStream& ors) const;rather than a DbiResultSet reference. Both these classes inherit from DbiRowStream so the same set of methods:-
string DbiResultSet::CurColName() const;
UInt_t DbiResultSet::CurColNum() const;
UInt_t DbiResultSet::NumCols() const;
DbiFieldType DbiResultSet::CurColFieldType() const;
UInt_t DbiResultSet::CurRowNum() const;
string DbiResultSet::TableName();
are available. So, as with the Fill method, there is scope for Store to be ``smart''. The quotes are there because it often does not pay to be too clever! Also like the Fill method its passed a DbiValidityRec pointer (which is only zero when filling DbiValidityRec objects) so that the validity of the row can be accessed if required.
It is possible to create and write temporary tables during execution. Temporary tables have the following properties:-
Temporary tables are a good way to try out new types of table, or different types of data for an existing table, without modifying the database. Writing data is as normal, by means of a DbiWriter, however before you write data you must locate a database in the cascade that will accept temporary tables and pass it a description of the table. This is done using the DbiCascader method CreateTemporaryTable. You can access the cascader by first locating the singleton DbiTableProxyRegister which is in overall charge of the DatabaseInterface. The following code fragment shows how you can define a new table for DbiDemoData1:-
#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"
...
// Ask the singleton DbiTableProxyRegistry for the DbiCascader.
const DbiCascader& cascader
= DbiTableProxyRegistry::Instance().GetCascader();
// Define the table.
string tableDescr = "(SEQNO INT, SUBSYSTEM INT, PEDESTAL FLOAT,"
" GAIN1 FLOAT, GAIN2 FLOAT )";
// Ask the cascader to find a database that will accept it.
Int_t dbNoTemp = cascader.CreateTemporaryTable("DbiDemoData1",
tableDescr);
if ( dbNoTemp < 0 ) {
cout << "No database to will accept temporary tables. " << endl;
}
You pass CreateTemporaryTable the name of the table and its description. The description is a parenthesised comma separated list. It follows the syntax of the MYSQL CREATE TABLE command, see section 8.7.
In principle not every database in the cascade will accept temporary tables so the cascader starts with the highest priority one and works done until it finds one, returning its number in the cascade. It returns -1 if it fails. For this to work properly the first entry in the cascade must accept it so that it will be taken in preference to the true database. It is recommended that the first entry be the temp database, for everyone has write-access to that and write-access is needed to create even temporary tables. So a suitable cascade might be:-
setenv ENV_TSQL_URL "mysql://pplx2.physics.ox.ac.uk/temp;\
mysql://pplx2.physics.ox.ac.uk/offline"
Having found a database and defined the new or replacement table, you can now create a DbiWriter and start writing data as describe in section 8.5. You have to make sure that the DbiWriter will output to the correct database which you can either do by specifying it using the 5th arg of its constructor:-
DbiWriter(const VldRange& vr,
Int_t aggNo,
Dbi::Task task = 0,
VldTimeStamp creationDate = VldTimeStamp(),
UInt_t dbNo = 0);
or alternatively you can set it after construction:-
DbiWriter<DbiDemoData1> writer(range,aggNo); writer.SetDbNo(dbNoTemp);
As soon as the table has been defined it will, as explained above, completely replace any permanent table in the same database with the same name. However, if there is already data in the cache for the permanent table then it may satisfy further requests for data. To prevent this from happening you can clear the cache as described in the next section.
Do NOT write permanent data to any temporary database for it could end up being used by anyone who includes the database for temporary tables. Database managers may delete any permanent tables in temporary databases without warning in order to prevent such problems.
Normally you would not want to clear the cache, after all its there to improve performance. However if you have just created a temporary table as described above, and it replaces an existing table, then clearing the cache is necessary to ensure that future requests for data are not satisfied from the now out of date cache. Another reason why you may want to clear the cache is to study database I/O performance.
Although this section is entitled Clearing the Cache, you cannot actually do that as the data in the cache may already be in use and must not be erased until its clients have gone away. Instead the data is marked as stale, which is to say that it will ignored for all future requests. Further, you don't clear the entire cache, just the cache associated with the table that you want to refresh. Each table is managed by a DbiTableProxy that owns a DbiCache. Both DbiWriter and DbiResultPtr have a TableProxy method to access the associated DbiTableProxy. The following code fragment shows how to set up a writer and mark its associated cache as stale:-
DbiWriter<DbiDemoData1> writer(range,aggNo); writer.SetDbNo(dbNoTemp); writer.TableProxy().GetCache()->SetStale();
ASCII flat files and catalogues provide a convenient way to temporarily augment a database with additional tables under your control. A flat file is a file that contains, in human readable form, the definition of a table and its data. It can be made an entry in a cascade and, by placing before other entries allows you to effectively modify the database just for the duration of a single job. As has already been explained, for each Main Data Table there is also an auxiliary Validity Range Table, so you need 2 entries in the cascade for each table you want to introduce. The problem with this scheme is that, if introducing a number of tables, the cascade could get rather large. To avoid this catalogues are used. A catalogue is actually nothing more that a special ASCII flat file, but each row of its data is a URLs for another ASCII flat file that becomes part of the same cascade entry. In this way a single cascade entry can consist of an arbitrary number of files.
An ASCII flat file defines a single database table.
The format is sometimes referred to as Comma Separated Value (CSV). Each line in the file corresponds to a row in the table. As you might suspect, values are separated by commas, although you can add additional white space (tabs and spaces) to improve readability (but heed the caution in section 8.6.4). The first row is special, it contains the column names and types. The types must valid MySQL types, see table 8.1 for some examples. If the special row is omitted or is invalid then the column names are set to C1, C2, ... etc. and all types are set to string (TEXT). Here is a simple example of a CSV file:-
SeqNo int, Pedestal float, SubSystem int, Gain1 float, Gain2 float 1, 1.0, 0, 10., 100. 1, 1.1, 1, 11., 110. 1, 1.2, 2, 12., 120. 1, 1.3, 3, 13., 130.
Its in a convention to use the file extension .csv, but it is not compulsory.
If any value is a string or a date, it must be delimited by double quotes.
The database URL is based on the standard one extended by adding the suffix
#absolute-path-to-file
For example:-
mysql://fnpcsrv1.fnal.gov:3307/temp#/farm/minsoft2/Minossoft/minossoft/releases/R1.24.3/Production/asciidb/cedar_daikon/bhcurv/BFLDDBIPLANEMAP.csv
The table name is derived from the file name after stripping off the extension. In this example, the table name will be BFLDDBIPLANEMAP
These are special types of ASCII Flat File. Their data are URLs to other flat files. You cannot nest them i.e. one catalogue cannot contain a URL that is itself catalogue.
The first line of the file just contains the column name ``name''. The remaining lines are URLs of the flat files. Here is a simple example:-
name file:/home/minos/work/MyData.csv file:/home/minos/work/MyDataVld.csv file:$MY_ENV/MyDataToo.csv file:$MY_ENV/MyDataTooVld.csv
This catalogue defines two tables MyData and MyDataToo each with its associated auxiliary validity range table. Note that files names must be absolute but can begin with an environmental variable including one defined by the DBI:-
DBI_CATALOGUE_PATH
which is always the directory holding the catalogue currently being processed. If the above example is written:-
name file:$DBI_CATALOGUE_PATH/MyData.csv file:$DBI_CATALOGUE_PATH/MyDataVld.csv file:$DBI_CATALOGUE_PATH/MyDataToo.csv file:$DBI_CATALOGUE_PATH/MyDataTooVld.csv
and all the files and the catalogue held in the same directory then they can be moved en bloc somewhere else and only the catalogue URL need be updated.
The URL is identical to any other flat file with one additional constraint: the extension must be .cat or .db. Here are some valid examples:-
mysql://#/home/minos/work/MyCatalogue.db file:/home/minos/work/MyCatalogue.cat
The stand-alone testing of the Database Interface includes an example of an ASCII Catalogue. The URL of the cascade entry is:-
mysql://localhost/test#$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DemoASCIICatalogue.db
If you look at the file:-
\$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DemoASCIICatalogue.db
you will see it contains 4 lines, defining the tables
DEMOASCIIDATA (a Detector Descriptions table) and DEMOASCIICONFIG
( Algorithm Configurations table):-
file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIIDATA.csv file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIIDATAVLD.csv file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIICONFIG.csv file:$SRT_PUBLIC_CONTEXT/DatabaseInterface/test/DEMOASCIICONFIGVLD.csv
In both cases, the auxiliary validity range table defines a single
validity range, although there is no reason why it could not have
defined any number. For the DEMOASCIIDATA, there are 5 rows, a
header row followed by 4 rows of data:-
SEQNO INT, UNWANTED INT, PEDESTAL FLOAT, SUBSYSTEM INT, GAIN1 FLOAT, GAIN2 FLOAT 1,99,1.0,0,10.,100. 1,99,1.1,1,11.,110. 1,99,1.2,2,12.,120. 1,99,1.3,3,13.,130.
For the DEMOASCIICONFIG table, there are only two rows:-
SEQNO INT, CONFIGSTRING TEXT 1,"mybool=1 mydouble=1.23456789012345678e+200 mystring='This is a string' myint=12345"
Caution: Note, don't have any white space between the comma and the leading double quote of the configuration string.
This provides the absolute bare minimum to install, manage and use a MySQL database in the context of the DatabaseInterface.
The following are useful URLs:-
http://www.mysql.com/
http://www.mysql.com/documentation/index.html
http://www.mysql.com/downloads/mysql-3.23.html
A good book on MySQL is:-
MySQL by Paul DuBois, Michael Widenius. New Riders Publishing; ISBN: 0-7357-0921-1
See:-
http://www-numi.fnal.gov/offline_software/srt_public_context/WebDocs/external_products.html
mysql is a utility, used both by system administrators and users to interact with MySQL database. The command syntax is:-
mysql [-h host_name] [-u user_name] [-pyour_pass]
if you are running on the server machine, with you Unix login name and no password then:-
mysql
is sufficient. To exit type:-
\q
Note: most mysql commands are terminated with a semi-colon. If nothing happens when you type a command, the chances are that mysql is still waiting for it, so type it and press return again.
This also has to be done as root. As system administrator, MySQL allows you to control access, on a user by user basis, to databases. Here are some example commands:-
create database Minos_db1; grant all on Minos_db1.* to smart@pplx2.physics.ox.ac.uk grant all on Minos_db1.* to smart@"%" grant select Minos_db1.Boring to dumb@pplx2.physics.ox.ac.uk \q
Before you can use mysql to create, fill or examine a database table you have to tell it what database to use. For example:-
use Minos_db1
`use' is one of the few commands that does not have a trailing semi-colon.
The following commands create, or recreate, a table and display a description of it:-
drop table if exists MyTable;
create table MyTable(
MyInt int,
MyFloat float,
MyText text
);
describe MyTable;
See table 8.1 for a list of MySQL types that the DatabaseInterface currently supports.
The following commands add data from the file DemoData1.dat to an existing table:-
load data local infile 'DemoData1.dat' into table DbiDemoData1;
Each line of the file corresponds to a row in the table. Columns should be separated with tabs. Table 8.2 shows typical formats of the various data types.
|
|
Here is a sample query:-
select * from DbiDemoData2Validity where
TimeStart <= '2001-01-11 12:00:00'
and TimeEnd > '2000-12-22 12:00:00'
and DetectorMask & 4
order by TimeStart desc
;
Connections to the database are either permanent i.e. open all the time or temporary i.e. they are closed as soon as a I/O operation is complete. A connection is made permanent if:-
[Aside: Before January 2009 connections to a ASCII flat file database were also made permanent as re-opening such a database would involve re-loading all the data. However the overhead costs are low and are outweighed when considering the large number of connections that would otherwise be required on farms running many jobs.]
In all other cases the connection is temporary so as to minimise resources. For normal operations this adds little overhead as typically there are several major database reads at the start of a production job after which little or no further database I/O occurs. However if you require the connection to remain open throughout the job then you can force any entry in the cascade to be permanent. The following code sets entry 0 in the cascade to have a permanent connection:-
#include "DatabaseInterface/DbiCascader.h"
#include "DatabaseInterface/DbiTableProxyRegistry.h"
// Ask the singleton DbiTableProxyRegistry for the DbiCascader.
const DbiCascader& cascader
= DbiTableProxyRegistry::Instance().GetCascader();
// Request that entry 0 is permanently open.
cascader.SetPermanent(0);
Note that this won't open the connection but will prevent it from closing after its next use.
If you want all connections to remain open this can be set through the configuration parameter MakeConnectionsPermanent. See section 8.2.2.
Standard context specific queries are first trimmed to a time window to limit the number of VLD records that have to be analysed. Having established the best data, a further 4 calls to query the VLD table is made to determine the full validity. For data with long validities, these extra calls are worthwhile as they can significantly increase the lifetime of the results. However there are two cases where these should not be use:-
The following DbiResultPtr methods support this request:-
DbiResultPtr(const VldContext& vc,
Dbi::Task task = Dbi::kDefaultTask,
Dbi::AbortTest abortTest = Dbi::kTableMissing,
Bool_t findFullTimeWindow = true);
DbiResultPtr(const string& tableName,
const VldContext& vc = Dbi::fgDefaultContext,
Dbi::Task task = Dbi::kDefaultTask,
Dbi::AbortTest abortTest = Dbi::kTableMissing,
Bool_t findFullTimeWindow = true);
UInt_t NewQuery(VldContext vc,
Dbi::Task task=0,
Bool_t findFullTimeWindow = true);
It is selected by passing in the value false for findFullTimeWindow.
DbiTimerManager is a static object that provides performance printout when enabled. By default it is enabled but can be disabled by:-
DbiTimerManager::gTimerManager.Enable(false);