Database Distribution: Introduction

Contents

Overview

The flow of data is shown schematically in diagram. At the heart of the system is the Master Database at Fermilab. Most database updates enter the database realm here. At regular intervals the DBM job dbmjob is used to export all recently updated data and these export files are distributed to all other databases where the data is imported if not already present. This is done by the local database manager again using dbmjob. These primary data flows are shown in red.

Smaller amounts of data come from secondary databases e.g. at CalDet and these are exported up to the Master Database where they join other up

This system relies on the ability to:-

dbmjob exploits the fact that all Dbi compliant database tables come in pairs, the main data table and an auxiliary validity range table. The auxiliary table records insertion dates and have globally unique SeqNos (Sequence Numbers). The diagram shows how globally unique numbers are assigned. Every database that is a source of data has a GlobalSeqNo table that is used to generate sequence numbers. Each time one is allocated the count is incremented in the table. For each database the table operates in a different range of numbers hence ensuring that all are unique. dbmjob moves data in ``Validity Packets'' i.e. a single row in the auxiliary table and all its associated data rows. The insertion date and SeqNo on the auxiliary row allow dbmjob to support incremental updates and avoid data duplication. All this implies a very important restriction on dbmjob:-

dbmjob can only distribute Dbi compliant database tables i.e. ones that come in pairs, the main data table and an auxiliary validity range table.

Along with the data, dbmjob exports table descriptions in the database update file. The Slave database ignores these descriptions for any table it already has otherwise it uses them to create empty tables before importing the associated data. This allows new tables to be added the Master database and for the Slave databases to automatically create them when they first receive data for them.

The system also distributes updates to the DBILOGENTRY table which records significant changes to the database i.e. those that could affect production running. These updates are written to the import log files and also broadcast to the list of names contained in:-

    $SRT_PUBLIC_CONTEXT/DatabaseMaintenance/scripts/logentry_mail_list 

DBMauto: The Automated Distribution System

The DatabaseMaintenance package provides a perl driver script to automate the system described above. The key points are:-

The perl driver scripts are to be found in:- $SRT_PUBLIC_CONTEXT/DatabaseMaintenance/scripts/ and the sub-directory:- primer/ has a series of files that can be used as the starting point.

Duplicate SEQNO Handling

The exporting logic overlaps the date window used to select data to ensure no update is missed but it does mean that most updated "Validity Packet"s with their associated globally unique SeqNo get written to two consecutive update files. The first time a Slave site receives a validity packet it imports it but the second time, it finds that it already has the SeqNo so instead compares it.

The import logic that checks for repeated SeqNos classifies them into 3 types:-

Master/Slave Cron Job Synchronisation

As is explained above it is not necessary for Masters and Slaves to synchronise exports and imports; it is fine for the Slave to import at a lower frequency than a Master exports, or even a higher frequency although rather pointless of course. Actually synchronisation brings the low-level risk that two jobs, one writing and one reading, will try to simultaneously access the same update file. To minimise this risk the following convention is proposed for the scheduling of cron jobs:- So, at the highest frequency of hourly updates, an change in the Master would be caught within an hour and then propagate to a Slave 30 mins later.


Return to the top-level Database Distribution document
a Last Modified: $Date: 2006/08/03 07:21:27 $