How-To Online DB Run Preparation

This page contains basic steps only, please see subpages for details on Run preparations!

I. The following paragraphs should be examined *before* the data taking starts:

--- onldb.starp.bnl.gov ---

1. DB: make sure that databases at ports 3501, 3502 and 3503 are running happily. It is useful to check that onldb2.starp, onl10.starp and onl11.starp have replication on and running.

2. COLLECTORS AND RUNLOG DAEMON: onldb.starp contains "old" versions of metadata collectors and RunLogDB daemon. Collector daemons need to be recompiled and started before the "migration" step. One should verify with "caGet <subsystem>.list" command that all EPICS variables are being transmitted and received without problems. Make sure no channels produce "cannot be connected" or "timeout" or "cannot contact IOC" warnings. If they do, please contact Slow Controls expert *before* enabling such service. Also, please keep in mind that RunLogDB daemon will process runs only if all collectors are started and collect meaningful data.

3. FASTOFFLINE: To allow FastOffline processing, please enable cron record which runs migrateDaqFileTags.new.pl script. Inspect that script and make sure that $minRun variable is pointing to some recently taken run or this script will consume extra resource from online db.

4. MONITORING: As soon as collector daemons are started, database monitoring scripts should be enabled. Please see crontabs under 'stardb' and 'staronl' accounts for details. It is recommended to verify that nfs-exported directory on dean is write-accessible.

   Typical crontab for 'stardb' account would be like:

*/3 * * * * /home/stardb/check_senders.sh > /dev/null
*/3 * * * * /home/stardb/check_cdev_beam.sh > /dev/null
*/5 * * * * /home/stardb/check_rich_scaler_log.sh > /dev/null
*/5 * * * * /home/stardb/check_daemon_logs.sh > /dev/null
*/15 * * * * /home/stardb/check_missing_sc_data.sh > /dev/null
*/2 * * * * /home/stardb/check_stale_caget.sh > /dev/null
(don't forget to set email address to your own!)

   Typical crontab for 'staronl' account would look like:

*/10 * * * * /home/staronl/check_update_db.sh > /dev/null
*/10 * * * * /home/staronl/check_qa_migration.sh > /dev/null

--- onl11.starp.bnl.gov ---

1. MQ: make sure that qpid service is running. This service processes MQ requests for "new" collectors and various signals (like "physics on").

2. DB: make sure that mysql database server at port 3606 is running. This database stores data for mq-based collectors ("new").

3. SERVICE DAEMONS: make sure that mq2memcached (generic service), mq2memcached-rt (signals processing) and mq2db (storage) services are running.

4. COLLECTORS: grab configuration files from cvs, and start cdev2mq and ds2mq collectors. Same common sense rule applies: please check that CDEV and EPICS do serve data on those channels first. Also, collectors may be started at onl10.starp.bnl.gov if onl11.starp is busy with something (unexpected IO stress tests, user analysis jobs, L0 monitoring scripts, etc).
 

--- onl13.starp.bnl.gov ---

1. MIGRATION: check crontab for 'stardb' user. Mare sure that "old" and "new" collector daemons are really running, before moving further. Verify that migration macros experience no problems by trying some simple migration script. If it breaks saying that library is not found or something - find latest stable (old) version of STAR lib and set it to .cshrc config file. If tests succeed, enable cron jobs for all macros, and verify that logs contain meaningful output (no errors, warnings etc).
 

--- dean.star.bnl.gov ---

1. PLOTS: Check dbPlots configuration, re-create it as a copy with incremented Run number if neccesary. Subsystem experts tend to check those plots often, so it is better to have dbPlots and mq collectors up and running a little earlier than the rest of services.

2. MONITORING:

3. RUNLOG - now RunLog browser should display recent runs.

--- db03.star.bnl.gov ---

1. TRIGGER COUNTS check cront tab for root, it should have the following records:
40 5 * * * /root/online_db/cron/fillDaqFileTag.sh
0,10,15,20,25,30,35,40,45,50,55 * * * * /root/online_db/sum_insTrgCnt >> /root/online_db/trgCnt.log

First script copies daqFileTag table from online db to local 'trigger' database. Second script calculates trigger counts for FileCatalog (Lidia). Please make sure that both migration and trigger counting work before you enable it in the crontab. There is no monitoring to enable for this service.
 

--- dbbak.starp.bnl.gov ---

1. ONLINE BACKUPS: make sure that mysql-zrm is taking backups from onl10.starp.bnl.gov for all three ports. It should take raw backups daily and weekly, and logical backups once per month or so. It is generally recommended to periodically store weekly / monthly backups to HPSS, for long-term archival using /star/data07/dbbackup directory as temporary buffer space.

 

II. The following paragraphs should be examined *after* the data taking stops:

1. DB MERGE: Online databases from onldb.starp (all three ports) and onl11.starp (port 3606) should be merged into one. Make sure you keep mysql privilege tables from onldb.starp:3501. Do not overwrite it with 3502 or 3503 data. Add privileges allowing read-only access to mq_collector_<bla> tables from onl11.starp:3606 db.

2. DB ARCHIVE PART ONE: copy merged database to dbbak.starp.bnl.gov, and start it with incremented port number. Compress it with mysqlpack, if needed. Don't forget to add 'read-only' option to mysql config. It is generally recommended to put an extra copy to NAS archive, for fast restore if primary drive crashes.

3. DB ARCHIVE PART TWO: archive merged database, and split resulting .tgz file into chunks of ~4-5 GB each. Ship those chunks to HPSS for long-term archival using /star/data07/dbbackup as temporary(!) buffer storage space.

4. STOP MIGRATION macros at onl13.starp.bnl.gov - there is no need to run that during summer shutdown period.

5. STOP trigger count calculations at db03.star.bnl.gov for the reason above.

 

 

HOW-TO: access CDEV data (copy of CAD docs)

As of Feb 18th 2011, previously existing content of this page is removed.

If you need to know how to access RHIC or STAR data available through CDEV interface, please read official CDEV documentation here : http://www.cadops.bnl.gov/Controls/doc/usingCdev/remoteAccessCdevData.html

 

Documentation for CDEV access codes used in Online Data Collector system will be available soon in appropriate section of STAR database documentation.

-D.A.

HOW-TO: compile RunLogDb daemon

  1. Login to root@onldb.starp.bnl.gov
  2. Copy latest "/online/production/database/Run_[N]/" directory contents to "/online/production/database/Run_[N+1]/"
  3. Grep every file and replace "Run_[N]" entrances with "Run_[N+1]" to make sure daemons pick up correct log directories
  4. $> cd /online/production/database/Run_9/dbSenders/online/Conditions/run
  5. $> export DAEMON=RunLogDb
  6. $> make
  7. /online/production/database/Run_9/dbSenders/bin/RunLogDbDaemon binary should be compiled at this stage

TBC

 

HOW-TO: create a new Run RunLog browser instance and retire previous RunLog

1. New RunLog browser:

  1. Copy the contents of the "/var/www/html/RunLogRunX" to "/var/www/html/RunLogRunY", where X is the previous Run ID, and Y is the current Run ID
  2. Change symlink "/var/www/html/RunLog" pointing to "/var/www/html/RunLogX" to "/var/www/html/RunLogY"
  3. TBC

2. Retire Previous RunLog browser:

  1. Grep source files and replace all "onldb.starp : 3501/3503" with "dbbak.starp:340X", where X is the id of the backed online database.
  2. TrgFiles.php and ScaFiles.php contain reference "/RunLog/", which should be changed to "/RunLogRunX/", where X is the run ID
  3. TBC

3. Update /admin/navigator.php immediately after /RunLog/ rotation! New run range is required.

HOW-TO: enable Online to Offline migration

Migration macros reside on stardb@onllinux6.starp.bnl.gov .

$> cd dbcron/macros-new/
(you should see no StRoot/StDbLib here, please don't check it out from CVS either - we will use precompiled libs)

First, one should check that Load Balancer config env. variable is NOT set :

$> printenv|grep DB

DB_SERVER_LOCAL_CONFIG=

(if it says =/afs/... .xml, then it should be reset to "" in .cshrc and .login scripts)

Second, let's check that we use stable libraries (newest) :

$> printenv | grep STAR
...
STAR_LEVEL=SL08e
STAR_VERSION=SL08e
...
(SL08e is valid for 2009, NOTE: no DEV here, we don't want to be affected by changed or broken DEV libraries)

OK, initial settings look good, let's try to load Fill_Magnet.C macro (easiest to see if its working or not) :

$> root4star -b -q Fill_Magnet.C

You should see some harsh words from Load Balancer, that's exactly what we need - LB should be disabled for our macros to work. Also, there should not be any segmentation violations. Initial macro run will take some time to process all runs known to date (see RunLog browser for run numbers).

Let's check if we see the entries in database:

$> mysql -h robinson.star.bnl.gov -e "use RunLog_onl; select count(*) from starMagOnl where entryTime > '2009-01-01 00:00:00' " ;
(entryTime should be set to current date)

+----------+
| count(*) |
+----------+
|     1589 |
+----------+

Now, check the run numbers and magnet current with :

$>  mysql -h robinson.star.bnl.gov -e "use RunLog_onl; select * from starMagOnl where entryTime > '2009-01-01 00:00:00' order by entryTime desc limit 5" ;
+--------+---------------------+--------+-----------+---------------------+--------+---------+----------+----------+-----------+------------+------------------+
| dataID | entryTime           | nodeID | elementID | beginTime           | flavor | numRows | schemaID | deactive | runNumber | time       | current          |
+--------+---------------------+--------+-----------+---------------------+--------+---------+----------+----------+-----------+------------+------------------+
|  66868 | 2009-02-16 10:08:13 |     10 |         0 | 2009-02-15 20:08:00 | ofl    |       1 |        1 |        0 |  10046008 | 1234743486 | -4511.1000980000 |
|  66867 | 2009-02-16 10:08:13 |     10 |         0 | 2009-02-15 20:06:26 | ofl    |       1 |        1 |        0 |  10046007 | 1234743486 | -4511.1000980000 |
|  66866 | 2009-02-16 10:08:12 |     10 |         0 | 2009-02-15 20:02:42 | ofl    |       1 |        1 |        0 |  10046006 | 1234743486 | -4511.1000980000 |
|  66865 | 2009-02-16 10:08:12 |     10 |         0 | 2009-02-15 20:01:39 | ofl    |       1 |        1 |        0 |  10046005 | 1234743486 | -4511.1000980000 |
|  66864 | 2009-02-16 10:08:12 |     10 |         0 | 2009-02-15 19:58:20 | ofl    |       1 |        1 |        0 |  10046004 | 1234743486 | -4511.1000980000 |
+--------+---------------------+--------+-----------+---------------------+--------+---------+----------+----------+-----------+------------+------------------+
 

If you see that, you are OK to start cron jobs (see "crontab -l") !

 

 

HOW-TO: online databases, scripts and daemons

Online db enclave includes :

primary databases: onldb.starp.bnl.gov, ports : 3501|3502|3503
repl.slaves/hot backup: onldb2.starp.bnl.gov, ports : 3501|3502|3503
read-only online slaves: mq01.starp.bnl.gov, mq02.starp.bnl.gov 3501|3502|3503
trigger database: db01.star.bnl.gov, port 3316, database: trigger

Monitoring:
http://online.star.bnl.gov/Mon/
(scroll down to see online databases. db01 is monitored, it is in offline slave group)

Tasks:

1. Slow Control data collector daemons

$> ssh stardb@onldb.starp.bnl.gov;
$> cd /online/production/database/Run_11/dbSenders;
./bin/ - contains scripts for start/stop daemons
./online/Conditions/ - contains source code for daemons (e.g. ./online/Conditions/run is RunLogDb)
See crontab for monitoring scripts (protected by lockfiles)

Monitoring page :
http://online.star.bnl.gov/admin/daemons/

2. Online to Online migration

  • RunLogDb daemon performs online->online migration. See previous paragraph for details (it is one of the collector daemons, because it shares the source code and build system). Monitoring: same page as for data collectors daemons;
  • migrateDaqFileTags.pl perl script moves data from onldb.starp:3503/RunLog_daq/daqFileTag to onldb.starp:3501/RunLog/daqFileTag table (cron-based). This script is essential for OfflineQA, so RunLog/daqFileTag table needs to be checked to ensure proper OfflineQA status.

3. RunLog fix script
$> ssh root@db01.star.bnl.gov;
$> cd online_db;
sum_insTrgCnt is the binary to perform various activities per recorded run, and it is run as cron script (see crontab -l).

4. Trigger data migration
$> ssh root@db01.star.bnl.gov
/root/online_db/cron/fillDaqFileTag.sh <- cron script to perform copy from online trigger database to db01
BACKUP FOR TRIGGER CODE:
1. alpha.star.bnl.gov:/root/backups/db01.star.bnl.gov/root
2. bogart.star.bnl.gov:/root/backups/db01.star.bnl.gov/root

5. Online to Offline migration
$> ssh stardb@onl13.starp.bnl.gov;
$> cd dbcron/macros-new; ls;
Fill*.C macros are the online->offline migration macros. There is no need in local/modified copy of the DB API, all macros use regular STAR libraries (see tcsh init scripts for details)
Macros are cron jobs. See cron for details (crontab -l). Macros are lockfile-protected to avoid overlap/pileup of cron jobs.
Monitoring :
http://online.star.bnl.gov/admin/status/