The STAR FileCatalog is a set of tools and an API providing users access to the meta-data, file and replica information pertaining to all data produced by the RHIC/STAR experiment. In other words, the STAR FileCatalog gives access to meta-data, file and replica information through a unified, schema-agnostic API. The user never needs to know the details of the relations between elements (or keywords); instead, they are provided with a flexible yet powerful query API allowing them to request any combination of keywords based on sets of conditions composed of sequences of keyword/operator/value combinations. The user manual provides a list of keywords.
The STAR FileCatalog also provides multi-site support through the same API. In other words, the same set of tools and programmatic interface allows one to register, update and maintain a global catalog for the experiment and to serve as a core component of the Data Management system. To date, the STAR FileCatalog holds information on 22 million files and 52 million active replicas.
A few examples are provided below to guide users and installers.
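For instance, a typical query pairs a list of keywords to return with a comma-separated list of conditions. In the hypothetical example below (the production tag, file type and storage values are purely illustrative), the path and file name of every available file of a given production stored on NFS are requested:

% get_file_list.pl -keys 'path,filename' -cond 'production=P12ia,filetype=daq_reco_MuDst,storage=NFS' -limit 0

The keywords, operators and options accepted by get_file_list.pl are documented in the user manual section further below. The data dictionary that follows documents the underlying database tables.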
This dictionary was created on 2012/03/12.
Table: CollisionTypes

Field | Type | Null | Default | Comments |
---|---|---|---|---|
collisionTypeID | smallint(6) | No | ||
firstParticle | varchar(10) | No | ||
secondParticle | varchar(10) | No | ||
collisionEnergy | float | No | 0 | |
collisionTypeIDate | timestamp | No | CURRENT_TIMESTAMP | |
collisionTypeCreator | smallint(6) | No | 1 | |
collisionTypeCount | int(11) | Yes | NULL | |
collisionTypeComment | text | Yes | NULL |
Table: Creators

Field | Type | Null | Default | Comments |
---|---|---|---|---|
creatorID | bigint(20) | No | ||
creatorName | varchar(15) | Yes | unknown | |
creatorIDate | timestamp | No | CURRENT_TIMESTAMP | |
creatorCount | int(11) | Yes | NULL | |
creatorComment | varchar(512) | Yes | NULL |
Table: DetectorConfigurations

Field | Type | Null | Default | Comments |
---|---|---|---|---|
detectorConfigurationID | int(11) | No | ||
detectorConfigurationName | varchar(50) | Yes | NULL | |
dTPC | tinyint(4) | Yes | NULL | |
dSVT | tinyint(4) | Yes | NULL | |
dTOF | tinyint(4) | Yes | NULL | |
dEMC | tinyint(4) | Yes | NULL | |
dEEMC | tinyint(4) | Yes | NULL | |
dFPD | tinyint(4) | Yes | NULL | |
dFTPC | tinyint(4) | Yes | NULL | |
dPMD | tinyint(4) | Yes | NULL | |
dRICH | tinyint(4) | Yes | NULL | |
dSSD | tinyint(4) | Yes | NULL | |
dBBC | tinyint(4) | Yes | NULL | |
dBSMD | tinyint(4) | Yes | NULL | |
dESMD | tinyint(4) | Yes | NULL | |
dZDC | tinyint(4) | Yes | NULL | |
dCTB | tinyint(4) | Yes | NULL | |
dTPX | tinyint(4) | Yes | NULL | |
dFGT | tinyint(4) | Yes | NULL |
Table: DetectorStates

Field | Type | Null | Default | Comments |
---|---|---|---|---|
detectorStateID | int(11) | No | ||
sTPC | tinyint(4) | Yes | NULL | |
sSVT | tinyint(4) | Yes | NULL | |
sTOF | tinyint(4) | Yes | NULL | |
sEMC | tinyint(4) | Yes | NULL | |
sEEMC | tinyint(4) | Yes | NULL | |
sFPD | tinyint(4) | Yes | NULL | |
sFTPC | tinyint(4) | Yes | NULL | |
sPMD | tinyint(4) | Yes | NULL | |
sRICH | tinyint(4) | Yes | NULL | |
sSSD | tinyint(4) | Yes | NULL | |
sBBC | tinyint(4) | Yes | NULL | |
sBSMD | tinyint(4) | Yes | NULL | |
sESMD | tinyint(4) | Yes | NULL | |
sZDC | tinyint(4) | Yes | NULL | |
sCTB | tinyint(4) | Yes | NULL | |
sTPX | tinyint(4) | Yes | NULL | |
sFGT | tinyint(4) | Yes | NULL |
Table: EventGenerators

Field | Type | Null | Default | Comments |
---|---|---|---|---|
eventGeneratorID | smallint(6) | No | ||
eventGeneratorName | varchar(30) | No | ||
eventGeneratorVersion | varchar(10) | Yes | 0 | |
eventGeneratorParams | varchar(200) | Yes | NULL | |
eventGeneratorIDate | timestamp | No | CURRENT_TIMESTAMP | |
eventGeneratorCreator | smallint(6) | No | 1 | |
eventGeneratorCount | int(11) | Yes | NULL | |
eventGeneratorComment | varchar(512) | Yes | NULL |
Table: FileData

Field | Type | Null | Default | Comments |
---|---|---|---|---|
fileDataID | bigint(20) | No | ||
runParamID | int(11) | No | 0 | |
fileName | varchar(255) | No | ||
baseName | varchar(255) | No | Name without extension | |
sName1 | varchar(255) | No | Will be used for name+runNumber | |
sName2 | varchar(255) | No | Will be used for name before runNumber | |
productionConditionID | mediumint(9) | Yes | NULL | |
numEntries | mediumint(9) | Yes | 0 | |
md5sum | varchar(32) | Yes | 0 | |
fileTypeID | smallint(6) | No | 0 | |
fileSeq | smallint(6) | Yes | NULL | |
fileStream | smallint(6) | Yes | 0 | |
fileDataIDate | timestamp | No | CURRENT_TIMESTAMP | |
fileDataCreator | smallint(6) | No | 1 | |
fileDataCount | int(11) | Yes | NULL | |
fileDataComment | text | Yes | NULL |
Table: FileLocations

Field | Type | Null | Default | Comments |
---|---|---|---|---|
fileLocationID | bigint(20) | No | ||
fileDataID | bigint(20) | No | 0 | |
filePathID | bigint(20) | No | 0 | |
storageTypeID | mediumint(9) | No | 0 | |
createTime | timestamp | No | CURRENT_TIMESTAMP | |
insertTime | timestamp | No | 0000-00-00 00:00:00 | |
owner | varchar(15) | Yes | NULL | |
fsize | bigint(20) | Yes | NULL | |
storageSiteID | smallint(6) | No | 0 | |
protection | varchar(15) | Yes | NULL | |
hostID | mediumint(9) | No | 1 | |
availability | tinyint(4) | No | 1 | |
persistent | tinyint(4) | No | 0 | |
sanity | tinyint(4) | No | 1 |
Table: FileParents

Field | Type | Null | Default | Comments |
---|---|---|---|---|
parentFileID | bigint(20) | No | 0 | |
childFileID | bigint(20) | No | 0 |
Table: FilePaths

Field | Type | Null | Default | Comments |
---|---|---|---|---|
filePathID | bigint(6) | No | ||
filePathName | varchar(255) | No | ||
filePathIDate | timestamp | No | CURRENT_TIMESTAMP | |
filePathCreator | smallint(6) | No | 1 | |
filePathCount | int(11) | Yes | NULL | |
filePathComment | varchar(512) | Yes | NULL |
Table: FileTypes

Field | Type | Null | Default | Comments |
---|---|---|---|---|
fileTypeID | smallint(6) | No | ||
fileTypeName | varchar(30) | No | ||
fileTypeExtension | varchar(15) | No | ||
fileTypeIDate | timestamp | No | CURRENT_TIMESTAMP | |
fileTypeCreator | smallint(6) | No | 1 | |
fileTypeCount | int(11) | Yes | NULL | |
fileTypeComment | varchar(512) | Yes | NULL |
Table: Hosts

Field | Type | Null | Default | Comments |
---|---|---|---|---|
hostID | smallint(6) | No | ||
hostName | varchar(30) | No | localhost | |
hostIDate | timestamp | No | CURRENT_TIMESTAMP | |
hostCreator | smallint(6) | No | 1 | |
hostCount | int(11) | Yes | NULL | |
hostComment | varchar(512) | Yes | NULL |
Table: ProductionConditions

Field | Type | Null | Default | Comments |
---|---|---|---|---|
productionConditionID | smallint(6) | No | ||
productionTag | varchar(10) | No | ||
libraryVersion | varchar(10) | No | ||
productionConditionIDate | timestamp | No | CURRENT_TIMESTAMP | |
productionConditionCreator | smallint(6) | No | 1 | |
productionConditionCount | int(11) | Yes | NULL | |
productionConditionComment | varchar(512) | Yes | NULL |
Table: RunParams

Field | Type | Null | Default | Comments |
---|---|---|---|---|
runParamID | int(11) | No | ||
runNumber | bigint(20) | No | 0 | |
dataTakingStart | timestamp | No | 0000-00-00 00:00:00 | |
dataTakingEnd | timestamp | No | 0000-00-00 00:00:00 | |
dataTakingDay | smallint(6) | Yes | 0 | |
dataTakingYear | smallint(6) | Yes | 0 | |
simulationParamsID | int(11) | Yes | NULL | |
runTypeID | smallint(6) | No | 0 | |
triggerSetupID | smallint(6) | No | 0 | |
detectorConfigurationID | mediumint(9) | No | 0 | |
detectorStateID | mediumint(9) | No | 0 | |
collisionTypeID | smallint(6) | No | 0 | |
magFieldScale | varchar(50) | No | ||
magFieldValue | float | Yes | NULL | |
runParamIDate | timestamp | No | CURRENT_TIMESTAMP | |
runParamCreator | smallint(6) | No | 1 | |
runParamCount | int(11) | Yes | NULL | |
runParamComment | varchar(512) | Yes | NULL |
Table: RunTypes

Field | Type | Null | Default | Comments |
---|---|---|---|---|
runTypeID | smallint(6) | No | ||
runTypeName | varchar(255) | No | ||
runTypeIDate | timestamp | No | CURRENT_TIMESTAMP | |
runTypeCreator | smallint(6) | No | 1 | |
runTypeCount | int(11) | Yes | NULL | |
runTypeComment | varchar(512) | Yes | NULL |
Table: SimulationParams

Field | Type | Null | Default | Comments |
---|---|---|---|---|
simulationParamsID | int(11) | No | ||
eventGeneratorID | smallint(6) | No | 0 | |
simulationParamIDate | timestamp | No | CURRENT_TIMESTAMP | |
simulationParamCreator | smallint(6) | No | 1 | |
simulationParamCount | int(11) | Yes | NULL | |
simulationParamComment | varchar(512) | Yes | NULL |
Table: StorageSites

Field | Type | Null | Default | Comments |
---|---|---|---|---|
storageSiteID | smallint(6) | No | ||
storageSiteName | varchar(30) | No | ||
storageSiteLocation | varchar(50) | Yes | NULL | |
storageSiteIDate | timestamp | No | CURRENT_TIMESTAMP | |
storageSiteCreator | smallint(6) | No | 1 | |
storageSiteCount | int(11) | Yes | NULL | |
storageSiteComment | varchar(512) | Yes | NULL |
Table: StorageTypes

Field | Type | Null | Default | Comments |
---|---|---|---|---|
storageTypeID | mediumint(9) | No | ||
storageTypeName | varchar(6) | No | ||
storageTypeIDate | timestamp | No | CURRENT_TIMESTAMP | |
storageTypeCreator | smallint(6) | No | 1 | |
storageTypeCount | int(11) | Yes | NULL | |
storageTypeComment | varchar(512) | Yes | NULL |
Table: TriggerCompositions

Field | Type | Null | Default | Comments |
---|---|---|---|---|
fileDataID | bigint(20) | No | 0 | |
triggerWordID | mediumint(9) | No | 0 | |
triggerCount | mediumint(9) | Yes | 0 |
Table: TriggerSetups

Field | Type | Null | Default | Comments |
---|---|---|---|---|
triggerSetupID | smallint(6) | No | ||
triggerSetupName | varchar(50) | No | ||
triggerSetupComposition | varchar(255) | No | ||
triggerSetupIDate | timestamp | No | CURRENT_TIMESTAMP | |
triggerSetupCreator | smallint(6) | No | 1 | |
triggerSetupCount | int(11) | Yes | NULL | |
triggerSetupComment | varchar(512) | Yes | NULL |
Table: TriggerWords

Field | Type | Null | Default | Comments |
---|---|---|---|---|
triggerWordID | mediumint(9) | No | ||
triggerWordName | varchar(50) | No | ||
triggerWordVersion | varchar(6) | No | V0.0 | |
triggerWordBits | varchar(6) | No | ||
triggerWordIDate | timestamp | No | CURRENT_TIMESTAMP | |
triggerWordCreator | smallint(6) | No | 1 | |
triggerWordCount | int(11) | Yes | NULL | |
triggerWordComment | varchar(512) | Yes | NULL |
The full schema used to create the FileCatalog tables is reproduced below:

#use FileCatalog;
#
# All IDs are named after their respective table. This MUST
# remain like this.
#   eventGeneratorID        -> eventGenerator+ID        in 'EventGenerators'
#   detectorConfigurationID -> detectorConfiguration+ID in 'DetectorConfigurations'
#
# etc ...
#
DROP TABLE IF EXISTS EventGenerators;
CREATE TABLE EventGenerators
(
  eventGeneratorID      SMALLINT     NOT NULL AUTO_INCREMENT,
  eventGeneratorName    VARCHAR(30)  NOT NULL,
  eventGeneratorVersion VARCHAR(10)  NOT NULL,
  eventGeneratorParams  VARCHAR(200),
  eventGeneratorIDate   TIMESTAMP    NOT NULL,
  eventGeneratorCreator CHAR(15)     DEFAULT 'unknown' NOT NULL,
  eventGeneratorCount   INT,
  eventGeneratorComment TEXT,
  UNIQUE EG_EventGeneratorUnique (eventGeneratorName, eventGeneratorVersion, eventGeneratorParams),
  PRIMARY KEY (eventGeneratorID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS DetectorConfigurations;
CREATE TABLE DetectorConfigurations
(
  detectorConfigurationID   INT NOT NULL AUTO_INCREMENT,
  detectorConfigurationName VARCHAR(50) NULL UNIQUE,
  dTPC  TINYINT,
  dSVT  TINYINT,
  dTOF  TINYINT,
  dEMC  TINYINT,
  dEEMC TINYINT,
  dFPD  TINYINT,
  dFTPC TINYINT,
  dPMD  TINYINT,
  dRICH TINYINT,
  dSSD  TINYINT,
  dBBC  TINYINT,
  dBSMD TINYINT,
  dESMD TINYINT,
  PRIMARY KEY (detectorConfigurationID)
) TYPE=MyISAM;

# Trigger related tables
DROP TABLE IF EXISTS TriggerSetups;
CREATE TABLE TriggerSetups
(
  triggerSetupID          SMALLINT     NOT NULL AUTO_INCREMENT,
  triggerSetupName        VARCHAR(50)  NOT NULL UNIQUE,
  triggerSetupComposition VARCHAR(255) NOT NULL,
  triggerSetupIDate       TIMESTAMP    NOT NULL,
  triggerSetupCreator     CHAR(15)     DEFAULT 'unknown' NOT NULL,
  triggerSetupCount       INT,
  triggerSetupComment     TEXT,
  PRIMARY KEY (triggerSetupID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS TriggerCompositions;
CREATE TABLE TriggerCompositions
(
  fileDataID    BIGINT    NOT NULL,
  triggerWordID INT       NOT NULL,
  triggerCount  MEDIUMINT DEFAULT 0,
  PRIMARY KEY (fileDataID, triggerWordID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS TriggerWords;
CREATE TABLE TriggerWords
(
  triggerWordID      mediumint(9)  NOT NULL auto_increment,
  triggerWordName    varchar(50)   NOT NULL default '',
  triggerWordVersion varchar(6)    NOT NULL default 'V0.0',
  triggerWordBits    varchar(6)    NOT NULL default '',
  triggerWordIDate   timestamp(14) NOT NULL,
  triggerWordCreator varchar(15)   NOT NULL default 'unknown',
  triggerWordCount   int(11)       default NULL,
  triggerWordComment text,
  PRIMARY KEY (triggerWordID),
  UNIQUE KEY TW_TriggerCharacteristic (triggerWordName,triggerWordVersion,triggerWordBits)
) TYPE=MyISAM;

DROP TABLE IF EXISTS CollisionTypes;
CREATE TABLE CollisionTypes
(
  collisionTypeID SMALLINT    NOT NULL AUTO_INCREMENT,
  firstParticle   VARCHAR(10) NOT NULL,
  secondParticle  VARCHAR(10) NOT NULL,
  collisionEnergy FLOAT       NOT NULL,
  PRIMARY KEY (collisionTypeID)
) TYPE=MyISAM;

#
# A few dictionary tables
#
DROP TABLE IF EXISTS ProductionConditions;
CREATE TABLE ProductionConditions
(
  productionConditionID       SMALLINT    NOT NULL AUTO_INCREMENT,
  productionTag               VARCHAR(10) NOT NULL,
  libraryVersion              VARCHAR(10) NOT NULL,
  productionConditionIDate    TIMESTAMP   NOT NULL,
  productionConditionCreator  CHAR(15)    DEFAULT 'unknown' NOT NULL,
  productionConditionCount    INT,
  productionConditionComments TEXT,
  PRIMARY KEY (productionConditionID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS StorageSites;
CREATE TABLE StorageSites
(
  storageSiteID       SMALLINT    NOT NULL AUTO_INCREMENT,
  storageSiteName     VARCHAR(30) NOT NULL,
  storageSiteLocation VARCHAR(50),
  storageSiteIDate    TIMESTAMP   NOT NULL,
  storageSiteCreator  CHAR(15)    DEFAULT 'unknown' NOT NULL,
  storageSiteCount    INT,
  storageSiteComment  TEXT,
  PRIMARY KEY (storageSiteID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FileTypes;
CREATE TABLE FileTypes
(
  fileTypeID        SMALLINT    NOT NULL AUTO_INCREMENT,
  fileTypeName      VARCHAR(30) NOT NULL UNIQUE,
  fileTypeExtension VARCHAR(15) NOT NULL,
  fileTypeIDate     TIMESTAMP   NOT NULL,
  fileTypeCreator   CHAR(15)    DEFAULT 'unknown' NOT NULL,
  fileTypeCount     INT,
  fileTypeComment   TEXT,
  PRIMARY KEY (fileTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FilePaths;
CREATE TABLE FilePaths
(
  filePathID      BIGINT       NOT NULL AUTO_INCREMENT,
  filePathName    VARCHAR(255) NOT NULL UNIQUE,
  filePathIDate   TIMESTAMP    NOT NULL,
  filePathCreator CHAR(15)     DEFAULT 'unknown' NOT NULL,
  filePathCount   INT,
  filePathComment TEXT,
  PRIMARY KEY (filePathID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS Hosts;
CREATE TABLE Hosts
(
  hostID      SMALLINT    NOT NULL AUTO_INCREMENT,
  hostName    VARCHAR(30) NOT NULL DEFAULT 'localhost' UNIQUE,
  hostIDate   TIMESTAMP   NOT NULL,
  hostCreator CHAR(15)    DEFAULT 'unknown' NOT NULL,
  hostCount   INT,
  hostComment TEXT,
  PRIMARY KEY (hostID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS RunTypes;
CREATE TABLE RunTypes
(
  runTypeID      SMALLINT     NOT NULL AUTO_INCREMENT,
  runTypeName    VARCHAR(255) NOT NULL UNIQUE,
  runTypeIDate   TIMESTAMP    NOT NULL,
  runTypeCreator CHAR(15)     DEFAULT 'unknown' NOT NULL,
  runTypeCount   INT,
  runTypeComment TEXT,
  PRIMARY KEY (runTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS StorageTypes;
CREATE TABLE StorageTypes
(
  storageTypeID      MEDIUMINT  NOT NULL AUTO_INCREMENT,
  storageTypeName    VARCHAR(6) NOT NULL UNIQUE,
  storageTypeIDate   TIMESTAMP  NOT NULL,
  storageTypeCreator CHAR(15)   DEFAULT 'unknown' NOT NULL,
  storageTypeCount   INT,
  storageTypeComment TEXT,
  PRIMARY KEY (storageTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS SimulationParams;
CREATE TABLE SimulationParams
(
  simulationParamsID      INT      NOT NULL AUTO_INCREMENT,
  eventGeneratorID        SMALLINT NOT NULL,
  detectorConfigurationID INT      NOT NULL,
  simulationParamComments TEXT,
  PRIMARY KEY (simulationParamsID),
  INDEX SP_EventGeneratorIndex (eventGeneratorID),
  INDEX SP_DetectorConfigurationIndex (detectorConfigurationID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS RunParams;
CREATE TABLE RunParams
(
  runParamID              INT         NOT NULL AUTO_INCREMENT,
  runNumber               BIGINT      NOT NULL UNIQUE,
  dataTakingStart         TIMESTAMP,
  dataTakingEnd           TIMESTAMP,
  simulationParamsID      INT         NULL,
  runTypeID               SMALLINT    NOT NULL,
  triggerSetupID          SMALLINT    NOT NULL,
  detectorConfigurationID INT         NOT NULL,
  collisionTypeID         SMALLINT    NOT NULL,
  magFieldScale           VARCHAR(50) NOT NULL,
  magFieldValue           FLOAT,
  runComments             TEXT,
  PRIMARY KEY (runParamID),
  INDEX RP_RunNumberIndex (runNumber),
  INDEX RP_DataTakingStartIndex (dataTakingStart),
  INDEX RP_DataTakingEndIndex (dataTakingEnd),
  INDEX RP_MagFieldScaleIndex (magFieldScale),
  INDEX RP_MagFieldValueIndex (magFieldValue),
  INDEX RP_SimulationParamsIndex (simulationParamsID),
  INDEX RP_RunTypeIndex (runTypeID),
  INDEX RP_TriggerSetupIndex (triggerSetupID),
  INDEX RP_DetectorConfigurationIndex (detectorConfigurationID),
  INDEX RP_CollisionTypeIndex (collisionTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FileData;
CREATE TABLE FileData
(
  fileDataID            BIGINT       NOT NULL AUTO_INCREMENT,
  runParamID            INT          NOT NULL,
  fileName              VARCHAR(255) NOT NULL,
  baseName              VARCHAR(255) NOT NULL COMMENT 'Name without extension',
  sName1                VARCHAR(255) NOT NULL COMMENT 'Will be used for name+runNumber',
  sName2                VARCHAR(255) NOT NULL COMMENT 'Will be used for name before runNumber',
  productionConditionID INT          NULL,
  numEntries            MEDIUMINT,
  md5sum                CHAR(32)     DEFAULT 0,
  fileTypeID            SMALLINT     NOT NULL,
  fileSeq               SMALLINT,
  fileStream            SMALLINT,
  fileDataComments      TEXT,
  PRIMARY KEY (fileDataID),
  INDEX FD_FileNameIndex (fileName(40)),
  INDEX FD_BaseNameIndex (baseName),
  INDEX FD_SName1Index (sName1),
  INDEX FS_SName2Index (sName2),
  INDEX FD_RunParamsIndex (runParamID),
  INDEX FD_ProductionConditionIndex (productionConditionID),
  INDEX FD_FileTypeIndex (fileTypeID),
  INDEX FD_FileSeqIndex (fileSeq),
  UNIQUE FD_FileDataUnique (runParamID, fileName, productionConditionID, fileTypeID, fileSeq)
) TYPE=MyISAM;

# FileParents
DROP TABLE IF EXISTS FileParents;
CREATE TABLE FileParents
(
  parentFileID BIGINT NOT NULL,
  childFileID  BIGINT NOT NULL,
  PRIMARY KEY (parentFileID, childFileID)
) TYPE=MyISAM;

# FileLocations
DROP TABLE IF EXISTS FileLocations;
CREATE TABLE FileLocations
(
  fileLocationID BIGINT      NOT NULL AUTO_INCREMENT,
  fileDataID     BIGINT      NOT NULL,
  filePathID     BIGINT      NOT NULL,
  storageTypeID  MEDIUMINT   NOT NULL,
  createTime     TIMESTAMP,
  insertTime     TIMESTAMP   NOT NULL,
  owner          VARCHAR(30),
  fsize          BIGINT,
  storageSiteID  SMALLINT    NOT NULL,
  protection     VARCHAR(15),
  hostID         BIGINT      NOT NULL DEFAULT 1,
  availability   TINYINT     NOT NULL DEFAULT 1,
  persistent     TINYINT     NOT NULL DEFAULT 0,
  sanity         TINYINT     NOT NULL DEFAULT 1,
  PRIMARY KEY (fileLocationID),
  INDEX FL_FilePathIndex (filePathID),
  INDEX FL_FileDataIndex (fileDataID),
  INDEX FL_StorageTypeIndex (storageTypeID),
  INDEX FL_StorageSiteIndex (storageSiteID),
  INDEX FL_HostIndex (hostID),
  UNIQUE FL_FileLocationUnique (fileDataID, storageTypeID, filePathID, storageSiteID, hostID)
) TYPE=MyISAM;
An example of the XML connection description used by the API to locate the databases for each site and scope is reproduced below (host names, ports, accounts and passwords are illustrative):

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE SCATALOG [
  <!ELEMENT SCATALOG (SITE*)>
  <!ATTLIST SCATALOG VERSION CDATA #REQUIRED>
  <!ELEMENT SITE (SERVER+)>
  <!ATTLIST SITE name (BNL | LBL) #REQUIRED>
  <!ATTLIST SITE description CDATA #IMPLIED>
  <!ATTLIST SITE URI CDATA #IMPLIED>
  <!ELEMENT SERVER (HOST+)>
  <!ATTLIST SERVER SCOPE (Master | Admin | User) #REQUIRED>
  <!ELEMENT HOST (ACCESS+)>
  <!ATTLIST HOST NAME CDATA #REQUIRED>
  <!ATTLIST HOST DBTYPE CDATA #IMPLIED>
  <!ATTLIST HOST DBNAME CDATA #REQUIRED>
  <!ATTLIST HOST PORT CDATA #IMPLIED>
  <!ELEMENT ACCESS EMPTY>
  <!ATTLIST ACCESS USER CDATA #IMPLIED>
  <!ATTLIST ACCESS PASS CDATA #IMPLIED>
]>
<SCATALOG VERSION="1.0.1">
  <SITE name="BNL">
    <SERVER SCOPE="Master">
      <HOST NAME="mafata.wherever.net" DBNAME="Catalog_XXX" PORT="1234">
        <ACCESS USER="Moi" PASS="HelloWorld"/>
      </HOST>
      <HOST NAME="mafata.wherever.net" DBNAME="Catalog_YYY" PORT="1235">
        <ACCESS USER="Moi" PASS="HelloWorld"/>
      </HOST>
      <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog" PORT="">
        <ACCESS USER="FC_master" PASS="AllAccess"/>
      </HOST>
    </SERVER>
    <SERVER SCOPE="Admin">
      <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog_BNL" PORT="">
        <ACCESS USER="FC_admin" PASS="ExamplePassword"/>
      </HOST>
    </SERVER>
    <SERVER SCOPE="User">
      <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog_BNL" PORT="">
        <ACCESS USER="FC_user" PASS="FCatalog"/>
      </HOST>
    </SERVER>
  </SITE>
</SCATALOG>
This document is intended only for FileCatalog managers who have previously deployed an earlier version of the API and an older database table layout. It is NOT intended for users.
Reasoning for this upgrade and core of the upgrade
One of the major problems with the preceding database layout started to show itself when we reached 4 million entries (for some reason, we seem to have magic numbers). A dire restriction was the presence of the fields 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB), and sorting and queries became slow and IO demanding (regardless of our careful indexing). The main action was to move both fields to separate tables. This change requires a two-step modification:
The steps are more carefully described below ...
Step by step migration instructions
The migration has to be made in several steps for safety and the least interruption of service (although it is a pain for the manager). Note that you can do it much faster by cutting the Master/slave relationship, disabling all daemons auto-updating the database, proceeding with the table reshape and normalization script execution, dropping and rebuilding the indexes, deploying the point-of-no-return API and restoring the Master/slave relation.
This upgrade works best if you have Perl 5.8 or higher. Note that this transition will be the LAST one using Perl 5.6 (get ready for a Perl upgrade on your cluster).
We will assume you know how to connect to your database from an account able to manipulate and create any tables in the FileCatalog database.
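For reference (the host name below is only a placeholder; use whichever server hosts your catalog), a plain mysql client session opened with an account holding ALTER/CREATE privileges on the FileCatalog database is all that is needed:

% mysql -h your-db-host.your.domain -u FC_admin -p FileCatalog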
DROP TABLE IF EXISTS FilePaths;
CREATE TABLE FilePaths
(
  filePathID    BIGINT       NOT NULL AUTO_INCREMENT,
  filePathName  VARCHAR(255) NOT NULL UNIQUE,
  filePathCount INT,
  PRIMARY KEY (filePathID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS Hosts;
CREATE TABLE Hosts
(
  hostID      smallint(6)   NOT NULL auto_increment,
  hostName    varchar(30)   NOT NULL default 'localhost',
  hostIDate   timestamp(14) NOT NULL,
  hostCreator varchar(15)   NOT NULL default 'unknown',
  hostCount   int(11)       default NULL,
  hostComment text,
  PRIMARY KEY (hostID),
  UNIQUE KEY hostName (hostName)
) TYPE=MyISAM;
ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`;
ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`;
UPDATE `FileLocations` SET hostID=0;

# note that I did that one from the Web interface (TBC)
INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node');

ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` );

ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName`;
ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate`;
ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`;

ALTER TABLE `StorageSites` ADD `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation`;
ALTER TABLE `StorageSites` ADD `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate`;
ALTER TABLE `StorageSites` DROP `storageComment`;
ALTER TABLE `StorageSites` ADD `storageSiteComment` TEXT AFTER `storageSiteCount`;

ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName`;
ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate`;

ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension`;
ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate`;
ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`;

ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition`;
ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`;
ALTER TABLE `TriggerSetups` ADD `triggerSetupCount` INT AFTER `triggerSetupCreator`;
ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT AFTER `triggerSetupCount`;

ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams`;
ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate`;
ALTER TABLE `EventGenerators` ADD `eventGeneratorCount` INT AFTER `eventGeneratorCreator`;

ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName`;
ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate`;

ALTER TABLE `ProductionConditions` DROP `productionComments`;
ALTER TABLE `ProductionConditions` ADD `productionConditionIDate` TIMESTAMP NOT NULL AFTER `libraryVersion`;
ALTER TABLE `ProductionConditions` ADD `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`;
ALTER TABLE `ProductionConditions` ADD `productionConditionComment` TEXT AFTER `productionConditionCount`;

#
# This table was not shaped as a dictionary so it needs to be re-created.
# Hopefully, it was not filled prior to this (but it will be this year).
#
DROP TABLE IF EXISTS TriggerWords;
CREATE TABLE TriggerWords
(
  triggerWordID      MEDIUMINT   NOT NULL AUTO_INCREMENT,
  triggerWordName    VARCHAR(50) NOT NULL,
  triggerWordVersion CHAR(6)     NOT NULL DEFAULT "V0.0",
  triggerWordBits    CHAR(6)     NOT NULL,
  triggerWordIDate   TIMESTAMP   NOT NULL,
  triggerWordCreator CHAR(15)    DEFAULT 'unknown' NOT NULL,
  triggerWordCount   INT,
  triggerWordComment TEXT,
  UNIQUE TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits),
  PRIMARY KEY (triggerWordID)
) TYPE=MyISAM;
Deploy the new API, CVS version 1.62 of FileCatalog.pm, and run the two conversion scripts:
util/path_convert.pl
util/host_convert.pl
Note that those scripts use a new method, $fC->connect_as("Admin"), which assumes that the Master Catalog will be accessed using the XML connection description. Also, it should be obvious that the line
use lib "/WhereverYourModulAPIisInstalled";
should be replaced by the appropriate path for your site (or test area). Finally, the scripts use API CVS version 1.62, which supports the Xpath and Xnode transitional keywords allowing us to transfer the information from one field to one table.
Queries like the following can be used to verify the conversion, i.e. to spot FileLocations entries still pointing at the default (0) filePathID or hostID:

> SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.filePathID = 0 LIMIT 0, 100
> SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.hostID = 0 LIMIT 0, 100
You are ready for phase II. Hang on tight now ...
Those steps are now VERY intrusive and potentially destructive. Be careful from here on ...
> ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID);
> ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID);

# drop the columns not in use anymore / should also get rid of the associated indexes
> ALTER TABLE `FileLocations` DROP COLUMN nodeName;
> ALTER TABLE `FileLocations` DROP COLUMN filePath;

# "rename" index / was created with a name difference to avoid a clash during the transition,
# now renamed for consistency
> ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX FL_FilePathIndex (filePathID);
OK, you should be done. Deploy CVS version 1.63 or above, which corresponds to the FileCatalog API version V01.275 and above ... (by the way, get_file_list.pl -V gives the API version). Your scripts can then connect either by retrieving the connection parameters and connecting explicitly:
# get_connection() fills the blanks while reading from the XML description.
# However, the USER/PASSWORD presence is re-checked below.
#$fC->debug_on();
($USER,$PASSWD,$PORT,$HOST,$DB) = $fC->get_connection("Admin");

$port = $PORT if ( defined($PORT) );
$host = $HOST if ( defined($HOST) );
$db   = $DB   if ( defined($DB) );

if ( defined($USER) )  { $user = $USER;     } else { $user = "FC_admin"; }
if ( defined($PASSWD) ){ $passwd = $PASSWD; } else {
    # prompt for the password on standard input
    print "Password for $user : ";
    chomp($passwd = <STDIN>);
}

#
# Now connect using a fully specified user/passwd/port/host/db
#
$fC->connect($user,$passwd,$port,$host,$db);
or counting on the full definition in the XML file
$fC = FileCatalog->new(); $fC->connect_as("Admin");
This document is intended only for FileCatalog managers who have previously deployed an earlier version of the API and an older database table layout. It is NOT intended for users.
Reasoning for this upgrade and core of the upgrade
This upgrade is a minor one, adding support for two more detector sub-systems. The new API supports this modification. You need to alter the DetectorConfigurations table and add two columns. The API is always forward compatible in this regard, so it is completely safe to alter the table and deploy the API later.
ALTER TABLE `DetectorConfigurations` ADD dBSMD TINYINT;
ALTER TABLE `DetectorConfigurations` ADD dESMD TINYINT;
UPDATE `DetectorConfigurations` SET dBSMD=0;
UPDATE `DetectorConfigurations` SET dESMD=0;
And deploy the API V01.280 or later. You are done.
The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the PERL module used to manipulate it are described elsewhere in this document. Below you will find the description of the command-line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments it will print the following usage message:
% get_file_list.pl [-all] -keys keyword[,keyword,...] \
      [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \
      [-onefile] [-o outputfile]
The command line options are described below:
Option | Description |
---|---|
-all | Use all entries regardless of the availability flag. The default is to show only available=1. |
-alls | Use all entries regardless of the sanity flag. The default is to show only sanity=1 unless the sanity flag was used as a key. |
-onefile | A special mode of operation; returns a list of files but gives only one location for each file (the one with the highest persistence), even if the database has many. |
-keys | Specify what data you want to get from the database. A list of valid keywords, separated by commas, should follow this parameter. See the examples for clarification. See also the description of aggregate functions for some more sophisticated tricks. |
-cond | Specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanation. |
-start # | Specify the record number # to start from (default: start with the first record). Together with -limit, this can be used to get the data in chunks. |
-limit # | Limit the number of records returned (default 100; a value of 0 indicates an unlimited number of records). |
-rlimit # | Limit the number of unique LFNs (note that the number of lines returned may be larger than the rlimit). Using -rlimit switches the -limit logic off; you cannot use both at the same time. |
-delim <string> | Specify the characters that will separate the fields in the output (default: "::"). |
-V | Print the module version and exit. |
-as <scope>, -as <site:scope> | Connect to the FileCatalog database as specified. Scopes are {Admin|User}; the site should be specified for a multi-site deployment. |
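Putting several of these options together, a hypothetical invocation (the keyword values are illustrative only) returning the storage, path and file name of at most 500 available entries for one run, with fields separated by a single space, could look like:

% get_file_list.pl -keys 'storage,path,filename' -cond 'filetype=daq_reco_MuDst,runnumber=12114010' -limit 500 -delim ' '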
The following comparison operators can be used in the expressions given to -cond:

Operator | Meaning | Applies to |
---|---|---|
<= | Not greater than | |
< | Less than | |
>= | Not less than | |
> | Greater than | |
<> | Not equal to | |
!= | Not equal to | |
= | Equal to | |
!~ | Not containing (i.e. does not match) | strings |
~ | Containing (i.e. approximately matching) | strings |
[] | In range | |
][ | Outside the range | |
% | Modulo | integers |
%% | Not modulo | integers |
The following logical operators can also be used in a query. In this case the usage scope is within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}:

Operator | Meaning | Applies to |
---|---|---|
|| | Logical OR | Strings or numbers |
&& | Logical AND | Strings or numbers |
Note that the use of the logical AND operator will return no selection in most cases (for example, a runnumber cannot be of Value1 and Value2 at the same time), but it was added for later extension of the database: selections based on meta-data such as triggered events (many triggers in a file) would be a case where this operator would be used.
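As an illustration (the run number and trigger setup names are hypothetical), a comparison operator and a logical OR can be combined in the condition list, which is enclosed in single quotes since the operators are special shell characters:

% get_file_list.pl -keys 'runnumber,filename' -cond 'runnumber>=12114010,trgsetupname=setupA||setupB' -limit 0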
These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See examples for the description on how to use them.
Function | Meaning |
---|---|
sum | The sum of the values |
avg | The average of the values |
min | The minimum of the values |
max | The maximum of the values |
orda | Sort the output in ascending order by this keyword |
ordd | Sort the output in descending order by this keyword |
count | The count for a given selection |
grp | Group the output - put all the records with the same value for a given keyword together. This is required in conjunction with any aggregate functions used in a multiple keyword syntax context. |
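The exact syntax is shown in the examples referred to above; as a sketch, assuming the aggregate function wraps its keyword inside the -keys list, the total size of the catalogued files grouped by production could be requested as:

% get_file_list.pl -keys 'grp(production),sum(size)' -limit 0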
Here are the keywords that can be used in this context. There is a color scheme to those keywords, as follows:

Keywords in blue are currently supported by the database schema but unused by the production scripts and are therefore not filled (or ill-filled).
Keywords in aqua are automatically updated (there is no need to reset them).
Keywords in magenta are filled, but an update may be needed (do not rely strongly on their value).
keyword | Notes | Meaning |
---|---|---|
site | | The site where the data is stored, e.g. BNL, LBL |
sitecmt | | The site comment string |
siteloc | | A full string describing the site location in the world |
storage | | The storage medium, e.g. HPSS, NFS, local disk. Note that local disk storage does not allow for a unique file location; one must also select on node |
node | | The name of the node where the data is stored (necessary to locate local disk storage) |
path | | The path to a specific copy of the file |
filename | | The name of the data file |
sname1 | | The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002" |
sname2 | | The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to isolate only st_physics files while rejecting "st_physics_adc" files |
filetype | | The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc. |
extension | | The extension of the file; directly connected to the type (each file type has an associated extension) |
events | | The number of events or entries in the file |
size | | The size of the data file |
fileseq | | The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files |
stream | | The file stream if applicable (default is 0) |
md5sum | Early-stage db fills did not update this field; it may return 0 | The file's md5 checksum |
production | | The production tag with which a given file was produced. Can also be "raw" or "simulation" |
library | | The library version this file was produced with |
trgsetupname | Used to encode the path in production | The name of the online trigger setup |
trgname | | The name of one trigger in a collection of triggers associated to a runnumber |
trgcount | | The event count having the associated trgname for a given runnumber |
trgword | Available for Year4 data and beyond for DAQ files | The trigger word associated to one trigger in a collection |
trgversion | | The trigger word version associated to a trgname |
trgdefinition | | The trigger definition of one trigger in a collection |
runtype | | The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets |
configuration | | The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but not either of the two alone) |
geometry | | The geometry definition for a given simulation set |
runnumber | | The number of the run. Arbitrary for simulations |
runcomments | | The comments for a given run |
collision | | The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200" |
datetaken | Format was messed up at the old->new Catalog conversion. Can be (and will be) recovered | The date the data was taken. Arbitrary for simulation |
magscale | | The name of the magnetic field scale, e.g. FullField |
magvalue | | The actual magnetic field value |
filecomment | | The comment attached to the file |
owner | | The owner of the file |
protection | Subject to changes | The protection or read/write permissions, given in a format similar to UNIX 'ls -l' |
available | | Is the file available? (0 if one cannot get it from HPSS or the file disappeared from disk) |
persistent | | Is the file persistent? |
createtime | Only HPSS files have a createtime, which is not subject to changes | The time the file was created. The format is YYYYmmddHHMMSS |
inserttime | | The time the file data was inserted into the database |
simcomment | | The comments for the simulation |
generator | | The event generator name |
genversion | | The event generator version |
gencomment | | The event generator comments |
genparams | | The event generator parameters |
tpc | | Was the TPC in the data stream when the specific data was taken? |
svt | | Was the SVT in the data stream when the specific data was taken? |
tof | | Was the TOF in the data stream when the specific data was taken? |
emc | | Was the B-EMC in the data stream when the specific data was taken? |
eemc | | Was the E-EMC in the data stream when the specific data was taken? |
fpd | | Was the FPD in the data stream when the specific data was taken? |
ftpc | | Was the FTPC in the data stream when the specific data was taken? |
pmd | | Was the PMD in the data stream when the specific data was taken? |
rich | | Was the RICH in the data stream when the specific data was taken? |
ssd | | Was the SSD in the data stream when the specific data was taken? |
bbc | | Was the BBC in the data stream when the specific data was taken? |
bsmd | | Was the Barrel EMC SMD in the data stream when the specific data was taken? |
esmd | | Was the End-Cap SMD in the data stream when the specific data was taken? |
zdc | | Was the Zero-Degree Calorimeter in the data stream when the specific data was taken? |
tpx | | Was the TPX (TPC-X) information in the data stream when the data was taken? |
fgt | | Was the Forward Gem Tracker information saved in this data stream? |
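As a worked illustration combining several of the keywords above (again, all values are examples only), one could request a single preferred location (-onefile) for every available file of a given run taken with the TPC in the data stream:

% get_file_list.pl -onefile -keys 'storage,node,path,filename' -cond 'runnumber=12114010,tpc=1' -limit 0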
The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).
keyword | Meaning |
---|---|
flid | Access the FileLocation ID of the FileLocations table |
fdid | Access the FileData ID of the FileData table |
rfdid | Access the FileData ID of the FileLocations table |
pcid | Access the ProductionCondition ID of the ProductionConditions table |
rpcid | Access the ProductionCondition ID of the FileData table |
rpid | Access the runParam ID of the RunParams table |
rrpid | Access the runParam ID of the FileData table |
ftid | Access the FileType ID of the FileTypes table |
rftid | Access the FileType ID of the FileData table |
stid | Access the storageType ID of the StorageTypes table |
rtid | Access the storageType ID of the FileLocations table |
ssid | Access the storageSite ID of the StorageSites table |
rssid | Access the storageSite ID of the FileLocations table |
tcfdid | Access the FileData ID of the TriggerCompositions table |
tctwid | Access the TriggerWord ID of the TriggerCompositions table |
twid | Access the TriggerWord ID of the TriggerWords table |
dcid | Access the detectorConfiguration ID of the DetectorConfigurations table |
rdcid | Access the detectorConfiguration ID of the RunParams table |
lgnm | An aggregate keyword returning an equivalence to the logical name |
lgpth | An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location) |
fulld | An aggregate keyword returning a string completely defining all meta-data for real data |
fulls | An aggregate keyword returning a string completely defining all meta-data for simulation data |
Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.
keyword | Notes | Meaning |
---|---|---|
simulation | | Is the data a simulation? |
nounique | In script mode, this keyword is set to 0 (i.e. unique fields), which may slow down your scripting tremendously. In the get_file_list.pl user interface, however, it is set by default to 1 (does not ensure unique fields). | Should the module return all fields, instead of only unique selected fields? |
noround | | Turns off the rounding of magfield and collision energy |
startrecord | | The PERL module will skip the first startrecord records and start returning data beginning from the next one |
limit | | The PERL module will return at most limit records |
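Since startrecord and limit appear on the command line as the -start and -limit options of get_file_list.pl, a long listing can be retrieved in chunks by advancing the start record between calls (the file type below is, as before, only an example):

% get_file_list.pl -keys 'filename' -cond 'filetype=daq_reco_MuDst' -start 0 -limit 1000
% get_file_list.pl -keys 'filename' -cond 'filetype=daq_reco_MuDst' -start 1000 -limit 1000
% get_file_list.pl -keys 'filename' -cond 'filetype=daq_reco_MuDst' -start 2000 -limit 1000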