User manual
Updated on Tue, 2020-06-30 09:11 by genevb. Originally created by jeromel on 2012-03-12 13:44.
The command line interface to the FileCatalog
The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the Perl module used to manipulate it are described later in this document. Below you will find the description of the command-line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments, it prints the following usage message:
% get_file_list.pl [-all] -keys keyword[,keyword,...] \
    [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \
    [-onefile] [-o outputfile]
Command line options
The command line options are described below:
-all | Use all entries regardless of the availability flag. The default is to show only entries with available=1. |
-alls | Use all entries regardless of the sanity flag. The default is to show only entries with sanity=1, unless the sanity flag is used as a key. |
-onefile | A special mode of operation: returns a list of files but gives only one location (the one with the highest persistence) for each file, even if the database has many. |
-keys | Specify what data you want to get from the database. A list of valid keywords, separated by commas, should follow this parameter. See the examples for clarification, and the description of aggregate functions for more sophisticated uses. |
-cond | Specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanation. |
-start # | Specify the record number # to start from. The default is to start with the first record. Together with -limit, this can be used to get the data in chunks. |
-limit # | Limit the number of records returned (default 100; a value of 0 indicates an unlimited number of records). |
-rlimit # | Limit the number of unique LFNs (note that the number of lines returned may exceed the rlimit value). Using -rlimit switches the -limit logic off; the two cannot be used at the same time. |
-delim <string> | Specify the characters that separate the fields in the output (default: "::"). |
-V | Print the module version and exit. |
-as <scope>, -as <site:scope> | Connect to the FileCatalog database as specified. Valid scopes are {Admin|User}. The site should be specified for a multi-site deployment. |
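As a rough illustration of how these options fit together, the Python sketch below assembles an argument list for get_file_list.pl and splits output lines on the field delimiter. It follows the comma-separated form shown in the usage message. The helper names build_command and parse_output are hypothetical, the sample output line is made up, and the utility itself is not executed here.

```python
import shlex


def build_command(keys, conds=None, limit=None, delim=None, onefile=False):
    """Return an argv list for get_file_list.pl (hypothetical helper)."""
    argv = ["get_file_list.pl", "-keys", ",".join(keys)]
    if conds:
        # Each condition is a 'keyword<operator>value' expression.
        argv += ["-cond", ",".join(conds)]
    if limit is not None:
        argv += ["-limit", str(limit)]
    if delim is not None:
        argv += ["-delim", delim]
    if onefile:
        argv.append("-onefile")
    return argv


def parse_output(text, keys, delim="::"):
    """Map each non-empty output line to a dict keyed by the requested keywords."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(dict(zip(keys, line.split(delim))))
    return rows


keys = ["path", "filename"]
argv = build_command(keys, ["filetype=daq_reco_dst", "storage=NFS"], limit=10)
# shlex.join quotes arguments where the shell would need it.
print(shlex.join(argv))

# Made-up output line, only to show the "::" default delimiter at work.
sample = "/some/path::st_physics_12114010_raw_4040002.root\n"
print(parse_output(sample, keys))
```

Passing the arguments as a list (for example to subprocess.run) avoids shell quoting problems with operators such as < and >, which is the reason the manual asks for single quotes on the command line.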
Supported comparison or selection operators
<= | Not greater than | |
< | Less than | |
>= | Not less than | |
> | Greater than | |
<> | Not equal to | |
!= | Not equal to | |
= | Equal to | |
!~ | Not containing (i.e. does not match) | strings |
~ | Containing (i.e. approximately matching) | strings |
[] | In range | |
][ | Outside the range | |
% | Modulo | integer |
%% | Not Modulo | integer |
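Several operators in this table share characters (for example <= and <, or %% and %), so any code that takes a condition apart has to prefer the longer operator. The Python sketch below shows one way to split an expression into keyword, operator and value. The function name split_condition is hypothetical, and this is not the parsing logic of the FileCatalog module itself, only an illustration of the operator set.

```python
# Operators copied from the table above, longest first so that "<=" wins over "<".
OPERATORS = sorted(
    ["<=", "<", ">=", ">", "<>", "!=", "=", "!~", "~", "[]", "][", "%%", "%"],
    key=len,
    reverse=True,
)


def split_condition(expr):
    """Split 'keyword<op>value' into a (keyword, operator, value) tuple."""
    best = None
    for op in OPERATORS:
        pos = expr.find(op)
        # Keep the left-most operator; at equal positions the longer one,
        # seen first because of the sort order, is kept.
        if pos > 0 and (best is None or pos < best[0]):
            best = (pos, op)
    if best is None:
        raise ValueError("no operator found in %r" % expr)
    pos, op = best
    return expr[:pos].strip(), op, expr[pos + len(op):].strip()


print(split_condition("runnumber>=12114010"))
print(split_condition("sname2!~adc"))
print(split_condition("filetype=daq_reco_dst"))
```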
Logical operators
The following logical operators can also be used in a query. They apply within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}
|| | Logical OR | Strings or numbers |
&& | Logical AND | Strings or numbers |
Note that the logical AND operator will return an empty selection in most cases (for example, a runnumber cannot be equal to both Value1 and Value2 at the same time). It was added for later extensions of the database: selections based on metadata such as triggered events (many triggers in a file) would be a case where this operator is useful.
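The difference between the two operators can be sketched in Python as follows. This is only a model of the semantics described above, not code from the FileCatalog module: the function matches is hypothetical, it handles expressions that use a single kind of operator, and the trigger names are invented for the example.

```python
def matches(record_values, expression):
    """Check the values a record holds for one keyword against
    'Value1 || Value2 ...' or 'Value1 && Value2 ...'."""
    record_values = set(record_values)
    if "&&" in expression:
        wanted = [v.strip() for v in expression.split("&&")]
        # AND: every listed value must be present on the record.
        return all(v in record_values for v in wanted)
    wanted = [v.strip() for v in expression.split("||")]
    # OR (or a single value): one matching value is enough.
    return any(v in record_values for v in wanted)


# A file belongs to exactly one run, so OR can match but AND cannot.
print(matches({"12114010"}, "12114010 || 12114011"))
print(matches({"12114010"}, "12114010 && 12114011"))

# A file may hold several triggers, which is where AND becomes meaningful.
print(matches({"minbias", "central"}, "minbias && central"))
```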
The aggregate functions
The following aggregate functions can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See the examples for how to use them.
sum | The sum of the values |
avg | The average of the values |
min | The minimum of the values |
max | The maximum of the values |
orda | Sort the output in ascending order by this keyword |
ordd | Sort the output in descending order by this keyword |
count | The count for a given selection |
grp | Group the output: put all the records with the same value for a given keyword together. This is required in conjunction with any aggregate function used in a multiple-keyword syntax context. |
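To make the role of grp concrete, the Python sketch below shows what grouping on one keyword while summing another amounts to. It models the result only and does not show the command-line syntax; the function group_and_sum and the sample records are hypothetical.

```python
from collections import defaultdict


def group_and_sum(records, group_key, value_key):
    """Group records on group_key and sum value_key within each group."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_key]] += rec[value_key]
    return dict(totals)


# Invented records: two keywords per record, as in a multiple-keyword query.
records = [
    {"filetype": "daq_reco_dst", "size": 100},
    {"filetype": "MC_fzd", "size": 50},
    {"filetype": "daq_reco_dst", "size": 200},
]

totals = group_and_sum(records, "filetype", "size")
# Sorting by the grouped keyword corresponds to the idea behind orda.
for filetype in sorted(totals):
    print(filetype, totals[filetype])
```

Without the grouping step there would be a single sum over all records and no meaningful filetype to print next to it, which is why grp is required when an aggregate is combined with other keywords.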
Keyword list
Here are the keywords that can be used in queries. The keywords are color-coded as follows:
Keywords in blue are currently supported by the database schema but unused by the production scripts and therefore are not filled (or ill-filled).
Keywords in aqua are automatically updated (there is no need to reset)
Keywords in magenta are filled but update may be needed (do not strongly rely on their value)
keyword | Notes | Meaning |
site | | The site where the data is stored, e.g. BNL, LBL |
sitecmt | | The site comment string |
siteloc | | A full string describing the site location in the world |
storage | | The storage medium, e.g. HPSS, NFS, local disk. Note that local disk storage does not allow for a unique file location; one must also select on node. |
node | | The name of the node where the data is stored (necessary to locate local disk storage) |
path | | The path to a specific copy of the file |
filename | | The name of the data file |
sname1 | | The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002" |
sname2 | | The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to select only st_physics files while rejecting "st_physics_adc" files. |
filetype | | The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc. |
extension | | The extension of the file, directly connected to the type (each file type has an associated extension) |
events | | Number of events or entries in the file |
size | | The size of the data file |
fileseq | | The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files. |
stream | | The file stream, if applicable (default is 0) |
md5sum | Early-stage database fills did not update this field. It may return 0. | The file's md5 checksum |
production | | The production tag with which a given file was produced. Can also be "raw" or "simulation". |
library | | The library version this file was produced with |
trgsetupname | Used to encode the path in production | The name of the online trigger setup |
trgname | | The name of one trigger in a collection of triggers associated with a runnumber |
trgcount | | The count of events having the associated trgname for a given runnumber |
trgword | Available for Year4 data and beyond, for DAQ files | The trigger word associated with one trigger in a collection |
trgversion | | The trigger word version associated with a trgname |
trgdefinition | | The trigger definition of one trigger in a collection |
runtype | | The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets |
configuration | | The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but neither of the two alone is). |
geometry | | The geometry definition for a given simulation set |
runnumber | | The number of the run. Arbitrary for simulations. |
runcomments | | The comments for a given run |
collision | | The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200" |
datetaken | The format was corrupted during the conversion from the old to the new Catalog. It can be (and will be) recovered. | The date the data was taken. Arbitrary for simulation. |
magscale | | The name of the magnetic field scale, e.g. FullField |
magvalue | | The actual magnetic field value |
filecomment | | The comment attached to the file |
owner | | The owner of the file |
protection | Subject to change | The protection or read/write permissions, given in a format similar to UNIX 'ls -l' |
available | | Is the file available? (0 if one cannot get it from HPSS or the file disappeared from disk) |
persistent | | Is the file persistent? |
createtime | Only HPSS files have a createtime which is not subject to change | The time the file was created. Format is YYYYmmddHHMMSS. |
inserttime | | The time the file data was inserted into the database |
simcomment | | The comments for the simulation |
generator | | The event generator name |
genversion | | Event generator version |
gencomment | | Event generator comments |
genparams | | Event generator parameters |
tpc | | Was the TPC in the data stream when specific data was taken? |
svt | | Was the SVT in the data stream when specific data was taken? |
tof | | Was the TOF in the data stream when specific data was taken? |
emc | | Was the B-EMC in the data stream when specific data was taken? |
eemc | | Was the E-EMC in the data stream when specific data was taken? |
fpd | | Was the FPD in the data stream when specific data was taken? |
ftpc | | Was the FTPC in the data stream when specific data was taken? |
pmd | | Was the PMD in the data stream when specific data was taken? |
rich | | Was the RICH in the data stream when specific data was taken? |
ssd | | Was the SSD in the data stream when specific data was taken? |
bbc | | Was the BBC in the data stream when specific data was taken? |
bsmd | | Was the Barrel EMC SMD in the data stream when specific data was taken? |
esmd | | Was the End-Cap SMD in the data stream when specific data was taken? |
zdc | | Was the Zero-Degree Calorimeter in the data stream when specific data was taken? |
tpx | | Was the tpx (tpc-X) information in the data stream when data was taken? |
fgt | | Was the Forward GEM Tracker information saved in this data stream? |
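Two of the keyword values above have a documented layout that is easy to decode on the client side: createtime (YYYYmmddHHMMSS) and collision (<first particle><second particle><collision energy>). The Python sketch below shows one possible way to do so. The function names are hypothetical, and the regular expression for particle names (one letter optionally followed by a lowercase letter) is an assumption based on the "AuAu200" example, not a rule stated by the catalog.

```python
import re
from datetime import datetime


def parse_createtime(value):
    """Convert a createtime string in YYYYmmddHHMMSS format to a datetime."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Assumed pattern: two particle names followed by a numeric energy.
_COLLISION = re.compile(r"^([A-Za-z][a-z]?)([A-Za-z][a-z]?)(\d+(?:\.\d+)?)$")


def parse_collision(value):
    """Split a collision string such as 'AuAu200' into (first, second, energy)."""
    match = _COLLISION.match(value)
    if match is None:
        raise ValueError("unrecognized collision string: %r" % value)
    first, second, energy = match.groups()
    return first, second, float(energy)


print(parse_createtime("20120312134400"))
print(parse_collision("AuAu200"))
```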
The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).
flid | Access the FileLocation ID of the FileLocation table |
fdid | Access the FileData ID of the FileData table |
rfdid | Access the FileData ID of the FileLocation table |
pcid | Access the ProductionCondition ID of the ProductionConditions table |
rpcid | Access the ProductionCondition ID of the FileData table |
rpid | Access the runParam ID of the runParams table |
rrpid | Access the runParam ID of the FileData table |
ftid | Access the FileType ID of the FileTypes table |
rftid | Access the FileType ID of the FileData table |
stid | Access the storageType ID of the StorageTypes table |
rtid | Access the storageType ID of the FileLocations table |
ssid | Access the storageSite ID of the StorageSites table |
rssid | Access the storageSite ID of the FileLocations table |
tcfdid | Access the FileData ID of the TriggerCompositions table |
tctwid | Access the TriggerWords ID of the TriggerCompositions table |
twid | Access the TriggerWords ID of the TriggerWords table |
dcid | Access the detectorConfiguration ID of the DetectorConfigurations table |
rdcid | Access the detectorConfiguration ID of the RunParams table |
lgnm | An aggregate keyword returning an equivalent of the logical name |
lgpth | An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location) |
fulld | An aggregate keyword returning a string completely defining all metadata for real data |
fulls | An aggregate keyword returning a string completely defining all metadata for simulation data |
Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.
keyword | Notes | Meaning |
simulation | | Is the data a simulation? |
nounique | In script mode, this keyword is set to 0 (i.e. unique fields), which may slow down your scripts tremendously. In the user interface get_file_list.pl, however, it is set to 1 by default (unique fields are not ensured). | Should the module return all fields instead of only unique selected fields? |
noround | | Turns off rounding of the magnetic field and collision energy. |
startrecord | | The Perl module will skip the first startrecord records and start returning data beginning with the next one. |
limit | | The Perl module will return at most limit records. |
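The interplay of startrecord and limit can be modelled in a few lines of Python. The sketch below works on an in-memory list rather than on the database; the functions select and chunks are hypothetical, and treating a limit of 0 as "unlimited" with a default of 100 is carried over from the description of the -limit option as an assumption.

```python
def select(records, startrecord=0, limit=100):
    """Skip the first startrecord records, then return at most limit records.
    A limit of 0 is treated as unlimited (assumption, see lead-in)."""
    if limit == 0:
        return records[startrecord:]
    return records[startrecord:startrecord + limit]


def chunks(records, size):
    """Yield successive chunks by advancing startrecord in steps of size."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    start = 0
    while True:
        chunk = select(records, startrecord=start, limit=size)
        if not chunk:
            break
        yield chunk
        start += size


records = list(range(250))  # stand-in for 250 catalog records
print([len(chunk) for chunk in chunks(records, 100)])
```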