Grid Collector Tutorial
Offline computing tutorial | Maintained by Wei-Ming Zhang
This is the long version of the Grid Collector tutorial for STAR offline analysis. The shorter version is here.
ATTENTION: Grid Collector relies on both the tags and a bitmap index having been built. If they are not in place, this tool will not work.
For more background details, see the slides from a GC talk at the July 04 Collaboration Meeting.
Please contact John Wu if you would like to have access to older productions.
StFileI* setFiles = new StFile("files");
To use GC, simply initialize the variable with StGridCollector::Create() as follows:
StFileI* setFiles = StGridCollector::Create("select ...");
There are example analysis makers in CVS, StAnalysisMaker for event.root and StMuAnalysisMaker for MuDst.root, together with their associated macro doEvents.C. StAnalysisMaker is the same as it was before GC was implemented. StMuAnalysisMaker is new. It shows how to access branches of MuDst data when the standard file IO maker StIOMaker is used to open MuDst data files in a macro. The macro doEvents.C has been updated for GC and MuDst. The default analysis maker in doEvents.C is StAnalysisMaker. To analyze MuDst data, the user has to instantiate a StMuAnalysisMaker instead of the default StAnalysisMaker in doEvents.C, as in
StMuAnalysisMaker *analysisMaker = new StMuAnalysisMaker("analysis");
.x doEvents.C(100, "filenames", "")
As usual, the first input parameter, 100 in the above examples, is the number of requested events. The second is a character string for the input file names. A long list of file names can be placed in a file, in which case the second argument can be "@filename". Please include its path in the filename if the file is not in your working directory, and start the path with "./" if it is relative. Other options such as "dbon" still work as usual. The sequential mode can be used to test and debug analysis makers before turning on GC.
.x doEvents.C()
To be more user-friendly, the macro doEvents.C allows the user to save the GC command in an ASCII file, say, GC_sample.txt, then run it as in
.x doEvents.C(100, "@GC_sample.txt", "gc")
There are two types of commands for Grid Collector: a SQL select statement or a gc command line. Here are two examples of the SQL select statement:
"select event where production=P03ia and 20030300<=mProdTime<=20030330"
for event.root data and
"select MuDst where production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200"
for MuDst.
The statement starts with a "select" clause followed by a "where" clause. It is not case sensitive. For running doEvents.C, the "select" clause can be omitted because "select event" is the default in GC. The "where" clause is mandatory. It consists of the reserved word "where" followed by a list of conditions joined together by the logical operators AND, OR, XOR, and NOT. The conditions take forms such as 'production=P03ia', '10<=NV0<=50', and 'primaryVertexX*primaryVertexX + primaryVertexY*primaryVertexY < 2'. The variable names in the conditions are the names of tags in tags.root files plus three additional names used by the File Catalog: production, trgSetupName, and magScale. These three additional attributes have the same meanings as in the fileCatalog. The majority of the variable names are inherited from tags.root files and are self-explanatory. The number of tags and their names vary from production to production. As examples, tags of production P02gc, P03ia, and P04ie are listed in files P05ia_tags.txt, P02gc_tags.txt, P03ia_tags.txt, and P04ie_tags.txt, which are typical for the data from years 2002, 2003, and 2004, respectively.
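For readers more used to C++ than to SQL, the following sketch spells out how the three example conditions above would read as C++ predicates. The struct EventTags and the function passesExample are hypothetical stand-ins introduced only for illustration (real tag values live in tags.root files), and the reading of the chained range as two comparisons is an assumption based on the usual meaning of a range expression.

```cpp
#include <string>

// Hypothetical stand-in for the tag values of one event; not a STAR class.
struct EventTags {
    std::string production;
    int NV0;
    double primaryVertexX;
    double primaryVertexY;
};

// C++ reading of the GC conditions
//   production=P03ia and 10<=NV0<=50
//   and primaryVertexX*primaryVertexX + primaryVertexY*primaryVertexY < 2
bool passesExample(const EventTags& t) {
    const bool productionOk = (t.production == "P03ia");
    // A chained range must be written as two comparisons in C++.
    const bool nv0Ok = (10 <= t.NV0) && (t.NV0 <= 50);
    const bool vertexOk =
        t.primaryVertexX * t.primaryVertexX +
        t.primaryVertexY * t.primaryVertexY < 2.0;
    return productionOk && nv0Ok && vertexOk;
}
```

Note that writing 10<=NV0<=50 directly in C++ would compile but compare a boolean with 50, so the GC range syntax should not be copied verbatim into analysis code.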
The conditions may also include common functions, including "acos", "asin", "atan", "ceil", "cos", "cosh", "exp", "fabs", "floor", "frexp", "log10", "log", "modf", "sin", "sinh", "sqrt", "tan", "tanh", "atan2", "fmod", "ldexp", "pow". The definitions of these functions are the same as in a standard math library (see "man math" on most Unix systems). These standard math functions can be used in any condition in place of a simple variable name, for example, "sqrt(VectorX*VectorX + VectorY*VectorY) < abs(Gamma)." In addition to these standard functions, there is a pseudo-function named "any" which can be used as the left-hand side of an equation or an "IN" expression, such as "any(triggerId)=15007" and "any(triggerId) in (15007, 15006)."
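One plausible reading of the "any" pseudo-function, sketched in plain C++, is that the condition holds if at least one element of an array-valued tag matches. The function names anyEquals and anyIn are illustrative only, and the semantics shown are an assumption drawn from the two examples above rather than from the Grid Collector source.

```cpp
#include <algorithm>
#include <vector>

// Assumed meaning of "any(triggerId)=value":
// true if at least one trigger id of the event equals value.
bool anyEquals(const std::vector<int>& triggerIds, int value) {
    return std::find(triggerIds.begin(), triggerIds.end(), value) !=
           triggerIds.end();
}

// Assumed meaning of "any(triggerId) in (v1, v2, ...)":
// true if at least one trigger id of the event equals one of the listed values.
bool anyIn(const std::vector<int>& triggerIds, const std::vector<int>& wanted) {
    return std::any_of(wanted.begin(), wanted.end(),
                       [&](int v) { return anyEquals(triggerIds, v); });
}
```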
The command line style of GC commands has elements similar to those of the SQL select statements, but provides more options to control Grid Collector. The two examples above can alternatively be expressed as
"gc -select event -where 'production=P03ia and 20030300<=mProdTime<=20030330'"
"gc -select MuDst -where 'production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200'"
Note the need for an additional level of quotes to make sure the conditions following the '-where' option are treated as one single string.
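The quoting rule can be made less error-prone by assembling the command string in code. The helper below is a minimal sketch, not part of Grid Collector: the name makeGcCommand is hypothetical, and rejecting conditions that already contain a single quote is a design choice made here to keep the quoting unambiguous.

```cpp
#include <stdexcept>
#include <string>

// Builds a command of the form: gc -select <what> -where '<conditions>'
// The conditions are wrapped in single quotes so that they stay one string.
std::string makeGcCommand(const std::string& what,
                          const std::string& conditions) {
    if (conditions.find('\'') != std::string::npos) {
        // A quote inside the conditions would end the quoted string early.
        throw std::invalid_argument("conditions must not contain a single quote");
    }
    return "gc -select " + what + " -where '" + conditions + "'";
}
```

The resulting string can be used wherever this tutorial passes a GC command, for example as the second argument of doEvents.C.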
To reduce the number of quotes required, one may specify options in the SQL style commands. This format defines three reserved words, 'SELECT', 'FROM' and 'WHERE', as in a SQL SELECT statement. Each of these reserved words may be followed by the same arguments as above. Options that do not have corresponding reserved words still have to be specified as options, with two restrictions: the options must appear before the keyword 'WHERE', and must not separate the keywords 'SELECT' or 'FROM' from their arguments, for example,
"select MuDst -m 10 -new where production=P02gc and magScale=FullField and numberOfPrimaryTracks > 200"
"select MuDst from P02gc% -m 10 -new where magScale=FullField and numberOfPrimaryTracks > 200"
Here is a list of the most useful command line options for Grid Collector.
Change the default message/debug level (1) to the specified integer.
Limit the search to the OIDs given in the input file. An OID is a run-number and event-number pair. A sample OID file is sample-oids. In some cases, the user has gone through a series of pico DST or micro DST files and has identified some events of interest. The user can put the run-number and event-number pairs in a file and have Grid Collector retrieve the events out of MuDst.root files or event.root files. Since Grid Collector automatically retrieves the needed files, users do not have to worry about manually retrieving a large number of irrelevant files.
Limit the search to a specified list of productions, trgSetupNames and magScales. Internally, Grid Collector stores data in subsets defined by production, trgSetupName and magScale, and each subset is named production_trgSetupName_magScale. Both '*' (as in CSH) and '%' (as in SQL LIKE statements) can be used to denote wild cards. For example, instead of giving 'production=P02gc and magScale=FullField' as part of the where clause, one can use '-from P02gc_%_FullField' to achieve the same effect.
Grid Collector makes use of the Globus toolkit for underlying file transfers, so it needs a Grid proxy if it ever actually needs to transfer files using Globus functions. If you are using Grid Collector on RCAS or PDSF machines and are sure that all the requested files are on disk, there is no need to have this. At this point, there is a surrogate proxy available for access to some common file locations. This should allow most users to complete their jobs without worrying about a Grid proxy. Ultimately, if you find Grid Collector useful or plan to use other Grid based software, it is highly recommended that you obtain a personal Grid certificate and generate your own Grid proxy. If you create your proxy using grid-proxy-init, you may not need to explicitly use this option.
These two options are useful for running Grid Collector (client) outside of RCAS and PDSF. To run Grid Collector outside of the two established centers, one has to start an HRM to serve as the disk cache for files. Option '-hrm' is used to pass the name of the HRM to the Grid Collector server so it knows where to send the files. Option '-disk-farm' may be used if the file catalogs know about the files that are already located there.
Start a new job, and do NOT share events with an existing job. However, this option does not prevent later jobs from sharing events with this job.
Specify a configuration file to be read. The shared library is built with a default configuration file that is always read, so the user-specified configuration file does not have to specify all the necessary entries.
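The wildcard patterns accepted by '-from', such as 'P02gc_%_FullField', can be illustrated with a small standalone matcher. This is only a sketch of how such matching could work, not Grid Collector's own implementation: the function name matchesSubset is hypothetical, and it assumes that '*' and '%' each match any run of characters, including an empty one.

```cpp
#include <string>

// Returns true if the subset name matches the pattern. Both '*' and '%'
// match any run of characters (possibly empty); all other characters
// must match literally.
bool matchesSubset(const std::string& pattern, const std::string& name) {
    std::size_t p = 0, n = 0;
    std::size_t starP = std::string::npos;  // position of the last wildcard seen
    std::size_t starN = 0;                  // name position that wildcard started at
    while (n < name.size()) {
        if (p < pattern.size() && (pattern[p] == '*' || pattern[p] == '%')) {
            starP = p++;   // remember the wildcard and first try matching nothing
            starN = n;
        } else if (p < pattern.size() && pattern[p] == name[n]) {
            ++p;           // literal character matches
            ++n;
        } else if (starP != std::string::npos) {
            p = starP + 1; // backtrack: let the wildcard absorb one more character
            n = ++starN;
        } else {
            return false;
        }
    }
    // Trailing wildcards may match the empty string.
    while (p < pattern.size() && (pattern[p] == '*' || pattern[p] == '%')) ++p;
    return p == pattern.size();
}
```

With this sketch, a subset named P02gc_someTrigger_FullField (a made-up trgSetupName) matches both 'P02gc_%_FullField' and 'P02gc_*_FullField', while a subset from another production or with a different magScale does not.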
In addition to using variables in the range expressions, the following functions can also be used.
Caution:
.x doEvents.C(10,"select event where .....","gc,evout")
For MuDst, the option "evout" does not work, and the user has to modify doEvents.C. In the line of doEvents.C that instantiates a new StIOMaker, set the fourth parameter to "MuSave" instead of the default "bfcTree", as in
StIOMaker *IOMk = new StIOMaker("IO","r",setFiles,"MuSave");
Also, please see the relevant comments in doEvents.C.
root4star -b -q doEvents.C'(25,"select MuDst where Production=P04ie \
and trgSetupName=production62GeV and magScale=ReversedFullField \
and chargedMultiplicity>3300 and NV0>200","gc,dbon")'
This command selects and analyzes 25 out of 1.2K events in two MuDst data files. Its output messages are saved in P04ie_GC.log, which may be helpful in understanding how GC works.