- General information
- Data readiness
- Grid and Cloud
- Infrastructure
- Online Computing
- Software Infrastructure
- Batch system, resource management system
- Computing Environment
- Facility Access
- FileCatalog
- HPSS services
- Home directories and other areas backups
- Hypernews
- Installing the STAR software stack
- Provision CVMFS and mount BNL/STAR repo
- RCF Contributions
- Software and Libraries
- Storage
- Tools
- Video Conferencing
- Web Access
- Machine Learning
- Offline Software
- Production
- Test tree
Migration and notes from V01.265 to V01.275
Updated on Mon, 2012-03-12 13:57 by testadmin. Originally created by jeromel on 2012-03-12 13:32.
Under:
This document is intended for FileCatalog managers only who have previously deployed an earlier version of API and older database table layout. It is NOT intended for users.
Reasoning for this upgrade and core of the upgrade
One of the major problem with the preceding database layout started to show itself when we reached 4 Million entries (for some reason, we seem to have magic numbers). A dire restriction was the presence of the field 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB) and sorting and queries would become slow and IO demanding (regardless of our careful indexing). The main action was to move both field to separate tables. This change requires a two step modification :
- reshape of the database (leaving the old field), deployment of the database API in cross mode support
- run the normalization scripts filling the new table and fields, deployment of the final API and drop of the obsolete columns (+ index rebuild)
The steps are more carefully described below ...
Step by step migration instructions
Has to be made in several steps for safety a least interruption of service (although a pain to the manager). Note that you can do that much faster by cutting the Master/slave relationship, disabling all daemons auto-updating the database, proceed with table reshape and normalization script execution, drop and rebuild index, deploy the point-of-no-return API and restore Master/slave relation).
This upgrade is best if you have perl 5.8 or upper. Note that this transition will be the LAST one using perl 5.6 (get ready for a perl upgrade on your cluster).
We will assume you know how to connect to your database from an account able to manipulate and create any tables in the FileCatalog database.
- (0) Create the following tables
DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths ( filePathID BIGINT NOT NULL AUTO_INCREMENT, filePathName VARCHAR(255) NOT NULL UNIQUE, filePathCount INT, PRIMARY KEY (filePathID) ) TYPE=MyISAM; DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts ( hostID smallint(6) NOT NULL auto_increment, hostName varchar(30) NOT NULL default 'localhost', hostIDate timestamp(14) NOT NULL, hostCreator varchar(15) NOT NULL default 'unknown', hostCount int(11) default NULL, hostComment text, PRIMARY KEY (hostID), UNIQUE KEY hostName (hostName) ) TYPE=MyISAM;
- Modify some table and recreate one
ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`; ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`; UPDATE TABLE `FileLocations` SET hostID=0; # note that I did that one from the Web interface (TBC) INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node'); ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` ) ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName` ; ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate` ; ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`; ALTER TABLE `StorageSites` ADD `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation` ; ALTER TABLE `StorageSites` ADD `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate` ; ALTER TABLE `StorageSites` DROP `storageComment`; ALTER TABLE `StorageSites` ADD `storageSiteComment` TEXT AFTER `storageSiteCount`; ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName` ; ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate` ; ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension` ; ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate` ; ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`; ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition` ; ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`; ALTER TABLE `TriggerSetups` ADD `triggerSetupCount` INT AFTER `triggerSetupCreator`; ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT AFTER `triggerSetupCount`; ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams` ; ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate` ; ALTER TABLE `EventGenerators` ADD `eventGeneratorCount` INT AFTER `eventGeneratorCreator`; ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName` ; ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate` ; ALTER TABLE `ProductionConditions` DROP `productionComments`; ALTER TABLE `ProductionConditions` ADD `productionConditionIDate` TIMESTAMP NOT NULL AFTER `libraryVersion`; ALTER TABLE `ProductionConditions` ADD `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`; ALTER TABLE `ProductionConditions` ADD `productionConditionComment` TEXT AFTER `productionConditionCount`; # # This table was not shaped as a dictionary so needs to be re-created # Hopefully, was not filled prior (but will be this year) # DROP TABLE IF EXISTS TriggerWords; CREATE TABLE TriggerWords ( triggerWordID MEDIUMINT NOT NULL AUTO_INCREMENT, triggerWordName VARCHAR(50) NOT NULL, triggerWordVersion CHAR(6) NOT NULL DEFAULT "V0.0", triggerWordBits CHAR(6) NOT NULL, triggerWordIDate TIMESTAMP NOT NULL, triggerWordCreator CHAR(15) DEFAULT 'unknown' NOT NULL, triggerWordCount INT, triggerWordComment TEXT, UNIQUE TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits), PRIMARY KEY (triggerWordID) ) TYPE=MyISAM;
-
Deploy the new API CVS version 1.62 of FileCatalog.pm
- Run the following utility scripts
util/path_convert.pl
util/host_convert.plNote that those scripts use a new method $fC->connect_as("Admin"); which assumes that the Master Catalog will be accessed using the XML connection description. Also, it should be obvious that
use lib "/WhereverYourModulAPIisInstalled"; should be replaced by the appropriate path for your site (or test area). Finally, it uses API CVS version 1.62 which supports Xpath and Xnode transitional keywords allowing us to transfer the information from one field to one table.
- Check that Hosts table was filled properly and automatically with Creator/IDate
- Paranoia step : Re-run the scripts mentioned 2 steps ago
At this stage and ideally, nothing should happen (as you have already modified the records).
A few tips prior from doing that- % fC_cleanup.pl -modif node=localhost -cond node='' -doit
would hopefully do nothing but if you have messed something up in the past, hostName would be NULL and the above would be necessary. - After a full update, the following queries should return NOTHING
% get_file_list.pl -keys flid -cond rfpid=0 -all -alls -as Admin
% get_file_list.pl -keys flid -cond rhid=0 -all -alls -as Admin
Those are equivalent to the SQL statements>SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.filePathID = 0 LIMIT 0, 100 >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.hostID = 0 LIMIT 0, 100
- % fC_cleanup.pl -modif node=localhost -cond node='' -doit
- Make a backup copy of the database for security (optional but safer) Backup can be done by easer a dump of mysql or more trivially, a cp -r of the database directory.
- Leave it running for a few days (should be fine) for confidence consolidation ;-)
You are ready for phase II. Hang on tight now ...
Those steps are no VERY intrusive and potentially destructive. Be careful from here on ...
- Stop all daemons, be sure that during the rest of the operations, NO command attempts to manipulate the database. If you want to shield your users from the upgrade, stop all Master/slave relations.
- Connect to the master FileCatalog as administrator for that database and execute the following SQL commands
> ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID); > ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID); # drop the columns not in use anymore / should also get rid of the associated # indexes. > ALTER TABLE `FileLocations` DROP COLUMN nodeName; > ALTER TABLE `FileLocations` DROP COLUMN filePath; # "rename" index / was created with a name difference to avoid clash for transition # now renamed for consistency > ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX FL_FilePathIndex (filePathID);
-
OK, you should be done. Deploy either CVS version 1.63 which correspond to the FileCatalog API version V01.275 and above ... (by the way, get_file_list.pl -V gives the API version).
A few notes
- The new API is XML connection aware via a non-mandatory module named XML::Simple . You should install that module but there are some limitations if you are using perl 5.6 i.e., you MUST use the schema with ONLY one choice per category (Admin, Master or User).
- Your scripts will likely need to change if your Database Master and Slave are not on the same node (i.e. the administration account for the FileCatalog can be used only on the database Master and the regular user account on the Slave). There are a few forms of this such as the one below
# Get connection fills the blanks while reading from XML # However, USER/PASSWORD presence are re-checked #$fC->debug_on(); ($USER,$PASSWD,$PORT,$HOST,$DB) = $fC->get_connection("Admin"); $port = $PORT if ( defined($PORT) ); $host = $HOST if ( defined($HOST) ); $db = $DB if ( defined($DB) ); if ( defined($USER) ){ $user = $USER;} else { $user = "FC_admin";} if ( defined($PASSWD) ){ $passwd = $PASSWD;} else { print "Password for $user : "; chomp($passwd = );} # # Now connect using a fully specified user/passwd/port/host/db # $fC->connect($user,$passwd,$port,$host,$db);
or counting on the full definition in the XML file
$fC = FileCatalog->new(); $fC->connect_as("Admin");
- Note a small future convenience when XML is ON. connect_as() does not only select as who you want to connect to but where as well. In fact, the proper syntax is intent=SITE::User (for example BNL::Admin is valid as well as LBL::User). This is only partly supported however.
- The new version of the API automatically add information in dictionary tables. Especially, the account under which a new dictionary value was inserted (Creator) and the insertion date (IDate) are filled automatically. A side effect being that the new API is NOT compatible with previous database table layout (no backward support will be attempted).
»
- Printer-friendly version
- Login or register to post comments