Master/Slave syncronization issues,tools,solutions etc...

This page is dedicated to variety of master/slave syncronization issues, tools and solutions used at STAR. It is made for internal usage, tutorial for offsite facilities would be created later.

Maatkit (formerly MySQL Toolkit) is a bunch of Perl scripts, designed to make MySQL daily management tasks easier. STAR makes use of 'mk-table-checksum' to get lock-free reliable checksumming of FileCatalog and Offline database tables, and 'mk-table-sync' to fix inconsistent tables on slaves.

I. FileCatalog replication checksums

Master server: robinson.star.bnl.gov:3336
mk-table-checksum uses "repl.checksum" (database.table) to store checksum results. If FC slave was added after Nov.2008, then such table should be created manually. If not done exactly as described, checksums will not work properly, and slave will stop replicating with error message stating that "table repl.checksum does not exist".  

CREATE DATABASE repl;
CREATE TABLE checksum (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     chunk      int          NOT NULL,
     boundaries char(64)     NOT NULL,
     this_crc   char(40)     NOT NULL,
     this_cnt   int          NOT NULL,
     master_crc char(40)         NULL,
     master_cnt int              NULL,
     ts         timestamp    NOT NULL,
     PRIMARY KEY (db, tbl, chunk)
  );


HOW-TO calculate checksums for all FileCatalog tables on Master, results and queries will propagate to slaves :

shell> nohup ./mk-table-checksum --emptyrepltbl --replicate=repl.checksum --algorithm=BIT_XOR \
h=robinson.star.bnl.gov,P=3336,u=root,p=XYZ -d FileCatalog_BNL >> checkrepl.log.txt 2>&1 

At this moment (Nov.2008), it takes about 20 min to checksum all FC tables.
HOW-TO show if there are any differences between master (robinson) and slaves (duvall, brando), when checksumming process is finished :

shell> mysql -u root --host duvall.star.bnl.gov --port 3336 -pXYZ -e "USE repl; SELECT db, 
 tbl, chunk, this_cnt-master_cnt AS cnt_diff, this_crc <> master_crc 
OR ISNULL(master_crc) <> ISNULL(this_crc) AS crc_diff FROM checksum 
WHERE master_cnt <> this_cnt OR master_crc <> this_crc OR 
ISNULL(master_crc) <> ISNULL(this_crc);" > duvall.tablediff.txt


shell> mysql -u root --host brando.star.bnl.gov --port 3336 -pXYZ -e "USE repl; SELECT db, 
tbl, chunk, this_cnt-master_cnt AS cnt_diff, this_crc <> master_crc 
OR ISNULL(master_crc) <> ISNULL(this_crc) AS crc_diff FROM checksum 
WHERE master_cnt <> this_cnt OR master_crc <> this_crc OR 
ISNULL(master_crc) <> ISNULL(this_crc);" > brando.tablediff.txt


HOW-TO show the differences (by row) between master and slave for selected table. In the example below, changes for table "TriggerWords" is requested for duvall.star.bnl.gov :

shell> ./mk-table-sync --print --synctomaster --replicate repl.checksum 
h=duvall.star.bnl.gov,P=3336,u=root,p=XYZ,D=FileCatalog_BNL,t=TriggerWords

To make an actual syncronization, one should replace "--print" option with "--execute" (please make sure you have a valid db backup first!). It is often good to check how many rows need resync and how much time it would take, because it might stop replication process for this time.