On Thursday 09 February 2006 19:57, Samuel Tran wrote:
> On Mon, 2006-02-06 at 14:41 -0500, Aaron Richton wrote:
> > That's been on my todo list for over a year now. (So I'll join in the
> > request for a copy if there is such a script!)
> >
> > If anybody does write this, it's important to note that something that
> > strictly compares contextcsns is likely useless (I think it would just be
> > a false positive disaster). Replication doesn't happen instantly; there
> > should be some sort of configurable threshold for "csns should be within
> > <time>".
> >
> >
> > I've been meaning to ask the list: how many of you check up on your
> > slaves from a consistency perspective? What do you do? (contextcsn is the
> > approach I've wanted to take. Every time I get annoyed enough to write a
> > nagios plugin, I notice that everything is in sync and defer it...)
>
> I wrote a very generic python script with exhaustive comments/debugging.
> It can be modified to be used as a Nagios script plugin.
>
> To view a description of the script:
> $ pydoc ldapSynchCheck
>
> To view the help:
> $ ./ldapSynchCheck.py -h
>

I guess you didn't look at the perl extension script for BigBrother/Hobbit
that I posted.

It assumes that it will be able to:
1) read sufficient configuration information from cn=config to be able to
   determine all the databases using sync-repl, and the master for each
   database, on any server
2) read the contextCSN for any database on any server anonymously

but, due to this, requires absolutely no configuration. For use with Hobbit,
it just needs to be run on the hobbit server, and any host in the bb-hosts
file just needs 'ol'. Of course, the hobbit server needs to be able to access
all the LDAP servers involved.

You may want to take a look, so a user of your script doesn't need to provide
the URIs, but instead can just provide the server to check.
http://www.zarb.org/~bgmilne/hobbit/

At present, it only goes yellow (not red), since there's no real way to
determine whether a server being 3 months behind (i.e. you catch the
30-second period it takes to replicate the first change to one database in
3 months) is severe enough for an error. But it does show how far ahead
(which could indicate checkpointing/recovery problems on the master) or
behind the slave is (so you don't have to compare contextCSNs in your head).

I could take a look at making it work for nagios, but we're phasing nagios
out, and the only LDAP servers monitored for anything by nagios don't use
sync-repl.

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
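
For anyone who wants to experiment with the threshold idea discussed in this thread, here is a rough Python sketch. It is not taken from either script mentioned above. The function names, the "green"/"yellow" status strings and the 60-second default are made up for illustration. It assumes the two contextCSN values have already been read from the master and the slave, and it compares only the timestamp field of each CSN, ignoring the counters that follow it.

```python
from datetime import datetime, timedelta, timezone


def parse_csn_time(csn):
    """Return the UTC timestamp held in the first '#'-separated CSN field.

    Accepts timestamps with or without a fractional-seconds part, e.g.
    20060209195700Z#... or 20060209195700.123456Z#...
    """
    stamp = csn.split("#", 1)[0].rstrip("Z")
    whole, _, frac = stamp.partition(".")
    when = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if frac:
        # Pad or truncate the fraction to microsecond precision.
        when += timedelta(microseconds=int(frac.ljust(6, "0")[:6]))
    return when


def check_sync(master_csn, slave_csn, threshold=timedelta(seconds=60)):
    """Return (status, lag); a positive lag means the slave is behind.

    The threshold is the hypothetical "csns should be within <time>" setting;
    anything outside it is reported as yellow rather than red.
    """
    lag = parse_csn_time(master_csn) - parse_csn_time(slave_csn)
    status = "green" if abs(lag) <= threshold else "yellow"
    return status, lag


if __name__ == "__main__":
    # Example values only; real CSNs would come from an LDAP search.
    master = "20060209195700Z#000001#00#000000"
    slave = "20060209195630Z#000001#00#000000"
    status, lag = check_sync(master, slave)
    print(status, lag)
```

A negative lag would mean the slave's timestamp is ahead of the master's, which is the "how far ahead" case described above. Picking a sensible threshold is still left to the admin.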