Re: Upgrade to 2.3.40 -> failed index
On Mon, 4 Feb 2008, Howard Chu wrote:
> That documentation is clearly obsolete, which is why it was removed.
slurpd is obsolete, which is why the section on slurpd was removed from the
2.4 manual. Considering OpenLDAP-2.3.39 is still marked as the stable
release, I can't really see that the 2.3 documentation in its entirety is
obsolete.
> http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/archival.html
Ah, that is the section on backing up and restoring a database, which I
suppose also serves as the procedure for copying a database from one
system to another. Given your original wording, I was looking for
something more specifically geared towards copying.
> At a guess, you failed to copy the transaction log files to the slaves.
If I had failed to copy the transaction log files, I don't really see that
it would have worked at all, let alone for almost a year.
Reviewing the backup/restore procedure, I don't really see anything I might
have missed. slapd was not running during the copy, so clearly any updates
were suspended. In fact, slapd had never been run -- the copy was made
immediately after the initial slapadd. There were actually no log files
present. As I mentioned, I have bdb configured to automatically remove
them. Presumably slapadd explicitly or implicitly checkpointed upon
completion and they were removed. Even if there was a log file that I
didn't see, the log files were stored in the same directory as the database
files, and I copied the entire directory.
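
For illustration, the copy step described above (slapd stopped, whole directory copied) could be sketched roughly as below. This is a minimal sketch, not the procedure actually used: the function names are hypothetical, and it assumes the logs live in the database directory and follow Berkeley DB's usual log.NNNNNNNNNN naming. It copies every regular file, verifies each copy by checksum, and reports which transaction logs, if any, came along.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_db_dir(src: Path, dst: Path) -> dict:
    """Copy every regular file in src to dst and verify each copy.

    Assumption: slapd is stopped, so nothing is writing to src.
    Returns a mapping of file name -> digest for the copied files.
    """
    dst.mkdir(parents=True, exist_ok=True)
    digests = {}
    for entry in sorted(src.iterdir()):
        if not entry.is_file():
            continue
        target = dst / entry.name
        shutil.copy2(entry, target)
        digest = file_digest(entry)
        if file_digest(target) != digest:
            raise RuntimeError(f"copy of {entry.name} does not match source")
        digests[entry.name] = digest
    return digests


def log_files(names) -> list:
    """Pick out Berkeley DB transaction logs (named log.NNNNNNNNNN)."""
    return sorted(n for n in names if n.startswith("log.") and n[4:].isdigit())
```

Calling log_files() on the result of copy_db_dir() shows whether any transaction logs were present at copy time; an empty list would match the situation described here, where the logs had already been removed automatically.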
> > Also, even if for some reason the copies on the two slaves were invalid,
> > that would not explain why the master failed. The database on the master
> > was the original database built by slapadd when the server was first put
> > into commission. How could making a copy of it have caused it to fail
> > itself?
>
> Too difficult to guess, given the lack of information. We have only your
> assurance that nothing was done incorrectly, but the facts indicate that at
> least one step was done incorrectly.
The facts only indicate that I had a catastrophic failure. That the failure
was caused by incompetence is only a hypothesis.
I do greatly appreciate your response and willingness to help; I apologize
if I'm getting a bit defensive.
You do have only my assurance that I didn't screw something up. However,
assuming I'm not lying, the facts are:
* OpenLDAP 2.3.35 was initially installed on three servers
* on the master server, slapadd was run to load in an existing database
  in LDIF format
* the resultant bdb database was then copied to both slaves
* all three were put into production in March 2007 and ran perfectly
  under a reasonably heavy load
* a week or so ago I upgraded them to 2.3.40 (stop old server, install
  new server, start new server -- never touching bdb or the existing
  database files)
* they ran fine for at least 3-4 days
* this weekend, they died horribly
Given these facts, if something was done incorrectly, it does not seem
likely that it was failure to copy a transaction log file in March 2007. If
the failure was my own doing, it seems more likely a byproduct of the
upgrade, although I can't think of anything that I could have done wrong
during that process.
At this point, I guess I'll just write it off and hope it doesn't happen
again.
--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson@csupomona.edu
California State Polytechnic University | Pomona CA 91768