Re: Upgrade to 2.3.40 -> failed index

To: Howard Chu <hyc@symas.com>
Subject: Re: Upgrade to 2.3.40 -> failed index
From: "Paul B. Henson" <henson@acm.org>
Date: Mon, 4 Feb 2008 17:30:29 -0800 (PST)
Cc: openldap-software@openldap.org
In-reply-to: <47A7B631.6060206@symas.com>
References: <Pine.GSO.4.55.0802030710060.208@loogie.intranet.csupomona.edu> <86F5326469DC72B0D7CF3DA0@[192.168.1.196]> <47A758F2.3050907@symas.com> <Pine.GSO.4.55.0802041146270.208@loogie.intranet.csupomona.edu> <47A7A60A.4090904@symas.com> <Pine.GSO.4.55.0802041625090.208@loogie.intranet.csupomona.edu> <47A7B631.6060206@symas.com>

On Mon, 4 Feb 2008, Howard Chu wrote:

> That documentation is clearly obsolete, which is why it was removed.

slurpd is obsolete, which is why the section on slurpd was removed from the
2.4 manual. Considering that OpenLDAP-2.3.39 is still marked as the stable
release, I can't really see how the 2.3 documentation in its entirety is
obsolete.

> http://www.oracle.com/technology/documentation/berkeley-db/db/ref/transapp/archival.html

Ah, that is the section on backing up and restoring a database, which I
suppose could also serve as the procedure for copying a database from one
system to another. Given your original wording, I was looking for something
geared more specifically toward copying.

> At a guess, you failed to copy the transaction log files to the slaves.

If I had failed to copy the transaction log files, I don't really see how
it would have worked at all, let alone for almost a year.

Reviewing the backup/restore procedure, I don't really see anything I might
have missed. slapd was not running during the copy, so clearly any updates
were suspended. In fact, slapd had never been run; the copy was made
immediately after the initial slapadd. There were actually no log files
present. As I mentioned, I have BDB configured to remove them
automatically. Presumably slapadd checkpointed, explicitly or implicitly,
upon completion and they were removed. Even if there had been a log file
that I didn't see, the log files were stored in the same directory as the
database files, and I copied the entire directory.
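
For anyone following the archival procedure, the cold copy described above
can be sketched roughly as follows. This is a minimal Python sketch under
stated assumptions, not anything shipped with OpenLDAP or Berkeley DB: the
function name copy_bdb_environment is hypothetical, it assumes slapd is
stopped, and it assumes the __db.* region files need not be copied because
the environment recreates them when opened. Database files are copied
first and log.* files last, which is the ordering the Berkeley DB archival
documentation gives for backups; the function returns the log file names
so you can see whether any existed at all.

```python
import shutil
from pathlib import Path


def copy_bdb_environment(src: str, dst: str) -> list[str]:
    """Copy a Berkeley DB environment directory (e.g. a back-bdb directory).

    Assumption: slapd is NOT running, so no files change during the copy.
    Assumption: __db.* region files are skipped; they are rebuilt when the
    environment is next opened.
    Returns the names of the transaction log files that were copied.
    """
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)

    files = sorted(p for p in src_dir.iterdir() if p.is_file())
    logs = [p for p in files if p.name.startswith("log.")]
    data = [p for p in files
            if p not in logs and not p.name.startswith("__db.")]

    # Database files (id2entry.bdb, dn2id.bdb, index files, DB_CONFIG) first.
    for p in data:
        shutil.copy2(p, dst_dir / p.name)

    # Transaction logs last, so they cover everything in the copied data.
    for p in logs:
        shutil.copy2(p, dst_dir / p.name)

    return [p.name for p in logs]
```

For example, calling copy_bdb_environment on the master's database
directory (path hypothetical) and getting back an empty list would match
the situation above, where log autoremoval had already deleted every log
file after slapadd finished.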

> > Also, even if for some reason the copies on the two slaves were invalid,
> > that would not explain why the master failed. The database on the master
> > was the original database built by slapadd when the server was first put
> > into commission. How could making a copy of it have caused it to fail
> > itself?
>
> Too difficult to guess, given the lack of information. We have only your
> assurance that nothing was done incorrectly, but the facts indicate that at
> least one step was done incorrectly.

The facts only indicate that I had a catastrophic failure. That the failure
was caused by incompetence is only a hypothesis.

I do greatly appreciate your response and willingness to help; I apologize
if I'm getting a bit defensive.

You do have only my assurance that I didn't screw something up. However,
assuming I'm not lying, the facts are:

* OpenLDAP 2.3.35 was initially installed on three servers
* on the master server, slapadd was run to load an existing database
  in LDIF format
* the resulting BDB database was then copied to both slaves
* all three were put into production in March 2007 and ran perfectly
  under a reasonably heavy load
* a week or so ago I upgraded them to 2.3.40 (stop old server, install
  new server, start new server, never touching BDB or the existing
  database files)
* they ran fine for at least 3-4 days
* this weekend, they died horribly

Given these facts, if something was done incorrectly, it does not seem
likely that it was a failure to copy a transaction log file in March 2007.
If the failure was my own doing, it seems more likely to be a byproduct of
the upgrade, although I can't think of anything that I could have done
wrong during that process.

At this point, I guess I'll just write it off and hope it doesn't happen
again.

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
