Re: (ITS#5171) hdb txn_checkpoint failures
richton@nbcs.rutgers.edu wrote:
>> If this is happening even with slapd cleanly shut down then it should also
>> prevent slapd from restarting, since slapd first attempts to join an existing
>> environment before trying to create a new one. And that really implies that
>> the rest of the environment is shot.
>
> Agreed, but that's a pretty awful condition to have in a long-running
> slapd process. Without db_stat (easily) working, is there any hope of
> finding clues as to how this might have happened, or is it just time to
> rm/slapadd and hope it doesn't happen again?
It doesn't seem like we can get much more information out of this. One more
thing to try would be a full-debug build of libdb, so we can see exactly where
it hangs when trying to join the environment. Looking through the code, I only
see one mutex that has to be acquired to join the environment, and according to
your stack trace it's already past that point, but the trace could be lying.
Also, the mutex used to lock the environment is a regular mutex, not a
persistent lock. So once all processes have closed the environment, there
shouldn't be anything left to conflict with here. Most likely the
environment data structures are corrupted, and the thread is deadlocking
against itself. Again, we can't really tell without single-stepping through
the BDB library code. It may not be worth the effort, but that's your call.
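
As a rough illustration of what "locking against itself" means, here is a
small Python analogy. It is not BDB code and says nothing about how libdb
implements its mutexes; the helper name try_reacquire and the 0.2 second
timeout are made up for the example. A thread that already holds a plain
(non-reentrant) lock and tries to take it again can never succeed, because
the only thread that could release the lock is the one that is waiting.

```python
import threading


def try_reacquire(lock, timeout=0.2):
    """Acquire `lock`, then try to acquire it again from the same thread.

    Returns True if the second acquire succeeded, False if it timed out.
    The timeout stands in for the indefinite hang a real self-deadlock
    would produce.
    """
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            # Undo the second acquire (only reachable for reentrant locks).
            lock.release()
        return got_it
    finally:
        # Undo the first acquire.
        lock.release()


# A plain lock cannot be taken twice by the same thread.
plain_result = try_reacquire(threading.Lock())

# A reentrant lock can, shown here only for contrast.
reentrant_result = try_reacquire(threading.RLock())

print("plain lock re-acquired:", plain_result)
print("reentrant lock re-acquired:", reentrant_result)
```

Without the timeout, the second acquire on the plain lock would block
forever, which is the kind of hang a debugger would show as a thread stuck
waiting on a mutex with no other holder in sight.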
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/