BDB Corruption...
Hello everyone,
I'm running OpenLDAP 2.2.6 on SuSE 9.1 on a dual-Xeon Intel server. It
is set up to provide user, group, netgroup, and automount data. I do a
fair amount of writing on occasion when I repopulate some of these OUs
using migration scripts to pull in updated data from a production NIS
server. None of the OUs have more than about 1,000 entries. I've been
running it about a month and twice I've had BDB corruption in which the
server stopped responding or could only serve up a portion of its
entries without hanging. Restarting the server had no beneficial
effect. The first time, I just shut down, wiped the LDAP database
directory, restarted, and slapadd'ed in a back-up LDIF. Now that it's
happened the second time, I've performed the following to try to get to
the bottom of it:
------------------------------
------------------------------
# ps -el | grep slapd
1 S 76 21668 1 0 76 0 - 7965 schedu ? 00:00:00 slapd
(The process always seems to be sleeping in the scheduler, as if blocked
waiting for input on a file descriptor)
# strace -f -p 21668
Process 21668 attached - interrupt to quit
futex(0x40787bf8, FUTEX_WAIT, 21669, NULL
# slapcat
...
(Dumps up to a point, then hangs)
# ldapsearch -b ou=...,dc=... ...=...
...
(Returns some entries, doesn't return others which should be there, and
hangs on others)
# vmstat 1 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy  id wa
 0  0      0 278536 219132 328428    0    0     0     3    1    10  0  0 100  0
(The CPUs were rather idle the entire time)
# cd /var/lib/ldap; db_recover
(This appeared to resolve the problem, though I still plan to wipe it
out and restore from a back-up LDIF)
------------------------------
------------------------------
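For reference, the two recovery paths described above (running db_recover
in place, and wiping the directory and reloading a back-up LDIF) could be
sketched roughly as below. This is only a sketch under assumptions: the
init script path, the back-up LDIF location, and the ldap:ldap ownership
are placeholders rather than details of this setup, and stopping slapd
before db_recover is added as a precaution. By default it only prints the
commands it would run.

```shell
#!/bin/sh
# Sketch of the recover-then-rebuild sequence; nothing runs unless DRY_RUN=0.
DB_DIR="${DB_DIR:-/var/lib/ldap}"                # database directory used in the session above
BACKUP_LDIF="${BACKUP_LDIF:-/root/backup.ldif}"  # hypothetical back-up location (assumption)
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/ldap}"   # assumed init script path; adjust as needed
DRY_RUN="${DRY_RUN:-1}"                          # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Path 1: run Berkeley DB recovery against the existing environment.
recover_in_place() {
    run "$INIT_SCRIPT" stop              # keep slapd out of the environment during recovery
    run db_recover -h "$DB_DIR"          # -h names the database environment directory
    run "$INIT_SCRIPT" start
}

# Path 2: wipe the database files and reload them from a back-up LDIF.
rebuild_from_ldif() {
    run "$INIT_SCRIPT" stop
    # Remove the BDB files but keep DB_CONFIG, if one exists.
    run find "$DB_DIR" -maxdepth 1 -type f ! -name DB_CONFIG -exec rm -f {} +
    run slapadd -l "$BACKUP_LDIF"        # -l reads entries from the given LDIF file
    run chown -R ldap:ldap "$DB_DIR"     # assumed slapd user and group
    run "$INIT_SCRIPT" start
}
```

After sourcing the file, calling recover_in_place or rebuild_from_ldif
prints the planned commands; setting DRY_RUN=0 first makes them execute.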
Has anyone come across a situation like this before, or does anyone have
tips on how I might permanently avoid the condition in the future? On less
well-used software I'd suspect a reentrancy issue, but I think
that's unlikely to be the case here. If it's a known issue for which there
is a patch, I'd be happy to test it.
Thanks,
Roy