We're experiencing a problem on our LDAP servers. They will run fine
for several days and then slapd will begin using all CPU time and
become unresponsive to queries. This happens on both Red Hat 8.0 and
FreeBSD 4.7. A restart of slapd generally restores order for another
couple of days. I have been unable to strace slapd while it was
having trouble on Red Hat (strace just hangs with no output), but was
able to truss slapd on the FreeBSD box. It's in a loop doing the
following (the exact calls in each loop vary a little):
    gettimeofday(0x28328dec,0x0) = 0 (0x0)
    sigprocmask(0x3,0x28328e78,0x0) = 0 (0x0)
    sigaltstack(0x283435e0,0x0) = 0 (0x0)
    poll(0x80ec000,0x33d,0x0) = 0 (0x0)
    sigreturn(0x91bf864) = 0 (0x0)
    SIGNAL 27
    SIGNAL 27
When things are operating normally truss shows similar calls intermixed
with a number of read, write, fstat, fcntl, setsockopt, accept, close,
etc.
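For anyone reading the trace: the poll() call is made with a timeout of
0 and returns 0, meaning it comes back immediately with no descriptor
ready, and signal 27 on FreeBSD is SIGPROF. The following is only a
minimal Python sketch of why a loop of zero-timeout polls eats a whole
CPU. It is not slapd's code; the helper name count_polls and the idle
pipe are made up for illustration.

```python
import os
import select
import time

def count_polls(timeout_ms, duration_s=0.2):
    """Count how many times poll() returns within duration_s
    when the only registered descriptor never becomes readable."""
    read_fd, write_fd = os.pipe()  # idle pipe: nothing is ever written
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    calls = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            events = poller.poll(timeout_ms)  # [] when no fd is ready
            assert events == []
            calls += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return calls

if __name__ == "__main__":
    # Timeout 0 returns at once, so the loop spins like the trace above.
    print("timeout 0 ms: ", count_polls(0))
    # A non-zero timeout blocks in the kernel, so far fewer calls are made.
    print("timeout 50 ms:", count_polls(50))
```

The first count should be far larger than the second, which is the
difference between a busy loop and a server waiting quietly for work.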
Software versions:
OpenLDAP 2.1.12
Berkeley DB 4.0.14-14 (Red Hat) / Berkeley DB 4.1.25 (FreeBSD)
I've managed to find a few similar problem reports. There is ITS 2195,
but I don't think it is relevant because we don't use groups in our
ACLs. There are also these emails to the list:
http://www.openldap.org/lists/openldap-software/200212/msg00403.html
http://www.openldap.org/lists/openldap-software/200302/msg00111.html
Again, I don't think they are relevant. The first was a misconfiguration
that we haven't made, and the second seems to have been the same problem
as the ITS report.
I tried going back to ldbm instead of bdb, which did seem to reduce the
frequency of the problem but did not eliminate it. I tried compiling
slapd without threads; that seems to have eliminated the problem, but it
introduced problems of its own and doesn't seem like a viable solution.
I tried creating a DB_CONFIG and increasing the BDB cache size from the
default of 256k to 8M, 16M and 64M, but that didn't help either.
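For reference, the cache size in DB_CONFIG is set with a set_cachesize
line taking three values: gigabytes, additional bytes, and the number of
cache regions. Here is a small Python sketch that prints the lines for
the three sizes I tried. The helper name cachesize_line is made up, and
using a single cache region is an assumption for illustration.

```python
MB = 1024 * 1024
GB = 1024 * MB

def cachesize_line(size_bytes, ncache=1):
    """Build a DB_CONFIG line: set_cachesize <gbytes> <bytes> <ncache>."""
    gbytes, remainder = divmod(size_bytes, GB)
    return f"set_cachesize {gbytes} {remainder} {ncache}"

if __name__ == "__main__":
    for size_mb in (8, 16, 64):
        print(cachesize_line(size_mb * MB))
    # set_cachesize 0 8388608 1
    # set_cachesize 0 16777216 1
    # set_cachesize 0 67108864 1
```

Only one such line belongs in the DB_CONFIG file at a time; the file
sits in the database directory.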
I'd appreciate any suggestions.
Thanks,
Jason