Re: (ITS#5926) slapd proxying AD with back-meta locks up
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#5926) slapd proxying AD with back-meta locks up
- From: hyc@symas.com
- Date: Tue, 3 Mar 2009 17:46:42 GMT
- Auto-submitted: auto-generated (OpenLDAP-ITS)
mhardin@symas.com wrote:
> Full_Name: Matthew Hardin
> Version: 2.4.12
> OS: Red Hat Enterprise Linux 4 i686
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (74.38.114.185)
>
>
> Hi All,
>
> We are using a pair of OpenLDAP 2.4.12 servers with back-meta to proxy an active
> directory domain. The clients are all current versions of PADL's nss_ldap
> libraries.
>
> Every once in a while (sometimes twice a day, sometimes once every two weeks)
> one of the slapd servers will peg CPU use at 100% and stop answering requests.
> The only way to stop slapd is with a kill -9.
>
> There doesn't seem to be anything to explain the lockup or allow us to reproduce
> it. We are using redundant AD servers and they are not going offline. A third
> slapd server running as a test server using the same AD servers and configured
> identically but serving a much lighter nss_ldap load does not fail at all. We
> have ruled out hardware, OS, and connectivity as possible causes.
>
> We are unfortunately unable to attach gdb to the running processes, as these are
> production servers and need to be restarted immediately. Our smaller test system
> does not exhibit the same behavior, either. There is nothing unusual in the
> server logs, either. We do have core files generated from kill -6 commands, and
> they are all eerily similar to the back-trace below in that they have one or
> more threads waiting for a search or a bind response from AD.
>
> I am also enclosing relevant portions of slapd.conf for these systems. Please
> let me know if any additional information would be useful.
>
> Thanks,
>
> -Matt
>
> -----
>
>
> (gdb) thr apply all bt
> Thread 1 (process 29769):
> #0 0x005fa410 in __kernel_vsyscall ()
> #1 0x004ddd10 in raise () from /lib/libc.so.6
> #2 0x004df621 in abort () from /lib/libc.so.6
> #3 0x004d715b in __assert_fail () from /lib/libc.so.6
> #4 0x0806eec8 in slap_listener (sl=0x9583108)
> at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1803
> #5 0x0806f643 in slap_listener_thread (ctx=0x4e92220, ptr=0x9583108)
> at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1997
> #6 0x00a10783 in ldap_int_thread_pool_wrapper (xpool=0x959a010)
> at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/libraries/libldap_r/tpool.c:663
> #7 0x0038a45b in start_thread () from /lib/libpthread.so.0
> #8 0x00585c4e in clone () from /lib/libc.so.6
> (gdb)
It seems you sent the wrong backtrace; this one doesn't show any signs of
looping or anything else that would indicate heavy CPU usage. It shows an
assertion failure in slap_listener (daemon.c:1803), which would abort the
process and leave it at 0% CPU usage. This assert was most likely fixed in
2.4.14.
> slapd.conf
> #######################################################################
> # bdb database definitions
> #######################################################################
> database bdb
> suffix "ou=nisdata"
> #######################################################################
> # Definitions for proxy and cache to AD
> #######################################################################
> database meta
> suffix "dc=my-customer,dc=com"
> # The link to AD:
> uri ldaps://ldap-prd-dc01.my-customer.com/dc=ad,dc=my-customer,dc=com
> ldaps://ldap-prd-dc02.my-customer.com/
> # The link to the NIS data directory (yes, we could chain/glue, that's
> # for later)
> uri ldapi://%2fvar%2fsymas%2frun%2fldapi/dc=nis,dc=my-customer,dc=com
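Pointing back-meta at its own slapd will inevitably exhaust the thread pool,
since each incoming operation then needs two threads: one for the proxied
request and a second to service the resulting loopback request on the same
server. Once every thread is occupied by a proxied request, none is left to
answer the loopback requests they are waiting on.
This ITS will be closed.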
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/