
Re: (ITS#5926) slapd proxying AD with back-meta locks up



Thank you for looking into this. The new configuration does not rely  
on loopbacks and instead uses back-glue. We are also running 2.4.14  
with additional patches.

Cheers,

-Matt


On Mar 3, 2009, at 10:46 AM, Howard Chu wrote:

> mhardin@symas.com wrote:
>> Full_Name: Matthew Hardin
>> Version: 2.4.12
>> OS: Red Hat Enterprise Linux 4 i686
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (74.38.114.185)
>>
>>
>> Hi All,
>>
>> We are using a pair of OpenLDAP 2.4.12 servers with back-meta to proxy
>> an Active Directory domain. The clients are all current versions of
>> PADL's nss_ldap libraries.
>>
>> Every once in a while (sometimes twice a day, sometimes once every two
>> weeks) one of the slapd servers will peg CPU use at 100% and stop
>> answering requests. The only way to stop slapd is with a kill -9.
>>
>> There doesn't seem to be anything to explain the lockup or allow us to
>> reproduce it. We are using redundant AD servers and they are not going
>> offline. A third slapd server running as a test server using the same
>> AD servers and configured identically but serving a much lighter
>> nss_ldap load does not fail at all. We have ruled out hardware, OS, and
>> connectivity as possible causes.
>>
>> We are unfortunately unable to attach gdb to the running processes, as
>> these are production servers and need to be restarted immediately. Our
>> smaller test system does not exhibit the same behavior, and there is
>> nothing unusual in the server logs. We do have core files generated
>> from kill -6 commands, and they are all eerily similar to the
>> back-trace below in that they have one or more threads waiting for a
>> search or a bind response from AD.
>>
>> I am also enclosing relevant portions of slapd.conf for these systems.
>> Please let me know if any additional information would be useful.
>>
>> Thanks,
>>
>> -Matt
>>
>> -----
>>
>>
>> (gdb) thr apply all bt
>
>> Thread 1 (process 29769):
>> #0  0x005fa410 in __kernel_vsyscall ()
>> #1  0x004ddd10 in raise () from /lib/libc.so.6
>> #2  0x004df621 in abort () from /lib/libc.so.6
>> #3  0x004d715b in __assert_fail () from /lib/libc.so.6
>> #4  0x0806eec8 in slap_listener (sl=0x9583108)
>>     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1803
>> #5  0x0806f643 in slap_listener_thread (ctx=0x4e92220, ptr=0x9583108)
>>     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1997
>> #6  0x00a10783 in ldap_int_thread_pool_wrapper (xpool=0x959a010)
>>     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/libraries/libldap_r/tpool.c:663
>> #7  0x0038a45b in start_thread () from /lib/libpthread.so.0
>> #8  0x00585c4e in clone () from /lib/libc.so.6
>> (gdb)
>
> It seems you sent the wrong backtrace; this one doesn't show any  
> signs of looping or anything that would indicate heavy CPU usage. It  
> shows an assert which would kill the process, leading to 0% CPU  
> usage. This assert was most likely fixed in 2.4.14.
>
>> slapd.conf
>
>> #######################################################################
>> # bdb database definitions
>> #######################################################################
>> database        bdb
>> suffix          "ou=nisdata"
>
>> #######################################################################
>> # Definitions for proxy and cache to AD
>> #######################################################################
>> database        meta
>> suffix          "dc=my-customer,dc=com"
>
>> # The link to AD:
>> uri             ldaps://ldap-prd-dc01.my-customer.com/dc=ad,dc=my-customer,dc=com
>> ldaps://ldap-prd-dc02.my-customer.com/
>
>> # The link to the NIS data directory (yes, we could chain/glue, that's
>> # for later)
>> uri             ldapi://%2fvar%2fsymas%2frun%2fldapi/dc=nis,dc=my-customer,dc=com
>
> Pointing back-meta at its own slapd will inevitably exhaust the
> thread pool, since every incoming operation then needs two threads
> from the same pool: one for the original request and one for the
> looped-back request.
>
> This ITS will be closed.
> -- 
>  -- Howard Chu
>  CTO, Symas Corp.           http://www.symas.com
>  Director, Highland Sun     http://highlandsun.com/hyc/
>  Chief Architect, OpenLDAP  http://www.openldap.org/project/
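
Editor's note: the thread-pool point above can be illustrated with a toy model. The sketch below is not slapd code and does not use slapd's thread pool; it uses Python's standard ThreadPoolExecutor, and the names run_proxy, inner and outer are hypothetical. It assumes each client operation holds one worker while waiting for a looped-back operation that needs a second worker from the same pool. A timeout stands in for the hang so that the example terminates; a server without such a timeout would simply stay blocked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_proxy(pool_size, clients, timeout=0.5):
    """Toy model: each client operation holds one worker while it waits for a
    looped-back operation that needs a second worker from the same pool."""
    assert clients <= pool_size, "every client operation must get a worker"
    all_started = threading.Barrier(clients)   # all outer operations are running
    all_decided = threading.Barrier(clients)   # no worker is released early

    with ThreadPoolExecutor(max_workers=pool_size) as pool:

        def inner():
            # Stands in for the request that comes back in over the loopback.
            return "entry"

        def outer():
            all_started.wait()
            nested = pool.submit(inner)        # loopback into the same pool
            try:
                outcome = nested.result(timeout=timeout)
            except FutureTimeout:
                outcome = "stalled"            # without a timeout: blocked forever
            all_decided.wait()
            return outcome

        futures = [pool.submit(outer) for _ in range(clients)]
        return [f.result() for f in futures]
```

With as many concurrent client operations as workers, for example run_proxy(pool_size=4, clients=4), every operation should report "stalled", because all workers are held by outer operations and none is left for the looped-back ones. With twice as many workers as clients, run_proxy(pool_size=8, clients=4), each operation should complete. This is why removing the loopback, as in the back-glue configuration mentioned at the top of this message, avoids the problem rather than merely raising the thread count.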