Re: (ITS#5161) delta-syncrepl mutex lockup
quanah@zimbra.com wrote:
> --On Tuesday, October 02, 2007 3:28 AM +0000 quanah@zimbra.com wrote:
>
>> --On Tuesday, October 02, 2007 2:35 AM +0000 hyc@symas.com wrote:
>>
>>> quanah@zimbra.com wrote:
>>>> --On October 1, 2007 11:22:11 PM +0000 quanah@zimbra.com wrote:
>>>>
>>>>> The following files will be uploaded to the ftp site, where # will be
>>>>> the assigned ITS number.
>>>> URL's specifically are:
>>>>
>>>> <ftp://ftp.openldap.org/incoming/5161-pstak.out.2007-10-01>
>>>> <ftp://ftp.openldap.org/incoming/5161-dbstat.delta.out.2007-10-01>
>>>> <ftp://ftp.openldap.org/incoming/5161-db_stat.out.2007-10-01>
>>> The pstack output is a bit odd, is this a regular debug build? With frame
>>> pointers, etc? Can you get a stack trace in gdb?
>> It is a regular build, and they killed and restarted it before getting
>> any gdb information. We've asked them to please get the gdb information
>> in the future. Since it has happened twice now for this particular group
>> in about a month, I'm hopeful it'll happen again before too long. ;)
>
> And here is the last logged operation:
>
> Oct 1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD
> dn="uid=XXXXXXX,ou=people,dc=YYYYYY,dc=com"
> Oct 1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD
> attr=zimbraLastLogonTimestamp
Based on the (unreliable) pstack output, it appears that all of the threads are
waiting for the same mutex. That shouldn't be possible, of course, since one of
those threads must already own it. We really need gdb access here to
inspect the state of the mutex and see which thread is the owner, then figure
out why it's trying to lock it again. In OpenLDAP 2.3 this pretty much means
that some operation locked the mutex and somehow completed without unlocking
it, i.e. completed without going through the accesslog response callback.
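To illustrate the failure mode described above, here is a minimal sketch in C. It is not the accesslog overlay's code: the names log_mutex, op_t, op_start, op_finish and log_response are hypothetical stand-ins. It only assumes the general pattern that the lock is taken when an operation starts and released in a response callback, so an operation whose callback is removed from the stack finishes with the mutex still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mutex the overlay holds per operation. */
pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical operation with a single response-callback slot. */
typedef struct op {
    void (*response_cb)(struct op *);
} op_t;

/* Response callback: the only place where the mutex is released. */
void log_response(op_t *op)
{
    (void)op;
    int rc = pthread_mutex_unlock(&log_mutex);
    assert(rc == 0);
    (void)rc;
}

/* Start of an operation: take the lock and register the callback. */
void op_start(op_t *op)
{
    pthread_mutex_lock(&log_mutex);
    op->response_cb = log_response;
}

/* End of an operation: run the callback only if it is still registered. */
void op_finish(op_t *op)
{
    if (op->response_cb != NULL)
        op->response_cb(op);
}

/* Report whether the mutex is currently held, without blocking. */
bool log_mutex_is_held(void)
{
    if (pthread_mutex_trylock(&log_mutex) == 0) {
        pthread_mutex_unlock(&log_mutex);
        return false;
    }
    return true;
}
```

If something clears response_cb between op_start and op_finish, the operation completes normally but log_mutex_is_held() keeps returning true, and every later op_start blocks. That would look like what the pstack output suggests: all threads waiting on one mutex.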
This has nothing to do with BDB so db_stat isn't relevant here. It's about the
accesslog overlay and any other overlays that may be manipulating the callback
stack, so your slapd.conf is more relevant here.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/