Re: (ITS#7378) Slapd hangs on bdb write lock
Hi Howard,
Thank you very much for the explanation. Which BDB version would you recommend? I have quite a few options and would like to use one that is known to be very solid.
Sincerely,
Nikolai Schupbach
On 3/09/2012, at 9:45 PM, Howard Chu wrote:
> nikolai@net24.co.nz wrote:
>> Full_Name: Nikolai Schupbach
>> Version: 2.4.31
>> OS: FreeBSD
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (202.78.158.60)
>>
>>
>> We are experiencing frequent hangs in slapd. Once hung we can continue to
>> connect, but all searches will just hang indefinitely until we kill -9 the slapd
>> process and restart it. The directory is used for mail routing and we have been
>> migrating to it from an existing directory server over the last 3 weeks - we
>> have noted the busier the directory becomes the more often it hangs (now once
>> every 2 days).
>>
>> We have one master and 10 syncrepl read only replicas - the master is used
>> mainly for writes and has not hung yet, but most of the replicas have hung at
>> least once. The replicas receive anywhere between 50 to 300 searches/sec, while
>> the master would only get 1/sec. There are 45k entries in the directory.
>>
>> We are running:
>>
>> FreeBSD 8.3/9.0 x64
>> OpenLDAP 2.4.31
>> Berkeley DB 4.6.21
>>
>> The old directory we are migrating from has the same load and is also running
>> OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
>> and OpenLDAP 2.3.27.
>>
>> We have managed to collect db_stat lock information, which indicates the same
>> issue each time - a write lock on dn2id.bdb.
>
> It's more than that. Your db_stat shows that a single thread has 3 active
> transactions. This should never happen:
>
> 8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
> 8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
> 8000a85e READ 1 HELD 0xb26c8 len: 9 data: 60xa800000000000000
> 8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
> 8000a85f READ 1 WAIT dn2id.bdb page 559
> 8000a85f READ 1 HELD dn2id.bdb page 768
> 8000a85f WRITE 2 HELD dn2id.bdb page 1362
> 8000a85f READ 2 HELD dn2id.bdb page 1362
> 8000a85f WRITE 2 HELD dn2id.bdb page 1353
> 8000a85f READ 2 HELD dn2id.bdb page 1353
> 8000a85f WRITE 2 HELD dn2id.bdb page 933
> 8000a85f READ 1 HELD dn2id.bdb page 933
> 8000a85f WRITE 4 HELD dn2id.bdb page 219
> 80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
> 80001047 WRITE 1 HELD dn2id.bdb page 559
>
> I would first recommend changing from BDB 4.6.21 to some other version. There
> are no code paths in back-bdb where we would ever return without either
> committing or aborting the current transactions, so this appears to be a BDB
> bug, not an OpenLDAP bug.
>
>> We have also collected the backtrace for all the threads which I have uploaded
>> to:
>>
>> ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt
>>
>> The full db_stat output is located at:
>>
>> ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
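
The check described in the quoted reply (one thread owning several lockers at once, with one of them waiting on a page that another holds) can be automated against saved db_stat lock output. The following is a minimal sketch, not part of the original thread: the function names `lockers_by_thread` and `self_blocked` are hypothetical, and the line layout it parses is assumed from the excerpt quoted above, so other BDB versions may print it differently.

```python
import re
from collections import defaultdict

# Locker header line, layout assumed from the excerpt in this thread, e.g.
# "8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336"
LOCKER_RE = re.compile(
    r"^([0-9a-fA-F]+)\s+dd=\s*(\d+)\s+locks held\s+(\d+)\s+"
    r"write locks\s+(\d+)\s+pid/thread\s+(\d+)/(\d+)\s*$"
)
# Individual lock line, e.g. "8000a85f READ 1 WAIT dn2id.bdb page 559"
LOCK_RE = re.compile(
    r"^([0-9a-fA-F]+)\s+(READ|WRITE)\s+(\d+)\s+(HELD|WAIT)\s+(.+?)\s*$"
)


def lockers_by_thread(text):
    """Map (pid, thread) to the list of locker IDs it owns, in input order."""
    threads = defaultdict(list)
    for line in text.splitlines():
        m = LOCKER_RE.match(line.strip())
        if m:
            locker, _dd, _held, _writes, pid, tid = m.groups()
            threads[(pid, tid)].append(locker)
    return dict(threads)


def self_blocked(text):
    """Return (waiter, holder, object) triples where a locker waits on an
    object held by a different locker of the same pid/thread and at least
    one of the two locks is a WRITE lock."""
    owner = {}                 # locker ID -> (pid, thread)
    held = defaultdict(list)   # object -> [(locker ID, mode), ...]
    waits = []                 # [(locker ID, mode, object), ...]
    for line in text.splitlines():
        line = line.strip()
        m = LOCKER_RE.match(line)
        if m:
            owner[m.group(1)] = (m.group(5), m.group(6))
            continue
        m = LOCK_RE.match(line)
        if m:
            locker, mode, _count, status, obj = m.groups()
            if status == "HELD":
                held[obj].append((locker, mode))
            else:
                waits.append((locker, mode, obj))
    found = []
    for waiter, wmode, obj in waits:
        for holder, hmode in held.get(obj, []):
            same_thread = (
                waiter in owner and owner.get(holder) == owner[waiter]
            )
            conflicting = "WRITE" in (wmode, hmode)
            if holder != waiter and same_thread and conflicting:
                found.append((waiter, holder, obj))
    return found


# Lines copied from the db_stat excerpt quoted in this message.
SAMPLE = """\
8000a85e dd= 0 locks held 2 write locks 0 pid/thread 88000/34386526336
8000a85e READ 1 HELD 0xb19a8 len: 9 data: 40xa800000000000000
8000a85f dd= 0 locks held 8 write locks 4 pid/thread 88000/34386526336
8000a85f READ 1 WAIT dn2id.bdb page 559
8000a85f READ 1 HELD dn2id.bdb page 768
80001047 dd=28 locks held 1 write locks 1 pid/thread 88000/34386526336
80001047 WRITE 1 HELD dn2id.bdb page 559
"""

if __name__ == "__main__":
    for thread, lockers in lockers_by_thread(SAMPLE).items():
        if len(lockers) > 1:
            print("thread", thread, "owns lockers", lockers)
    for waiter, holder, obj in self_blocked(SAMPLE):
        print(waiter, "waits on", obj, "held by", holder)
```

To run it against a full dump, read the saved db_stat output from a file and pass its contents to both functions in place of `SAMPLE`. A thread that appears with more than one locker, or any triple returned by `self_blocked`, matches the pattern identified in the reply.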