Re: (ITS#7723) slapd crashes on multi core machines if a search request is *immediately* followed by an unbind
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#7723) slapd crashes on multi core machines if a search request is *immediately* followed by an unbind
- From: jsynacek@redhat.com
- Date: Wed, 13 Nov 2013 11:54:06 GMT
- Auto-submitted: auto-generated (OpenLDAP-ITS)
On 11/04/2013 03:11 PM, hyc@symas.com wrote:
> jsynacek@redhat.com wrote:
>> On 10/15/2013 01:10 PM, michael.vishchers@7p-group.com wrote:
>>> It is not the client loop that is multithreading but the ldap server.
>>>
>>> And it is not a misuse of the API but a problem that may be raised by
>>> day-to-day network problems.
>>>
>>> I've boiled down the problem to a few simple configurations that work (or
>>> better, fail ;-) with both 2.4.23 and 2.4.36. A tgz file containing a setup
>>> with start script and testclient is attached. It should be sufficient to
>>> reproduce the fault.
>>>
>>> The problem occurs only if we use session variable substitution in the rwm
>>> overlay, and only if a search is *immediately* (e.g. caused by network loss
>>> and client timeout) followed by an unbind.
>>>
>>
>> I modified the reproducer a bit (the start script) and found out a few things.
>> You can find the reproducer I'm using at [1].
>>
>> Valgrind's helgrind shows some lock problems in the rwm overlay and also in
>> back-ldap and connection.c. After correcting those the issue seems to be gone.
>>
>> You can find helgrind logs at [2] (before the fix) and [3] (after).
>>
>> Also, ElectricFence reveals some problems [4], which I didn't fix yet.
>>
>> A fix attempt can be found at [5]. I'm not sure if that is a correct fix, or it
>> just masked the real issue. But I didn't manage to reproduce the problem
>> after applying it.
>
> I already explained the problem. The other issues you identified are not
> relevant, and your patch is not correct. Reread Followup #4 of this ITS.
>
Another take on the fix:
http://jsynacek.fedorapeople.org/openldap/its7723/0001-ITS-7723-fix-reference-counting.patch
--
Jan Synacek
Software Engineer, Red Hat