slapd deadlock bug ITS#7296
- To: openldap-devel@openldap.org
- Subject: slapd deadlock bug ITS#7296
- From: Richard Silverman <res@qoxp.net>
- Date: Fri, 14 Dec 2012 10:06:46 -0500 (EST)
- User-agent: Alpine 2.00 (OSX 1167 2008-08-23)
Hello,
I'm working on this bug:
http://www.openldap.org/lists/openldap-bugs/201206/msg00026.html
If slapd client connections are torn down in mid-query -- the server has
received the query and has a pending reply to send, but the connection is
closed by the client before it can be sent -- this deadlocks slapd worker
threads. Eventually all threads are deadlocked in send_ldap_ber() which
serializes their network access to send PDUs, and the server becomes
unresponsive and has to be killed.
send_ldap_ber() notices the connection drop and calls connection_closing(). The
problem appears to be that connection_abandon() then abandons all outstanding
executing ops but does not empty the c_ops queue (as it does with
c_pending_ops). As a result, whenever connection_close() looks at the
connection, it sees outstanding ops and defers the close. I see this pattern:
50cb3104 connection_closing: readying conn=1519 sd=33 for close
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
... which repeats until the server freezes entirely.
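The deferral loop described above can be sketched as a toy model in C. This is
not slapd code: the toy_* types and functions are hypothetical and only borrow
the field names c_ops and c_pending_ops from the description. It assumes that
abandoning merely flags executing ops while emptying the pending queue, and
that a close is deferred whenever c_ops is non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the names echo slapd's, but none of this is slapd code. */
typedef struct toy_op {
    bool abandoned;          /* set when the op has been abandoned */
    struct toy_op *next;
} toy_op;

typedef struct toy_conn {
    toy_op *c_ops;           /* ops currently executing */
    toy_op *c_pending_ops;   /* ops queued but not yet started */
    bool closing;            /* connection is marked for close */
} toy_conn;

/* Flag every executing op as abandoned and empty the pending queue.
 * Executing ops stay on c_ops, as in the behaviour described above.
 * Nodes are caller-owned in this toy, so nothing is freed here. */
static void toy_abandon(toy_conn *c)
{
    for (toy_op *o = c->c_ops; o != NULL; o = o->next)
        o->abandoned = true;
    c->c_pending_ops = NULL;
}

/* Return true if the close went through, false if it was deferred. */
static bool toy_close(toy_conn *c)
{
    if (c->c_ops != NULL)
        return false;        /* outstanding ops: defer the close */
    c->closing = false;
    return true;
}

/* Mark the connection as closing and retry the close up to max_attempts
 * times. Returns the number of deferrals observed. */
static int toy_resched(toy_conn *c, int max_attempts)
{
    int deferrals = 0;

    assert(c != NULL);
    toy_abandon(c);
    c->closing = true;
    while (deferrals < max_attempts && !toy_close(c))
        deferrals++;
    return deferrals;
}
```

In this model nothing ever removes an abandoned op from c_ops, so every retry
is deferred until the attempt limit is reached. The real server has no such
limit, which would be consistent with the repeating log lines.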
If I add code to connection_abandon() to empty c_ops, it causes slapd to crash
later with a mutex usage error, so that's apparently not the right place/way to
do it. If I note that the connection is dying and have connection_destroy()
skip the assertion that c_ops must be empty, it fixes the bug: the deadlock no
longer occurs. However, I'm concerned this will leak memory as the ops aren't
being freed. So my question is: what's the right way to get the outstanding
executing ops abandoned by connection_abandon() to be freed?
The code is complex and I may have misunderstood how best to go about fixing
this, but hopefully this is enough to make sense.
Thanks,
--
Richard E. Silverman