Re: Socket-level timeouts?
I think you might be confusing LDAP_OPT_NETWORK_TIMEOUT and
LDAP_OPT_TIMEOUT. (Or maybe I am...) But as I recall, NETWORK_TIMEOUT is
for initial connect(), and you're referring to ongoing conversations.
For that matter, I'm having a hard time envisioning the situation you
describe playing out. Let's say your server dies hard and you reboot it.
Then your client, blissfully unaware of this, sends some packets over its
open connection. The rebooted server sees the packets, but doesn't have a
matching TCP flow, so it's going to tell you to bug off -- I'd expect a
"typical OS" to send a TCP reset in response to this. And at that point,
libldap should produce LDAP_SERVER_DOWN or something of that flavor,
and the client will of course have no bugs and handle this with perfect
grace.
Finally, libldap does use TCP keepalive nowadays. If an intermediate
network path dies hard (which can't be relied upon to nicely produce TCP
resets), the underlying keepalive mechanism should pick that up.
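For reference, turning on keepalive for a socket looks roughly like the sketch below. This is a generic POSIX illustration, not libldap's actual code, and enable_keepalive() is a made-up helper name. How long the kernel waits before probing an idle connection is a system setting, so how quickly a dead path gets noticed depends on how the host is tuned.

```c
#include <sys/socket.h>

/* Hypothetical helper, not taken from libldap: ask the kernel to send
 * keepalive probes on an idle connection so that a dead peer or path is
 * eventually reported as an error on the socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}
```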
On Tue, 8 Apr 2008, Chris Adams wrote:
We've noticed hard failures on both our Linux and Mac workstations when an
LDAP server fails in a way which causes it to stop responding but leave a
connection open (e.g. lock contention, disk failure). This usually ends up
requiring the system to be rebooted because a key system process will
probably have made a call which is waiting on a read() which might take days
to fail.
I've created a patch which simply calls setsockopt() to set SO_SNDTIMEO and
SO_RCVTIMEO when LDAP_OPT_NETWORK_TIMEOUT has been set. This appears to
produce the
desired result on Linux (both with pam_ldap and the ldap utilities) and OS X
(within the DirectoryService plugin).
Is there a drawback to this approach which I've missed? It appears that the
issue has come up in the past but there's no solution that I can see
(certainly nothing else uses socket-level timeouts). I'd like to find a
solution for this as it's by far the biggest source of Linux downtime in our
environment.
Thanks,
Chris