Re: Socket-level timeouts?
On Apr 8, 2008, at 10:54 AM, Aaron Richton wrote:
> I think you might be confusing LDAP_OPT_NETWORK_TIMEOUT and
> LDAP_OPT_TIMEOUT. (Or maybe I am...) But as I recall,
> NETWORK_TIMEOUT is for initial connect(), and you're referring to
> ongoing conversations.
This is correct. I'm proposing extending that to include a timeout
for all network communication. In some cases the APIs take a timeout,
but many do not, and this seems cleaner than requiring the client to
pass a timeout for every call which could conceivably perform network
operations.
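To illustrate what I mean by a timeout on all network communication, here is a minimal sketch at the plain socket level. It is not libldap code; the helper name recv_with_deadline and the 0.2 second value are made up for the example. The point is only that a single per-socket setting bounds every later blocking read, so the caller does not have to pass a timeout on each call.

```python
import socket

def recv_with_deadline(sock, nbytes, timeout_s):
    """Return up to nbytes, or None if nothing arrives within timeout_s.

    Hypothetical helper for illustration only.
    """
    # settimeout() applies to every subsequent blocking call on this socket,
    # not just the initial connect().
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None

# Simulate a server that is connected but unresponsive: the peer never writes.
client, silent_server = socket.socketpair()
result = recv_with_deadline(client, 1024, 0.2)  # gives up instead of hanging
```

Without the timeout, the recv() above would block indefinitely, which is the behaviour we saw from anything touching PAM/NSS.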
> For that matter, I'm having a hard time envisioning the situation
> you describe playing out. Let's say your server dies hard and you
> reboot it.
This is the only situation which works well currently. The only three
failures we've had with slapd, however, have been situations where the
server failed by simply becoming unresponsive and anything which
touched PAM/NSS hung waiting for read() to return. We've also seen
similar problems with mobile and multi-homed systems where a
connection was attempted before the defined LDAP server was reachable.
> Finally, libldap does use TCP keepalive nowadays. In the event of
> intermediate network path dying hard (which can't be relied upon to
> nicely produce TCP resets), the underlying keepalive mechanism
> should pick that up.
This is an improvement, but it wouldn't help with the slapd failures
we've observed, because the server's TCP stack can respond to
keepalives even when the service is unresponsive. It would definitely
help recovery when the server is rebooted, but it uses the system-wide
keepalive settings, and the values appropriate for a local LDAP server
would be far too aggressive for Internet connections.
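For comparison, keepalive can also be tuned per connection rather than system-wide, on platforms that expose the options. The sketch below uses plain sockets, not anything libldap provides; enable_keepalive and its default values are assumptions for illustration, and the TCP_KEEP* constants are only set where the platform defines them.

```python
import socket

def enable_keepalive(sock, idle_s=60, interval_s=10, probes=3):
    """Turn on TCP keepalive for one socket, with per-socket tuning where available.

    Hypothetical helper; the defaults are illustrative, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before the drop
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)  # not every platform defines these
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock, idle_s=30, interval_s=5, probes=3)
```

Even with per-socket values, this only detects a dead host or path. A server whose TCP stack still answers probes while the service itself is hung will pass the check.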
I understand the current situation, but as a user it would feel more
correct for LDAP_OPT_NETWORK_TIMEOUT to mean "try the next server if a
response is not obtained within this time". That would cover the
additional class of failures where an LDAP server is only partially up,
which matters because we cannot guarantee minute-level admin response
times to restart a failing server.
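A rough sketch of the "try the next server" behaviour described above, again with plain sockets rather than libldap. The function name query_first_responsive, the request bytes, and the 4096-byte read size are all hypothetical. One timeout covers both the connect and the wait for a reply, so a server that accepts the connection but never answers is skipped just like one that is down.

```python
import socket

def query_first_responsive(servers, request, timeout_s):
    """Try each (host, port) in order; return (server, reply) for the first
    one that answers within timeout_s, or None if none do.

    Hypothetical helper for illustration only.
    """
    for server in servers:
        try:
            # The timeout bounds connect() and is inherited by later I/O.
            with socket.create_connection(server, timeout=timeout_s) as sock:
                sock.sendall(request)
                reply = sock.recv(4096)
                if reply:
                    return server, reply
        except OSError:
            # Covers refused or unreachable servers and socket.timeout
            # (a server that is up at the TCP level but not responding).
            continue
    return None
```

Called with a list such as [("ldap1.example.com", 389), ("ldap2.example.com", 389)] (placeholder hostnames), it would move on to the second server after timeout_s if the first one hangs.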
Chris