slapd connection_read: no connection; tcp time_wait state
Hi!
This is an interesting one... I have an OpenLDAP 2.4.12 server as a
consumer in a two-node cluster. Its sole function is to answer queries
from our mail hub for recipient validation. We see about 50-300 queries
per second, with occasional spikes.
Unfortunately, our mail hub appliances (vendor name left out to protect
the guilty) are somewhat inefficient in their LDAP connection handling
and open a new TCP connection for every single LDAP query. They do this
even when there are multiple recipients in one SMTP session (boggles the
mind!). A percentage of these connections don't get closed properly and
I get the following error in the syslog:
slapd[23108]: connection_read(18): no connection!
The reason is that the connections are in a TIME_WAIT state because they
were not closed properly. They go away after 60 seconds, but with the
load this server gets we continuously have several hundred TCP
connections in TIME_WAIT and a system log full of the above errors.
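For anyone who wants to watch the same numbers, here is a minimal sketch
of counting socket states per local port. It assumes a Linux host, where
/proc/net/tcp lists one socket per line with the state as a hex code
(06 is TIME_WAIT), and it assumes slapd listens on port 389. The function
name count_states is just something made up for this example.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp (Linux).
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_states(proc_net_tcp_text, local_port=389):
    """Count sockets per TCP state for one local port.

    proc_net_tcp_text is the full text of /proc/net/tcp; local_port=389
    is an assumption (the default LDAP port).
    """
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] looks like "0100007F:0185": hex address, then hex port.
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if port != local_port:
            continue
        state = fields[3]
        # Fall back to the raw code for states not in the table above.
        counts[TCP_STATES.get(state, state)] += 1
    return counts
```

To use it, read /proc/net/tcp on the server and pass the text in, e.g.
count_states(open("/proc/net/tcp").read()). IPv6 sockets are listed
separately in /proc/net/tcp6 and are not covered by this sketch.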
I'm attaching two packet captures:
time_wait.cap - filtered a single complete tcp session that ended with
the port in a time_wait condition.
no_time_wait.cap - control capture for reference. This session closed
properly.
I can't claim to have the greatest understanding of the 3-way / 4-way
TCP open / close handshakes. But one thing I did notice, which seems to
be consistent among the sessions that end in TIME_WAIT, is that the
FIN-ACK is initiated by the server. Possibly I'm reading it wrong, but
doesn't the client normally initiate the close, with the server doing a
passive close? So, in theory, the server should never have to wait for
the client.
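To make the question concrete, here is a toy model of the normal 4-way
close as I understand it from the TCP state diagram in RFC 793: whichever
side sends the first FIN (the active closer) is the one that ends up in
TIME_WAIT, and the other side goes straight to CLOSED. This is only a
sketch; it ignores simultaneous close and resets, and the function names
are made up for illustration.

```python
# States each side passes through during a normal close (RFC 793),
# starting from an established connection.
ACTIVE_CLOSE = ["ESTABLISHED", "FIN_WAIT_1", "FIN_WAIT_2", "TIME_WAIT"]
PASSIVE_CLOSE = ["ESTABLISHED", "CLOSE_WAIT", "LAST_ACK", "CLOSED"]


def close_sequence(active_closer):
    """Return the state sequence of each side, given who sends the first FIN."""
    if active_closer not in ("client", "server"):
        raise ValueError("active_closer must be 'client' or 'server'")
    passive_closer = "client" if active_closer == "server" else "server"
    return {
        active_closer: list(ACTIVE_CLOSE),
        passive_closer: list(PASSIVE_CLOSE),
    }


def who_waits(active_closer):
    """Return the side(s) whose final state is TIME_WAIT."""
    sequences = close_sequence(active_closer)
    return [side for side, states in sequences.items()
            if states[-1] == "TIME_WAIT"]
```

In this model who_waits("server") gives ["server"], which would match
what I see in time_wait.cap, where the server sends the first FIN.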
Could someone more knowledgeable than me tell me why the server might
initiate the active close?
thanks,
-james
Attachment:
time_wait.cap
Description: application/cap
Attachment:
no_time_wait.cap
Description: application/cap