send_search_entry aborts b/c ber_flush fails with errno=0 (ITS#1891)
Full_Name: Gareth Bestor
Version: 2.0.14 (also 2.0.23?)
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.42.208.144)
Using OpenLDAP 2.0.14 as part of the Globus Grid Toolkit (www.globus.org).
When running slapd queries against more than a couple of machines, we observe an
intermittent "Can't contact LDAP server" failure halfway through receiving the
data, and the query is aborted. I traced the problem to sb_write failing but
returning errno=0 (I still need to find out why). This particular error scenario
causes ber_flush to fail, e.g.
Jun 14 16:30:16 pygar slapd[18648]: ber_flush failed errno=0 reason="Success"
which in turn causes send_search_entry to fail with
May 23 13:09:03 c279lx01 slapd[20424]: send_ldap_response: ber write failed
which aborts the LDAP query.
A fix/workaround that I tried is to change, in ber_flush(),
if ( err != EWOULDBLOCK && err != EAGAIN ) {
to
if ( err != EWOULDBLOCK && err != EAGAIN && err != LDAP_SUCCESS ) {
that is, if ber_int_sb_write() fails but leaves errno=0, retry the send
rather than abort. I tested the fix and it seems to work. I looked at the 2.0.23
source, which also has the former check in ber_flush, so the fix may be
generally applicable.
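The changed condition can be sketched in isolation as follows. This is only an illustration of the check, not code from the OpenLDAP tree: the function name write_error_is_retryable is hypothetical, and the literal 0 stands in for LDAP_SUCCESS, which has the value 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not an OpenLDAP function): decide whether a failed
 * write should be retried rather than treated as a hard error.
 *
 * The original check retries only on EWOULDBLOCK and EAGAIN. The workaround
 * also retries when the write failed but errno was left at 0.
 */
static bool write_error_is_retryable(int err)
{
    return err == EWOULDBLOCK || err == EAGAIN || err == 0;
}
```

With this check, write_error_is_retryable(0) and write_error_is_retryable(EAGAIN) are both true, while a genuine failure such as EPIPE is still treated as a hard error. Note that retrying on errno=0 only hides the symptom; why sb_write fails without setting errno is still unexplained.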
A few other minor things you might want to consider:
In send_search_entry() in slapd/result.c,
Debug( LDAP_DEBUG_ANY, "send_ldap_response: ber write failed\n",0,0,0)
should be
Debug( LDAP_DEBUG_ANY, "send_search_entry: ber write failed\n",0,0,0)
The current error is misleading because an identical error message is reported
in the *real* send_ldap_response().
In ber_flush, the following loop attempts to send data even when there is no
data to send, e.g. if towrite=0 (such as when ber_rwptr=NULL)
do {
    rc = ber_int_sb_write( sb, ber->ber_rwptr, towrite );
    if (rc<=0) {
        return -1;
    }
    towrite -= rc;
    nwritten += rc;
    ber->ber_rwptr += rc;
} while ( towrite > 0 );
This might result in a misleading error condition being reported b/c
ber_int_sb_write will return <= 0. If there is no data to send then perhaps the
following would be better instead
while ( towrite > 0 ) {
    rc = ber_int_sb_write( sb, ber->ber_rwptr, towrite );
    if (rc<=0) {
        return -1;
    }
    towrite -= rc;
    nwritten += rc;
    ber->ber_rwptr += rc;
}
i.e. test for data to send BEFORE sending it rather than after. If ber_flush
somehow gets called with no data left to send, the ber_int_sb_write()<=0 'hard
error' could potentially get propagated up, causing a successfully
completed query to abort right at the end.
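The difference between the two loop forms can be shown with a small standalone sketch. Everything here is an assumption for illustration: mock_write is a hypothetical stand-in for ber_int_sb_write() that accepts at most 4 bytes per call and returns 0 when asked to write nothing, and the two flush functions mirror only the loop structure, not the real ber_flush().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ber_int_sb_write(): accepts at most 4 bytes
 * per call and returns 0 when there is nothing to write. */
static long mock_write(const char *buf, long len)
{
    (void)buf;
    if (len <= 0) {
        return 0;
    }
    return len < 4 ? len : 4;
}

/* Loop shaped like the current code: writes first, tests afterwards. */
static long flush_do_while(const char *buf, long towrite)
{
    long nwritten = 0;
    assert(towrite >= 0);
    do {
        long rc = mock_write(buf + nwritten, towrite);
        if (rc <= 0) {
            return -1;          /* reported as a hard error */
        }
        towrite -= rc;
        nwritten += rc;
    } while (towrite > 0);
    return nwritten;
}

/* Loop shaped like the suggestion: tests for data before writing. */
static long flush_while(const char *buf, long towrite)
{
    long nwritten = 0;
    assert(towrite >= 0);
    while (towrite > 0) {
        long rc = mock_write(buf + nwritten, towrite);
        if (rc <= 0) {
            return -1;
        }
        towrite -= rc;
        nwritten += rc;
    }
    return nwritten;
}
```

Under these assumptions, both functions return the full byte count when there is data to send, but with towrite=0 the do/while form returns -1 while the while form returns 0.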
I'm currently running 2.0.14 but noticed the same issues in 2.0.23 source.