[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
RE: try_read1msg()/wait4msg() vs. buffered input (ITS#2153)
Your analysis makes sense. Retrying in try_read1msg sounds like the right
fix.
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support
> -----Original Message-----
> From: owner-openldap-bugs@OpenLDAP.org
> [mailto:owner-openldap-bugs@OpenLDAP.org]On Behalf Of
> jutta@sendmail.com
> Sent: Wednesday, October 23, 2002 9:51 PM
> To: openldap-its@OpenLDAP.org
> Subject: try_read1msg()/wait4msg() vs. buffered input (ITS#2153)
>
>
> Full_Name: Jutta Degener
> Version: 2.0.25 (still in 2.0.27, though)
> OS: Reproduced on Sol 6 and Debian Linux.
> URL:
> Submission from: (NULL) (209.246.26.36)
>
>
> I seem to be encountering an error in OpenLDAP 2.0.25 (which
> appears to be unchanged in 2.0.27 version) related to buffering
> of input and noticing the arrival of new results.
>
> Looking through the archives, this is a slightly more general
> form of the problem you ended up rejecting a patch for in
> ITS #446 in early 2000. I can see why you rejected the patch,
> but the problem still needs fixing.
>
> It is visible as "the system hangs" on systems that
>
> - use an IP connection to their LDAP server
> - call ldap_result with all=1
> - and have low overall traffic
>
> especially when reading through slightly slow connections that
> are likely to blur TCP message boundaries.
>
> To recap the issue of #446, the problem has to do with the interaction
> of wait4msg() and try_read1msg(), both in libldap/result.c, with the
> stacked, buffered BER input.
>
>
> WAIT4MSG:
>
> wait4msg() is used by ldap_result() to read and decode new
> results from the network. It should time out only if no more
> input is available.
>
> Provided a message isn't already decoded and waiting (which
> it isn't, in our scenario), wait4msg walks the list of all
> connections, queries each connection whether it has input
> waiting
>
> for ( lc = ld->ld_conns; lc != NULL; lc = lc->lconn_next ) {
> if ( ber_sockbuf_ctrl( lc->lconn_sb,
> LBER_SB_OPT_DATA_READY, NULL ) ) {
> rc = try_read1msg( ld, msgid, all, lc->lconn_sb,
> lc, result );
> break;
> }
> }
>
> If a connection has input waiting, try_read1msg() is given one
> chance to actually read the message; it then breaks out of the
> loop. If the result wasn't useful (e.g. try_readmsg() returned
> -2), wait4msg() then uses select() to wait for physical low-level
> input to arrive at the actual underlying file descriptor(s).
>
> Once input has arrived as reported by select, a similar loop
> happens:
>
> for ( lc = ld->ld_conns; rc == -2 && lc != NULL;
> lc = nextlc ) {
> nextlc = lc->lconn_next;
> if ( lc->lconn_status ==
> LDAP_CONNST_CONNECTED &&
> ldap_is_read_ready( ld,
> lc->lconn_sb )) {
> rc = try_read1msg( ld, msgid, all,
> lc->lconn_sb, lc, result );
> }
> }
>
> Here, every connection is queried once as well. The loop breaks
> as soon as one of the calls to try_read1msg() returns a result
> other than -2.
>
>
> I/O:
>
> Even without SSL, the input to the BER decoder engine is buffered.
> There is a low-level TCP reading layer; on top of it, a "lookahead"
> reader reads into a (in our experiment) 16k-buffer, and then hands
> out pieces of that buffer.
>
> The methods of the lookahead and the straight reader layer are all
> in liblber/sockbuf.c; calls to ber_sockbuf_add_io() (also in this
> file) are used in libldab/open.c:ldap_int_open_connection() to stack
> the readahead layer on top of the raw layers.
>
> This means that before calling select() to wait for new input,
> the using environment must make sure that there is no input waiting
> in the buffers of the I/O stack; the ldap_is_read_ready() and
> ber_sockbuf_ctrl() calls in the quotes above do preciselly that.
>
>
> TRY_READ1MSG
>
> The try_read1msg() function in result.c is the function used
> to actually read and decode results from the network.
>
> The wait4msg() caller would work with this I/O model if
> try_read1msg() returned only if either its input buffers
> need refilling or if it found a message that will be
> returned to the caller.
>
> But that's not always the case.
>
> When called with all=1, try_read1msg() will read and decode
> one BER-level message, and then either find a LDAP_RES_SEARCH_RESULT
> and return a message built around it, or find just a
> LDAP_RES_SEARCH_ENTRY (which precedes its terminating RESULT frame),
> and return -2, a value that means "continue looking".
>
> If at that point, the LDAP_RES_SEARCH_RESULT frame has
> already been read into the lookahead-buffer, the caller
> will go into a select() waiting for _additional_ input before
> trying to read the search result. It tried to read input,
> it didn't find input that it could use, and instead of
> trying to read again (which would succeed and return the
> LDAP_RES_SEARCH_RESULT), it waits.
>
>
> FIX
>
> I think the retry loop implied by try_read1msg() signalling
> "continue looking" needs to move down into try_read1msg().
> It must not return unless it either found something the
> caller can use or actually needs the caller to wait for
> more input.
>
> That implies that when try_read1msg() would currently
> return -2 because it couldn't use what it just decoded,
> it needs to instead check its connection for additional
> pending input, and needs to retry if input is available.
>
> Please let me know if I overlooked something obvious, or
> if you need more information.
>
> Jutta Degener <jutta@sendmail.com>
>
>