[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: errant SASL/GSSAPI setup?
As I mailed yesterday, I've been seeing my slaves not get updates once their
initial kerberos ticket period expires.
On the server, I see this, using -d-1 on the master:
"
SASL [conn=1] Failure: GSSAPI Error: The context has expired (No error)
sb_sasl_write: failed to encode packet: generic failure
ldap_write: want=11176 error=Input/output error
ber_flush failed errno=5 reason="Input/output error"
connection_closing: readying conn=1 sd=13 for close
"
This makes sense. The context has expired.
Anyway, I noticed the "readying" condition of closure and thought about that.
I never did see an actual "close".
Looking for that phrase, I found it in openldap-2.3.27/servers/slapd/connection.c
I made a rude change to the programming:
utility5:/tools/src/openldap/RHEL4-i686# diff -c openldap-2.3.27/servers/slapd/connection.c{~,}
*** openldap-2.3.27/servers/slapd/connection.c~ Wed Apr 5 14:08:48 2006
--- openldap-2.3.27/servers/slapd/connection.c Thu Aug 31 09:36:18 2006
***************
*** 961,966 ****
--- 961,969 ----
/* wake write blocked operations */
slapd_clr_write( sd, 1 );
+
+ connection_close ( c );
+
if ( c->c_writewaiter ) {
ldap_pvt_thread_cond_signal( &c->c_write_cv );
ldap_pvt_thread_mutex_unlock( &c->c_mutex );
i.e. I just did a hard close of the connection in the "connection_closing"
routine.
This isn't something I'd want to do in my production servers, but I tried it in
my test environment.
The server shows this now:
SASL [conn=11] Failure: GSSAPI Error: The context has expired (No error)
sb_sasl_write: failed to encode packet: generic failure
ldap_write: want=11219 error=Input/output error
ber_flush failed errno=5 reason="Input/output error"
connection_closing: readying conn=11 sd=14 for close
connection_close: conn=11 sd=14
since the closure is forced now, rather than "readied".
The slave, -d 16384, shows this:
do_syncrep2 : Can't contact LDAP server
sb_sasl_write: failed to encode packet: generic failure
*** some time delay ***
request done: ld 0x928fb0 msgid 1
request done: ld 0x928fb0 msgid 2
request done: ld 0x928fb0 msgid 3
request done: ld 0x928fb0 msgid 4
syncrepl_entry: LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: be_search (0)
syncrepl_entry: wpieduPersonUUID=2af586df6800b3389cbe7bcbf2a920df,ou=People,dc=WPI,dc=EDU
syncrepl_entry: be_modify (0)
do_syncrep2: LDAP_RES_INTERMEDIATE - REFRESH_DELETE
i.e. The client sees the closure and connects again, performing any needed
updates. This is the desired action.
I believe the problem is that the closure is scheduled, but the connection
doesn't get to a state where it is tested for that impending closure.
The call for the actual close (connection_close) is done in connection_read and
connection_resched. It's probable that a master just doesn't get to these
states when dealing with a syncrepl slave.
The test "fix" I did was pretty brutal, and I was surprised that the servers
(slave and master) worked with it. However, it did eliminate the hangup when
dealing with syncrepl slaves connected via GSSAPI.
As a wild guess, I think I may be seeing this problem since I may be using a
different version sasl (2.1.21) than others. It could be that an older sasl
may not have been checking ticket age.