[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
(ITS#6639) syncrepl failing using SASL/GSSAPI
Full_Name: Bill MacAllister
Version: 2.4.23-1
OS: Debian Lenny
URL: http://www.stanford.edu/~whm/files/ldap-replication-error.txt
Submission from: (NULL) (99.55.172.208)
I just replaced a 2.3.43 OpenLDAP master with a 2.4.23-1 master. There are 6
slaves all running 2.4.23-1. After reloading the master database, I confirmed
that replication was working my making some changes and then examining
contextCSNs. Sometime later replication stopped on one of the servers. Looking
at the log on the failing slave I saw:
Sep 2 22:12:53 directory3 slapd[7005]: do_syncrep2: rid=000
LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Sep 2 22:12:53 directory3 slapd[7005]: do_syncrep2: rid=000 (-1) Can't contact
LDAP server
Sep 2 22:12:53 directory3 slapd[7005]: do_syncrepl: rid=000 rc -1 retrying
On the master I saw:
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 fd=38 ACCEPT from
IP=171.67.218.156:44519 (IP=0.0.0.0:389)
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=0 BIND dn=""
method=163
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=0 RESULT tag=97
err=14 text=SASL(0): successful result:
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=1 BIND dn=""
method=163
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=1 RESULT tag=97
err=14 text=SASL(0): successful result:
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=2 BIND dn=""
method=163
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=2 BIND
authcid="ldap/directory3.stanford.edu@stanford.edu"
authzid="ldap/directory3.stanford.edu@stanford.edu"
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=2 BIND
dn="cn=directory3,cn=ldap,cn=operational,dc=stanford,dc=edu" mech=GSSAPI
sasl_ssf=56 ssf=56
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=2 RESULT tag=97
err=0 text=
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=3 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 op=3 SRCH attr=reqDN
reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Sep 2 22:12:53 directory-master0 slapd[15145]: conn=3764 fd=38 closed
(connection lost on write)
Running slapd on the failing slave with -d -1 I saw:
=>do_syncrep2 rid=000
ldap_result ld 0x1c07000 msgid 4
wait4msg ld 0x1c07000 msgid 4 (timeout 0 usec)
wait4msg continue ld 0x1c07000 msgid 4 all 0
** ld 0x1c07000 Connections:
* host: directory-master.stanford.edu port: 389 (default)
refcnt: 2 status: Connected
last used: Thu Sep 2 22:12:53 2010
** ld 0x1c07000 Outstanding Requests:
* msgid 4, origid 4, status InProgress
outstanding referrals 0, parent count 0
ld 0x1c07000 request count 1 (abandoned 0)
** ld 0x1c07000 Response Queue:
Empty
ld 0x1c07000 response count 0
ldap_chkResponseList ld 0x1c07000 msgid 4 all 0
ldap_chkResponseList returns ld 0x1c07000 NULL
ldap_int_select
read1msg: ld 0x1c07000 msgid 4 all 0
ber_get_next
sasl_generic_read: want=4, got=4
0000: 00 00 00 78 ...x
sasl_generic_read: want=120, got=120
... lots of output ...
sasl_generic_read: want=45833, got=0
ldap_read: want=8, got=0
ber_get_next failed.
ldap_err2string
do_syncrep2: rid=000 (-1) Can't contact LDAP server
I am including a pointer to the complete debug log.
Looks like something in the sasl support is unhappy. We are using
libsasl2-modules-gssapi-heimdal 2.1.23.dfsg1-5~sbp50+2.
Strange to me that this one server is having a problem. There are actually 5
slaves that are working just fine. Here are current contextCSNs for the
systems.
directory-master contextCSN: 20100903060040.713677Z#000000#000#000000
directory1 contextCSN: 20100903060040.713677Z#000000#000#000000
directory2 contextCSN: 20100903060040.713677Z#000000#000#000000
directory3 contextCSN: 20100903015531.064407Z#000000#000#000000
directory4 contextCSN: 20100903060040.713677Z#000000#000#000000
directory11 contextCSN: 20100903060040.713677Z#000000#000#000000
directory12 contextCSN: 20100903060040.713677Z#000000#000#000000
What else should I look at?
Bill