Re: slurpd lockups, serialization of updates (ITS#3123)
--On Thursday, April 29, 2004 7:22 PM +0000 openldap-its@OpenLDAP.org wrote:
I've done more tests on this problem.
1) Our 2.2.6 production servers use heimdal-0.6 and cyrus-sasl 2.1.17. Our
test servers were using heimdal-0.6.1 and cyrus-sasl 2.1.18. So I rebuilt
2.2.11 against the older versions of Heimdal and Cyrus SASL, and redeployed
it onto the test servers. I got the same results: about 50% of the time
that I start slurpd, one replica is not updated when changes come in.
2) I rebuilt 2.2.11 slapd/slurpd linked against the 2.2.6 libraries:
ldd slurpd
libldap_r.so.199 => /usr/local/lib/libldap_r.so.199
libsasl2.so.2 => /usr/local/lib/libsasl2.so.2
libssl.so.0.9.7 => /usr/local/lib/libssl.so.0.9.7
libcrypto.so.0.9.7 => /usr/local/lib/libcrypto.so.0.9.7
libresolv.so.2 => /usr/lib/libresolv.so.2
libgen.so.1 => /usr/lib/libgen.so.1
libnsl.so.1 => /usr/lib/libnsl.so.1
libsocket.so.1 => /usr/lib/libsocket.so.1
libpthread.so.1 => /usr/lib/libpthread.so.1
libc.so.1 => /usr/lib/libc.so.1
liblber.so.199 => /usr/local/lib/liblber.so.199
libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libmp.so.2 => /usr/lib/libmp.so.2
libthread.so.1 => /usr/lib/libthread.so.1
/usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1
I still saw the same behavior.
One other new behavior I noticed tonight is that when all the replicas
are updated, slurpd will exit. Sometimes it does this cleanly (no
slurpd.pid file left behind), sometimes uncleanly. But no stop signal was
ever issued to slurpd.
On the replica that is not replicated to, I see:
May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 848112 local4.debug] conn=1159 fd=10 ACCEPT from IP=171.67.16.99:36308 (IP=0.0.0.0:389)
May 15 23:03:32 ldap-test2.Stanford.EDU slapd[13846]: [ID 952275 local4.debug] conn=1159 fd=10 closed
Now, if the problem were a timeout issue (i.e., the available mechanisms
were not sent back fast enough), I would not expect a timeout to be hit in
under 1 second. Instead, it looks like the master is closing the
connection almost immediately.
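
For anyone who wants to check a larger log for the same pattern, here is a
minimal Python sketch (not part of my test setup) that pairs the ACCEPT and
closed lines by conn number and reports how long each connection stayed
open. It assumes each syslog record is on a single line in the Solaris
format shown above; the function name and the year argument are made up for
illustration, since syslog does not record the year.

```python
import re
from datetime import datetime

# Matches slapd connection records such as the two quoted above,
# assuming each record occupies exactly one line.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+slapd\[\d+\]:.*?"
    r"conn=(?P<conn>\d+)\s+fd=\d+\s+(?P<event>ACCEPT|closed)"
)

def connection_lifetimes(lines, year=2004):
    """Return {conn_id: seconds between the ACCEPT and closed records}."""
    accepted = {}
    lifetimes = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a connection open/close record
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        conn = int(m["conn"])
        if m["event"] == "ACCEPT":
            accepted[conn] = ts
        elif conn in accepted:
            lifetimes[conn] = (ts - accepted.pop(conn)).total_seconds()
    return lifetimes
```

Syslog timestamps only have one-second resolution, so a lifetime of 0.0
just means the connection was opened and closed within the same second.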
Note that this connection was made 3 seconds after slurpd started:
root 24835 1 0 23:03:29 ? 0:00 /usr/local/lib/slurpd -t /var/tmp
Here is how the replicas are defined in slapd.conf:
replica host=ldap-test3.stanford.edu:389
        tls=yes bindmethod=sasl
        binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
        saslmech=gssapi
replica host=ldap-test2.stanford.edu:389
        tls=yes bindmethod=sasl
        binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
        saslmech=gssapi
replica host=ldap-test1.stanford.edu:389
        tls=yes bindmethod=sasl
        binddn=cn=replicator,cn=service,cn=applications,dc=stanford,dc=edu
        saslmech=gssapi
replogfile /var/log/replog
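
Since only one replica fails at a time, it is worth confirming that the
three definitions really differ only by host. The following is a rough
Python sketch of that check, not a real slapd.conf parser: it simply splits
the text on whitespace and treats every key=value token after a "replica"
keyword as belonging to that replica, which is enough for the fragment
above but ignores quoting and comments. Both function names are
hypothetical.

```python
def parse_replicas(conf_text):
    """Collect the key=value settings of each 'replica' directive."""
    replicas = []
    current = None
    for token in conf_text.split():
        if token == "replica":
            current = {}
            replicas.append(current)
        elif current is not None and "=" in token:
            # Split on the first '=' only, so a DN value keeps its own '='.
            key, value = token.split("=", 1)
            current[key] = value
        else:
            current = None  # some other directive, e.g. replogfile
    return replicas

def settings_that_differ(replicas):
    """Return the keys whose values are not identical across all replicas."""
    keys = set().union(*replicas) if replicas else set()
    return {k for k in keys if len({r.get(k) for r in replicas}) > 1}
```

Run against the definitions above, the only key this should report as
differing is host.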
--Quanah
--
Quanah Gibson-Mount
Principal Software Developer
ITSS/TSS/Computing Systems
ITSS/TSS/Infrastructure Operations
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html