Problems with 4-way multi-master
- To: openldap-technical@openldap.org
- Subject: Problems with 4-way multi-master
- From: Mark <mah042@gmail.com>
- Date: Thu, 31 Mar 2011 21:06:13 -0500
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm seeing sporadic problems that I'm having difficulty diagnosing.
I have four identical RHEL 4.9 machines on the same switch (NTP synchronized to the same stratum 2 servers):
 dual-core Xeon 5110 1.60GHz
 8GB RAM
 100Mb full-duplex NIC
 OpenLDAP 2.4.25, BDB 4.8.30, OpenSSL 1.0.0d, Cyrus SASL 2.1.23 (using no tls/ssl at this time)
I start the slapds with '-d conns,sync' and then begin testing. I ldapadd 1000 DNs to one of the servers. After all the syncing has stopped, I compare the contents of the slapds against each other, looking for differences. Occasionally as many as a couple hundred DNs are missing from one or more of the instances.
When that happens, I've noticed that the mmaster with fewer DNs has lost its consumer connection to a mmaster provider (confirmed using lsof and netstat) and never attempts to reconnect, while the provider still shows the connection as open (again per lsof and netstat). When the consumer is in this state I can connect to its cn=config and cn=monitor backends (and browse them), but when I try to connect to its multi-master'd backend the connection attempt just hangs. It's almost as if the connect succeeds but the client is waiting for a response from the server that never arrives.
Also, the consumer slapd will not respond to a 'kill -TERM' at this time and must be 'kill -KILL'd. The same thing sometimes occurs when I delete the entire tree.
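For reference, the comparison step can be sketched roughly as below. This is only an illustration, not the exact procedure used in the test: it assumes each server's contents have already been dumped to LDIF text (for example with something like `ldapsearch -x -LLL -H ldap://host1:1389 -D cn=manager,dc=example,dc=com -W -b dc=example,dc=com 1.1`), and the function names `parse_dns` and `missing_per_host` are made up for this sketch. It ignores LDIF line folding and base64-encoded DNs.

```python
def parse_dns(ldif_text):
    """Collect the DNs found in LDIF text, lower-cased for comparison.

    Assumption: every DN fits on one unfolded "dn: ..." line.
    """
    dns = set()
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn: "):
            dns.add(line[4:].strip().lower())
    return dns


def missing_per_host(dns_by_host):
    """For each host, list the DNs that at least one other host has but it lacks."""
    all_dns = set().union(*dns_by_host.values())
    return {host: sorted(all_dns - dns) for host, dns in dns_by_host.items()}


if __name__ == "__main__":
    # Tiny made-up dumps standing in for the real ldapsearch output.
    dumps = {
        "host1": "dn: uid=a,dc=example,dc=com\n\ndn: uid=b,dc=example,dc=com\n",
        "host2": "dn: uid=a,dc=example,dc=com\n",
    }
    by_host = {host: parse_dns(text) for host, text in dumps.items()}
    for host, missing in sorted(missing_per_host(by_host).items()):
        print(host, "is missing", len(missing), "DN(s):", missing)
```

A host whose list is non-empty is one that fell behind; in the failure described above that would be the consumer whose connection to a provider was lost.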
I've been trying to catch logging information that might help, but so far nothing's jumping out at me. While I continue to try to reproduce the problem and parse through the logfiles, maybe someone can look at my slapd.confs below and see whether I have configured something wrong (I'm listing the original slapd.conf files below, but I've used slaptest to convert them to slapd.d/cn=config.ldif format):
HOST1 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 001
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la
database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001
 provider=ldap://host2:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=002
 provider=ldap://host3:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=003
 provider=ldap://host4:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
HOST2 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 002
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la
database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001
 provider=ldap://host1:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=002
 provider=ldap://host3:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=003
 provider=ldap://host4:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
HOST3 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 003
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la
database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001
 provider=ldap://host1:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=002
 provider=ldap://host2:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=003
 provider=ldap://host4:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
HOST4 slapd.conf:
include /tmp/openldap/multi-master/etc/schema/core.schema
include /tmp/openldap/multi-master/etc/schema/cosine.schema
include /tmp/openldap/multi-master/etc/schema/nis.schema
argsfile /tmp/openldap/multi-master/var/run/slapd.args
pidfile /tmp/openldap/multi-master/var/run/slapd.pid
threads 16
idletimeout 0
writetimeout 5
reverse-lookup off
timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000
password-hash {SSHA}
loglevel stats sync
serverid 004
modulepath /tmp/openldap/multi-master/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la
database config
rootdn cn=manager,cn=config
rootpw {SSHA}yMFj3Y7KPh223NkkKLQsFeLUVm08Ckpm
database monitor
rootdn cn=manager,cn=monitor
rootpw {SSHA}vPVSN8o8eRnLdC/bGS7yDwQGeH4BHc0R
database hdb
suffix dc=example,dc=com
rootdn cn=manager,dc=example,dc=com
rootpw {SSHA}0obbsJw5Yq2XAkdd/kS7vokaB9rrSOtI
directory /tmp/openldap/multi-master/var/data/dc=example,dc=com
cachesize 30000
cachefree 5
checkpoint 128 15
dncachesize 25000
idlcachesize 100000
index objectClass eq
index entryCSN eq
index entryUUID eq
syncrepl rid=001
 provider=ldap://host1:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=002
 provider=ldap://host2:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
syncrepl rid=003
 provider=ldap://host3:1389
 type=refreshAndPersist
 interval=00:00:05:00
 retry="15 +"
 searchbase="dc=example,dc=com"
 binddn="cn=manager,dc=example,dc=com"
 credentials="example_pass"
 starttls=no
 schemachecking=off
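For anyone comparing the four files: they differ only in the serverid value and in which three hosts are listed as providers. A rough sketch of that pattern in Python follows; the names `HOSTS`, `serverid` and `syncrepl_stanzas` are made up for illustration and are not part of OpenLDAP.

```python
# Host names as used in the listings above.
HOSTS = ["host1", "host2", "host3", "host4"]


def serverid(host):
    """Return the zero-padded serverid used for a host (host1 -> 001)."""
    return "%03d" % (HOSTS.index(host) + 1)


def syncrepl_stanzas(host):
    """Build the three syncrepl stanzas for one host.

    Each host consumes from every other host; rids are numbered
    001..003 in host order, as in the listings above.
    """
    providers = [h for h in HOSTS if h != host]
    stanzas = []
    for rid, provider in enumerate(providers, start=1):
        stanzas.append("\n".join([
            "syncrepl rid=%03d" % rid,
            " provider=ldap://%s:1389" % provider,
            " type=refreshAndPersist",
            " interval=00:00:05:00",
            ' retry="15 +"',
            ' searchbase="dc=example,dc=com"',
            ' binddn="cn=manager,dc=example,dc=com"',
            ' credentials="example_pass"',
            " starttls=no",
            " schemachecking=off",
        ]))
    return "\n".join(stanzas)


if __name__ == "__main__":
    for host in HOSTS:
        print("# %s: serverid %s" % (host, serverid(host)))
        print(syncrepl_stanzas(host))
```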
Thank you.