syncrepl broke, connection loss
Hi,
I've loaded my mirror mode setup with data and let it run for a few days.
Both cn=config and the application database are mirrored.
Only server1 is receiving writes from the application.
OpenLDAP 2.4.20, BDB 4.8
After about 6 hours the mirror partly broke, and I now see three symptoms:
1)
The syncrepl connection that server1 opens to server2 for the application
database is missing, so data only flows from server1 to server2, not
the other way. The cn=config connections exist.
$ netstat -tna # shows
tcp 0 0 192.168.0.102:636 0.0.0.0:* LISTEN
tcp 8125 0 192.168.0.102:45535 192.168.0.101:636 ESTABLISHED
tcp 0 0 192.168.0.102:636 192.168.0.101:34954 ESTABLISHED
tcp 0 0 192.168.0.102:45537 192.168.0.101:636 ESTABLISHED
Whereas it should show something like:
tcp 0 0 192.168.0.101:636 0.0.0.0:* LISTEN
tcp 0 0 192.168.0.101:34954 192.168.0.102:636 ESTABLISHED
tcp 261 0 192.168.0.101:33409 192.168.0.102:636 ESTABLISHED
tcp 0 0 192.168.0.101:636 192.168.0.102:45537 ESTABLISHED
tcp 0 0 192.168.0.101:636 192.168.0.102:33226 ESTABLISHED
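Counting the established port-636 connections in each listing makes the difference explicit. The sketch below is a hypothetical helper (`count_ldaps_links` is an illustrative name), assuming the six-column `netstat -tna` layout shown above and assuming that every established socket to or from port 636 is a slapd replication session; a concurrent ldapsearch over ldaps would also be counted.

```python
def count_ldaps_links(netstat_text, port=636):
    """Count ESTABLISHED connections on `port` in `netstat -tna` output.

    Assumes the layout: proto recv-q send-q local-addr foreign-addr state.
    Returns (outbound, inbound):
      outbound: remote end is on `port` (this host connected to a peer's slapd)
      inbound:  local end is on `port` (a peer connected to this host's slapd)
    """
    outbound = inbound = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        # Keep only established TCP sockets; LISTEN lines carry "0.0.0.0:*".
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        local_port = int(fields[3].rsplit(":", 1)[1])
        remote_port = int(fields[4].rsplit(":", 1)[1])
        if remote_port == port:
            outbound += 1
        elif local_port == port:
            inbound += 1
    return outbound, inbound


# The listing actually observed, copied from above.
OBSERVED = """\
tcp 0 0 192.168.0.102:636 0.0.0.0:* LISTEN
tcp 8125 0 192.168.0.102:45535 192.168.0.101:636 ESTABLISHED
tcp 0 0 192.168.0.102:636 192.168.0.101:34954 ESTABLISHED
tcp 0 0 192.168.0.102:45537 192.168.0.101:636 ESTABLISHED
"""

# The "should show" listing, copied from above.
EXPECTED = """\
tcp 0 0 192.168.0.101:636 0.0.0.0:* LISTEN
tcp 0 0 192.168.0.101:34954 192.168.0.102:636 ESTABLISHED
tcp 261 0 192.168.0.101:33409 192.168.0.102:636 ESTABLISHED
tcp 0 0 192.168.0.101:636 192.168.0.102:45537 ESTABLISHED
tcp 0 0 192.168.0.101:636 192.168.0.102:33226 ESTABLISHED
"""

print("observed (out, in):", count_ldaps_links(OBSERVED))
print("expected (out, in):", count_ldaps_links(EXPECTED))
```

The observed listing yields two outbound but only one inbound connection, while the reference listing has two in each direction, which matches one replication session (the application database) missing.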
2)
Meanwhile the log on server1 says:
Dec 8 02:04:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -1 retrying
Dec 8 02:05:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying
Dec 8 02:06:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying
etc...
The first such entry appeared about 6 hours after the mirror was started.
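To pin down when the retries started for each replica ID and which result codes occurred, the syslog can be scanned for the `do_syncrepl: rid=NNN rc N retrying` lines quoted above. This is a minimal sketch, assuming the classic syslog timestamp (month, day, time, no year) and the message wording exactly as shown; `syncrepl_retries` is a hypothetical helper name. For reference, OpenLDAP's `ldap.h` defines -1 as `LDAP_SERVER_DOWN` and -2 as `LDAP_LOCAL_ERROR`.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Dec 8 02:04:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -1 retrying
RETRY_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+slapd\[\d+\]:\s+"
    r"do_syncrepl: rid=(?P<rid>\d+) rc (?P<rc>-?\d+) retrying"
)


def syncrepl_retries(log_lines):
    """Map each rid to (timestamp of its first retry, Counter of rc values)."""
    result = {}
    for line in log_lines:
        m = RETRY_RE.match(line)
        if not m:
            continue
        # setdefault keeps the timestamp of the first retry seen for this rid.
        _, codes = result.setdefault(m.group("rid"), (m.group("ts"), Counter()))
        codes[int(m.group("rc"))] += 1
    return result


# Excerpt from server1's log, copied from above.
LOG = [
    "Dec 8 02:04:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -1 retrying",
    "Dec 8 02:05:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying",
    "Dec 8 02:06:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying",
]

for rid, (first, codes) in syncrepl_retries(LOG).items():
    print(f"rid={rid} first retry at {first}, rc counts: {dict(codes)}")
```

Run over the full log, the first timestamp for rid=004 gives the onset of the failure without scrolling through hours of entries.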
3)
If I try to change cn=config with ldapmodify on either server, server1
hangs and stops answering queries until I restart it.
For instance, if I do:
----------
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: None sync
-----------
... it'll hang.
I was able to connect to both servers and search the database from either
one using client certificates, e.g. (on server1):
ldapsearch -H ldaps://server2/ -YEXTERNAL -b cn=data,dc=example,dc=com \
    -s sub -D cn=config '(cn=*)' + \*
So it's not that the TCP connection can't be established.
This makes me suspect that the problem is related to this thread:
http://www.mail-archive.com/openldap-software@openldap.org/msg16028.html
After 27 hours the connection finally came back by itself, and
replication now works both ways.
The "rc -2 retrying" messages in the log on server1 stopped and were replaced by:
Dec 8 15:39:34 server1 slapd[11177]: do_syncrepl: rid=004 rc -2 retrying
Dec 8 15:40:34 server1 slapd[11177]: do_syncrepl: rid=004 rc -2 retrying
Dec 8 15:42:15 server1 slapd[11177]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec 8 15:47:05 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec 8 15:47:05 server1 slapd[11177]: conn=15694 op=16: attribute "entryCSN" index delete failure
Dec 8 15:47:06 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec 8 15:47:06 server1 slapd[11177]: conn=15569 op=36: attribute "entryCSN" index delete failure
... and a bit more of the same.
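To see how these deadlocks are distributed, the BDB messages can be tallied by the failing function and by the index attribute and operation. A hypothetical sketch (`tally_bdb_deadlocks` is an illustrative name), assuming each syslog message sits on a single line and uses the wording quoted above.

```python
import re
from collections import Counter

# e.g. "=> bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: ..."
DEADLOCK_RE = re.compile(r"=> (?P<func>bdb_\w+): .*DB_LOCK_DEADLOCK")
# e.g. 'conn=15694 op=16: attribute "entryCSN" index delete failure'
INDEX_FAIL_RE = re.compile(r'attribute "(?P<attr>[^"]+)" index (?P<op>\w+) failure')


def tally_bdb_deadlocks(log_lines):
    """Count deadlocks per BDB function and index failures per (attribute, op)."""
    by_func, by_attr = Counter(), Counter()
    for line in log_lines:
        m = DEADLOCK_RE.search(line)
        if m:
            by_func[m.group("func")] += 1
            continue
        m = INDEX_FAIL_RE.search(line)
        if m:
            by_attr[(m.group("attr"), m.group("op"))] += 1
    return by_func, by_attr


# Excerpt from server1's log, copied from above (one message per line).
LINES = [
    "Dec 8 15:40:34 server1 slapd[11177]: do_syncrepl: rid=004 rc -2 retrying",
    "Dec 8 15:42:15 server1 slapd[11177]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)",
    "Dec 8 15:47:05 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)",
    'Dec 8 15:47:05 server1 slapd[11177]: conn=15694 op=16: attribute "entryCSN" index delete failure',
    "Dec 8 15:47:06 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)",
    'Dec 8 15:47:06 server1 slapd[11177]: conn=15569 op=36: attribute "entryCSN" index delete failure',
]

funcs, attrs = tally_bdb_deadlocks(LINES)
print("deadlocks by function:", dict(funcs))
print("index failures:", dict(attrs))
```

In this excerpt every index failure involves entryCSN, the attribute syncrepl relies on to track changes.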
Trying to modify cn=config with ldapmodify still makes server1 (and
ldapmodify) hang though.
/Peter