Re: issue with bad data ? In MMR setup
Daniel Jung wrote:
hi folks,
Ran into the following on the slaves while replicating:
mdb_id2entry_put: mdb_put failed: MDB_PAGE_FULL: Internal error - page has
no more space(-30786)
null_callback : error code 0x50
syncrepl_entry: rid=407 be_modify failed (80)
This should never happen. Unfortunately some earlier LMDB releases had bugs
related to delete operations that might trigger this.
The issue I posted previously happened while replicating an entry that is
identical to the one in this new problem.
I don't think it is a coincidence that sync replication failed while modifying
a specific DN.
This issue was visible on only some of the slaves, not all of them.
Any idea how I could go about troubleshooting this? I made manual
changes to this specific DN and replication works without issue.
On Sat, Jun 14, 2014 at 3:56 AM, Daniel Jung <mimianddaniel@gmail.com> wrote:
Hi,
The LDAP daemon was being restarted every few minutes. All the consumers
were out of sync and had to be re-synced. The specific master in question
in the MMR setup was restored from another master and the issue went away.
Running 2.4.37 on CentOS 6 with the hdb backend on the masters and lmdb on the
consumers.
Searching through the list shows a lot of hits for "too old". AFAIK NTP keeps
the clocks closely in sync. ServerID "000" no longer exists, as it was
decommissioned last year, hence its contextCSN is really old. Not sure
if that played a role in this havoc or not. Could you tell me what "srs"
and "log" mean in the context below?
"srs csn" is the CSN from a consumer's cookie. "log csn" is a CSN from the
syncprov session log. If serverID 000 has been decommissioned, you probably
should delete its CSN from your contextCSN attribute on both consumer and
provider. Since syncprov always tries to send changes to a consumer based on
the oldest CSN, you're always going to be plowing through a lot of old updates
with this.
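
As a rough illustration of the point above, here is a minimal Python sketch
that splits a CSN into its four "#"-separated fields (timestamp, change count,
serverID, modification number) and picks out the values that belong to a given
serverID. The helper names (parse_csn, csns_for_sid) are hypothetical, and the
sample contextCSN list is an assumption assembled from CSNs that appear in the
log excerpt in this thread, not a dump of the real attribute.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20131226183611.000000Z" (fixed width, sorts as text)
    count: int      # change counter, written as six hex digits
    sid: int        # serverID, written as three hex digits
    mod: int        # modification number, written as six hex digits


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its four '#'-separated fields."""
    timestamp, count, sid, mod = text.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def csns_for_sid(values, sid):
    """Return the CSN strings whose serverID field equals sid."""
    return [v for v in values if parse_csn(v).sid == sid]


# Assumed sample values, taken from the log lines quoted in this thread.
context_csn = [
    "20131226183611.000000Z#000000#000#000000",
    "20140613220035.981531Z#000000#001#000000",
]

# Candidates for removal once serverID 000 is gone for good.
stale = csns_for_sid(context_csn, 0)

# The oldest CSN overall; here it is the one left behind by serverID 000.
oldest = min(context_csn, key=parse_csn)

print("stale:", stale)
print("oldest:", oldest)
```

In this sample the leftover serverID 000 value is also the oldest CSN in the
set, which is the situation described above. Actually removing the value from
contextCSN is a separate step that has to be done on the directory itself.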
Following is what I found in the log, and there were a lot of these which
probably contributed to restart of the daemon:
Jun 14 00:05:21 name of the server slapd[16745]: srs csn
20131226183611.000000Z#000000#000#000000
Jun 14 00:05:21 name of the server slapd[16745]: log csn
20131206192447.000000Z#000000#000#000000
Jun 14 00:05:21 name of the server slapd[16745]: cmp -2, too old
Jun 14 00:05:21 name of the server slapd[16745]: log csn
20131206193513.000000Z#000000#000#000000
Jun 14 00:05:21 name of the server slapd[16745]: cmp -2, too old
</snip>
Jun 14 00:05:59 name of the server slapd[16745]: do_syncrep2: rid=001
(-1) Can't contact LDAP server
</snip>
Jun 14 00:06:15 name of the server slapd[16745]: log csn
20131229125124.532456Z#000000#001#000000
Jun 14 00:06:15 name of the server slapd[16745]: cmp -256, too old
Jun 14 00:06:15 name of the server slapd[16745]: log csn
20131229125143.680121Z#000000#001#000000
Jun 14 00:06:15 name of the server slapd[16745]: cmp -256, too old
Jun 14 00:06:15 name of the server slapd[16745]: log csn 2013122913
</snip>
Jun 14 00:06:59 name of the server slapd[31392]: do_syncrep2: rid=000
LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Jun 14 00:06:59 name of the server slapd[31392]: do_syncrep2: rid=000
cookie=rid=000,sid=002,csn=20140613220035.981531Z#000000#001#000000
Jun 14 00:06:59 name of the server slapd[31392]: do_syncrep2: rid=000
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
</snip>
thank you
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/