
Re: syncrepl: consumer state is newer than provider

To: "Mahadevan, Venkatasubramanian" <Venkatasubramanian.Mahadevan@ubc.ca>
Subject: Re: syncrepl: consumer state is newer than provider
From: Howard Chu <hyc@symas.com>
Date: Tue, 02 Aug 2011 14:35:27 -0700
Cc: "'openldap-technical@openldap.org'" <openldap-technical@openldap.org>, Chris Jacobs <Chris.Jacobs@apollogrp.edu>
In-reply-to: <6DBC801F1371BD4B8A75D2004D5A572F07FE1E1589@mbx1.mercury.ad.ubc.ca>
References: <6DBC801F1371BD4B8A75D2004D5A572F07FE1E1473@mbx1.mercury.ad.ubc.ca>, <6C447584419BFE4E83D46E88F81314866FFFC29196@EXCH07-05.apollogrp.edu> <6DBC801F1371BD4B8A75D2004D5A572F07FE1E1589@mbx1.mercury.ad.ubc.ca>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0a1) Gecko/20110612 Firefox/7.0a1 SeaMonkey/2.4a1

Mahadevan, Venkatasubramanian wrote:

Hi David,

Thanks much for your response.
That's what I did, but when I do that it seems to take forever to recover
using syncrepl, as it goes through all the entries in the databases comparing
CSNs. So instead I stopped slapd and rebuilt the database using slapadd
with the -w option to preserve the syncrepl information. After that, replication
started working again, but it's a less-than-ideal way to recover from a replication
failure. Perhaps the inherent nature of two master servers being updated leads to
replication conflicts, whereby the two servers get stuck in an infinite loop because their
contextCSN values are out of sync?
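The offline rebuild described above can be sketched as two commands run while slapd is stopped: export the entries with slapcat, empty the database directory, then reload with slapadd -w, which (per slapadd(8)) updates contextCSN from the greatest CSN in the database once all entries are added. This is a minimal Python sketch; the function name and file paths are hypothetical, and the message does not say whether the LDIF came from the same server or from its peer.

```python
from typing import List, Tuple


def rebuild_commands(conf: str, ldif: str) -> Tuple[List[str], List[str]]:
    """Return (export, import) argument lists for an offline rebuild.

    Both commands are meant to run with slapd stopped. Emptying the
    database directory between the two steps is NOT done here.
    """
    # slapcat dumps every entry, including operational attributes such
    # as entryCSN, so per-entry CSNs survive the round trip.
    export_cmd = ["slapcat", "-f", conf, "-l", ldif]
    # slapadd -w writes the syncrepl context information (contextCSN)
    # after loading, based on the greatest CSN among the loaded entries.
    import_cmd = ["slapadd", "-w", "-f", conf, "-l", ldif]
    return export_cmd, import_cmd
```

Each list could be handed to `subprocess.run(cmd, check=True)`. Keeping the two steps separate, rather than chaining them, leaves room to verify that the directory has really been cleared before the import runs.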


Next time try the slapd -c option.
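The slapd -c option takes a cookie for the syncrepl consumer. As described in slapd(8), the cookie is a comma-separated list of name=value pairs, with rid (selecting the syncrepl stanza with that rid), an optional sid, and csn. Below is a small Python sketch that assembles such a cookie; the function name and the CSN format check are illustrative assumptions, and only a single csn value is handled.

```python
import re
from typing import Optional

# Assumed CSN layout: timestamp with microseconds, then change count,
# server ID (SID) and modification number, '#'-separated hex fields.
CSN_RE = re.compile(r"^\d{14}\.\d{6}Z#[0-9a-f]{6}#[0-9a-f]{3}#[0-9a-f]{6}$")


def build_sync_cookie(rid: int, csn: str, sid: Optional[int] = None) -> str:
    """Build a comma-separated name=value cookie for `slapd -c`."""
    # rid is limited to three decimal digits (0..999).
    if not 0 <= rid <= 999:
        raise ValueError("rid must be between 0 and 999")
    if not CSN_RE.match(csn):
        raise ValueError("unexpected CSN format: %r" % csn)
    fields = ["rid=%03d" % rid]  # which syncrepl stanza the cookie is for
    if sid is not None:
        fields.append("sid=%03x" % sid)  # optional server ID, hex like in CSNs
    fields.append("csn=" + csn)  # state the consumer should claim to have
    return ",".join(fields)
```

The result would be passed, quoted for the shell, as the argument to -c when restarting the consumer. Which CSN to supply is a judgment call the thread does not spell out; given the error text, a value no newer than the provider's contextCSN for that SID seems the natural candidate.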

cheers,

Ven

________________________________________
From: Chris Jacobs [Chris.Jacobs@apollogrp.edu]
Sent: Monday, August 01, 2011 8:33 AM
To: Mahadevan, Venkatasubramanian; 'openldap-technical@openldap.org'
Subject: Re: syncrepl: consumer state is newer than provider

Apologies for top-posting; I'm on a BlackBerry.

Short term fix:
Pick a server, take it offline (stop slapd).
Clear its database, being careful not to delete any DB config files (e.g. DB_CONFIG).
Start it back up.
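The "clear its database" step can be sketched in Python as below, assuming a back-bdb/back-hdb style directory where DB_CONFIG is the only file to preserve (the backend is not stated in the thread, and the function names are hypothetical). It defaults to a dry run so the list of files can be checked before anything is deleted.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Files treated as configuration and preserved. DB_CONFIG is the Berkeley DB
# settings file used by back-bdb/back-hdb; add any other local files to keep.
KEEP = frozenset({"DB_CONFIG"})


def partition_db_files(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (kept, removed), both sorted."""
    names = list(names)
    kept = sorted(n for n in names if n in KEEP)
    removed = sorted(n for n in names if n not in KEEP)
    return kept, removed


def clear_database(db_dir: str, dry_run: bool = True) -> List[str]:
    """Delete data, log and environment files, keeping KEEP.

    Only call this with slapd stopped. With dry_run=True (the default)
    nothing is deleted; the names that would be removed are returned.
    Subdirectories are left untouched.
    """
    root = Path(db_dir)
    files = [p.name for p in root.iterdir() if p.is_file()]
    _, removed = partition_db_files(files)
    if not dry_run:
        for name in removed:
            (root / name).unlink()
    return removed
```

When slapd starts again with the empty database, syncrepl performs a full refresh from the other server.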

If this happens again, you'll want to increase the log level, etc. There's plenty of information on how to troubleshoot OpenLDAP.

Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason this broke is clear in your current logs, but not to me.

- chris

Chris Jacobs, Systems Administrator, Technology Services Group
Apollo Group | Apollo Marketing and Product Development | Aptimus, Inc.
2001 6th Ave | Suite 3200 | Seattle, WA 98121
direct 206.839.8245 | cell 206.601.3256 | fax 206.839.8106
email chris.jacobs@apollogrp.edu

________________________________
From: openldap-technical-bounces@OpenLDAP.org
To: openldap-technical@openldap.org
Sent: Fri Jul 29 14:03:06 2011
Subject: syncrepl: consumer state is newer than provider

Hello,

I have 2 OpenLDAP servers with the following configuration:

-- OpenLDAP 2.4.26-Release running on Red Hat Enterprise Linux 5.5
-- The two servers are set up in a mirrored multi-master configuration. Below is the
relevant portion of slapd.conf:


server1
----------
syncrepl rid=002
        provider=ldaps://server2
        type=refreshAndPersist
        retry="5 5 300 +"
        searchbase="o=ourdomain.ca"
        attrs="*,+"
        bindmethod=simple
        binddn="cn=Replication Manager,o=ubc.ca"
        credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

server2
----------
syncrepl rid=001
        provider=ldaps://server1
        type=refreshAndPersist
        retry="5 5 300 +"
        searchbase="o=ourdomain.ca"
        attrs="*,+"
        bindmethod=simple
        binddn="cn=Replication Manager,o=ubc.ca"
        credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

The servers have their clocks synchronized using ntp. Below is the output of ntpq:

server1
----------
ntpq>  peer
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
*dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157

server2
----------
ntpq>  peer
      remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
*dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022


As far as I can tell, the clocks are in sync with each other, so hopefully clock skew is not the cause of
the replication issues I am having.

The problem is that the servers now refuse to synchronize with each other. Replication was working
before, but now it does not. The log files on both servers are filled with entries like:

server1
----------
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

server2
----------
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!


So it looks like the contextCSN values on the two servers are out of sync. Digging further, I searched for the contextCSN values on both servers and got the following:

server1
----------
20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000

server2
----------
20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
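These values can be compared per server ID (the third '#'-separated field of each CSN). The Python sketch below is a simplified model of the check implied by the "consumer state is newer than provider" message, not syncprov's actual code: it assumes the provider compares the consumer's CSN for each SID against its own contextCSN for that SID, and that the ';'-joined lines above are a multi-valued contextCSN attribute. Function names are hypothetical.

```python
from typing import Dict, List


def parse_context_csns(value: str) -> Dict[str, str]:
    """Turn a ';'-joined list of contextCSN values into {sid: newest csn}."""
    by_sid: Dict[str, str] = {}
    for csn in (v.strip() for v in value.split(";")):
        if not csn:
            continue
        sid = csn.split("#")[2]  # layout: timestamp#count#sid#mod
        # CSNs share a fixed-width layout, so string order is time order.
        if sid not in by_sid or csn > by_sid[sid]:
            by_sid[sid] = csn
    return by_sid


def sids_where_consumer_is_newer(consumer: Dict[str, str],
                                 provider: Dict[str, str]) -> List[str]:
    """SIDs for which the consumer claims a later CSN than the provider holds."""
    return sorted(sid for sid, csn in consumer.items()
                  if sid in provider and csn > provider[sid])
```

Applied to the values above, server1 acting as consumer is ahead for SID 001, and server2 acting as consumer is ahead for SID 002. In other words, each server holds its own most recent changes that never reached the other, which is consistent with both servers logging err=53 in both directions.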


So my question is: how does one get the servers' synchronization cookies back in sync and ensure that replication restarts successfully?
As of now, all I see is the log files filling up with messages like those shown above, and the sync cookies are not being updated. Any help or pointers are appreciated. Thanks!

cheers,

Ven




--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Follow-Ups:
- RE: syncrepl: consumer state is newer than provider
  - From: "Mahadevan, Venkatasubramanian" <Venkatasubramanian.Mahadevan@ubc.ca>

References:
- syncrepl: consumer state is newer than provider
  - From: "Mahadevan, Venkatasubramanian" <Venkatasubramanian.Mahadevan@ubc.ca>
- Re: syncrepl: consumer state is newer than provider
  - From: Chris Jacobs <Chris.Jacobs@apollogrp.edu>
- RE: syncrepl: consumer state is newer than provider
  - From: "Mahadevan, Venkatasubramanian" <Venkatasubramanian.Mahadevan@ubc.ca>
