RE: syncrepl: consumer state is newer than provider
Hi Howard,
I have tried the slapd -c option with a rid value, and it
also tries to resync the entire directory when doing that
while comparing CSNs. There is also a cid value which can
be passed to the -c option, but I was unable to find an
example of what to pass in there. Is it just a contextCSN value?
Thanks.
cheers,
Ven
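
On the format of the -c argument: as far as the slapd(8) man page describes it, the cookie is a comma-separated list of name=value pairs, and the documented fields are rid, sid, and csn. Passing only rid forces a full reload, while csn carries the state the consumer claims to already have, with multiple CSNs joined by semicolons in a mirror-mode setup. The sketch below builds such a string. The helper name build_sync_cookie is made up for illustration, and the zero-padded rid and hexadecimal sid formatting are assumptions about what slapd expects, so check the result against your version's man page before using it.

```python
import shlex
from typing import Optional, Sequence


def build_sync_cookie(rid: int,
                      csns: Sequence[str] = (),
                      sid: Optional[int] = None) -> str:
    """Build a cookie string for `slapd -c` (hypothetical helper).

    Assumed layout: rid=NNN[,sid=XXX][,csn=CSN1;CSN2;...]
    """
    parts = [f"rid={rid:03d}"]           # rid as three decimal digits
    if sid is not None:
        parts.append(f"sid={sid:03x}")   # sid as three hex digits (assumption)
    if csns:
        parts.append("csn=" + ";".join(csns))
    return ",".join(parts)


if __name__ == "__main__":
    # CSN values taken from the server2 contextCSN quoted later in the thread.
    cookie = build_sync_cookie(
        2,
        ["20110728220449.050499Z#000000#001#000000",
         "20110728223211.933995Z#000000#002#000000"],
    )
    # The cookie contains ';' and '#', so it must be quoted in a shell.
    print("slapd -c", shlex.quote(cookie))
```

With only a rid, the helper returns just "rid=002", which is the form that triggers the full resync described above.
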
-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: August-02-11 2:35 PM
To: Mahadevan, Venkatasubramanian
Cc: Chris Jacobs; 'openldap-technical@openldap.org'
Subject: Re: syncrepl: consumer state is newer than provider
Mahadevan, Venkatasubramanian wrote:
> Hi David,
>
> Thanks much for your response.
> That's what I did, but when I do that it seems to take forever to
> recover using syncrepl, as it goes through all the entries in the
> database comparing CSNs. So instead I stopped slapd and rebuilt the
> database using slapadd with the -w option to preserve the syncrepl
> information. After that, replication started working again, but it's a
> less than ideal way to recover from a replication failure. Perhaps the
> inherent nature of two master servers being updated leads to replication
> conflicts, whereby the two servers get stuck in an infinite loop because
> their contextCSN values are out of sync?
Next time try the slapd -c option.
> cheers,
>
> Ven
>
> ________________________________________
> From: Chris Jacobs [Chris.Jacobs@apollogrp.edu]
> Sent: Monday, August 01, 2011 8:33 AM
> To: Mahadevan, Venkatasubramanian; 'openldap-technical@openldap.org'
> Subject: Re: syncrepl: consumer state is newer than provider
>
> Apologies for top posting - blackberry.
>
> Short term fix:
> Pick a server, take it offline (stop slapd).
> Clear its database - be careful not to delete any db config files.
> Start it back up.
>
> If this happens again, then you'll want to increase the log level, etc. There's plenty of info on how to troubleshoot OpenLDAP.
>
> Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason this broke is clear in your current logs, but not to me.
>
> - chris
>
> Chris Jacobs, Systems Administrator, Technology Services Group
> Apollo Group | Apollo Marketing and Product Development | Aptimus, Inc.
> 2001 6th Ave | Suite 3200 | Seattle, WA 98121
> direct 206.839.8245 | cell 206.601.3256 | fax 206.839.8106
> email chris.jacobs@apollogrp.edu
>
> ________________________________
> From: openldap-technical-bounces@OpenLDAP.org <openldap-technical-bounces@OpenLDAP.org>
> To: openldap-technical@openldap.org <openldap-technical@openldap.org>
> Sent: Fri Jul 29 14:03:06 2011
> Subject: syncrepl: consumer state is newer than provider
>
> Hello,
>
> I have 2 OpenLDAP servers with the following configuration:
>
> -- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5
> -- The two servers are set up in a mirrored multi-master configuration.
> Below is the relevant portion of the slapd.conf:
>
>
> server1
> ----------
> syncrepl rid=002
>   provider=ldaps://server2
>   type=refreshAndPersist
>   retry="5 5 300 +"
>   searchbase="o=ourdomain.ca"
>   attrs="*,+"
>   bindmethod=simple
>   binddn="cn=Replication Manager,o=ubc.ca"
>   credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
>
> server2
> ----------
> syncrepl rid=001
>   provider=ldaps://server1
>   type=refreshAndPersist
>   retry="5 5 300 +"
>   searchbase="o=ourdomain.ca"
>   attrs="*,+"
>   bindmethod=simple
>   binddn="cn=Replication Manager,o=ubc.ca"
>   credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
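
A side note on retry="5 5 300 +" in the syncrepl stanzas above: the value is a list of interval/count pairs, so this setting retries every 5 seconds for the first 5 attempts and then every 300 seconds indefinitely ("+" meaning no limit). That matches the few-second spacing of the log entries quoted further down. A minimal sketch of that schedule follows; the function names parse_retry and retry_delays are illustrative only and are not part of OpenLDAP.

```python
from itertools import islice
from typing import Iterator, List, Optional, Tuple


def parse_retry(spec: str) -> List[Tuple[int, Optional[int]]]:
    """Parse a syncrepl retry value into (interval_seconds, count) pairs.

    A count of None stands for '+', i.e. retry indefinitely.
    """
    tokens = spec.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError("retry expects <interval> <count> pairs")
    pairs = []
    for interval, count in zip(tokens[0::2], tokens[1::2]):
        pairs.append((int(interval), None if count == "+" else int(count)))
    return pairs


def retry_delays(spec: str) -> Iterator[int]:
    """Yield the wait before each successive retry, in seconds."""
    for interval, count in parse_retry(spec):
        if count is None:
            while True:          # '+' never gives up
                yield interval
        else:
            for _ in range(count):
                yield interval


if __name__ == "__main__":
    # First seven waits for the value used in this thread.
    print(list(islice(retry_delays("5 5 300 +"), 7)))
```
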
>
> The servers have their clocks synchronized using ntp. Below is the output of ntpq:
>
> server1
> ----------
> ntpq> peer
>      remote           refid       st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250      3 u  594 1024  377    1.252    1.110   1.520
> *dns3.ubc.ca     192.53.103.108    2 u   92 1024  377    1.648    2.670   0.157
>
> server2
> ----------
> ntpq> peer
>      remote           refid       st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250      3 u  332 1024  377    0.706    3.487   0.900
> *dns3.ubc.ca     192.53.103.108    2 u  325 1024  377    1.631    3.668   0.022
>
>
> As far as I can tell the clocks appear to be in sync with each other,
> so hopefully this is not a cause of the replication issues I am having.
>
> The problem is that the servers now refuse to synchronize with each
> other (replication was working before, but now it is not). The log
> files on the servers are filled with entries like:
>
> server1
> ----------
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
> server2
> ----------
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
>
> So it looks like the contextCSN cookies on the two servers are out of sync. Digging further into this, I searched for the contextCSN values on both servers and got the following:
>
> server1
> ----------
> 20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000
>
> server2
> ----------
> 20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
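
Reading the two contextCSN values side by side may help explain why both servers log the same error. Each CSN has the form timestamp#count#sid#mod, and, as far as I understand the provider-side check, a consumer is rejected when its CSN for some server ID (SID) is ahead of the provider's CSN for that same SID. The sketch below compares the values posted above per SID. The function names are made up for illustration, and the check is a simplified approximation of what syncprov does, not a copy of its logic.

```python
from typing import Dict, Tuple


def parse_context_csn(value: str) -> Dict[str, str]:
    """Map each SID to its CSN, assuming timestamp#count#sid#mod joined by ';'."""
    by_sid = {}
    for csn in value.split(";"):
        csn = csn.strip()
        _timestamp, _count, sid, _mod = csn.split("#")
        by_sid[sid] = csn
    return by_sid


def newer_on_consumer(consumer: str, provider: str) -> Dict[str, Tuple[str, str]]:
    """Return the SIDs whose CSN on the consumer is ahead of the provider's."""
    c = parse_context_csn(consumer)
    p = parse_context_csn(provider)
    # CSN fields are fixed width, so plain string comparison orders them.
    return {sid: (c[sid], p[sid]) for sid in c if sid in p and c[sid] > p[sid]}


# Values copied from the message above (line wraps removed).
SERVER1 = ("20110729165747.697237Z#000000#001#000000;"
           "20110726161604.535176Z#000000#002#000000")
SERVER2 = ("20110728220449.050499Z#000000#001#000000;"
           "20110728223211.933995Z#000000#002#000000")

if __name__ == "__main__":
    print("server1 consuming from server2, ahead on SIDs:",
          sorted(newer_on_consumer(SERVER1, SERVER2)))
    print("server2 consuming from server1, ahead on SIDs:",
          sorted(newer_on_consumer(SERVER2, SERVER1)))
```

With these values, server1 is ahead for SID 001 and server2 is ahead for SID 002, so each server looks "newer than the provider" from the other's point of view, which would be consistent with both directions failing with err=53.
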
>
>
> So my question is: how does one get the server synchronization cookies back into sync and ensure that replication restarts successfully?
> As of now, all I see is the log files filling up with messages as shown above and the sync cookies not being updated. Any help or pointers are appreciated. Thanks!
>
> cheers,
>
> Ven
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/