Syncrepl refreshOnly replication failures
Openldap,
We recently upgraded our OpenLDAP master from 2.2.30 to 2.3.30. Most
of our replicas (we have about 5 production and 3 or 4 test replicas)
are still running 2.2.x versions of OpenLDAP. We tested syncrepl
refreshOnly replication across these versions before the upgrade, and
replication seemed to work fine.
After upgrading our production master we had problems with BDB
lock exhaustion, which I've noticed others have also run into. With
the new master the problem showed up as the master seeming to "loop",
consuming CPU while trying to support replication. However, it was
still able to serve direct reads and writes, unlike the 2.2.x
master, which just hung in that circumstance. I'm not sure such
resiliency in the face of its replication failures was a good thing.
Regardless, we increased the number of available BDB locks, did a BDB
recovery, and the restarted master has been stable since.
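
For reference, BDB lock limits are normally raised in the DB_CONFIG
file in the database directory. The fragment below is only a sketch:
the directive names are standard Berkeley DB ones, but the numbers are
illustrative placeholders, not the values used on this master.

```
# DB_CONFIG (placeholder values; size them from observed lock usage)
set_lk_max_locks   3000
set_lk_max_objects 1500
set_lk_max_lockers 1500
```

New limits generally take effect only once the BDB environment has
been recreated, which is consistent with the recovery step described
above.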
However, after that we noticed that some of the replicas had
parts of their directories that weren't being replicated from the
master. To deal with this problem we reinitialized our
replicas (zeroed out their DBs and reinitialized them from the master).
So far so good since then.
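
One way to check whether a replica still matches the master is to
compare LDIF exports of the two (for example from slapcat) entry by
entry. The sketch below is a rough illustration, not a tested tool. It
assumes both exports include the entryCSN operational attribute, that
DNs are spelled the same way on both sides apart from case, and that
DNs are not base64-encoded. The function names parse_ldif and compare
are hypothetical, and the sample data is made up.

```python
def parse_ldif(text):
    """Map each entry's DN (lowercased) to its entryCSN, or None if absent."""
    # Unfold LDIF continuation lines: a line starting with a single
    # space continues the previous line.
    lines = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)

    entries = {}
    dn = None
    for line in lines:
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        key, _, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()
        if key == "dn":
            dn = value.lower()
            entries[dn] = None
        elif key == "entrycsn" and dn is not None:
            entries[dn] = value
    return entries


def compare(master, replica):
    """Return (missing, extra, differing) DN lists, each sorted."""
    missing = sorted(dn for dn in master if dn not in replica)
    extra = sorted(dn for dn in replica if dn not in master)
    differing = sorted(
        dn for dn in master
        if dn in replica and master[dn] != replica[dn]
    )
    return missing, extra, differing


# Made-up sample exports; real ones would be read from files.
master_ldif = """dn: cn=alice,dc=example,dc=com
entryCSN: 20061201120000Z#000001#00#000000

dn: cn=bob,dc=example,dc=com
entryCSN: 20061201120500Z#000002#00#000000
"""
replica_ldif = """dn: cn=alice,dc=example,dc=com
entryCSN: 20061201120000Z#000001#00#000000
"""

missing, extra, differing = compare(parse_ldif(master_ldif),
                                    parse_ldif(replica_ldif))
print("missing on replica:", missing)
print("extra on replica:", extra)
print("different entryCSN:", differing)
```

Entries listed as missing or differing would point at the parts of the
tree that did not replicate. Entries changed on the master between the
two exports will also show up as differing, so the comparison is only
meaningful on a quiet directory or when repeated over time.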
My question to this list is: does anybody know whether this
reinitialization will suffice? That is, do we (for example):
1. Also need to reinitialize our master (e.g. rebuild it from an
LDIF or from another replica)?
2. Also need to upgrade the software versions for all our replicas
(e.g. to 2.3.x)?
3. Need to do something else to ensure that our replicas will be
faithful to our new master?
#1 and #2 above impose significant costs in our environment, where
different organizational entities independently administer their
OpenLDAP servers. #1 requires a restart of all the replicas. #2 of
course has the much more significant cost of upgrading all the replicas. #3?
Any thoughts appreciated. Our understanding of the evolving
syncrepl implementations isn't sufficient to let us choose
among the options above.
--Jed http://www.nersc.gov/~jed/