New test054-syncrepl-asymmetric
I have created a test script that shows the contextCSN propagation
problems we have, as well as a patch that fixes most of them. But
before I commit anything or file more ITSs I'd like to receive
comments. The test script and patch can be found at:
ftp://ftp.openldap.org/incoming/test054-syncrepl-asymmetric
ftp://ftp.openldap.org/incoming/test054-syncrepl-asymmetric.patch
The comments in the scripts should hopefully explain the configuration.
The patch adds support for the newCookie sync info protocol messages,
and sends them when the contextCSN is being updated without any changes
being sent from the syncprov provider (i.e. when the entries that changed
didn't match the syncrepl filter and/or were forbidden by ACL rules). The
queuing of the CSNs done in syncrepl_updateCookie should not be
necessary with this patch; it should be removed after the patch has been
committed. The bug reported in ITS#5710 is also worked around by this
patch, but defining SLAP_SYNC_UPDATE_MSGID with a unique value would be
better.
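To illustrate the intent of the newCookie handling described above, here is a minimal Python sketch. It is not the patch itself (which is C code in syncprov), only a model of the decision. The names Change, messages_for_change, matches_filter and acl_allows are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# (message kind, entry DN or None, cookie CSN)
Message = Tuple[str, Optional[str], str]


@dataclass
class Change:
    """A committed write on the provider (hypothetical model)."""
    dn: str
    csn: str  # the new contextCSN produced by this write


def messages_for_change(
    change: Change,
    matches_filter: Callable[[Change], bool],
    acl_allows: Callable[[Change], bool],
) -> List[Message]:
    """Decide what one persistent-search consumer should be told."""
    if matches_filter(change) and acl_allows(change):
        # Normal case: the entry itself is sent, carrying the new cookie.
        return [("entry", change.dn, change.csn)]
    # Nothing visible to this consumer changed, but the contextCSN moved:
    # send a bare newCookie so the consumer's cookie does not fall behind.
    return [("newCookie", None, change.csn)]
```

In this model a consumer whose filter or ACLs hide the changed entry still receives the new CSN, which is the behaviour the patch aims for.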
This test requires the proposed patch to fix ITS#5572 to be applied.
Dynamically creating the configuration doesn't work as expected without
it, and the number of errors reported by the script increases from 10 to
25, mostly because replication starts in an unexpected state and
several restarts and writes are needed before an expected state is
reached. This patch can be found at:
ftp://ftp.openldap.org/incoming/ITS5572.patch
The test script finds 10 errors (assuming the race condition discussed
below is actually hit) after the ITS#5572 patch has been applied. With
both patches I'm left with 3 errors, all related to the race condition.
Howard, you'll probably object to the rootdn usage in this test. But as
of now there is no way I know of that allows me to create the layout I
need without it. Extending syncrepl with the ability to exclude a list
of dn subtrees from its control could be the solution. But in that case
I assume it would be better to have a list of URIs (evaluated locally)
that define the entries to exclude. That in turn opens a can of new
questions as to when the URIs should be evaluated. For add and delete
it should be pretty obvious, but what about modify? Should the test be
against the entry as it is before the change, as it would be after, or both?
Hmm, the ability to exclude a list of URIs from syncrepl control could
be (ab)used to merge some type of entries from one server, other types
from other servers, if anyone should wish to do such weird things.
The errors remaining after these two patches have been applied are due
to the race condition that arises if more than one subordinate database
replicates from the same provider. I have a clear idea as to how this
race can be eliminated; it must be addressed in both syncrepl and syncprov:
In syncrepl it should be fairly easy, as storing the contextCSN values
in the suffix of the database where syncrepl is configured rather than
the suffix of the glue entry should be sufficient.
Syncprov needs a bit more work. Whenever it detects that syncrepl
updates the contextCSN value it must use the *lowest* contextCSN value
for any given sid stored in all the databases within its context. And
no updates must be accepted from syncrepl until a contextCSN value has
been stored in the suffix dn of all databases (within its context) where
syncrepl has been enabled. Assuming syncrepl cannot be used on both a
superior and a subordinate database (which might no longer be true if the
ability to exclude something from its control is introduced) it should
not be required to make these extra tests when syncrepl updates the
contextCSN in the suffix of the glue database itself (i.e. when syncrepl
and syncprov are configured in the same database). The sending of
newCookie messages should cause the contextCSN values to be updated to
the newest values fairly quickly.
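The syncprov rule proposed above can be modelled roughly as follows. This is a Python sketch, not OpenLDAP code. The function names are hypothetical. It assumes CSNs have the form time#count#sid#mod, that CSNs with the same sid can be ordered by plain string comparison, and that "a value has been stored" means at least one contextCSN per database.

```python
from typing import Dict, List, Optional


def sid_of(csn: str) -> str:
    """Extract the server ID field from a CSN of the form time#count#sid#mod."""
    return csn.split("#")[2]


def effective_context_csns(
    db_csns: Dict[str, List[str]],
) -> Optional[Dict[str, str]]:
    """Map each sid to its lowest CSN across all databases.

    db_csns maps a database suffix to the contextCSN values stored there.
    Returns None while any database has no contextCSN yet, meaning updates
    from syncrepl should not be accepted at this point.
    """
    if any(not csns for csns in db_csns.values()):
        return None
    lowest: Dict[str, str] = {}
    for csns in db_csns.values():
        for csn in csns:
            sid = sid_of(csn)
            # Assumption: string order equals CSN order for the same sid.
            if sid not in lowest or csn < lowest[sid]:
                lowest[sid] = csn
    return lowest
```

Taking the minimum per sid means the provider never advertises a state that one of its subordinate databases has not yet reached.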
Actually, there could be another problem buried here as well, as the
consumers really need a full resynchronization after new subordinate
syncrepl consumers are added on their provider (at least if the new
consumer replicates from the same server as an existing one). The easiest
fix would be to require the clients to do a full resync whenever the
config of the provider is changed, which I find a bit drastic. Noting
it in the documentation should hopefully be sufficient.
Rein Tollevik
Basefarm AS