Re: syncrepl
- To: openldap-devel@openldap.org
- Subject: Re: syncrepl
- From: Howard Chu <hyc@symas.com>
- Date: Sat, 03 Feb 2007 21:18:54 -0800
- In-reply-to: <45BAC3DC.8090908@symas.com>
- References: <458F0612.2070005@symas.com> <459D07E3.3040304@uvm.edu> <459D39F1.4010605@sys-net.it> <45BAC3DC.8090908@symas.com>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060911 Netscape/7.2 (ax) Firefox/1.5 SeaMonkey/1.5a
Howard Chu wrote:
So continuing the discussion of what to do with syncrepl and multiple
contexts...
1) the provider must be told about all of the sources of changes living
within its context. possible sources are
a) local changes
b) changes received via syncrepl
2) every source of changes must have a unique sid.
a) if it's a syncprov, then it's configured explicitly there
b) if it's a syncrepl consumer pulling from elsewhere, it uses the
remote server's sid.
The olcServerID config attribute has been added for configuring these IDs. It
is a global config keyword, not associated with a particular provider. A
single serverID can be configured, for simple static setups. Or you can
configure a list of serverIDs and corresponding URLs, to allow a single
configuration to be replicated across a pool of servers.
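
A rough sketch of how the list form could work. It assumes each server picks the ID whose URL matches one of its own listener URLs. That matching rule, the `pick_server_id` helper, and the host names are illustrative assumptions, not taken from the slapd source:

```python
def pick_server_id(server_ids, listener_urls):
    """Return the SID whose URL matches one of this server's listeners.

    server_ids: list of (sid, url) pairs, identical on every server in the pool.
    listener_urls: the URLs this particular server is listening on.
    """
    # Normalise trivially (case, trailing slash) before comparing.
    listeners = {url.rstrip("/").lower() for url in listener_urls}
    for sid, url in server_ids:
        if url.rstrip("/").lower() in listeners:
            return sid
    raise LookupError("no configured serverID matches a local listener")


# Hypothetical pool: the same list is deployed unchanged on both hosts.
POOL = [(1, "ldap://ldap1.example.com"), (2, "ldap://ldap2.example.com")]
```

With this shape, the same configuration can be copied to every host and each one still ends up with a distinct SID.
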
3) the provider must aggregate all of the cookies for each of these
change sources and send them to consumers pulling from it.
The consumer now checks to see if it's a subordinate DB; if so it will
perform its contextCSN updates through the parent DB. If a syncprov overlay
is present it will get a chance to see the contextCSN update.
There's a desire to be able to configure multiple change sources for the
same context though. E.g., mirrormode is defined to only work with two
servers mirroring each other; it would be nice to be able to extend this
to additional failover servers.
I've modified the consumer to allow multiple syncrepl configurations on the
same backend. Corresponding changes are still needed in the provider. The
contextCSN attribute is now multi-valued, allowing a CSN per SID to be
tracked. Modifications to the contextCSN must be done with specific
Delete/Add instead of Replace.
There's no restriction on how this gets used - a consumer can talk to
multiple providers that master disjoint subtrees of the context, or they can
overlap partially or fully. As long as each provider has a unique SID their
multiple contextCSNs will be tracked properly.
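
To illustrate why a targeted Delete/Add is needed, here is a minimal sketch of per-SID contextCSN tracking. It assumes CSNs are fixed-width strings whose third `#`-separated field is the SID in hex (so plain string comparison orders them). The helper names are hypothetical, not slapd functions:

```python
def sid_of(csn):
    """Extract the SID (third '#'-separated field, hex) from a CSN string."""
    return int(csn.split("#")[2], 16)


def update_context_csn(values, new_csn):
    """Return (deletes, adds, new_values) for recording new_csn.

    Only the value belonging to new_csn's SID is touched, so values owned
    by other SIDs survive. A blanket Replace would wipe them out.
    """
    sid = sid_of(new_csn)
    old = [v for v in values if sid_of(v) == sid]
    if old and max(old) >= new_csn:
        return [], [], list(values)  # nothing newer to record
    kept = [v for v in values if sid_of(v) != sid]
    return old, [new_csn], sorted(kept + [new_csn])
```

The returned delete and add lists correspond to the values a single modify operation would remove and insert.
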
The SID is used in the "replica ID" field of the CSN. That was previously a
two-digit hex number and it was always zero; I've increased it to three
digits. That's probably excessive; two was probably plenty.
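
For reference, a small sketch of building a CSN string with a three-hex-digit SID field. The overall layout (timestamp, 6-digit change count, 3-digit SID, 6-digit mod number, separated by `#`) is an assumption based on the format seen in later OpenLDAP releases. `format_csn` and `parse_sid` are illustrative names only:

```python
from datetime import datetime, timezone


def format_csn(when, count, sid, mod=0):
    """Build a CSN string: timestamp, change count, SID, mod number."""
    if not 0 <= sid <= 0xFFF:
        raise ValueError("SID must fit in three hex digits")
    # %f yields six digits, i.e. microsecond resolution.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S.%fZ")
    return f"{stamp}#{count:06x}#{sid:03x}#{mod:06x}"


def parse_sid(csn):
    """Recover the integer SID from the third '#'-separated field."""
    return int(csn.split("#")[2], 16)
```

Because every field is fixed-width, two CSNs in this layout can be ordered with an ordinary string comparison.
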
From half-multi-master we can go all the way to multi- if we add
collision detection and conflict resolution. There's a pretty simple way
to handle collision detection - we just need to pass the entry's old
entryCSN along with the rest of the modification info. On the consumer
we check and see if the oldEntryCSN matches the consumer entry's current
entryCSN. If they match, there is no collision. If they don't match, we
need to resolve the conflict.
Aside from allowing us to log that a conflict occurred, keeping the oldCSN
around doesn't seem to buy us much. Since the conflict resolution is still
determined solely by the current entryCSN, I'm dropping this idea. All we
need to check is if the incoming mod's entryCSN is <= the current entryCSN
and drop the change if so.
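
The rule above amounts to "newest entryCSN wins". A minimal sketch, assuming entries live in a dict keyed by DN and that CSNs are fixed-width strings that compare correctly as text (`apply_if_newer` is a hypothetical helper, not a slapd function):

```python
def apply_if_newer(store, dn, incoming_csn, new_attrs):
    """Apply a replicated change unless the local entry is as new or newer.

    Returns True if the change was applied, False if it was dropped.
    """
    current = store.get(dn)
    if current is not None and incoming_csn <= current["entryCSN"]:
        return False  # incoming mod's entryCSN <= current entryCSN: drop it
    store[dn] = {**new_attrs, "entryCSN": incoming_csn}
    return True
```

Note that no old entryCSN is carried with the change; only the incoming and current values are compared.
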
Of course to be able to compare entryCSNs reliably we need high quality,
high resolution timestamps, and all of the participating servers must
have tightly synchronized clocks. This isn't such a troublesome
requirement; you just need to run NTP on all of the servers.
The CSN timestamps are now recorded with microsecond resolution. Whether the
underlying system actually delivers such precision is anybody's guess. At
least in my tests the microseconds returned by gettimeofday() were always
unique, when run in a tight loop. (I recall many years ago when this was not
true, and the value only changed down to milliseconds...)
Since Windows system time only runs with 10 millisecond resolution, I had to
augment that with a high resolution timer. On my test machine this means the
ACPI power management timer, which runs at about 3.58MHz, so that's certainly
good enough. (Which hardware timer is used depends on the version of Windows
and varies quite a bit.) However the high-res timer and the system timer run
independently, so there's no guarantee that they both will zero out together
when the next whole second ticks. I've kludged it up such that the error will
be no more than 1 millisecond, but it's still an annoyance. This must be why
AD uses integer update counters instead of timestamps; the OS doesn't provide
a real source of high quality timestamps. (Oddly enough they still implement
NTP with 0.1 microsecond resolution; they just seem to be discarding the
extra precision.)
It shouldn't be a major problem; we still use the op counter if the
resolution is too low and multiple updates occur in the same timeslice.
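
The op counter fallback can be sketched as follows. This is only an illustration: the `CsnClock` class and its injected `now_us` callable are hypothetical, and treating a clock that steps backwards as "same timeslice" is an assumption of the sketch:

```python
class CsnClock:
    """Hand out (timestamp, counter) pairs that never repeat."""

    def __init__(self, now_us):
        self._now_us = now_us  # callable returning integer microseconds
        self._last = -1
        self._count = 0

    def next(self):
        t = self._now_us()
        if t <= self._last:
            # Same timeslice (or clock went backwards): reuse the last
            # timestamp and bump the counter to keep the pair unique.
            t = self._last
            self._count += 1
        else:
            self._last = t
            self._count = 0
        return t, self._count
```

Passing the clock in as a callable keeps the sketch independent of how a given platform actually produces its timestamps.
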
Not all of these changes are checked in yet, but they'll be coming in soon.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
Chief Architect, OpenLDAP http://www.openldap.org/project/