Re: Syncrepl: full sync vs. delta
Howard Chu wrote:
> Yes. But that's still too messy; currently only the frontend de-queues
> CSNs but there are several places that en-queue them (passwd.c,
> sasl.c, various overlays) which means right now those places cause a
> CSN memory leak. The cleanest solution now appears to be invoking the
> CSN management directly from the backend (like ModRdn does) and
> removing the responsibility from the frontend. This seems to be the
> only way to make sure slap_graduate_commit_csn() will get called when
> it's needed.
So just to re-cap... CSN (and other lastmod-related operational
attribute) management is now invoked by the backends. Currently only
back-bdb, hdb, ldbm, and ldif invoke it (generally using
slap_add_opattrs or slap_mods_opattrs). These functions generally do
nothing on a replica; the attributes are expected to already have been
set by the master. However, they will supply any missing attributes.
This also means that replicating from an older master should always work.
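
The "supply any missing attributes" behavior can be pictured with a small sketch. It is in Python rather than slapd's C, it is not the code of slap_add_opattrs, and the function name and placeholder values are hypothetical. It only shows the rule that values already set by the master are kept and gaps are filled locally.

```python
def supply_missing_opattrs(entry, generated):
    """Return a copy of `entry` with absent attributes filled from `generated`.

    Hypothetical helper: `entry` holds the attributes received from the
    master, `generated` holds the values this server would produce itself.
    """
    result = dict(entry)
    for name, value in generated.items():
        # Keep whatever the master already set; only fill the gaps.
        result.setdefault(name, value)
    return result


if __name__ == "__main__":
    received = {"cn": "example", "entryCSN": "csn-from-master"}
    local = {"entryCSN": "csn-local", "modifyTimestamp": "ts-local"}
    print(supply_missing_opattrs(received, local))
```

With an older master that sends no CSN at all, the same rule fills in every value locally, which is why replication from it still works.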
The accesslog overlay uses a pair of mutexes to serialize writes. The
first mutex is locked before the main operation executes. In the
response callback, the second mutex is grabbed and the first mutex is
released. This allows the main database to continue on with another
operation while the log record is being written, and the logs are
guaranteed to be written in the same order as the original operation
execution. (Log records for read operations are not serialized. There
doesn't seem to be much need, though for some applications it would make
sense.)
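
The two-mutex handoff can be sketched roughly as follows. This is a Python illustration, not the overlay's C code; the class and attribute names are made up for the example. The key point is that the second mutex is taken before the first is released, so no later operation can overtake an earlier one on its way to the log.

```python
import threading


class LogSerializer:
    """Two-lock handoff; every name here is hypothetical."""

    def __init__(self):
        self.op_lock = threading.Lock()    # first mutex: held while the main op runs
        self.log_lock = threading.Lock()   # second mutex: held while the log is written
        self.main_order = []               # order in which main operations executed
        self.log_order = []                # order in which log records were written

    def write(self, op_id):
        self.op_lock.acquire()
        self.main_order.append(op_id)      # stands in for the main operation
        # "Response callback": grab the second mutex, then release the first.
        self.log_lock.acquire()
        self.op_lock.release()             # the main DB may start the next op now
        try:
            self.log_order.append(op_id)   # stands in for writing the log record
        finally:
            self.log_lock.release()


def run_demo(n_threads=8, ops_per_thread=50):
    s = LogSerializer()

    def worker(t):
        for i in range(ops_per_thread):
            s.write((t, i))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return s


if __name__ == "__main__":
    s = run_demo()
    print(s.main_order == s.log_order)
```

Releasing the first mutex before taking the second would open a window in which a later operation could reach the log first; the overlap of the two locks is what preserves the ordering.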
There is a new syncprov config keyword "syncprov-reloadhint" to tell the
provider to honor the reload hint in the Sync search control. This
feature is normally turned off because previous versions of the consumer
never set the hint, and never recovered from receiving an
LDAP_SYNC_REFRESH_REQUIRED error from the provider. The consumer in HEAD
now uses the hint...
There are two new consumer config keywords "logbase" and "logfilter" for
specifying the baseDN of a log database and the search filter to use on
that database. There is also a "syncdata" keyword to specify the format
of the syncrepl data that is expected - the default is plain entries,
other accepted values are "accesslog" for the accesslog format and
"changelog" for the changelog format. (But changelog parsing is only
half-implemented. ITS#4033 would be handy here.)
The test043 test script is basically a copy of test018, modified for
delta mode. It shows the minimum configuration needed...
A brief overview of how it works:
On the provider side, the log database must be created along with the
main database. The accesslog overlay will record writes to the main
database into the log database, using the same CSNs as the main
database. A syncprov overlay is instantiated on both the log and the
main databases.
On the consumer side, if the consumer has no saved state (fresh, empty
DB, or state is manually overridden) it will start up with a RefreshOnly
request to the main DB. When that request completes, it will issue a new
request against the log DB, using whatever refresh mode was configured
(RefreshAndPersist or RefreshOnly). From that point on, it will talk
only to the log DB.
Also, the request against the logDB always uses reloadHint=FALSE, asking
the provider to send the consumer an LDAP_SYNC_REFRESH_REQUIRED error
code whenever it is out of date. (I.e., whenever the consumer's state has
been expired from the log.) In this situation, the consumer will again
send a RefreshOnly request to the main DB, and the process starts over
again.
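
The consumer's sequence of requests described above can be traced with a short sketch. It is Python, not the syncrepl consumer code, and the function name, the "main"/"log" labels, and the outcome strings are assumptions made for the example. Every log DB request is understood to carry reloadHint=FALSE.

```python
REFRESH_REQUIRED = "refresh_required"  # stands in for LDAP_SYNC_REFRESH_REQUIRED


def plan_requests(has_state, configured_mode, log_outcomes):
    """Return the (target, mode) requests a consumer would issue.

    has_state:       False for a fresh, empty DB or manually overridden state
    configured_mode: "RefreshOnly" or "RefreshAndPersist"
    log_outcomes:    outcome of each successive log DB request,
                     either "ok" or REFRESH_REQUIRED
    """
    requests = []
    if not has_state:
        # No saved state: load everything from the main DB first.
        requests.append(("main", "RefreshOnly"))
    for outcome in log_outcomes:
        # Normal case: talk only to the log DB.
        requests.append(("log", configured_mode))
        if outcome == REFRESH_REQUIRED:
            # Our state has expired from the log: fall back to the main DB,
            # after which the next loop iteration returns to the log DB.
            requests.append(("main", "RefreshOnly"))
    return requests


if __name__ == "__main__":
    for req in plan_requests(False, "RefreshAndPersist", [REFRESH_REQUIRED, "ok"]):
        print(req)
```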
So now we have three different replication schemes available: slurpd,
syncrepl, and delta syncrepl. It's worth pointing out that all three of
these schemes support both partial (subtree) and fractional (subset of
attrs) replication.
In the case of delta syncrepl, partial replication requires the provider
to support extensible matching rules so you can specify the subtree of a
DN in the logfilter. Fractional replication can be enforced on the
provider when using the accesslog, but not when using changelog. (Note
that changelog's lack of granularity here is a security risk.) In either
case, the consumer can filter out unwanted attributes from the log
records it receives, but obviously filtering them out on the provider
side makes more efficient use of network resources. (As well as
preserving the provider's security policies.) I will probably finish the
changelog parser at some point, but overall that schema is A Bad Design
and should be avoided.
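
As a rough picture of the consumer-side filtering mentioned above, here is a Python sketch. The record layout (a dict with "dn" and "mods") and both function names are hypothetical, not the accesslog schema or any slapd API, and the DN comparison is deliberately naive: it lowercases and splits on commas, so it does not handle escaped commas or proper DN normalization.

```python
def in_subtree(dn, base):
    """Naive check that `dn` is at or below `base` (no escaped-comma handling)."""
    def parts(s):
        return [p.strip().lower() for p in s.split(",")]

    d, b = parts(dn), parts(base)
    return len(d) >= len(b) and d[len(d) - len(b):] == b


def filter_record(record, base, wanted_attrs):
    """Apply partial (subtree) and fractional (attribute subset) filtering.

    Returns None when the record's target DN is outside `base`; otherwise
    returns a copy of the record with only the wanted attributes' mods.
    """
    if not in_subtree(record["dn"], base):
        return None
    wanted = {a.lower() for a in wanted_attrs}
    mods = [m for m in record["mods"] if m[0].lower() in wanted]
    return {"dn": record["dn"], "mods": mods}


if __name__ == "__main__":
    rec = {
        "dn": "cn=Joe,ou=People,dc=example,dc=com",
        "mods": [("mail", "joe@example.com"), ("userPassword", "secret")],
    }
    print(filter_record(rec, "ou=people,dc=example,dc=com", ["mail"]))
```

Note that in this sketch the unwanted values have already crossed the network by the time they are dropped, which is the efficiency and security argument for filtering on the provider instead.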
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/