RE: Getting around the single-threaded syncrepl model?



--On Monday, November 23, 2015 11:38 AM +0000 "Bannister, Mark" <Mark.Bannister@morganstanley.com> wrote:

> --On Friday, November 20, 2015 6:31 PM +0000 Albert Braden
> <abraden@about.com> wrote:

> Hi Quanah,
>
> Are you sure your issues with syncrepl aren't specific to Zimbra? When
> I ran the Zimbra at Homestead/Intuit we saw syncrepl issues, but I
> have not seen those issues in non-Zimbra LDAP clusters.

Quite certain.  For one thing, I file very few syncrepl related ITSes,
since I don't use it, but many are filed... For another, while I have
encountered a very few issues with delta-syncrepl, the majority of
issues I find w/ Zimbra and replication are related to syncrepl when it
has to be used in an initial or fallback scenario.

What sort of issues?

And here's some quick stats:
$ grep syncrepl CHANGES  | grep -v delta | wc -l
76

$ grep delta-syncrepl CHANGES  | wc -l
7

Or syncrepl has had 10x the issues.

Or syncrepl has 10x the users, more eyes spot more bugs, and is now more
stable because it has had more fixes? I'm just speculating here, but I
wouldn't be more confident in product Y because it is mentioned less in
the change log than product X.

Nope. I've been using syncrepl since it was first released in 2.2. I switched the production LDAP servers I was running at Stanford over to syncrepl when 2.3 came out, and found a really vexing problem when mass updates occurred (primarily at quarter end, when we'd receive tens of thousands of updates due to class enrollment changes). While 1-2 of our 6 replicas would stay up to date, the other 4 would fall hours or days behind. I.e., I'm quite familiar with the issue you are discussing, because I've been experiencing it for over a decade.

Out of this, delta-syncrepl was born. I've been using it steadily now for over a decade. I switched Zimbra to it back in 2007, and it is deployed at customers worldwide, from installations with 10-20 users to installations with many millions of users. Has there been the occasional issue? Yes. But again, the vast majority of problems I encounter when investigating replication come from syncrepl fallback. There is a very significant fix coming out in 2.4.43, found by Zimbra, for syncrepl refresh when it is interrupted during the refresh phase (ITS#8281).

So, when I talk about replication and reliability in relation to OpenLDAP, this is something I've been working with since the early days of OpenLDAP 2.1. It's something that, as part of my job, I have to take with the utmost seriousness.

Now, you can ignore my advice if you wish; it doesn't matter to me one way or the other. But to me, mission-critical systems must be as reliable as possible, and I've only found that to be the case with OpenLDAP replication when I deploy delta-syncrepl as the primary replication mechanism.

--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration