Howard Chu wrote:
Yes, I've seen the same. My suspicion now is that it's due to an update
arriving in the consumer near when it transitions from refresh to persist
mode, but I haven't been able to isolate it. I also note that adding a SLEEP1
near the beginning of test050, after the consumers have been started but
before the ldapadd to populate the provider, completely eliminated the problem.
So there's definitely an issue there that needs to be tracked down.

OK, finally understand the situation.

Server 3's consumer is talking to Server 2 and has entered persist mode. But
Server 2 is still performing a refresh against Server 1. During a refresh,
individual entries have no CSN in their sync cookie, because they arrive in
indeterminate order. Because there's no cookie CSN, the writes have no CSN
queued either. Since there is no queued CSN, they don't get onto the psearch
queue. When the refresh phase completes, and Server 2 enters its own persist
phase, it receives a CSN for its cookie. Writing this cookie causes
NEW_COOKIE messages to be sent out, which in turn causes Server 3 to update
its context, even though it's still missing some number of entries.
Without the NEW_COOKIE message, the test succeeds because some other provider
will eventually supply the missing updates. (I.e., mostly by luck because
there are many servers operating at once.)
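
To make the sequence concrete, here is a toy model of the three-server chain.
It is only a sketch of the behaviour described above, not slapd code: the
Server class, its method names, the integer CSNs and the entry names are all
made up for illustration, and "context" is modelled as a single context_csn
integer, which is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Server:
    """Hypothetical stand-in for one server in the chain (not a slapd type)."""
    name: str
    entries: Set[str] = field(default_factory=set)
    context_csn: int = 0  # 0 means "no CSN recorded yet" in this toy model
    persist_consumers: List["Server"] = field(default_factory=list)

    def apply_write(self, entry: str, csn: Optional[int]) -> None:
        """Store an entry locally; forward it only if the write carries a CSN."""
        self.entries.add(entry)
        if csn is None:
            # Refresh-phase entry: no cookie CSN, so no queued CSN, so it
            # never gets onto the psearch queue for downstream consumers.
            return
        self.context_csn = max(self.context_csn, csn)
        for consumer in self.persist_consumers:
            consumer.apply_write(entry, csn)

    def write_cookie(self, csn: int, send_new_cookie: bool = True) -> None:
        """Record the cookie CSN; optionally tell persist consumers about it."""
        self.context_csn = max(self.context_csn, csn)
        if send_new_cookie:
            for consumer in self.persist_consumers:
                consumer.write_cookie(csn, send_new_cookie)


def run(send_new_cookie: bool) -> Tuple[int, List[str]]:
    """Replay the scenario; return Server 3's CSN and the entries it lacks."""
    server1 = Server("server1", entries={"e1", "e2", "e3"}, context_csn=3)
    server3 = Server("server3")
    # Server 3 is already in persist mode against Server 2.
    server2 = Server("server2", persist_consumers=[server3])

    # Server 2 refreshes from Server 1: entries arrive without a CSN.
    for entry in sorted(server1.entries):
        server2.apply_write(entry, csn=None)

    # Refresh completes; Server 2 receives a CSN for its cookie and writes it.
    server2.write_cookie(server1.context_csn, send_new_cookie)

    missing = sorted(server1.entries - server3.entries)
    return server3.context_csn, missing


if __name__ == "__main__":
    print(run(send_new_cookie=True))
    print(run(send_new_cookie=False))
```

With send_new_cookie=True, Server 3 ends up with the new CSN while holding
none of the entries, which is the failure described above. With
send_new_cookie=False its CSN stays at 0, so in this model it still looks out
of date and some other provider gets the chance to supply the entries.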