All, I want to ask the list about this before I open an ITS, to make sure
that I am understanding everything correctly.

We are running OpenLDAP 2.4.11. I selectively backported the fix for ITS 5709
to our source because we were losing replications, and applying it seemed to
help and reduced the number of lost replications. We are running in mirror
mode using refreshAndPersist, with a high volume of adds to the master, on the
order of 100/s.

We have run numerous iterations of the same test with very aggressive NTP
updates that keep the master and consumer within 50 microseconds of one
another, which I saw recommended as a possible solution in a previous message
thread. This made little to no difference to the replication loss.

From looking at the code, I suspect the lost replications may be caused by
entries being queued on the master with CSNs in non-ascending order; in the
logs, this out-of-order queuing shows up just before the replication that is
rejected on the consumer. My theory is that the logic that traverses the queue
to mark committed CSNs and update the contextCSN gets out of sync because of
this, and orphans replications that are still pending: the consumer rejects
them as too old, even though they have never actually been added to the
consumer.

I just pulled the latest code from RE24 and reran the test. The latest code is
better than 2.4.11 with only the ITS 5709 backport, but we are still losing a
small percentage of the replications with the "CSN too old" message, and I
still see a correlation between the out-of-order queuing on the master and the
replications that are rejected on the consumer. During this run NTP kept the
two systems within 10 microseconds of each other, using the most aggressive
configurable sync interval of 16 seconds.

Below are log snippets and some of the relevant configuration information. If
more is desired, please let me know and I
will provide it.

#### MASTER #####
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c9568b0 20081114194304.892065Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x42803100 20081114194305.078713Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=14 op=17167 ADD dn="uniqueIdentifier=Evad_Added_tele_5450408582,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4680b100 20081114194305.078878Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43004100 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:05 ng04be03 slapd[7582]: conn=12 op=13844 RESULT tag=105 err=0 text=
Nov 14 14:43:05 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b4c87e670 20081114194305.068251Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 ADD dn="uniqueIdentifier=Evad_Added_tele_5450009858,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x4780d100 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=12 op=17496 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5a7f8340 20081114194502.917316Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=13 op=19983 ADD dn="uniqueIdentifier=Evad_Added_tele_5450509990,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=10 op=19719 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ae77160 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=14 op=19598 ADD dn="umbillingnumber=5450409797,uniqueIdentifier=Evad_Added_tele_5450409797,ou=subscribers,ou=SINGP,o=ricuc.com"
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x41000100 20081114194502.936884Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: conn=11 op=16763 RESULT tag=105 err=0 text=
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_queue_csn: queing 0x43805100 20081114194502.947725Z#000000#001#000000
Nov 14 14:45:02 ng04be03 slapd[7582]: slap_graduate_commit_csn: removing 0x2b5ad51170 20081114194502.917288Z#000000#001#000000

### CONSUMER ###
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_add (0)
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194305.078653Z#000000#001#000000
Nov 14 14:43:36 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Nov 14 14:43:36 ng04be04 slapd[24622]: syncrepl_entry: rid=002 be_search (0)
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_queue_csn: queing 0x2b4737c990 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: slap_graduate_commit_csn: removing 0x2b473ca890 20081114194502.917523Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: rid=002 CSN too old, ignoring 20081114194502.917288Z#000000#001#000000
Nov 14 14:45:39 ng04be04 slapd[24622]: do_syncrep2: cookie=rid=002,sid=002,csn=20081114194502.936884Z#000000#001#000000

### Replication Config ###
dn: olcDatabase={2}hdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcHdbConfig
...
olcSyncrepl: {0}rid=2 provider=ldap://ldap.server.com
  bindmethod=simple timeout=0 network-timeout=0
  binddn="cn=Directory Manager,o=ricuc.com" credentials="secret" starttls=no
  filter="(objectclass=*)" searchbase="o=ricuc.com" scope=sub
  schemachecking=off type=refreshandpersist retry="60 +"
olcMirrorMode: TRUE

dn: olcOverlay={0}syncprov,olcDatabase={2}hdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckpoint: 100 600
olcSpSessionlog: 100

### Hardware ###
Dual Quad Core Xeon 2.83GHz
32GB RAM
8x15000rpm RAID10
Separate LUNs for db and txn logs

Kris Burton
Acision. Innovation. Assured.
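To make the theory about out-of-order queuing concrete, here is a toy Python model of the suspected failure, using the three CSNs from the 14:45:02 master log. It is not OpenLDAP code: every class and method name is hypothetical. It assumes that the master keeps pending CSNs in queue (arrival) order, that the contextCSN is taken from the last committed entry found before the first still-pending one, and that the consumer ignores any CSN that is not newer than the one it has stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy model of the suspected race. All names are hypothetical; this is NOT
# OpenLDAP code. Assumptions: pending CSNs are kept in the order operations
# queue them (not sorted), and the contextCSN is the last committed entry
# seen before the first still-pending entry when walking that queue.


@dataclass
class PendingCSN:
    csn: str
    committed: bool = False


@dataclass
class MasterQueue:
    entries: List[PendingCSN] = field(default_factory=list)

    def queue_csn(self, csn: str) -> None:
        # Appended in arrival order, not sorted by CSN.
        self.entries.append(PendingCSN(csn))

    def commit(self, csn: str) -> Optional[str]:
        """Mark csn committed and return the contextCSN the queue walk picks."""
        for entry in self.entries:
            if entry.csn == csn:
                entry.committed = True
                break
        else:
            raise KeyError(csn)
        context_csn = None
        for entry in self.entries:
            if not entry.committed:
                break  # stop at the first pending entry in queue order
            context_csn = entry.csn
        return context_csn

    def pending(self) -> List[str]:
        return [e.csn for e in self.entries if not e.committed]


@dataclass
class Consumer:
    context_csn: str = ""
    applied: List[str] = field(default_factory=list)
    rejected: List[str] = field(default_factory=list)

    def receive(self, csn: str) -> None:
        # CSN strings of this fixed-width form sort chronologically as strings.
        if csn <= self.context_csn:
            self.rejected.append(csn)  # the "CSN too old, ignoring" case
        else:
            self.applied.append(csn)
            self.context_csn = csn


A = "20081114194502.917316Z#000000#001#000000"
B = "20081114194502.917523Z#000000#001#000000"
C = "20081114194502.917288Z#000000#001#000000"  # oldest CSN, queued last

master, consumer = MasterQueue(), Consumer()
for csn in (A, B, C):  # queue order seen in the master log
    master.queue_csn(csn)

master.commit(A)
consumer.receive(A)
ctx = master.commit(B)
consumer.receive(B)
print(ctx == B, master.pending() == [C])  # → True True
master.commit(C)
consumer.receive(C)
print(consumer.rejected == [C])  # → True
```

In this model the contextCSN reaches ...917523Z while ...917288Z is still pending, and the consumer then drops ...917288Z, which mirrors the "CSN too old, ignoring" line in the consumer log. If the same three CSNs are queued and committed in ascending order, nothing is rejected.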
Glen