Hi,
First, please let me tell you the story of my adventure yesterday. I'll
summarize my questions at the end.
Some time ago I set up a simple master-slave replicated system (stock
Debian wheezy OpenLDAP, version 2.4.31-1+nmu2):
dn: olcDatabase={0}config,cn=config
olcSyncrepl: {0}rid=1 provider=ldap://elm.niif.hu [...]
dn: olcDatabase={1}mdb,cn=config
olcSyncrepl: {0}rid=2 provider=ldap://elm.niif.hu [...]
The slave opened two connections to the master, and everything worked
fine. Then I enabled TLS and put in a CNAME record, so that the master
became accessible as ldaps://ldap-master.niif.hu. I decided to switch the
replication traffic over to the SSL channel as well, so I used ldapmodify
to change the above attributes to contain
provider=ldaps://ldap-master.niif.hu.
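For reference, the modification was along these lines. This is a sketch:
using "replace" is an assumption about how the change was applied, and the
remaining syncrepl parameters are elided with [...] exactly as in the
configuration above, so it cannot be fed to ldapmodify as-is.

```ldif
# Point both syncrepl consumers at the new TLS name (CNAME of elm.niif.hu).
dn: olcDatabase={0}config,cn=config
changetype: modify
replace: olcSyncrepl
olcSyncrepl: {0}rid=1 provider=ldaps://ldap-master.niif.hu [...]

dn: olcDatabase={1}mdb,cn=config
changetype: modify
replace: olcSyncrepl
olcSyncrepl: {0}rid=2 provider=ldaps://ldap-master.niif.hu [...]
```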
This pretty much broke the system, because the master server suddenly
started to replicate from itself, and thus became read-only.
Finding no other option, I stopped the "master" slapd and reverted the
providers to their previous values (shown above) by editing the
olcDatabase={0}config.ldif and olcDatabase={1}mdb.ldif files under the
cn=config directory of my server configuration. I know these files
should not be edited by hand, but I found no other way.
This made the master recognize itself again in the provider URI, so it
did not start replicating and became writable. My edits, however, did
not propagate to the slave, probably because I did not change the
internal attributes (entryCSN?), so this was expected. Also, slapcat
started to report CRC warnings for some LDIF files while dumping the
databases, which was understandable for the edited ones, but not so much
for cn=config.ldif (if I remember correctly).
I tried to fix these by making some dummy changes with ldapmodify to the
database entries: for both, I added an extra olcAccess attribute, then
deleted it. These operations made the slave switch its syncrepl
connections back from ldaps to the ldap port, but they also instantly
broke the slave server, which stopped returning results and instead
logged lots of
slapd[27944]: => mdb_idl_fetch_key: cursor failed: Invalid argument (22)
lines. Having no better idea, I restarted the slave server, which
fortunately returned it to normal working condition.
So, my questions:
1. How does the "self-recognition" (by which the master does not start
replicating from itself) work, and why did it fail when I changed the
provider URI to ldaps://ldap-master.niif.hu?