
trouble setting up initial replication between multiple masters



Hello list.

I'm trying to achieve a multi-master setup, starting from a working single-master one. I took the master node's configuration, added the following directives, and distributed it identically to the two nodes:

# global
# one serverID line per node; slapd picks the entry whose URL matches its own listener
serverID 1  ldap://10.202.11.8:389/
serverID 2  ldap://10.202.11.9:389/

# db
...
syncrepl rid=1
    provider=ldap://10.202.11.8:389/
    starttls=yes
    tls_reqcert=never
    type=refreshAndPersist
    retry="60 +"
    logbase="cn=log"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    syncdata=accesslog
    searchbase="dc=msr-inria,dc=inria,dc=fr"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
    credentials=XYZ

syncrepl rid=2
    provider=ldap://10.202.11.9:389/
    starttls=yes
    tls_reqcert=never
    type=refreshAndPersist
    retry="60 +"
    logbase="cn=log"
    logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
    syncdata=accesslog
    searchbase="dc=msr-inria,dc=inria,dc=fr"
    scope=sub
    schemachecking=off
    bindmethod=simple
    binddn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr"
    credentials=XYZ

mirrormode on

The 'tls_reqcert=never' is needed because the two servers are reached through a virtual interface behind a load balancer, and the certificate name matches the name of that virtual interface, not the actual interfaces of the servers (I wonder whether OpenLDAP supports subjectAltName in X.509 certificates, but that's another issue).
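(For what it's worth, if subjectAltName were honoured, I imagine the certificates could simply be reissued with an extension section along these lines; the hostnames below are guesses, not my real ones:

# openssl.cnf fragment -- illustrative names
[ v3_req ]
subjectAltName = DNS:ldap.msr-inria.inria.fr, DNS:avron1.msr-inria.inria.fr, DNS:avron2.msr-inria.inria.fr

signed with 'openssl req ... -extensions v3_req'.)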

Then I imported my base into the first server and launched both of them.
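(The import itself was a plain slapadd run with slapd stopped, essentially:

# database number and file name from memory / illustrative
slapadd -n 1 -l export.ldif

nothing fancy.)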

When node1 (full) tries to connect to node2 (empty), it fails, because it can't authenticate with a DN that does not yet exist in the other node's database, which is understandable.
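This is easy to reproduce by hand from node1 (address and bind DN as in the config above; the LDAPTLS_REQCERT variable mirrors the tls_reqcert=never setting):

LDAPTLS_REQCERT=never ldapwhoami -ZZ -H ldap://10.202.11.9:389/ \
    -D "cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" -w XYZ

which is rejected (invalid credentials, err=49) for as long as the bind DN is absent from node2.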

However, node2 connects successfully, syncs the OU object of the DIT, then fails to sync the first user object, with this error message in its logs:
Jan 13 11:29:20 avron2 slapd[20939]: null_callback : error code 0x13
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr (19)
Jan 13 11:29:20 avron2 slapd[20939]: syncrepl_entry: rid=001 be_add uid=ingleber,ou=users,dc=msr-inria,dc=inria,dc=fr failed (19)
Jan 13 11:29:20 avron2 slapd[20939]: do_syncrepl: rid=001 rc 19 retrying

In node1 logs:
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" method=128
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 BIND dn="cn=syncrepl,ou=roles,dc=msr-inria,dc=inria,dc=fr" mech=SIMPLE ssf=0
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=1 RESULT tag=97 err=0 text=
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH base="dc=msr-inria,dc=inria,dc=fr" scope=2 deref=0 filter="(objectClass=*)"
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 op=2 SRCH attr=* +
Jan 13 10:28:31 avron1 slapd[15713]: send_search_entry: conn 1000 ber write failed.
Jan 13 10:28:31 avron1 slapd[15713]: conn=1000 fd=21 closed (connection lost on write)

It's hard to tell whether the failure originates on the provider side (the 'ber write failed' message) or on the consumer side (null_callback : error code 0x13, i.e. LDAP result 19, constraintViolation).
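If more verbose output would help, I can rerun both nodes in the foreground with sync and stats debugging, e.g.:

# path illustrative; 16640 = sync (16384) + stats (256)
slapd -d 16640 -f /etc/openldap/slapd.conf

and post the relevant excerpts.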

Any hint welcome.
--
BOFH excuse #288:

Hard drive sleeping. Let it wake up on its own...