
Re: new entry lost on multi-master setup (two scenarios)



Greetings,

Any comments on this? Can anybody help me verify this possible bug?

Ildefonso.

On Sun, Apr 17, 2011 at 2:24 PM, Jose Ildefonso Camargo Tolosa
<ildefonso.camargo@gmail.com> wrote:
> Greetings,
>
> At first, I was going to create a bug report, but decided to send this to
> the list first.  I tried this with both 2.4.23 (Debian package) and
> 2.4.25 (compiled from source, BDB 4.8).
>
> After a couple of entries simply disappeared on one of my multi-master
> setups, I decided to investigate further and found this (there are two
> cases for the same procedure):
>
> 1. Configure two LDAP servers in multi-master setup.
> 2. Make sure they replicate correctly (of course).
> 3. Shutdown one of the two ldap servers.
> 4. Create a new entry (say, ou1) on the LDAP server that is left up.
> 5. Shutdown the last LDAP server.
> 6. Start the *other* LDAP server, the one where you didn't create the entry.
> 7. Create another entry, say ou2, so that each server has a new
> entry that is *not* on the other server.
> 8. Shutdown the LDAP server (both servers down now).
> 9. Start both LDAP servers.
>
> Result (case 1): *both* newly created entries are missing on *one* of
> the servers, and only one of the entries is missing on the other
> server.
>
> Result (case 2): one entry is missing on *both* servers.
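To check the outcome of step 9, one can compare the DNs each server returns against the two entries that were created. A minimal Python sketch, assuming the output of a subtree search (e.g. from ldapsearch) on each server has been saved as LDIF text with every DN on a single unwrapped, non-base64 `dn: ` line. The helpers `dns_in_ldif` and `missing_entries` are hypothetical, the sample search output is made up, and DNs are compared by plain lowercasing rather than full DN normalization:

```python
def dns_in_ldif(ldif_text):
    """Collect lowercased DNs from the 'dn: ' lines of LDIF text.

    Assumes one unwrapped, non-base64 'dn: ' line per entry; lowercasing
    stands in for real DN normalization.
    """
    dns = set()
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn: "):
            dns.add(line[4:].strip().lower())
    return dns


def missing_entries(expected, servers):
    """Map each server name to the sorted expected DNs absent from its LDIF."""
    wanted = {dn.lower() for dn in expected}
    return {name: sorted(wanted - dns_in_ldif(text))
            for name, text in servers.items()}


expected = ["ou=ou1,dc=st-andes,dc=com", "ou=ou2,dc=st-andes,dc=com"]
# Hypothetical search output shaped like the case 1 result.
server1 = "dn: dc=st-andes,dc=com\n\ndn: ou=ou1,dc=st-andes,dc=com\n"
server2 = "dn: dc=st-andes,dc=com\n"

for name, missing in missing_entries(expected, {"server1": server1,
                                                "server2": server2}).items():
    print(name, "missing:", missing)
# server1 missing: ['ou=ou2,dc=st-andes,dc=com']
# server2 missing: ['ou=ou1,dc=st-andes,dc=com', 'ou=ou2,dc=st-andes,dc=com']
```

An empty list for every server is the expected result after step 9; anything else reproduces one of the cases described here.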
>
> Both servers run NTP and use the same timezone (i.e., their clocks are synchronized).
>
> I'm *not* replicating cn=config (I shouldn't, because I have different
> SSL certificates on each server).  Now, more details:
>
> Running slapd with -d 16384 gives me this on the server that is missing
> both entries.  On this server I created the entry
> ou=ou2,dc=st-andes,dc=com (and the server decided to delete it!), and,
> for some reason, it didn't detect the new ou1 entry created on the
> other server:
>
> http://www.st-andes.com/openldap/case1/log-server2-case1.txt
>
> On the other server (the one that kept one entry and lost the other),
> I created the entry ou=ou1,dc=st-andes,dc=com, and the log says it
> was changed by peer:
>
> http://www.st-andes.com/openldap/case1/log-server1-case1.txt
>
> Now, I see here that it is using server ID 000, but in
> cn=config.ldif I have:
>
> olcServerID: 1 ldap://ldap.ildetech.com:389/
> olcServerID: 2 ldap://ldap2.ildetech.com:389/
>
> And the syncrepl:
>
> olcSyncRepl: rid=001 provider=ldap://ldap.ildetech.com:389
> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
> credentials="secret" searchbase="dc=st-andes,dc=com"
> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
> credentials="secret" searchbase="dc=st-andes,dc=com"
> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
> olcMirrorMode: TRUE
>
> And, as you can see from the command line, I have the URL specified in
> the -h parameter, but slapd seems to be ignoring it!  Or should I
> specify *all* of the URLs that I put in the -h parameter
> (ldap://ldap2.ildetech.com:389 ldap://127.0.0.1:389/ ldaps:///
> ldapi:///)?
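The question here is how slapd picks its own ID out of a list of `olcServerID` URLs. The sketch below assumes it does so by matching each `olcServerID` URL against the listener URLs given with `-h`; it is not slapd's code, and whether slapd compares the URLs literally or normalizes them is not established by this message. It only shows that, with the values quoted above, a literal string comparison finds no match (the configured URL ends in `/`, the `-h` URL does not), so this hypothetical matcher falls back to ID 0, whereas comparing scheme, host and port finds ID 2. `url_key` and `pick_server_id` are hypothetical names:

```python
from urllib.parse import urlsplit

# Standard default ports; ldapi:// (Unix socket) has no port.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def url_key(url):
    """Reduce an LDAP URL to (scheme, host, port), ignoring a trailing '/'."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def pick_server_id(server_ids, listeners, exact=False):
    """Return the ID whose URL matches one of the listeners, else 0.

    Hypothetical matcher, not slapd's code. exact=True compares the raw
    strings; exact=False compares (scheme, host, port) tuples.
    """
    for sid, url in server_ids:
        for listener in listeners:
            if exact:
                matched = url == listener
            else:
                matched = url_key(url) == url_key(listener)
            if matched:
                return sid
    return 0


# Values quoted in this message (olcServerID lines and the -h URLs).
server_ids = [(1, "ldap://ldap.ildetech.com:389/"),
              (2, "ldap://ldap2.ildetech.com:389/")]
listeners = ["ldap://ldap2.ildetech.com:389", "ldap://127.0.0.1:389/",
             "ldaps:///", "ldapi:///"]

print(pick_server_id(server_ids, listeners, exact=True))  # 0
print(pick_server_id(server_ids, listeners))              # 2
```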
>
> So, I decided to change the config:
>
> On server 1 (kirara):
>
> olcServerID: 1
>
> and
>
> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
> credentials="secret" searchbase="dc=st-andes,dc=com"
> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
> olcMirrorMode: TRUE
>
> On server 2 (happy):
>
> olcServerID: 2
>
> and
>
> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
> credentials="secret" searchbase="dc=st-andes,dc=com"
> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
> olcMirrorMode: TRUE
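The two configurations above can be sanity-checked mechanically: every server ID should be unique, and each server needs at least one syncrepl provider other than itself. A minimal Python sketch over the quoted case 2 values. The mapping of kirara to ldap.ildetech.com and happy to ldap2.ildetech.com is an assumption inferred from the original `olcServerID` lines (ID 1 and ID 2), and `lint` is a hypothetical helper, not an OpenLDAP tool:

```python
from urllib.parse import urlsplit


def host_port(url):
    """(host, port) of an LDAP URL; 389 is assumed when no port is given."""
    parts = urlsplit(url)
    return (parts.hostname or "").lower(), parts.port or 389


def lint(servers):
    """Flag duplicate server IDs and servers with no provider but themselves.

    servers maps a name to {"id": int, "url": own URL, "providers": [URLs]}.
    """
    problems = []
    owner_of_id = {}
    for name, cfg in servers.items():
        sid = cfg["id"]
        if sid in owner_of_id:
            problems.append(f"{name}: server ID {sid} already used by {owner_of_id[sid]}")
        else:
            owner_of_id[sid] = name
        own = host_port(cfg["url"])
        peers = [p for p in cfg["providers"] if host_port(p) != own]
        if not peers:
            problems.append(f"{name}: no syncrepl provider other than itself")
    return problems


# Case 2 values as quoted; the name-to-host mapping is an assumption.
case2 = {
    "kirara": {"id": 1, "url": "ldap://ldap.ildetech.com:389",
               "providers": ["ldap://ldap2.ildetech.com:389"]},
    "happy": {"id": 2, "url": "ldap://ldap2.ildetech.com:389",
              "providers": ["ldap://ldap2.ildetech.com:389"]},
}

for problem in lint(case2):
    print(problem)
# happy: no syncrepl provider other than itself
```

Whether that reflects the deployed config or only a copy-paste slip in this message cannot be told from the text.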
>
> With this new setup, following the same procedure, one entry is missing
> on *both* servers (at least the servers reach a consistent state), but
> I still lose an entry.  The logs for this setup:
>
> Server 2 (ID 2, where I created entry ou2 while the other server was
> down) wrongly decided to delete entry ou2:
>
> http://www.st-andes.com/openldap/case2/log-server2-case2.txt
>
> And the other server (where I created ou1):
>
> http://www.st-andes.com/openldap/case2/log-server1-case2.txt
>
> This one never saw the other entry, ou2.
>
> In both cases, the syncprov overlay had the default configuration:
>
> dn: olcOverlay={0}syncprov
> objectClass: olcOverlayConfig
> objectClass: olcSyncProvConfig
> olcOverlay: {0}syncprov
> structuralObjectClass: olcSyncProvConfig
> entryUUID: 24354488-e5bf-102f-9e6a-ad3cba95f7f1
> creatorsName: cn=config
> createTimestamp: 20110318152128Z
> entryCSN: 20110318152128.935227Z#000000#000#000000
> modifiersName: cn=config
> modifyTimestamp: 20110318152128Z
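The `000` server ID mentioned earlier is the third `#`-separated field of a CSN such as the `entryCSN` above. A minimal Python sketch for pulling a 2.4-style CSN apart, assuming the layout `YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod` with the change count, server ID and modifier fields written in hexadecimal; `parse_csn` is a hypothetical helper:

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# Assumed 2.4 CSN layout: timestamp with microseconds, then hex fields
# '#<change count, 6 digits>#<server ID, 3 digits>#<modifier, 6 digits>'.
CSN_RE = re.compile(
    r"^(\d{14})\.(\d{6})Z#([0-9a-fA-F]{6})#([0-9a-fA-F]{3})#([0-9a-fA-F]{6})$"
)


class CSN(NamedTuple):
    time: datetime
    count: int
    sid: int
    mod: int


def parse_csn(value):
    """Split a CSN string into its parts; raise ValueError if it does not fit."""
    m = CSN_RE.match(value)
    if m is None:
        raise ValueError(f"not a CSN: {value!r}")
    stamp, micros, count, sid, mod = m.groups()
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(
        microsecond=int(micros), tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20110318152128.935227Z#000000#000#000000")  # entryCSN above
print(csn.time.isoformat())  # 2011-03-18T15:21:28.935227+00:00
print(csn.sid)               # 0
```

Running the CSNs from the logs through the same function shows at a glance which server ID each change was stamped with.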
>
> What do you think?
>
> Thanks in advance!
>
> Ildefonso Camargo
>