Syncrepl, and some objectClass errors
Lesley Walker wrote:
> I have spent today dissecting the logs from two incidents this week in
> which entries were erroneously deleted. Although the circumstances of
> the two incidents are quite different, from examining the logs I
> believe it is the same thing happening in each case.
I'm still unable to pinpoint the trigger condition, but I have a better
idea of what happens. I believe it *may* be covered by ITS#4626 and
ITS#4813, so I have built 2.3.35 to run on a test server.
On starting this new version for the first time and letting it build the
database by replication from its provider, I get these messages in the log:
is_entry_objectclass("", "2.5.17.0") no objectClass attribute
is_entry_objectclass("", "2.5.6.1") no objectClass attribute
is_entry_objectclass("", "2.16.840.1.113730.3.2.6") no objectClass attribute
I freely admit that I am not clued-up on schema design, but I have tried
grepping for those numbers in the schema files and in an ldif of the
database and I don't find them.
I note that these same messages were reported in ITS#4626, and wonder
whether there's a connection, or is it a mere coincidence?
I also note that these exact same messages were discussed in December:
http://www.openldap.org/lists/openldap-software/200612/msg00046.html
but this discussion went over my head, so I would welcome any
words-of-one-syllable explanations.
The main problem I'm trying to troubleshoot is this:
In every case, there's a log entry:
do_syncrep2: rid 123 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
followed by some number of these:
syncrepl_entry: rid 123 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
("some number" is MUCH less than the number of records)
then:
do_syncrep2: rid 123 LDAP_RES_INTERMEDIATE - REFRESH_PRESENT
followed by some (other) number of these:
syncrepl_del_nonpresent: rid 123 be_delete uid=whatever,ou=Accounts,dc=example,dc=co,dc=nz (0)
*INCLUDING* be_deletes for nearly ALL the top-level entries:
be_delete cn=root,dc=example,dc=co,dc=nz (0)
be_delete ou=Accounts,dc=example,dc=co,dc=nz (66)
be_delete ou=Mailbox,dc=example,dc=co,dc=nz (66)
be_delete ou=Services,dc=example,dc=co,dc=nz (66)
be_delete ou=Offices,dc=example,dc=co,dc=nz (66)
be_delete ou=Networks,dc=example,dc=co,dc=nz (66)
be_delete ou=Rooms,dc=example,dc=co,dc=nz (66)
be_delete ou=Group,dc=example,dc=co,dc=nz (66)
be_delete ou=EmailLists,dc=example,dc=co,dc=nz (66)
be_delete ou=People,dc=example,dc=co,dc=nz (66)
be_delete ou=Computers,dc=example,dc=co,dc=nz (66)
This would seem to leave the database completely empty, and in a state
where nothing and nobody can authenticate to it. Stopping and restarting
slapd has no effect (because the consumer thinks it is already in sync)
until we repair it by starting with the empty sync cookie.
There have been at least 10 instances of this fault on different servers
in the last 1-2 weeks.
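For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that tallies be_delete lines by result code. It assumes each be_delete event sits on a single log line that ends with the result code in parentheses, as in the excerpts above; the function name tally_be_deletes is made up for illustration. For what it is worth, in the standard LDAP result codes 0 is success and 66 is notAllowedOnNonLeaf.

```python
import re
from collections import Counter

# Assumed line shape, based on the log excerpts in this message:
#   ... be_delete <DN> (<numeric result code>)
BE_DELETE = re.compile(r"be_delete\s+(?P<dn>.+?)\s+\((?P<code>\d+)\)\s*$")


def tally_be_deletes(lines):
    """Count be_delete events per result code.

    Returns a (Counter, list) pair: the Counter maps result code to the
    number of be_delete lines carrying it, and the list holds (dn, code)
    tuples in the order they appeared.
    """
    codes = Counter()
    events = []
    for line in lines:
        match = BE_DELETE.search(line)
        if match is None:
            continue  # not a be_delete line
        code = int(match.group("code"))
        codes[code] += 1
        events.append((match.group("dn"), code))
    return codes, events
```

Feeding it an open log file (any iterable of lines works) gives a quick count of how many deletes succeeded (0) versus how many were refused (66), and the event list shows which DNs were involved.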
Because I can't reproduce the problem on demand, I won't know for sure
whether or not the new version fixes it, but I have built the new
version and am now running it on a test server.
> Here's the environment:
> OpenLDAP 2.3.32 running on Debian 3.1 (Sarge)
> compiled with sync logging patch discussed about 4 months ago
> loglevel config sync on all servers
> BDB 4.2 backend
> Syncrepl replication all round
> A "master" server (com)
> - holds the master copy of the database
> A number of servers that replicate directly from com
> An "intermediate" server (wwsv04) that
> - is on the same LAN and subnet as com
> - replicates from com
> - acts as provider for all other servers
> 88 servers/replicas in total
> Approx 9000 records
> All replicas are supposed to be complete copies
> Nothing particularly fancy or clever going on
--
Lesley Walker
Linux Systems Administrator
Opus International Consultants Ltd
Email lesley.walker@opus.co.nz
Tel +64 4 471 7002, Fax +64 4 473 3017
http://www.opus.co.nz
Level 9 Majestic Centre, 100 Willis Street, PO Box 12 343
Wellington, New Zealand