[LMDB] getting MDB_CORRUPTED when deleting within a DUPSORT database
- To: "OpenLDAP, Technical" <openldap-technical@openldap.org>
- Subject: [LMDB] getting MDB_CORRUPTED when deleting within a DUPSORT database
- From: Klaus Malorny <Klaus.Malorny@knipp.de>
- Date: Mon, 20 Mar 2017 14:19:11 +0100
- Content-language: en-US
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Thunderbird/55.0a1
Hi,
I am using version 0.9.20 on Linux (Ubuntu derivatives; for uname output see
[1], [2]). One of the databases is used as an index to another database and was
therefore created with the MDB_DUPSORT flag. Running my software in a test
environment generated about 33 million entries in this database. To rule out a
suspicion that my software does not perform housekeeping correctly, I copied
the LMDB file to my workstation and forced my software to delete all "legal"
entries in order to see whether any entries remain. Unfortunately, I got an
MDB_CORRUPTED error during the delete operation on that database.
Some more details: The deletion takes place in multiple steps, by calling a
function that deletes a range of keys from a database several times. The code
is as follows (with some boilerplate omitted):
    unsigned int dbFlags;
    int error = mdb_dbi_flags (txn, dbi, &dbFlags);
    // [...]
    bool isDupSort = dbFlags & MDB_DUPSORT;
    error = mdb_cursor_open (txn, dbi, &cursor);
    // [...]
    error = mdb_cursor_get (cursor, &ckey, &cdata, MDB_SET_RANGE);
    while (error != MDB_NOTFOUND)
    {
        // [...]
        int compResult = mdb_cmp (txn, dbi, &ckey, &ekey);
        if (compResult > 0 || (!compResult && !endIsInclusive))
            break;
        error = mdb_cursor_del (cursor, isDupSort ? MDB_NODUPDATA : 0);
        // [...]
        error = mdb_cursor_get (cursor, &ckey, &cdata, MDB_NEXT);
    }
    mdb_cursor_close (cursor);
Is this the correct way to delete the data? The MDB_CORRUPTED error occurs in
the mdb_cursor_del call. Other operations on that specific database are mdb_put
(with no flags) and mdb_del, supplying both key and data.
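For illustration, the stop condition of the loop above can be isolated into a
small helper. This is only a sketch: the name past_range_end is hypothetical
and not part of LMDB, and the cmp argument is assumed to be the value returned
by mdb_cmp for the current key against the end key.

```c
#include <stdbool.h>

/* Returns true when the cursor has moved past the end of the range.
 * cmp: result of comparing the current key with the end key
 *      (negative, zero, or positive).
 * end_is_inclusive: whether a key equal to the end key still belongs
 *      to the range and should be deleted. */
static bool past_range_end(int cmp, bool end_is_inclusive)
{
    return cmp > 0 || (cmp == 0 && !end_is_inclusive);
}
```

With an inclusive end, a key equal to the end key is still deleted; with an
exclusive end, the loop stops on it.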
One side observation: In a similar test with a lower number of entries, the
database was completely emptied. However, the mdb_stat function still reported a
fairly large number of entries for the database (a five- to six-digit figure).
I also use the stat data to estimate the size of the database by adding up all
page counts and multiplying the sum by the page size. The result puzzles me, as
it is lower than I expected (it is roughly the net size of only the data part
of the entries).
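The size estimate described above amounts to the following arithmetic. This is
a minimal sketch, not my actual code: the struct db_stat_sketch is a local
stand-in (an assumption made so the example compiles without lmdb.h) for the
ms_psize, ms_branch_pages, ms_leaf_pages and ms_overflow_pages fields of
MDB_stat, and estimate_db_bytes is a hypothetical name.

```c
#include <stddef.h>

/* Local stand-in for the page-related fields of LMDB's MDB_stat.
 * Declared here only so the sketch is self-contained. */
struct db_stat_sketch {
    unsigned int psize;     /* page size in bytes (ms_psize) */
    size_t branch_pages;    /* number of branch pages (ms_branch_pages) */
    size_t leaf_pages;      /* number of leaf pages (ms_leaf_pages) */
    size_t overflow_pages;  /* number of overflow pages (ms_overflow_pages) */
};

/* Sum all page counts and multiply the sum by the page size. */
static size_t estimate_db_bytes(const struct db_stat_sketch *st)
{
    size_t pages = st->branch_pages + st->leaf_pages + st->overflow_pages;
    return pages * (size_t)st->psize;
}
```

For example, 2 branch pages, 10 leaf pages and 3 overflow pages at a page size
of 4096 bytes give 15 * 4096 = 61440 bytes.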
I guess that the provided information might not be sufficient to find the
problem. What additional information would be helpful? How can I test whether
the database is already corrupt at the start of the deletion or whether it
becomes corrupt during the deletion (I guess the latter)? Should I attempt to
write a specific test case? I could reproduce the error a second time by
running my software from scratch, but I don't know to what extent the data
pattern affects the problem or whether I can reproduce this pattern artificially.
Regards,
Klaus
[1] Linux aaa 4.4.0-65-generic #86-Ubuntu SMP Thu Feb 23 17:49:58 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
[2] Linux bbb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:06:06 UTC 2017 x86_64
x86_64 x86_64 GNU/Linux