RE: Sleepycat and hash functions
> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Kurt D. Zeilenga
> At 11:56 AM 12/14/2002, Hallvard B Furuseth wrote:
> >http://www.openldap.org/faq/index.cgi?file=756 says one can use BDB's
> >db_dump utility for on-line backups while slapd is running.
> >The manpage continues:
> >
> > Dumping and reloading Btree databases that use user-defined prefix or
> > comparison functions will result in new databases that use the default
> > prefix and comparison functions. In this case, it is quite likely
> > that the database will be damaged beyond repair permitting neither
> > record storage or retrieval.
> >
> >Since db_dump is recommended, I take it back-bdb does not use Btree
> >databases?
>
> It does... but not with a user-defined comparison function.
> User-defined comparison function is only defined for index
> databases (which use DB_HASH). So, if they are slower, one
> can always recreate them using slapindex(8).
Actually, there are a number of issues here that affect Little-Endian
machines. BerkeleyDB's default comparison functions are all byte-oriented,
like strcmp and memcmp. When integer data is stored in Little-Endian order,
a byte-wise comparison does not sort the items into proper numerical order.
The reason we used a non-default comparison function was to preserve the
proper sort order without changing the stored byte order. Note that on a
Big-Endian machine there are no problems whatsoever.
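
To make the sorting problem concrete, here is a small sketch in Python (not
back-bdb code). It assumes entry IDs are 4-byte unsigned integers, which is an
assumption for illustration only; the sample IDs are made up. Python compares
bytes objects byte by byte, much like memcmp.

```python
import struct

# Hypothetical entry IDs, chosen to cross byte boundaries.
ids = [1, 2, 255, 256, 257, 65536]

def sort_bytewise(fmt):
    """Pack each ID with the given struct format, sort the raw bytes
    lexicographically (memcmp-style), and decode the result."""
    packed = sorted(struct.pack(fmt, i) for i in ids)
    return [struct.unpack(fmt, p)[0] for p in packed]

le_order = sort_bytewise("<I")  # little-endian storage
be_order = sort_bytewise(">I")  # big-endian storage

print(le_order)  # [65536, 256, 1, 257, 2, 255]
print(be_order)  # [1, 2, 255, 256, 257, 65536]
```

With little-endian storage the least significant byte is compared first, so
the byte-wise order has little to do with the numeric order.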
The non-default comparison function affects both the id2entry database (which
is a Btree) and the index databases. Even though the index databases are
keyed with hashes, their data are numeric (lists of entry IDs) stored using
the Sorted Duplicates feature, so the duplicates are ordered by the same
non-default comparison. I believe, on a Little-Endian machine, using
db_dump on an index database will fail because the data items "are out of
order."
The id2entry database is keyed on entry ID, and it does indeed use a
non-default comparison function. db_dump on a Little-Endian machine should
fail there too.
A similar problem used to exist in back-ldbm, but it only affected the
id2entry database, and it was fixed by byteswapping the entry IDs before
reading/writing entries.
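
The byteswapping idea can be sketched as follows. This is an illustration in
Python, not the back-ldbm code; the function names and the 4-byte ID width are
assumptions. The ID is converted to big-endian before it is used as a key, so
the default byte-wise comparison already matches numeric order.

```python
import struct

def id_to_key(entry_id):
    """Encode an entry ID big-endian so byte-wise order equals numeric order."""
    return struct.pack(">I", entry_id)

def key_to_id(key):
    """Decode a big-endian key back into the entry ID."""
    return struct.unpack(">I", key)[0]

# Hypothetical IDs in arbitrary order.
ids = [65536, 2, 257, 1, 256, 255]

# Sorting the raw keys (memcmp-style) now yields numeric order.
keys = sorted(id_to_key(i) for i in ids)
round_trip = [key_to_id(k) for k in keys]
print(round_trip)  # [1, 2, 255, 256, 257, 65536]
```

The cost is one swap on every read and write, which is cheap for a single key
but adds up when every data item in a long ID list needs it.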
Given the large volume of byteswapping that would be needed to correct this
sorting issue for back-bdb's index databases, I chose instead to use an
alternate compare function. I think we should update the documentation to
note that db_dump/db_load must not be used on Intel and other Little-Endian
machines.
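
For comparison, here is a sketch of the alternate-compare approach, again in
Python and not the actual back-bdb function. It assumes 4-byte little-endian
IDs and uses a hypothetical name, compare_le_ids. The stored bytes are left
untouched; only the comparison decodes them.

```python
import struct
from functools import cmp_to_key

def compare_le_ids(a, b):
    """Compare two little-endian 4-byte IDs numerically.
    Returns a negative, zero, or positive value, like memcmp."""
    x = struct.unpack("<I", a)[0]
    y = struct.unpack("<I", b)[0]
    return (x > y) - (x < y)

# Hypothetical IDs stored in little-endian order, as on an Intel machine.
stored = [struct.pack("<I", i) for i in [65536, 2, 257, 1, 256, 255]]

ordered = sorted(stored, key=cmp_to_key(compare_le_ids))
ordered_ids = [struct.unpack("<I", k)[0] for k in ordered]
print(ordered_ids)  # [1, 2, 255, 256, 257, 65536]
```

A tool that only knows the default byte-wise comparison would see these same
items as out of order, which is why dumping and reloading such a database is
unsafe.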
I'm pretty sure all of this was discussed on the -devel list 'way back when
we were implementing...
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support