Re: back-mdb - futures...
Howard Chu wrote:
The basic idea is to construct a database that is always mmap'd to a fixed
virtual address, and which returns its mmap'd data pages directly to the
caller (instead of copying them to a newly allocated buffer). Given a fixed
address, it becomes feasible to make the on-disk record format identical to
the in-memory format. Today we have to convert from a BER-like encoding into
our in-memory format, and while that conversion is fast it still takes up a
measurable amount of time. (Which is one reason our slapd entry cache is still
so much faster than just using BDB's cache.) So instead of storing offsets
into a flattened data record, we store actual pointers (since they all simply
reside in the mmap'd space).
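
As a rough illustration of the fixed-address idea (not the actual back-mdb code), the sketch below maps a file at a caller-chosen address and stores a real pointer inside the mapped record. The names `struct record` and `map_db_at` are hypothetical. It assumes a POSIX system, and it passes the address as a hint rather than using MAP_FIXED so that an existing mapping is never silently replaced.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical record layout: `name` is a real pointer into the
 * mapping, not an offset that must be fixed up after loading. */
struct record {
    size_t name_len;
    char  *name;
};

/* Map `size` bytes of `fd` at exactly `addr`.
 * The address is passed as a hint; if the kernel places the mapping
 * anywhere else, the stored pointers would be invalid, so the mapping
 * is released and NULL is returned. */
static void *map_db_at(int fd, void *addr, size_t size)
{
    void *p = mmap(addr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != addr) {
        munmap(p, size);
        return NULL;
    }
    return p;
}
```

As long as every process maps the file at the same address, a `struct record *` read from the map can be handed to the caller as-is, with no decoding or copying step.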
One stumbling block: on Little-Endian machines, of which we seem to be cursed
with an overabundance these days, the in-memory format for integers makes a
terrible format for database keys. Byte-swapping them between on-disk and
in-memory would completely defeat the mmap'ing scheme. So there are two
choices. The first is to store them Little-Endian on disk, and use a
reverse-order key comparison function (which we did back in OpenLDAP 2.1).
This would break portability of the database files to other machines using
Big-Endian format.
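
To make the first option concrete, here is a minimal sketch of a reverse-order comparison for keys stored Little-Endian. The names `cmp_key_reverse` and `put_le32` are made up for illustration; this is not the comparison function OpenLDAP 2.1 actually used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two equal-length keys holding unsigned integers stored
 * little-endian. The most significant byte is the last one, so the
 * comparison walks from the end of the key toward the start. */
static int cmp_key_reverse(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = a, *pb = b;
    while (len-- > 0) {
        if (pa[len] != pb[len])
            return pa[len] < pb[len] ? -1 : 1;
    }
    return 0;
}

/* Demo helper: write v as little-endian bytes regardless of host order. */
static void put_le32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v);
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}
```

A plain memcmp() on such keys sorts 256 (bytes 00 01 00 00) ahead of 1 (bytes 01 00 00 00), which is why the default byte-wise comparison cannot be used with this layout.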
The other alternative is to store them in Big-Endian format, and just use them
in their reversed order in memory. That would allow the database files to
remain portable and eliminate the need for alternate key comparison functions.
But it would require a custom iterator to do in-order traversals and entryID
sorting comparisons.
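
A sketch of what the second option might involve, assuming 32-bit IDs purely for illustration (the type `be_id` and its helpers are hypothetical): the IDs stay in Big-Endian byte order in memory, so memcmp() orders them correctly, but stepping to the next ID and reading the numeric value both need custom code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical ID type kept in big-endian byte order, even in memory. */
typedef struct { unsigned char b[4]; } be_id;

static be_id be_id_from(uint32_t v)
{
    be_id id;
    id.b[0] = (unsigned char)(v >> 24);
    id.b[1] = (unsigned char)(v >> 16);
    id.b[2] = (unsigned char)(v >> 8);
    id.b[3] = (unsigned char)(v);
    return id;
}

static uint32_t be_id_value(be_id id)
{
    return ((uint32_t)id.b[0] << 24) | ((uint32_t)id.b[1] << 16) |
           ((uint32_t)id.b[2] << 8)  |  (uint32_t)id.b[3];
}

/* Byte-wise comparison equals numeric order for big-endian keys,
 * so this works directly as a qsort() comparator. */
static int be_id_cmp(const void *x, const void *y)
{
    return memcmp(x, y, sizeof(be_id));
}

/* Custom "iterator" step: increment in place, carrying from the
 * least significant (last) byte toward the first. */
static void be_id_next(be_id *id)
{
    for (int i = 3; i >= 0; i--)
        if (++id->b[i] != 0)
            break;
}
```

Every arithmetic use of an ID goes through helpers like these instead of native integer operations, which is the kind of runtime overhead this option carries.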
At this point I'm leaning toward the former choice: store in native byte order
and sacrifice portability. The alternative will have too big an impact on
runtime performance. With the native byte order choice, this means if you ever
want to cluster a bunch of servers on the same database they will all need to
use the same byte order. (And of course, the same word size, which is the same
requirement we have today.)
(Too bad C doesn't give us a "byteswapped" data attribute; some CPU
architectures have instructions that can load a word from memory in a byte
order that you choose. That would make life easier here, but if your CPU was
that smart, it probably wouldn't be using brain-damaged byte order in the
first place. Oh well...)
(And yes, by the way, planning for LDAPCon2009 this September is in the
works; I imagine the Call For Papers will go out in a week or two. So now's a
good time to pull up whatever other ideas you've had in the back of your mind
for a while...)
Reminder: LDAPCon2009 is just a couple weeks away!
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/