Re: back-mdb - futures...

To: OpenLDAP Devel <openldap-devel@openldap.org>
Subject: Re: back-mdb - futures...
From: Howard Chu <hyc@symas.com>
Date: Tue, 08 Sep 2009 01:04:43 -0700
In-reply-to: <4A0F924B.6050405@symas.com>
References: <4A0F924B.6050405@symas.com>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; rv:1.9.1b5pre) Gecko/20090819 SeaMonkey/2.0a1pre Firefox/3.0.3

Howard Chu wrote:

The basic idea is to construct a database that is always mmap'd to a fixed
virtual address, and which returns its mmap'd data pages directly to the
caller (instead of copying them to a newly allocated buffer). Given a fixed
address, it becomes feasible to make the on-disk record format identical to
the in-memory format. Today we have to convert from a BER-like encoding into
our in-memory format, and while that conversion is fast it still takes up a
measurable amount of time. (Which is one reason our slapd entry cache is still
so much faster than just using BDB's cache.) So instead of storing offsets
into a flattened data record, we store actual pointers (since they all simply
reside in the mmap'd space).
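
To make the pointer idea concrete, here is a minimal sketch, not actual
back-mdb code. The address MAP_ADDR, the size MAP_SIZE, struct record, and both
function names are assumptions made up for illustration. It uses an anonymous
mapping to stay self-contained; a real database would map a file. The address
is passed only as a hint, and the sketch gives up if the kernel places the
mapping somewhere else.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical values for illustration; not taken from any real code. */
#define MAP_ADDR ((void *)0x200000000000ULL) /* assumed free, 64-bit only */
#define MAP_SIZE ((size_t)1 << 20)

/* Hypothetical record: the value is referenced by an absolute pointer
 * rather than by an offset into a flattened buffer. */
struct record {
    size_t len;   /* length of the value in bytes */
    char  *val;   /* points into the same mapping */
};

/* Map a region at the hinted address. Returns NULL if the mapping fails
 * or lands anywhere other than MAP_ADDR, since stored pointers would
 * then be meaningless. */
static void *map_at_fixed_address(void)
{
    void *p = mmap(MAP_ADDR, MAP_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != MAP_ADDR) {
        munmap(p, MAP_SIZE);
        return NULL;
    }
    return p;
}

/* Write a record header at the start of the mapping, followed directly
 * by its value bytes, and store a real pointer to those bytes. */
static struct record *store_record(void *base, const char *data, size_t len)
{
    struct record *r = base;
    r->len = len;
    r->val = (char *)base + sizeof *r;
    memcpy(r->val, data, len);
    return r;
}
```

A reader would simply follow r->val to reach the data in place, with no decode
step and no copy. That only holds as long as every process sees the mapping at
the same address, which is why the sketch refuses to continue otherwise.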

One stumbling block: on Little-Endian machines, of which we seem to be cursed
with an overabundance these days, the in-memory format for integers makes a
terrible format for database keys. Byte-swapping them between on-disk and
in-memory would completely defeat the mmap'ing scheme. So there's two choices:
store them Little-Endian on disk, and use a reverse-order key comparison
function (which we did back in OpenLDAP 2.1). This would break portability of
the database files to other machines using Big-Endian format.
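
As a rough illustration of what a reverse-order key comparison could look like
(this is a sketch, not the OpenLDAP 2.1 function; the name cmp_le_reverse is
made up here), the keys are compared starting from the last byte, which is the
most significant one in a Little-Endian integer:

```c
#include <stddef.h>
#include <string.h>

/* Compare two equal-length Little-Endian keys in numeric order by
 * walking from the last (most significant) byte down to the first.
 * Returns <0, 0, or >0 like memcmp. */
static int cmp_le_reverse(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = (const unsigned char *)a + len;
    const unsigned char *pb = (const unsigned char *)b + len;

    while (len--) {
        int d = *--pa - *--pb;
        if (d != 0)
            return d;
    }
    return 0;
}
```

For example, the 4-byte Little-Endian encodings of 1 (01 00 00 00) and 256
(00 01 00 00) sort the wrong way round under a plain memcmp, because the first
byte of 1 is larger. Comparing from the far end puts 1 before 256.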

The other alternative is to store them in Big-Endian format, and just use them
in their reversed order in memory. That would allow the database files to
remain portable and eliminate the need for alternate key comparison functions.
But it would require a custom iterator to do in-order traversals and entryID
sorting comparisons.
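
A sketch of what that might involve, under the assumption that IDs are kept as
raw Big-Endian byte strings and never swapped into native integers. Both
function names are hypothetical. Ordering comes for free from memcmp, but
stepping to the next ID has to be done by hand, byte by byte:

```c
#include <stddef.h>
#include <string.h>

/* Order two equal-length Big-Endian IDs. The most significant byte
 * comes first, so plain memcmp already yields numeric order. */
static int be_id_cmp(const unsigned char *a, const unsigned char *b,
                     size_t len)
{
    return memcmp(a, b, len);
}

/* Advance a Big-Endian ID by one in place, carrying upward from the
 * last (least significant) byte. Returns 1 on success, or 0 if the ID
 * wrapped around to all zeros. */
static int be_id_next(unsigned char *id, size_t len)
{
    while (len--) {
        if (++id[len] != 0)
            return 1;   /* no carry out of this byte, done */
    }
    return 0;           /* every byte overflowed */
}
```

On a Little-Endian machine the hand-rolled increment replaces what would
otherwise be a single native add, which hints at the runtime cost of this
option.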

At this point I'm leaning toward the former choice: store in native byte order
and sacrifice portability. The alternative will have too big an impact on
runtime performance. With the native byte order choice, this means if you ever
want to cluster a bunch of servers on the same database they will all need to
use the same byte order. (And of course, the same word size, which is the same
requirement we have today.)

(Too bad C doesn't give us a "byteswapped" data attribute; some CPU
architectures have instructions that can load a word from memory in a byte
order that you choose. That would make life easier here, but if your CPU was
that smart, it probably wouldn't be using brain-damaged byte order in the
first place. Oh well...)

(And yes, by the way, we have planning for LDAPCon2009 this September in the
works; I imagine the Call For Papers will go out in a week or two. So now's a
good time to pull up whatever other ideas you've had in the back of your mind
for a while...)


Reminder: LDAPCon2009 is just a couple weeks away!

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Follow-Ups:
- Re: back-mdb - futures...
  - From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
