[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: back-bdb IDL limitations
- To: Jonghyuk Choi <jongchoi@us.ibm.com>
- Subject: Re: back-bdb IDL limitations
- From: Howard Chu <hyc@symas.com>
- Date: Fri, 16 Sep 2005 07:37:06 -0700
- Cc: openldap-devel@OpenLDAP.org
- In-reply-to: <422892B8.6020501@symas.com>
- References: <OF8ABE6ECB.AED71D59-ON85256FBA.0025F9A6-85256FBA.002E7330@us.ibm.com> <422892B8.6020501@symas.com>
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050909 SeaMonkey/1.1a
Howard Chu wrote:
Jonghyuk Choi wrote:
I think this bit-mapped approach provides more workable solution than
the range ID solution for the case where IDs in an IDL are rather
contiguous. A 32 bit segment bit map covers 8 entries having
contiguous IDs without increasing storage requirement and with the
ability of pinpointing the exact indexing status of the 8 entries in
the contigous range. On the other hand, the range ID consumes double
the storage, but it cannot pinpoint the indexing status of the
indiviual entries in the range once it is scaled. A 64 bit segment
bit map covers 32 entries having contigous IDs with the same amount
of storage as in the range ID scheme, but it can provide exact
indexing for those 32 entries unlike the range ID approach with scaling.
The time we're losing in back-hdb for sorting IDLs has me looking into
this again. I'm only considering 32 bit IDs at the moment because I
can't see a need for 64 bit IDs yet and I haven't figured out where to
arbitrarily cut off (40, 48 bits?).
I'm thinking of an in-memory structure like this: the bottom 5 bits of
the ID occupy the 32 bit map. A chunk of 4096 of these maps is allocated
contiguously (comprising the next 12 bits of the ID). An array of 32768
slots points to these chunks (comprising the top 15 bits of the ID).
This IDL structure has sufficient dynamic range to represent the full 32
bits worth of IDs. Currently our in-memory IDL is 512KB. In 512KB we can
accomodate 128 chunks, so a single 512KB IDL can represent 24 bits worth
of IDs, 16,777,216 IDs. I think that's pretty decent for most purposes,
and it's not too impractical to increase this on larger machines to span
a couple more bits. We still need a representation for ranges or All
IDs, but the likelihood of needing ranges will be very very small; it
would be nice to eliminate that entirely but the All IDs concept is
still useful.
The on-disk format will be 256 bits of map (bottom 8 bits of ID) with a
24 bit prefix, so there are no wasted bits. Fortunately BDB allows us to
do partial writes so we can update individual bytes when we're doing
single-entry IDL updates.
The actual in-memory definitions would be something like this:
typedef uin32_t IDchunk[4096];
typedef struct IDL {
char slots[32768]; /* array of ID slots */
IDchunk chunks[125]; /* ID storage space to be allocated as needed */
char freechunks[16]; /* bitmap of free chunks */
} IDL;
That's (512K -16368) bytes for an in-memory IDL that can represent
16,384,000 IDs. Perhaps we can come up with a better layout. A slot will
contain a value 1-125 pointing to the chunk that it uses, 0 meaning an
unused slot.
Another alternative is to keep a totally separate list of chunks, and
attach them on demand. With that approach there would be no reason to
enforce a hard limit on the total size of an IDL.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/