[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
back-bdb IDL limitations
- To: openldap-devel@OpenLDAP.org
- Subject: back-bdb IDL limitations
- From: Howard Chu <hyc@symas.com>
- Date: Thu, 03 Mar 2005 14:35:31 -0800
- User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a6) Gecko/20050111
At the risk of seriously complicating the IDL code, I wonder if it's
time to investigate a more compact IDL representation - perhaps using
ranges exclusively, where each IDL is a list of ranges.
There would still be pathological cases where discontiguous IDs would
consume a lot of space, but they should be pretty rare.
IDL[0] would still hold the number of slots in use. Each slot is a lo/hi
pair. So we can express any contiguous range with a single pair, and
handle discontinuities with additional ranges. An IDL cursor would be a
(slot,value) pair instead of the single integer we currently use. A
scheme like this could dramatically reduce the size of IDLs overall.
Borrowing from some other data compression techniques - we can also
assign a scale factor if needed, to further expand the dynamic range of
the IDLs.
For example, assume we have an IDL that can only accomodate 4 slots, and
each slot starts with a scale factor of 1. The first slot carries a
range 1-5, then the data skips one, so we use a second slot for IDs
7-12, skip a few more, slot 3 covers 15-17, skip some more, and slot 4
fills up with the range 20-30. Now we want to insert the value 32, but
there are no more slots free. We bump up the scale factor to 2. This
means that skips of 2 or less are treated as contiguous. (Think of it as
rounding, coarsening the granularity, or shifting one bit.) So the first
slot covering 1-5 is now considered contiguous with the second slot at
7-12, so they're combined into one slot. The next slot remains
unchanged, and the next slot 20-30 is contiguous with 32, so it becomes
20-32.
So we started with this IDL:
4,1 4 slots, scale factor 1
1,5
7,12
15,17
20,30
And ended with this IDL:
3,2 3 slots, scale factor 2
1,12
15,17
20,32
Obviously for the real implementation we would use more than 4 slots.
But I don't think a fixed size implementation like we have now is useful
for the sizes of databases we're seeing more and more. Nor is collapsing
it all into a single range very helpful for indexed lookups; the quality
of the indices really suffers. If we use a dynamic scaling approach like
I'm talking about, there will still be some losses in granularity but
the effectiveness of the indices will degrade much more slowly as the
database sizes grow.
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support