[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: IDL bit vectors

To: Howard Chu <hyc@symas.com>
Subject: Re: IDL bit vectors
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Mon, 20 Oct 2008 13:57:19 +0200
Cc: OpenLDAP Devel <openldap-devel@openldap.org>
In-reply-to: <48FB0920.6080205@symas.com>
References: <48FB0920.6080205@symas.com>

Howard Chu writes:
> To avoid wasting those 5 unneeded bits in the base, we could use them as a 
> run-length counter and use multiple vectors per base. But it might be better 
> to slide things over and use just a 24 bit base, and use the bottom 8 bits of 
> the word as a bitmask to represent which following vectors are present. E.g.

With variable-length records you can't do a binary search for an entry
ID.  I don't know how significant that is though.  And you can get
around it, at some expense:

> ......01 means 1 vector follows, representing base+ 0-31.
> ......02 means 1 vector follows, representing base+32-63.

That can be normalized so bottom bit == 1: (base+32, 01).  If we also
don't use entry IDs with (ID & 31) == 0, we can look at the bottom bit
of a word in the IDL to tell if it is a vector header or a vector word.

> That allows us to represent up to 256 entryIDs in only 288 bits
> (instead of the 16384 bits we currently need).

Though in sparse IDLs, worst case will be 2 words per entry ID.

OTOH if we have exact ranges, best case is 2 words for any number of
entries:-) I don't know how they are stored now though, you mentioned
loss of precision so I assume approximate ranges are involved but a
glance at the code doesn't reveal where.

> That would allow us to track about 1.86M entryIDs in the same space
> we currently use to track only 64K entryIDs.
> 
> It seems we would still need some kind of range representation to
> handle IDLs with more than 1.86M entries.

And some way to tell that from vectors, possibly using up another bit as
a flag in the vector header.  Alternatively, don't use IDs where (ID &
31) == 2 (or 31) either, and use that bit in the first vector words as a
flag.

Do you have any stats or feeling about how sparse the average IDL of any
size is?  Or how much ranges improve over lists of single IDLs, for that
matter?  I suppose it matters somewhat in temporary IDLs in memory as
well as in files...

-- 
Hallvard

Follow-Ups:
- Re: IDL bit vectors
  - From: Howard Chu <hyc@symas.com>

References:
- IDL bit vectors
  - From: Howard Chu <hyc@symas.com>

Prev by Date: Re: Managing multiple regex matches
Next by Date: Re: commit: ldap/servers/slapd/back-null null.c
Index(es):
- Chronological
- Thread