[Date Prev][Date Next]
[Chronological]
[Thread]
[Top]
Re: IDL bit vectors
Howard Chu writes:
> To avoid wasting those 5 unneeded bits in the base, we could use them as a
> run-length counter and use multiple vectors per base. But it might be better
> to slide things over and use just a 24 bit base, and use the bottom 8 bits of
> the word as a bitmask to represent which following vectors are present. E.g.
With variable-length records you can't do a binary search for an entry
ID. I don't know how significant that is though. And you can get
around it, at some expense:
> ......01 means 1 vector follows, representing base+ 0-31.
> ......02 means 1 vector follows, representing base+32-63.
That can be normalized so bottom bit == 1: (base+32, 01). If we also
don't use entry IDs with (ID & 31) == 0, we can look at the bottom bit
of a word in the IDL to tell if it is a vector header or a vector word.
> That allows us to represent up to 256 entryIDs in only 288 bits
> (instead of the 16384 bits we currently need).
Though in sparse IDLs, worst case will be 2 words per entry ID.
OTOH if we have exact ranges, best case is 2 words for any number of
entries:-) I don't know how they are stored now though, you mentioned
loss of precision so I assume approximate ranges are involved but a
glance at the code doesn't reveal where.
> That would allow us to track about 1.86M entryIDs in the same space
> we currently use to track only 64K entryIDs.
>
> It seems we would still need some kind of range representation to
> handle IDLs with more than 1.86M entries.
And some way to tell that from vectors, possibly using up another bit as
a flag in the vector header. Alternatively, don't use IDs where (ID &
31) == 2 (or 31) either, and use that bit in the first vector words as a
flag.
Do you have any stats or feeling about how sparse the average IDL of any
size is? Or how much ranges improve over lists of single IDLs, for that
matter? I suppose it matters somewhat in temporary IDLs in memory as
well as in files...
--
Hallvard