[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Syntax/matching rule implementation - floating point numbers

To: Lorenzo Pastrana <lorenzo.pastrana@happyend.fr>
Subject: Re: Syntax/matching rule implementation - floating point numbers
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Wed, 17 Dec 2008 19:09:41 +0100
Cc: openldap-devel@openldap.org
In-reply-to: <1229531966.9989.26.camel@localhost>
References: <1229439364.6606.24.camel@localhost> <hbf.20081216jdy3@bombur.uio.no> <1229443527.6606.35.camel@localhost> <hbf.20081216c1ja@bombur.uio.no> <1229523173.9989.11.camel@localhost> <hbf.20081217vcpi@bombur.uio.no> <1229531966.9989.26.camel@localhost>

Lorenzo Pastrana writes:
>> Depending on your needs you could implement it without indexing support,
>> or at least with only EQUALITY indexing support.
> 
> Actually this is where floats are inherently bat at ...

If you go for floating point instead of fixed point anyway, you could
implement an approx matching rule for this.  The filter (attr~=val)
would search for val*[configurable range, e.g. 0.999999 - 1.000001].

Note that there is no LDAP syntax to assign an attribute an approx
matching rule.  Instead OpenLDAP ties the approx rules to the EQUALITY
matching rules.

If you also want to optimize approx search by supporting indexing, maybe
realApproxFilter() could generate 3 index values to check for: (val, val
+ epsilon, val - epsilon) where epsilon is the smallest significant bit
for <val> in the index.  It might be a bit hairy to come up with an
index format where it's easy to ensure that edge cases work right.

For every possible value, such an index must not have greater precision
than the range which the approx filter checks for (i.e. the range 1 +/-
epsilon must include the range which the approx filter checks for).
That way the indexing will include all candidate entries - possibly plus
some false matches which the matching rule will eliminate later during
the search.

I haven't tested if slapd allows the fooApproxFilter() functions to
generate more than one index key to look up also for approx rules.
Substring filters, at least, do generate several index keys.
The relevant code seems to be just back-bdb/filterindex.c's use of
smr_filter.  slapd collects the index keys generated for the search
filter, then generates the intersection of the sets of entry IDs for
those index keys, and then uses the matching rule to check those
entries.

All this is would be unnecessary for single-valued attributes.  A client
could instead of (attr=4) search for (&(attr>=3.999996)(attr<=4.000004)).
However that doesn't work for multi-valued attributes: The second filter
would return an entry containing attr: <1.0 and 9.0>.

The same goes in part for ORDERING indexing: An approx rule could
generate (attr>=3.999996) and (attr<=4.000004) index keys.  (I think
you'd need to duplicate the index size to contain both increasing and
decreasing keys).  That would eliminate some candidate entries, but how
many would depend on what kind of entries you have.  Given many
multi-valued entries with values both below and above a the value you
search for, the indexing would return many false matches which slapd
would need to check against the matching rule.

-- 
Hallvard

References:
- Syntax/matching rule implementation - floating point numbers
  - From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
- Re: Syntax/matching rule implementation - floating point numbers
  - From: Lorenzo Pastrana <lorenzo.pastrana@happyend.fr>
- Re: Syntax/matching rule implementation - floating point numbers
  - From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
- Re: Syntax/matching rule implementation - floating point numbers
  - From: Lorenzo Pastrana <lorenzo.pastrana@happyend.fr>

Prev by Date: Re: Syntax/matching rule implementation - floating point numbers
Next by Date: hide attribute
Index(es):
- Chronological
- Thread