Re: SLAP_INDEX_SUBSTR_ANY_LEN & co
Howard Chu writes:
> You already know what IF_MINLEN and ANY_LEN are for; they control both
> index generation (which occurs when attributes are stored) and index
> lookup (which occurs when search filters are evaluated). The other two
> values only affect index lookup:
IF_MAXLEN also affects generation, fortunately. At least in
octetStringSubstringsIndexer.
> ANY_STEP has to do with the sliding window that is used to generate
> substring index keys for a value. For example, when indexing the
> attribute "cn=abcdefgh" with a STEP size of 2 a hash key is generated
> for these parts:
> abcd
> cdef
> efgh
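
A minimal sketch of the sliding window described above, to make the
example concrete. This is not slapd code: the function name
substring_windows and its parameters are hypothetical, and real index
keys are hashes of these parts rather than the parts themselves.

```python
def substring_windows(value, length, step):
    """Return the fixed-length parts of `value`, taken every `step` characters.

    Illustrative only: `length` stands in for ANY_LEN and `step` for ANY_STEP.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # A value shorter than `length` yields no parts at all.
    last_start = len(value) - length
    return [value[i:i + length] for i in range(0, last_start + 1, step)]


# The example from the message: a length of 4 and a step of 2.
parts = substring_windows("abcdefgh", 4, 2)
print(parts)
```

With a length of 4 and a step of 2 this gives abcd, cdef and efgh, the
three parts listed in the quoted message.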
Hm. So with ANY_LEN reduced to 3 I'll probably need ANY_STEP of 1 to
keep down the number of false positives.
Looks like there is room for a lot of tuning here - e.g. setting ANY_LEN to
an array {3, 4}, where filtering uses the largest possible value and an
ANY_STEP of something like (the applied ANY_LEN) - 2. And making
ANY_STEP dependent on the substring length and the size of the index or
something, but that seems a lot more hairy.
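
A rough sketch of the first idea, assuming ANY_LEN could be given as a
set of lengths. Nothing like this exists in slapd as far as this thread
says; pick_any_params and lookup_keys are hypothetical names, and the
lower bound of 1 on the step is my own assumption so that a length of 3
or less still advances.

```python
def pick_any_params(substring, any_lens=(3, 4)):
    """Choose the largest ANY_LEN that fits, and a step of (ANY_LEN - 2).

    Returns None when the filter substring is shorter than every
    configured length, in which case the index cannot be used.
    """
    usable = [n for n in any_lens if n <= len(substring)]
    if not usable:
        return None
    any_len = max(usable)
    any_step = max(1, any_len - 2)  # assumed lower bound of 1
    return any_len, any_step


def lookup_keys(substring, any_lens=(3, 4)):
    """List the parts a filter substring would be looked up under."""
    params = pick_any_params(substring, any_lens)
    if params is None:
        return []
    any_len, any_step = params
    last_start = len(substring) - any_len
    return [substring[i:i + any_len] for i in range(0, last_start + 1, any_step)]


print(pick_any_params("abc"))       # falls back to length 3, step 1
print(pick_any_params("abcdefgh"))  # uses length 4, step 2
print(lookup_keys("abcdefgh"))
```

Whether the extra index size for two lengths is worth it would have to
be measured; the sketch only shows how the selection would go.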
> I should point out that our patch also fixes the initial/final behavior:
> if a filter is provided that exceeds the MAXLEN, we no longer ignore the
> excess characters. Instead we combine them with an ANY substring index
> lookup, so that
> cn=abcdefgh*
> is internally equivalent to
> cn=abcd*defgh*
>
> Naturally this doesn't work if subany indexing was not used...
Then I suggest you document under 'index .. subinitial/final' that
turning off subany impacts (attr=foo*bar) searches.
Maybe also add a third constant for the IF_MAXLEN value to use if
subany is disabled, with a larger default value (IF_MAXLEN + ANY_LEN -
ANY_STEP?) so the index will handle (attr=foo*bar) approximately
equally well with and without subany.
Thanks a lot for the explanation - and for the coming patch.
--
Hallvard