Re: LDAPprep: mapping of " " values
I don't know how complex substring matching should get but it does seem
to me that there is a use for being able to express word-aligned
substring matches without jumping through a lot of hoops or thinking
deeply about edge cases.
How about the following:
-- existing wording of substring matching rule
The rule evaluates to TRUE if and only if the prepared substrings of the
assertion value match disjoint portions of the prepared attribute value
character string in the order of the substrings in the assertion value,
and
an <initial> substring, if present, matches the beginning of the
prepared attribute value character string, and
a <final> substring, if present, matches the end of the prepared
attribute value character string
-- proposed addition:
, and
an <any> substring, if present and starting with an insignificant space
as per [strprep], either matches the beginning of the prepared attribute
value character string or matches the prepared attribute value character
string at a position immediately following a "breaking space", and
an <any> substring, if present and ending with an insignificant space
as per [strprep], either matches the end of the prepared attribute value
character string or matches the prepared attribute value character
string at a position immediately preceding a "breaking space"
where "breaking space" is defined (similarly to [strprep]) as a SPACE
(U+0020) code point that is not followed by a combining mark.
-- end of proposed addition.
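
To make the proposed wording concrete, here is a rough Python sketch of
how a single <any> substring might be tested under it. It is only an
illustration, not the [strprep] algorithm: the function names are
invented, the attribute value is assumed to be already trimmed with
single spaces between words, "combining mark" is approximated as any
character in Unicode general category M*, and case folding is ignored.

```python
import unicodedata

SPACE = "\u0020"

def is_combining(ch):
    # Assumption: "combining mark" means Unicode general category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")

def is_breaking_space(s, i):
    # A "breaking space" is a SPACE that is not followed by a combining mark.
    if s[i] != SPACE:
        return False
    return i + 1 >= len(s) or not is_combining(s[i + 1])

def word_aligned_match(value, sub):
    """Return True if the <any> substring `sub` matches `value`.

    A leading or trailing breaking space in `sub` is treated as an
    insignificant space that pins the match to a word boundary.
    """
    lead = bool(sub) and is_breaking_space(sub, 0)
    trail = len(sub) > 1 and sub.endswith(SPACE)
    core = sub[int(lead):len(sub) - int(trail)]
    if not core:
        # Substrings consisting only of spaces are outside this sketch.
        return False
    start = value.find(core)
    while start != -1:
        end = start + len(core)
        ok_lead = (not lead) or start == 0 or is_breaking_space(value, start - 1)
        ok_trail = (not trail) or end == len(value) or is_breaking_space(value, end)
        if ok_lead and ok_trail:
            return True
        start = value.find(core, start + 1)
    return False
```

With this sketch, the substring " John " matches "P. John Ellsworth"
and "John P. Ellsworth" but not "Johnson Ellsworth", and "John " does
not match a value in which the space after "John" carries a combining
accent.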
This would have the effect that initial and final spaces in substring
matches would have the intuitive meaning of restricting the substring
match to a word boundary. It would also deal with the issue of spaces
used as base characters for freestanding combining characters, since
these would not count as "breaking spaces" or "insignificant spaces".
This would allow, for example, the filter:
(cn=* John * Ellsworth)
to match:
John Ellsworth
P. John Ellsworth
John P. Ellsworth
which is probably what was intended.
It would also allow the filter:
(|(description=* Angola *)(description=* Mozambique *))
to find references to lusophone African countries, whether or not the
keywords appeared in initial or final positions.
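
As a rough illustration of how that OR filter would behave, the
following Python sketch evaluates it against a few made-up description
values. The helper names are hypothetical, each keyword stands for an
<any> substring with an insignificant space on both sides, combining
marks are approximated as Unicode general category M*, and the case
folding that the real matching rule would apply is not modelled.

```python
import unicodedata

def breaking(s, i):
    # SPACE (U+0020) that is not followed by a combining mark (category M*).
    return s[i] == " " and (
        i + 1 == len(s) or not unicodedata.category(s[i + 1]).startswith("M")
    )

def any_match(value, word):
    # Models the assertion substring " word ": aligned on both sides.
    start = value.find(word)
    while start != -1:
        end = start + len(word)
        left_ok = start == 0 or breaking(value, start - 1)
        right_ok = end == len(value) or breaking(value, end)
        if left_ok and right_ok:
            return True
        start = value.find(word, start + 1)
    return False

def lusophone(description):
    # (|(description=* Angola *)(description=* Mozambique *))
    return any(any_match(description, w) for w in ("Angola", "Mozambique"))

if __name__ == "__main__":
    for d in ("Angola is on the Atlantic coast",
              "Exports to Mozambique",
              "Angolan exports"):
        print(d, "->", lusophone(d))
```

The first two descriptions match because the keyword sits at the start
or end of the value; the third does not, since "Angolan" is not bounded
by a breaking space on the right.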
I don't believe it would add much complexity to the matching algorithm.
It might be desirable to add a rule to [strprep] that changes SPACE
(U+0020) to NO-BREAK SPACE (U+00A0) if the following character is a
combining character, just before the insignificant space removal step
(at which point there cannot be any NO-BREAK SPACEs in the string).
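
A minimal Python sketch of what such a rule might look like follows.
The function name is invented, and "combining character" is assumed
here to mean any character in Unicode general category M*; [strprep]
may define the set differently.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00a0"

def protect_combining_spaces(s):
    """Replace each SPACE that is followed by a combining character with
    NO-BREAK SPACE, so that later space handling leaves it alone."""
    out = []
    for i, ch in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else ""
        # Assumption: a combining character is one in category Mn, Mc or Me.
        if ch == SPACE and nxt and unicodedata.category(nxt).startswith("M"):
            out.append(NBSP)
        else:
            out.append(ch)
    return "".join(out)
```

Run just before insignificant space removal, this would leave ordinary
word-separating spaces untouched while marking the ones that serve as
base characters.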