Hi Alex,
Alex Karasulu wrote:
Hello,
I have some questions regarding the interpretation of LDAP search
filters specifically differentiating between presence and substring
items when whitespace is present. According to the ABNF describing
these rules in [Filters], and some additional rules in [Models],
...
present = attr EQUALS ASTERISK
substring = attr EQUALS [initial] any [final]
initial = assertionvalue
any = ASTERISK *(assertionvalue ASTERISK)
final = assertionvalue
attr = attributedescription
...,
the presence of whitespace is considered significant in the
assertionvalue. Please correct me if I'm wrong but this means that
the following filter expressions are interpreted differently:
(for simplicity I'm equating whitespace to be a single space
character, %x20)
1. (ou=*)
- there is no whitespace at all
- interpreted as a presence filter
- matches all entries containing the ou attribute
2. (ou= *)
- there is whitespace before the ASTERISK after the EQUALS
- interpreted as a substring filter
- the space is interpreted as the [initial]
- matches all values of ou starting with a space, %x20
The exact matching behaviour depends on the attribute type. Typically,
though, it will be equivalent to caseIgnoreSubstringsMatch. Assuming
that is the case, the current ldapbis specifications would invoke
stringprep on each candidate attribute value and each substring of the
assertion. The result will be that no attribute value will have (for
matching purposes) a leading space. The initial substring will get
reduced to empty, which then becomes a single space. After that it is a
code point comparison. Since no attribute value has a leading space,
none are matched, and the result is empty. This isn't the intuitive
result either. The same occurs in the other examples for much the same
reasons.
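The effect described above can be sketched roughly as follows. This is
only an illustration: prepare() is a hypothetical, heavily simplified
stand-in for the stringprep step (case folding plus space handling as
described in this message), not the actual ldapbis algorithm, and
initial_match() is likewise an invented helper name.

```python
def prepare(s: str) -> str:
    """Simplified stand-in for the preparation step described above.

    Assumption: leading/trailing spaces are dropped, inner runs of
    spaces collapse to one, and an empty result becomes a single space.
    """
    words = [w for w in s.casefold().split(" ") if w]
    collapsed = " ".join(words)
    return collapsed if collapsed else " "


def initial_match(value: str, initial: str) -> bool:
    """Compare a prepared attribute value against a prepared [initial]."""
    return prepare(value).startswith(prepare(initial))


# An ou value stored with a leading space loses it during preparation,
# while the [initial] substring " " is turned into a single space.
stored = " Sales"
matched = initial_match(stored, " ")
```

Under these assumptions matched is False: the prepared value is "sales"
and the prepared initial substring is " ", so the prefix comparison fails.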
Treating the whitespace as insignificant (unless escaped) in the string
representation of the filter partly helps as it makes all your examples
equivalent to a present match, but there would still be a problem with
cases where the whitespace is explicitly escaped. Stringprep will still
cause (ou=\20*) to match nothing.
It seems to me that stringprep should allow a result string to be empty,
rather than replacing it by a single space. If that were the case then an
initial substring of " " would be reduced to an empty string, which would
trivially match every value, giving the same effect as a presence match.
Similarly, an any substring that reduces to an empty string is trivially
satisfied and so is effectively ignored. In fact, this change to
stringprep would make escaping of whitespace in the string representation
of filters largely moot.
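A minimal sketch of that suggestion, under the same caveats: the names
prepare_allow_empty() and substring_match() are hypothetical, and the
preparation step is a simplification (case folding and space handling
only), not the real stringprep profile.

```python
def prepare_allow_empty(s: str) -> str:
    """Like the simplified preparation step, but an empty result stays empty."""
    return " ".join(w for w in s.casefold().split(" ") if w)


def substring_match(value, initial="", any_parts=(), final=""):
    """Match prepared components in order; empty components match trivially."""
    v = prepare_allow_empty(value)
    ini = prepare_allow_empty(initial)
    if not v.startswith(ini):
        return False
    pos = len(ini)
    for part in any_parts:
        p = prepare_allow_empty(part)
        idx = v.find(p, pos)  # an empty p is found immediately at pos
        if idx < 0:
            return False
        pos = idx + len(p)
    fin = prepare_allow_empty(final)
    # The final component must fit in what is left of the value.
    return len(v) - pos >= len(fin) and v.endswith(fin)
```

With this variant, substring_match("Sales", initial=" ") and
substring_match("Sales", any_parts=[" "]) both return True, which is the
same outcome a presence match would give for an entry holding that value.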
Regards,
Steven
3. (ou=* )
- there is whitespace after the ASTERISK before the RPAREN
- interpreted as a substring filter
- the space is interpreted as the [final]
- matches all values of ou ending with a space, %x20
4. (ou= * )
- there is whitespace before the ASTERISK and after the ASTERISK
- interpreted as a substring filter
- the first and last spaces are interpreted as the [initial] and
[final] values respectively
- matches all values of ou starting and ending with a space, %x20
5. there's another class where two or more ASTERISKS sandwich
whitespace: (ou=* *)
- although other forms would be a bit nonsensical this one may be
valid and would match all entries with ou values starting or
ending with a space, %x20
Are these correct interpretations according to the ABNF and is the
matching behavior correct?
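To make the literal ABNF reading of the five examples concrete, here is
a rough sketch. classify() is a hypothetical helper that only handles
simple items of the form (attr=value); it ignores the other filter
operators, nested filters, and escape decoding, so it is an illustration
rather than a filter parser.

```python
def classify(item: str):
    """Classify a simple '(attr=value)' item by a strict ABNF reading.

    Assumption: an unescaped '*' is always a delimiter, because a
    literal asterisk would have to be written as \\2a in the filter.
    """
    attr, value = item[1:-1].split("=", 1)
    if value == "*":
        return ("present", attr)
    if "*" in value:
        # Text before the first '*' is [initial], text after the last
        # '*' is [final], and anything in between is an 'any' piece.
        initial, *any_parts, final = value.split("*")
        return ("substring", attr, initial, any_parts, final)
    return ("equality", attr, value)


examples = ["(ou=*)", "(ou= *)", "(ou=* )", "(ou= * )", "(ou=* *)"]
results = [classify(e) for e in examples]
```

Read this way, only the first example is a presence item. In the fifth
example the space sits between two asterisks, so it ends up as an 'any'
component rather than as [initial] or [final].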
Now I'd like to open for discussion whether or not these
interpretations are intuitively correct. As an end user issuing
search filters to a directory I've come to expect the directory to be
extra forgiving when it comes to things like whitespace. Users have
gotten this feeling regarding whitespace forgiveness from the way
distinguished names are normalized by the directory. It's intuitive
for the user to presume some of this forgiving nature extends to
filters which can match on attributes with the DN syntax. So looking
at the examples above I can see how a user may think that all these
filters are in fact equal to one another. The user is not thinking,
"=* is a distinct atomic operator token to a parser and is
inseparable, so a space makes it no longer a presence filter."
The user thinks, "well, I'm matching anything." What if they just
like to put spaces around parentheses in their filter expressions?
This space forgiving nature is "turned on" for matching normal
equality expressions on attributes like ou and is especially
forgiving if distinguishedNameMatch is in effect for respective
attributes.
So would you agree that there is some mismatch between the hard ABNF
interpretation and the mental interpolation of users writing
filters? IMO all whitespace should be escaped if
significant. Otherwise whitespace should be trimmed from the edges
of attribute values. Also whitespace within the interior of the value
should be reduced to a single space to preserve tokenization order
while matching. With regard to substring items the 'any' pieces
between two ASTERISKS that are purely composed of whitespace should
be discarded and the ASTERISKS consolidated into one.
This makes life tougher on those who really want to match based on
whitespace. However, they can just escape the whitespace in their
filters like so:
1. (ou=*)
2. (ou=\20*)
3. (ou=*\20)
4. (ou=\20*\20)
5. (ou=*\20*)
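The normalization proposed above could be sketched along these lines.
normalize_value() is a hypothetical name, and the function works on the
text after the EQUALS sign only. It assumes whitespace means the space
character (%x20) and leaves escaped spaces (\20) untouched, since they
are not literal spaces in the string representation.

```python
import re


def normalize_value(value: str) -> str:
    """Apply the proposed whitespace rules to the text after '='."""
    pieces = value.split("*")
    # Trim unescaped spaces at the edges of each piece and reduce
    # interior runs of spaces to a single space.
    cleaned = [re.sub(r" +", " ", p.strip(" ")) for p in pieces]
    if len(cleaned) == 1:
        return cleaned[0]  # no asterisk: a plain assertion value
    initial, *any_parts, final = cleaned
    # Discard 'any' pieces that were pure whitespace, which also
    # consolidates the asterisks around them into one.
    any_parts = [p for p in any_parts if p]
    return "*".join([initial, *any_parts, final])
```

Under these rules " *", "* ", " * " and "* *" all normalize to "*", so
the unescaped examples become presence items, while \20*, *\20, \20*\20
and *\20* pass through unchanged.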
Comments? Thoughts?
Thanks,
Alex
[Filters] Smith, M. (editor), LDAPbis WG, "LDAP: String Representation
of Search Filters", draft-ietf-ldapbis-filter-xx.txt, a work in
progress.
[Models] Zeilenga, K. (editor), "LDAP: Directory Information Models",
draft-ietf-ldapbis-models-xx.txt, a work in progress.