filter problems
[filter] says:
> 6. String Search Filter Definition
> (...)
> Since RFC 2254 does not clearly define the term
> "string representation" (and in particular does mention that the
> string representation of an LDAP search filter is a string of UTF-8
> encoded ISO 10646-1 characters) implementations SHOULD accept as
> input strings that include invalid UTF-8 octet sequences.
I don't understand this. RFC 2254 does say filters should be UTF-8,
_therefore_ implementations should accept invalid UTF-8? I would have
thought that therefore they should _not_ accept invalid UTF-8.
Maybe by "invalid UTF-8" you just mean e.g. U+0041 encoded as an
overlong "UTF-8" 2-byte sequence (0xc1 0x81)? Or do you also mean e.g. a
lone 0x80 octet in the middle of some ASCII characters?
Or maybe it's just that I can't parse the last two lines into a coherent
sentence, and guessed wrong what it should be. Could you split it into
two sentences or something?
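To make the two cases concrete, here is a rough Python sketch. The
helper name is_valid_utf8 and the sample filter strings are made up for
illustration; it only shows that a strict UTF-8 decoder rejects both the
overlong form and the lone continuation octet, and it says nothing about
what [filter] intends.

```python
# Two kinds of "invalid UTF-8" input (sample filters are hypothetical).
OVERLONG = b"(cn=\xc1\x81)"   # 0xC1 0x81: non-shortest 2-byte form of an ASCII character
LONE_OCTET = b"(cn=ab\x80cd)"  # a lone 0x80 continuation octet among ASCII characters


def is_valid_utf8(octets: bytes) -> bool:
    """Return True if octets decode as strict UTF-8 (hypothetical helper)."""
    try:
        octets.decode("utf-8")  # Python's UTF-8 codec is strict by default
        return True
    except UnicodeDecodeError:
        return False


for sample in (b"(cn=A)", OVERLONG, LONE_OCTET):
    print(sample, is_valid_utf8(sample))
```

An implementation that wants to "accept" such input would have to skip a
check like this and pass the octets through unchanged.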
BTW, I think "RFC 2254 does not..." should be "did not...", since
[Filter] obsoletes RFC 2254.
Another detail:
> 4. String Search Filter Definition
> (...)
> Other characters besides the
        ^^^^^^^^^^
> ones listed above may be escaped using this mechanism, for example,
> non-printing characters.
The first "characters" should be "octets". That is, if you escape
U+00C0, you say \C3\80, not \C380 or \C0. Though perhaps you should
repeat here that the resulting string must be a valid UTF-8 string; you
can't escape just one octet in a UTF-8 multibyte character but not the
others.
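A small Python sketch of what escaping by octet means, under my reading
above. The function names and the choice to escape every non-printing
and non-ASCII octet are assumptions for illustration; only the set of
octets that must always be escaped (*, (, ), \ and NUL) comes from the
filter grammar.

```python
import re

# Octets the filter grammar requires to be escaped: * ( ) \ and NUL.
MUST_ESCAPE = set(b"*()\\\x00")


def escape_filter_value(value: str) -> str:
    """Escape a value octet by octet over its UTF-8 encoding (hypothetical helper)."""
    out = []
    for octet in value.encode("utf-8"):
        # Assumption: also escape control and non-ASCII octets, one \XX per octet.
        if octet in MUST_ESCAPE or octet < 0x20 or octet >= 0x7F:
            out.append(f"\\{octet:02X}")
        else:
            out.append(chr(octet))
    return "".join(out)


def unescape_filter_value(escaped: str) -> str:
    """Undo \\XX escapes and insist that the result is valid UTF-8."""
    raw = re.sub(
        rb"\\([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        escaped.encode("utf-8"),
    )
    return raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8


print(escape_filter_value("\u00c0"))   # U+00C0 becomes two escaped octets
print(escape_filter_value("a*b"))
```

Unescaping \C3 alone fails the final decode, since the second octet of
the multibyte character is missing.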
--
Hallvard