Stringprep Considered Harmful
draft-ietf-ldapbis-strprep-04.txt would define and require the use of a
stringprep profile for many common LDAP attribute types. The stringprep
algorithm may fail on certain input strings; if it fails, that input
string becomes unmatchable.
If all such strings were obviously illegitimate, this would not be a
problem, but many legitimate strings will fail, and this will create
problems, some of them serious.
The clearest example of this is a string which fails the bidi check,
which requires any string which contains right-to-left characters to
both start and end with a right-to-left character. Consequently, a
string which ends with a Latin character and contains an Arabic word
will fail.
There are a number of potential examples of such strings which might
usefully be found in a directory. These include:
-- a URL starting with the Latin string 'http' and containing an Arabic
domain name or path component.
-- an email address ending with a standard TLD which contains Hebrew in
the mailbox or in some domain component.
-- a full name in Latin with a Syriac nickname embedded in it.
-- a descriptive text field which contains some right-to-left words.
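
As a rough illustration of how these examples fare, the bidi check can be
modelled with the tables in Python's standard stringprep module. This is a
sketch under assumptions, not text from the draft: the function name
passes_bidi_check and the sample strings are invented here, and the model
follows RFC 3454 section 6, which also forbids mixing right-to-left
(RandALCat) and left-to-right (LCat) characters in one string.

```python
import stringprep

def passes_bidi_check(s):
    """Model of the RFC 3454 section 6 bidi requirements.
    (The prohibited table C.8 is not checked in this sketch.)"""
    # Strings with no right-to-left (RandALCat) characters always pass.
    if not any(stringprep.in_table_d1(c) for c in s):
        return True
    # A string with RandALCat characters must not contain any LCat character...
    if any(stringprep.in_table_d2(c) for c in s):
        return False
    # ...and must both start and end with a RandALCat character.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])

ARABIC = "\u0645\u062b\u0627\u0644"   # an Arabic word (hypothetical sample)
HEBREW = "\u05d3\u05d5\u05d3"         # a Hebrew word (hypothetical sample)

samples = {
    "url": "http://" + ARABIC + ".example/",
    "mail": HEBREW + "@example.com",
    "description": "Office next to the " + ARABIC + " bakery",
    "pure_latin": "http://example.com/",
    "pure_arabic": ARABIC,
}
results = {name: passes_bidi_check(value) for name, value in samples.items()}
```

Only the two single-script strings pass; every mixed string fails the check.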
As I understand it, stringprep's bidi rule was essentially designed for
the nameprep profile, which is performed component by component on
domain names and therefore will not fail a domain name which has both
Arabic and Latin components (which, I believe, would be the normal case
for a domain name including Arabic components). It is therefore
suitable for the dc attribute, but would not be suitable for an
attribute which could contain a sequence of domain components.
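
The difference between checking component by component and checking a
whole value can be sketched as follows. The domain name is a made-up
example, passes_bidi_check is a hypothetical helper modelling the RFC 3454
bidi rule, and splitting on '.' merely stands in for nameprep's per-label
processing.

```python
import stringprep

def passes_bidi_check(s):
    """Hypothetical model of the RFC 3454 section 6 bidi requirements."""
    if not any(stringprep.in_table_d1(c) for c in s):
        return True                      # no right-to-left characters
    if any(stringprep.in_table_d2(c) for c in s):
        return False                     # RTL mixed with left-to-right
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])

# One Arabic label followed by two Latin labels (invented example).
domain = "\u0645\u062b\u0627\u0644.example.com"

# nameprep style: each label is checked on its own.
per_label_ok = all(passes_bidi_check(label) for label in domain.split("."))

# Checking the whole value as one string, as for an attribute holding a
# sequence of domain components.
whole_ok = passes_bidi_check(domain)
```

Each label passes on its own, but the same name checked as a single string
does not.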
Aside from leading to the failure to return unique identifiers (such as
the first three examples above), the use of the bidi-prohibition may
cause substring matches to fail, even if the substring assertions pass.
Consequently, a text description with a single Arabic word in it would
effectively be unmatchable even if the assertion and the substring to
be matched were identical Latin (or Arabic) words.
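
A toy model of that substring behaviour, assuming that a value which fails
preparation makes the match Undefined (represented here as None). The
names prepare and substring_match are invented for this sketch, and only
the bidi step and a simple case fold are modelled, not the draft's full
preparation algorithm.

```python
import stringprep

def prepare(s):
    """Return a case-folded string, or None if the bidi check fails.
    Stand-in for full string preparation; only the bidi step is modelled."""
    if any(stringprep.in_table_d1(c) for c in s):
        if any(stringprep.in_table_d2(c) for c in s):
            return None
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            return None
    return s.casefold()

def substring_match(value, assertion):
    """True or False, or None (Undefined) when either side cannot be prepared."""
    v, a = prepare(value), prepare(assertion)
    if v is None or a is None:
        return None
    return a in v

ARABIC_WORD = "\u0645\u0643\u062a\u0628\u0629"   # hypothetical sample word
description = "Meeting room beside the " + ARABIC_WORD + " entrance"

latin_hit = substring_match(description, "meeting")      # assertion itself passes
arabic_hit = substring_match(description, ARABIC_WORD)   # assertion itself passes
control = substring_match("Meeting room beside the entrance", "meeting")
```

Both assertions prepare cleanly, yet neither can match the stored
description, because the stored value itself fails preparation.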
Furthermore, it appears to me that unmatchable attribute values will be
harder to Modify. If the attribute type were single valued, one could
use replace rather than delete/add, but for a multi-valued attribute
where the delete/add sequence would probably be the expected procedure,
the delete will fail because the current attribute value cannot match
the one specified in the delete operation. Also, an attempt to add a
duplicate value -- that is, a value which is octet-for-octet identical
to an existing one -- will silently and erroneously succeed.
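
The Modify problem can be sketched with a toy multi-valued attribute held
in a Python list. All names here (prepare, values_equal, delete_value,
add_value) are hypothetical, and the model assumes that an equality match
involving an unpreparable value is Undefined and is therefore never treated
as a match.

```python
import stringprep

def prepare(s):
    """Return a case-folded string, or None if the bidi check fails."""
    if any(stringprep.in_table_d1(c) for c in s):
        if any(stringprep.in_table_d2(c) for c in s):
            return None
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            return None
    return s.casefold()

def values_equal(a, b):
    """True or False, or None (Undefined) if either value cannot be prepared."""
    pa, pb = prepare(a), prepare(b)
    if pa is None or pb is None:
        return None
    return pa == pb

def delete_value(values, target):
    """Remove the first value that matches target; report whether one was found."""
    for i, v in enumerate(values):
        if values_equal(v, target) is True:
            del values[i]
            return True
    return False            # no matching value found, so the delete fails

def add_value(values, new):
    """Append new unless an existing value matches it."""
    if any(values_equal(v, new) is True for v in values):
        return False        # rejected as a duplicate
    values.append(new)
    return True

# A Latin full name with an embedded Syriac nickname (invented example).
stored = "Ana (\u0710\u0722\u0710) Smith"
values = [stored]

deleted = delete_value(values, stored)   # the delete cannot find its target
added = add_value(values, stored)        # the duplicate is not detected
```

After these two operations the list holds the same value twice.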
While the bidi prohibition step seems to me to be the most likely
source of problems, other prohibitions -- such as the prohibition of
characters not in Unicode 3.2 -- may result in non-intuitive behaviour.
It would also prevent an enterprise from using private use area
characters in an internal directory, although that would be low on my
list of priorities.
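
A small sketch of how such characters can be detected, using the RFC 3454
tables shipped in Python's stringprep module (table A.1 for code points
unassigned in Unicode 3.2, table C.3 for private use). The function name
prohibited_reason and the sample characters are chosen only for
illustration.

```python
import stringprep

def prohibited_reason(ch):
    """Return why a single character would be rejected, or None if neither
    of the two tables modelled here applies."""
    if stringprep.in_table_a1(ch):
        return "unassigned in Unicode 3.2"
    if stringprep.in_table_c3(ch):
        return "private use"
    return None

rupee = prohibited_reason("\u20b9")   # INDIAN RUPEE SIGN, added after Unicode 3.2
pua = prohibited_reason("\ue000")     # first code point of the private use area
plain = prohibited_reason("A")
```

A value containing either of the first two characters would fail
preparation under such a profile, however reasonable it looks to the user
who entered it.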
The strprep document cites the importance to security of a predictable
and consistent string comparison algorithm. This is certainly true, at
least in the case of strings which are used in some security process
(but it is worth noting that many strings entered in directories are
not part of any security process). However, the strings in question
are, in many cases, also being used and matched by systems outside of
the directory, such as filesystems, email processors, web servers, and
so on. It is necessary that all systems using a particular string for
identification purposes perform the same matching algorithm on that
string. If the string in question is a filepath on a Mac OS X HFS
filesystem, for example, the canonicalization algorithm needs to be the
one used (and extremely well documented) by HFS. Similarly, for NTFS. I
don't believe that LDAP is (yet) in a position to impose a single
canonicalization on every string used in every system in the world;
this may indicate the need for more MatchingRules which correspond to
those systems' matching algorithms.
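
To make the mismatch concrete, here is a minimal sketch comparing two
Unicode normalization forms on one file name. It assumes, for the sake of
illustration, that the directory side normalizes to NFKC and that the
filesystem side uses a decomposed form similar to NFD; neither assumption
is a statement of what any particular system actually does, and the file
name is invented.

```python
import unicodedata

# A file name containing precomposed accented letters and an "fi" ligature.
name = "r\u00e9sum\u00e9 \ufb01nal.txt"

# Assumed directory-side canonicalization.
directory_form = unicodedata.normalize("NFKC", name)

# Assumed filesystem-side canonicalization.
filesystem_form = unicodedata.normalize("NFD", name)

# The two systems now disagree about which string identifies the file.
same_identifier = directory_form == filesystem_form
```

The decomposed form keeps the ligature and splits the accents into
combining marks, while the compatibility form does the opposite, so a
lookup keyed on one form will not find the other.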
Rici