Problems with case folding of UTF-8
Hi,
while working on DN normalization I ran into a serious problem. I was
dealing with Italian accented letters in Latin-1 ("e acute", "e grave" and
so on), and this is what happened:
I used "iconv" to convert the strings to UTF-8, and I got:
e acute -> '\c3\a8'
which, after normalization (case folding), turns into
'\c3\88' -> E acute
OK, everything works fine, dn normalization is great!
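The working case can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the Latin-1 input byte 0xE8 is inferred from the UTF-8 output quoted above, `str.upper()` stands in for slapd's case folding, and `escape_utf8` is a hypothetical helper that prints bytes in the '\xx' notation used in this message.

```python
def escape_utf8(text: str) -> str:
    """Render non-ASCII UTF-8 bytes as \\xx escapes, ASCII bytes as-is."""
    return "".join(
        chr(b) if b < 0x80 else "\\%02x" % b
        for b in text.encode("utf-8")
    )

# Assumed input: the single Latin-1 byte 0xE8. Decoding it and re-encoding
# as UTF-8 mirrors a Latin-1 -> UTF-8 conversion such as the iconv step.
lower = b"\xe8".decode("latin-1")

# str.upper() is only a stand-in for the case folding done during
# DN normalization; it is not slapd's actual code.
folded = lower.upper()

print(escape_utf8(lower))   # \c3\a8
print(escape_utf8(folded))  # \c3\88
```

Both forms take two UTF-8 bytes, so the escaped string keeps the same length before and after folding.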
Then, to make it short:
e grave -> '\c3\a9'
which after normalization (case folding) should be:
'\c3\89' -> E grave
but I actually got
'E\cc\81'
which should be the decomposed form of the "E grave", namely the plain "E"
followed by a combining accent; but this:
a) annoys "iconv", which refuses to convert it back to Latin-1 and
reports "illegal input sequence at position XXX", and
b) breaks the current DN normalization workaround in slapd, because
the resulting normalized DN is longer than the input one (the six-char
'\c3\89' is turned into the seven-char 'E\cc\81' even though there is an
equivalent six-char representation).
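The two representations can be compared with Python's standard `unicodedata` module. This is a sketch, not slapd's normalization code: it assumes the unexpected output corresponds to Unicode canonical decomposition (NFD) and that the expected output corresponds to the precomposed form (NFC). Going by the Unicode character names, the bytes '\cc\81' are U+0301 COMBINING ACUTE ACCENT and '\c3\89' is U+00C9 (E with acute), so the accent labels in this message appear to be swapped relative to the bytes; the sketch follows the bytes.

```python
import unicodedata

composed = "\u00c9"  # precomposed capital E with accent, UTF-8 bytes c3 89
decomposed = unicodedata.normalize("NFD", composed)

# The decomposed form is a plain "E" followed by a combining accent.
names = [unicodedata.name(c) for c in decomposed]
print(names)
print(decomposed.encode("utf-8"))  # b'E\xcc\x81'

# Latin-1 has no combining characters, so the decomposed form cannot be
# converted back, which matches the "illegal input sequence" complaint.
try:
    decomposed.encode("latin-1")
    roundtrip_ok = True
except UnicodeEncodeError:
    roundtrip_ok = False
print(roundtrip_ok)

# Recomposing (NFC) restores the two-byte precomposed form.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed.encode("utf-8"), recomposed.encode("latin-1"))
```

Under these assumptions, recomposing after case folding would bring the string back to the shorter form that also converts to Latin-1.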
I noticed that "a acute" and "e acute" are converted correctly to
the "six-char" form, while "e grave", "o acute" and "u acute" are not
(they're the main Italian non-ASCII letters); I'm afraid German, Spanish
and French need even more accented letters, so this could be a big problem.
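To get a feeling for how many letters could be affected, a small survey can compare the UTF-8 length of each uppercased letter in precomposed (NFC) and decomposed (NFD) form. The letter list below is an illustrative assumption, not a complete inventory for any language, and `str.upper()` again stands in for the real case folding.

```python
import unicodedata

# Illustrative sample of accented Latin-1 letters (assumed, not exhaustive).
letters = "àèéìòùäöüñç"

sizes = {}
for ch in letters:
    upper = ch.upper()
    nfc = unicodedata.normalize("NFC", upper).encode("utf-8")
    nfd = unicodedata.normalize("NFD", upper).encode("utf-8")
    # (bytes of the input, bytes when precomposed, bytes when decomposed)
    sizes[ch] = (len(ch.encode("utf-8")), len(nfc), len(nfd))

for ch, (before, composed_len, decomposed_len) in sizes.items():
    print(ch, before, composed_len, decomposed_len)
```

For every letter in this sample the precomposed form keeps the input length of two bytes, while the decomposed form grows to three, so any letter that ends up decomposed would trigger the length problem described above.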
Maybe I'm doing something wrong, so please correct me before I start
digging into the UTF-8 code to see what's going on :)
Pierangelo.