Re: Problems with case folding of UTF-8
> Hi
>
> I've dug further into this, and you're right, something was wrong.
> My apologies for not realizing this at once. Unicode has a composition
> exclusion list. Due to a bug in the exclusion code in ucgendat.c we
> excluded quite a lot of compositions that we shouldn't have. I've now
> replaced 15 with 5 in two places in ucgendat.c, and everything should
> be okay. So please check the latest CVS and let me know how it works.
>
> Thanks a lot for reporting this! I'm quite happy you found it before
> 2.1 was released.
Well, I'm quite happy you found the fix, because I had started gathering
documentation on UTF-8 from the official website, and it really looks like
a nightmare :) I found something about these compositions and was
trying to follow the code, but I'm really running out of time.
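For anyone else following the thread, here is a small illustration of what a composition exclusion does. It is a sketch using Python's standard unicodedata module, not the tables that ucgendat.c generates, and the `fold` helper is a hypothetical approximation of caseless matching (normalize, case-fold, normalize again). U+0958 DEVANAGARI LETTER QA is listed in Unicode's CompositionExclusions.txt: it decomposes to U+0915 U+093C, and NFC never puts that pair back together, whereas "e" + U+0301 does compose to U+00E9.

```python
import unicodedata

def fold(s: str) -> str:
    """Hypothetical caseless-matching key: NFC, full case folding, NFC again."""
    return unicodedata.normalize("NFC", unicodedata.normalize("NFC", s).casefold())

# U+0958 is a composition exclusion: its canonical decomposition is
# U+0915 (KA) + U+093C (NUKTA), and NFC does not recompose that pair.
qa = "\u0958"
decomposed = unicodedata.normalize("NFD", qa)
recomposed = unicodedata.normalize("NFC", decomposed)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0915', 'U+093C']
print([f"U+{ord(c):04X}" for c in recomposed])  # ['U+0915', 'U+093C']

# An ordinary pair does compose: "e" + U+0301 COMBINING ACUTE ACCENT -> U+00E9.
print(unicodedata.normalize("NFC", "e\u0301") == "\u00e9")  # True

# Two spellings of the same capital letter fold to the same key.
print(fold("\u00c9") == fold("E\u0301"))  # True
```

Since case folding and comparison are done on normalized strings, a composition table that excludes too much changes which spellings end up comparing equal.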
>
> BTW, with latest HEAD I'm not allowed to have non-ascii DNs it seems.
> I haven't tried to see why yet. I suppose you are aware of it.
Kind of; the last time I committed changes to DN normalization
I ran into that strange problem with case folding, so I wasn't able
to test all the non-ASCII stuff I was working on. One problem I found
is that when the "escaped" DN is longer than the input string, the
legacy wrappers we're still using (dn_normalize and so on) must fail,
because the normalization cannot occur in place. Actually, only
"escaped" DNs or OID=#{ber encoded octets} are allowed.
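To make the in-place problem concrete, here is a sketch (in Python for brevity; the real code is C) of RFC 4514-style escaping of a single attribute value, plus the dotted-OID `#hexstring` form. `escape_dn_value`, `fits_in_place` and `hexstring_ava` are hypothetical names, and the option to escape non-ASCII characters as `\XX` UTF-8 hex pairs is an assumption about one possible normalized form, not necessarily what our normalizer produces. The point is simply that every escape adds bytes, so a buffer sized to the input can be too small.

```python
# Characters that must be backslash-escaped anywhere in a string DN value.
SPECIAL = set(',+"\\<>;')

def escape_dn_value(value: str, hex_non_ascii: bool = False) -> str:
    """Escape one attribute value (RFC 4514-style sketch)."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in SPECIAL:
            out.append("\\" + ch)
        elif ch == "#" and i == 0:
            out.append("\\#")  # a leading '#' would look like a hexstring
        elif ch == " " and (i == 0 or i == last):
            out.append("\\ ")  # leading/trailing spaces would otherwise be trimmed
        elif ch == "\x00" or (hex_non_ascii and ord(ch) > 0x7F):
            # Escape each byte of the UTF-8 encoding as a \XX hex pair.
            out.append("".join(f"\\{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)

def fits_in_place(value: str, hex_non_ascii: bool = False) -> bool:
    """True if the escaped form is no longer (in UTF-8 bytes) than the input,
    i.e. a legacy wrapper could overwrite the caller's buffer with it."""
    escaped = escape_dn_value(value, hex_non_ascii)
    return len(escaped.encode("utf-8")) <= len(value.encode("utf-8"))

def hexstring_ava(oid: str, ber: bytes) -> str:
    """Dotted-OID form: the value is '#' followed by the hex of its BER encoding."""
    return f"{oid}=#{ber.hex().upper()}"

print(escape_dn_value("a,b"))                              # a\,b
print(escape_dn_value("M\u00fcller", hex_non_ascii=True))  # M\C3\BCller
print(fits_in_place("M\u00fcller", hex_non_ascii=True))    # False
# UTF8String (tag 0x0C, length 2) holding U+00E9:
print(hexstring_ava("2.5.4.3", b"\x0c\x02\xc3\xa9"))       # 2.5.4.3=#0C02C3A9
```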
In many cases you can add spurious spaces nearly everywhere; the
DN parsing is quite liberal, so you can write
[space]<attr>[space]=[space]<value>[space][+[space]<ava>][,<rdn>[...]]
and so on. Of course, if required, we can deny this freedom
by setting the appropriate pedantic flags :)
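A minimal sketch of that liberal parsing, again in Python and under stated assumptions: values use RFC 4514 backslash escapes (and are returned still escaped), quoted values and the old ';' separator are not handled, and `parse_dn` with its `pedantic` flag is a hypothetical stand-in for the real C code and its flags. Unescaped spaces around attribute types, '=', '+' and ',' are dropped; with `pedantic=True` they are rejected instead.

```python
from __future__ import annotations

def _split_unescaped(s: str, seps: str) -> list[str]:
    """Split s on any character in seps that is not backslash-escaped."""
    parts: list[str] = []
    cur: list[str] = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            cur.append(s[i:i + 2])  # keep escape pairs verbatim
            i += 2
            continue
        if s[i] in seps:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(s[i])
        i += 1
    parts.append("".join(cur))
    return parts

def _rstrip_unescaped(s: str) -> str:
    """Drop trailing spaces, but keep a final space written as '\\ '."""
    while s.endswith(" "):
        k = 0  # number of backslashes immediately before the final space
        while k < len(s) - 1 and s[len(s) - 2 - k] == "\\":
            k += 1
        if k % 2 == 1:
            break  # the space is escaped and belongs to the value
        s = s[:-1]
    return s

def _trim(s: str, pedantic: bool) -> str:
    """Strip unescaped outer spaces; in pedantic mode, refuse to."""
    t = _rstrip_unescaped(s.lstrip(" "))
    if pedantic and t != s:
        raise ValueError(f"spurious space in {s!r}")
    return t

def parse_dn(dn: str, pedantic: bool = False) -> list[list[tuple[str, str]]]:
    """Parse a string DN into RDNs, each a list of (attribute, value) AVAs."""
    if not dn.strip(" "):
        return []  # the empty (root) DN
    rdns = []
    for rdn in _split_unescaped(dn, ","):
        avas = []
        for ava in _split_unescaped(rdn, "+"):
            attr, sep, value = ava.partition("=")
            attr = _trim(attr, pedantic)
            if not sep or not attr:
                raise ValueError(f"malformed AVA {ava!r}")
            avas.append((attr, _trim(value, pedantic)))
        rdns.append(avas)
    return rdns
```

For example, parse_dn(" cn = John Doe + uid=jd , dc=example") gives [[('cn', 'John Doe'), ('uid', 'jd')], [('dc', 'example')]], while the same string with pedantic=True raises ValueError.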
If you have any suggestions on how to better handle non-ASCII
stuff, please chime in. My experience with non-ASCII is quite limited.
Pierangelo.