Re: UTF8 case insensitive matching



At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
>On Tue, Oct 24, 2000 at 01:11:25PM -0700, Kurt D. Zeilenga wrote:
>> The DN normalization and matching?
>
>I'm looking at this. I have some questions.
>
>I'm writing UTF8str2upper and perhaps some other UTF8 functions
>that need liblunicode to work. I think they belong in utf8.c in
>libldap, but it's not so good I think, if applications that use
>libldap also must link with liblunicode. Where should I put it?


ldap_pvt_uc.h/-llunicode

>I'm not sure, but I think that the width of a character in UTF8
>might change when you change casing. Does anyone know for sure
>if it might?

Yes.
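
To illustrate, here is a small sketch (the helper names are made up for
this message, not existing libldap or liblunicode functions) that compares
the UTF-8 width of a code point before and after a case change. Three
simple case mappings from the Unicode tables that shrink: U+0131 (dotless i)
upper-cases to U+0049, U+017F (long s) upper-cases to U+0053, and U+212A
(Kelvin sign) lower-cases to U+006B.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to encode one Unicode code point in UTF-8. */
size_t utf8_encoded_len(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return 1;
    if (cp < 0x800UL)   return 2;
    if (cp < 0x10000UL) return 3;
    return 4;
}

/* Nonzero when mapping code point `from` to `to` changes the encoded width. */
int case_map_changes_width(unsigned long from, unsigned long to)
{
    return utf8_encoded_len(from) != utf8_encoded_len(to);
}
```

For instance, case_map_changes_width(0x0131UL, 0x0049UL) is nonzero because
the dotless i takes two bytes and its upper-case form takes one.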

>If it can change, dn_normalize will have to malloc
>space for a new string and return a pointer to that.

This is needed anyway for quoting/escape normalization...

>A lot of
>code would have to be changed then. An easy but incorrect way
>out could be to simply not change casing for a character if
>the size is different. It would still be better than today's
>situation.

We can certainly cheat in the short term....
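
A rough sketch of that cheat, for discussion only: upper-case in place and
skip any character whose mapped form has a different UTF-8 width. All the
names here are hypothetical. The case_map_fn callback stands in for
whatever lookup liblunicode ends up providing, and demo_to_upper covers
only a few code points so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Caller-supplied simple case mapping (hypothetical interface). */
typedef unsigned long (*case_map_fn)(unsigned long cp);

static int is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Bytes needed to encode cp in UTF-8. */
static size_t utf8_len(unsigned long cp)
{
    return cp < 0x80UL ? 1 : cp < 0x800UL ? 2 : cp < 0x10000UL ? 3 : 4;
}

/* Decode one code point at s and return the bytes consumed. A malformed
 * or truncated sequence consumes one byte and yields that byte's value. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if ((s[0] & 0xE0) == 0xC0 && is_cont(s[1])) {
        *cp = ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && is_cont(s[1]) && is_cont(s[2])) {
        *cp = ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
            | (s[2] & 0x3FUL);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && is_cont(s[1]) && is_cont(s[2])
        && is_cont(s[3])) {
        *cp = ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
            | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
        return 4;
    }
    *cp = s[0];                     /* ASCII or a byte we cannot decode */
    return 1;
}

/* Write cp at out using exactly len bytes (len must equal utf8_len(cp)). */
static void utf8_encode(unsigned long cp, unsigned char *out, size_t len)
{
    assert(len >= 1 && len <= 4);
    switch (len) {
    case 1:
        out[0] = (unsigned char)cp;
        break;
    case 2:
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
}

/* Upper-case str in place, leaving any character whose mapped form
 * would need a different number of bytes untouched. */
void utf8_upper_same_width(char *str, case_map_fn to_upper)
{
    unsigned char *p = (unsigned char *)str;

    while (*p != '\0') {
        unsigned long cp, mapped;
        size_t n = utf8_decode(p, &cp);

        mapped = to_upper(cp);
        if (mapped != cp && utf8_len(mapped) == n)
            utf8_encode(mapped, p, n);
        p += n;
    }
}

/* Tiny demo mapping; a real one would consult the Unicode tables. */
unsigned long demo_to_upper(unsigned long cp)
{
    if (cp >= 0x61UL && cp <= 0x7AUL) return cp - 0x20UL; /* a-z */
    if (cp == 0x00E9UL) return 0x00C9UL;  /* 2 bytes either way */
    if (cp == 0x0131UL) return 0x0049UL;  /* 2 bytes -> 1: skipped */
    return cp;
}
```

The string length never changes, so existing in-place callers keep
working. The price is that a few characters (the dotless i above, for
one) are left in their original case.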

Long term, we need to use the dnValidate()/dnNormalize()
semantics instead of the dn_validate()/dn_normalize() semantics.

In the mid term, to avoid the ripple effect of the
dn_validate()/dn_normalize() change, I suggest that temporary
versions of dn_validate()/dn_normalize() be implemented which
use dnValidate()/dnNormalize() to do the work but provide old
semantics otherwise.
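
Roughly what I have in mind, as a sketch only. The names and signatures
below are placeholders, not the real dn_normalize()/dnNormalize()
prototypes, and new_style_normalize is a toy stand-in (ASCII upper-casing
plus dropping spaces after commas) so the example compiles on its own.
It assumes the old routine works in place and returns its argument, and
that failing when the result would not fit the caller's buffer is an
acceptable interim policy.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy stand-in for a new-style normalizer (hypothetical, NOT the real
 * dnNormalize()). Returns a freshly malloc'd copy with ASCII letters
 * upper-cased and spaces after ',' removed, or NULL if malloc fails.
 */
char *new_style_normalize(const char *dn)
{
    size_t i, j = 0, len = strlen(dn);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        if (dn[i] == ' ' && j > 0 && out[j - 1] == ',')
            continue;               /* drop spaces following a comma */
        out[j++] = (char)toupper((unsigned char)dn[i]);
    }
    out[j] = '\0';
    return out;
}

/*
 * Old-style wrapper: normalize dn in place and return dn, or NULL on
 * failure. The real work is delegated to the new-style routine.
 */
char *old_style_normalize(char *dn)
{
    char *norm;
    size_t new_len;

    assert(dn != NULL);
    norm = new_style_normalize(dn);
    if (norm == NULL)
        return NULL;
    new_len = strlen(norm);
    if (new_len > strlen(dn)) {     /* would overflow the caller's buffer */
        free(norm);
        return NULL;
    }
    memcpy(dn, norm, new_len + 1);  /* include the terminating NUL */
    free(norm);
    return dn;
}
```

Callers of the old routine stay unchanged, and they can be converted to
the allocating interface one at a time.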

Besides dealing with lower/upper case length changes,
dnValidate()/dnNormalize() can:
   validate/normalize the attribute type
   unescape/unquote, validate/normalize the value (*), and re-escape
     * extra credit: per value syntax

Kurt