
Re: UTF8 case insensitive matching

To: Stig Venås <venaas@alfa.itea.ntnu.no>
Subject: Re: UTF8 case insensitive matching
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Date: Wed, 25 Oct 2000 08:32:57 -0700
Cc: openldap-devel@OpenLDAP.org
In-reply-to: <20001025163154.A11668@itea.ntnu.no>
References: <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001024112053.A22541@itea.ntnu.no> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net>

At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
>On Tue, Oct 24, 2000 at 01:11:25PM -0700, Kurt D. Zeilenga wrote:
>> The DN normalization and matching?
>
>I'm looking at this. I have some questions.
>
>I'm writing UTF8str2upper and perhaps some other UTF8 functions
>that need liblunicode to work. I think they belong in utf8.c in
>libldap, but I don't think it's good if applications that use
>libldap must also link with liblunicode. Where should I put them?


ldap_pvt_uc.h/-llunicode

>I'm not sure, but I think that the width of a character in UTF8
>might change when you change casing. Does anyone know for sure
>if it might?

Yes.
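
To make the width change concrete, here is a minimal sketch (not OpenLDAP code; utf8_len, width_changers and width_delta are names made up for this illustration). The three mappings listed are simple case mappings from the Unicode Character Database.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to encode one Unicode code point in UTF-8. */
static size_t utf8_len(unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x80UL)    return 1;
    if (cp < 0x800UL)   return 2;
    if (cp < 0x10000UL) return 3;
    return 4;
}

/* A few simple case mappings whose UTF-8 width differs (illustrative list). */
struct case_pair { unsigned long from, to; const char *note; };
static const struct case_pair width_changers[] = {
    { 0x0131UL, 0x0049UL, "dotless i uppercases to I (2 bytes to 1)" },
    { 0x017FUL, 0x0053UL, "long s uppercases to S (2 bytes to 1)" },
    { 0x212AUL, 0x006BUL, "Kelvin sign lowercases to k (3 bytes to 1)" },
};

/* Signed change in encoded width when `from` is mapped to `to`. */
static int width_delta(unsigned long from, unsigned long to)
{
    return (int)utf8_len(to) - (int)utf8_len(from);
}
```

A non-zero width_delta() means the converted string cannot simply overwrite the original byte for byte.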

>If it can change, dn_normalize will have to malloc
>space for a new string and return a pointer to that.

This is needed anyway for quoting/escape normalization...

>A lot of
>code would have to be changed then. An easy but incorrect way
>out could be to simply not change casing for a character if
>the size is different. It would still be better than today's
>situation.

We can certainly cheat in the short term....
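
A rough sketch of that cheat, for illustration only: uppercase in place and leave a character alone when its encoded size would change. Everything here is an assumption, not existing libldap or liblunicode code. toy_toupper is a stand-in for a real case-mapping routine and knows only ASCII, part of Latin-1 and dotless i; the decoder handles one- to three-byte sequences and does not reject overlong forms.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in case mapper (hypothetical); a real one would use Unicode tables. */
static unsigned long toy_toupper(unsigned long cp)
{
    if (cp >= 0x61UL && cp <= 0x7AUL) return cp - 0x20UL;      /* a-z */
    if (cp >= 0xE0UL && cp <= 0xFEUL && cp != 0xF7UL)
        return cp - 0x20UL;                                    /* Latin-1 */
    if (cp == 0x0131UL) return 0x0049UL;   /* dotless i: 2 bytes -> 1 */
    return cp;
}

static size_t utf8_len(unsigned long cp)
{
    return cp < 0x80UL ? 1 : cp < 0x800UL ? 2 : cp < 0x10000UL ? 3 : 4;
}

/* Decode one character; returns its byte length, or 0 if it is malformed
 * or longer than three bytes (this sketch stops at the BMP). */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
                              && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Encode cp (below U+10000) at s; the caller guarantees the space. */
static void utf8_encode(unsigned long cp, unsigned char *s)
{
    if (cp < 0x80UL) {
        s[0] = (unsigned char)cp;
    } else if (cp < 0x800UL) {
        s[0] = (unsigned char)(0xC0 | (cp >> 6));
        s[1] = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        s[0] = (unsigned char)(0xE0 | (cp >> 12));
        s[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        s[2] = (unsigned char)(0x80 | (cp & 0x3F));
    }
}

/* Uppercase a NUL-terminated UTF-8 string in place. Returns how many
 * characters were left unchanged because their width would differ. */
static size_t utf8_upper_inplace(char *str)
{
    unsigned char *p = (unsigned char *)str;
    size_t skipped = 0;

    assert(str != NULL);
    while (*p) {
        unsigned long cp, up;
        size_t n = utf8_decode(p, &cp);
        if (n == 0) { p++; continue; }  /* undecodable here: keep the byte */
        up = toy_toupper(cp);
        if (up != cp) {
            if (utf8_len(up) == n) utf8_encode(up, p);
            else skipped++;             /* the cheat: leave it as it was */
        }
        p += n;
    }
    return skipped;
}
```

The return value makes the incorrectness visible: any non-zero count means two DNs that differ only in the case of such a character will still fail to match.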

Long term, we need to use the dnValidate()/dnNormalize()
semantics instead of the dn_validate()/dn_normalize() semantics.

In the mid term, to avoid the ripple effect of the
dn_validate()/dn_normalize() change, I suggest that temporary
versions of dn_validate()/dn_normalize() be implemented which
use dnValidate()/dnNormalize() to do the work but provide old
semantics otherwise.
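
The shape of such a temporary wrapper might look like the sketch below. All names and signatures are hypothetical and are not the actual slapd interfaces. It assumes the old call normalizes in place and returns its argument (NULL on failure), while the new-style routine returns a freshly allocated result; the normalizer body is a placeholder that only uppercases ASCII.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical new-style normalizer: leaves the input untouched and hands
 * back malloc'd storage through *out. Returns 0 on success, -1 on failure.
 * Placeholder logic: ASCII uppercasing only. */
static int new_style_normalize(const char *in, char **out)
{
    size_t i, len = strlen(in);
    char *buf = malloc(len + 1);

    if (buf == NULL) return -1;
    for (i = 0; i < len; i++) {
        char c = in[i];
        buf[i] = (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
    }
    buf[len] = '\0';
    *out = buf;
    return 0;
}

/* Hypothetical old-style wrapper: does the work with the new routine, then
 * copies the result back over the caller's buffer. */
static char *old_style_normalize(char *dn)
{
    char *tmp = NULL;

    if (dn == NULL || new_style_normalize(dn, &tmp) != 0)
        return NULL;
    if (strlen(tmp) > strlen(dn)) {   /* result would not fit in place */
        free(tmp);
        return NULL;
    }
    strcpy(dn, tmp);                  /* same length or shorter: safe */
    free(tmp);
    return dn;
}
```

Failing when the normalized form is longer than the input is one possible policy for the interim; callers that need the longer result would have to move to the new interface.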

Besides dealing with lower/upper case length changes,
dnValidate()/dnNormalize() can:
   validate/normalize the attribute type
   unescape/unquote, validate/normalize the value (*), and re-escape
     * extra credit: per value syntax

Kurt
