Re: normalised UTF-8, should it be "decomposed", or "composed"?

To: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: Re: normalised UTF-8, should it be "decomposed", or "composed"?
From: Stig Venaas <Stig.Venaas@uninett.no>
Date: Wed, 20 Feb 2002 19:28:49 +0100
Cc: Howard Chu <hyc@highlandsun.com>, John Hughes <john@Calva.COM>, "'OpenLDAP DEVEL'" <openldap-devel@OpenLDAP.org>
Content-disposition: inline
In-reply-to: <5.1.0.14.0.20020220094435.017e58e8@127.0.0.1>; from Kurt@OpenLDAP.org on Wed, Feb 20, 2002 at 09:48:43AM -0800
References: <NMEFLNHODBAOPDKNNJALEEPJCFAA.hyc@highlandsun.com> <NMEFLNHODBAOPDKNNJALEEPICFAA.hyc@highlandsun.com> <NMEFLNHODBAOPDKNNJALEEPJCFAA.hyc@highlandsun.com> <20020220153950.C10991@itea.ntnu.no> <5.1.0.14.0.20020220094435.017e58e8@127.0.0.1>
User-agent: Mutt/1.2.5.1i

On Wed, Feb 20, 2002 at 09:48:43AM -0800, Kurt D. Zeilenga wrote:
> At 06:39 AM 2002-02-20, Stig Venaas wrote:
> >then strip 8-bit characters
> 
> I think we should NOT strip 8-bit characters (when doing
> approximate matching).

I guess I wasn't clear enough; I meant stripping non-ASCII code points.
So, say, an accented e would be decomposed into two code points: e and the
accent. When we then strip the "8-bit" (that is, non-ASCII) code points, we
strip the accent but not the e. This is like what Howard suggested, I think;
I was trying to show how easy it is to implement.
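
The decompose-then-strip idea can be sketched as follows. This is only an
illustration in Python, not OpenLDAP's C code; it assumes canonical
decomposition (NFD) from the standard unicodedata module, and the function
name strip_non_ascii is hypothetical.

```python
import unicodedata

def strip_non_ascii(text):
    """Decompose text (NFD), then drop every non-ASCII code point."""
    # NFD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only code points in the ASCII range (U+0000..U+007F).
    return "".join(ch for ch in decomposed if ord(ch) < 0x80)

print(strip_non_ascii("caf\u00e9"))  # prints "cafe"
```

Note that a letter with no canonical decomposition, such as ø (U+00F8), is
removed entirely by this approach rather than reduced to a base letter.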

Stig
