
Re: RE : Add flag to UTF8normalize and pals to allow accent stripping



On Mon, Feb 25, 2002 at 03:24:56PM +0100, John Hughes wrote:
> > Where did you find the ucisnonspacing(), has that recently been added to
> > the ucdata lib? I can't find it anywhere.
> 
> ldap/libraries/liblunicode/ucdata/ucdata.h

Ah of course, it's a macro. I grepped in the wrong place.

> > Maybe we should only do it with those specific characters that some of
> > you want to treat as equivalent?
> 
> To be really correct I guess it should be language specific.
> 
> For example in French most people would expect Noël = Noel,
> but Germans might like ö = oe.

To cater for French, we could replace ë, é and è (which others?) with plain
e, and that wouldn't affect the German umlauts at all. In Norwegian and some
other languages the accent is often ignored as well; a word like café, for
example, is sometimes written as cafe, and accents are optional in some
Norwegian words. So an ugly hack would be a table of mappings that people
can modify and enable at compile time. It is ugly, though. To do it
correctly, I think we should use language tags and the collation table for
the given language.
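
A minimal sketch of what such a compile-time table might look like. All
names here (ENABLE_ACCENT_FOLDING, fold_table, fold_codepoint, folded_equal)
are made up for illustration and are not part of liblunicode or OpenLDAP. It
assumes the strings have already been decoded to code points and use
precomposed characters; decomposed input (e followed by a combining mark)
would need separate handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compile-time switch; set to 0 to disable folding. */
#ifndef ENABLE_ACCENT_FOLDING
#define ENABLE_ACCENT_FOLDING 1
#endif

struct fold_entry {
    uint32_t from;  /* accented, precomposed code point */
    uint32_t to;    /* character it should compare equal to */
};

/* Site-editable table. Only the French e variants from this thread;
 * German umlauts such as U+00F6 are deliberately left out. */
static const struct fold_entry fold_table[] = {
    { 0x00E8, 'e' },  /* e with grave */
    { 0x00E9, 'e' },  /* e with acute */
    { 0x00EB, 'e' },  /* e with diaeresis */
};

/* Map one code point through the table; unknown ones pass through. */
static uint32_t fold_codepoint(uint32_t cp)
{
#if ENABLE_ACCENT_FOLDING
    size_t i;
    for (i = 0; i < sizeof fold_table / sizeof fold_table[0]; i++) {
        if (fold_table[i].from == cp)
            return fold_table[i].to;
    }
#endif
    return cp;
}

/* Return 1 if the two code point sequences are equal after folding. */
static int folded_equal(const uint32_t *a, size_t alen,
                        const uint32_t *b, size_t blen)
{
    size_t i;
    assert(a != NULL && b != NULL);
    if (alen != blen)
        return 0;
    for (i = 0; i < alen; i++) {
        if (fold_codepoint(a[i]) != fold_codepoint(b[i]))
            return 0;
    }
    return 1;
}
```

With this table, Noël and Noel compare equal, while ö and o still differ
because U+00F6 has no entry.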

BTW, we should also do case-insensitive comparisons differently. That
might also depend on language tags/locales.
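
One well-known case of that dependence is Turkish and Azerbaijani, where
lowercase of I is the dotless ı (U+0131) and lowercase of İ (U+0130) is i.
A rough sketch, assuming a hypothetical helper to_lower_lang() that takes a
plain language tag; it only handles ASCII plus that one special case, and
is not how liblunicode does case mapping:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lowercase one code point for a language tag.
 * lang may be NULL, meaning "no language-specific rules". */
static uint32_t to_lower_lang(uint32_t cp, const char *lang)
{
    int turkic;

    assert(cp <= 0x10FFFF);  /* must be a valid Unicode code point */

    turkic = lang != NULL &&
             (strcmp(lang, "tr") == 0 || strcmp(lang, "az") == 0);
    if (turkic) {
        if (cp == 'I')
            return 0x0131;   /* I -> dotless i */
        if (cp == 0x0130)
            return 'i';      /* dotted capital I -> i */
    }

    /* Default rule, ASCII only in this sketch. */
    if (cp >= 'A' && cp <= 'Z')
        return cp + ('a' - 'A');
    return cp;
}
```

So a comparison that lowercases I to i is right for most languages but
wrong for an entry tagged as Turkish.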

Stig