RE: String conversions UTF8 <-> ISO-8859-1
> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Hallvard B Furuseth
> Michael Ströder writes:
> > And wouldn't it be necessary to have schema knowledge to determine
> > whether the conversion is applicable at all? E.g. if syntax is
> > OctetString the charset conversion might not be the right thing.
>
> Yes. In most cases I think it would be enough for the client to know
> which syntaxes should _not_ be converted, though.
Perhaps so; there definitely aren't a lot of Binary or Not-Human-Readable
syntaxes in the standard schema.
> E.g. it wouldn't hurt
> to convert OctetString, since OctetString can't contain non-ASCII
> characters.
OctetString is UTF-8; it can certainly contain non-ASCII characters.
> OTOH, if the client used EBCDIC it would need to
> know a bit more...
Indeed. This is quite a headache. I guess since EBCDIC is only an 8-bit
character set we would just map "unmappable" codes to '?' and leave it at
that.
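
As a rough illustration of the '?' fallback described above, here is a minimal Python sketch. The function name to_client_charset is hypothetical, and using Python's cp037 codec as the EBCDIC code page is an assumption; a real client would pick whatever code page its platform uses.

```python
def to_client_charset(value_utf8: bytes, charset: str) -> bytes:
    """Convert a UTF-8 value to an 8-bit client charset.

    Characters with no equivalent in the target charset are replaced
    by '?', encoded in that charset.
    """
    text = value_utf8.decode("utf-8")
    # errors="replace" substitutes '?' for every unmappable character.
    return text.encode(charset, errors="replace")


if __name__ == "__main__":
    # U+00F6 exists in cp037 (one EBCDIC code page); U+20AC does not.
    sample = "Str\u00f6der \u20ac".encode("utf-8")
    ebcdic = to_client_charset(sample, "cp037")
    # Decode again only so the result can be displayed.
    print(ebcdic.decode("cp037"))
```

The conversion is lossy: once a character has become '?', the original value cannot be recovered, so a client should not write such values back to the directory.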
>
> I suggest ldap.conf could contain lines with
>
> attr-charset <charset> [<attribute-name> <attribute-name>...]
> client-charset <charset>
>
> <charset> would normally be "unknown" alias "binary" or "UTF-8". The
> default attr-charset would be unknown, but an "attr-charset UTF-8" line
> without any attributes would set the default attr-charset to UTF-8.
>
> Then, all that remains is to implement this:-)
Given that utilities like iconv and such exist, I think this is best left as
an exercise for the reader.
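
For readers who want a starting point, the proposal quoted above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: attr-charset and client-charset are the directives suggested in this thread, not existing ldap.conf options, and the function names, the UTF-8 default for client-charset, and the sample attribute names are all hypothetical. Python's built-in codecs stand in for iconv here.

```python
def parse_charset_config(text):
    """Parse the proposed attr-charset / client-charset lines."""
    client_charset = "UTF-8"          # assumed default: no conversion
    default_attr_charset = "unknown"  # default given in the proposal
    per_attr = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        keyword, args = fields[0].lower(), fields[1:]
        if keyword == "client-charset" and len(args) == 1:
            client_charset = args[0]
        elif keyword == "attr-charset" and args:
            charset, attrs = args[0], args[1:]
            if attrs:
                for name in attrs:
                    per_attr[name.lower()] = charset
            else:
                # No attribute list: the line sets the default charset.
                default_attr_charset = charset
    return client_charset, default_attr_charset, per_attr


def convert_value(attr, value, config):
    """Convert one attribute value (bytes) to the client charset."""
    client_charset, default_attr_charset, per_attr = config
    charset = per_attr.get(attr.lower(), default_attr_charset)
    if charset.lower() in ("unknown", "binary"):
        return value  # leave binary or unknown data untouched
    return value.decode(charset).encode(client_charset, errors="replace")


if __name__ == "__main__":
    cfg = parse_charset_config(
        "attr-charset UTF-8 cn sn\nclient-charset ISO-8859-1\n"
    )
    print(convert_value("cn", "Str\u00f6der".encode("utf-8"), cfg))
    print(convert_value("jpegPhoto", b"\xff\xd8\xff", cfg))
```

With this configuration, cn and sn values are converted to ISO-8859-1, while jpegPhoto falls back to the "unknown" default and is passed through byte for byte.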
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support