Re: ldap_explode_dn corrupts UTF-8 encoding (ITS#1890)
On Mon, 17 Jun 2002, Pierangelo Masarati wrote:
> ps@psncc.at writes:
>
> > On Mon, 17 Jun 2002, Pierangelo Masarati wrote:
> >
> > >> > OpenLDAP 2.1.2 seems to corrupt non-ASCII UTF-8 encoded characters.
> >> > It actually turns unprintable chars (in the ASCII sense) into \<hexcode>.
> >>
> >> I think this is a leftover from before we decided to use UTF-8 instead
> >> of the '\' + HEXPAIR representation of non-ASCII chars; initially
> >> that representation was intended. Of course, when parsing a DN, one
> >> wants the correct UTF-8 encoding.
> >
> > Note that the problem does not exist in 2.0.23...
>
> DN parsing/handling has been completely rewritten since 2.0.
>
> >
> > To further elaborate the problem: before passing the DN to the
> > ldap_explode_dn function it is properly (UTF-8) encoded. Afterwards the DN
> > parts aren't...
Well, the code fragment that broke is:
char **exploded_dn, *dn;
LDAP *ld;
LDAPMessage *e;
[snip]
dn = ldap_get_dn(ld, e);
/* explode DN */
exploded_dn = ldap_explode_dn(dn, FALSE);
Which is exactly what the man page for ldap_explode_dn suggests. And it is
straightforward too.
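For reference, the kind of rewriting the report describes can be sketched in a few lines of standalone C. The function hexpair_escape below is hypothetical and is not libldap code; it only mimics the part that matters here, replacing each byte outside printable ASCII with '\' plus two hex digits. Uppercase hex is an assumption, and the real library also escapes DN special characters such as ',' that this sketch leaves alone. Per the report, 2.1.2 hands back each element of exploded_dn in roughly this form, whereas 2.0.23 left the UTF-8 bytes as they were.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration, not libldap code: rewrite every byte outside
 * printable ASCII (0x20..0x7E) as '\' followed by two uppercase hex digits.
 * Returns a newly allocated string the caller must free(), or NULL. */
char *hexpair_escape(const char *in)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    /* worst case: every input byte becomes three output characters */
    char *out = malloc(3 * len + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s < 0x20 || *s > 0x7E) {
            *p++ = '\\';
            *p++ = hex[*s >> 4];     /* high nibble */
            *p++ = hex[*s & 0x0F];   /* low nibble */
        } else {
            *p++ = (char)*s;         /* printable ASCII is copied as-is */
        }
    }
    *p = '\0';
    return out;
}
```

For example, hexpair_escape("cn=M\xC3\xBCller"), the UTF-8 spelling of cn=Müller, yields cn=M\C3\BCller.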
> They are; but they're represented in another form that is allowed
> for DNs; it depends on whether you like it or not.
I just think it is not good to break existing functionality.
> I understand that DN parsing is delicate when UTF-8 is involved. The point
> is that the ldap_explode_dn API is broken, because it doesn't let you
> choose how to expand a DN (how to represent it in string form).
Well, I do not consider it to be broken, but I am not an LDAP guru... The
functionality is quite clear. I agree that additional functionality is
nice to have, but that is what ldap_str2dn & co. are there for.
> You may use:
>
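> int i;
> LDAPDN *dn;
> char **v = NULL;
>
> ldap_str2dn( string, &dn, LDAP_DN_FORMAT_LDAP );
> for ( i = 0; dn[ 0 ][ i ]; i++ ) {
>         /* realloc() takes bytes: room for i + 1 strings plus a NULL */
>         v = realloc( v, ( i + 2 ) * sizeof( char * ) );
>         ldap_rdn2str( dn[ 0 ][ i ], &v[ i ],
>                 LDAP_DN_FORMAT_LDAPV3 | LDAP_DN_PRETTY );
>         v[ i + 1 ] = NULL;
> }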
>
That code looks a lot more complex and incomprehensible than the
straightforward code fragment above... :-(
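Part of what makes the ldap_str2dn() version hard to read is the hand-rolled growth of the NULL-terminated char ** result, which is easy to get wrong: realloc() takes a size in bytes, not an element count. A small helper can keep that bookkeeping in one place. strvec_push is a hypothetical name, not part of libldap; this is a sketch, not a drop-in replacement for ldap_explode_dn().

```c
#include <stdlib.h>

/* Hypothetical helper (not part of libldap): append s to a NULL-terminated
 * array of strings. *n is the number of strings already stored.
 * Returns 0 on success, -1 if realloc() fails; on failure the old array
 * is left untouched and still owned by the caller. */
int strvec_push(char ***v, size_t *n, char *s)
{
    /* room for the existing entries, the new one and the NULL terminator,
     * expressed in bytes as realloc() expects */
    char **tmp = realloc(*v, (*n + 2) * sizeof(char *));

    if (tmp == NULL)
        return -1;
    tmp[*n] = s;
    tmp[*n + 1] = NULL;
    *v = tmp;
    (*n)++;
    return 0;
}
```

Inside the loop, each string produced by ldap_rdn2str() would be passed as strvec_push(&v, &n, str), leaving v a NULL-terminated array like the one ldap_explode_dn() returns.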
> see the ldap_explode_dn code in libraries/libldap/getdn.c;
> the flag LDAP_DN_PRETTY causes non-ASCII characters to be returned as
> raw UTF-8 rather than as '\' + HEXPAIR escapes.
>
> >
> > Is exploding a DN a conversion with respect to codesets? (I would not think it is)
> > Where would one need to specify extra flags? Or is this a purely internal
> > matter?
>
> It is not an internal matter, only a matter of deciding, among the
> allowed choices, which is the most general. Initially I considered
> '\' + HEXPAIR the most general.
But it requires more parsing on the user side. With the whole world
moving towards UTF-8 (think Java, gtk+, ...), it is not nearly as general
as it could be.
It is mostly a matter of breaking things that used to work.
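The extra parsing on the user side amounts to undoing the escapes. A minimal sketch, assuming the input is one element of exploded_dn as returned by 2.1.2: the hypothetical hexpair_unescape decodes '\' + HEXPAIR sequences in place back to raw bytes, so cn=M\C3\BCller becomes the UTF-8 spelling of cn=Müller. It is not a full RFC 2253 parser; any other character following a backslash (for example "\,") is simply copied.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper (not part of libldap): decode '\' + HEXPAIR escapes
 * in place. Any other character after a backslash is copied literally.
 * Returns 0 on success, -1 on a trailing lone backslash. */
int hexpair_unescape(char *s)
{
    char *out = s;

    while (*s) {
        if (*s != '\\') {
            *out++ = *s++;
            continue;
        }
        s++;                          /* skip the backslash */
        if (*s == '\0')
            return -1;                /* dangling backslash */
        int hi = hexval(s[0]);
        int lo = hi >= 0 ? hexval(s[1]) : -1;
        if (hi >= 0 && lo >= 0) {
            *out++ = (char)(hi << 4 | lo);   /* one raw byte */
            s += 2;
        } else {
            *out++ = *s++;            /* e.g. "\," keeps the ',' */
        }
    }
    *out = '\0';
    return 0;
}
```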
ps