Re: ldap_str2dn etc.
At 02:05 AM 2001-10-08, Pierangelo Masarati wrote:
>I've done a big chunk of work; I think the largest part
>that's still to do is the UTF-8 handling (I started working
>on it this morning on the commuter train :)
>
>The code is kinda experimental, that is it is filled with
>programmer's notes, and not optimized. After it works as
>expected I'll clean it a bit before commit.
>
>I'll try to list all the open issues to see if I can fix any
>of them before I commit it (some of them simply are statements;
>correct them if they're wrong).
>
><grep FIXME getdn.c:>
>
>a) there's no explicit mention both in RFC 2253 and
>in LDAPbis DN draft of language extensions to attr types.
>I've figured out three behaviors:
> 1) discard the extensions
> 2) leave the extensions in place
> 3) issue an error if PEDANTIC
DNs cannot contain attribute type options. That is,
CN;lang-de=Kurt
is invalid. Both parser and generator should bitch.
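A minimal sketch of such a check, assuming the attribute type is available as a pointer plus length. The helper name attr_type_has_option is hypothetical and not part of any existing API; the parser and generator would return an error when it reports an option.

```c
#include <string.h>

/*
 * Hypothetical helper (not an existing library function): returns 1 if
 * the attribute type of an AVA carries an option such as ";lang-de",
 * which is not allowed inside a DN, and 0 otherwise.
 */
int attr_type_has_option(const char *type, size_t len)
{
    /* Options are introduced by ';', so any ';' in the type is an error. */
    return memchr(type, ';', len) != NULL;
}
```

With this, "CN;lang-de" is rejected and "CN" is accepted.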
>b) string value means UTF-8 if LDAPv3 or T61 if LDAPv2;
Actually, in LDAPv2, LDAPDN (RFC 1777) is restricted
to IA5, but a DN (RFC 1779) string representation is
neutral to the codeset used. This means that any
extended T.61 (or other non-IA5) character in a value
string representation requires use of the #hex
format.
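As a rough sketch of that rule, a generator targeting LDAPv2 could test whether a value is pure IA5 (7-bit) before choosing between the plain string form and the #hex form. The function name value_is_ia5 is made up for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if every byte of the value is 7-bit
 * (IA5), so it can be written as a plain string in LDAPv2. A return
 * of 0 means the value would need the #hex form instead.
 */
int value_is_ia5(const unsigned char *val, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (val[i] & 0x80) {
            return 0;   /* high bit set: not IA5 */
        }
    }
    return 1;
}
```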
>I assume UTF-8 also if I'm reading DCE format, right?
No clue (I'd guess DCE isn't UTF-8 as it predates LDAPv3).
>But after I parsed a string into a DN, if I need to write
>it back, say, in LDAPv2, what should I do if it contains
>UTF-8? I was thinking of extending the LDAPAVA struct
>to hold flags that state if the value is UTF-8 or simply
>IA5 (correct?).
Well, I suggest that any transliteration be handled
outside of the str2dn and dn2str functions (much like
normalization).
>c) for performance issues, I think I should add a field
>to the LDAPAVA struct that holds the length of the string
>representation of the value, so I don't need to compute it
>every time. I'd also like to turn the attribute type
>field into a berval, to avoid computing its length many times
>(maybe we could even use an AttributeDescription in union
>with a berval in case the description is unknown).
I have no problem with this.
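For illustration only, a length-carrying AVA might look something like the sketch below. The sketch_ names are hypothetical stand-ins, not the actual LDAPAVA or berval definitions; the point is just that lengths are computed once and reused.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-carrying string (berval-like). */
struct sketch_berval {
    size_t      bv_len;
    const char *bv_val;
};

/* Hypothetical AVA layout: type and value both carry their length. */
struct sketch_ava {
    struct sketch_berval la_attr;
    struct sketch_berval la_value;
    unsigned             la_flags;
};

/* Store a string and its length once, so later code need not call strlen(). */
void sketch_bv_set(struct sketch_berval *bv, const char *s)
{
    bv->bv_val = s;
    bv->bv_len = strlen(s);
}

/* Bytes needed for "type=value" (ignoring escaping), from stored lengths. */
size_t sketch_ava_strlen(const struct sketch_ava *ava)
{
    return ava->la_attr.bv_len + 1 + ava->la_value.bv_len;
}
```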
>d) I guess empty attributes are illegal in a DN; correct?
>(I need to handle AVA separators right after the '=' sign).
No. A value can be empty.
ref=,o=foo
is valid.
>e) I guess multi-AVA RDNs are allowed in DCE, right?
I assume so.
>f) If I understood the point, I have to turn string
>representations of binary values ('#' + 1*(HEXPAIR) and
>'\' HEXPAIR) into their binary form, and back to string,
>right? (at least that's what I did).
In parsing (str2dn), if the string value is in # format, you should
just place the BER value into la_value. That is (pseudo
code):
	if ( value[0] == '#' ) {
		la_value = unhex( &value[1] );
		la_flags = LDAP_AVA_BINARY;
	} else {
		...
		la_flags = LDAP_AVA_STRING;
	}
In generating (dn2str),
	if ( la_flags == LDAP_AVA_BINARY ) {
		value[0] = '#';
		strcpy( &value[1], hex( la_value ) );
	} else {
		...
	}
The parser/generators need not convert between string and
binary (BER) value representations. This is a job for
normalizers.
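The unhex() and hex() calls in the pseudo code are placeholders. One possible shape for them is sketched below; the signatures are assumptions made for this example (explicit lengths, malloc'd results, and hex() emitting the leading '#' itself), not existing library functions.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Decode the hex pairs that follow '#' into a malloc'd buffer.
 * Returns NULL on empty input, odd length, a bad digit, or
 * allocation failure. The bytes are stored untouched (raw BER).
 */
unsigned char *unhex(const char *s, size_t *outlen)
{
    size_t n = strlen(s), i;
    unsigned char *out;

    if (n == 0 || n % 2 != 0) return NULL;
    out = malloc(n / 2);
    if (out == NULL) return NULL;
    for (i = 0; i < n; i += 2) {
        int hi = hexval((unsigned char)s[i]);
        int lo = hexval((unsigned char)s[i + 1]);
        if (hi < 0 || lo < 0) {
            free(out);
            return NULL;
        }
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    *outlen = n / 2;
    return out;
}

/* Encode a binary value as '#' plus lowercase hex pairs (malloc'd string). */
char *hex(const unsigned char *val, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char *out = malloc(2 * len + 2);
    size_t i;

    if (out == NULL) return NULL;
    out[0] = '#';
    for (i = 0; i < len; i++) {
        out[1 + 2 * i] = digits[val[i] >> 4];
        out[2 + 2 * i] = digits[val[i] & 0x0f];
    }
    out[1 + 2 * len] = '\0';
    return out;
}
```

Neither function interprets the bytes; they only move between the hex text and the raw value, which keeps conversion to a string form out of the parser and generator.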
>g) I made leading and trailing spaces, and spaces around '=',
>',', ';' and '+' admissible unless the PEDANTIC flag is set.
Yes.
>Are they allowed also in DCE format?
No clue.
>h) T61: what should I look at?
I assume you mean RFC 1779 format DNs (which technically
can be in any codeset, but restricted to IA5 when used in
LDAPv2).
>I strip quotes from quoted
>values; as a consequence I need to escape chars that need it
>(according to RFC 1779). I've a small doubt: in a quoted
>attribute only double quotes need be escaped (among printable
>chars); so if I find an escape not followed by a quote do I
>need to consider it a normal char (and thus escape it while
>eliminating quotes)? According to rfc 1779, a quoted string
>can contain what is called a PAIR. Example:
>
>'","' => '\,' OK
>'"\,"' => '\,' OK
I guess I don't understand the question (I haven't finished
my morning cold carbonated caffeine yet:-). But let me see
if I can clarify.
In parsing a string representation of a value, that string
may be quoted or may be escaped. In either case, the value
is the unquoted/unescaped value and that's what is stored in
la_value.
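A minimal sketch of that parsing step, assuming the value arrives as a NUL-terminated string. The name unquote_value is hypothetical, and the sketch handles only a backslash followed by a single character, not the '\' HEXPAIR form.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd copy of the value with one
 * pair of surrounding double quotes removed (if present) and each
 * backslash escape reduced to the character it protects.
 */
char *unquote_value(const char *s)
{
    size_t n = strlen(s), i = 0, o = 0;
    char *out = malloc(n + 1);

    if (out == NULL) return NULL;
    if (n >= 2 && s[0] == '"' && s[n - 1] == '"') {
        i = 1;      /* skip the opening quote */
        n--;        /* stop before the closing quote */
    }
    while (i < n) {
        if (s[i] == '\\' && i + 1 < n) {
            i++;    /* drop the backslash, keep the next character */
        }
        out[o++] = s[i++];
    }
    out[o] = '\0';
    return out;
}
```

Under these assumptions both '","' and '"\,"' from the question yield the same one-character value, a comma.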
In generating a string representation, the value (la_value)
may contain characters which require quoting and/or escaping.
I suggest that only escaping be used here. Hence,
dn2str(str2dn(cn="foo,bar")) returns cn=foo\,bar
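The escape-only generation could be sketched roughly as follows. escape_value is a hypothetical name, and the set of characters escaped is the RFC 2253 one (',', '+', '"', '\', '<', '>', ';', plus a leading space or '#' and a trailing space); treat it as an illustration rather than the committed code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd, NUL-terminated string in
 * which every character that needs protection is preceded by a
 * backslash. No quoting is ever produced.
 */
char *escape_value(const char *val, size_t len)
{
    static const char specials[] = ",+\"\\<>;";
    char *out = malloc(2 * len + 1);    /* worst case: all escaped */
    size_t i, o = 0;

    if (out == NULL) return NULL;
    for (i = 0; i < len; i++) {
        char c = val[i];
        int special = memchr(specials, c, sizeof(specials) - 1) != NULL;
        int leading = (i == 0 && (c == ' ' || c == '#'));
        int trailing = (i == len - 1 && c == ' ');

        if (special || leading || trailing) {
            out[o++] = '\\';
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return out;
}
```

Given the value foo,bar this produces foo\,bar, matching the round trip above.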
Kurt