Re: string value encoding and escaping question
Mark Smith wrote:
>
> The origin of the '$' separator is the Quipu X.500 implementation (it
> used '$' inside various string syntaxes because '$' is not a valid
> character in the T.61 character set which was used for some string
> syntaxes in the olden days).
Ah! ok, I hadn't realized T.61 was the culprit. I'd had a hunch it was because
X.500 was largely Euro-originated and suspected they chose $ cuz it wasn't
their (whoever "they" exactly were) currency symbol. I hadn't looked at T.61
closely enuff to figure out that it doesn't contain '$'. So my hunch wasn't
that far off actually.
Do you or anyone else have a URL handy that points to a reference for T.61?
I'd like to stick it in the LDAP Roadmap.
> Use of '$' has been carried over to some
> of the LDAPv3 syntaxes, so we are stuck with it now.
right.
> In general, you should pick a separator character that makes sense to
> you. Backslash is clearly an inconvenient choice ;-)
Well, of course. (my 10yr old would say: DUH. ;)
So, are there any chars other than '\' that're treated specially in the
protocol docs (aka RFCs [2251..2256] + relevant near-RFC I-Ds) that you know
of? My search hasn't turned up any, but I might've left a stone unturned. It
looks to me like the protocol docs ~don't~ treat '$' specially.
Also, I'd appreciate getting explicit confirmation from LDAP/X.500 mavens on
these other questions I had...
> Jeff.Hodges@Stanford.edu scribbled in netscape.dev.directory newsgroup:
>
> What I'm trying to figure out (sorta outta morbid curiosity) is whether it is
> the libldap (aka "the ldap sdk", "the ldap stub") or the NS DS that is
> recognizing the '\' char and interpreting it as a hex escape. Anyone know?
>
> The below RFC 2252 excerpts imply to me that the client side (aka the LDAP
> stub, lib, or whatever) needs to know about this stuff in order to understand
> and properly handle this value syntax. Is this correct? Or not and why?
>
> Also, I'm curious as to whether there's anything to gain by following X.500's
> lead and using '$' as a separator char? I don't believe that any of the RFCs
> or I-Ds specify treating it specially, so I doubt it will be inadvertently
> specially treated as backslash apparently is.
If '$' isn't treated specially protocol-wise, then the only value of using it
as a separator is consistency with "tradition" and thus perhaps reuse of some
amount of attribute value parsing code out there, tho we don't really have a
large body of that ourselves.
thanks,
Jeff
ps: thanks to Mark Wilcox for experimenting with duplicating our attr value
issues.
> -------------------------------------------------------------------------------
> http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2252.txt
> .
> .
>
> 4.1. Common Encoding Aspects
>
> For the purposes of defining the encoding rules for attribute
> syntaxes, the following BNF definitions will be used. They are based
> on the BNF styles of RFC 822 [13].
>
> a = "a" / "b" / "c" / "d" / "e" / "f" / "g" / "h" / "i" /
> "j" / "k" / "l" / "m" / "n" / "o" / "p" / "q" / "r" /
> "s" / "t" / "u" / "v" / "w" / "x" / "y" / "z" / "A" /
> "B" / "C" / "D" / "E" / "F" / "G" / "H" / "I" / "J" /
> "K" / "L" / "M" / "N" / "O" / "P" / "Q" / "R" / "S" /
> "T" / "U" / "V" / "W" / "X" / "Y" / "Z"
>
> d = "0" / "1" / "2" / "3" / "4" /
> "5" / "6" / "7" / "8" / "9"
>
> hex-digit = d / "a" / "b" / "c" / "d" / "e" / "f" /
> "A" / "B" / "C" / "D" / "E" / "F"
>
> k = a / d / "-" / ";"
>
> p = a / d / """ / "(" / ")" / "+" / "," /
> "-" / "." / "/" / ":" / "?" / " "
>
> letterstring = 1*a
>
> numericstring = 1*d
>
> anhstring = 1*k
>
> keystring = a [ anhstring ]
>
> printablestring = 1*p
>
> space = 1*" "
>
> whsp = [ space ]
>
> utf8 = <any sequence of octets formed from the UTF-8 [9]
> transformation of a character from ISO10646 [10]>
>
> dstring = 1*utf8
>
> qdstring = whsp "'" dstring "'" whsp
>
> qdstringlist = [ qdstring *( qdstring ) ]
>
> qdstrings = qdstring / ( whsp "(" qdstringlist ")" whsp )
>
> .
> .
> 4.3. Syntaxes
> .
> .
> In encodings where an arbitrary string, not a Distinguished Name, is
> used as part of a larger production, and other than as part of a
> Distinguished Name, a backslash quoting mechanism is used to escape
> the following separator symbol character (such as "'", "$" or "#") if
> it should occur in that string. The backslash is followed by a pair
> of hexadecimal digits representing the next character. A backslash
> itself in the string which forms part of a larger syntax is always
> transmitted as '\5C' or '\5c'. An example is given in section 6.27.
> .
> .
>
> 6.27. Postal Address
>
> ( 1.3.6.1.4.1.1466.115.121.1.41 DESC 'Postal Address' )
>
> Values in this syntax are encoded according to the following BNF:
>
> postal-address = dstring *( "$" dstring )
>
> In the above, each dstring component of a postal address value is
> encoded as a value of type Directory String syntax. Backslashes and
> dollar characters, if they occur in the component, are quoted as
> described in section 4.3. Many servers limit the postal address to
> six lines of up to thirty characters.
>
> Example:
>
> 1234 Main St.$Anytown, CA 12345$USA
> \241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA
>
> .
> .
>
> [ note that "\241,000,000" is intended to resolve to "$1,000,000" (0x24 is
> "$") once the value string shown above is parsed out into its constituent
> components, which're delimited by the "$" chars. This implies to me that the
> client needs to know about this in order to understand and properly handle
> this value syntax. I don't know if that assertion is exactly true. ]
>
> -------------------------------------------------------------------------------
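
For what it's worth, here is a rough sketch of what client-side handling of the Postal Address syntax quoted above might look like, assuming the client (rather than the server or libldap) is the one doing the escaping and unescaping. The function names are made up for illustration and are not part of any LDAP SDK. It only deals with backslash followed by two hex digits as described in section 4.3, and it assumes the escaped characters are single ASCII characters such as "$" and "\".

```python
import re

# Matches a backslash followed by exactly two hex digits (RFC 2252, section 4.3).
_HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def escape_component(text: str) -> str:
    """Quote one address line: backslash becomes \\5C, dollar becomes \\24.

    The backslash is replaced first so that the backslashes introduced
    for '$' are not themselves escaped a second time.
    """
    return text.replace("\\", "\\5C").replace("$", "\\24")


def unescape_component(text: str) -> str:
    """Turn each \\XX hex escape back into the character it stands for."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def encode_postal_address(lines):
    """Join already-separate address lines into one '$'-separated value."""
    return "$".join(escape_component(line) for line in lines)


def decode_postal_address(value: str):
    """Split a postal address value into its lines.

    Splitting on a raw '$' is safe here because a literal dollar inside a
    component is always transmitted as \\24, never as a bare '$'.
    """
    return [unescape_component(part) for part in value.split("$")]


if __name__ == "__main__":
    sample = r"\241,000,000 Sweepstakes$PO Box 1000000$Anytown, CA 12345$USA"
    for line in decode_postal_address(sample):
        print(line)
```

If the escapes reach the application untouched, something like decode_postal_address() has to run on the client before the lines are displayed, which is consistent with the reading of section 4.3 given above.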