Re: UTF8 case insensitive matching
At 07:07 PM 10/25/00 +0200, Stig Venås wrote:
>On Wed, Oct 25, 2000 at 08:32:57AM -0700, Kurt D. Zeilenga wrote:
>> At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
>> >code would have to be changed then. An easy but incorrect way
>> >out could be to simply not change casing for a character if
>> >the size is different. It would still be better than today's
>> >situation.
>>
>> We can certainly cheat in the short term....
>
>It's very tempting. But some people will need to recreate or at
>least reindex their database each time we change the normalization,
>right? So it shouldn't change too many times. It's a lot of work to
>do it properly though, and I would like to have something people can
>use soon.
We try to avoid releasing patches (sub-minor) that require reindexing,
deferring such changes to minor releases. If the cheat was such that
only those DNs with non-ASCII characters were affected, then we might
push it out as a patch. However, I want caseIgnore support for
2.1 (a minor release).
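The short-term cheat quoted above (leave a character's case alone when its lowercase form would need a different number of bytes) might look roughly like the following C sketch. It is illustrative only, not OpenLDAP code: utf8_lower_inplace() and its helpers are hypothetical names, and simple_lower() is a deliberately tiny case map standing in for real Unicode tables. Because the string never changes length, it keeps the in-place contract of dn_normalize(). Assuming the existing code already lowercases ASCII the same way, only DNs containing non-ASCII characters would normalize differently, which is the condition mentioned above for shipping such a change as a patch.

```c
#include <stddef.h>
#include <string.h>

#define UTF8_BAD ((unsigned long)-1)

/* Hypothetical, deliberately tiny case map standing in for real
 * Unicode case-mapping tables (an assumption for illustration). */
static unsigned long simple_lower(unsigned long c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 0x20;
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)  /* Latin-1 capitals */
        return c + 0x20;
    if (c == 0x130)                           /* U+0130 -> 'i': 2 bytes -> 1 */
        return 'i';
    if (c == 0x23A)                           /* U+023A -> U+2C65: 2 -> 3 */
        return 0x2C65;
    return c;
}

/* Bytes needed to encode code point c in UTF-8. */
static size_t utf8_len(unsigned long c)
{
    return c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
}

/* Decode one character at s and store its byte length in *len.
 * Returns UTF8_BAD for truncated, overlong, surrogate or out-of-range
 * sequences. A NUL byte is never accepted as a continuation byte, so
 * decoding cannot run past the terminator. */
static unsigned long utf8_decode(const unsigned char *s, size_t *len)
{
    unsigned long c;
    size_t n, i;

    if (s[0] < 0x80) { *len = 1; return s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else return UTF8_BAD;

    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_BAD;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (utf8_len(c) != n || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return UTF8_BAD;
    *len = n;
    return c;
}

/* Write the UTF-8 encoding of c (utf8_len(c) bytes) at out. */
static void utf8_encode(unsigned long c, unsigned char *out)
{
    static const unsigned char lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
    size_t n = utf8_len(c), i;

    for (i = n - 1; i > 0; i--) {      /* continuation bytes, last first */
        out[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    out[0] = (unsigned char)(lead[n] | c);
}

/* Lowercase a NUL-terminated UTF-8 string in place. A character whose
 * lowercase form would need a different number of bytes is left as it
 * is, so the string never grows or shrinks. Returns 0 on success, -1 on
 * malformed UTF-8 (the string may then be partly lowercased). */
int utf8_lower_inplace(char *str)
{
    unsigned char *p = (unsigned char *)str;

    while (*p != '\0') {
        size_t n;
        unsigned long c = utf8_decode(p, &n);
        unsigned long lc;

        if (c == UTF8_BAD)
            return -1;
        lc = simple_lower(c);
        if (lc != c && utf8_len(lc) == n)
            utf8_encode(lc, p);
        p += n;
    }
    return 0;
}
```

With this toy map, "CN=Stig Ven\xC3\x85s" (Å) becomes "cn=stig ven\xC3\xA5s" (å), while U+0130, whose lowercase 'i' is one byte shorter, is left untouched: exactly the "easy but incorrect" trade-off described.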
>> Long term, we need to use the dnValidate()/dnNormalize()
>> semantics instead of the dn_validate()/dn_normalize() semantics.
>
>Right.
Good. This means we both agree architecturally. I'm actually quite
happy with any incremental solution towards this end. I'm primarily
laying out some options.
>> In the mid term, to avoid the ripple effect of the
>> dn_validate()/dn_normalize() change, I suggest that temporary
>> versions of dn_validate()/dn_normalize() be implemented which
>> use dnValidate()/dnNormalize() to do the work but provide old
>> semantics otherwise.
>
>I don't get this. dnValidate() and dnNormalize() use dn_validate()/
>dn_normalize() today.
In the mid term, we'd reverse the dependency. dn_validate would
call dnValidate (to validate) and dnNormalize just to compare
lengths. If the normalized DN is longer than the original (and so
would not fit in place), the DN would be treated as invalid.
This is a "mid-term" solution. It hopefully avoids rippling
validation/normalization call changes through the code. However,
this ripple might be unavoidable.
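A rough C sketch of the reversed dependency: old-style entry points that keep an in-place, NULL-on-error contract but delegate the real work to new-style functions. All names are hypothetical. toy_dn_validate() and toy_dn_normalize() merely play the roles of dnValidate() and dnNormalize(); the old contract is assumed to be "rewrite the caller's buffer, return it or NULL"; and the toy normalizer's only non-ASCII rule (U+0130 folded to "i" plus U+0307) exists just to produce a DN that grows.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a new-style validator: accept any non-empty string
 * containing '='. Real DN syntax checking is far stricter. */
static int toy_dn_validate(const char *dn)
{
    return dn[0] != '\0' && strchr(dn, '=') != NULL;
}

/* Toy stand-in for a new-style normalizer: returns a fresh malloc'd
 * string that may be LONGER than the input. ASCII is lowercased and
 * U+0130 (C4 B0) becomes "i" + U+0307 (69 CC 87), growing by one byte.
 * Returns NULL if memory runs out. */
static char *toy_dn_normalize(const char *dn)
{
    const unsigned char *p = (const unsigned char *)dn;
    char *out = malloc(2 * strlen(dn) + 1);   /* generous upper bound */
    char *q;

    if (out == NULL)
        return NULL;
    for (q = out; *p != '\0'; ) {
        if (p[0] == 0xC4 && p[1] == 0xB0) {
            *q++ = 'i'; *q++ = (char)0xCC; *q++ = (char)0x87;
            p += 2;
        } else {
            unsigned char ch = *p++;
            *q++ = (char)((ch >= 'A' && ch <= 'Z') ? ch + 0x20 : ch);
        }
    }
    *q = '\0';
    return out;
}

/* Valid per the new validator AND normalized form no longer than the
 * original: return the malloc'd normalized copy (caller frees).
 * Otherwise (including out of memory) return NULL. */
static char *normalize_if_fits(const char *dn)
{
    char *norm;

    if (!toy_dn_validate(dn))
        return NULL;
    norm = toy_dn_normalize(dn);
    if (norm != NULL && strlen(norm) > strlen(dn)) {
        free(norm);          /* would need a bigger buffer: "invalid" */
        norm = NULL;
    }
    return norm;
}

/* Old-style validate: normalization is used only to compare lengths. */
char *compat_dn_validate(char *dn)
{
    char *norm = normalize_if_fits(dn);

    if (norm == NULL)
        return NULL;
    free(norm);
    return dn;
}

/* Old-style normalize: the result is written back into dn. */
char *compat_dn_normalize(char *dn)
{
    char *norm = normalize_if_fits(dn);

    if (norm == NULL)
        return NULL;
    strcpy(dn, norm);        /* safe: strlen(norm) <= strlen(dn) */
    free(norm);
    return dn;
}
```

"CN=Kurt" passes and is rewritten to "cn=kurt"; a DN containing U+0130 normalizes to one byte more than its buffer holds, so both wrappers reject it instead of overflowing, and existing callers need no changes.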
>I see two possibilities:
>
>I cheat and add simplistic UTF8 code to dn_validate()/dn_normalize().
This is what I call the "short term" solution.
>I leave dn_validate()/dn_normalize() as they are and implement new
>versions of dnValidate()/dnNormalize() with more correct UTF8 code,
>allowing for the possibility that the size of the dn can increase.
>Then we must change a lot of surrounding code so that it uses
>dnValidate()/dnNormalize() instead of dn_validate()/dn_normalize().
This is what I call a "long term" solution.
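For contrast, a sketch of the long-term shape: the normalizer never touches its input and returns a separately allocated, length-counted result that may be longer than the original. That is why code written around the in-place dn_normalize() would have to change. struct dnval and dn_normalize_alloc() are hypothetical names, and fold_one() uses a toy folding rule, not real Unicode data. Measuring first and then filling an exactly sized buffer is one common way to handle output of unknown size, not necessarily what OpenLDAP would do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted value: the new-style API hands back the
 * normalized DN in its own buffer instead of rewriting the input. */
struct dnval {
    size_t len;
    char  *val;
};

/* Toy folding rule (an assumption, not real Unicode tables): ASCII is
 * lowercased and U+0130 (C4 B0) becomes "i" + U+0307 (69 CC 87).
 * Writes the folded bytes to out when out is non-NULL, returns how many
 * output bytes the character produces, and sets *used to the number of
 * input bytes consumed. */
static size_t fold_one(const unsigned char *s, size_t avail,
                       unsigned char *out, size_t *used)
{
    if (avail >= 2 && s[0] == 0xC4 && s[1] == 0xB0) {
        if (out) { out[0] = 'i'; out[1] = 0xCC; out[2] = 0x87; }
        *used = 2;
        return 3;
    }
    if (out)
        out[0] = (s[0] >= 'A' && s[0] <= 'Z')
                 ? (unsigned char)(s[0] + 0x20) : s[0];
    *used = 1;
    return 1;
}

/* New-style normalizer: pass 1 measures the folded length, pass 2 fills
 * an exactly sized, NUL-terminated buffer. The input is never modified.
 * Returns 0 on success, -1 if memory runs out; the caller frees
 * out->val. */
int dn_normalize_alloc(const struct dnval *in, struct dnval *out)
{
    const unsigned char *s = (const unsigned char *)in->val;
    size_t i, used, need = 0;
    unsigned char *buf, *q;

    for (i = 0; i < in->len; i += used)
        need += fold_one(s + i, in->len - i, NULL, &used);

    buf = malloc(need + 1);
    if (buf == NULL)
        return -1;
    for (i = 0, q = buf; i < in->len; i += used)
        q += fold_one(s + i, in->len - i, q, &used);
    *q = '\0';

    out->len = need;
    out->val = (char *)buf;
    return 0;
}
```

Callers then hold two values, the original DN and its normalized copy, and must free the latter. That bookkeeping is the surrounding-code change the long-term approach requires.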
>I have no illusions of implementing 100% perfect normalization code
>though.
Understandable! I'm happy with any forward steps.