Date: Wed, 21 Nov 2001 11:01:46 +0100
To: Stig Venaas <Stig@OpenLDAP.org>
From: Alejandra Moreno <alejandra.moreno@atrete.ch>
Subject: Re: Special characters in attribute values
Something isn't working in utf2ANSI: if I encode a string and then decode
it, I don't get the same string back. Do you know what's wrong? Here are
both scripts. Thanks.
# ======================================================================
sub ansi2UTF($){
    my $string = shift;
    my @chars = split(//, $string);
    my $lowByte;
    my $highByte;
    my $i;
    for ($i=0;$i<=$#chars;$i++){
        my $assciCode = ord($chars[$i]);
        if ($assciCode > 127){
            $lowByte = $assciCode & 192;
            $lowByte = $lowByte >> 6;
            $lowByte = $lowByte & 3;
            $lowByte += 192;
            $highByte = $assciCode & 63;
            $highByte += 128;
            splice(@chars, $i, 1, (chr($lowByte), chr($highByte)));
            $i++;
        }
    }
    return(join('', @chars));
}
# ======================================================================
sub utf2ANSI($){
    my $string = shift;
    my @chars = split(//, $string);
    my $i =0;
    for ($i=0;$i<=$#chars;$i++){
        my $byteCounter =0;
        my $assciCode = ord($chars[$i]);
        if ($assciCode > 127){
            # How many bytes are needed
            while ( ($assciCode & 128) == 128){
                $byteCounter++;
                $assciCode = $assciCode << 1;
            }
            # Keep shifting until the first 1 is at the front
            while ( ($assciCode & 128) != 128){
                $assciCode = $assciCode << 1;
            }
            # All following bytes
            my $j;
            my $nextval;
            for($j=1;$j<$byteCounter;$j++){
                $nextval = ord($chars[$i+$j]) & 63;
                $assciCode += $nextval;
                splice(@chars, $i+$j, 1);
            }
        }
        if ($byteCounter > 2){
            $chars[$i] = '?';
        }else{
            $chars[$i] = chr($assciCode);
        }
    }
    return(join('', @chars));
}
# ======================================================================
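A note on the likely cause, from tracing the decoder by hand: Perl integers
are wider than 8 bits, so `$assciCode << 1` never drops the bits that move
past bit 7. For the lead byte 0xC3 the two loops appear to leave 0x30C0
rather than 0xC0, so adding the continuation bits gives 0x30E9 instead of
0xE9. The following is only a sketch of one way to decode without shifting
in a loop; the name utf8_to_latin1 is hypothetical and not part of the
scripts above. It assumes the input is a string of raw UTF-8 bytes and that
anything outside Latin-1 should become '?'.

```perl
use strict;
use warnings;

# Hypothetical replacement for utf2ANSI (the name is an assumption).
# Decodes one- and two-byte UTF-8 sequences to Latin-1 characters;
# everything else becomes a single '?'.
sub utf8_to_latin1 {
    my ($string) = @_;
    my @bytes = map { ord } split(//, $string);
    my $out = '';
    my $i = 0;
    while ($i < @bytes) {
        my $byte = $bytes[$i];
        if ($byte < 0x80) {
            # Plain ASCII is copied unchanged.
            $out .= chr($byte);
            $i++;
        }
        elsif (($byte & 0xE0) == 0xC0
               && $i + 1 < @bytes
               && ($bytes[$i + 1] & 0xC0) == 0x80) {
            # Two-byte sequence 110xxxxx 10yyyyyy: mask off the marker
            # bits and join the 5 + 6 payload bits.
            my $code = (($byte & 0x1F) << 6) | ($bytes[$i + 1] & 0x3F);
            # Reject overlong forms and anything beyond Latin-1.
            $out .= ($code >= 0x80 && $code <= 0xFF) ? chr($code) : '?';
            $i += 2;
        }
        else {
            # Longer sequence, stray continuation byte, or truncated
            # input: emit one '?' and skip the continuation bytes.
            $out .= '?';
            $i++;
            $i++ while $i < @bytes && ($bytes[$i] & 0xC0) == 0x80;
        }
    }
    return $out;
}
```

With this sketch, utf8_to_latin1("caf\xC3\xA9") should give back "caf\xE9",
which is the round trip that ansi2UTF followed by utf2ANSI was meant to
produce.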
Alejandra
At 08:44 20.11.2001 +0100, you wrote:
On Mon, Nov 19, 2001 at 05:27:44PM +0100, Alejandra Moreno wrote:
> Does this mean that you use this script to make the conversion before
> storing the info in the server? But how does the LDAP server know that
> the info is encoded? Thanks,
I didn't check the script that carefully, but I assume so. The server
expects everything to be UTF-8, but UTF-8 encoding preserves ASCII
characters, so if you have a string with ASCII only, it is already
UTF-8 encoded (:
Stig
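
The point about ASCII can be checked with a short sketch. It assumes a Perl
that ships the core Encode module (5.8 or later), which the scripts above do
not use; the sample values are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative sample values, not taken from the thread.
my $ascii  = 'cn=John Smith';   # ASCII only
my $latin1 = "caf\xE9";         # contains one Latin-1 character (0xE9)

my $ascii_utf8  = encode('UTF-8', $ascii);
my $latin1_utf8 = encode('UTF-8', $latin1);

# An ASCII-only string is byte-for-byte identical after encoding.
print $ascii_utf8 eq $ascii ? "ASCII unchanged\n" : "ASCII changed\n";

# The Latin-1 character grows from one byte to two.
printf "%d -> %d bytes\n", length($latin1), length($latin1_utf8);
```

So only values containing characters above 127 need converting before they
are sent to the server; ASCII-only values can be stored as they are.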