Re: Accented characters in LDAP directory

To: "Frank Sonnemans" <fs.mail@wanadoo.be>
Subject: Re: Accented characters in LDAP directory
From: Alejandra Moreno <alejandra.moreno@atrete.ch>
Date: Mon, 25 Feb 2002 11:17:58 +0100
Cc: openldap-software@OpenLDAP.org
In-reply-to: <007001c1bbd0$71135a90$0101a8c0@scuba>
References: <5.1.0.14.2.20020222185432.04725d80@mailhost.atrete.ch>

Hi!

I have attached conv.pl for the conversion and conv2.pl to convert back. Hope it helps!

Regards,
Alejandra

At 19:40 22.02.2002 +0100, you wrote:

Hi!

Thanks for the reply. I would like to get your scripts, since they would work independently of the platform.

My BSD servers probably support iconv, but I had a lot of trouble getting libiconv to work on Cygwin.

Thanks in advance.

Frank

----- Original Message -----
From: Alejandra Moreno
To: Frank Sonnemans
Cc: openldap-software@OpenLDAP.org
Sent: Friday, February 22, 2002 19:00
Subject: Re: Accented characters in LDAP directory

Hi!

To store accented characters in OpenLDAP, you need to encode them as UTF-8. On Linux you can use this command, which writes the converted text to standard output:

iconv -f iso-8859-1 -t utf-8 filename.ldif

It is really quick. At first I used some Perl scripts, but they were too slow. Nevertheless, I still have these scripts if you are interested.
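
Where iconv is not available, the same conversion can be sketched with Perl's Encode module, which has been bundled with Perl since 5.8. This is only a minimal illustration: the helper name latin1_to_utf8 is hypothetical, and it assumes the input bytes really are ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# latin1_to_utf8 is a hypothetical helper name used for illustration.
# Assumption: $bytes holds ISO-8859-1 encoded data.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    # Decode the bytes into Perl characters, then re-encode as UTF-8.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# "Jos\xE9" is J, o, s, e-acute in ISO-8859-1.
my $converted = latin1_to_utf8("Jos\xE9");
print join(' ', map { sprintf '%02X', ord } split //, $converted), "\n";
# prints "4A 6F 73 C3 A9"
```

Plain ASCII bytes pass through unchanged, so an LDIF file that mixes ASCII attribute names with accented values can be converted as a whole.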

Regards,
Alejandra

At 18:44 22.02.2002 +0100, you wrote:

I converted a large database of contacts to the LDIF format in order to import them into the OpenLDAP server. Unfortunately, many contact names and addresses contain accented characters. These records are rejected when I try to import them with ldapadd.

Is there a way around this? I can't imagine that LDAP does not allow users to write people's names correctly.

Regards,

Frank

______________________________________________________________________
Alejandra Moreno Espinar
at rete ag

mailto:alejandra.moreno@atrete.ch, http://www.atrete.ch
snail mail: Oberdorfstrasse 2, P.O. Box 674, 8024 Zurich, Switzerland
voice: +41-1-266 55 55, direct: +41-1-266 55 91, fax: +41-1-266 55 88
_____________________________________________________________________

#!/usr/bin/perl -w

#------------------------------------------------------------------------------
# Settings
#------------------------------------------------------------------------------

$MAX_FILE_LEN = 10000000;


#------------------------------------------------------------------------------
# Main
#------------------------------------------------------------------------------

if ($#ARGV+1 < 2)
{
  print "Usage: conv.pl [source file] [destination file]\n";
  die;
}

$srcFile = $ARGV[0]; # Argument 1 is source file
$dstFile = $ARGV[1]; # Argument 2 is destination file

$filedata      = &readFile($srcFile, $MAX_FILE_LEN); # read file contents
$convertedData = &utf2ANSI($filedata); # convert file data

open(OUTFILE, ">".$dstFile) || die "Cannot open destination file.\n";  # save converted data to file
print OUTFILE $convertedData;
close(OUTFILE);

exit; # end


#------------------------------------------------------------------------------
# Subroutines
#------------------------------------------------------------------------------

# fileContents = readFile(fileName, maxBytesToRead):
#
# example:
# $dbdata = &readFile($ARGV[0], $MAX_DB_SIZE);
#
sub readFile
{
  my ($filename, $maxdata) = ($_[0], $_[1]);
  my $readdata;
  if ($maxdata == 0) {die "readFile(): You want me to read 0 bytes from a file?\n";}
  open(DBFILE, "<$filename") || die "readFile(): Cannot open file.\n";
  read(DBFILE, $readdata, $maxdata);
  close(DBFILE) || die "readFile(): Cannot close file.\n";
  return $readdata;
}


sub utf2ANSI($){
        my $string = shift;
        my @chars = split(//, $string);
        my $i =0;

        for ($i=0;$i<=$#chars;$i++){
                my $byteCounter =0;
                my $assciCode = ord($chars[$i]);

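                if ($assciCode > 127){
                        # Count the leading 1 bits: they give the number of
                        # bytes needed. Mask to 8 bits so the result does not
                        # depend on Perl's integer width.
                        while ( ($assciCode & 128) == 128){
                                $byteCounter++;
                                $assciCode = ($assciCode << 1) & 255;
                        }
                        # Keep shifting until the first 1 bit is at the front;
                        # stop at 0 so malformed input cannot loop forever.
                        while ( $assciCode != 0 && ($assciCode & 128) != 128){
                                $assciCode = ($assciCode << 1) & 255;
                        }
                        # All following bytes. Each one is removed after it is
                        # read, so the next one is always at position $i+1.
                        my $j;
                        my $nextval;
                        for($j=1;$j<$byteCounter;$j++){
                                last if $i+1 > $#chars; # truncated sequence
                                $nextval = ord($chars[$i+1]) & 63;
                                $assciCode += $nextval;
                                splice(@chars, $i+1, 1);
                        }
                }
                if ($byteCounter > 2){
                        $chars[$i] = '?';
                }else{
                        $chars[$i] = chr($assciCode & 255);
                }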
        }
        return(join('', @chars));
}
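
For comparison with the hand-written decoder above, here is a minimal sketch of the same direction (UTF-8 back to ISO-8859-1) using Perl's Encode module. The helper name utf8_to_latin1 is hypothetical and is not part of the attached script. It relies on Encode's default behaviour of substituting '?' for characters that ISO-8859-1 cannot represent, which is similar in spirit to the '?' that the script writes for sequences longer than two bytes.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# utf8_to_latin1 is a hypothetical helper name used for illustration.
# Assumption: $bytes holds UTF-8 encoded data.
sub utf8_to_latin1 {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    # With the default CHECK value, characters outside ISO-8859-1
    # are replaced by '?' instead of raising an error.
    return encode('ISO-8859-1', $text);
}

my $back = utf8_to_latin1("Jos\xC3\xA9");   # UTF-8 bytes for "Jose" with e-acute
print join(' ', map { sprintf '%02X', ord } split //, $back), "\n";
# prints "4A 6F 73 E9"
```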

#!/usr/bin/perl -w

#------------------------------------------------------------------------------
# Settings
#------------------------------------------------------------------------------

$MAX_FILE_LEN = 10000000;


#------------------------------------------------------------------------------
# Main
#------------------------------------------------------------------------------

if ($#ARGV+1 < 2)
{
  print "Usage: conv.pl [source file] [destination file]\n";
  die;
}

$srcFile = $ARGV[0]; # Argument 1 is source file
$dstFile = $ARGV[1]; # Argument 2 is destination file

$filedata      = &readFile($srcFile, $MAX_FILE_LEN); # read file contents
$convertedData = &ansi2UTF($filedata); # convert file data

open(OUTFILE, ">".$dstFile) || die "Cannot open destination file.\n";  # save converted data to file
print OUTFILE $convertedData;
close(OUTFILE);

exit; # end


#------------------------------------------------------------------------------
# Subroutines
#------------------------------------------------------------------------------

# fileContents = readFile(fileName, maxBytesToRead):
#
# example:
# $dbdata = &readFile($ARGV[0], $MAX_DB_SIZE);
#
sub readFile
{
  my ($filename, $maxdata) = ($_[0], $_[1]);
  my $readdata;
  if ($maxdata == 0) {die "readFile(): You want me to read 0 bytes from a file?\n";}
  open(DBFILE, "<$filename") || die "readFile(): Cannot open file.\n";
  read(DBFILE, $readdata, $maxdata);
  close(DBFILE) || die "readFile(): Cannot close file.\n";
  return $readdata;
}

sub ansi2UTF($){
        my $string = shift;
        my @chars = split(//, $string);
        my $lowByte;
        my $highByte;
        my $i;

        for ($i=0;$i<=$#chars;$i++){
                my $assciCode = ord($chars[$i]);
                if ($assciCode > 127){
                        $lowByte = $assciCode & 192;
                        $lowByte = $lowByte >> 6;
                        $lowByte = $lowByte  & 3;
                        $lowByte += 192;
                        $highByte = $assciCode & 63;
                        $highByte += 128;
                        splice(@chars, $i, 1, (chr($lowByte), chr($highByte)));
                        $i++;
                }
        }
        return(join('', @chars));
}
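
The bit arithmetic in ansi2UTF can be checked on a single character. The sketch below is only an illustration with a hypothetical function name (latin1_code_to_utf8); it computes the same two bytes that the script calls $lowByte and $highByte. It treats every input byte as ISO-8859-1, so if the source data is really Windows-1252 (which is also often called "ANSI"), the bytes 0x80 to 0x9F would need a different mapping.

```perl
use strict;
use warnings;

# latin1_code_to_utf8 is a hypothetical name used only for this illustration.
# It takes a code point in the range 0x80..0xFF and returns the two
# single-byte strings that make up its UTF-8 encoding.
sub latin1_code_to_utf8 {
    my ($code) = @_;
    die "code point out of range\n" if $code < 0x80 || $code > 0xFF;
    my $lead = 0xC0 | ($code >> 6);     # 110000xx: top 2 bits of the code point
    my $cont = 0x80 | ($code & 0x3F);   # 10xxxxxx: low 6 bits of the code point
    return (chr($lead), chr($cont));
}

my ($first, $second) = latin1_code_to_utf8(0xE9);   # e-acute
printf "%02X %02X\n", ord($first), ord($second);    # prints "C3 A9"
```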
