Re: ISO Latin Strings
> Do you know where I could get some kind of encoder/decoder thing? So
> that I could store it as something that is supported in current
> releases, and then map it back to an ISO Latin string when I retrieve
> it?
Here are the scripts I use. Replace /local/bin/perl5 with the pathname
of a perl 5 release on your machine.
-------- iso2utf --------
#!/local/bin/perl5 -wp
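# Convert iso8859-1 to UTF-8.
# Each octet in [\200-\377] becomes two octets: 110000xx 10xxxxxx.
# Bit 7 of such an octet is always set, so "& 0xBF" only clears bit 6,
# which gives the continuation octet 0x80 | (octet & 0x3F).
s/([\200-\377])/pack('CC', 0xC0 + (ord($1) >> 6), (ord($1) & 0xBF))/ge;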
-------- utf2iso --------
#!/local/bin/perl5 -p
# Convert UTF-8 to iso8859-1. A sequence that cannot be converted is
# replaced by "{" "UTF8:" utf8-char "}" and an error is reported.
# Note: Do not remove the code that checks if the UTF-8 sequence
# is valid and can be converted to iso8859-1.
# Otherwise filters like this can be fooled into converting
# some 8-bit chars to \0 or control characters.
# The 1st octet of a non-ASCII char should be in [\300-\377]. However, we
# check for [\200-...] in case the input starts in the middle of a UTF-8 char.
s/([\200-\377][\200-\277]*)/&utf2iso($1)/ge;
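sub utf2iso {
    my($first,@rest) = unpack('C*', $_[0]);
    # Only the lead octets 0xC2 and 0xC3 encode iso8859-1 characters,
    # and each must be followed by exactly one continuation octet.
    $first -= 0xC2;
    if (@rest != 1 || ($first & ~1)) {
        warn "\nutf2iso: Non-iso8859-1 character(s) in text.\n" unless $w++;
        return "{UTF8:$_[0]}";
    }
    chr($first * 0x40 + $rest[0]);
}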
END { die "utf2iso: $w non-iso8859-1 characters.\n" if $w && $w > 1; }
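
To check the two filters against each other, a round-trip test along these lines may help. It is only a sketch: the helper names iso2utf_str, utf2iso_str and decode_seq are made up for this example and are not part of the scripts above, and the warning counter is left out so that the helpers can be called on plain strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same arithmetic as the iso2utf one-liner.
sub iso2utf_str {
    my ($s) = @_;
    $s =~ s/([\200-\377])/pack('CC', 0xC0 + (ord($1) >> 6), ord($1) & 0xBF)/ge;
    return $s;
}

# Hypothetical helper: same validity check as utf2iso, minus the warnings.
sub decode_seq {
    my ($first, @rest) = unpack('C*', $_[0]);
    $first -= 0xC2;
    return "{UTF8:$_[0]}" if @rest != 1 || ($first & ~1);
    return chr($first * 0x40 + $rest[0]);
}

sub utf2iso_str {
    my ($s) = @_;
    $s =~ s/([\200-\377][\200-\277]*)/decode_seq($1)/ge;
    return $s;
}

my $latin1 = "caf\xE9";                 # "cafe" with e-acute, iso8859-1
my $utf8   = iso2utf_str($latin1);

# Show the encoded octets in hex: 63 61 66 C3 A9
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $utf8)), "\n";

# The decoder should give back the original octets.
print utf2iso_str($utf8) eq $latin1 ? "round trip ok\n" : "round trip FAILED\n";

# A 3-octet sequence (here U+20AC) is outside iso8859-1 and gets wrapped.
print utf2iso_str("\xE2\x82\xAC") eq "{UTF8:\xE2\x82\xAC}"
    ? "non-iso8859-1 sequence wrapped\n" : "unexpected conversion\n";
```

The overlong sequence "\xC0\x80" is wrapped in the same way instead of being turned into \0, which is the case the validity check is there for.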