Perl and Unicode
I have just found an interesting entry about Perl and Unicode:
http://perlgeek.de/en/article/encodings-and-unicode
Apart from the usual recommendation to use :encoding(UTF-8) as an input layer, it includes a script for finding out which encoding your shell uses:
#!/usr/bin/perl
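use warnings;
use strict;
use Encode;

my @charsets = qw(utf-8 latin1 iso-8859-15 utf-16);

# some non-ASCII codepoints:
my $test = 'Ue: ' . chr(220) .'; Euro: '. chr(8364) . "\n";

# print the test string once per encoding; the line that displays
# correctly shows which encoding the terminal expects
for (@charsets){
    print "$_: " . encode($_, $test);
}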
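As a minimal sketch of the :encoding(UTF-8) input layer mentioned above: the snippet below writes a small sample file and reads it back through the layer. The temporary file and its contents are made up for illustration and do not come from the linked article.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small UTF-8 file to read back (hypothetical sample data).
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw');
print {$out} "\xC3\x9C" . "ber\n";   # raw UTF-8 bytes for "Über"
close $out or die "close failed: $!";

# Open with an input layer so every line read is already decoded text.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# length() now counts characters, not bytes.
my $chars = length $line;
print "characters read: $chars\n";
```

Without the layer, the same line would come back as five bytes rather than four characters.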
That entry also explains that some Perl functions expect text strings (sequences of codepoints) rather than binary data, and shows how to decode strings properly before passing them to these functions.
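To make the bytes-versus-text distinction concrete, here is a small sketch using the Encode module. The input bytes are a hypothetical example and the variable names are mine, not the article's.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for U+00DC and U+20AC, as they might arrive from a
# file or socket (hypothetical input, written as escapes so the
# encoding of this source file does not matter).
my $bytes = "\xC3\x9C\xE2\x82\xAC";

# Decode the bytes into a text string of codepoints.
my $text = decode('UTF-8', $bytes);

# length() counts bytes before decoding and characters after it.
my $byte_len = length $bytes;
my $char_len = length $text;

# Text functions such as lc() now operate on whole characters.
my $lower = lc $text;

# Encode again before handing the data back to the outside world.
my $round_trip = encode('UTF-8', $text);

print "bytes: $byte_len, characters: $char_len\n";
```

The general pattern is to decode on input, work on text strings inside the program, and encode on output.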
Another entry explains how to avoid the ‘Wide character in print’ warning (ahinea.com/en/tech/perl-unicode-struggle.html), and this one explains the difference between UTF-8 and utf8 (jeremy.zawodny.com/blog/archives/010546.html).
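A short sketch of both points, assuming the terminal expects UTF-8: setting an output layer on STDOUT is one common way to avoid the warning, and Encode's find_encoding shows that the two spellings resolve to different encodings. This is an illustration, not necessarily the approach taken in the linked entries.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Declare the encoding of STDOUT once, so text strings are encoded
# on output (assumes a UTF-8 terminal).
binmode(STDOUT, ':encoding(UTF-8)');

# chr(8364) is the euro sign; printing it to a handle without an
# encoding layer triggers "Wide character in print".
my $euro = chr(8364);
print "Euro: $euro\n";

# Encode maps "UTF-8" to the strict encoding and "utf8" to Perl's
# more permissive internal variant.
my $strict = find_encoding('UTF-8')->name;
my $lax    = find_encoding('utf8')->name;
print "UTF-8 -> $strict, utf8 -> $lax\n";
```

For data exchanged with other programs, the strict UTF-8 spelling is usually the safer choice, since it rejects sequences that are not valid UTF-8.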
If you want to read more about encodings:
- A tutorial on character code issues
www.cs.tut.fi/~jkorpela/chars.html
- Character Set Tutorial
www.indwes.edu/Faculty/bcupp/things/Characters/tutorial.html
- UTF-8 and Unicode FAQ