Perl and Unicode

I have just found an interesting article about Perl and Unicode:

http://perlgeek.de/en/article/encodings-and-unicode

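Apart from the usual recommendation to use :encoding(UTF-8) as an I/O layer, it includes a script for finding out which encoding your terminal uses. It prints the same test string in several encodings, and the line that displays correctly identifies the terminal's encoding: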

#!/usr/bin/perl
use warnings;
use strict;
use Encode;

my @charsets = qw(utf-8 latin1 iso-8859-15 utf-16);

# some non-ASCII codepoints:
my $test = 'Ue: ' . chr(220) .'; Euro: '. chr(8364) . "\n";

for (@charsets){
    print "$_: " . encode($_, $test);
}

The article also explains that some Perl functions expect text strings (sequences of code points) rather than binary data, and shows how to decode input properly before passing it to those functions.
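As a rough illustration of that point, here is a minimal sketch that compares a raw byte string with its decoded form. The sample word and the variable names are my own assumptions, not taken from the article.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for "Über", as they might arrive from a file or a socket
# (the sample data is an assumption made for this sketch).
my $bytes = "\xC3\x9Cber";

# decode() turns the bytes into a text string made of code points.
my $text = decode('UTF-8', $bytes);

my $byte_count = length($bytes);   # counts bytes
my $char_count = length($text);    # counts characters
my $lower      = lc($text);        # case mapping applied to the decoded text

printf "bytes: %d, characters: %d\n", $byte_count, $char_count;
```

Functions such as length and lc only give the answer you expect for text once the input has been decoded.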

Another article explains how to avoid the ‘Wide character in print’ warning (ahinea.com/en/tech/perl-unicode-struggle.html), and this one explains the difference between UTF-8 and utf8 (jeremy.zawodny.com/blog/archives/010546.html).
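The usual way to avoid that warning is to put an encoding layer on the output handle. The sketch below is my own, not code from the linked articles. It uses an in-memory handle as a stand-in for a file, and it assumes the terminal expects UTF-8.

```perl
use strict;
use warnings;

# A text string containing a code point above 255 (the euro sign).
my $text = 'Euro: ' . chr(8364) . "\n";

# An in-memory handle stands in for a file here; the layer encodes on output.
open(my $out, '>:encoding(UTF-8)', \my $buffer)
    or die "Cannot open in-memory handle: $!";
print {$out} $text;
close($out) or die "Cannot close handle: $!";

# The same idea for the terminal: declare the encoding once, then print text.
# Assumption: the terminal uses UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print $text;
```

Without a layer on the handle, Perl has no encoding to apply to the euro sign, and that is when it emits the warning.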

If you want to read more about encoding:

Finally, a link to PerlMonks that explains why the :utf8 layer is insecure and why you should use :encoding(UTF-8) instead:
www.perlmonks.org/index.pl?node_id=731943
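To see the distinction from the Encode side, this small sketch (my own, not from the PerlMonks thread) asks Encode which encoding object each name resolves to. It then shows the strict decoder rejecting a malformed byte when asked to croak. The sample input is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# 'UTF-8' resolves to the strict encoding, 'utf8' to Perl's lax variant.
my $strict_name = find_encoding('UTF-8')->name;
my $lax_name    = find_encoding('utf8')->name;
print "UTF-8 -> $strict_name\n";
print "utf8  -> $lax_name\n";

# A byte string with an invalid byte (0xFF never appears in valid UTF-8).
my $bad = "abc\xFFdef";

# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bad intact.
my $decoded = eval {
    decode('UTF-8', $bad, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $rejected = defined $decoded ? 0 : 1;
print "malformed input rejected: $rejected\n";
```

Validating input this way at the boundary is one option when you cannot control which layer was used to read the data.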

