perl and unicode

20 September 2009

I have just found an interesting entry about perl and unicode:

Apart from the usual recommendation to use :encoding(UTF-8) as an input layer, it has a script for finding out which encoding your shell uses:

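use warnings;
use strict;
use Encode;

# Candidate encodings to try on the terminal.
my @charsets = qw(utf-8 latin1 iso-8859-15 utf-16);

# Some non-ASCII codepoints: U+00DC (Ü) and U+20AC (€).
my $test = 'Ue: ' . chr(220) . '; Euro: ' . chr(8364) . "\n";

# The line that shows both characters correctly names the shell's encoding.
# latin1 has no euro sign, so encode() substitutes "?" there.
for (@charsets) {
    print "$_: " . encode($_, $test);
}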

Also, this entry explains how some perl functions expect text strings (‘codepoints’) instead of binary data, and how to decode strings properly before passing them to these functions.
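To make the bytes versus text distinction concrete, here is a minimal sketch of my own (not taken from the linked entry). It assumes the input arrives as UTF-8 bytes; the sample bytes and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for U+00DC (Ü), as they might arrive from a file or socket.
my $bytes = "\xC3\x9C";

# Before decoding, perl sees two separate bytes.
my $byte_length = length $bytes;

# After decoding, perl sees a single character (one codepoint).
my $text        = decode('UTF-8', $bytes);
my $char_length = length $text;

# Text functions such as lc now operate on the character: U+00DC becomes U+00FC.
my $lower = lc $text;

# Encode back to bytes before writing to a handle without an encoding layer.
print encode('UTF-8', "$byte_length bytes, $char_length character, lowercased: $lower\n");
```

The rule of thumb is to decode as soon as data comes in, work on text strings in the middle, and encode again only when data leaves the program.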

This other entry also explains how to avoid the ‘wide character in print’ warning, and here there is an explanation of the difference between UTF-8 and utf8.
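As a rough illustration of the warning (my own sketch, not code from the linked entries): printing a character above 255 to a handle without an encoding layer triggers it, and adding :encoding(UTF-8) to the handle makes it go away. The in-memory handles and the variable names are only there to keep the example self-contained.

```perl
use strict;
use warnings;

# U+20AC (euro sign) is above 255, so it cannot be written as a single byte.
my $text = "Euro: " . chr(8364) . "\n";

# 1) A handle without an encoding layer: perl warns about the wide character.
my @warnings;
my $raw_buf = '';
{
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $raw, '>', \$raw_buf or die "cannot open in-memory handle: $!";
    print {$raw} $text;
    close $raw;
}

# 2) A handle with an encoding layer: the text is encoded on output, no warning.
my @warnings_with_layer;
my $encoded_buf = '';
{
    local $SIG{__WARN__} = sub { push @warnings_with_layer, $_[0] };
    open my $out, '>:encoding(UTF-8)', \$encoded_buf
        or die "cannot open in-memory handle: $!";
    print {$out} $text;
    close $out;
}

# The same fix for the terminal, assuming it displays UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print $text;
```

If your shell uses a different encoding, put that name inside :encoding(...) instead.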

If you want to read more about encoding:

Finally, a link to perlmonks where it is explained why the :utf8 layer is insecure and why you should use :encoding(UTF-8) instead.
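The strict versus lax distinction behind that advice can be seen directly with the Encode module. This is a small sketch of my own, not the perlmonks code: it asks Encode what the two names resolve to and then tries to encode a codepoint just past the Unicode range with each. The variable names are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# The two names resolve to different encodings inside Encode.
my $strict_name = find_encoding('UTF-8')->name;   # 'utf-8-strict'
my $lax_name    = find_encoding('utf8')->name;    # 'utf8'

# 0x110000 lies just past the last valid Unicode codepoint (0x10FFFF).
my $beyond_unicode = chr(0x110000);

# Die on invalid data, and leave the source string untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# The lax encoding accepts the codepoint and returns bytes.
my $lax_bytes = eval { encode('utf8', $beyond_unicode, $check) };

# The strict encoding refuses it, so eval returns undef and $@ holds the error.
my $strict_bytes = eval { encode('UTF-8', $beyond_unicode, $check) };
my $strict_error = $@;

print "UTF-8 resolves to $strict_name, utf8 resolves to $lax_name\n";
print "lax encode:    ", (defined $lax_bytes    ? "accepted" : "rejected"), "\n";
print "strict encode: ", (defined $strict_bytes ? "accepted" : "rejected"), "\n";
```

The :encoding(UTF-8) layer goes through the strict encoding, while :utf8 only marks the data as UTF-8 without checking it, which is why the former is the safer choice for input you do not control.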