
devel - Re: [sympa-developpers] Sympatic unicode ?

Subject: Developers of Sympa

  • From: IKEDA Soji <address@concealed>
  • To: address@concealed
  • Subject: Re: [sympa-developpers] Sympatic unicode ?
  • Date: Thu, 8 Mar 2018 18:35:37 +0900

On Fri, 2 Mar 2018 18:56:24 +0900
IKEDA Soji <address@concealed> wrote:

> The second most important point is that **text data is not unique**.
>
> Text data should be normalized and (if necessary) case-folded
> as soon as we receive it.
>
> - Unicode allows at least two sorts of normalization form. So we
> should normalize text data.

Example: running the attached testutf.pl gives the following.

On xfs, ext4, NFS4, CIFS etc.:

$ perl testutf.pl
=> B\x{00e2}le
<= B\x{00e2}le
=> \x{0130}stanbul
<= \x{0130}stanbul
=> Ph\x{00fa} Qu\x{1ed1}c
<= Ph\x{00fa} Qu\x{1ed1}c

On HFS+:

$ perl testutf.pl
=> B\x{00e2}le
<= Ba\x{0302}le
=> \x{0130}stanbul
<= I\x{0307}stanbul
=> Ph\x{00fa} Qu\x{1ed1}c
<= Phu\x{0301} Quo\x{0302}\x{0301}c

HFS+ (macOS) allows UTF-8 pathnames, but stores them in a sort of
decomposed normalization form. Thus, even if the filesystem supports
Unicode, a comparison between a pathname held in memory and the one
read back from the filesystem may not always succeed.
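
A minimal sketch of one way to make such a comparison robust, assuming
both names are already decoded character strings. The helper name
same_name is hypothetical, and the choice of NFC as the common form is
an assumption (any single form applied to both sides would do). It uses
the core module Unicode::Normalize and the first pair of names from the
HFS+ output above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper: two names are "the same" if they are equal
# after both have been converted to NFC (an assumed common form).
sub same_name {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right);
}

my $in_memory = "B\x{00e2}le";     # precomposed, as written in the script
my $from_disk = "Ba\x{0302}le";    # decomposed, as read back on HFS+

# A plain string comparison sees two different code point sequences.
print 'raw:        ', ($in_memory eq $from_disk ? 'match' : 'differ'), "\n";

# Comparing the normalized forms treats them as the same name.
print 'normalized: ',
    (same_name($in_memory, $from_disk) ? 'match' : 'differ'), "\n";
```

The same idea would apply when a name is first received: normalize it
once, and compare only normalized values afterwards.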


Similar cases probably arise with databases.


Regards,
-- Soji


> Regards,
> -- Soji
>
>
> On 2018/02/27 18:12, Marc Chantreux <address@concealed> wrote:
>
> > hello people,
> >
> > i really think Sympatic should use
> >
> > use utf8::all;
> >
> > or at least
> >
> > use utf8;
> > use open qw< :encoding(UTF-8) :std >;
> >
> > what is your opinion about it ?
> >
> > regards,
> > marc
> >


--
Conversion Co., Ltd.
IT Solutions Department, System Solutions Group 1, IKEDA Soji
Access Oimachi Bldg. 4F, 1-49-15 Oi, Shinagawa-ku, Tokyo 140-0014
e-mail address@concealed TEL 03-6429-2880
https://www.conversion.co.jp/
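
# testutf.pl (attached script): create a file under each name, then read
# the name back from the directory to see how the filesystem stored it.
use strict;
use warnings;
use feature qw(say);
use Encode qw(encode FB_PERLQQ);
use File::Temp qw(tempdir);
use utf8::all;

my @names = (
    'Bâle',
    'İstanbul',
    'Phú Quốc',
);

my $tempdir = tempdir(CLEANUP => 1);
foreach my $name (@names) {
    # Show the name we ask for, with non-ASCII characters escaped.
    say '=> ', encode('us-ascii', $name, FB_PERLQQ());

    open my $fh, '>', $tempdir . '/' . $name
        or die "Cannot create file: $!";
    close $fh;

    # Show the name the filesystem reports back.
    opendir(my $dh, $tempdir) or die "Cannot open $tempdir: $!";
    my ($stored) = grep {!/^[.]+$/} readdir $dh;
    closedir $dh;
    say '<= ', encode('us-ascii', $stored, FB_PERLQQ());
    unlink $tempdir . '/' . $stored;
}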

    


