Subject: Developers of Sympa
List archive
- From: IKEDA Soji <address@concealed>
- To: address@concealed
- Subject: Re: [sympa-developpers] Sympatic unicode ?
- Date: Thu, 8 Mar 2018 18:35:37 +0900
On Fri, 2 Mar 2018 18:56:24 +0900
IKEDA Soji <address@concealed> wrote:
> The second important point is that **text data is not unique**.
>
> Text data should be normalized and, if necessary, case-folded
> as soon as we receive it.
>
> - Unicode defines several normalization forms (NFC, NFD, NFKC,
> NFKD), so we should normalize text data.
Example: when we run the attached testutf.pl ("=>" is the file name
we create, "<=" is the name readdir returns),
On xfs, ext4, NFS4, CIFS etc.:
$ perl testutf.pl
=> B\x{00e2}le
<= B\x{00e2}le
=> \x{0130}stanbul
<= \x{0130}stanbul
=> Ph\x{00fa} Qu\x{1ed1}c
<= Ph\x{00fa} Qu\x{1ed1}c
On HFS+:
$ perl testutf.pl
=> B\x{00e2}le
<= Ba\x{0302}le
=> \x{0130}stanbul
<= I\x{0307}stanbul
=> Ph\x{00fa} Qu\x{1ed1}c
<= Phu\x{0301} Quo\x{0302}\x{0301}c
HFS+ (macOS) accepts UTF-8 pathnames but stores them in a variant of
decomposed normalization form. Thus, even if the filesystem supports
Unicode, comparing a pathname held in memory with the one returned by
the filesystem may not always succeed.
Similar cases probably exist with databases.
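One possible way to make such comparisons robust is to normalize both
sides to the same form before comparing. The following is only a minimal
sketch, not code from Sympa: same_name() is a hypothetical helper, it
assumes both strings are already decoded to Perl character strings, and
it picks NFC arbitrarily (any single form would do as long as both sides
use it). It relies on the core Unicode::Normalize module.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical helper: compare two decoded names after normalizing
# both of them to NFC.
sub same_name {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right);
}

my $in_memory = "B\x{00e2}le";     # precomposed, as we wrote it
my $from_disk = "Ba\x{0302}le";    # decomposed, as HFS+ returned it

print "raw comparison:        ",
    ($in_memory eq $from_disk ? "same" : "different"), "\n";
print "normalized comparison: ",
    (same_name($in_memory, $from_disk) ? "same" : "different"), "\n";
```

With the two strings above, the raw comparison reports "different" and
the normalized comparison reports "same".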
Regards,
-- Soji
> Regards,
> -- Soji
>
>
> On 2018/02/27 18:12, Marc Chantreux <address@concealed> wrote:
>
> > hello people,
> >
> > I really think Sympatic should use
> >
> > use utf8::all;
> >
> > or at least
> >
> > use utf8;
> > use open qw< :encoding(UTF-8) :std >;
> >
> > what is your opinion about it ?
> >
> > regards,
> > marc
> >
--
Conversion Co., Ltd.
IT Solutions Department, System Solutions Group 1, IKEDA Soji
4F Access Oimachi Bldg., 1-49-15 Oi, Shinagawa-ku, Tokyo 140-0014
e-mail address@concealed TEL 03-6429-2880
https://www.conversion.co.jp/
use strict;
use warnings;
use feature qw(say);
use Encode qw(encode FB_PERLQQ);
use File::Temp qw(tempdir);
use utf8::all;

my @names = (
    'Bâle',
    'İstanbul',
    'Phú Quốc',
);

my $tempdir = tempdir(CLEANUP => 1);
foreach my $name (@names) {
    say '=> ', encode('us-ascii', $name, FB_PERLQQ());
    open my $fh, '>', $tempdir . '/' . $name;
    close $fh;

    opendir(my $dh, $tempdir);
    my ($name) = grep {!/^[.]+$/} readdir $dh;
    say '<= ', encode('us-ascii', $name, FB_PERLQQ());
    unlink $tempdir . '/' . $name;
    closedir $dh;
}