Subject: Developers of Sympa
List archive
- From: IKEDA Soji <address@concealed>
- To: "Stefan Hornburg (Racke)" <address@concealed>
- Cc: address@concealed
- Subject: Re: [sympa-developpers] Sympatic unicode ?
- Date: Fri, 9 Mar 2018 15:20:56 +0900
On Thu, 8 Mar 2018 15:54:25 +0100
"Stefan Hornburg (Racke)" <address@concealed> wrote:
> On 03/08/2018 10:35 AM, IKEDA Soji wrote:
> > On Fri, 2 Mar 2018 18:56:24 +0900
> > IKEDA Soji <address@concealed> wrote:
> >
> >> The second important point is that **text data is not unique**.
> >>
> >> Text data should be normalized and (if necessary) case-folded
> >> as soon as we receive it.
> >>
> >> - Unicode defines more than one normalization form (NFC, NFD and so
> >> on), so we should normalize text data.
> >
> > Example: when we run attached testutf.pl,
> >
> > On xfs, ext4, NFS4, CIFS etc.:
> >
> > $ perl testutf.pl
> > => B\x{00e2}le
> > <= B\x{00e2}le
> > => \x{0130}stanbul
> > <= \x{0130}stanbul
> > => Ph\x{00fa} Qu\x{1ed1}c
> > <= Ph\x{00fa} Qu\x{1ed1}c
> >
> > On HFS+:
> >
> > $ perl testutf.pl
> > => B\x{00e2}le
> > <= Ba\x{0302}le
> > => \x{0130}stanbul
> > <= I\x{0307}stanbul
> > => Ph\x{00fa} Qu\x{1ed1}c
> > <= Phu\x{0301} Quo\x{0302}\x{0301}c
> >
> > HFS+ (macOS) accepts UTF-8 pathnames, but stores them in a kind of
> > decomposed normalization form. Thus, even if the filesystem supports
> > Unicode, comparing a pathname in memory with the one on the filesystem
> > may not always succeed.
> >
> >
> > There are probably similar cases with databases.
>
> There is certainly a difference to databases, because with databases you
> can specify the encoding you want (e.g. xxx_enable_utf8 flags in DBI/DBD).
Encoding does not matter in the example above.
With Unicode, validation and/or normalization of text data can be
performed by various subsystems. I presented one example that I know of.
> And I'm not sure whether your script does the right thing.
That script uses utf8::all, creates a path and reads the directory entry back.
readdir() certainly returns a Unicode string, but since it returns
what the filesystem holds, the result is affected by normalization.
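To make the effect concrete, here is a minimal sketch that assumes the core module Unicode::Normalize is available. The helper same_name() is hypothetical and only for illustration, and the decomposed string is copied from the HFS+ output quoted above rather than read from a real filesystem:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Name as it was created in memory (precomposed U+00E2).
my $in_memory = "B\x{00e2}le";
# Name as a decomposing filesystem returned it in the quoted HFS+ run
# ("a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT).
my $from_fs = "Ba\x{0302}le";

# Hypothetical helper: compare two names after normalizing both to NFC.
sub same_name {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right);
}

print "raw comparison:        ",
    ($in_memory eq $from_fs ? "equal" : "different"), "\n";
print "normalized comparison: ",
    (same_name($in_memory, $from_fs) ? "equal" : "different"), "\n";
```

The raw comparison reports the names as different and the normalized one reports them as equal. Normalizing both sides at comparison time is one option; it does not change what the filesystem stores.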
If there is a "right thing", it is to keep the filesystem from affecting the result.
For example, we can "escape" non-ASCII characters in path names,
as the current code does.
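As a rough sketch of the escaping idea (this is not Sympa's actual routine; escape_name() and unescape_name() are hypothetical names, and percent-escaping of UTF-8 bytes is just one possible scheme), the path component can be reduced to ASCII so the filesystem has nothing to normalize:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: turn a Unicode name into an ASCII-only path
# component by percent-escaping every byte outside [A-Za-z0-9._-].
# "%" itself is escaped too, so the mapping can be reversed.
sub escape_name {
    my ($name) = @_;
    my $bytes = encode('UTF-8', $name);
    $bytes =~ s/([^A-Za-z0-9._-])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical inverse: recover the original Unicode string.
sub unescape_name {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return decode('UTF-8', $bytes);
}

my $escaped = escape_name("B\x{00e2}le");
print "$escaped\n";    # prints B%C3%A2le
```

Because the escaped name contains only ASCII characters, readdir() should return it unchanged on a normalizing filesystem, and unescape_name() gives back the string that was originally stored.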
Regards,
-- Soji
> Regards
> Racke
>
> >
> >
> > Regards,
> > -- Soji
> >
> >
> >> Regards,
> >> -- Soji
> >>
> >>
> >> On 2018/02/27 18:12, Marc Chantreux <address@concealed> wrote:
> >>
> >>> hello people,
> >>>
> >>> i really think Sympatic should use
> >>>
> >>> use utf8::all;
> >>>
> >>> or at least
> >>>
> >>> use utf8;
> >>> use open qw< :encoding(UTF-8) :std >;
> >>>
> >>> what is your opinion about it ?
> >>>
> >>> regards,
> >>> marc
> >>>
> >
> >
>
>
> --
> Ecommerce and Linux consulting + Perl and web application programming.
> Debian and Sympa administration. Provisioning with Ansible.
--
株式会社 コンバージョン
ITソリューション部 システムソリューション1グループ 池田荘児
〒140-0014 東京都品川区大井1-49-15 アクセス大井町ビル4F
e-mail address@concealed TEL 03-6429-2880
https://www.conversion.co.jp/