devel - Re: [sympa-developpers] Sympatic unicode ?

Subject: Developers of Sympa

  • From: IKEDA Soji <address@concealed>
  • To: Marc Chantreux <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-developpers] Sympatic unicode ?
  • Date: Fri, 2 Mar 2018 18:56:24 +0900

Sorry for posting unfinished text.

Hi again,

In what follows, "text data" means Unicode text, i.e. a string with
the "utf8" flag set.

*

I think the most important point is that **mail messages are not text
data**.

They should be read and written through the :bytes layer, not the
:utf8 layer. For example, the following operations should use the
:bytes layer:

- Opening messages on disk.

- Opening pipe to sendmail.
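
As a rough sketch of such byte-oriented I/O (not Sympa's actual code;
the message content and the temporary file are made up, and ":raw" is
used here as the usual binary-mode layer where the text above says
":bytes"):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A tiny made-up message. The body contains the two bytes C3 A9
# ("e" with acute accent in UTF-8), but here it is just a byte string.
my $message = "Subject: test\r\n"
    . "Content-Type: text/plain; charset=UTF-8\r\n"
    . "Content-Transfer-Encoding: 8bit\r\n"
    . "\r\n"
    . "caf\xC3\xA9\r\n";

# Write the message through a binary-mode handle, so Perl neither
# encodes the content nor translates line endings.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} $message;
close $out or die "close failed: $!";

# Read it back the same way: no decoding takes place.
open my $in, '<:raw', $path or die "cannot open $path: $!";
my $read_back = do { local $/; <$in> };
close $in;

# A pipe to the MTA would be treated the same way, for example
# (hypothetical path and options):
#   open my $pipe, '|-', '/usr/sbin/sendmail', '-oi', '-t' or die "...";
#   binmode $pipe, ':raw';
```

With this, $read_back holds exactly the bytes that were written, and
the string never carries the "utf8" flag.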

* To extract text data (body content, the value of an unstructured
field, a parameter value of a structured field), an appropriate
method should be used to decode it. Additionally, note that:

- We should handle the case where conversion between a legacy
character set and Unicode fails.

- Round-trip conversion is not guaranteed in general, so the
decoded text, once encoded again, may no longer be identical to
the source byte string.
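
The two caveats above can be sketched with the core Encode module.
The helper name decode_or_undef and the sample bytes are made up for
illustration; this is only one possible way to handle failures:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode $bytes from $charset into text data.
# Returns undef instead of dying when the bytes are invalid in that charset.
sub decode_or_undef {
    my ($charset, $bytes) = @_;    # $bytes is a copy; decode() may modify it
    my $text = eval { decode($charset, $bytes, Encode::FB_CROAK) };
    return $text;
}

# Valid as ISO-8859-1, but malformed as UTF-8 (a lone \xE9 byte).
my $source = "caf\xE9s";

my $as_latin1 = decode_or_undef('ISO-8859-1', $source);    # text data
my $as_utf8   = decode_or_undef('UTF-8',      $source);    # undef: failed

# Lenient decoding (no CHECK argument) substitutes U+FFFD for the bad
# byte, so encoding the result again does not reproduce the source bytes.
my $lenient   = decode('UTF-8', $source);
my $reencoded = encode('UTF-8', $lenient);
```

Here $as_utf8 is undefined because strict decoding failed, and
$reencoded differs from $source because the substitution character
replaced the original byte.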

* To modify / construct a message, we should generate a byte string
instead of text data.
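
A minimal sketch of building a message as a byte string, using only
core modules. The subject and body are made-up values, and the
encoded-word is built by hand on the assumption that the subject is
short enough to fit in a single one:

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Text data we want to send (made-up values).
my $subject = "Caf\x{E9}";
my $body    = "Na\x{EF}ve r\x{E9}sum\x{E9}\n";

# Encode each piece into bytes explicitly.
# The header value becomes an RFC 2047 encoded-word (single word only).
my $subject_bytes =
    '=?UTF-8?B?' . encode_base64(encode('UTF-8', $subject), '') . '?=';
# The body becomes base64 text with CRLF line endings.
my $body_bytes = encode_base64(encode('UTF-8', $body), "\r\n");

# Assemble the message from byte strings only.
my $message = join "\r\n",
    "Subject: $subject_bytes",
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset=UTF-8',
    'Content-Transfer-Encoding: base64',
    '',
    $body_bytes;
```

The resulting $message contains only bytes, so it can be written
through a binary-mode handle without any further encoding step.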


The second most important point is that **text data is not unique**.

Text data should be normalized and (if necessary) case-folded as
soon as we obtain it.

- Unicode defines more than one normalization form (NFC, NFD, NFKC,
NFKD), and the same text can be represented by different code point
sequences. So we should normalize text data.

- Case folding and case conversion (to lowercase or uppercase) do
not guarantee round-trip conversion.
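
Both points can be illustrated with the core Unicode::Normalize module
and fc() (Perl 5.16 or later). The helper name comparison_key is
hypothetical, and its normalize/fold/normalize order is a
simplification rather than a complete caseless-matching algorithm:

```perl
use v5.16;    # enables strict, fc() and Unicode string semantics
use warnings;
use Unicode::Normalize qw(NFC);

# The same visible text as two different code point sequences.
my $precomposed = "\x{E9}";      # U+00E9
my $decomposed  = "e\x{301}";    # U+0065 followed by U+0301

my $equal_raw = ($precomposed eq $decomposed)           ? 1 : 0;
my $equal_nfc = (NFC($precomposed) eq NFC($decomposed)) ? 1 : 0;

# Hypothetical helper: a key for comparing text data, normalized and
# case-folded. NFC is applied before and after folding.
sub comparison_key {
    my ($text) = @_;
    return NFC(fc(NFC($text)));
}

# Case conversion loses information: U+00DF (sharp s) uppercases to
# "SS", and lowercasing that gives "ss", not the original character.
my $original  = "stra\x{DF}e";
my $roundtrip = lc(uc($original));
```

The two spellings compare as different until they are normalized, and
$roundtrip no longer equals $original, although both map to the same
comparison key.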

* So we face a similar problem again: the text data we end up with
is no longer identical to the source text data.


N.B.: We will probably also need higher-level normalization (such
as domain name normalization), but I omit such things for now.


Regards,
-- Soji


On 2018/02/27 18:12, Marc Chantreux <address@concealed> wrote:

> hello people,
>
> i really think Sympatic should use
>
> use utf8::all;
>
> or at least
>
> use utf8;
> use open qw< :encoding(UTF-8) :std >;
>
> what is your opinion about it ?
>
> regards,
> marc
>


