
devel - Re: [sympa-developpers] Sympatic unicode ?

Subject: Developers of Sympa


  • From: Soji Ikeda <address@concealed>
  • To: "Stefan Hornburg (Racke)" <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-developpers] Sympatic unicode ?
  • Date: Fri, 9 Mar 2018 00:24:18 +0900

racke,

On 2018/03/08 23:58, Stefan Hornburg (Racke) <address@concealed> wrote:

> On 03/08/2018 12:52 PM, Marc Chantreux wrote:
>> On Fri, Mar 02, 2018 at 05:55:22PM +0900, Soji Ikeda wrote:
>>> They should not be read or written through the :utf8 layer, but through the :bytes layer.
>>> For example, the following operations should use the :bytes layer:
>>> - Opening messages on disk.
>>> - Opening a pipe to sendmail.
>
> We should use CPAN modules rather than opening a pipe to sendmail ...

I don't mind if it is done by a wrapper module. I was describing the cases
where the :utf8 layer should not be used (see also my comment below).
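A minimal Perl sketch of the problem, assuming a stored message that contains one ISO-8859-1 byte (0xE9); the message and temporary file are made up for the demonstration. It uses `:raw` rather than `:bytes`: in Perl, `:bytes` is a pseudo-layer that only turns off the UTF-8 flag, while `:raw` also removes CRLF translation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A stored message whose Subject contains a single ISO-8859-1 byte (0xE9).
# That byte is not valid UTF-8, which is normal for mail in legacy charsets.
my $original = "Subject: caf\xE9\r\n\r\nBody\r\n";

# Spool it to disk exactly as received.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':raw') or die "binmode: $!";
print {$out} $original;
close($out) or die "close: $!";

# Read it back through :raw: no decoding and no CRLF translation.
open(my $raw_fh, '<:raw', $path) or die "open $path: $!";
my $as_bytes = do { local $/; <$raw_fh> };
close($raw_fh);

# Read it back through :encoding(UTF-8): the malformed byte is replaced by a
# fallback (typically with a warning), so the data no longer matches.
open(my $txt_fh, '<:encoding(UTF-8)', $path) or die "open $path: $!";
my $as_text = do { local $/; <$txt_fh> };
close($txt_fh);

printf "raw layer:      %s\n", $as_bytes eq $original ? 'identical' : 'changed';
printf "encoding layer: %s\n", $as_text  eq $original ? 'identical' : 'changed';
```

The raw read reports `identical` and the decoded read reports `changed`, which is why spool and pipe handles should stay byte-oriented.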

>>
>> What's the point of using :bytes everywhere just because mails have to be
>> serialized this way?
>>
>> Those special cases (even if they happen frequently) should be wrapped in
>> functions that ensure correctness.
>>
>> regards
>> marc
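Marc's suggestion can be sketched as a pair of helpers that own the choice of I/O layer. The names `read_message` and `write_message` are hypothetical, invented for this illustration, and are not part of Sympa. The write helper refuses strings containing characters above U+00FF, so decoded text cannot silently reach the spool.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helpers (not existing Sympa functions): every message read or
# write goes through these, so callers never pick a PerlIO layer themselves.

# Return the message file's content as a byte string.
sub read_message {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;                       # slurp mode
    my $bytes = <$fh>;
    close($fh);
    return $bytes;
}

# Write a byte string; refuse decoded text that contains wide characters.
sub write_message {
    my ($path, $bytes) = @_;
    die "write_message: wide character in message; encode it first\n"
        if $bytes =~ /[^\x00-\xFF]/;
    open(my $fh, '>:raw', $path) or die "Cannot write $path: $!";
    print {$fh} $bytes;
    close($fh) or die "Cannot close $path: $!";
    return;
}

my (undef, $path) = tempfile(UNLINK => 1);
my $msg = "Subject: caf\xE9\r\n\r\nBody\r\n";
write_message($path, $msg);
print read_message($path) eq $msg ? "round trip ok\n" : "round trip changed\n";

# Passing decoded Unicode text (U+20AC EURO SIGN) is rejected, not mangled.
my $ok = eval { write_message($path, "Price: \x{20ac}\n"); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

Callers then handle only byte strings at the spool boundary; where to decode for display or templating stays an explicit decision elsewhere.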
>
> Yes, I would agree with Marc.
>
> We are doing the following inside our Dancer apps:
>
>
> # The dumper shows \x{20ac}, so $html and $text are decoded character strings.
> # encode() comes from the Encode module: use Encode qw(encode);
> email {
>     %args,
>     body      => encode( 'UTF-8', $text ),
>     type      => 'text',
>     attach    => {
>         Charset  => 'utf-8',
>         Data     => encode( 'UTF-8', $html ),
>         Encoding => "quoted-printable",
>         Type     => "text/html"
>     },
>     multipart => 'alternative',
> };
>
> Here "email" is basically a wrapper around Email::Sender
> (https://metacpan.org/pod/Dancer2::Plugin::Email#DESCRIPTION).

In that case the email is crafted by the program itself: the internal
representation can be Unicode, and the resulting message can be freely encoded
to UTF-8 (or another charset).
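For program-generated mail, the pattern in Racke's example (character strings inside the program, `encode()` at the edge) can be illustrated with the core Encode module. The sample text is made up; it only shows that the charset is chosen at serialization time.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Text built inside the program is a Unicode character string.
my $text = "Total: 10 \x{20ac}";            # U+20AC EURO SIGN

# Only when the message is serialized is a charset chosen.
my $utf8   = encode('UTF-8',       $text);  # euro sign -> 3 octets E2 82 AC
my $latin9 = encode('ISO-8859-15', $text);  # euro sign -> 1 octet A4

printf "characters: %d\n",         length $text;
printf "UTF-8 octets: %d\n",       length $utf8;
printf "ISO-8859-15 octets: %d\n", length $latin9;
```

This prints 11 characters, 13 UTF-8 octets and 11 ISO-8859-15 octets: the same text, serialized two ways.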

However, we have to process incoming messages that may be encoded with a legacy
charset and transfer encoding. Because we must keep the content unchanged
octet by octet (otherwise we might break the integrity of signatures, etc.), it
must not be decoded to Unicode. In the end, we have to treat a message as a
byte string, not as text data.
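The integrity argument can be checked with core modules only. This sketch assumes an incoming ISO-8859-1 body and uses a SHA-256 digest as a stand-in for whatever a real signature (DKIM, S/MIME, OpenPGP) covers; it shows one failure mode, decoding legacy bytes as if they were UTF-8 and serializing them again.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Digest::SHA qw(sha256_hex);

# An incoming body in a legacy charset: "café" in ISO-8859-1, where é is 0xE9.
my $incoming = "Content-Type: text/plain; charset=ISO-8859-1\r\n\r\ncaf\xE9\r\n";

# A signature is computed over octets like these; a digest stands in for it.
my $digest_before = sha256_hex($incoming);

# Wrongly treating the message as UTF-8 text and serializing it again:
# the invalid byte 0xE9 is replaced by U+FFFD, re-encoded as EF BF BD.
my $roundtrip = encode('UTF-8', decode('UTF-8', $incoming));

# Passing the bytes through untouched keeps the digest intact.
my $passthrough = $incoming;

printf "decoded and re-encoded: %s\n",
    sha256_hex($roundtrip) eq $digest_before ? 'digest matches' : 'digest differs';
printf "kept as bytes:          %s\n",
    sha256_hex($passthrough) eq $digest_before ? 'digest matches' : 'digest differs';
```

Only the byte-preserving path keeps the digest, which is the property a signature check relies on.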

Does my description miss the point?

Regards,
— Soji

> Regards
> Racke
>
> --
> Ecommerce and Linux consulting + Perl and web application programming.
> Debian and Sympa administration. Provisioning with Ansible.





