Subject: Developers of Sympa
Re: [sympa-dev] Charset/encoding for e-mail message
- From: Olivier Salaün - CRU <address@concealed>
- To: Hatuka*nezumi - IKEDA Soji <address@concealed>
- Cc: address@concealed
- Subject: Re: [sympa-dev] Charset/encoding for e-mail message
- Date: Thu, 19 Oct 2006 17:49:13 +0200
Hatuka*nezumi - IKEDA Soji wrote:
[...]
> I haven't checked whether it breaks S/MIME-encrypted data or not
> (reformat_message() will be called in mail_file(), just before
> sending()). If it is an issue, bodies in multipart/signed or
> multipart/encrypted parts won't be touched by this:
> --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 ---
> --- src/mail.pm 18 Oct 2006 10:07:46 -0000 1.37
> +++ src/mail.pm 18 Oct 2006 12:16:05 -0000
> @@ -822,4 +822,5 @@
> my $eff_type = $part->effective_type;
> + return $part if $eff_type =~ m{^multipart/(signed|encrypted)$};
> if ($part->parts) {
> --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---

We'll probably need to add similar code... thanks for the patch.
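Why a single early return is enough: a multipart/signed signature is computed over the first body part as sent, and a multipart/encrypted body is ciphertext, so neither subtree may be recoded. The sketch below shows the same guard in a recursive walk. It is a model under stated assumptions, not Sympa code: parts are plain hash references instead of MIME::Entity objects, and `reformat_part()` is a hypothetical stand-in for the recursion in reformat_message().

```perl
use strict;
use warnings;

# Hypothetical model of a MIME part (not MIME::Entity):
#   type  => MIME type string
#   parts => arrayref of subparts (multipart only)
#   body  => string (leaf parts only)
sub reformat_part {
    my ($part, $recode) = @_;
    my $eff_type = $part->{type};

    # Same guard as the patch: leave signed/encrypted subtrees untouched,
    # since any byte change breaks the signature or the ciphertext.
    return $part if $eff_type =~ m{^multipart/(signed|encrypted)$};

    if ($part->{parts}) {
        # Ordinary multiparts: recurse into every subpart.
        $part->{parts} = [ map { reformat_part($_, $recode) } @{ $part->{parts} } ];
    } elsif ($eff_type =~ m{^text/}) {
        # Only text leaves are recoded.
        $part->{body} = $recode->($part->{body});
    }
    return $part;
}

my $msg = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain', body => 'plain' },
        { type  => 'multipart/signed',
          parts => [ { type => 'text/plain', body => 'signed' },
                     { type => 'application/pkcs7-signature', body => 'sig' } ] },
    ],
};

# A toy "recoding" (upper-casing) makes the effect visible.
reformat_part($msg, sub { uc $_[0] });
print $msg->{parts}[0]{body}, "\n";            # PLAIN
print $msg->{parts}[1]{parts}[0]{body}, "\n";  # signed
```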
[...]
<<snip>>
> % iconv -f utf-8 -t eucJP /tmp/ja.po | sed -e 's/; charset=UTF-8/; charset=EUC-JP/i' > po/ja.po
>
> will give the desired one. I attached the result, with translations
> I revised myself (this may be useful for the tests discussed below).

I had also changed the charset in the PO files, manually. Strangely, your ja.po file compiles perfectly; I'll commit it to CVS, thanks. [...]
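The same conversion can also be done in Perl with the core Encode module, for example where iconv is not at hand. This is a minimal sketch under that assumption; `po_utf8_to_eucjp()` is a hypothetical name, and it works on bytes already read from the PO file.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper, equivalent in effect to
#   iconv -f utf-8 -t eucJP | sed -e 's/; charset=UTF-8/; charset=EUC-JP/i'
# applied to the raw bytes of a PO file.
sub po_utf8_to_eucjp {
    my ($utf8_bytes) = @_;
    # Die on malformed UTF-8, as iconv would stop with an error.
    my $text = decode('UTF-8', $utf8_bytes, FB_CROAK | LEAVE_SRC);
    # Rewrite the declared charset. Without /g only the first match in the
    # whole file is replaced (sed works per line); a PO header has it once.
    $text =~ s/; charset=UTF-8/; charset=EUC-JP/i;
    # Die if some character has no EUC-JP mapping.
    return encode('EUC-JP', $text, FB_CROAK | LEAVE_SRC);
}

# A tiny PO fragment, as UTF-8 bytes.
my $po = encode('UTF-8',
      qq{msgid ""\nmsgstr "Content-Type: text/plain; charset=UTF-8\\n"\n\n}
    . qq{msgid "Japanese"\nmsgstr "\x{65E5}\x{672C}\x{8A9E}"\n});

my $euc = po_utf8_to_eucjp($po);
print $euc =~ /charset=(\S+?)\\n/ ? "$1\n" : "no charset\n";   # EUC-JP
```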
> Strings used for interpolation in TT2 are interpreted as if they
> were encoded in ISO-8859-1. Curious as this is:
> - When the UTF-8 byte string for ``é'' (LATIN SMALL LETTER E WITH
>   ACUTE), "\xC3\xA9", is fed to TT2, the output contains ``Ã©'',
>   "\xC3\x83\xC2\xA9" (the UTF-8 representation of the ISO-8859-1
>   interpretation of "\xC3\xA9").
> - When the Unicode string ``é'', "\x{00E9}", is fed, the output
>   contains ``Ã©'', "\x{00C3}\x{00A9}" (the Perl internal
>   representation of the ISO-8859-1 interpretation of the Unicode
>   string with its utf8 flag forced off).
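Both cases can be reproduced without TT2 at all, using only the core Encode module. This is a minimal sketch that assumes the misinterpretation happens exactly as described (bytes read as ISO-8859-1, output written as UTF-8); it does not show where in TT2 this occurs.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Case 1: the UTF-8 byte string for "é".
my $utf8_bytes = "\xC3\xA9";

# Misreading the bytes as ISO-8859-1 gives two characters, "Ã©"...
my $misread  = decode('ISO-8859-1', $utf8_bytes);   # "\x{C3}\x{A9}"
# ...which become four bytes once the output is written as UTF-8.
my $mojibake = encode('UTF-8', $misread);           # "\xC3\x83\xC2\xA9"

# Decoding with the real charset before interpolation keeps one character.
my $decoded  = decode('UTF-8', $utf8_bytes);        # "\x{E9}"
my $output   = encode('UTF-8', $decoded);           # "\xC3\xA9"

# Case 2: the character string "\x{E9}" whose UTF-8 form is exposed as
# bytes (what forcing the utf8 flag off amounts to), then again read
# as ISO-8859-1.
my $raw = "\x{E9}";
utf8::encode($raw);                                 # now "\xC3\xA9"
my $seen = decode('ISO-8859-1', $raw);              # "\x{C3}\x{A9}"

printf "%vX\n", $mojibake;   # C3.83.C2.A9
printf "%vX\n", $output;     # C3.A9
```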
I'm wondering if your problem might be related to something I fixed yesterday in the CVS HEAD: the logging subroutine (do_log()) recodes its parameters from UTF-8 to the filesystem_encoding (this is required because syslogd does not seem to cope well with UTF-8). While applying your patch, I found out that do_log() was recoding not only the values of its parameters but also the variables themselves. I fixed this. Could you therefore try the latest CVS HEAD before we investigate this topic further? [...]
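One common way a Perl subroutine ends up recoding the caller's variables, and not just the values passed in, is that the elements of @_ are aliases to the arguments. Whether this was exactly the do_log() bug is an assumption. The sketch uses hypothetical `log_buggy()` and `log_fixed()` subroutines together with Encode::from_to(), which converts a string in place.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical stand-ins for do_log(); the real subroutine differs.
sub log_buggy {
    my $fmt = shift;
    # foreach aliases $_ to each element of @_, and the elements of @_
    # alias the caller's variables, so from_to() rewrites them in place.
    foreach (@_) { from_to($_, 'UTF-8', 'ISO-8859-1') }
    return sprintf($fmt, @_);
}

sub log_fixed {
    my ($fmt, @args) = @_;    # copy first: @args holds fresh scalars
    foreach (@args) { from_to($_, 'UTF-8', 'ISO-8859-1') }
    return sprintf($fmt, @args);
}

my $subject = "Mod\xC3\xA9rateurs";      # UTF-8 bytes, 12 of them
log_fixed('subject=%s', $subject);
print length($subject), "\n";           # 12: caller's variable unchanged
log_buggy('subject=%s', $subject);
print length($subject), "\n";           # 11: caller's variable was recoded
```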
> The problems are:
> (a) Not reproduced on:
>     Web:
>     - List subject.
>     INFO Service message:
>     - List subject.
> (b) Reproduced on:
>     INFO Service message:
>     - Description of list.

I was not able to reproduce this problem; but maybe I need to try with non-ISO-8859-1 data...

>     Web:
>     - Help pages installed into web_tt2/ja_JP/ (UTF-8 is used).
>     Afterwards, I made a quick hack on src/tt2.pl (patch attached).
>     With it, the problem no longer seems to be reproduced.

If the problem persists, please provide us with step-by-step instructions to reproduce it.

> (c) The following seem to be compounded by other factors: they are
>     encoded in the charset returned by gettext("_charset_"), then
>     interpreted as ISO-8859-1:
>     Web:
>     - Language names in the language box.
>     - Dropdown box of the "digest" parameter.
>     - Perhaps anywhere a strftime()'ed date appears.
>     INFO Service message:
>     - Days of Digest.
> For example, ``日本語'' is shown as ``ÆüËܸì'' in the language box
> (its EUC-JP bytes "\xC6\xFC\xCB\xDC\xB8\xEC" read as ISO-8859-1).
> ``Español'' is truncated to ``Espa''.
I'll try to find out what is causing this...
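The first symptom in (c) can be reproduced in a few lines: bytes produced in the catalog's charset and then read as ISO-8859-1. This is a minimal sketch with the core Encode module; `catalog_string_to_perl()` is a hypothetical helper, and EUC-JP is assumed as the value gettext("_charset_") returns for the converted ja.po. It does not explain the ``Espa'' truncation.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes for "日本語" as they come out of an EUC-JP catalog.
my $bytes = "\xC6\xFC\xCB\xDC\xB8\xEC";

# Wrong: treating them as ISO-8859-1 yields six Latin-1 characters.
my $wrong = decode('ISO-8859-1', $bytes);
print encode('UTF-8', $wrong), "\n";    # ÆüËܸì

# Hypothetical helper: decode with the charset the catalog declares,
# so the rest of the code works on characters rather than bytes.
sub catalog_string_to_perl {
    my ($octets, $charset) = @_;
    return decode($charset, $octets);
}

my $right = catalog_string_to_perl($bytes, 'EUC-JP');
print encode('UTF-8', $right), "\n";    # 日本語
```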
> I mean that, for example, a header:
>
>   To: Modérateurs de la liste somelist <address@concealed>
>
> will be encoded as:
>
>   To: =?ISO-8859-1?Q?Mod=E9rateurs_de_la_liste_somelist_<somelist-editor@so?= =?ISO-8859-1?Q?me.dom.ain>?=
>
> N.B.: This result _would be_ MIME-compliant if it were _not_ a
> structured header field. Though the original MIME::Words takes care
> of natural word separators (i.e. spaces), such separators are not
> necessarily obvious in non-word-spacing languages (CJK, Thai, ...).
> In TT2 templates, this will be avoided by the attached (second)
> patch, but this may not be a general solution.

Another solution is to put [% FILTER qencode %] at the right place in the TT2 files, for example:

  To: [% FILTER qencode %][%|loc(list.name)%]Moderators of list %1[%END%][%END%] <[% list.name %]-editor@[% list.host %]>

I've fixed the mail_tt2 files accordingly. I don't know if we still need your patch...

> I believe that structured header fields in general need to be
> parsed/constructed by other functions, not just by B/Q encoding.

What other solution do you propose?

> All modifications described in this message are compiled into the
> last attachment. An actual (maybe tentative) installation is running
> here: http://sympa.nezumi.nu/sympa
> I will reply to the remainder of your message later. Thanks again.

We'll have a close look at the patches you provided, thanks.

BTW: in your previous message, you reported a problem related to duplicated header fields. I found out that it was caused by our mail::mail_file() subroutine incorrectly detecting folded header fields. Here is the patch:
http://sourcesup.cru.fr/cgi/viewcvs.cgi/sympa/src/mail.pm?r1=1.37&r2=1.38&makepatch=1&diff_format=u
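To make the structured-field point concrete: RFC 2047 (section 5) permits encoded-words in the display name (phrase) of an address, but not inside the <addr-spec>. A minimal sketch, assuming Perl's core Encode module; `address_header()` is a hypothetical helper, not Sympa's qencode filter, and Encode's MIME-Q encoder labels the words UTF-8 rather than ISO-8859-1 as in the To: example of this message.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper, not part of Sympa. Only the phrase is encoded;
# the address is appended verbatim, outside any encoded-word.
sub address_header {
    my ($display_name, $addr) = @_;    # $display_name: a character string
    my $phrase = $display_name =~ /[^\x20-\x7E]/
        ? encode('MIME-Q', $display_name)    # non-ASCII: encoded-word(s)
        : $display_name;                     # plain ASCII: left as is
    # A real implementation would also quote ASCII names that contain
    # RFC 5322 specials such as ',' or '.'.
    return "$phrase <$addr>";
}

my $to = address_header("Mod\x{E9}rateurs de la liste somelist",
                        'somelist-editor@some.dom.ain');
print "To: $to\n";
```

Because the angle-bracketed address never enters the encoder, no word splitting heuristic (spaces or otherwise) can cut through it, which also sidesteps the non-word-spacing-language issue for this field.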