Subject: Developers of Sympa
List archive
Re: [sympa-dev] Charset/encoding for e-mail message
- From: Olivier Salaün - CRU <address@concealed>
- To: Hatuka*nezumi - IKEDA Soji <address@concealed>
- Cc: address@concealed
- Subject: Re: [sympa-dev] Charset/encoding for e-mail message
- Date: Thu, 19 Oct 2006 17:49:13 +0200
Hatuka*nezumi - IKEDA Soji wrote:
[...]
- src/mail.pm:reformat_message(): reformat outgoing service messages (along with charset conversion for Japanese messages).

Does reformat_message() also alter message bodies? There might be issues with S/MIME signed messages, which must not be altered or the signature is broken...

I didn't make sure whether it breaks S/MIME-encrypted data or not (reformat_message() will be called in mail_file(), just before sending()). If it is an issue, bodies in the multipart/signed or multipart/encrypted parts won't be touched by this:

We'll probably need to add similar code... thanks for the patch.

--- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 --- >8 ---
--- src/mail.pm 18 Oct 2006 10:07:46 -0000 1.37
+++ src/mail.pm 18 Oct 2006 12:16:05 -0000
@@ -822,4 +822,5 @@
 my $eff_type = $part->effective_type;
+ return $part if $eff_type =~ m{^multipart/(signed|encrypted)$};
 if ($part->parts) {
--- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---

[...]

* To support charset conversion for Japanese messages, _charset_ of po/ja.po must be changed from UTF-8 to EUC-JP (how about Rosetta stuff?).

[...]

Anyway, I tried transcoding ja.po from utf-8 to euc-jp using iconv, but without success:

% iconv -f utf-8 -t eucJP -o /tmp/ja.po po/ja.po

The problems I got were at PO catalog compile time:

<<snip>>

Therefore I aborted the process. If you have a clue of what the problem might be...

% iconv -f utf-8 -t eucJP /tmp/ja.po | sed -e 's/; charset=UTF-8/; charset=EUC-JP/i' > po/ja.po

will give the desired one. I attached the result, with revised translations by myself (this may be useful for tests discussed below).

I also did (manually) change the charset in the PO files. Strangely, your ja.po file compiles perfectly; I'll commit it in CVS, thanks.

[...]

Problem ---

In addition to the patch described above, I tried replacing MIME::Words::decode_mimewords(STRING) with MIME::EncWords::decode_mimewords(STRING, Charset=>CHARSET).
But I cannot clarify what CHARSET may be used to feed decoded data to TT2 templates. With any charset (including _UNICODE_), TT2 seems to break fed data. See bug #1059.

Can you provide a bit more explanation regarding this problem?

Strings used for interpolation on TT2 are interpreted as if they are encoded by ISO-8859-1. Anyhow curious this is ---

- When a byte string ``é'' (latin small letter e with acute) encoded by UTF-8, "\xC3\xA9", is fed to TT2, output contains ``Ã©'', "\xC3\x83\xC2\xA9" (UTF-8 representation of ISO-8859-1 interpretation of "\xC3\xA9").
- When a Unicode string ``é'', "\x{00E9}", is fed, output contains ``Ã©'', "\x{00C3}\x{00A9}" (Perl internal representation of ISO-8859-1 interpretation of the Unicode string with utf8 flag forced to be off).

I'm wondering if your problem might be related to something I fixed yesterday in the CVS HEAD: the logging subroutine (do_log()) recodes its parameters from UTF-8 to the filesystem_encoding. (This is required because syslogd does not seem to cope well with UTF-8.) I found out, while applying your patch, that do_log() was not only recoding the values of the parameters but also the variables themselves. I fixed this. Therefore, can you try the latest CVS HEAD before we investigate this topic further?

[...]

The problems are ---

(a) Not reproduced on:
Web:
- List subject.
INFO Service message:
- List subject.

(b) Reproduced on:
INFO Service message:
- Description of list.

I was not able to reproduce this problem; but maybe I need to try with non-ISO-8859-1 data...

Web:
- Help pages installed into web_tt2/ja_JP/ (UTF-8 is used).

Afterwards, I made a quick hack on src/tt2.pl (as patch attached). Then this seems not to be reproduced, probably.

If the problem persists, please provide us with a step-by-step way to reproduce it.
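The double-encoding effect reported for ``é'' can be reproduced outside Sympa and TT2. The sketch below is in Python purely for illustration (Sympa itself is Perl). The helper name `misread_as_latin1` is hypothetical, and the code only models the symptom under the assumption that UTF-8 bytes are decoded as ISO-8859-1 somewhere along the way; it says nothing about where TT2 actually does this.

```python
def misread_as_latin1(raw: bytes) -> str:
    """Decode bytes as ISO-8859-1 regardless of their real charset (the assumed bug)."""
    return raw.decode("iso-8859-1")


utf8_bytes = "é".encode("utf-8")          # b'\xc3\xa9'
misread = misread_as_latin1(utf8_bytes)   # two characters: U+00C3, U+00A9
double_encoded = misread.encode("utf-8")  # b'\xc3\x83\xc2\xa9'

# Decoding with the charset the bytes were really written in avoids the damage.
correct = utf8_bytes.decode("utf-8")

print(utf8_bytes, double_encoded, correct)
```

The byte sequences match the ones quoted in the first bullet, which is consistent with (though not proof of) an ISO-8859-1 assumption on input.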
(c) The following seem to be compounded by other factors; they are encoded with the charset obtained from gettext("_charset_"), then interpreted as ISO-8859-1:
Web:
- Language names in language box.
- Dropdown box of "digest" parameter.
- Perhaps anywhere strftime()'ed dates appear.
INFO Service message:
- Days of Digest.

For example ``日本語'' is shown as ``ÆüËܸì'' ("\xC6\xFC\xCB\xDC\xB8\xEC" by EUC-JP and ISO-8859-1, respectively) in language box. ``Español'' is truncated to be ``Espa''.

I'll try to find out what is causing this...

Other known bugs ---

- When address headers of service messages include non-ASCII characters, headers will be encoded incorrectly. It is advisable that structured headers (address fields, parenthesized comments, parameters,...) be handled separately by some appropriate functions.

I'm afraid I don't understand what you mean?

I mean that, for example, a header:

To: Modérateurs de la liste somelist <address@concealed>

will be encoded as:

To: =?ISO-8859-1?Q?Mod=E9rateurs_de_la_liste_somelist_<somelist-editor@so?=
 =?ISO-8859-1?Q?me.dom.ain>?=

N.B.: This result _is_ MIME-compliant, if it was _not_ a structured header field. Though original MIME::Words takes care of natural word separators (i.e. spaces), such separators are not necessarily obvious in non-word-spacing languages (CJK, Thai, ...).

On TT2 templates, this will be avoided by the attached (second) patch, but this may not be a generalized solution.

Another solution is to put the [% FILTER qencode %] at the right place in the TT2 files, example below:

To: [% FILTER qencode %][%|loc(list.name)%]Moderators of list %1[%END%][%END%] <[% list.name %]-editor@[% list.host %]>

I've fixed the mail_tt2 files according to this. I don't know if we still need your patch...

I believe that the structured header fields in general need to be parsed/constructed by other functions, not just by processing B/Q encodings.

What other solution do you propose?
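To make the structured-header point concrete, here is a small sketch contrasting the two approaches discussed above: encoding the whole field as unstructured text, versus encoding only the display name and leaving the address literal. It uses Python's standard `email` package purely as an illustration; it is not Sympa code, not MIME::EncWords, and not the attached patch. The constant names are hypothetical, and the address is taken from the encoded example above.

```python
from email.header import Header, decode_header, make_header
from email.utils import formataddr, parseaddr

DISPLAY_NAME = "Modérateurs de la liste somelist"
ADDRESS = "somelist-editor@some.dom.ain"
CHARSET = "iso-8859-1"

# Treating the field as unstructured text: the angle brackets and the
# address end up inside the encoded-words, so mail software can no longer
# see an address in the field.
naive = Header(f"{DISPLAY_NAME} <{ADDRESS}>", charset=CHARSET).encode()

# Structure-aware construction: only the display name is encoded and the
# addr-spec stays outside any encoded-word.
structured = formataddr((DISPLAY_NAME, ADDRESS), charset=CHARSET)

# Round trip: split the field, then decode the display name.
name_part, addr_part = parseaddr(structured)
decoded_name = str(make_header(decode_header(name_part)))

print(naive)
print(structured)
print(decoded_name, addr_part)
```

Wrapping only the display name in [% FILTER qencode %] in a template has the same effect as the second variant; a parser-based helper generalizes that to fields the templates do not control.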
All modifications described in this message are compiled into the last attachment. An actual (maybe tentative) installation is running here: http://sympa.nezumi.nu/sympa

I will reply to the remainder of your message later. Thanks again.

We'll have a close look at the patches you provided, thanks.

BTW: In your previous message, you reported a problem related to duplicated header fields. I found out that the problem was related to our mail::mail_file() subroutine incorrectly detecting folded header fields. Here is the patch: http://sourcesup.cru.fr/cgi/viewcvs.cgi/sympa/src/mail.pm?r1=1.37&r2=1.38&makepatch=1&diff_format=u
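For context on the folded-header remark: a continuation line of a folded header field starts with a space or a tab, and a splitter that ignores this treats the continuation as a separate field. The sketch below shows one way to group lines into fields. It is in Python for illustration only, the function name `split_header_fields` is hypothetical, and it does not reproduce mail_file() or the patch linked above.

```python
from typing import List


def split_header_fields(header_block: str) -> List[str]:
    """Group the lines of a header block into fields, keeping folded lines together."""
    fields: List[str] = []
    for line in header_block.splitlines():
        if not line:
            continue  # a header block has no empty lines; skip defensively
        if line[0] in (" ", "\t") and fields:
            # Leading whitespace marks a continuation of the previous field.
            fields[-1] += "\n" + line
        else:
            fields.append(line)
    return fields


block = "Subject: a rather long\n subject line\nTo: somelist@some.dom.ain\n"
fields = split_header_fields(block)
print(fields)
```

With this grouping, the sample block yields two fields rather than three, so the folded Subject line is not mistaken for a new field.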