Subject: Developers of Sympa
List archive
Re: [sympa-dev] Charset/encoding for e-mail message
- From: Olivier Salaün - CRU <address@concealed>
- To: Hatuka*nezumi - IKEDA Soji <address@concealed>
- Cc: address@concealed
- Subject: Re: [sympa-dev] Charset/encoding for e-mail message
- Date: Wed, 18 Oct 2006 12:09:36 +0200
Hi,

Hatuka*nezumi - IKEDA Soji wrote:
> By suggestions, I released new Perl modules: MIME::Charset and
> MIME::EncWords (similarly to Peter's work, this is another alternative
> for MIME::Words, but focuses on supporting multibyte charsets):
>   http://search.cpan.org/perldoc/MIME%3A%3ACharset
>   http://search.cpan.org/perldoc/MIME%3A%3AEncWords
>
> Revised patch (attached) is based on these modules.
>
> Changes made
> ---

We have applied, tested and eventually committed your patch in the dev CVS branch. Below are a few comments and questions:

> - encode_mimewords()/decode_mimewords() of MIME::Words are replaced by
>   alternative ones of MIME::EncWords (but modification is incomplete,
>   as described later).
>   o encode_mimewords() automatically chooses appropriate encoding
>     (B, Q or unencoded) by enhanced ``Encoding="A"'' option.

The 'A' (standing for Automatic, I suppose) is definitely a good idea, since the choice between Base64 and Quoted-Printable depends heavily on the language used. The 'A' option is not documented in your module's CPAN pages, though.

> - src/mail.pm:reformat_message(): reformat outgoing service messages
>   (along with charset conversion for Japanese messages).

Does reformat_message() also alter message bodies? There might be issues with S/MIME signed messages, which must not be altered or the signature is broken... I had to replace calls to croak() with proper error handling; they were not acceptable within a daemon.

> * To support charset conversion for Japanese messages, _charset_ of
>   po/ja.po must be changed from UTF-8 to EUC-JP (how about Rosetta
>   stuff?).

You're right, the PO file should be transcoded to EUC-JP. Rosetta is one option for translating the Sympa GUI; other options include using other software such as KBabel. And actually we are thinking about providing a translation service (vs software) on our sympa.org website. It would be based on home-made software or on Pootle.

Anyway, I tried transcoding ja.po from UTF-8 to EUC-JP using iconv, but without success:

  % iconv -f utf-8 -t eucJP -o /tmp/ja.po po/ja.po

The problems I got were at PO catalog compile time:

  msgfmt -o ja.mo ja.po

Therefore I aborted the process. If you have a clue what the problem might be...

> - src/Message.pm:new(), src/List.pm:distribute_msg(): custom_subject
>   processing was improved. It will handle mixed-charset situations
>   better.

I noticed that: it's great. BTW, it looks like the way you call MIME::EncWords::encode_mimewords() is not documented on CPAN, i.e.:

  MIME::EncWords::encode_mimewords([

> Problem
> ---
> In addition to patch described above, I tried replacing
> MIME::Words::decode_mimewords(STRING) with
> MIME::EncWords::decode_mimewords(STRING, Charset=>CHARSET). But I
> cannot clarify what CHARSET may be used to feed decoded data to TT2
> templates. By any charset (including _UNICODE_), TT2 seems to break
> fed data. See bug #1059.

Can you provide a bit more explanation regarding this problem?

> Other known bugs
> ---
> - When address headers of service messages include non-ASCII
>   characters, headers will be encoded maliciously. It is advisable
>   that structured headers (address fields, parenthesized comments,
>   parameters,...) will be handled separately by some appropriate
>   functions.

I'm afraid I don't understand what you mean.

> - Headers for some service messages (at least MIME-Version,
>   Content-Type and Content-Transfer-Encoding) are duplicated. This
>   doesn't seem to be caused by this patch.

I can't manage to reproduce this problem on our server. Please file a bug report with enough information for us to reproduce it.

Thank you for this great contribution. I'm sure users of multibyte character sets will appreciate it :-)
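
To illustrate why an automatic choice between B and Q encoding depends on the language, here is a minimal Python sketch. It is not the MIME::EncWords implementation: the names pick_encoding, encode_word and Q_SAFE are hypothetical, the rule "pick whichever payload is shorter" is an assumption about what an automatic mode might do, and the sketch ignores the 75-character limit on encoded words.

```python
import base64
import string

# Bytes that may appear literally in a Q-encoded word (a conservative set).
Q_SAFE = frozenset((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))


def pick_encoding(text, charset="utf-8"):
    """Return "" (leave unencoded), "Q" or "B" for one header word or phrase."""
    if text.isascii() and text.isprintable():
        return ""
    raw = text.encode(charset)
    # Q costs 1 character per safe byte (space becomes "_"), 3 per escaped byte.
    q_len = sum(1 if (b in Q_SAFE or b == 0x20) else 3 for b in raw)
    # B costs 4 characters per started group of 3 bytes.
    b_len = 4 * ((len(raw) + 2) // 3)
    return "Q" if q_len <= b_len else "B"


def encode_word(text, charset="utf-8"):
    """Build a single RFC 2047 encoded-word (no line-length handling)."""
    enc = pick_encoding(text, charset)
    if not enc:
        return text
    raw = text.encode(charset)
    if enc == "B":
        payload = base64.b64encode(raw).decode("ascii")
    else:
        payload = "".join(
            "_" if b == 0x20 else chr(b) if b in Q_SAFE else f"={b:02X}"
            for b in raw
        )
    return f"=?{charset}?{enc}?{payload}?="
```

With this rule, a mostly-ASCII Latin-1 name such as "Olivier Salaün" comes out Q-encoded and stays readable, while Japanese text in EUC-JP, where every byte would need escaping, comes out B-encoded.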
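
On the ja.po transcoding attempt: the msgfmt errors are not shown above, so what follows is only a guess at one possible cause. A PO catalog declares its charset in its header entry, and re-encoding the bytes with iconv leaves that declaration saying UTF-8. The Python sketch below (transcode_po is a hypothetical helper, not part of Sympa or gettext) re-encodes the catalog and rewrites the declaration in the same step. It would also fail loudly on any character that has no EUC-JP representation, which is another way such a conversion can go wrong.

```python
import re


def transcode_po(data, src="utf-8", dst="euc-jp"):
    """Re-encode PO catalog bytes and update the declared charset to match."""
    text = data.decode(src)
    # Rewrite the first "Content-Type: text/plain; charset=..." declaration.
    text, count = re.subn(
        r"(Content-Type:\s*text/plain;\s*charset=)[\w.:-]+",
        lambda m: m.group(1) + dst.upper(),
        text,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("no charset declaration found in PO header")
    # Raises UnicodeEncodeError if a character cannot be represented in dst.
    return text.encode(dst)
```

A caller would read po/ja.po as bytes, pass them through transcode_po, and write the result to a new file before running msgfmt on it.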