Subject: Developers of Sympa
List archive
Re: [sympa-dev] Charset/encoding for e-mail message
- From: Olivier Salaün - CRU <address@concealed>
- To: Hatuka*nezumi - IKEDA Soji <address@concealed>
- Cc: address@concealed, address@concealed
- Subject: Re: [sympa-dev] Charset/encoding for e-mail message
- Date: Wed, 27 Sep 2006 16:55:16 +0200
Hi,
Hatuka*nezumi - IKEDA Soji wrote:
> Currently, there are the following problems with e-mail encoding by Sympa ---
> o On some locales, a specific charset & header encoding & body
>   transfer-encoding are the de facto standard for e-mail messages.
>   --- For example on the ja_JP locale, ISO-2022-JP & BASE64 & 7BIT are
>   commonly used. UTF-8 / QUOTED-PRINTABLE are much less common.

Do you mean that we should move the default encoding from UTF-8 to ISO-2022-JP?
Note that this encoding is only used for "service messages" sent by Sympa (welcome message, error report, ...).
Are there any problems reading UTF-8 encoded emails with standard mail clients in Japan?
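To make the charset question concrete, here is a minimal Python sketch (standard library only; it is an illustration, not Sympa's Perl code, and the sample string is made up). It shows why ISO-2022-JP text can travel as 7BIT while the same text in UTF-8 contains 8-bit bytes and therefore needs Base64, Quoted-Printable or 8BIT transport.

```python
# Illustration only: Python standard library, not Sympa's Perl code.
text = "こんにちは、Sympa です。"  # made-up sample of a Japanese service message

iso2022jp = text.encode("iso-2022-jp")
utf8 = text.encode("utf-8")


def is_7bit(data: bytes) -> bool:
    """Return True when every byte is below 0x80, i.e. safe for 7BIT transfer."""
    return all(b < 0x80 for b in data)


# ISO-2022-JP switches character sets with escape sequences and uses 7-bit bytes only.
print("iso-2022-jp is 7bit:", is_7bit(iso2022jp))
# UTF-8 encodes each non-ASCII character as bytes >= 0x80.
print("utf-8 is 7bit:", is_7bit(utf8))
```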
> o On the other hand, various charsets are used for the Web interface.
>   --- For ja_JP: EUC-JP, SHIFT_JIS, UTF-8 and also ISO-2022-JP
>   are used (from a coding point of view, since ISO-2022-JP prevents
>   HTML-entity escaping, 8-bit schemes are preferred).

We've fixed this problem in the version to come (current development version): all web pages are now recoded to UTF-8, and so are web archives. We no longer have problems with mixtures of encodings in web pages.
Along with this new version we've added a new sympa.conf parameter for the listmaster to declare what encoding is used on the filesystem.
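The recoding step described here can be sketched as follows. This is a hedged Python illustration, not Sympa's implementation: the function name `recode_to_utf8` and the `source_encoding` argument are hypothetical, with `source_encoding` standing in for whatever filesystem encoding the listmaster declares.

```python
# Illustration only: the name and parameter below are hypothetical.
def recode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes stored in the declared encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


# Example: a template fragment stored on disk as EUC-JP.
stored = "メーリングリスト".encode("euc-jp")
page_bytes = recode_to_utf8(stored, "euc-jp")
print(page_bytes.decode("utf-8"))
```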
> o Also for other multibyte / non-Latin charsets, the BASE64 (B)
>   encoding scheme is preferred or often de facto.

I assume you refer to "service messages" that Sympa sends, because Sympa does not alter messages sent to mailing lists (except when custom_subject is used).
Should Sympa Base64-encode both message body and header fields?
What kind of problems happen when using Quoted-Printable?
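One measurable difference between the two schemes is size. The Python sketch below (an illustration with made-up sample strings, not Sympa code) counts the encoded length of the same UTF-8 text under the RFC 2047 "Q" and "B" encodings. It says nothing about how Japanese mail clients render Quoted-Printable; it only shows that Q triples every non-ASCII byte while B grows the input by about one third.

```python
import base64
import string

# Bytes that may appear literally in a "Q" encoded-word (conservative RFC 2047 set).
Q_SAFE = set((string.ascii_letters + string.digits + "!*+-/").encode("ascii"))


def q_length(raw: bytes) -> int:
    """Length of the Q-encoded text: safe bytes and space take 1 char, others '=XX'."""
    return sum(1 if (b in Q_SAFE or b == 0x20) else 3 for b in raw)


def b_length(raw: bytes) -> int:
    """Length of the Base64-encoded text."""
    return len(base64.b64encode(raw))


for sample in ("メーリングリストへようこそ", "Bienvenue sur la liste française"):
    raw = sample.encode("utf-8")
    print(sample, "Q:", q_length(raw), "B:", b_length(raw))
```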
> o MIME::Words::encode_mimewords() breaks multibyte character
>   boundaries in encoded headers. cf.:
>   http://rt.cpan.org/Public/Bug/Display.html?id=13027
>
> With the attached patch I tried to solve these problems. Though this
> patch can be applied to the current branch, if my attempt agrees with the
> policy of Sympa development, I'll continue working on the dev branch.

Since you've developed an alternative to MIME::Words, we'd much prefer that you make it a separate CPAN module that Sympa would use instead of the MIME::Words module. Actually we gave the same answer to Peter Szabo, who sent us a similar proposal (check https://www.szszi.hu/wiki/Sympa4Patches). He has decided to build a new CPAN module called MIME::AltWords that would fix all the Unicode problems of MIME::Words. I also suggested to him the option of extending Encode::MIME::Header (see http://search.cpan.org/~dankogai/Encode-2.18/lib/Encode/MIME/Header.pm).
Obviously Peter and you have done similar work.
We'd prefer having the best of both your codes ;-)
Why not work together on this new CPAN module?
Peter is Cced.
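For readers unfamiliar with the boundary problem, the following Python sketch illustrates it. The naive splitter is a stand-in written for this example, not the actual code of MIME::Words, and all function names are hypothetical. Each encoded-word has to be decodable on its own, so a long header must be split between characters rather than at arbitrary byte offsets. The sketch covers only a stateless charset (UTF-8); a stateful charset such as ISO-2022-JP additionally needs its escape sequences handled per chunk.

```python
import base64

CHARSET = "utf-8"  # stateless charset only; see the note above


def naive_chunks(text: str, max_bytes: int) -> list:
    """Split the encoded bytes at fixed offsets, ignoring character boundaries."""
    raw = text.encode(CHARSET)
    return [raw[i:i + max_bytes] for i in range(0, len(raw), max_bytes)]


def safe_chunks(text: str, max_bytes: int) -> list:
    """Split between characters so that every chunk decodes independently."""
    chunks, current = [], b""
    for ch in text:
        encoded = ch.encode(CHARSET)
        if current and len(current) + len(encoded) > max_bytes:
            chunks.append(current)
            current = b""
        current += encoded
    if current:
        chunks.append(current)
    return chunks


def all_decodable(chunks: list) -> bool:
    """Check that each chunk is valid text in CHARSET on its own."""
    for chunk in chunks:
        try:
            chunk.decode(CHARSET)
        except UnicodeDecodeError:
            return False
    return True


def to_encoded_words(chunks: list) -> list:
    """Wrap each chunk as an RFC 2047 'B' encoded-word."""
    return ["=?UTF-8?B?%s?=" % base64.b64encode(c).decode("ascii") for c in chunks]


subject = "日本語のメーリングリスト"
print(all_decodable(naive_chunks(subject, 10)))
print(all_decodable(safe_chunks(subject, 10)))
print(to_encoded_words(safe_chunks(subject, 10)))
```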
> Notes on the attached patch ---
> - On locales where e-mail messages require charset conversion,
>   gettext(_charset_) should return a locale-targeted charset.
>   For example for ja_JP above, one might want this to be EUC-JP
>   (note that the filesystem encoding may differ from this charset).
> - The preferred encoding scheme for UTF-8 in header fields varies by
>   language context. The shorter one will be selected.
> - Minimalism: texts not containing non-ASCII characters should be
>   specified as US-ASCII / 7BIT.
> * This patch is incomplete. Message bodies aren't converted using the
>   charset/encoding for e-mail: how may I handle message bodies
>   generated from tt2?

Could you send us a lighter version of your patch without the new versions of the xx_mimewords() subroutines (cf. the above paragraph)? Thanks, and sorry for making more work for you.
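The "shorter one" and "minimalism" rules in the notes above can be sketched with Python's standard `email.charset` module, whose UTF-8 entry already picks the shorter of Base64 and Quoted-Printable for headers. This is an illustration of the idea, not the attached patch: the function names are hypothetical, and the `locale_charset` argument stands in for whatever gettext(_charset_) would return.

```python
from email.charset import Charset


def encode_header_value(text: str, locale_charset: str = "utf-8") -> str:
    """Encode a header value, leaving pure ASCII text untouched (minimalism)."""
    if text.isascii():
        return text  # plain US-ASCII needs no encoded-word
    # For UTF-8, Charset selects the shorter of B and Q; for ISO-2022-JP it uses B.
    return Charset(locale_charset).header_encode(text)


def body_charset_label(text: str, locale_charset: str = "utf-8") -> str:
    """Return the charset label for a body, downgrading to US-ASCII when possible."""
    return "us-ascii" if text.isascii() else locale_charset


for sample in ("Welcome", "Bienvenue sur la liste française", "メーリングリストへようこそ"):
    print(body_charset_label(sample), encode_header_value(sample))
print(encode_header_value("ようこそ", "iso-2022-jp"))
```

Pure ASCII text passes through unchanged, so only messages that actually contain non-ASCII characters carry a non-ASCII charset label.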
> BTW this must be a FAQ: How should "Sympa" be pronounced, whether
> "sympa(-thetic)" in English, "sympa(-thique)" en français, another
> or ...everything?

We pronounce it the French way, but most non-French people pronounce it the English way.
What would be the Japanese way? Please send us an MP3...
We might start a collection of MP3s on sympa.org, pronounced in each language.