devel - [sympa-dev] Charset/encoding for e-mail message

Subject: Developers of Sympa

  • From: Hatuka*nezumi - IKEDA Soji <address@concealed>
  • To: address@concealed
  • Subject: [sympa-dev] Charset/encoding for e-mail message
  • Date: Sun, 24 Sep 2006 12:43:54 +0900

Hi sympa-dev.


Currently, e-mail encoding by Sympa has the following
problems ---

o On some locales, a specific charset, header encoding and body
transfer-encoding are the de facto standard for e-mail messages.

--- For example, on the ja_JP locale, ISO-2022-JP, BASE64 and 7BIT
are commonly used. UTF-8 and QUOTED-PRINTABLE are much less common.

o On the other hand, various charsets are used for the Web interface.

--- For ja_JP: EUC-JP, SHIFT_JIS, UTF-8 and also ISO-2022-JP
are used (from a coding point of view, 8-bit encodings are
preferred, since ISO-2022-JP interferes with HTML-entity escaping).

o For other multibyte / non-Latin charsets too, the BASE64 (B)
encoding scheme is preferred, or is often the de facto standard.

o MIME::Words::encode_mimewords() splits encoded headers in the
middle of multibyte characters. cf.:
http://rt.cpan.org/Public/Bug/Display.html?id=13027
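
To illustrate the last problem, here is a minimal sketch in Python
(not the Perl code of Sympa or MIME::Words). The function names and
the tiny 4-byte limit are assumptions made for this example only. It
compares splitting the encoded bytes at fixed offsets with splitting
only between whole characters, before each chunk would be wrapped in
an encoded-word.

```python
import base64


def chunk_bytes_naive(text, charset, max_bytes):
    """Split the encoded bytes at fixed offsets (may cut a character in two)."""
    raw = text.encode(charset)
    return [raw[i:i + max_bytes] for i in range(0, len(raw), max_bytes)]


def chunk_chars_safe(text, charset, max_bytes):
    """Split only between characters, so every chunk decodes on its own."""
    chunks, current = [], b""
    for ch in text:
        enc = ch.encode(charset)
        # Start a new chunk if this character would not fit in the current one.
        if current and len(current) + len(enc) > max_bytes:
            chunks.append(current)
            current = b""
        current += enc
    if current:
        chunks.append(current)
    return chunks


def decodes_cleanly(chunk, charset):
    """True if the chunk is a complete, valid byte sequence in the charset."""
    try:
        chunk.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


def to_encoded_words(chunks, charset):
    """Wrap each chunk in a B-encoded word (hypothetical helper)."""
    return ["=?%s?B?%s?=" % (charset, base64.b64encode(c).decode("ascii"))
            for c in chunks]


subject = "日本語の件名"  # six characters, three bytes each in UTF-8
naive = chunk_bytes_naive(subject, "utf-8", 4)
safe = chunk_chars_safe(subject, "utf-8", 4)

# False: the first 4-byte chunk ends inside the second character.
print(all(decodes_cleanly(c, "utf-8") for c in naive))
# True: every chunk holds whole characters only.
print(all(decodes_cleanly(c, "utf-8") for c in safe))
print(to_encoded_words(safe, "utf-8"))
```

For a stateful charset such as ISO-2022-JP, each encoded-word also
has to return to ASCII mode at its end (RFC 2047), so a real
implementation needs more care than this sketch shows.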

With the attached patch I have tried to solve these problems.
Although this patch can be applied to the current branch, if my
attempt agrees with the policy of Sympa development, I'll continue
working on the dev branch.


Notes on the attached patch ---

- On locales where e-mail messages require charset conversion,
gettext(_charset_) should return a locale-targeted charset.
For example, for ja_JP above, this would probably be EUC-JP
(note that the filesystem encoding may differ from this charset).

- The preferred encoding scheme for UTF-8 in header fields varies
by language context. The scheme (B or Q) that gives the shorter
result will be selected.

- Minimalism: texts containing no non-ASCII characters should be
labeled as US-ASCII / 7BIT.
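
The last two notes can be sketched roughly as follows. This is a
Python illustration, not the code in the patch: the function names
and the conservative set of characters left unescaped in Q encoding
are assumptions for this example. It labels pure-ASCII text as
US-ASCII with no encoding, and otherwise picks whichever of B and Q
gives the shorter encoded text.

```python
# Bytes that this sketch leaves unescaped in Q encoding (a conservative
# set: letters, digits and a few punctuation marks).
Q_SAFE = set(b"abcdefghijklmnopqrstuvwxyz"
             b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             b"0123456789!*+-/")


def q_length(raw):
    """Length of the Q-encoded text: a space becomes '_', unsafe bytes '=XX'."""
    return sum(1 if (b in Q_SAFE or b == 0x20) else 3 for b in raw)


def b_length(raw):
    """Length of the BASE64-encoded text, including '=' padding."""
    return 4 * ((len(raw) + 2) // 3)


def choose_header_encoding(text, charset="utf-8"):
    """Return (charset, scheme); scheme is None, 'B' or 'Q'."""
    if all(ord(c) < 128 for c in text):
        # Minimalism: nothing to encode, so label the text as US-ASCII.
        return ("us-ascii", None)
    raw = text.encode(charset)
    # Pick the shorter result; on a tie this sketch prefers B.
    scheme = "B" if b_length(raw) <= q_length(raw) else "Q"
    return (charset, scheme)


print(choose_header_encoding("Hello"))      # ('us-ascii', None)
print(choose_header_encoding("Café menu"))  # ('utf-8', 'Q')
print(choose_header_encoding("日本語"))      # ('utf-8', 'B')
```

Mostly-Latin text tends to come out shorter in Q, while CJK text in
UTF-8 comes out shorter in B, which is why the choice varies by
language context.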

* This patch is incomplete. Message bodies aren't yet converted
using the charset/encoding for e-mail: how should I handle message
bodies generated from tt2?

*

BTW this must be a FAQ: how should "Sympa" be pronounced? As
"sympa(-thetic)" in English, "sympa(-thique)" en français, some
other way, or ...any of them?


--
Hatuka*nezumi - IKEDA Soji <address@concealed>

Attachment: sympa-release_5_2_branch-mail_encoding.patch.gz
Description: GNU Zip compressed data



