devel - [sympa-dev] Unicode vs. UTF-8

Subject: Developers of Sympa

  • From: Hatuka*nezumi - IKEDA Soji <address@concealed>
  • To: address@concealed
  • Subject: [sympa-dev] Unicode vs. UTF-8
  • Date: Sun, 12 Nov 2006 22:14:33 +0900

Hi Sympa-dev,


I have been playing with the Sympa 5.3 test release. Almost
everything seems to work well, but the following problem can be
reproduced again:

- When a customized template (containing UTF-8 data beyond the
ISO-8859-1 range) is installed, either under $EXPL_DIR or under
$DATADIR, it is decoded/encoded as ISO-8859-1 text.

This happens because some processing paths in Sympa do not handle
Unicode strings properly; they occasionally strip the utf8 flag from
data (in the case above, that path is Template::Parser. MIME::Parser
is also known to strip utf8 flags).
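
To make the symptom concrete, here is a small illustration. It is
written in Python rather than Perl, so it only mimics the effect and
is not Sympa's actual code path; all variable names are made up for
the example. It shows what happens when UTF-8 bytes are treated as
ISO-8859-1 text and then encoded again for output.

```python
# Illustration only: Python stands in for Perl, names are hypothetical.
text = "日本語"                      # characters beyond the ISO-8859-1 range
utf8_bytes = text.encode("utf-8")    # what is stored in the template file

# Wrong path: the bytes are read as ISO-8859-1, one character per byte.
misread = utf8_bytes.decode("iso-8859-1")

# Encoding the misread text again for output gives double-encoded UTF-8.
double_encoded = misread.encode("utf-8")

# Right path: the bytes are decoded as UTF-8.
correct = utf8_bytes.decode("utf-8")

print(len(text), len(misread), len(double_encoded))
print(correct == text)
```

The three characters become nine after the wrong decoding, and the
output grows again when it is re-encoded, which is the kind of
garbling seen in the customized templates.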

To avoid this problem, there are several options:

(a) Use the undocumented ``UTF-8 BOM'' feature of Template::Provider
(as of Template-toolkit 2.14):
http://www.template-toolkit.org/pipermail/templates/2004-June/006270.html

(b) Force the templates' encoding to be Unicode, guessing whether the
input is UTF-8 or Unicode. For example:
http://search.cpan.org/perldoc?Template::Provider::Encoding

(c) Switch Sympa's internal encoding from Unicode to UTF-8 (byte
string).
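
As a rough sketch of what options (a) and (b) amount to, the
following Python function first looks for a UTF-8 BOM and otherwise
guesses by trying UTF-8 before a legacy charset. This is only an
assumption about the general technique, not how Template::Provider
or Template::Provider::Encoding are implemented; the function name
and the fallback charset are hypothetical.

```python
import codecs


def decode_template(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode template bytes to a Unicode string (hypothetical helper)."""
    # (a) An explicit UTF-8 BOM marks the data as UTF-8; drop the BOM itself.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # (b) No BOM: guess. Valid UTF-8 is taken as UTF-8 ...
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ... anything else falls back to the assumed legacy charset.
        return raw.decode(fallback)
```

For instance, decode_template(b"caf\xe9") falls back to ISO-8859-1,
while the UTF-8 bytes of a Japanese template are decoded as UTF-8
with or without a leading BOM.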

I suppose the last option is better:

- ``UTF-8 BOM'' is confusing for those who wish to create/edit
template text: many text editors silently remove it (the Unicode
standard neither requires nor recommends a BOM for UTF-8).

- The first two options, (a) and (b), solve only the problems caused
by Template-toolkit.

- The last option (c) will reduce redundant internal encoding/decoding
work. Conversion to UTF-8 will be required only when reading data;
no encoding will be required for Web output.
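
A minimal sketch of the idea behind option (c), again in Python and
with hypothetical names: input is converted to UTF-8 bytes once, at
read time, everything inside works on those bytes, and the result
can be sent to the Web client without a further encoding step. The
[% name %] substitution below is a toy stand-in for template
processing, not Template-toolkit itself.

```python
def read_as_utf8(raw: bytes, source_charset: str) -> bytes:
    """Convert input to UTF-8 bytes once, when the data is read."""
    # Decoding validates the input; encoding normalizes it to UTF-8.
    return raw.decode(source_charset).encode("utf-8")


def render(template: bytes, name: bytes) -> bytes:
    """Toy substitution that works purely on UTF-8 byte strings."""
    return template.replace(b"[% name %]", name)


# A legacy ISO-8859-1 template and a UTF-8 value, both normalized on input.
template = read_as_utf8("Café: [% name %]".encode("iso-8859-1"), "iso-8859-1")
name = read_as_utf8("日本語".encode("utf-8"), "utf-8")

# The page is already UTF-8 bytes, ready to be written to the client.
page = render(template, name)
```

The only charset conversion happens in read_as_utf8; render and the
output step never decode or encode anything.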

I'd like to hear developers' opinions on this issue.


--- nezumi


