Subject: Developers of Sympa
List archive
- From: Hatuka*nezumi - IKEDA Soji <address@concealed>
- To: address@concealed
- Subject: [sympa-dev] Unicode vs. UTF-8
- Date: Sun, 12 Nov 2006 22:14:33 +0900
Hi Sympa-dev,
I have been testing the Sympa 5.3 test release. Almost everything
seems to work well, but the following problem has appeared
again:
- When a customized template containing UTF-8 data beyond the
ISO-8859-1 range is installed, either under $EXPL_DIR or under
$DATADIR, it is decoded/encoded as ISO-8859-1 text.
This happens because some processing paths in Sympa do not handle
Unicode strings properly; they occasionally strip the utf8 flag off
data. In the case above, the culprit is Template::Parser; MIME::Parser
is also known to strip utf8 flags.
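The symptom can be reproduced outside Sympa. The sketch below is a Python analogue, not Sympa's Perl code: it assumes the template file is stored as UTF-8 and that losing the utf8 flag makes the bytes be read as ISO-8859-1, so re-encoding them for output double-encodes every non-ASCII byte.

```python
# Python analogue of the symptom (illustrative only; Sympa is written in Perl).
text = "日本語"                      # template text beyond the ISO-8859-1 range
raw = text.encode("utf-8")           # bytes as stored in the template file (9 bytes)

# Faulty path: the bytes are treated as ISO-8859-1, one character per byte.
misread = raw.decode("iso-8859-1")

# Re-encoding for Web output turns each non-ASCII byte into two bytes.
output = misread.encode("utf-8")

print(len(raw), len(misread), len(output))  # 9 9 18
print(output.decode("utf-8") == text)        # False: the page shows mojibake
```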
To avoid this problem, there are several options:
(a) Use the undocumented ``UTF-8 BOM'' feature of Template::Provider
(as of Template-toolkit 2.14):
http://www.template-toolkit.org/pipermail/templates/2004-June/006270.html
(b) Force templates to be decoded as Unicode, guessing that the input
is UTF-8 or Unicode. For example:
http://search.cpan.org/perldoc?Template::Provider::Encoding
(c) Switch Sympa's internal encoding from Unicode to UTF-8 (byte
string).
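Options (a) and (b) both amount to decoding template bytes into characters before Template-toolkit sees them. A minimal Python sketch of that idea follows; the helper name is hypothetical, and the ISO-8859-1 fallback for legacy templates is an assumption, not something Template::Provider or Template::Provider::Encoding is known to do.

```python
import codecs

def decode_template(raw: bytes) -> str:
    """Hypothetical helper: decode template bytes to text, assuming UTF-8.

    A leading UTF-8 BOM, if present, is stripped (the idea behind option (a)).
    Falling back to ISO-8859-1 on invalid UTF-8 is an assumption for
    legacy templates, not documented behaviour of any Perl module.
    """
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


print(decode_template(codecs.BOM_UTF8 + "日本語".encode("utf-8")))  # 日本語
print(decode_template("日本語".encode("utf-8")))                    # 日本語
print(decode_template(b"caf\xe9"))                                  # café
```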
I think the last option is best:
- The ``UTF-8 BOM'' is confusing for those who create or edit template
text: many text editors silently remove it (the Unicode standard
neither requires nor recommends a BOM in UTF-8).
- Options (a) and (b) solve only the problems caused by
Template-toolkit.
- Option (c) will eliminate redundant internal encoding/decoding.
Conversion to UTF-8 will be needed only when data is read; no
encoding will be needed for Web output.
I'd like to hear the developers' opinions on this issue.
--- nezumi
- [sympa-dev] Unicode vs. UTF-8, Hatuka*nezumi - IKEDA Soji, 11/12/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Olivier Salaün - CRU, 11/13/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Hatuka*nezumi - IKEDA Soji, 11/14/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Olivier Salaün - CRU, 11/14/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Hatuka*nezumi - IKEDA Soji, 11/15/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Hatuka*nezumi - IKEDA Soji, 11/18/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Olivier Salaün - CRU, 11/14/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Hatuka*nezumi - IKEDA Soji, 11/14/2006
- [sympa-dev] Re: Unicode vs. UTF-8, Olivier Salaün - CRU, 11/13/2006