
Re: [sympa-users] RSS and character encoding

Subject: The mailing list for listmasters using Sympa

  • From: Olivier Salaün - CRU <address@concealed>
  • To: Anders Lund <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-users] RSS and character encoding
  • Date: Wed, 02 Nov 2005 13:31:47 +0100

Well, encoding is not an easy topic to cope with; Sympa is not too bad at it, but it could be better:

Sympa determines the encoding used for the RSS XML document from the language, with the following order of priority:
user preferred language > list language > virtual robot language > site language
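As a rough illustration of that priority order, here is a minimal Python sketch (not Sympa's actual Perl code). The function `pick_feed_charset` and the `LANG_CHARSET` table are hypothetical; only the `en_US` to `us-ascii` entry reflects the behavior reported later in this thread, and the `utf-8` fallback is an assumption.

```python
from typing import Optional

# Hypothetical language -> charset table. Sympa's real mapping comes from its
# language catalogs; only en_US -> us-ascii matches what this thread observed.
LANG_CHARSET = {
    "en_US": "us-ascii",
    "fr": "iso-8859-1",
    "de": "iso-8859-1",
}

def pick_feed_charset(user_lang: Optional[str],
                      list_lang: Optional[str],
                      robot_lang: Optional[str],
                      site_lang: str) -> str:
    """Return the charset of the first language that is set, in priority order:
    user preferred > list > virtual robot > site."""
    for lang in (user_lang, list_lang, robot_lang, site_lang):
        if lang:
            # Assumed fallback: languages missing from the table get utf-8.
            return LANG_CHARSET.get(lang, "utf-8")
    return "utf-8"

# No user preference, list language fr: the list language wins.
print(pick_feed_charset(None, "fr", "de", "en_US"))  # iso-8859-1
```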

Current versions of Sympa don't use the "web_recode_to" parameter for the RSS feed.
I've changed this behavior in the CVS tree; here is the patch: http://sourcesup.cru.fr/cgi/viewcvs.cgi/sympa/wwsympa/wwsympa.fcgi?_only_with_tag_=sympa-5_1-branch&r2=1.560.2.11&r1=1.560.2.10&makepatch=1&diff_format=u

But this patch will not really fix your problem, because "web_recode_to" only applies to translatable strings, not to other data such as:
  • messages data (subject, from and message body)
  • list config data (including the list subject)
  • anything that was stored on the file system...
This is related to a more general problem we have to address: we need to keep track of the encoding of every string Sympa deals with, and we should provide a way to recode it to another charset. We're also lacking a good Perl module that can recode strings properly.
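To illustrate what keeping track of the encoding could mean, the following Python sketch pairs every stored byte string with the charset it was written in, so it can be recoded on output. `TaggedBytes` is a hypothetical class, not part of Sympa, and replacing characters the target charset cannot represent with XML character references is a design choice of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedBytes:
    """Hypothetical helper: raw bytes plus the charset they were stored in."""
    data: bytes
    charset: str

    def recode(self, target: str) -> "TaggedBytes":
        # Decode with the recorded charset, then encode to the target charset.
        # Characters the target cannot represent become XML character
        # references (e.g. &#229;) instead of raising an error.
        text = self.data.decode(self.charset)
        return TaggedBytes(text.encode(target, errors="xmlcharrefreplace"), target)

# A list subject stored on disk as ISO-8859-1:
subject = TaggedBytes("Blåbær".encode("iso-8859-1"), "iso-8859-1")
print(subject.recode("utf-8").data)     # b'Bl\xc3\xa5b\xc3\xa6r'
print(subject.recode("us-ascii").data)  # b'Bl&#229;b&#230;r'
```

Because the charset travels with the data, the same subject can be emitted correctly into a UTF-8 feed or, with character references, into a us-ascii one.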

To summarize the current situation:
Consider the RSS feed of the latest messages in a list; it will be correctly encoded if:
  1. all messages use the same encoding
  2. the language configured for the list uses this same encoding

Otherwise, you might get some incorrectly encoded characters in the RSS feed...
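The two conditions can be expressed as a quick check. This is a hypothetical Python helper, not something Sympa provides; charset names are normalised with `codecs.lookup` so that aliases such as `UTF8` and `utf-8` compare equal.

```python
import codecs
from typing import Iterable

def feed_encoding_is_consistent(message_charsets: Iterable[str], list_charset: str) -> bool:
    """Hypothetical check: (1) all messages share one charset, and
    (2) that charset is the one implied by the list's language."""
    # Normalise aliases to Python's canonical codec names.
    canonical = {codecs.lookup(c).name for c in message_charsets}
    list_canonical = codecs.lookup(list_charset).name
    # Being a subset of a one-element set covers both conditions at once:
    # at most one distinct charset, and it equals the list's charset.
    return canonical <= {list_canonical}

# The situation reported in this thread: UTF-8 and ISO-8859-1 messages
# in a list whose language (en_US) yields us-ascii.
print(feed_encoding_is_consistent(["utf-8", "iso-8859-1", "utf-8"], "us-ascii"))  # False
```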

Anders Lund wrote:
A little addition to what I said: the message sent from Thunderbird used the ISO-8859-1 character encoding. If I send using UTF-8, I get the same result as when posting through the web GUI. I've sent 3 messages now:
 1) through the web interface
 2) with Thunderbird and ISO-8859-1
 3) with Thunderbird and UTF-8
The resulting RSS feed has

<?xml version="1.0" encoding="us-ascii"?>

and

<language>en_US</language>

and includes UTF-8 characters from messages 1 and 3, but ISO-8859-1 characters from message 2. So my question is whether it is possible to change this so that the feed displays correctly in all RSS readers.
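The following Python sketch shows why such a feed breaks: the same Norwegian word produces different bytes in ISO-8859-1 and UTF-8, and neither is valid us-ascii, so any non-ASCII character makes the `encoding="us-ascii"` declaration wrong, whichever encoding the message used. The `build_feed` helper is hypothetical; it illustrates one possible approach (decode each message with its own charset, then emit the whole feed as UTF-8 with a matching declaration), not what Sympa does.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The same Norwegian word in the two encodings seen in this thread.
word = "Blåbær"
latin1_bytes = word.encode("iso-8859-1")  # å -> b'\xe5', æ -> b'\xe6'
utf8_bytes = word.encode("utf-8")         # å -> b'\xc3\xa5', æ -> b'\xc3\xa6'

# Neither byte string decodes as us-ascii.
for raw in (latin1_bytes, utf8_bytes):
    try:
        raw.decode("us-ascii")
    except UnicodeDecodeError:
        print("not us-ascii:", raw)

def build_feed(items):
    """Hypothetical helper: items is a list of (raw_title_bytes, charset) pairs.
    Each title is decoded with its own charset, then the whole feed is
    serialised as UTF-8 with a matching XML declaration."""
    body = "".join(
        f"<item><title>{escape(raw.decode(charset))}</title></item>"
        for raw, charset in items
    )
    xml = f'<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel>{body}</channel></rss>'
    return xml.encode("utf-8")

feed = build_feed([(utf8_bytes, "utf-8"), (latin1_bytes, "iso-8859-1")])
print([t.text for t in ET.fromstring(feed).iter("title")])  # ['Blåbær', 'Blåbær']
```

This only works if the original charset of each message is known at feed-generation time, which is exactly the tracking problem described above.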
  
I have a question about how to set up Sympa so that it produces RSS feeds with correct character encodings:

I've set up Sympa with the following in sympa.conf:

lang    en_US
supported_lang  de,en_US,fr,it,nl,fi,sv
web_recode_to   UTF-8

I have added the following to mhonarc-ressources.tt2:

<TextEncode>
utf-8; MHonArc::UTF8::to_utf8; MHonArc/UTF8.pm
</TextEncode>

However, when I post something that includes Norwegian characters, I run into trouble:

When I post to a list using the "post" interface in the web GUI, I get an RSS feed that includes non-ASCII characters, but the feed also says:

<language>en_US</language>

Different RSS readers display this in different ways. Akregator in KDE on Linux presents it correctly, but Live Bookmarks in Firefox and the RSS reader in Thunderbird don't.

If I post to the same list using, for example, Thunderbird, the feed displays correctly in Thunderbird and Firefox, but not in Akregator.

The archives shown in the Sympa web GUI seem OK, so this is perhaps not a big issue, but I would like to fix it if possible. Any hints on what to do?
    





