en - Re: [sympa-users] Odd characters in archived messages

Subject: The mailing list for listmasters using Sympa

  • From: Mark Valiukas <address@concealed>
  • To: Adam Bernstein <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-users] Odd characters in archived messages
  • Date: Tue, 26 Apr 2005 11:32:53 +1000

Adam Bernstein wrote:
I'm very happy to see someone else is seeing this problem -- we've
seen it manifest in several slightly different ways, and often the
subscribers see the altered characters in their received messages,
not just in the Web archives (and sometimes not at all in the Web
archives).


For most of our lists, we don't send out the messages immediately -
we send a summary out overnight, using some html summary formatting
enhancements we developed here. We have had some issues with formatting
of some messages going through the immediate-send lists, but they
mostly looked like they were generated through Outlook and sent as
8-bit messages... our main staff mailserver won't accept 8-bit
messages, so the internal relay converts them to quoted-printable.
Funny thing is, when a message is sent directly to a To: or Cc: address
as well as via the list, the direct copy sometimes arrives correctly
while the copy that went through Sympa is mangled. "Don't use Outlook"
seems to be the best advice we can offer them at this time.
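
For readers unfamiliar with the 8-bit to quoted-printable step described above, here is a minimal Python sketch of what such a relay conversion does to the bytes. The sample text and the Windows-1252 charset are assumptions chosen for illustration; the thread does not say which charset the affected messages declared.

```python
import quopri

# An 8-bit body such as a Windows mail client might send: in Windows-1252
# the curly double quotes are the single bytes 0x93 and 0x94.
# (Sample text and charset are assumptions, not taken from the thread.)
original_text = "He said \u201chello\u201d"
body_8bit = original_text.encode("cp1252")

# What a relay does when it downgrades 8-bit content to quoted-printable:
# every byte above 0x7F becomes an =XX escape, so the result is 7-bit safe.
body_qp = quopri.encodestring(body_8bit)

# Decoding quoted-printable restores the original bytes, nothing more.
# Which characters those bytes stand for still depends on the charset
# named in the message's Content-Type header.
roundtrip = quopri.decodestring(body_qp)
```

The conversion itself is lossless at the byte level, so if a message is mangled after this step, a missing or wrong charset label is a more likely culprit than the quoted-printable encoding.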



FWIW, I believe it usually, or possibly always, happens
when the sender of the message is on a Mac.

I've seen odd things happening with email sent from a Mac,
even for direct delivery.



Actually, there's nothing MHonArc can do about mis-encoding problems. There's no way it can find out that these characters are supposed to be UTF-8 encoded. This issue has to be fixed at the operating system level: it should properly handle charset mapping during copy/paste.
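
A small Python sketch of why an archiver cannot detect this kind of mis-encoding on its own. It assumes one particular mismatch (UTF-8 bytes labelled as Windows-1252); the exact mismatch in the affected messages is not known from this thread.

```python
# Text as the sender intended it (sample string is an assumption).
intended = "caf\u00e9"

# The bytes actually placed in the message: UTF-8.
raw = intended.encode("utf-8")

# The reader trusts a wrong label and decodes the bytes as Windows-1252.
# No error is raised: both bytes are valid Windows-1252, so the result is
# simply different, wrong characters.
displayed = raw.decode("cp1252")

# The damage is reversible only if you already know which mismatch occurred.
repaired = displayed.encode("cp1252").decode("utf-8")
```

Because the wrong decoding succeeds silently, nothing in the bytes themselves tells the archiver that the label was wrong.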


I accept that as a reasonable diagnosis, but this is still a problem
that seems to come up only when people are composing newsletters and
sending them to lists. So, as far as they're concerned it's a Sympa
problem, so we should do our best to fix it.

The webpage Sebastiaan referred to is probably going to be quite
helpful in dealing with this issue... now that I've pinned down a
set of circumstances in which this occurs, I can tell the senders
of affected messages how to avoid it happening in future. The user
whose message prompted my message to the list is happy with the
work-around of being careful about what characters are in the messages for now.




If Mark's diagnosis is correct, I can imagine a filter on all incoming
messages that would look at the character encoding header, then scan
for any 8-bit characters,

I was thinking about something along those lines... maybe "flattening"
to UTF-8 and converting to 7-bit-legal and HTML-ised characters where
appropriate for known dodgy characters (like some of the fancy quotation
marks). However, not knowing a lot about international character encoding
issues, I don't know if this would cause other problems for languages other
than English or for some non-Windows character encodings - knowing which
messages to modify and which to leave alone might be difficult. Coming up
with a quick hack to placate my users might be relatively easy to implement,
but I'd then have to maintain it - I've already got one set of site-specific
mods for Sympa to maintain for my html summary stuff, and would rather not
add more work to that list.
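
The filter idea discussed above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not anything that exists in Sympa: the function names, the DODGY table, and the Windows-1252 fallback are all hypothetical.

```python
# Hypothetical table of "known dodgy" characters and 7-bit replacements.
DODGY = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "--",   # en and em dashes
    "\u2026": "...",                 # ellipsis
}


def has_8bit(raw: bytes) -> bool:
    """Return True if any byte in the body is outside 7-bit ASCII."""
    return any(b > 0x7F for b in raw)


def flatten(raw: bytes, declared_charset: str = "us-ascii") -> bytes:
    """Return a 7-bit version of a message body (hypothetical helper)."""
    if not has_8bit(raw):
        return raw  # already 7-bit clean, leave it alone
    try:
        text = raw.decode(declared_charset)
    except (UnicodeDecodeError, LookupError):
        # Assumption: a mislabelled body came from a Windows client.
        text = raw.decode("cp1252", errors="replace")
    # Swap the known dodgy characters for plain ASCII equivalents.
    text = text.translate(str.maketrans(DODGY))
    # Anything still outside ASCII becomes an HTML numeric entity.
    return text.encode("ascii", errors="xmlcharrefreplace")
```

For example, `flatten(b"\x93hi\x94")` falls back to Windows-1252 and yields `b'"hi"'`. The sketch also shows the drawback raised above: the numeric entities only render properly in HTML parts, and a message written in a non-English language would be turned almost entirely into entities, so deciding which messages to touch remains the hard part.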

Mark.




