
Re: [sympa-users] Digest bugs

Subject: The mailing list for listmasters using Sympa

  • From: Olivier Salaun - CRU <address@concealed>
  • To: Chris Hastie <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-users] Digest bugs
  • Date: Mon, 16 Feb 2004 12:00:31 +0100

Hi Chris,

Chris Hastie wrote:

Digging around in sub send_msg_digest this morning I noticed a slight bug, and while testing my theory that this would be a problem I encountered a second one.

The loop commented 'Headers cleanup' changes the Subject and From headers of messages to be digested. This, presumably, is so that their decoded form can be included in the TOC. Because it happens ahead of the line

| $msg->{'full_msg'} = $mail->as_string;

the message/rfc822 parts in the digest include the MODIFIED headers, not the original ones. In some circumstances this looked a real mess, e.g. (I'm presuming MUAs won't decode this if it's not in a header):
[...]
If $mail->as_string could be called before any manipulation of headers (or anything else) has been done the messages would be more faithful to their original form. The TOC would still have an illegible title in it, but at least the message itself would be comprehensible.

You're absolutely right.
We've patched the current CVS version so that it no longer alters message header fields:
http://listes.cru.fr/cgi-bin/cvsweb.cgi/sympa/src/List.pm.diff?r1=1.416&r2=1.417
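
To illustrate the ordering issue discussed above, here is a minimal sketch in Python (not Sympa's Perl, and not the actual List.pm change). It serializes the message before decoding any header for the TOC, so the copy destined for the message/rfc822 part keeps its original encoded headers. The names `prepare_for_digest`, `RAW`, and `toc_subject` are hypothetical, chosen only for this example.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Hypothetical sample message with a MIME-encoded Subject header.
RAW = (
    "From: sender@example.org\n"
    "Subject: =?iso-8859-1?Q?R=E9union?=\n"
    "\n"
    "Body text.\n"
)

def prepare_for_digest(raw):
    """Return the untouched message text plus a decoded subject for the TOC."""
    mail = message_from_string(raw)

    # Snapshot the message BEFORE touching any header, so the
    # message/rfc822 part carries the headers as they were received.
    full_msg = mail.as_string()

    # Decode the subject separately, for display in the TOC only.
    toc_subject = str(make_header(decode_header(mail["Subject"])))

    # Changing the parsed object afterwards no longer affects the snapshot.
    mail.replace_header("Subject", toc_subject)

    return {"full_msg": full_msg, "toc_subject": toc_subject}

entry = prepare_for_digest(RAW)
print(entry["toc_subject"])
```

The decoded string is kept apart from the serialized message, so the TOC can show a readable subject while the attached message stays faithful to its original form.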

The other problem that I noticed when testing this is that the default digest template declares the CTE for the TOC part to be 7bit. Clearly, if the subjects given have had their MIME words decoded, it is quite possible that 8-bit characters will appear in this part. There is also going to be an issue with the declared character set in this part, but I suspect we may just have to live with that unless we translate the whole lot to UTF-8. It's worth looking at MIME::WordDecoder for some more flexible ways of decoding and of handling unusual characters and character sets.

There's nothing we can do unless everything is UTF-8 encoded.
We assume 7bit for English digests, whereas the French one is 8bit iso-8859-1... Each digest template has its own charset defined.
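
As a rough illustration of the 7bit/8bit point, the sketch below (Python, not Sympa code) picks the charset and Content-Transfer-Encoding for a TOC part from its actual content. It assumes the TOC text has already been converted to Unicode and would be emitted as UTF-8 whenever it is not pure ASCII; that is the "translate the whole lot to UTF-8" option, not what the per-language templates do. The function name `toc_encoding` is hypothetical, and the check covers only non-ASCII characters, not the other 7bit constraints such as line length.

```python
def toc_encoding(toc_text):
    """Return a (charset, content_transfer_encoding) pair for the TOC part."""
    try:
        # Pure ASCII text can honestly be labelled 7bit.
        toc_text.encode("ascii")
    except UnicodeEncodeError:
        # Decoded MIME words introduced non-ASCII characters:
        # assume UTF-8 output and declare 8bit.
        return ("utf-8", "8bit")
    return ("us-ascii", "7bit")

print(toc_encoding("Meeting notes"))
print(toc_encoding("R\u00e9union"))
```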

--
Olivier Salaun
Comite Reseau des Universites




