en - Re: [sympa-users] Text digests with new digest template?

Subject: The mailing list for listmasters using Sympa

List archive

Re: [sympa-users] Text digests with new digest template?

From: Chris Hastie <address@concealed>
To: address@concealed
Subject: Re: [sympa-users] Text digests with new digest template?
Date: Sun, 8 Feb 2004 18:57:09 +0000

On Thu, 5 Feb 2004, Adam Bernstein <address@concealed> wrote
>
>I mean, Topica, Yahoo, Mailman, and just about every other listserver
>somehow manages to do text digests (with varying success, I grant),
>and it's absurd that Sympa can't.

Well I've been off work with a cold for a few days and in need of
something to keep my brain working whilst my body got on with shivering
and sweating simultaneously, so I thought I'd look at this a bit more.

Attached is a module, MyPlainDigest.pm (or at least it was attached - if
it isn't anymore I guess the list doesn't allow such things and I'll
have to find another way of distributing it!). It's intended to extend
Mail::Internet, adding a method that returns a plain text string of the
message body. Assuming you have your message in a Mail::Internet object
as $mail, as is the case in the sub send_msg_digest in List.pm, then

| use MyPlainDigest;
| $string = $mail->MyPlainDigest::plain_body_as_string;

will give you a plain text string suitable for use in a digest with CT
of text/plain, CTE of 8bit and charset of ISO-8859-1.

At it's most basic, drop the MyPlainDigest.pm into sympa/bin, add a

| use MyPlainDigest;

to the top of List.pm, and then a line like

| $msg->{'plain_body'} = $mail->MyPlainDigest::plain_body_as_string;

in an appropriate point in the send_msg_digest sub. This will make
[plain_body] available in your digest template.

MyPlainDigest will strip out any non-text attachments, replacing them
with an advice that they're gone. It converts text/plain parts with CTE
of quoted-printable or base64 to 8bit. If the original charset was not
us-ascii or ISO-8859-1 it replaces 8 bit characters (ie those over 127)
with a '?' and adds a warning that this has been done. If a message is
single part and text/html it makes a very crude attempt to represent it
as plain text. If a part is message/rfc822 it recurses through it
performing the same, having marked it as an attached message. If a line
consists of only 30 hyphens it replaces the first with a blank, as per
RFC 1153. And finally, it wraps any long lines to 80 characters. It
partially copes with UUencode too. I've thrown a selection of fairly
complex mime messages at it and got out what I was expecting, although
throwing badly formatted mime messages at it is something I'm less able
to do ;) It's probably only of use to those whose generally use
ISO8859-1 or something pretty close. It's also in need of some
internationalisation, particularly of the strings that replace non-text
attachments or warn of out of range character sets.

I've also attached my current patch to List.pm. Sorry, this involves
more changes than is strictly needed for your purposes. And a template
for a near RFC1153 plain text digest, based on that patch.

I could do with adding [to] and [cc] options for the template in order
to meet with RFC1153, and having looked at how my MUA renders these
digest a [reply-to] would help get replies to the right place. All that
would be straightforward. I don't know if Sympa keeps a record of digest
issue numbers, and if so how I can access these. Any pointers welcome.

What I'd really like to achieve is offering subscribers the option of
choosing their digest format, adding an extra reception preference of,
say, digest-plain. If any one has any pointers on getting an extra
reception preference into the code I'd be grateful.

Cheers
--
Chris Hastie
#############################################################
# MyPlainDigest #
# #
# MyPlainDigest provides an extension to the Mail::Internet #
# class that returns a plain text version of an email #
# message, suitable for use in plain text digests. #
# #
# SYNOPSIS: #
# (assuming an existing Mail::Internet object as $mail) #
# #
# use MyPlainDigest; #
# $string = $mail->MyPlainDigest::plain_body_as_string; #
# #
# WHAT DOES IT DO? #
# Most attachments are stripped out and replaced with a #
# note that they've been stripped. text/plain parts are #
# retained and encoding is 'levelled' to 8bit. A crude #
# attempt to convert single part text/html messages to plain#
# text is made. For text/plain parts that were not #
# originally in chasrset us-ascii or ISO-8859-1 all #
# characters above ascii 127 are replaced with '?' and a #
# warning added. Parts of type message/rfc822 are recursed #
# through in the same way, with brief headers included. Any #
# line consisting only of 30 hyphens has the first #
# character changed to space (see RFC 1153). Lines are #
# wrapped at 80 characters. #
# #
# BUGS #
# Probably dozens of them, and possibly dependant on your #
# versions of Perl and MIME-Tools (on which it is very #
# reliant). #
# Seems to ignore any text after a UUencoded attachment. #
# Probably horrible if ISO-8859-1 or something close isn't #
# you're usual charset. #
# #
# ACKNOWLEDGEMENTS #
# Written by an (c) Chris Hastie. _do_text_html largely #
# based on stripmime.pl by Alex Wetmore. Alex's original #
# boiler plate is at the end of this file. #
# My bits you can do with as you wish. #
# #
# Chris Hastie #
# #
#############################################################

package MyPlainDigest;

@ISA = qw(Mail::Internet);
use Mail::Internet;
use MIME::Parser;
use Text::Wrap;
use MIME::WordDecoder;

sub plain_body_as_string {

local $outstring = "";
my $mail = shift;

my $parser = new MIME::Parser;
$parser->extract_uuencode(1);
$parser->extract_nested_messages(1);

# Convert Mail::Internet object to a MIME::Entity:
my @lines = (@{$mail->header}, "\n", @{$mail->body});
my $topent = $parser->parse_data(\@lines);

#$topent->dump_skeleton; # for debugging only!

_do_toplevel ($topent);

# clean up after ourselves
$topent->purge;

$Text::Wrap::columns = 80;
return wrap ('','',$outstring);
}

sub _do_toplevel {

my $topent = shift;
if ($topent->effective_type =~ /^text\/plain/) {
_do_text_plain($topent);
}
elsif ($topent->effective_type =~ /^text\/html/) {
_do_text_html($topent);
}
elsif ($topent->effective_type =~ /^multipart\/.*/) {
_do_multipart ($topent);
}
elsif ($topent->effective_type =~ /^message\/rfc822/) {
_do_message ($topent);
}
else {
_do_other ($topent);
}
return 1;
}

sub _do_multipart {

my $topent = shift;

# cycle through each part and process accordingly
foreach $subent ($topent->parts) {
if ($subent->effective_type =~ /^text\/plain/) {
_do_text_plain($subent);
}
elsif ($subent->effective_type =~ /^multipart\/.*/) {
_do_multipart ($subent);
}
elsif ($subent->effective_type =~ /^text\/html/ &&
$topent->effective_type =~ /^multipart\/alternative/) {
# assume there's a text/plain alternive, so don't warn
# that the text/html part has been scrubbed
next;
}
elsif ($subent->effective_type =~ /^message\/rfc822/) {
_do_message ($subent);
}
else {
_do_other ($subent);
}
}
return 1;

}

sub _do_message {
my $topent = shift;
my $msgent = $topent->parts(0);
my $wdecode = new MIME::WordDecoder::ISO_8859 (1);

my $from = $msgent->head->get('From');
my $subject = $msgent->head->get('Subject');
my $date = $msgent->head->get('Date');
my $to = $msgent->head->get('To');
my $cc = $msgent->head->get('Cc');
unless ($from = $wdecode->decode($from)) {
$from = "???????";
}
unless ($to = $wdecode->decode($to)) {
$to = "";
}
unless ($cc = $wdecode->decode($cc)) {
$cc = "";
}
unless ($subject = $wdecode->decode($subject)) {
$subject = "";
}
chomp $from;
chomp $to;
chomp $cc;
chomp $subject;
chomp $date;

$outstring .= sprintf ("\n----------------Attached
Message----------------\n",'');
$outstring .= "Date: $date\n" if $date;
$outstring .= "From: $from\n" if $from;
$outstring .= "To: $to\n" if $to;
$outstring .= "Cc: $cc\n" if $cc;
$outstring .= "Subject: $subject\n" if $subject;
$outstring .= "\n";

_do_toplevel ($msgent);

$outstring .= sprintf ("----------------End of Attached
Message----------------\n\n", '');
return 1;
}

sub _do_text_plain {
my $entity = shift;

my $charset = $entity->head->mime_attr('content-type.charset');
my $thispart;

# this reads in the decoded body of the current entity
if ($io = $entity->open("r")) {
while (defined($_ = $io->getline)) {
chomp $_;
# if line is 30 hyphens, replace first character with space (RFC 1153)
if ($_ eq "------------------------------") {
s/^\-/ /;
}
$thispart .= $_ . "\n";
}
}

# scrub the 8bit characters (replace with '?') if the charset
# isn't us-ascii or iso-8859-1. Add a warning.
if ($charset) {
unless ($charset =~ /us-ascii/i or $charset =~ /iso-8859-1/i) {
$outstring .= sprintf ("[** Warning: Message part originally used
character set %s\nSome characters may be lost or incorrect.**]\n\n",
$charset);
$thispart =~ tr/\x00-\x7F/\?/c;
}
}

$outstring .= $thispart;
return 1;
}

sub _do_other {
# just add a note that attachment was stripped.
my $entity = shift;
$outstring .= sprintf ("\n[An attachment of type %s was included here]\n",
$entity->mime_type);
return 1;
}

sub _do_text_html {
# Very crudely represent HTML part as plain text.
# only done if top level CT is text/html (ie there
# is no text/plain part).
#
# Based on stripmime.pl by Alex Wetmore - see end of
# file for original boilerplate

my $entity = shift;

# This is not at all complete.
my %hHTMLCharConversion = (
"nbsp" => " ",
"gt" => ">",
"lt" => "<",
"amp" => "&",
"and" => "&&",
"asymp" => "=~",
"brvbar" => "|",
"bull" => "*",
"cong" => "=~",
"copy" => "(c)",
"crarr" => "<cr>",
"equiv" => "==",
"ge" => "<=",
"lang" => "<",
"lsquo" => "\'",
"mdash" => "--",
"minus" => "-",
"reg" => "(R)",
"sim" => "~",
"thinsp" => " ",
"shy" => "-",
"trade" => "(tm)",
"times" => "X",
"pound" => "£",
);

my %hHTMLTagConversion = (
"/div" => "\n",
"/p" => "\n",
"hr" =>
"\n________________________________________________________________\n",
"br" => "\n",
"li" => " * ",
"/tr" => "\n",
);

my $fInTag;
if ($io = $entity->open("r")) {
while (defined($_ = $io->getline)) {
my $szThisLine = $_;
chomp $szThisLine;
my $szBeforeTag;
my $szInTag;
my $iStartTag;
my $iEndTag;
my $szAfterTag;
my $szTagConversion = "";
my $szTag;
if ($fInTag) {
if (($iEndTag = index($szThisLine, ">")) != -1) {
$szThisLine = substr $szThisLine, $iEndTag+1;
$fInTag = 0;
} else {
$szThisLine = "";
}
}
while (($iStartTag = index($szThisLine, "<")) != -1) {
$szBeforeTag = substr $szThisLine, 0, $iStartTag;
$szInTag = substr $szThisLine, $iStartTag+1;
if (($iEndTag = index($szInTag, ">")) != -1) {
# we have the end of the tag on this line
$szTag = substr $szInTag, 0, $iEndTag;
$szAfterTag = substr $szInTag, $iEndTag+1;
} else {
# the tag extends to another line
$szTag = $szInTag;
$fInTag = 1;
$szAfterTag = "";
}
$szTag = lc $szTag;
($szTag, undef) = split (/\s/, $szTag, 2);
if (exists $hHTMLTagConversion{$szTag}) {
$szTagConversion = $hHTMLTagConversion{$szTag};
}
$szThisLine = "$szBeforeTag$szTagConversion$szAfterTag";
}

# now that we have removed all tags we go through and
# convert special characters back to their ascii
# equivelents
while ($szThisLine =~ /^(.*)&(.*?);(.*)$/) {
my $szBefore = $1;
my $szAscii;
my $szEntity = $2;
my $szAfter = $3;
if ($szEntity =~ /^#(\n\n\n)/) {
if ($1 < 255) {
# convert a numeric code
$szAscii = chr $1;
} else {
$szAscii = "{$szEntity}";
}
} elsif (exists $hHTMLCharConversion{$szEntity}) {
$szAscii = $hHTMLCharConversion{$szEntity};
} else {
# don't know how to convert
$szAscii = "{$szEntity}";
}
$szThisLine = "$szBefore$szAscii$szAfter";
}

# if line is 30 hyphens, replace first character with space (RFC 1153)
if ($szThisLine =~ /^-{30}(\n|$)/) {
$szThisLine =~ s/^\-/ /;
}

$outstring .= $szThisLine . "\n" unless ($szThisLine =~ /^\s*$/);
}
$io->close;
}

}
1;

#####################################################################
#
# Code in _do_text_html is based largely on stripmime.pl by Alex
# Wetmore. Below is the original boiler plate from that script.
#
########## BEGIN ALEX WETMORE'S COPYRIGHT NOTICE ####################
# This code is Copyright 2000-2001 Alex Wetmore.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or other materials provided with
# the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement: This product
# includes software developed by Alex Wetmore.
#
# 4. The name of Alex Wetmore may not be used to endorse or promote
# products derived from this software without specific prior written
# permission.
#
# THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS''
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
# SUCH DAMAGE.
#
############# END ALEX WETMORE'S COPYRIGHT NOTICE ####################--- List.pm.orig Thu Feb 5 17:56:44 2004
+++ List.pm Sat Feb 7 21:51:39 2004
@@ -193,8 +193,10 @@
use Time::Local;
use MIME::Entity;
use MIME::Words;
+use MIME::WordDecoder;
use MIME::Parser;
use Message;
+use MyPlainDigest;

## Database and SQL statement handlers
my ($dbh, $sth, $db_connected, @sth_stack, $use_db, $include_lock_count);
@@ -2537,39 +2539,57 @@
splice @list_of_mail, 0, 1;

## Headers cleanup
- foreach $i (0 .. $#list_of_mail){
- my $mail = $list_of_mail[$i];
- my ($subject, $from);
-
- ## Subject cleanup
- if ($subject =
&MIME::Words::decode_mimewords($mail->head->get('Subject'))) {
- $mail->head->replace('Subject', $subject);
- }
-
- ## From cleanup
- if ($from = &MIME::Words::decode_mimewords($mail->head->get('From')))
{
- $mail->head->replace('From', $from);
- }
- }
+### Moved to keep messages intact ###
+# foreach $i (0 .. $#list_of_mail){
+# my $mail = $list_of_mail[$i];
+# my ($subject, $from);
+#
+# ## Subject cleanup
+# if ($subject =
&MIME::Words::decode_mimewords($mail->head->get('Subject'))) {
+# $mail->head->replace('Subject', $subject);
+# }
+#
+# ## From cleanup
+# if ($from = &MIME::Words::decode_mimewords($mail->head->get('From')))
{
+# $mail->head->replace('From', $from);
+# }
+# }

- my @topics;
- push @topics, sprintf(Msg(8, 13, "Table of content"));
- push @topics, sprintf(" :\n\n");
+ ### @topics appears to be redundant now??
+ #my @topics;
+ #push @topics, sprintf(Msg(8, 13, "Table of content"));
+ #push @topics, sprintf(" :\n\n");

## Digest index
foreach $i (0 .. $#list_of_mail){
my $mail = $list_of_mail[$i];
my $subject = $mail->head->get('Subject');
+
+ ## clean up subject for TOC
+ my $wdecode = new MIME::WordDecoder::ISO_8859 (1);
+ unless ($subject = $wdecode->decode($subject)) {
+ $subject = "???????";
+ }
chomp $subject;
+
+ ## clean up from for TOC
+ my $from = $mail->head->get('From');
+ unless ($from = $wdecode->decode($from)) {
+ $from = "???????";
+ }
+ chomp $from;
+
my $msg = {};
$msg->{'id'} = $i+1;
$msg->{'subject'} = $subject;
- $msg->{'from'} = $mail->head->get('From');
+ $msg->{'from'} = $from;
+ $msg->{'date'} = $mail->head->get('Date');
+ chomp $msg->{'date'};
$mail->tidy_body;
$mail->remove_sig;
$msg->{'full_msg'} = $mail->as_string;
$msg->{'body'} = $mail->body;
- chomp $msg->{'from'};
+ $msg->{'plain_body'} = $mail->MyPlainDigest::plain_body_as_string;
$msg->{'month'} = &POSIX::strftime("%Y-%m", localtime(time)); ##
Should be extracted from Date:
$msg->{'message_id'} = $mail->head->get('Message-Id');

@@ -2579,14 +2599,15 @@

push @{$param->{'msg_list'}}, $msg ;

- push @topics, sprintf ' ' x (2 - length($i)) . "%d. %s", $i+1,
$subject;
+ # push @topics, sprintf ' ' x (2 - length($i)) . "%d. %s", $i+1,
$subject;
}

## Prepare Digest
if (@tabrcpt) {

my @now = localtime(time);
- $param->{'date'} = sprintf "%s", POSIX::strftime("%a %b %e %H:%M:%S
%Y", @now);
+ $param->{'datetime'} = sprintf "%s", POSIX::strftime("%a, %d %b %Y
%H:%M:%S", @now);
+ $param->{'date'} = sprintf "%s", POSIX::strftime("%a, %d %b %Y",
@now);

# ## Add a footer
# my $new_msg = $self->add_parts($msg);
From: [from]
To: [to]
Reply-to: [reply]
Subject: [list->name] Digest [date]
Content-Type: text/plain; charset=iso-8859-1;
Content-transfer-encoding: 8bit

[list->name] Digest [date]

Table of contents:

[FOREACH m IN msg_list]
[m->id]. [m->subject] - [m->from]
[END]

----------------------------------------------------------------------

[FOREACH m IN msg_list]
Date: [m->date]
From: [m->from]
Subject: [m->subject]

[m->plain_body]

------------------------------

[END]
End of [list->name] Digest [date]
*********************************************

Text digests with new digest template?, Adam Bernstein, 02/05/2004
- Re: [sympa-users] Text digests with new digest template?, Chris Hastie, 02/05/2004
  - RE: [sympa-users] Text digests with new digest template?, Adam Bernstein, 02/06/2004
    - Re: [sympa-users] Text digests with new digest template?, Chris Hastie, 02/06/2004
      - Re: [sympa-users] Text digests with new digest template?, Walter H. Hopgood, 02/10/2004
        
        Re: [sympa-users] Text digests with new digest template?, Chris Hastie, 02/11/2004
        
        Re: [sympa-users] Text digests with new digest template?, Olivier Salaun - CRU, 02/16/2004
    - Re: [sympa-users] Text digests with new digest template?, Chris Hastie, 02/08/2004

List archive

Re: [sympa-users] Text digests with new digest template?