
RE: [sympa-users] bulk.pl is frequently crashing silently

Subject: The mailing list for listmasters using Sympa

  • From: Derek Lofstrom <address@concealed>
  • To: micah <address@concealed>, "address@concealed" <address@concealed>
  • Subject: RE: [sympa-users] bulk.pl is frequently crashing silently
  • Date: Fri, 28 Aug 2015 19:33:33 +0000

Thanks. It's definitely not OOM (as one person suggested); there's plenty of
memory available and I don't see any OOM errors logged in /var/log/messages
going back several weeks.

As for a queued message being the cause: perhaps. If that were the case,
though, I would expect the failures to be more predictable, since the service
should fail every time the message is re-processed. When it fails, it's
usually in the middle of processing an email: it gets partway through the
list, then just craps out. After restarting the service, it continues and
finishes processing the message(s) just fine, and the bulk spool empties.

I did notice, however, several messages stuck in the digest queue associated
with deleted lists. I don't know if that would cause issues, but I dumped
those the other day since they serve no purpose other than filling up
sympa.log. I also set up a cron job that runs every minute, checks for the
bulk.pl process, and restarts sympa if it isn't running. Not ideal (it
doesn't prevent the problem), but a suitable workaround so that we no longer
find out about it only when someone reports, several days later, that an
email never arrived.
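
For anyone wanting to try the same workaround, here is a minimal sketch of that
kind of check, written in Python 3 rather than as the actual cron job used
here. It assumes `pgrep` is available, that the process can be matched on the
string `bulk.pl`, and that the service is restarted with `service sympa
restart`; adjust all three for your install.

```python
import subprocess


def process_running(pattern, run=subprocess.run):
    """Return True if `pgrep -f pattern` finds at least one process."""
    result = run(
        ["pgrep", "-f", pattern],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # pgrep exits with 0 when at least one process matched.
    return result.returncode == 0


def watchdog(pattern="bulk.pl",
             restart_cmd=("service", "sympa", "restart"),
             run=subprocess.run):
    """Restart the service if no matching process exists.

    Returns True if a restart was attempted, False otherwise.
    The pattern and restart command are assumptions, not Sympa defaults.
    """
    if process_running(pattern, run):
        return False
    run(list(restart_cmd))
    return True
```

A small script that calls `watchdog()` can then be scheduled from cron. Note
that `pgrep -f` matches on the full command line, so anything else with
`bulk.pl` in its arguments (an editor session, for example) would also count
as "running".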

-----Original Message-----
From: micah [mailto:address@concealed]
Sent: Wednesday, August 26, 2015 11:16 AM
To: Derek Lofstrom; address@concealed
Subject: Re: [sympa-users] bulk.pl is frequently crashing silently

Derek Lofstrom <address@concealed> writes:

> We are running Sympa 6.2.3-1.20150717.RHEL6 on CentOS 6.7 with all the most
> recent updates. Several times over the past month, we've been experiencing
> an issue where bulk.pl silently dies without any warning, and bulk mail
> processing stops, causing delivery delays that are often not discovered
> until several hours (or in some cases days) later. We first experienced
> this after upgrading to v 6.2.1 and attempted upgrading to 6.2.3 to see if
> the issue was resolved (it was not). We are a fairly light user of mailing
> lists; as of this date, we only have 12 or so lists, each distributed to
> once a day or less, so it's not like we are processing a huge workload.
> But the lists that are in use are extremely important to our institution.

...

> Has anyone experienced this? I am having difficulty finding information on
> the web or any user forums and do not see an existing bug logged for this
> specific issue (I found one from 2010 pertaining to issues relating to
> forking, but it was apparently resolved in 6.2).

We've experienced this in the past. It is usually the result of one specific
message in the bulk spool, typically a message with some interesting
encoding. Once we remove that message (and restart), things stop dying. It
isn't always easy to find the message causing the problem. Sometimes we just
do a binary search: move half the entries in the bulk directory to a
temporary directory, start things up, and see if it crashes. If it doesn't,
feed in another half from the temporary directory, and repeat until we
narrow it down.
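
The halving procedure above can be sketched in Python as follows. This is only
an illustration of the search logic: `crashes` is a hypothetical callback that
you would supply yourself (for example, by moving everything except the given
entries out of the bulk directory, starting bulk.pl, and checking whether it
died). It assumes exactly one entry is at fault and that the crash reproduces
every time, which may not hold if the failures are intermittent.

```python
def find_bad_entry(entries, crashes):
    """Bisect `entries` to find the one entry that makes `crashes` return True.

    `crashes(subset)` is a caller-supplied, hypothetical check that processes
    only `subset` and reports whether bulk processing died.
    Returns the offending entry, or None if the full set does not crash.
    """
    candidates = list(entries)
    if not candidates or not crashes(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if crashes(half):
            # The bad entry is in the first half.
            candidates = half
        else:
            # The first half is clean, so the bad entry is in the rest.
            candidates = candidates[len(half):]
    return candidates[0]
```

With N entries in the spool this needs roughly log2(N) restarts after the
initial check, instead of testing the entries one at a time.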

We haven't had this happen for some time now, but we did have a particular
list that kept causing this, so when things died we would guess that it was
that bulk spool entry and move it out of the way and restart and things would
work 99% of the time.

We set up Nagios to alert us when bulk.pl dies so that we notice; otherwise
days would go by without any processing.
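
A check along those lines could look like the sketch below. It is not the
check we actually run, just a hypothetical Python 3 stand-in that follows the
usual Nagios plugin convention of exit status 0 for OK, 2 for CRITICAL and 3
for UNKNOWN, plus a one-line status message. The `bulk.pl` match pattern and
the reliance on `pgrep` are assumptions.

```python
import subprocess

# Standard Nagios plugin exit statuses.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_bulk(pattern="bulk.pl", run=subprocess.run):
    """Return (status, message) describing whether a matching process exists."""
    result = run(
        ["pgrep", "-f", pattern],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        # pgrep prints one PID per line for each matching process.
        count = len(result.stdout.split())
        return OK, "OK - %d process(es) matching %s" % (count, pattern)
    if result.returncode == 1:
        # pgrep exits with 1 when nothing matched.
        return CRITICAL, "CRITICAL - no process matching %s" % pattern
    return UNKNOWN, "UNKNOWN - pgrep exited with %d" % result.returncode
```

A wrapper script would print the message and exit with the status, for
example `status, message = check_bulk(); print(message); sys.exit(status)`.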




