Re: [sympa-users] Sympa outgoing queue draining too slowly

Subject: The mailing list for listmasters using Sympa

From: Matt Taggart <address@concealed>
To: Steve Shipway <address@concealed>
Cc: "address@concealed" <address@concealed>
Subject: Re: [sympa-users] Sympa outgoing queue draining too slowly
Date: Mon, 24 Feb 2014 21:41:05 -0800

Hi Steve,

Steve Shipway writes:
> We have a situation on our Sympa server where the outgoing queue in
> /var/spool/sympa/outgoing appears to be taking a very long time to drain.
>
> Although the incoming queue in /var/spool/sympa/msg rarely even gets above
> 10 and never gets over 20, the outgoing queue can hit 500 or more before
> slowly draining at about 200 msg/hr. The SMTP queue on the host in question
> never goes over 1 and the mail smarthost has stacks of capacity (we have
> Nagios and MRTG monitoring of the queue lengths - let me know if anyone
> wants the relevant plugins to achieve this)
>
> In sympa.conf, we have maxsmtp=200 , nrcpt=50 and avg=10. Most email
> addresses will be in one of two domains.

Just for reference lists.riseup.net currently uses

maxsmtp=100 nrcpt=200 avg=10

We use postfix, so I will tell you what we do there, but if you use
something else there is probably an equivalent.

postfix's default_process_limit is 100, so we turn that up to 175 so that
sympa injection (via maildrop service) can't totally saturate it (btw the
smtpd service gets its own set of 175 for dealing with inbound, bounce
processing, etc).

postfix's maildrop_recipient_limit (the number of recipients sympa can use
per message when injecting) defaults to $default_recipient_limit, and
default_recipient_limit = 20000, so you won't hit limits on injection. After
injection, when postfix is sending to remote sites, the default limit
(default_destination_recipient_limit) is 50. Postfix will break messages up
to deal with that (and your problem seems to be getting it out of sympa and
into the MTA, not out of the MTA).

If you also run the only servers receiving mail from sympa, you can tune
both ends to speed things up. Even if you do communicate with other
sites, you can still set up dedicated transports in postfix/master.cf to use
specially tuned settings when talking to servers you control.

As a first step I'd try turning up nrcpt so that you are sending to more
recipients per message. However ISTR that you have lots of small lists, so
maybe it won't help you that much if each list message is only going to <50
recipients anyway... Riseup also has lots of small lists, but what really
clogs things up for us is when our big lists (>30k) send. So maybe analyze
the queue when you are having problems to see what is sending.
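
To make the nrcpt reasoning concrete, here is a rough sketch in Python. It
assumes each list message is split into batches of at most nrcpt recipients
and ignores the avg setting (domains per batch), which can force extra
splits, so treat the result as a lower bound. The function name
sendmail_calls is made up for illustration and is not part of Sympa.

```python
import math


def sendmail_calls(recipients: int, nrcpt: int) -> int:
    """Lower bound on injections needed for one list message.

    Assumption: recipients are split into batches of at most `nrcpt`;
    extra splitting caused by the `avg` domain limit is ignored.
    """
    if recipients < 0 or nrcpt < 1:
        raise ValueError("recipients must be >= 0 and nrcpt >= 1")
    return math.ceil(recipients / nrcpt)


if __name__ == "__main__":
    # Compare a small list with a big one under both nrcpt settings
    # mentioned in this thread (50 and 200).
    for size in (40, 30000):
        for nrcpt in (50, 200):
            print(f"{size} recipients, nrcpt={nrcpt}: "
                  f"{sendmail_calls(size, nrcpt)} batch(es)")
```

Under these assumptions a 40-recipient list needs one batch either way, while
a 30000-recipient list drops from 600 batches to 150 when nrcpt goes from 50
to 200. That is why raising nrcpt mostly helps the big lists.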

Are you running bulk.pl on multiple servers yet? We want to move to that,
but it will take some work as some of our infrastructure is dependent on
the mail being injected on the list server initially. (Right now everything
gets injected on the list server and then relayed to a few other servers to
go out.)

> Does anyone have an idea why the outgoing queue might take so long to clear?
> I would have thought it could drain faster than this - and that if anything
> messages would end up in the sendmail queue. Do we have something
> misconfigured? Is there something that might be locking the outbound queue?

The other thing to consider is disk I/O. Where are the following located?
* sympa spools
* MTA spools
* sympa logs
* MTA logs
* database
* expl/
* arc/

When a list message arrives, all of those things are going to get disk I/O
at the same time. If several of the above are on the same disk, they are
going to fight. In particular, a platter hard drive will _really_ have
problems, because activity on multiple parts of the disk requires the heads
to seek back and forth rapidly in order to keep up.
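
One quick way to see which of those locations share a filesystem is to
compare the device IDs that stat() reports. The sketch below is only a
starting point: apart from /var/spool/sympa, every path in it is a guess and
needs to be replaced with your real locations. Also, two partitions on one
physical disk show up as different devices, so a "separate" result here does
not prove the spindles are separate.

```python
import os
from collections import defaultdict


def group_by_device(paths):
    """Group labels by the st_dev of their path.

    Paths that cannot be stat()ed are grouped under the key None.
    """
    groups = defaultdict(list)
    for label, path in paths.items():
        try:
            dev = os.stat(path).st_dev
        except OSError:
            dev = None
        groups[dev].append(label)
    return dict(groups)


if __name__ == "__main__":
    # All paths except /var/spool/sympa are assumptions; adjust them.
    paths = {
        "sympa spools": "/var/spool/sympa",
        "MTA spools": "/var/spool/postfix",
        "logs": "/var/log",
        "database": "/var/lib/mysql",
        "expl/": "/var/lib/sympa/expl",
        "arc/": "/var/lib/sympa/arc",
    }
    for dev, labels in group_by_device(paths).items():
        where = "not found" if dev is None else f"device {dev}"
        print(f"{where}: {', '.join(labels)}")
```

Anything listed under the same device ID is competing for the same
filesystem, and is a candidate for moving elsewhere.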

On the current lists.riseup.net we have split things up by adding a pair of
SSDs (RAID1) to move the spools to. They don't need to be big SSDs because
the spools don't use that much space; we're using Intel 40GB ones that cost
~$90 USD each a few years ago (now for the same price you can get much
bigger). They are way faster than HDDs for these types of operations and
don't have the seek penalty when you do have to access different parts of
the disk.

For the next lists.riseup.net server that we're setting up now, we are
splitting things up even further, using 4 pairs of SSDs in RAID1 :)

--
Matt Taggart
address@concealed

[sympa-users] Sympa outgoing queue draining too slowly, Steve Shipway, 02/25/2014
- Re: [sympa-users] Sympa outgoing queue draining too slowly, Matt Taggart, 02/25/2014
  - RE: [sympa-users] Sympa outgoing queue draining too slowly, Steve Shipway, 02/25/2014
- Re: [sympa-users] Sympa outgoing queue draining too slowly, Etienne MELEARD, 02/25/2014
