
en - Re: [en@sympa] Can't import old messages from Mailman into Sympa archive (out of memory? Not sure)

Subject: The mailing list for listmasters using Sympa

  • From: Matt Taggart <address@concealed>
  • To: address@concealed
  • Subject: Re: [en@sympa] Can't import old messages from Mailman into Sympa archive (out of memory? Not sure)
  • Date: Mon, 29 Apr 2024 12:22:51 -0700

On 4/29/24 11:21, Steve Sobol - NTF wrote:
Weird problem on Debian 11 - Sympa was installed from packages from the
official Debian repo:

I have a mailing list hosted on our Sympa server. From mid-2022 until about
a month ago, it was hosted on a server running mlmmj. Moving archives from
mlmmj is pretty simple, as the messages are already stored in the format
Sympa requires (plain text, full headers, one file per message).

The archives that were generated when the list was hosted on mlmmj are
already moved over, and are visible in the web UI.

But prior to mid-2022, we have 17 or 18 years of archives from the time the
list was hosted on a Mailman server.

Getting the archives into the right format is not a problem. Mailman stores
entire months in a single file. It's just a question of copying each message
into its own text file, and un-munging the envelope and From: headers
(changing "user at domain.name" to "address@concealed"). I have scripts that
do that, and put the messages into a folder named YYYY-MM. (Actually, the
messages go into YYYY-MM/arctxt).
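For anyone following along without such scripts, here is a minimal sketch of that conversion step in Python, not the poster's actual scripts. It assumes the Mailman monthly files are mbox-style text that the standard `mailbox` module can parse. The names `unmunge`, `month_of` and `split_mbox` are made up for illustration, the sequential file names under `arctxt` are an assumption about what is acceptable there, and only the From: header is rewritten (the mbox envelope line is simply not written out).

```python
import mailbox
import re
from email.utils import parsedate_to_datetime
from pathlib import Path

# Mailman's archive obfuscation: "user at domain.name" instead of an address.
MUNGED = re.compile(
    r"\b([A-Za-z0-9._%+-]+) at ([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\b"
)


def unmunge(value):
    """Turn 'user at domain.name' back into 'user@domain.name'."""
    return MUNGED.sub(r"\1@\2", value)


def month_of(msg):
    """Return 'YYYY-MM' from the Date: header, or 'unknown' if unparseable."""
    try:
        return parsedate_to_datetime(str(msg.get("Date", ""))).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "unknown"


def split_mbox(mbox_path, out_root):
    """Write each message to out_root/YYYY-MM/arctxt/<n>; return the paths."""
    counters = {}  # per-month sequence numbers (file naming is an assumption)
    written = []
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for msg in box:
            # Only the From: header is touched, so body text is left alone.
            if msg["From"] is not None:
                msg.replace_header("From", unmunge(str(msg["From"])))
            month = month_of(msg)
            counters[month] = counters.get(month, 0) + 1
            target_dir = Path(out_root) / month / "arctxt"
            target_dir.mkdir(parents=True, exist_ok=True)
            target = target_dir / str(counters[month])
            target.write_bytes(msg.as_bytes())  # full headers, no envelope line
            written.append(target)
    finally:
        box.close()
    return written
```

Messages whose Date: header cannot be parsed end up under `unknown/arctxt` so they can be checked by hand rather than silently dropped.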

I manually copy YYYY-MM to /address@concealed.

From there, all I should need to do to complete the task is run "sympa
rebuildarc address@concealed".

And this worked, for archives from 2003, 2004, 2005, 2006 and 2007. At that
point, I had archives from those years, plus 2022-2024.

Then something weird happened. With log level set to 2, I could see Sympa
processing the emails for the years I'd already imported, but then
archived.pl died, apparently killed by something sending a SIGKILL.

Maybe a timeout somewhere? I see that happen with things run via the web interface (webserver timeouts etc.), but I haven't seen it with a command-line run so far...

There weren't any useful clues in sympa.log or syslog, but thinking that
maybe systemd might be killing archived.pl due to low memory, I tried
increasing the RAM allocated to the VM from 2 to 4 GB.

Do you have any data showing that's the case? Graphs of system memory, say, or even just watching top sorted by memory usage. I don't know how large this list is, but it would not surprise me if memory use exceeded 4 GB during a rebuild of a large archive.
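If no graphs are available, a rough way to collect that data is to sample the process's resident memory from /proc while the rebuild runs, and to look in the kernel log for OOM-killer entries afterwards. The following is only a sketch under assumptions: the function names are made up, it is Linux-only, and the "Out of memory: Killed process" wording it looks for is what recent Linux kernels log as far as I know, so adjust the pattern if your kernel words it differently.

```python
import re
import time

# Assumed wording of a kernel OOM-killer log entry; check your own logs.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def vmrss_kb(status_text):
    """Extract resident memory in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def find_oom_kills(log_lines):
    """Return (pid, name) for each OOM-killer entry found in the log lines."""
    hits = []
    for line in log_lines:
        if "Out of memory" in line:
            match = OOM_KILL.search(line)
            if match:
                hits.append((int(match.group(1)), match.group(2)))
    return hits


def watch_peak(pid, interval=5.0):
    """Poll a running process until it exits; return its peak VmRSS in kB."""
    peak = 0
    while True:
        try:
            with open(f"/proc/{pid}/status") as fh:
                rss = vmrss_kb(fh.read())
        except (FileNotFoundError, ProcessLookupError):
            return peak  # the process is gone
        if rss is None:
            return peak  # no VmRSS line, e.g. the process is exiting
        peak = max(peak, rss)
        time.sleep(interval)
```

Run `watch_peak` against the PID of archived.pl in a second terminal during the rebuild. If the peak approaches the VM's RAM just before the process dies and the kernel log shows a matching kill, memory is the likely cause.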

That got me a little further - I now have the first 11 months of 2008 in the
archive. But now I'm running into the same problem. And I still have to
import archives from December of 2008, plus archives from every year between
2009 and 2021 (inclusive). I can't keep throwing more RAM at the VM.

Can you "bisect" to determine if it's a problem with particular archives? Can you skip over the year you are currently trying to add (2008?) and see if other years work better?
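To make the bisect idea concrete, here is a small sketch of the search itself. The `works` callback is hypothetical: in practice it would put the given months in place, run the rebuild, and report whether it finished. The sketch assumes that once a batch of months fails, every larger batch fails too, and that an empty batch succeeds.

```python
def first_failing_count(items, works):
    """Return the smallest n for which works(items[:n]) is False.

    Returns None if the full list succeeds. Assumes works([]) is True and
    that failures are monotone: adding more items never fixes a failure.
    """
    if works(items):
        return None
    lo, hi = 0, len(items)  # items[:lo] is known good, items[:hi] known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(items[:mid]):
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    months = [f"2008-{m:02d}" for m in range(1, 13)]
    # Stand-in predicate for demonstration only: pretend anything beyond
    # eleven months runs out of memory.
    n = first_failing_count(months, lambda batch: len(batch) <= 11)
    print("first failing batch ends with", months[n - 1])
```

If the month that ends the first failing batch also fails when imported on its own, a particular message in it is the likely culprit. If it imports fine alone, the total archive size is the more likely cause.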

Are there any good solutions to this problem? Can I tell Sympa to only
rebuild the archive for a certain year?

I don't think that exists. :(

--
Matt Taggart
address@concealed



