devel - Re: [sympa-dev] sympa 6.0.1 and wwsympa scaling issues

Subject: Developers of Sympa

  • From: David Verdin <address@concealed>
  • To: address@concealed
  • Subject: Re: [sympa-dev] sympa 6.0.1 and wwsympa scaling issues
  • Date: Tue, 09 Mar 2010 15:35:01 +0100

Hi Kristina,

Sorry for the late reply.

On 19/02/2010 05:13, address@concealed wrote:
Greetings!

I have been working with the lists.riseup.net list server, which hosts
14,000+ lists and almost 3 million subscribers. We recently upgraded from
sympa 5.3.4 and have seen some old and new scaling issues with wwsympa.
I'm hoping I can work with some of the sympa developers to figure out the
best way to fix these issues.

Sure you can! ;-)

I'm also wondering if anyone working with
sympa has noticed anything similar.

Before I list the problems, I'd like to mention that generally we've
noticed a decreased use of both CPU and memory resources on the server,
which is great!

Nice to read this! Olivier did a great deal of work on version 5.4.

The first issue is a known issue from 5.3.4: any call to
List::get_lists('*') that does not pass an optional set of lists requires
a traversal of the entire lists data directory. This is obviously quite a
feat for a system of this size! Disabling the ability to list all lists
is one thing we have done in response to this issue. However, to list
pending and closed lists, for example, also requires a read of the lists
data directory. We actually have a patch for this issue which stores all
of the lists and some basic configuration information in a mysql table.
Our modified source can be browsed with the git repository:
https://labs.riseup.net/code/repositories/browse/sympa/sympa-6.0.1-src

I'll look at it. This could be part of a future release, but we must check how deep the changes are and how well they would fit into the Sympa distribution. Anyway, I'm confident that, if you run this code on such a large server without trouble, you have probably written something quite stable. ;)

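The idea described above (keeping list names and a little configuration in a database table so that status queries do not walk the lists directory) can be sketched roughly as follows. This is not the riseup patch and not Sympa code: it is a Python illustration that uses SQLite as a stand-in for MySQL. The table name `list_index`, both function names, and the assumption that each list directory holds a `config` file with a `status <value>` line are choices made for this sketch.

```python
import os
import sqlite3


def build_index(conn, lists_dir):
    """Walk the lists directory once and record each list's name and status."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS list_index "
        "(name TEXT PRIMARY KEY, status TEXT)"
    )
    for name in sorted(os.listdir(lists_dir)):
        config_path = os.path.join(lists_dir, name, "config")
        if not os.path.isfile(config_path):
            continue  # not a list directory
        status = "open"  # assumed default when no status line is present
        with open(config_path) as fh:
            for line in fh:
                parts = line.split()
                if len(parts) == 2 and parts[0] == "status":
                    status = parts[1]
        conn.execute(
            "INSERT OR REPLACE INTO list_index (name, status) VALUES (?, ?)",
            (name, status),
        )
    conn.commit()


def lists_with_status(conn, status):
    """Return list names with the given status without touching the filesystem."""
    rows = conn.execute(
        "SELECT name FROM list_index WHERE status = ? ORDER BY name",
        (status,),
    )
    return [row[0] for row in rows]
```

With such an index, a request for pending or closed lists becomes a single indexed query; the cost moves to keeping the table in sync whenever a list is created, closed, or reconfigured.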
I have read some mention of someday sympa having its configuration files
stored in the database. I'm wondering if this is similar to the work we've
done in this direction? If we can be of any assistance with this, please
let us know!

For now (version 6.0), a few parameters of the main Sympa configuration are stored in the database.
Work is under way to have all of the main configuration (sympa.conf, wwsympa.conf and robot.conf) in the database. It will be released with version 6.1.
For now, nothing has been done for list configs. The main problem is the structural variety of list parameters: some of them are simple key/value pairs, but others are arrays, hashes, or even arrays of hashes. It is hard to store that level of detail in a database. If I remember well, your solution is more straightforward than a strict representation of the data structure, but I'll have to look deeper into your code to understand how it works.

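To make the difficulty concrete, here is one possible way (a sketch only, not what Sympa or the riseup patch does) to map a nested configuration onto flat rows: each scalar is stored under a path string built from hash keys and array indexes. The function name `flatten`, the path syntax, and the sample parameters are assumptions for illustration, written in Python rather than Perl.

```python
def flatten(value, prefix=""):
    """Yield (path, scalar) rows for a nested structure of dicts and lists."""
    if isinstance(value, dict):
        for key in sorted(value):
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(value[key], path)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from flatten(item, f"{prefix}[{index}]")
    else:
        yield prefix, str(value)


# Hypothetical list configuration mixing the four shapes mentioned above:
# a key/value pair, an array, a hash, and an array of hashes.
config = {
    "subject": "Test list",
    "topics": ["a", "b"],
    "archive": {"period": "month"},
    "owner": [{"email": "a@example.org"}, {"email": "b@example.org"}],
}

rows = list(flatten(config))
```

Each row could then go into a two-column table. The trade-off is that rebuilding the original structure, or querying on one field of an array element, needs path parsing, which is part of why a strict representation is awkward.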
The second issue that we noticed was that some queries to arcsearch_id
were resulting in wwsympa processes that were using 100% of the cpu and
running for a very long time. The few times I managed to run an strace on
these processes, it seemed like they were traversing the lists data
directory! However, the *really* odd part of this was that this was
happening for unauthorized requests to arcsearch_id. The even stranger
part is that after a user POSTed to arcsearch_id and received a timeout
error, any request they submitted after this would also time out.
Our fix for the runaway wwsympa processes was this patch:
-    return undef unless (defined &check_authz('do_arcsearch_id', 'web_archive.access'));
+    unless (defined &check_authz('do_arcsearch_id', 'web_archive.access')) {
+        $param->{'action'} = 'authorization_reject';
+        $param->{'reason'} = 'web_archive_closed';
+        return 1;
+    }

I'm currently digging into the code for this. I have found a few things, but nothing conclusive yet, so I'll send more details in a later mail.

The second part of this issue has me a little more perplexed. It seems
related to session data - the sessions for those users had stored a
redirect to the arcsearch_id request. Our quick-fix for this was to add
arcsearch_id to the %temporary_actions hash. Is there a better way to
approach this issue?
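The quick fix described above can be pictured with a small model of how a session might remember where to send the user back. This is a sketch under assumptions, not wwsympa's actual logic: the set `TEMPORARY_ACTIONS` only mirrors the role of the `%temporary_actions` hash, and the `previous_action` key and the `remember_action` function are hypothetical names.

```python
# Actions that should never be stored as the "return here" target.
# The membership shown is illustrative only.
TEMPORARY_ACTIONS = {"arcsearch_id"}


def remember_action(session, action):
    """Record the action as the redirect target unless it is temporary."""
    if action not in TEMPORARY_ACTIONS:
        session["previous_action"] = action
    return session


session = {}
remember_action(session, "arc")           # stored as the redirect target
remember_action(session, "arcsearch_id")  # skipped, so "arc" is kept
```

In this model a request that timed out on `arcsearch_id` would not be replayed on the user's next request, because the session never pointed back to it.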


The last thing we are seeing, after fixing the aforementioned, is that we
still have an occasional wwsympa process using 100% of the cpu and never
terminating. These processes mostly seem to be requests to either
archives or rss feeds. The frustrating thing about this now is that I
have had a very difficult time finding a request that consistently results
in the error. I have noticed that many of the requests to archives which
result in this error involve lists that have been around for at least a
few years, so they perhaps have a large volume of archives. Still, it is
baffling to me that the requests sometimes complete successfully but
sometimes don't. I realize that there are other relevant factors here --
like other processes that are running on the server and impacting the
server's load -- but we didn't have this problem with 5.3.4, so I'm
wondering what might have changed.

Not much regarding archives, but a lot regarding authentication and session management.
My guess for now: it could come from authentication issues when somebody receives a summary, clicks on a link, and arrives at the Sympa web interface without being authenticated.

I'll dig deeper.

Regards,

David
Thanks for your attention! If you want any further information in the way
of logs, please let me know.

Kristina


--
David Verdin
Comité réseau des universités
