
devel - [sympa-dev] Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db)

Subject: Developers of Sympa

  • From: <address@concealed>
  • To: address@concealed
  • Subject: [sympa-dev] Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db)
  • Date: Wed, 18 Jul 2007 16:13:29 +0200


Salut,

I am in the process of optimizing Sympa so that it will "scale" to support
over 10,000 mailing lists.

wwsympa.fcgi makes numerous calls to List::get_list and List::get_which. Both
of these functions create a data structure representing an array of List
objects. On a machine with 10,000 lists, this data structure can be larger
than 160MB.

I have not done an in-depth analysis of the data structures inside
wwsympa.fcgi and List.pm, but it is my strong impression that these calls to
get_list and get_which are what cause each wwsympa.fcgi process to occupy
more than 1GB of memory.

I have experimented with using Memcached inside of get_lists and get_which,
with unsatisfactory results. If Memcached were designed to store values
larger than 1 megabyte, it would be possible to store get_lists' @all_lists
in memory, preferably inside of wwsympa.fcgi.

Modifying memcached to accommodate 200MB values is not the way to go, however.

All signs indicate that the proper place to begin optimizing is
List::get_which and get_which_db.


List::get_which() & get_which_db()

This function uses a hash from List::get_which_db() containing the user's
subscribed lists. Because the 'closed/open/pending' status and the directory
path (relative to the robot-dir) are not stored in the database,
List::get_which calls get_lists to retrieve a complete @all_lists.

It is my current intention to change the behavior of get_which_db() so that
it retrieves a mailing list's `status` and `dir` from a database table,
`list_table`. get_which would then use this result to load the user's lists
into memory, saving a call to get_lists.
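
The intended lookup could be sketched as a single join. This is only an
illustration in Python with SQLite, not Sympa's Perl code: the table layout,
the column names (`user_email`, `list_name`, `robot`, `name`, `status`, `dir`)
and the function signature are all assumptions, since the `list_table` schema
has not been settled.

```python
import sqlite3

def get_which_db(conn, email, robot):
    """Return {list_name: (status, dir)} for one subscriber.

    A single join replaces the full scan of every list on the server.
    """
    rows = conn.execute(
        """
        SELECT s.list_name, l.status, l.dir
        FROM subscriber_table AS s
        JOIN list_table AS l
          ON l.name = s.list_name AND l.robot = s.robot
        WHERE s.user_email = ? AND s.robot = ?
        """,
        (email, robot),
    ).fetchall()
    return {name: (status, path) for name, status, path in rows}

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE list_table (
        name TEXT, robot TEXT, status TEXT, dir TEXT,
        PRIMARY KEY (name, robot)
    );
    CREATE TABLE subscriber_table (
        user_email TEXT, list_name TEXT, robot TEXT
    );
    INSERT INTO list_table VALUES
        ('announce', 'lists.example.org', 'open',   'announce'),
        ('staff',    'lists.example.org', 'closed', 'staff'),
        ('dev',      'lists.example.org', 'open',   'dev');
    INSERT INTO subscriber_table VALUES
        ('alice@example.org', 'announce', 'lists.example.org'),
        ('alice@example.org', 'staff',    'lists.example.org'),
        ('bob@example.org',   'dev',      'lists.example.org');
    """
)

which = get_which_db(conn, "alice@example.org", "lists.example.org")
```

With this shape, get_which would only need to instantiate the lists named in
the result, rather than all lists on the server.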

`list_table` will be generated at sympa startup using get_lists, and further
updated anytime a list's status or name is changed.
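
A rough sketch of that lifecycle, again in Python with SQLite rather than
Sympa's own Perl. The schema, the function names and the
`(name, robot, status, dir)` tuple format are hypothetical; `all_lists` stands
in for whatever get_lists returns at startup.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS list_table (
    name   TEXT NOT NULL,
    robot  TEXT NOT NULL,
    status TEXT NOT NULL,
    dir    TEXT NOT NULL,
    PRIMARY KEY (name, robot)
)
"""

def rebuild_list_table(conn, all_lists):
    """Repopulate list_table from a full scan, done once at startup."""
    with conn:  # commits on success, rolls back on error
        conn.execute(SCHEMA)
        conn.execute("DELETE FROM list_table")
        conn.executemany(
            "INSERT INTO list_table (name, robot, status, dir) "
            "VALUES (?, ?, ?, ?)",
            all_lists,
        )

def update_list_status(conn, name, robot, status):
    """Record a status change; returns the number of rows updated."""
    with conn:
        cur = conn.execute(
            "UPDATE list_table SET status = ? WHERE name = ? AND robot = ?",
            (status, name, robot),
        )
    return cur.rowcount

def rename_list(conn, old_name, new_name, robot, new_dir):
    """Record a rename, assuming the directory changes along with the name."""
    with conn:
        cur = conn.execute(
            "UPDATE list_table SET name = ?, dir = ? "
            "WHERE name = ? AND robot = ?",
            (new_name, new_dir, old_name, robot),
        )
    return cur.rowcount

# Illustrative data only.
conn = sqlite3.connect(":memory:")
rebuild_list_table(conn, [
    ("announce", "lists.example.org", "open", "announce"),
    ("staff", "lists.example.org", "pending", "staff"),
])
changed = update_list_status(conn, "staff", "lists.example.org", "open")
renamed = rename_list(conn, "announce", "news", "lists.example.org", "news")
final_rows = conn.execute(
    "SELECT name, status, dir FROM list_table ORDER BY name"
).fetchall()
```

The point of the sketch is that the expensive full scan happens once, and
every later change touches a single row.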

I will be making these changes so that Sympa is able to run on the hardware
described above.

However, I am aware that this is not a long term fix, or a change that I can
expect to be folded into the main sympa distribution.

As get_lists() is called many times for maintenance activities inside of
wwsympa, a longer term solution might be to abstract these maintenance
functions out of wwsympa into specialized functions inside of List.pm, and
then optimize the `list_table` schema to accommodate the list parameters
needed by wwsympa, so that calls to the database could be made instead of
calls to get_lists.
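
As a hedged illustration of what such a specialized function might look like,
the Python/SQLite sketch below answers one maintenance-style question ("which
lists on this robot have a given status?") directly from `list_table`. The
function name, the schema and the sample data are assumptions made for the
example, not existing Sympa code.

```python
import sqlite3

def lists_with_status(conn, robot, status):
    """Return the names of a robot's lists in a given status, sorted by name.

    Only names are fetched; no List objects are built for the other lists.
    """
    rows = conn.execute(
        "SELECT name FROM list_table "
        "WHERE robot = ? AND status = ? ORDER BY name",
        (robot, status),
    )
    return [name for (name,) in rows]

# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE list_table (
        name TEXT, robot TEXT, status TEXT, dir TEXT,
        PRIMARY KEY (name, robot)
    );
    INSERT INTO list_table VALUES
        ('dev',      'lists.example.org', 'open',    'dev'),
        ('announce', 'lists.example.org', 'open',    'announce'),
        ('staff',    'lists.example.org', 'closed',  'staff'),
        ('newlist',  'lists.example.org', 'pending', 'newlist');
    """
)

open_lists = lists_with_status(conn, "lists.example.org", "open")
pending_lists = lists_with_status(conn, "lists.example.org", "pending")
```

Any list parameter that wwsympa needs for this kind of query would have to
become a column in `list_table`, which is the schema work described above.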

I would like to make the sympa-dev list aware of my activities, and I am
actively soliciting advice on optimizing Sympa to accommodate large numbers
of lists.

Salut,
Charles Paul


