Subject: Developers of Sympa
[sympa-dev] Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db)
- From: <address@concealed>
- To: address@concealed
- Subject: [sympa-dev] Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db)
- Date: Wed, 18 Jul 2007 16:13:29 +0200
Hi,
I am in the process of optimizing Sympa so that it will "scale" to support over 10,000 mailing lists.
wwsympa.fcgi makes numerous calls to List::get_lists and List::get_which. Both of these functions build a data structure holding an array of List objects. On a machine with 10,000 lists, this data structure can be larger than 160 MB.
I have not analyzed the data structures inside wwsympa.fcgi and List.pm in depth. However, I strongly suspect that these calls to get_lists and get_which are why each wwsympa.fcgi process occupies more than 1 GB of memory.
I have experimented with using Memcached inside get_lists and get_which, with unsatisfactory results. Memcached limits each stored value to 1 megabyte. If it accepted larger values, get_lists' @all_lists could be kept in memory, preferably inside wwsympa.fcgi. Modifying Memcached to accommodate 200 MB values is not the way to go, however.
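A quick check using only the numbers quoted above shows how far apart the two sizes are. This sketch assumes that "MB" means 2^20 bytes and that the 160 MB is spread evenly across the 10,000 List objects. The per-list figure is therefore derived, not measured.

```python
# Back-of-the-envelope figures taken from this message.
# Assumption: MB = 2**20 bytes, and @all_lists is spread evenly over all lists.
MB = 1024 * 1024

num_lists = 10_000
all_lists_bytes = 160 * MB        # reported size of the @all_lists structure
memcached_item_limit = 1 * MB     # Memcached's per-value limit cited above

per_list_bytes = all_lists_bytes / num_lists          # average size of one List object
times_over_limit = all_lists_bytes / memcached_item_limit

print(f"~{per_list_bytes / 1024:.1f} KiB per List object")
print(f"@all_lists is {times_over_limit:.0f}x the 1 MB item limit")
```

At roughly 16 KiB per list, the whole array is about 160 times too large for a single Memcached value.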
All signs indicate that the proper place to begin optimizing is List::get_which and List::get_which_db.
List::get_which() & get_which_db()
List::get_which() uses a hash returned by List::get_which_db() that contains the user's subscribed lists. A list's status (closed/open/pending) and its directory path (relative to the robot directory) are not stored in the database. For that reason, List::get_which calls get_lists to retrieve the complete @all_lists.
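To make the cost concrete, here is a hypothetical Python model of the flow just described. Sympa itself is Perl, and `ListInfo`, the 10,000 placeholder lists and the `subscriptions` dict are illustrative assumptions, not Sympa's real data structures. Each call builds an object for every list, as get_lists does, only to keep the few the user is subscribed to.

```python
# Hypothetical model of the current List::get_which() flow (not Sympa code).
from dataclasses import dataclass


@dataclass
class ListInfo:
    name: str
    status: str  # e.g. "open", "closed", "pending"
    dir: str     # path relative to the robot directory


def get_lists(robot):
    """Stand-in for List::get_lists: builds one object per list on the robot.
    `robot` is unused here; it only mirrors the real call's argument."""
    return [ListInfo(f"list{i}", "open", f"list{i}") for i in range(10_000)]


def get_which(email, robot, subscriptions):
    # subscriptions plays the role of the hash returned by get_which_db()
    subscribed = subscriptions.get(email, set())
    all_lists = get_lists(robot)  # cost grows with the TOTAL number of lists
    return [lst for lst in all_lists if lst.name in subscribed]


res = get_which("a@example.org", "lists.example.org",
                {"a@example.org": {"list3", "list42"}})
print([lst.name for lst in res])
```

The work and memory scale with the number of lists on the robot, not with the number of lists the user actually belongs to.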
I intend to change get_which_db() so that it also retrieves each mailing list's `status` and `dir` from a new database table, `list_table`. get_which would then use this result to load only the user's lists into memory, avoiding the call to get_lists.
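The proposed lookup can be sketched as a single join between the subscriptions and `list_table`. This version uses Python with SQLite as a stand-in for Sympa's database. The column names (`user_subscriber`, `list_subscriber`, `robot_subscriber`, and `name`/`robot`/`status`/`dir` in `list_table`) are assumptions for illustration, not a confirmed Sympa schema.

```python
# Sketch of the proposed get_which_db(): status and dir come from list_table,
# so no call to get_lists is needed. Table and column names are assumptions.
import sqlite3

SCHEMA = """
CREATE TABLE subscriber_table (
    user_subscriber  TEXT NOT NULL,
    list_subscriber  TEXT NOT NULL,
    robot_subscriber TEXT NOT NULL
);
CREATE TABLE list_table (
    name   TEXT NOT NULL,
    robot  TEXT NOT NULL,
    status TEXT NOT NULL,   -- open / closed / pending
    dir    TEXT NOT NULL,   -- relative to the robot directory
    PRIMARY KEY (name, robot)
);
"""


def get_which_db(conn, email, robot):
    """Return {list_name: {"status": ..., "dir": ...}} for one subscriber."""
    rows = conn.execute(
        """SELECT l.name, l.status, l.dir
             FROM subscriber_table s
             JOIN list_table l
               ON l.name = s.list_subscriber AND l.robot = s.robot_subscriber
            WHERE s.user_subscriber = ? AND s.robot_subscriber = ?""",
        (email, robot),
    )
    return {name: {"status": status, "dir": d} for name, status, d in rows}


# Small in-memory example.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO list_table VALUES (?, ?, ?, ?)", [
    ("dev", "lists.example.org", "open", "dev"),
    ("staff", "lists.example.org", "closed", "staff"),
    ("news", "lists.example.org", "open", "news"),
])
conn.executemany("INSERT INTO subscriber_table VALUES (?, ?, ?)", [
    ("alice@example.org", "dev", "lists.example.org"),
    ("alice@example.org", "staff", "lists.example.org"),
])
print(get_which_db(conn, "alice@example.org", "lists.example.org"))
```

The query touches only the user's subscription rows, so its cost follows the number of subscriptions rather than the total number of lists.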
`list_table` will be populated at Sympa startup using get_lists, and updated whenever a list's status or name changes.
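Keeping `list_table` in sync could look like the following. It is a minimal Python/SQLite sketch that assumes the same hypothetical columns as above. The hook names `populate_at_startup`, `on_status_change` and `on_rename` are illustrative, not existing Sympa functions. The `(name, status, dir)` tuples passed at startup would come from one pass of get_lists.

```python
# Hypothetical sync hooks for list_table (names and schema are assumptions).
import sqlite3


def create_list_table(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS list_table (
        name TEXT NOT NULL, robot TEXT NOT NULL,
        status TEXT NOT NULL, dir TEXT NOT NULL,
        PRIMARY KEY (name, robot))""")


def populate_at_startup(conn, robot, lists):
    """lists: iterable of (name, status, dir), e.g. built once from get_lists."""
    with conn:  # commit both statements together, or roll back on error
        conn.execute("DELETE FROM list_table WHERE robot = ?", (robot,))
        conn.executemany(
            "INSERT INTO list_table (name, robot, status, dir) VALUES (?, ?, ?, ?)",
            [(name, robot, status, d) for name, status, d in lists],
        )


def on_status_change(conn, robot, name, new_status):
    with conn:
        conn.execute(
            "UPDATE list_table SET status = ? WHERE name = ? AND robot = ?",
            (new_status, name, robot),
        )


def on_rename(conn, robot, old_name, new_name, new_dir):
    with conn:
        conn.execute(
            "UPDATE list_table SET name = ?, dir = ? WHERE name = ? AND robot = ?",
            (new_name, new_dir, old_name, robot),
        )


conn = sqlite3.connect(":memory:")
create_list_table(conn)
populate_at_startup(conn, "lists.example.org",
                    [("dev", "open", "dev"), ("staff", "pending", "staff")])
on_status_change(conn, "lists.example.org", "staff", "open")
on_rename(conn, "lists.example.org", "dev", "developers", "developers")
print(conn.execute("SELECT name, status, dir FROM list_table ORDER BY name").fetchall())
```

Rebuilding a robot's rows inside one transaction at startup means a crash halfway through leaves the previous contents in place rather than a partial table.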
I will be making these changes so that Sympa is able to run on the machine described above.
However, I am aware that this is not a long-term fix, nor a change that I can expect to be folded into the main Sympa distribution.
get_lists() is called many times for maintenance tasks inside wwsympa. A longer-term solution might therefore be to move these maintenance functions out of wwsympa into specialized functions in List.pm. The `list_table` schema could then be extended with the list parameters wwsympa needs, so that wwsympa queries the database instead of calling get_lists.
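As an illustration of that longer-term direction, a maintenance helper could answer questions such as "which lists are pending?" directly from `list_table`. This is a Python/SQLite sketch only. `get_lists_by_status`, `count_lists_by_status` and the `(robot, status)` index are hypothetical, and the schema is the assumed one used above.

```python
# Hypothetical List.pm-style maintenance helpers backed by list_table
# instead of get_lists. Names, schema and index are assumptions.
import sqlite3


def get_lists_by_status(conn, robot, status):
    """Names of lists on `robot` with the given status, read from list_table."""
    rows = conn.execute(
        "SELECT name FROM list_table WHERE robot = ? AND status = ? ORDER BY name",
        (robot, status),
    )
    return [name for (name,) in rows]


def count_lists_by_status(conn, robot):
    """{status: number_of_lists} for one robot."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM list_table WHERE robot = ? GROUP BY status",
        (robot,),
    )
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE list_table (
    name TEXT NOT NULL, robot TEXT NOT NULL,
    status TEXT NOT NULL, dir TEXT NOT NULL,
    PRIMARY KEY (name, robot))""")
# An index on (robot, status) lets these lookups avoid scanning every row.
conn.execute("CREATE INDEX list_status_idx ON list_table (robot, status)")
conn.executemany("INSERT INTO list_table VALUES (?, ?, ?, ?)", [
    ("dev", "lists.example.org", "open", "dev"),
    ("newlist", "lists.example.org", "pending", "newlist"),
    ("old", "lists.example.org", "closed", "old"),
    ("req", "lists.example.org", "pending", "req"),
])
print(get_lists_by_status(conn, "lists.example.org", "pending"))
print(count_lists_by_status(conn, "lists.example.org"))
```

Each additional parameter wwsympa needs would become another column in `list_table`, kept current by the same kind of update hooks.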
I would like to make the sympa-dev list aware of my work, and I am actively soliciting advice on optimizing Sympa to accommodate large numbers of lists.
Cheers,
Charles Paul
- [sympa-dev] Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db), epsas, 07/18/2007
- [sympa-dev] Re: Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db), Olivier Salaün, 07/22/2007
- [sympa-dev] Re: Re: Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db), Sergiy Zhuk, 07/22/2007
- [sympa-dev] Re: Optimizing Sympa for handling 1000+ lists List::get_list & List::get_which(_db), Olivier Salaün, 07/22/2007
Archive powered by MHonArc 2.6.19+.