Skip to Content.
Sympa Menu

en - Re: [sympa-users] Profiling wwsympa bad performances and our findings

Subject: The mailing list for listmasters using Sympa

List archive

Chronological Thread  
  • From: IKEDA Soji <address@concealed>
  • To: Olivier Salaün <address@concealed>
  • Cc: address@concealed
  • Subject: Re: [sympa-users] Profiling wwsympa bad performances and our findings
  • Date: Mon, 4 May 2020 20:38:37 +0900

Hi Olivier,

Thank you for valuable inputs!

Here’s my comments.

2020/05/04 18:34、Olivier Salaün <address@concealed>のメール:



Hello,

I wish to summarize here the results of our investigation to understand the origin of very bad performances of wwsympa in our production server (Sympa 6.2.48).

We are running a ML service with : 8 robots, 5.200 lists (total), 2.500 family lists, 960.000 subscribers (subscriber_table), 36.000 admins (admin_table).

You may know that on Sympa's web interface the critical operations are the ones that require wwsympa.fcgi to load all the mailing lists. This includes the /lists and /lists_categories URLs. On our main robot (1.500 ML) /lists processing takes: 3 minutes if the wwsympa.fcgi process is not running ; 16 seconds otherwise. These response times make Sympa's web interface almost unusable. That's why we decided to investigate...

To find out what's going on we used a Perl Profiler (https://metacpan.org/pod/Devel::NYTProf). Once enabled it keeps track of all subroutine calls and draws a picture of the time spent in each subroutine, how many times it was called and other fine grained details.

<mjdlniplkombakhh.png>

The profiler results helped us identify where wwsympa.fcgi spends too much time. We then tried to disable parts of the code and observed when it had significant positive impact on the performances. Below I detail the 5 tracks we followed.

Patch #1: prevent database calls from get_current_admins()

Response time gain: -62%

We noticed that there were 50.846 calls to Sympa::Database::do_prepared_query(), most of them from get_current_admins(). It occurs in the main loop of wwsympa::do_lists() but also every time a scenario including 'is_editor' or 'is_owner' condition is evaluated. Obviously a level of cache should be introduced here

=> found a related Pull Request: https://github.com/sympa-community/sympa/pull/925

It looks good for me. :)

Patch #2: disable authorization scenario evaluation

Response time gain: -34%

During scenario evaluation the code does too many regular _expression_ evaluation (490.000). Having a look at Sympa::Scenario::_verify(), it could be optimized. One option would be to use https://metacpan.org/pod/Regexp::Grammars to parse the scenarios files. It would also make the code more readable. But to prevent issues with existing scenarios, this requires more complete unit tests.

=> I will propose a new approach for scenario grammar parsing

As of 6.2.49b.1 or later, each scenario file will be parsed and compiled just only once: cf. https://github.com/sympa-community/sympa/pull/782

However improving parsing process is still useful, because it will make the first access faster.

Patch #3: disable syslog calls

Response time gain: -9%

The profiler reveals huge amount of time spent in Sympa::Log::syslog() (151.862 calls). Many low-level subroutines in the code provide syslog() logs for debugging purpose. Even when Sympa is configured with "log_level info", these syslog() calls have a significant cost because of the subroutine call itself. A simple bypass would be to replace syslog('debug',...) calls with syslog('debug',...) if ($Sympa::Log::levels{'debug'}). This could be done by configure/make.

=> not a big performance gain, but a quick win though

Currently, I feel debug (or debug2, debug3) logs are not so useful.  I think that we’d be better to stop embedding erratic debug messages and to use generic module such as Debug::Trace.

Patch #4: prevent param_constraint.conf loading (for lists belonging to families)

Response time gain:

  • for a fresh wwsympa.fcgi process: -83%
  • for an existing wwsympa.fcgi process:-18%

With existing code, param_constraint.conf files are loaded every time a Sympa::List object is loaded. It's expensive to run check_param_constraint() and it only make sense when (re)instantiating list families.

=> I submitted this Pull Request: https://github.com/sympa-community/sympa/pull/933

I feel it nice. :-)

Patch #5: disabling file locking

Response time gain: -25%

The profiler shows that Sympa code spends too much time calling File::NFSLock functions. I just try disabling file locking to evaluate impact, but it would make sense to consider using alternative file locking implementations.

I think file locking is necessary when the processes read/write the files, though Racke suggested it not always necessary.

For example I tried multiple hosts sharing configuration on network storage and found they work successfully.  In such case, reliable (and portable) file locking is required, even if it is slightly slow.

I suppose that, if we need quicker locking, we may switch from file system to the other storage such as redis.

Conclusion

Applying all 5 patches wwsympa performances are much better:

  • for a fresh wwsympa.fcgi process: 3min => 8,50 seconds
  • for an existing wwsympa.fcgi process: 16 seconds => 5,80 seconds

Some changes I tried are quick wins, others either provide less benefits or require more refactoring work.


Regards,
— Soji

-- 
Olivier Salaün
DSI / pôle SI / équipe SNUM
Tel : 02 23 23 74 54



Archive powered by MHonArc 2.6.19+.

Top of Page