devel - Re: [sympa-developpers] Proposal: Clarify log levels

Subject: Developers of Sympa

  • From: Guillaume Rousse <address@concealed>
  • To: address@concealed
  • Subject: Re: [sympa-developpers] Proposal: Clarify log levels
  • Date: Mon, 16 Jun 2014 16:45:32 +0200

On 10/06/2014 05:56, IKEDA Soji wrote:
Hi,

Currently, "err" and "notice", or "info" and "notice" levels are
sometimes confused. And, "err" level outputs traceback to syslog
even when the error was caused by user.

So I wish to clarify the purpose of each log level.


Summary of proposal:

* Add "alert" and "crit" levels for fault on the system.

* "err" level will be used for information caused by users' action.
"alert", "crit" and "notice" levels will be used for the system.

* "info" level will be used for successful result.

* "debug*" levels will be used for things not corresponding to any
levels above.

Detailed description of each level follows.

"alert": Program cannot continue processing

e.g.:
- Mandatory file or directory does not exist, cannot be opened or is
not writable.
- Unrecoverable errors in sympa.conf or robot.conf.
- Program unexpectedly dies.
- ... and other failures that prevent the program from running.

n.b. This level outputs traceback to log and STDERR.


"crit": Unexpected situation

e.g.:
- Sending one or more messages failed.
- Logging failed.
- Database server is (temporarily or permanently) gone.
- Query is denied by SQL or LDAP server.
- DNS returned malformed response (DKIM etc.).
- OAuth server unexpectedly denied assertion.
- Synchronization of list members unexpectedly failed.
- ... and other errors caused by foreign components.

n.b. This level outputs traceback to log.


"err": Errors caused by messages, user or user agent

e.g.:
- Errors in list config file (list does not have owners etc.).
- Errors in TT2 template.
- Mail loop is detected.
- Cannot parse MIME.
- Cannot verify signature or cannot decrypt message.
- Authentication failed.
- Specified entity (robot, list, family, user, ...) is not found.
- Incorrect command in the message.
- Command failed.
- Incorrect parameter(s) in request from user agent, or mandatory
parameters are not given.
- Other action by user is inhibited or failed (e.g. by scenario).
- Message was ignored by other reason (e.g. No senders).
- ... and other errors caused by inputs from outside of Sympa.

n.b. This level DOES NOT output traceback.


"notice": Epochal events on the system

e.g.:
- Program starts, forks or expectedly stops.
- Data structure was successfully updated.
- Temporary failure is solved (database connection restored etc.)
- List is created, instantiated, closed, restored or purged.
- ... and so on.


"info": Other successful information

e.g.:
- New message was fetched from spool.
- Message was added into spool.
- Message was sent successfully.
- Authentication succeeds.
- List member(s) are added, deleted or successfully synchronized.
- Command succeeds.
- Action by user (partially or completely) succeeds.
- Statistical information.
- ... and other information related to neither errors nor debugging.


"debug", "debug2", "debug3": Debugging information

e.g.:
- Information for tracing or profiling.
- AND, everything not corresponding to other log levels.

This makes sense, and should indeed allow more consistency between messages. I just have two remarks.

First, I don't think we need so many categories (8): some of them are unneeded and some are quite fuzzy, and I'd rather merge some of them for the sake of simplicity.

- debug, debug2, and debug3: do we really need 3 different levels, without any definition of their respective intents?

- info and notice: what's an 'epochal event'? Why should we make a distinction between 'some kind of success' and 'other kind of success'?

- alert and crit: we don't need two different levels just for enforcing two different reactions. Especially when you classify 'program unexpectedly dies' under 'alert' category, whereas you defined 'crit' as 'Unexpected situations' :)

Basically, I'd rather go for the four categories:
- fatal error
- error
- information
- debug

Mapping those levels into syslog priorities is another exercise.
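
For illustration only, one possible mapping could look like the sketch below. It is written in Python rather than Sympa's Perl, and the category names, the CATEGORY_TO_SEVERITY table and both helper functions are assumptions made for the example, not existing Sympa code. Only the numeric severity values are the standard syslog ones.

```python
from enum import IntEnum


class SyslogSeverity(IntEnum):
    """Standard syslog severity values (lower number = more severe)."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


# Hypothetical mapping of the four suggested categories onto syslog
# severities. The choice of CRIT for "fatal" is an assumption.
CATEGORY_TO_SEVERITY = {
    "fatal": SyslogSeverity.CRIT,
    "error": SyslogSeverity.ERR,
    "info": SyslogSeverity.INFO,
    "debug": SyslogSeverity.DEBUG,
}


def severity_for(category: str) -> SyslogSeverity:
    """Return the syslog severity for a category name."""
    try:
        return CATEGORY_TO_SEVERITY[category]
    except KeyError:
        raise ValueError(f"unknown log category: {category!r}") from None


def should_log(category: str, threshold: SyslogSeverity) -> bool:
    """True if a message of this category passes the configured threshold."""
    return severity_for(category) <= threshold
```

With a threshold of SyslogSeverity.INFO, "fatal", "error" and "info" messages would be emitted and "debug" messages dropped. Whether "fatal" should map to crit, alert or emerg is exactly the kind of question left open here.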

Second, there is a confusion between 'how to classify an error' and 'how to react to an error'. Basically, we don't need two different categories for unrecoverable errors, since the context is enough to provide different responses.

Expected unrecoverable errors should be handled by specific error handlers, so as to deliver human-readable and context-specific error messages. For example, we should make a distinction between 'database unavailable' and 'database available, outdated schema' errors. And I don't see the need for a stack trace here, if we're able to pinpoint the actual issue.

Unexpected unrecoverable errors should be handled by the default error handler (the one installed as a signal handler), acting as a safety net, but its purpose should be only to deal with uncaught exceptions in other parts of the code, not to replace all kinds of error handling. Here it makes sense to output a call trace.

That's more or less the current situation, with terminate_on_expected_error() (previously Sympa::Log::fatal_err()) and terminate_on_unexpected_error() (previously the Sympa::Site signal handler), in Sympa::Tools::Daemon, except that I didn't change anything in their original behaviour.
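
The split between the two kinds of handling could be sketched as follows. This is a minimal illustration in Python, not the Perl code of Sympa::Tools::Daemon: the exception classes, format_failure() and run() are hypothetical names, and the exit codes are arbitrary. Expected errors get a short, specific message without a trace; anything else falls through to the safety net, which does include the call trace.

```python
import sys
import traceback


class ExpectedError(Exception):
    """An anticipated, unrecoverable failure with a human-readable message."""


class DatabaseUnavailable(ExpectedError):
    pass


class OutdatedSchema(ExpectedError):
    pass


def format_failure(exc: BaseException) -> str:
    """Build the message to report for a fatal error."""
    if isinstance(exc, ExpectedError):
        # The issue is already pinpointed: no stack trace needed.
        return f"fatal: {exc}"
    # Safety net: the cause is unknown, so include the call trace.
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return f"fatal (unexpected): {exc!r}\n{trace}"


def run(main) -> int:
    """Run main() and turn any failure into a message and an exit code."""
    try:
        main()
    except ExpectedError as exc:
        print(format_failure(exc), file=sys.stderr)
        return 1
    except Exception as exc:
        print(format_failure(exc), file=sys.stderr)
        return 2
    return 0
```

A caller would raise, say, OutdatedSchema("database available, but its schema is outdated") at the point where the problem is detected, and end the program with sys.exit(run(main)).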

Also, as long as the configuration is not loaded, the logging subsystem is essentially limited to STDERR.

--
Guillaume Rousse
INRIA, Direction des systèmes d'information
Domaine de Voluceau
Rocquencourt - BP 105
78153 Le Chesnay
Tel: 01 39 63 58 31
