Skip to Content.
Sympa Menu

devel - Re: [sympa-developpers] SQLSource encoding

Subject: Developers of Sympa

List archive

Chronological Thread  
  • From: IKEDA Soji <address@concealed>
  • To: address@concealed
  • Subject: Re: [sympa-developpers] SQLSource encoding
  • Date: Sun, 28 Jul 2013 00:30:22 +0900

Guillaume,

On Fri, 26 Jul 2013 11:21:53 +0200
Guillaume Rousse <address@concealed> wrote:

> Le 26/07/2013 00:36, IKEDA Soji a écrit :
> > On Thu, 25 Jul 2013 19:55:49 +0200
> > Guillaume Rousse <address@concealed> wrote:
> >
> >> Le 25/07/2013 19:25, IKEDA Soji a écrit :
> >>> On Thu, 25 Jul 2013 17:21:11 +0200
> >>> David Verdin <address@concealed> wrote:
> >>>
> >>>>
> >>>> Le 25/07/13 16:49, Guillaume Rousse a écrit :
> >>>>>
> >>>>>
> >>>>> Everything related to parameter binding in do_query and
> >>>>> do_prepared_query can be discarded, those methods don't exist anymore,
> >>>>> and binding is now delegated to DBI internals.
> >>>> As we speak about it: why not keeping a single function to do all the
> >>>> query business?
> >>>> We always do the same succession of calls to do a query to database. I
> >>>> refactored all them in do_query and even planned to cover the final
> >>>> mile
> >>>> by fetching data in this sub and sending the hashref or arrayref we
> >>>> needed actually in the code.
> >>>
> >>> I wonder whether non-binding version of do_query() may be discarded
> >>> or not.
> >>>
> >>> - Statement cache (either implemented in DB engine or emulated by DBI)
> >>> will be cleaned unexpectedly earlier, if queries by users for each
> >>> are prepared.
> >> What is cached is the unbinded query handle ("SELECT * FROM table WHERE
> >> id = ?"), resulting from prepare() call.
> >>
> >> I removed all calls to $handle->terminate(), excepted in Sympa::Database
> >> DESTROY method.
> >>
> >> All usage of query handle to read returned values immediatly follows a
> >> call to $handle->execute(), where final values binding actually occurs.
> >>
> >> I don't see any scenario where a query handle resulting from
> >> get_query_handle() would be cleaned (terminated ?), or used in an
> >> erroneous state. As long as we are in a non-threaded application, of
> >> course.
> >
> > I understood that get_query_handle() adds a "statement handle" to
> > the cache in DB engine (e.g. on Oracle). Is it right?
> > If it is right, I guessed that, once the amount of SQL statement
> > handled by DB engine (not Sympa::Database module) exceeds the size
> > predefined by DB engine, caching mechanism will become very
> > inefficient. Moreover, it might hit performance of other
> > applications using the DB engine.
> OK, I get your point. The statement handle cache itself is in our
> Sympa::Database object. As this relies on precompiling SQL queries,
> there will be indeed some kind of resources mobilized in the server, as
> long as the handle is not freed. So, theorically, it could lead to
> excessive resource usage on the database server, indeed.
>
> However, there isn't any mention of the issue in DBI documentation,
> which rather encourage generalized usage of pre-compiled queries.
> Re-reading carefully, I even found than DBI provide its own caching
> method (prepare_cached() instead of prepare()), and than it discourage
> usage of finish() on statement handles. All in all, it doesn't sound
> very much like 'be careful when accumulating prepared statement handles'...
>
> However, I agree we can do better than caching everything just because
> we can.

I can mislead this issue. I guessed some DB engines implement
"statement caches" in themselves, and DBI::prepare() or DBI::bind_param()
is a wrapper of them. I couldn't find answer from Oracle document.

Why I pointed this issue was that, we should present to the users
expected amount of engine-side cache requested by Sympa.

> From the calling context, we can sometimes explicitely decide of the
> best strategy:
> 1) we're sure than the query won't be used more than once in the process
> lifetime (ALTER queries, for instance): it should not be cached
> 2) we're reasonably sure than the query will be used more than once
> (INSERT queries in Sympa::Log::Dabatase, for instance): it should be cached
> 3) the query doesn't have any variable part (for instance, no
> placeholder): it should not be cached
> etc...
>
> With the limitation than the current API doesn't allow to use SELECT
> queries without caching them (that should be fixed), that is what I
> tried to achieve so far.
>
> Now, we could also put intelligence in the database object, and use some
> kind of cache expiration strategy to avoid it growing indefinitly. For
> instance, limiting its size, or imposing some kind of TTL on cached
> handles. The DBI documentation suggest to use Tie::Cache::LRU, for instance.
>
> However, I wouldn't reach into technical solution as long as there isn't
> any evidence of the problem it is supposed to cure...
>
> >>> - Before any expressive ORM is integrated into Sympa, do_query()
> >>> function to execute raw SQL seems to be required.
> >> I'm not sure about what you cal 'raw SQL'. Both execute_query() and
> >> get_query_handle() takes an SQL string as main argument.
> >>
> >> Constructing this SQL string is still the responsability of calling
> >> code. But whereas you previously inserted '%s' placeholders, and
> >> transmitted quoted values to do_query() function, now you insert '?' as
> >> placeholder, and you either pass values to bind to execute_query(),
> >> either you perform binding through $handle->execute() directly.
> >>
> >> # SELECT query
> >> my $query = "SELECT * FROM table WHERE id = ?";
> >> my $handle = $database->get_query_handle($query);
> >> $handle->execute($id);
> >> while (my @row = $handle->fetchrow()) {
> >> ....
> >> }
> >>
> >> # non-SELECT query
> >> my $query = "DELETE FROM table WHERE id = ?";
> >> my $rows = $database->execute_query($query, $id);

We can simply add a switch to enable/disable caching. Because
in some cases DELETE or INSERT would be cached, and SELECT in
examples below may be difficult to determine whether cache would
be used or not.

> > If we have to use e.g. get_limit_clause(), placeholder will not
> > work. Instead, we have to construct query string containing limit
> > clause then to execute it.
> It still works, because get_limit_clause() is self-contained, and
> returns a string without placeholders. The calling code doesn't care:
>
> my $query =
> "SELECT * FROM table WHERE id = ? " .
> get_limit_clause(rows_count => 1);
> my $handle = $database->get_query_handle($query);
> $handle->execute($id);
> while (my @row = $handle->fetchrow()) {
> ....
> }
>
> get_limit_clause() (and similar functions) could eventually get
> rewritten to return separatly a query string with placeholder, and a
> list a parameters, but that's too much complex to my taste, without
> proven advantage.

Some queries may have variable WHERE (with built-in functions),
ORDER BY or LIMIT clause, e.g. when the user shows logs or spools.
In these cases each statement will be used nearly once.

--
株式会社 コンバージョン セキュリティ&OSSソリューション部 池田荘児
〒231-0004 神奈川県横浜市中区元浜町3-21-2 ヘリオス関内ビル7F
e-mail address@concealed TEL 045-640-3550
http://www.conversion.co.jp/




Archive powered by MHonArc 2.6.19+.

Top of Page