devel - Re: [sympa-developpers] SQLSource encoding

Subject: Developers of Sympa

List archive

Re: [sympa-developpers] SQLSource encoding

From: David Verdin <address@concealed>
To: address@concealed
Subject: Re: [sympa-developpers] SQLSource encoding
Date: Wed, 31 Jul 2013 14:51:25 +0200

Le 29/07/2013 14:16, Guillaume Rousse a écrit :

Le 27/07/2013 17:30, IKEDA Soji a écrit :
[..]

I can mislead this issue. I guessed some DB engines implement
"statement caches" in themselves, and DBI::prepare() or DBI::bind_param()
is a wrapper of them. I couldn't find answer from Oracle document.

AFAIK, that's just the normal low-level way to execute queries, and doesn't involve caching, just retrieving a pointer over the parsed query:
http://www.sqlite.org/capi3ref.html#sqlite3_prepare
http://dev.mysql.com/doc/refman/5.0/fr/mysql-stmt-prepare.html
http://www.postgresql.org/docs/8.1/static/libpq-exec.html

The caching occurs in our memory space, by keeping the relationship between the original SQL query string, and the corresponding pointer.

Why I pointed this issue was that, we should present to the users
expected amount of engine-side cache requested by Sympa.

Unfortunatly, I don't think it is mesurable. Additionaly, I think pre-compiled queries is just yet another kind of database server resource usage. As much as data storage space, queries rate, simultaneous connections, etc... I don't think that's worth a specific focus.

I'm far from a database expert, tough, I may be wrong here.

Étienne had a thought that was I think insightful: He said that, supposing we try to cache more queries than the RDBMS can handle, it will probably simply stop caching it before being overwhelmed by memory consumption. So we would probably just get back to non cached queries and certainly not break the RDBMS engine.

From the calling context, we can sometimes explicitely decide of the
best strategy:
1) we're sure than the query won't be used more than once in the process
lifetime (ALTER queries, for instance): it should not be cached
2) we're reasonably sure than the query will be used more than once
(INSERT queries in Sympa::Log::Dabatase, for instance): it should be cached
3) the query doesn't have any variable part (for instance, no
placeholder): it should not be cached
etc...

With the limitation than the current API doesn't allow to use SELECT
queries without caching them (that should be fixed), that is what I
tried to achieve so far.

Now, we could also put intelligence in the database object, and use some
kind of cache expiration strategy to avoid it growing indefinitly. For
instance, limiting its size, or imposing some kind of TTL on cached
handles. The DBI documentation suggest to use Tie::Cache::LRU, for instance.

However, I wouldn't reach into technical solution as long as there isn't
any evidence of the problem it is supposed to cure...

- Before any expressive ORM is integrated into Sympa, do_query()
function to execute raw SQL seems to be required.

I'm not sure about what you cal 'raw SQL'. Both execute_query() and
get_query_handle() takes an SQL string as main argument.

Constructing this SQL string is still the responsability of calling
code. But whereas you previously inserted '%s' placeholders, and
transmitted quoted values to do_query() function, now you insert '?' as
placeholder, and you either pass values to bind to execute_query(),
either you perform binding through $handle->execute() directly.

# SELECT query
my $query = "SELECT * FROM table WHERE id = ?";
my $handle = $database->get_query_handle($query);
$handle->execute($id);
while (my @row = $handle->fetchrow()) {
....
}

# non-SELECT query
my $query = "DELETE FROM table WHERE id = ?";
my $rows = $database->execute_query($query, $id);

We can simply add a switch to enable/disable caching. Because
in some cases DELETE or INSERT would be cached, and SELECT in
examples below may be difficult to determine whether cache would
be used or not.

caching DELETE or INSERT is already possible: just use get_query_handle() instead of execute_query(). That's get_query_handle() that would eventually need such as switch.

[..]

Some queries may have variable WHERE (with built-in functions),
ORDER BY or LIMIT clause, e.g. when the user shows logs or spools.
In these cases each statement will be used nearly once.

That's the 'no use for caching' case, I perfectly agree with you.

In such a case, correctly built indexes woul probably be enough to improve performances.
We should however not put indexes on volatile data or - at least - volatile data with large data.

Re: [sympa-developpers] SQLSource encoding , (continued)
- Re: [sympa-developpers] SQLSource encoding, IKEDA Soji, 07/25/2013

List archive

Re: [sympa-developpers] SQLSource encoding