Saturday, June 13, 2009

Using Query Caching in Xataface

Version 1.2 of Xataface introduces a new query caching feature.  When enabled, it causes Xataface to cache the results of database queries on the filesystem so that MySQL doesn't have to be dealing with the same queries over and over.  Between this feature and APC your application can almost get away with never touching the database during its requests.

How it works
You enable query caching by adding the following directive to the beginning of your conf.ini file:  
cache_queries=1
Once it is enabled Xataface will begin checking the cache every time an SQL query is about to be performed.  If it has a cached result that is more recent than the update times of the tables involved in the query, then it simply loads that cached result.  This is particularly useful when queries are performed that are quite intensive.  Previously Xataface had to perform the MySQL query on every single request (if the search took 30 seconds, then you would have to wait 30 seconds on every request).   Now it only needs to perform the query the first time, and subsequent requests will simply load if from the cache.

This feature should be used on any production site where performance is important.

Where are the Queries Cached?
The queries are stored inside your application's templates_c directory in a directory called "query_results".  This directory is created owned by the webserver with 0700 privileges (so nobody else can see them) and it contains an .htaccess file to prevent apache from serving the results.  The results are stored as PHP serialized arrays.

How to know if the cache is working
Once you enable the cache you won't really see any difference in your application except that it should be a bit more snappy and present less of a load to your server.  If you want to confirm that your queries are indeed being cached, you can enable the cache log by adding the following directive to the beginning of your conf.ini file:
cache_queries_log=1
This causes a log to be written to /tmp/querylog.log that tells you which queries are being executed, whether they used the cache, and which tables each query is dependent upon.

Table Modification Times and Dependencies
In order to determine whether a result cache is current, Xataface checks the table modification times of the tables that are used in a query and compares them to the cached result modification time.  If the cache is newer, then it uses the cache.  For example, for the query:
select * from foo inner join bar where fooid=10
Xataface would check the modification times of the tables foo and bar and compare these to the cache.  You can also specify table dependencies (i.e. a list of other tables to compare against) by way of the __dependencies__ directive in the fields.ini file.    E.g. Suppose that for any queries involving table foo, we also want to compare the cache modification time to the table foo2.  Then we could add the following to the beginning of the foo table fields.ini file:
__dependencies__=foo2
Note that this directive can take multiple tables as a comma-delimited list.
__dependencies__="foo2,foo3,foo4"
Gotchas: Queries involving Views
MySQL makes query caching of queries that include views a little bit more difficult because it doesn't actually track a modification time for a view.  Hence if your query uses a view, Xataface will NOT use the query cache by default.   You can, however, remedy this issue by specifying the source tables of the view in the view's fields.ini file by way of the __source_tables__ directive.

E.g. suppose I have a view named foo defined as follows:
create view foo as select * from foo2,foo3
Then clearly the modification time of foo is the latter of the modification times for foo2 and foo3, so let's specify this in our foo fields.ini file (tables/foo/fields.ini):
__source_tables__ = "foo2,foo3"
This small change will allow Xataface to perform query caching on queries involving the foo view.

Using the Query Cache Programmatically
When performing your own queries from PHP you can take advantage of the query cache.  All queries performed using the df_query() function in array mode will make use of the query cache.  (i.e. the 3rd parameter should be true).  In this mode, the df_query() function returns an array of rows in the result rather than a mysql result pointer as usual.  If you want your application to have the maximum performance it is a good idea to get in the habit of using this function for all of your application's database queries.

E.g.
instead of
$res = mysql_query("select * from foo", df_db());
while ($row = mysql_fetch_assoc($res) ){
    //... do something with the row
}
we would have
$rows = df_query("select * from foo", null, true);
foreach ($rows as $row){
    // ... do something with the row
}

No comments:

Post a Comment