Overzealous caching

Date March 26, 2006

As some of you who listen to my podcast may know, I recently took over a project at work: a PHP project. While I still have the Perl/RT system I maintain, more of my time lately has been spent in the PHP world, getting up to speed with the code my predecessor left. I’d been hearing about his work to speed things up: make the code run faster, squeeze extra milliseconds out of the pages, etc. However, when I started using the code on the dev server, some pages were *extremely* slow (15-30 seconds per page load). That made development on this area of the code simply unworkable. So I spent about 2 hours tracing exactly what the code was doing, and found that, by default, huge sets of data were being cached on the file system.

*Some* deserved to be, in that the datasets coming back took a long time to calculate on the database server (in my view, a ‘long time’ is more than 0.2 seconds in most cases). The data already had proper indexes, so indexing wasn’t the problem, and those datasets deserved to be cached (they didn’t change all that frequently, so caching them was worth it). However, many others didn’t deserve to be cached. Queries that took 20 ms were being cached on the file system, where seeking and loading the data took 150-200 ms. You can see where I’m going with this. By turning off the caching, I got a 20-second page load down to about 7 seconds. Still not great for testing, but certainly much better. I then turned on gzipped output buffering and got things down to about 2-3 seconds, which was even better. To be fair, this was much more of a problem on the dev machine, which is only 800 MHz (live is, I think, 3 GHz), so the speed problem isn’t as apparent on live. However, it’s a bear to develop like that.

The moral of the story is: don’t cache data unless you *really* need to. :) How can you tell? One idea that makes sense when caching SQL results is to store the query execution time as part of the cached data itself. When pulling from the cache, time that process as well, and have the cache system report whenever pulling from the cache takes longer than the original query took to execute. This won’t work in all cases, obviously, but if you’re looking to query or data caching to help speed up your code, take some time to engineer in some way of measuring the true impact of your design.
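
Here is a rough sketch of that idea. It is written in Python rather than PHP just to keep it short, and it is not the code from the project: the names `TimedFileCache`, `fetch`, `run_query`, and `slow_reads` are all made up for illustration, and it assumes a plain one-file-per-query cache with no expiry.

```python
import hashlib
import logging
import pickle
import time
from pathlib import Path

log = logging.getLogger("query_cache")


class TimedFileCache:
    """Hypothetical file cache that remembers how long each query took."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        # Each item is (key, cache_read_seconds, original_query_seconds).
        self.slow_reads = []

    def _path(self, key):
        # One file per query, named after a hash of the query text.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.cache"

    def fetch(self, key, run_query):
        """Return cached rows for key, or call run_query() and cache the result."""
        path = self._path(key)

        if path.exists():
            # Time the cache read so it can be compared with the query time.
            start = time.perf_counter()
            with path.open("rb") as handle:
                entry = pickle.load(handle)
            read_seconds = time.perf_counter() - start

            if read_seconds > entry["query_seconds"]:
                # The cache was slower than the query it is meant to avoid.
                self.slow_reads.append((key, read_seconds, entry["query_seconds"]))
                log.warning(
                    "cache slower than query for %r: read %.4fs, query %.4fs",
                    key, read_seconds, entry["query_seconds"],
                )
            return entry["rows"]

        # Cache miss: run the query, time it, and store the time with the data.
        start = time.perf_counter()
        rows = run_query()
        query_seconds = time.perf_counter() - start
        with path.open("wb") as handle:
            pickle.dump({"rows": rows, "query_seconds": query_seconds}, handle)
        return rows
```

You would call something like `cache.fetch(sql, lambda: run_the_query(sql))` wherever a cached query is issued (`run_the_query` being whatever your database layer provides), then look at `slow_reads` or the log after a few page loads. Any query that keeps showing up there is probably one that shouldn’t be cached at all. A single slow read can just be disk noise, so it’s the repeat offenders that matter.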
