February 15, 2009 · will

A better Caching system for Django

Django has pretty good support for caching, which is one of the easiest ways of speeding up a web application. The default is to cache for ten minutes, which means that if you get multiple requests for a page within a ten-minute window then Django can serve up a stored copy of the page without hitting the database or rendering the HTML. The caching period can be set per-page, and fragments of pages can be cached rather than the whole, but the system rests on the assumption that it doesn't matter if the content you serve is a little out of date for a while.
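For the simplest case this is just a decorator on the view; a minimal sketch, assuming a made-up view name and URL parameter:

```python
# Time-based, per-view caching: the view body only runs when the cached
# copy has expired.
from django.http import HttpResponse
from django.views.decorators.cache import cache_page

@cache_page(60 * 10)  # serve the stored response for up to ten minutes
def post_detail(request, post_id):
    return HttpResponse("rendered page for post %s" % post_id)
```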

Trouble is, not many web applications work that way. Consider a humble blog post with a piece of content and a list of comments. Granted, the post content isn't going to change very often and will probably never change again once the author has corrected all the typos, but the list of comments may change at any point through the lifetime of the blog. If the page were cached, new comments would not be visible on the page for a short period of time, which doesn't give that instant gratification that web users expect these days.

It would be nice if a web app could benefit from caching and still serve a fresh page when content changes. Alas, Django's default time-based caching is never going to achieve this. What is needed is a way of invalidating an item in the cache when there are changes that alter the content; using the blog example, this would be when the post content changes or a new comment is submitted. Django does give you the tools to do this -- using the low-level cache API, you can construct a cache key (a simple string) that changes according to the page content, then store and retrieve your content manually. For example, you might create a cache key from the modification date of the post and the ID of the most recent comment. Then the page need only be rendered when the cache key isn't found in the cache. The downside of this approach is that it can take a little work to calculate the key (you may still have to hit the database to do so). This is not quite as nice as time-based caching, which does negligible work for cached pages.
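To make that concrete, here is a rough sketch of the content-derived key idea; the `post.comments` relation, the `modified` field and the `render_post` helper are all assumptions for the sake of illustration:

```python
# The cache key itself changes when the post is edited or a comment is
# added, so a stale entry is simply never looked up again.
from django.core.cache import cache

def cached_post_page(post):
    latest = post.comments.values_list("id", flat=True).order_by("-id")[:1]
    latest_comment_id = latest[0] if latest else 0
    key = "post:%s:%s:%s" % (post.id, post.modified.isoformat(), latest_comment_id)

    html = cache.get(key)
    if html is None:
        html = render_post(post)            # hypothetical rendering function
        cache.set(key, html, 60 * 60 * 24)  # keep it around for a day at most
    return html
```

Note the query for the latest comment ID -- that's the "little work" mentioned above, paid on every request, cached or not.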

A better approach, which has the benefit of always serving a fresh page and still requires negligible work for cached pages, is to construct the cache key based on the request, then invalidate (i.e. delete) the cache key when the content changes. Again, this can be done with Django, although it has no explicit support for it. Such a system requires more work because the web app must track every event that could potentially require a fresh page to be generated, but it has the major advantage that up-to-date pages are virtually always going to be available in the cache.
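As a sketch, the request-keyed variant might look something like this; the key scheme and the `render_page` callable are assumptions, not a real API:

```python
# Request-keyed caching: a hit costs one cache lookup, and pages are only
# ever re-rendered after an explicit invalidation.
from django.core.cache import cache

def get_or_render(request, render_page):
    key = "page:%s" % request.path
    html = cache.get(key)
    if html is None:
        html = render_page(request)              # renders the full page
        cache.set(key, html, 60 * 60 * 24 * 30)  # long timeout; invalidation is explicit
    return html

def invalidate(path):
    """Call this whenever content that affects `path` changes."""
    cache.delete("page:%s" % path)
```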

I've written ad-hoc page-event cache systems a couple of times and they are a major headache to maintain. Even on simple pages there can be multiple events that may invalidate a cached page. What would be nice is some formal way of associating a model instance with the URL(s) that depend on it, so that when the instance is saved, the associated pages will be invalidated and regenerated at the next request.

I've not yet settled on the best way of implementing this, but it will probably be a decorator that builds up a mapping of object type plus primary key to its dependent URLs, and a mixin for models that hooks into the save method.
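Nothing is written yet, so the following is no more than a sketch of one possible shape, with entirely hypothetical names:

```python
# Register, per model class, functions that map an instance to the URLs
# whose cached pages depend on it, and drop those pages when it is saved.
from django.core.cache import cache

_dependencies = {}  # model class -> list of functions: instance -> [urls]

def invalidates(model_class):
    def decorator(get_urls):
        _dependencies.setdefault(model_class, []).append(get_urls)
        return get_urls
    return decorator

class CacheInvalidatingMixin(object):
    """Model mixin that hooks into save() to invalidate dependent pages."""

    def save(self, *args, **kwargs):
        super(CacheInvalidatingMixin, self).save(*args, **kwargs)
        for get_urls in _dependencies.get(self.__class__, []):
            for url in get_urls(self):
                cache.delete("page:%s" % url)
```

A comment model, say, could then register something like `lambda comment: [comment.post.get_absolute_url()]` so that saving a comment knocks its post's page out of the cache.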

I'm open to suggestions regarding the most elegant implementation, and any other potential solutions!

Ronald Hobbs

Hi, I'm still new to Django so I can't comment on implementation-specific details like the decorators. But your approach is interesting. I've taken a slightly different approach with caching: instead of caching the page as a single unit, I've divided it up into parts which are combined at runtime, kind of a 1st-level cache.

This doesn't give the same level of benefit as a front-facing cache, i.e. you still have to hit some code, but the level of gain depends on where the load in your system is, e.g. back-end-heavy systems would receive a lot of benefit.

This also allows portions of the page to remain in the cache for much longer, and combining it with your approach allows you to dynamically alter which portions of the page hit the cache and which hit the backend.

In your blog example, you could permanently cache the blog post, and invalidate the comments (and re-cache them) whenever a new comment is added.
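Roughly like this (the render helpers are just placeholders):

```python
# Cache the post body and the comment list as separate fragments and
# stitch them together at request time.
from django.core.cache import cache

def post_page(post):
    body_key = "post-body:%s" % post.id
    comments_key = "post-comments:%s" % post.id

    body = cache.get(body_key)
    if body is None:
        body = render_post_body(post)                    # placeholder helper
        cache.set(body_key, body, 60 * 60 * 24 * 30)     # long-lived fragment

    comments = cache.get(comments_key)
    if comments is None:
        comments = render_comment_list(post)             # placeholder helper
        cache.set(comments_key, comments, 60 * 60 * 24)  # deleted when a comment is added

    return body + comments
```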

It's a bit of a balancing act between how much space to use in the cache vs. the benefit of not hitting the backend.

teserak

You can use StaticGenerator http://superjared.com/projects/static-generator/

felix

You could use the save signal of the model. Simply delete the cache on save.

That would be better than messing with the save method.
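Something like this, roughly (the Comment model and the key scheme are made up):

```python
# Delete the cached page for a post whenever one of its comments is saved.
from django.core.cache import cache
from django.db.models.signals import post_save

from myblog.models import Comment  # hypothetical app and model

def clear_post_page(sender, instance, **kwargs):
    cache.delete("page:%s" % instance.post.get_absolute_url())

post_save.connect(clear_post_page, sender=Comment)
```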

Ferd

Traditionally, the biggest bottlenecks of any app end up being the database queries of all sorts.

In most cases, the processing you do in your app takes only a fraction of the database time (especially if things like your templates are pre-compiled).

The easiest way to improve performance with caching would be to cache the queries that fetch data in your view (a view in Django, a controller in other frameworks). Cache them indefinitely and then invalidate/update them when saving, updating or deleting content. Of course, this may not fit Django in the best way ever (I haven't worked with it in a long time), but it's possibly what's going to give you the fastest boost in performance without the users ever noticing out-of-sync data, while still leaving you all the control you want.
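For instance (the model and cache keys here are just for illustration):

```python
# Cache the query results themselves and invalidate on write, so readers
# never see stale data and the database is only hit after a change.
from django.core.cache import cache

from myblog.models import Comment  # hypothetical model

COMMENTS_KEY = "comments-for-post:%s"

def get_comments(post_id):
    key = COMMENTS_KEY % post_id
    comments = cache.get(key)
    if comments is None:
        comments = list(Comment.objects.filter(post__id=post_id))
        cache.set(key, comments, 60 * 60 * 24)  # long timeout; writes invalidate
    return comments

def add_comment(post_id, **fields):
    comment = Comment.objects.create(post_id=post_id, **fields)
    cache.delete(COMMENTS_KEY % post_id)  # the next read re-populates from the DB
    return comment
```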