March 1, 2009 will

Fast Caching with Django and Nginx

I've been toying with optimizing the caching on my blog recently – for my own interest (this humble blog doesn't get all that much traffic). All the same, any speed improvements will only mean snappier page-loads and greater capacity to handle a slashdotting, or similar.

I discovered that Nginx has a memcached module that can serve pages directly from memcached without touching the file-system or a downstream web-app, which makes for very fast response times. To get it working, you need to store each page in memcached under a key derived from the URL of the page, with the HTML as the value. Alas, this means it does not work with Django's stock caching mechanism, because Django builds its cache keys partly from the contents of the request headers, which Nginx cannot easily reconstruct.

Still, the promise of serving up cached pages without even touching Django was most tempting. So I took the cache middleware from Django and butchered it so that it created a simple enough cache key for Nginx to handle.

Here's the code that generates a cache key, given a request:

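from django.conf import settings

def get_cache_key(request, key_prefix=None):
    # Build a key Nginx can reproduce: "<prefix>.<path>", with no header hashing
    if key_prefix is None:
        key_prefix = settings.CACHE_MIDDLEWARE_KEY_PREFIX
    cache_key = '%s.%s' % (key_prefix, request.path)
    return cache_key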
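
For the write side, here is a minimal sketch of how a rendered page might be stored under that kind of key. It is not the Techblog code: `DictCache`, `make_cache_key`, `store_page`, and the 600-second timeout are hypothetical names and values chosen for illustration, and the dict-backed cache merely stands in for a memcached client. Only successful GET responses are stored, since Nginx will serve whatever it finds under the key.

```python
class DictCache:
    """In-memory stand-in for a memcached client (illustration only)."""
    def __init__(self):
        self.data = {}

    def set(self, key, value, timeout=None):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def make_cache_key(path, key_prefix=""):
    # Same shape as get_cache_key above: "<prefix>.<path>"
    return "%s.%s" % (key_prefix, path)


def store_page(cache, method, path, status, html, key_prefix="", timeout=600):
    """Store the rendered HTML only for successful GET requests.

    Returns the key used, or None if the response was not cached.
    The timeout of 600 seconds is an arbitrary assumption.
    """
    if method != "GET" or status != 200:
        return None
    key = make_cache_key(path, key_prefix)
    cache.set(key, html, timeout)
    return key


cache = DictCache()
store_page(cache, "GET", "/blog/post/", 200, "<html>post</html>")
store_page(cache, "POST", "/blog/comment/", 200, "<html>thanks</html>")
```

After these two calls only the GET page is in the cache, under the key `./blog/post/`.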

The following snippet, taken from my Nginx conf file, sets a variable called $memcached_key that matches the cache_key generated in the Python code (the key is .$uri, so this assumes an empty key prefix). If that key exists in memcached the page is served directly; otherwise the 404 from the memcached lookup is redirected to /dynamic, which proxies through to the Django app.

        location / {
            set $memcached_key .$uri;
            memcached_pass 127.0.0.1:11211;
            default_type text/html;
            error_page 404 = /dynamic$uri;
        }
        location /dynamic {
            proxy_pass http://127.0.0.1:80/;
            include /etc/nginx/proxy.conf;
        }
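
To make the control flow concrete, here is a rough Python model of what the two location blocks do, assuming the key prefix is empty. The names `serve`, `cache`, and `backend` are illustrative only, and a plain dict stands in for memcached. Note that $uri does not include the query string, so /page/?a=1 and /page/?a=2 would share one cache entry.

```python
def serve(uri, cache, backend):
    """Model of the Nginx config: try memcached first, fall back on a miss."""
    key = "." + uri                 # set $memcached_key .$uri;
    html = cache.get(key)           # memcached_pass 127.0.0.1:11211;
    if html is not None:
        return "memcached", html
    # A miss surfaces as a 404, which error_page redirects to /dynamic$uri
    return "django", backend(uri)


cache = {"./about/": "<html>cached about page</html>"}
backend = lambda uri: "<html>rendered %s</html>" % uri

hit = serve("/about/", cache, backend)
miss = serve("/contact/", cache, backend)
```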

Unfortunately, my butchering of Django's cache code meant that it would no longer handle dynamic pages. I compensated for this by rewriting any URLs with dynamic content so that they went directly to the app.

        rewrite /xhr/ /dynamic$uri;
        rewrite /search/ /dynamic$uri;

Yet another casualty was Django auth – I couldn't access any pages as a logged in user. The work-around for this was to create a sub-domain that pointed at the same IP, but didn't do the caching via Nginx. That way, I can use the sub-domain for any site admin work.

So far, this Frankenstein cache mechanism seems to be working nicely – pages are served as rapidly as memcached can pluck them out of memory.

If you would like to see the code, Django Techblog is open source. It was meant to satisfy my own needs in a blogging engine, but hopefully it will be of use to others. And it would be very cool if there were other Techblogs on the interwebs!

Justin Lilly
Great article. Not sure if I like “butchering” cache code, but interesting either way. Do you have any benchmarks of this way of caching versus any other caching method built with Django?
Andreas
I did some extensive benchmarking of serving HTML from memcached vs. static HTML from Nginx a couple of weeks ago. I took the following code: http://soyrex.com/blog/django-nginx-and-memcached/

and put it to the test against staticgenerator: http://github.com/JaredKuolt/staticgenerator/

I was not so surprised by the results. Static content is roughly 2.7x faster than serving from memcached with Nginx.

Request rate: 6243.8 req/s (0.2 ms/req) (static html)
Request rate: 2285.5 req/s (0.4 ms/req) (same html in memcache)
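
As a quick check on those two figures, the ratio and per-request times can be worked out directly. The numbers are the ones quoted above; the variable names are just for illustration.

```python
static_rps = 6243.8     # req/s, static HTML
memcached_rps = 2285.5  # req/s, same HTML in memcached

speedup = static_rps / memcached_rps
static_ms = 1000 / static_rps        # milliseconds per request
memcached_ms = 1000 / memcached_rps

print("speedup: %.2fx" % speedup)
print("static: %.2f ms/req, memcached: %.2f ms/req" % (static_ms, memcached_ms))
```

That works out to a speedup of about 2.73x, with roughly 0.16 ms and 0.44 ms per request respectively.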

You see, Linux and *BSD systems cache files in RAM, so it's only the first hit that touches the file system. Also, from what I could tell from my testing, Nginx needs to do a TCP handshake with memcached on every single request; there's no keep-alive.

That said, ~2200 reqs per second is still pretty damn good, and there are still pros to using memcached over pseudo-static generated files: you could put memcached on another machine, or you could have a cloud of memcached servers so it will scale in all directions. Static files will scale too, though one would have to fiddle with scp/rsync/nfs/<whatever tool to transfer files over network here>.

Also, I think it's a cleaner concept to put generated content in memory, because it's just a restart of the memcached daemon to make it all go away.

So I'd say there are pros and cons with both approaches, but it's a fact that Nginx handles static files a lot faster.
Will McGugan
Justin, no benchmarks – at least nothing scientific, I just watch ‘top’ for a bit.

Andreas, great write-up. I'd like to use the static generator approach, but my pages aren't quite static enough. For instance, the ‘recent comments’ module could be on just about every page. That would mean deleting a whole load of files whenever there was a comment. Doable, I guess. But there is something inelegant about it. Maybe next weekend I'll have another opportunity to play with it!
Andreas C
You could use this for auth, though there might still be a cookie with a sessionid after logout, even if the session ID is no longer valid.

if ($http_cookie ~* "sessionid=.{32}") {
    # proxy_pass inside "if" must not carry a URI part, so no trailing slash
    proxy_pass http://127.0.0.1:80;
    include /etc/nginx/proxy.conf;
}
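
The same test can be tried out in Python to see which Cookie headers would bypass the cache. This is only a sketch of how the regular expression behaves; `bypasses_cache` is a made-up helper name. As noted, a stale sessionid cookie left over after logout would still match.

```python
import re

# Same pattern as the Nginx `~*` (case-insensitive) match above
SESSION_RE = re.compile(r"sessionid=.{32}", re.IGNORECASE)


def bypasses_cache(cookie_header):
    """True if the Cookie header carries a sessionid followed by 32 characters."""
    return SESSION_RE.search(cookie_header or "") is not None


logged_in = bypasses_cache("csrftoken=abc; sessionid=" + "a" * 32)
anonymous = bypasses_cache("csrftoken=abc")
too_short = bypasses_cache("sessionid=abc")
```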

You could also borrow the staticgenerator approach with memcached and use signals to delete cached pages on model save.
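
A rough sketch of that idea, kept framework-free so it runs on its own. In a real project a function like `invalidate` would be connected to Django's `post_save` signal, and a memcached client's delete call would replace the dict operations. The `Post` class, the `invalidate` helper, and the choice of extra paths are all hypothetical stand-ins.

```python
class Post:
    """Stand-in for a Django model with get_absolute_url()."""
    def __init__(self, slug):
        self.slug = slug

    def get_absolute_url(self):
        return "/blog/%s/" % self.slug


def invalidate(cache, instance, extra_paths=("/",), key_prefix=""):
    """Delete cached pages for the saved object and any listing pages."""
    paths = [instance.get_absolute_url()] + list(extra_paths)
    for path in paths:
        # Keys follow the "<prefix>.<path>" scheme used in the post
        cache.pop("%s.%s" % (key_prefix, path), None)
    return paths


cache = {
    "./": "<html>home</html>",
    "./blog/nginx/": "<html>old</html>",
    "./blog/other/": "<html>other</html>",
}
invalidate(cache, Post("nginx"))
```

After the call, only the unrelated `./blog/other/` entry remains; the saved post's page and the home page would be regenerated on the next request.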

And to delete a whole load of files with staticgenerator you can do quick_delete('/', MyModel.objects.all()), which would delete / and the absolute URLs for all MyModels.

I have a half-baked fork of staticgenerator which uses memcached instead of static files; maybe I should upload it to GitHub. :)
Alex Holt
Hey Will..

Nice article. The way I solved the admin issue was by having configuration variables for the URL patterns that get cached, so that memcached never contains cached content for parts of the site that don't need caching. The code never caches the admin pages in memcached, and consequently Nginx always falls back on Django to render those pages.
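
Something along these lines, perhaps. The setting name `CACHEABLE_URL_PATTERNS`, the example patterns, and the `is_cacheable` helper are invented for illustration and are not taken from Alex's middleware.

```python
import re

# Hypothetical setting: only paths matching one of these are ever cached
CACHEABLE_URL_PATTERNS = [r"^/$", r"^/blog/"]


def is_cacheable(path, patterns=CACHEABLE_URL_PATTERNS):
    """True if the path matches any configured pattern."""
    return any(re.search(p, path) for p in patterns)


results = {p: is_cacheable(p) for p in ["/", "/blog/post-1/", "/admin/", "/search/"]}
```

With these patterns the home page and blog pages are cacheable, while /admin/ and /search/ never reach memcached and so always fall through to Django.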

Andreas is quite right: static content will be served much faster than middleware plus memcached. However, memcached has the added benefit of NOT filling up the hard disk on the server, which was a major reason that I initially wrote the memcached middleware. My VPS at the time had no disk space ;)