July 20, 2009 will

ETag magic with Django

An ETag is a feature of HTTP that lets a web server know whether content has changed since the last time the browser visited the page. The client sends the ETag from its cached copy of the page in a request header (If-None-Match). If the ETag in the header matches the current ETag, the server lets the browser know that the cached copy is up to date by sending back a 304 Not Modified response.
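To make the exchange concrete, here is a minimal sketch of the server-side decision in plain Python. It is not Django's code: the function name `conditional_response` and the `render_page` callback are made up for illustration, and it only handles a single strong ETag in If-None-Match, not lists or weak validators.

```python
def conditional_response(request_headers, current_etag, render_page):
    """Return (status, headers, body) for a GET request.

    `render_page` is a hypothetical callable that builds the page body.
    """
    headers = {"ETag": current_etag}
    # The browser echoes back the ETag of its cached copy in If-None-Match.
    if request_headers.get("If-None-Match") == current_etag:
        # Cached copy is current: send no body and skip rendering entirely.
        return 304, headers, b""
    # No cached copy, or a stale one: render and send the full page.
    return 200, headers, render_page()
```

Note that `render_page` is only called on the 200 path, which is where the saving comes from when the ETag can be computed cheaply.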

The most natural way to build an ETag is to generate it from the HTML returned by the view, which I believe is how the default view caching works in Django. The downside of this is that the page is generated even if the client has a cached copy, and all that is saved is the cost of sending the page to the client.
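As a rough illustration of the content-based approach, an ETag can be derived by hashing the rendered HTML. This is a sketch of the general idea rather than a copy of Django's implementation; `content_etag` is a hypothetical helper and MD5 is just one reasonable choice of hash.

```python
import hashlib

def content_etag(html):
    """Derive a quoted ETag from the rendered page (hypothetical helper)."""
    digest = hashlib.md5(html.encode("utf-8")).hexdigest()
    # ETag header values are quoted strings.
    return '"%s"' % digest
```

The catch is visible in the signature: the function needs the finished HTML, so the view has to run before the ETag can be compared.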

Bigger wins can be had by using Django's conditional view processing to calculate an ETag outside of the view. I haven't seen the requirements documented, but as far as I can tell there is only a single property needed in an ETag:

  • The ETag should vary with the page, i.e. when the page content changes, the ETag changes.

A simple alternative to generating an ETag from the page content is to store a version number on the model, or models, that are used to generate the page. If this version number is incremented each time the page content changes, then the version number itself can be used as the ETag.

If you want to keep a version number for a Django model, store it as an IntegerField and increment it in the model's save method, or hook up a post_save signal handler and do it there.
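The version-number idea can be sketched without any Django machinery. The class below is a plain-Python stand-in for a model, not real Django code: `VersionedPage` and `version_etag` are hypothetical names, and in an actual model `version` would be an IntegerField and the increment would sit in an overridden save method.

```python
class VersionedPage:
    """Plain-Python stand-in for a Django model with a version counter."""

    def __init__(self, content):
        self.content = content
        self.version = 0

    def save(self):
        # Bump the counter on every save, so any change to the content
        # that gets persisted also changes the version.
        self.version += 1
        # A real model would write the row to the database here.

def version_etag(page):
    """Use the version number itself as the ETag."""
    return str(page.version)
```

Two saves of the same object yield "1" and then "2", so a browser holding the older ETag gets a fresh page.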

Keeping a version number is easy to implement, but it has the disadvantage that you have to query the database each time the view is accessed. It would be nicer still if the ETag could be generated without even that single DB query.

I have experimented with a simple method of doing this with Django's caching system. My ETags are random strings stored in the cache with a key that is created from the parameters to the view. When the DB object changes, the existing ETag is replaced with a freshly generated one.

Using a random string, unconnected with the model, may seem counter-intuitive, but the contents of the ETag are unimportant as long as it varies with the page.
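Stripped of Django, the scheme looks roughly like this. The dict is only a stand-in for a real cache backend, and the names `make_key`, `refresh_etag` and `lookup_etag` are invented for this sketch.

```python
import random

cache = {}  # stand-in for a real cache such as memcached (illustration only)

def make_key(*view_args):
    """Build a cache key from the parameters of the view."""
    return "etag." + ".".join(str(arg) for arg in view_args)

def refresh_etag(*view_args):
    """Call whenever the underlying object changes."""
    new_etag = str(random.random())
    cache[make_key(*view_args)] = new_etag
    return new_etag

def lookup_etag(*view_args):
    """Cheap lookup done on every request; None if nothing is cached."""
    return cache.get(make_key(*view_args))
```

The lookup never touches the database, and a missing entry simply means no ETag is available for that request.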

Here's some example code taken from my current project.

from django.core.cache import cache
from django.views.decorators.http import etag

def get_etag_key(username, desktop_slug):
    # Build the cache key from the parameters to the view
    etag_key = "desktopetag.%s.%s" % (username, desktop_slug)
    return etag_key

def get_etag(request, username, desktop_slug):
    """ Create an etag for a given desktop.
    The etag itself is stored in the cache and is a random identifier.
    The cached etag is changed when the desktop changes, so it is always unique.
    """
    etag_key = get_etag_key(username, desktop_slug)
    # Returns None if no etag has been cached for this desktop yet
    etag = cache.get(etag_key, None)
    return etag

@etag(get_etag)
def desktop_view(request, username, desktop_slug):
    # An expensive view (body omitted)
    ...

The view desktop_view is a typical Django view, decorated with the etag decorator which simply calls the get_etag function to pluck the ETag (if it exists) from the cache – a very fast operation, particularly if memcached is deployed.

The other part of the system is the code that is called when the DB object is changed:

import random

etag_key = get_etag_key(username, desktop_slug)
cache.set(etag_key, str(random.random()))

The above code simply generates a random float and converts it to a string, which serves as a perfectly good ETag. If you are paranoid (and every good engineer is), you could also append the current timestamp to avoid the possibility of re-generating the same random number.

import random
import time

etag_key = get_etag_key(username, desktop_slug)
cache.set(etag_key, str(random.random()) + str(time.time()))

So far, it seems like a pretty good system with negligible overhead, although I haven't yet used it in a production environment. There is a downside of course: if you are using an in-memory cache, like memcached, then your ETags will be lost when the server is power cycled, and subsequent pages will be regenerated even if the client has a cached copy. As always, YMMV.

Nice post. But Etag is not a “feature of HTML” instead is a feature of the mighty HTTP protocol ;-)
Will McGugan
Alex, you are absolutely correct, of course! Corrected.
I thought Etag was HTML too.
Reesun Huang
Hi, Will:

I'm a green hand.

I built a proxy server with nginx. Whenever a request reaches it, it searches memcached for the file, e.g. “www.gif”, but there isn't an ETag in the header.

I look forward to your reply, and thank you very much!