An ETag is a feature of HTTP that allows a web server to know whether content has changed since the last time the browser visited the page. The client sends the ETag from the cached page in a request header (If-None-Match). If the ETag in the header matches the current ETag, the server lets the browser know that the cached copy is up to date by sending back a 304 Not Modified response.
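
As a rough illustration of that exchange, here is a framework-free sketch of the check a server performs. The function name conditional_response and its arguments are made up for this example, and it ignores details a real server must handle, such as multiple ETags in one header, weak validators and the `*` wildcard.

```python
def conditional_response(if_none_match, current_etag, render_page):
    """Return (status, body) for a GET, honouring a single ETag.

    if_none_match -- value of the If-None-Match request header, or None
    current_etag  -- the ETag the server currently associates with the page
    render_page   -- callable that builds the page body (the expensive part)
    """
    if if_none_match is not None and if_none_match == current_etag:
        # The browser's cached copy is still good: send no body.
        return 304, b""
    # First visit, or the content has changed: send the full page.
    return 200, render_page()
```

Note that render_page is only called when the ETags differ, which is where the savings come from.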

The most natural way to build an ETag is to generate it from the HTML returned by the view, which I believe is how the default view caching works in Django. The downside is that the page is still generated even if the client has a cached copy; all that is saved is the cost of sending the page to the client.
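
To make the downside concrete, here is a minimal sketch of a content-derived ETag. It is not Django's implementation; the names etag_from_content and serve are invented for the example, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def etag_from_content(body):
    """Build a quoted ETag from the rendered page body (bytes)."""
    return '"%s"' % hashlib.sha256(body).hexdigest()


def serve(if_none_match, render_page):
    """Return (status, etag, body) using a content-derived ETag."""
    body = render_page()  # the expensive work happens on every request
    etag = etag_from_content(body)
    if if_none_match == etag:
        # Only the transfer is saved; the page was already rendered.
        return 304, etag, b""
    return 200, etag, body
```

Even when the response is a 304, render_page has already run, so only bandwidth is saved.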

Bigger wins can be had by using Django's conditional view processing to calculate an ETag outside of the view. I haven't seen the requirements documented, but as far as I can tell there is only a single property needed in an ETag:

  • The ETag should vary with the page, i.e. when the page content changes, the ETag changes.

A simple alternative to generating an ETag from the page content is to store a version number on the model, or models, that are used to generate the page. If this version number is incremented each time the page content changes, then the version number itself can be used as the ETag.

If you want to keep a version number for a Django model, store it as an IntegerField and increment it in the model's save method, or hook up a post_save signal handler and do it there.
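
Stripped of the Django specifics, the scheme looks roughly like this. Desktop here is a plain Python stand-in for a model, not real Django code: in an actual model, version would be an IntegerField and save would finish by calling the parent class's save.

```python
class Desktop:
    """Plain-Python stand-in for a model that tracks its own version."""

    def __init__(self, content):
        self.content = content
        self.version = 0

    def save(self):
        # Bump the version on every save, as the model's save() would.
        self.version += 1

    def etag(self):
        # The version number itself serves as the ETag.
        return str(self.version)
```

Every call to save yields a different ETag, so a browser holding an older one gets a fresh page.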

Keeping a version number is easy to implement, but it has the disadvantage that you have to hit the database each time the view is accessed. It would be nicer still if the ETag could be generated without even that single DB query.

I have experimented with a simple method of doing this with Django's caching system. My ETags are random strings stored in the cache with a key that is created from the parameters to the view. When the DB object changes, the existing ETag is replaced with a freshly generated one.

Using a random string, unconnected with the model, may seem counter-intuitive, but the contents of the ETag are unimportant as long as it varies with the page.

Here's some example code taken from my current project.

from django.core.cache import cache
from django.views.decorators.http import etag


def get_etag_key(username, desktop_slug):
    """Build the cache key under which a desktop's ETag is stored."""
    return "desktopetag.%s.%s" % (username, desktop_slug)


def get_etag(request, username, desktop_slug):
    """Return the ETag for a given desktop, or None if there isn't one.

    The ETag itself is stored in the cache and is a random identifier.
    It is replaced whenever the desktop changes, so a stale ETag never matches.
    """
    etag_key = get_etag_key(username, desktop_slug)
    return cache.get(etag_key, None)


@etag(get_etag)
def desktop_view(request, username, desktop_slug):
    # An expensive view
    ...

The view desktop_view is a typical Django view, decorated with the etag decorator which simply calls the get_etag function to pluck the ETag (if it exists) from the cache – a very fast operation, particularly if memcached is deployed.

The other part of the system is the code that is called when the DB object is changed:

import random

etag_key = get_etag_key(username, desktop_slug)
cache.set(etag_key, str(random.random()))

The above code simply generates a random float and converts it to a string, which serves as a perfectly good ETag. If you are paranoid (and every good engineer is), you could also append the current timestamp (time.time() returns seconds since the epoch as a float) to avoid the possibility of re-generating the same random number.

import random
import time

etag_key = get_etag_key(username, desktop_slug)
cache.set(etag_key, str(random.random()) + str(time.time()))
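
An alternative worth considering, though it is not what the code above does, is to let the standard library's uuid module produce the token. That sidesteps the collision worry without mixing in the time. new_etag_value is a name invented for this sketch.

```python
import uuid


def new_etag_value():
    """Return a fresh random token suitable for use as an ETag value."""
    # uuid4() is randomly generated; .hex gives 32 hexadecimal characters.
    return uuid.uuid4().hex
```

It would slot into the invalidation code as cache.set(etag_key, new_etag_value()).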

So far, it seems like a pretty good system with negligible overhead, although I haven't yet used it in a production environment. There is a downside of course: if you are using an in-memory cache, like memcached, then your ETags will be lost when the server is power cycled, and subsequent pages will be regenerated even if there is a cached copy. As always, YMMV.
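
One possible mitigation, which is not part of the code above, is to create a new ETag whenever the cache comes up empty. With get_etag as written, an empty cache returns None, so there is nothing to compare against until the object next changes. The sketch below assumes only that the cache object has get and set methods. DictCache is a throwaway stand-in so the example runs on its own, and get_or_create_etag is a hypothetical helper name.

```python
import uuid


class DictCache:
    """Tiny stand-in exposing get/set; this is not Django's cache."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def get_or_create_etag(cache, etag_key):
    """Return the cached ETag, generating and storing one on a miss."""
    etag = cache.get(etag_key)
    if etag is None:
        # Cache was flushed (or never populated): start a new ETag.
        etag = uuid.uuid4().hex
        cache.set(etag_key, etag)
    return etag
```

The first request after a flush still regenerates the page, because the browser's old ETag no longer matches, but later requests can be answered with a 304 again.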

This blog post was posted to It's All Geek to Me on Monday July 20th, 2009 at 9:08PM
 

4 Responses to "ETag magic with Django"

  • July 21st, 2009, 8:17 a.m.

    Hi.

    Nice post. But Etag is not a “feature of HTML” instead is a feature of the mighty HTTP protocol ;-)

  • July 21st, 2009, 9:08 a.m.

    Alex, you are absolutely correct, of course! Corrected.

  • bob
    July 29th, 2009, 4:11 a.m.

    I thought Etag was HTML too.

    http://www.djangoproject.com/

  • Reesun Huang
    April 15th, 2010, 6:33 a.m.

    Hi, Will:

    I'm a green hand.

    I build a proxy server with nginx, whenever a require reach it will search memcached to find the file, e.g. “www.gif”, but there isn't etag in header.

    Look forward your reply and thank you very muck!

    reesun

Will McGugan

My name is Will McGugan. I am an unabashed geek, an author, a hacker and a Python expert – amongst other things!

© 2008 Will McGugan.
