I have created a Google Code project for my Python netstring module.

So what is a netstring? A netstring is a way of encoding strings of data in a file or network stream. The classic way of doing this is to terminate the string with a special character, such as carriage return, line-feed or a null byte. But this means that when reading the encoded data you have to check every character in the stream to see if it is the terminator character -- which can be inefficient. It also makes it impossible to encode a string that contains the terminator character, because it will be incorrectly interpreted as the end of the string. Netstrings solve both these problems by encoding the size of the string up-front.

A string encoded as a netstring consists of the length of the string in bytes, written as ASCII decimal digits, followed by a colon, the string itself and a comma character. For example, here is my name encoded as two netstrings:

4:Will,7:McGugan,
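
To make the format concrete, here is a minimal sketch of the encoding step written from the description above. The function name encode_netstring is made up for illustration; this is not the netstring module's own code.

```python
def encode_netstring(data: bytes) -> bytes:
    """Frame data as <length>:<data>, (illustrative helper, not the module's API)."""
    # The length is the number of bytes, written as ASCII decimal digits.
    return str(len(data)).encode("ascii") + b":" + data + b","


# Two netstrings written back to back, as in the example above.
encoded = encode_netstring(b"Will") + encode_netstring(b"McGugan")
print(encoded)  # b'4:Will,7:McGugan,'
```

Because the length comes first, the payload may contain any byte at all, including colons and commas.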

It's a very simple protocol, but it can simplify writing file formats (Not everything need be XML!) and encoding network streams. For the official documentation on netstrings see http://cr.yp.to/proto/netstrings.txt.

You can install the netstring module with the following command:

easy_install netstring

You can encode individual netstrings with the netstring.encode function, but it will probably be simpler to use the netstring.FileEncoder class, which takes a writable file-like object as its only parameter. You then call its write method to encode and write netstrings to the file. For network streams I would suggest writing a small proxy class that implements a write method that calls the socket's send method.
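
A proxy class of that kind might look something like the sketch below. The class name SocketWriter is hypothetical, and it uses sendall rather than send, on the assumption that you want the whole buffer written (a plain send may transmit only part of it). The demonstration uses a local socket pair in place of a real connection and writes an already-encoded netstring directly, since the exact data types FileEncoder passes to write are not covered here.

```python
import socket


class SocketWriter:
    """Hypothetical adapter that gives a connected socket a file-like write()."""

    def __init__(self, sock):
        self._sock = sock

    def write(self, data):
        # sendall() keeps sending until every byte has been handed to the OS.
        self._sock.sendall(data)
        return len(data)


# Demonstration with a connected pair of local sockets.
left, right = socket.socketpair()
writer = SocketWriter(left)
writer.write(b"4:Will,")
received = right.recv(1024)
left.close()
right.close()
```

An object like this, having a write method, is the sort of thing you would hand to FileEncoder in place of a file.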

Decoding the netstring file / stream is done with the netstring.Decoder class. Simply create a Decoder object, then call its feed method with data from the stream. The feed method is a generator that yields strings as they are decoded. Note that the data you feed to the decoder need not be whole netstrings; it could be a portion of a netstring or a group of netstrings -- the decoder will buffer the data until it has decoded a full string, or strings. This means that feed may yield zero or more strings, depending on the data you give it.
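
To illustrate the buffering behaviour, here is a rough sketch of how an incremental decoder of this kind could work. SketchDecoder is an illustrative stand-in written for this post, not the source of netstring.Decoder, and its error handling is an assumption.

```python
class SketchDecoder:
    """Illustrative incremental netstring decoder (not the module's own code)."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        """Buffer data and yield each complete payload that is now available."""
        self._buffer += data
        while True:
            colon = self._buffer.find(b":")
            if colon == -1:
                # No colon yet: whatever we hold must be digits of the length.
                if self._buffer and not self._buffer.isdigit():
                    raise ValueError("invalid length prefix")
                return
            length_field = self._buffer[:colon]
            if not length_field.isdigit():
                raise ValueError("invalid length prefix")
            end = colon + 1 + int(length_field)  # index of the trailing comma
            if len(self._buffer) <= end:
                return  # payload or comma has not arrived yet
            if self._buffer[end:end + 1] != b",":
                raise ValueError("missing trailing comma")
            payload = self._buffer[colon + 1:end]
            self._buffer = self._buffer[end + 1:]
            yield payload


decoder = SketchDecoder()
first = list(decoder.feed(b"4:Wi"))            # partial netstring: []
second = list(decoder.feed(b"ll,7:McGugan,"))  # [b'Will', b'McGugan']
```

Since feed is a generator, nothing is buffered or decoded until you iterate over it, which is why the calls above are wrapped in list().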

The following pseudo-code shows how you might use netstrings as the basis of a network stream (which is the purpose they were intended for).

import netstring

decoder = netstring.Decoder()

# Assumes that 'sock' is a previously opened socket
while True:
    data = sock.recv(1024)
    if not data:
        break
    for packet in decoder.feed(data):
        handle_packet(packet)

The license is Public Domain, even though Google Code claims otherwise (there was no setting for PD).

This blog post was posted to It's All Geek to Me on Saturday January 19th, 2008 at 5:42PM
 

5 Responses to "Netstring theory"

  • Nick Moffitt
    January 19th, 2008, 8:17 p.m.

    The reason that there is no "Public Domain" option in google is that it's a strictly US-centric concept and it is not clear that even saying "I hereby put this work in the public domain." has the effect that you would expect in all US jurisdictions. In fact, it's possible that in many places this would be seen as an attempt to avoid certain responsibilities of Copyright (I know, I know...) and would revert immediately back to "all rights reserved" (which is the opposite of what you are trying to do).

    Your best bet is to just slog through the broken worldwide copyright system and use a license like this one: http://sam.zoy.org/wtfpl/

    It's hard to find that ambiguous in *any* jurisdiction. If you really want, you can probably disclaim warranty by adding "1. WHATEVER HAPPENS ISN'T MY FAULT."

  • January 20th, 2008, 11:05 p.m.

    You should check out sharesource (http://sharesource.org). They offer both SVN and Mercurial, and a lot more licence types (although I don't see public domain - but you could always request it to be added).

    The library itself looks interesting, and I may look at integrating it into my Django app.

  • January 21st, 2008, 5:13 a.m.

    The colon before the string is necessary as a known end marker for the string length integer. However, once the string begins the absolute length of it is known. Hence, the end of the current string, and thus the beginning of the next string is also known.

    With this being the case, it seems that the terminating comma is unnecessary. Besides just being nicer to look at, I'm thinking you may be using the comma for error checking... though it still feels superfluous.

    Disclaimer: I have no experience with network streams and am just inquiring out of interest.

  • January 21st, 2008, 10:53 a.m.

    Larry, your analysis is correct. The terminating comma is superfluous; it's there to conform to the spec. I'm guessing it is a form of syntactic sugar for mentally parsing netstrings. There is some advantage in using it for error checking, so that the decoder can be sure that it is actually being fed netstrings, and not some other data.

  • February 29th, 2008, 2:34 a.m.

    Find more discussion of netstrings here: http://wiki.tcl.tk/15074

Will McGugan

My name is Will McGugan. I am an unabashed geek, an author, a hacker and a Python expert – amongst other things!

© 2008 Will McGugan.
