Netstring theory
Saturday, January 19th, 2008
I have created a Google Code project for my Python netstring module.
So what is a netstring? A netstring is a way of encoding strings of data in a file or network stream. The classic way of doing this is to terminate the string with a special character, such as carriage return, line-feed or a null byte. But this means that when reading the encoded data you have to check every character in the stream to see if it is the terminator character — which can be inefficient. It also makes it impossible to encode a string that contains the terminator character, because it will be incorrectly interpreted as the end of the string. Netstrings solve both these problems by encoding the size of the string up-front.
A string encoded as a netstring consists of the length of the string in bytes, written as ASCII decimal digits, followed by a colon, the string itself and a comma character. For example, here is my name encoded as two netstrings:
4:Will,7:McGugan,
It’s a very simple protocol, but it can simplify writing file formats (Not everything need be XML!) and encoding network streams. For the official documentation on netstrings see http://cr.yp.to/proto/netstrings.txt.
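To make the format concrete, here is a minimal sketch of the encoding step in Python 3. It is not the code from the netstring module; the function name encode_netstring is made up for illustration, and it assumes the payload is already a bytes object.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as <length>:<payload>, (hypothetical helper, not the module's API)."""
    # The length is the number of bytes in the payload, written in ASCII decimal digits.
    length_prefix = str(len(payload)).encode("ascii")
    return length_prefix + b":" + payload + b","


# Two netstrings written back to back.
encoded = encode_netstring(b"Will") + encode_netstring(b"McGugan")

# A payload may contain the comma itself, because the length says where it ends.
with_comma = encode_netstring(b"a,b")
```

Because the reader knows the length up-front, the comma inside b"a,b" is never mistaken for the end of the string.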
You can install the netstring module with the following command:
easy_install netstring
You can encode individual netstrings with the netstring.encode function, but it will probably be simpler to use the netstring.FileEncoder class, which takes a writable file-like object as its only parameter; you then call its write method to encode and write netstrings to the file. For network streams I would suggest writing a small proxy class that implements a write method that calls the socket's send method.
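The proxy class suggested for network streams might look something like the sketch below. The class name SocketWriter is hypothetical, and it uses the socket's sendall method rather than send, since sendall keeps going until every byte has been handed to the network. The short demonstration uses socket.socketpair from the standard library so that it runs without a real server.

```python
import socket


class SocketWriter:
    """Hypothetical adapter that gives a socket a file-like write method."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock

    def write(self, data: bytes) -> None:
        # sendall() repeats send() internally until all of data is transmitted.
        self._sock.sendall(data)


# Demonstration with a connected pair of local sockets.
left, right = socket.socketpair()
writer = SocketWriter(left)
writer.write(b"4:Will,")
received = right.recv(1024)
left.close()
right.close()
```

An object like this could then be handed to netstring.FileEncoder in place of a file, assuming the encoder needs nothing from it beyond a write method.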
Decoding the netstring file / stream is done with the netstring.Decoder class. Simply create a Decoder object, then call its feed method with data from the stream. The feed method is a generator that yields strings as they are decoded. Note that the data you feed to the decoder need not be whole netstrings; it could be a portion of a netstring or a group of netstrings. The decoder will buffer the data until it has decoded a full string, or strings. This means that feed may yield zero or more strings, depending on the data you give it.
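The buffering behaviour is easier to see in code. What follows is a rough sketch of how such a decoder could work, written for Python 3 with bytes; it is not the implementation of netstring.Decoder, and the class name SketchDecoder is made up for illustration.

```python
class SketchDecoder:
    """Illustrative incremental netstring decoder (not the module's Decoder)."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, data: bytes):
        """Add data to the buffer and yield every netstring completed so far."""
        self._buffer += data
        while True:
            colon = self._buffer.find(b":")
            if colon == -1:
                return  # the length prefix has not fully arrived yet
            prefix = self._buffer[:colon]
            if not prefix.isdigit():
                raise ValueError("malformed length prefix")
            length = int(prefix)
            end = colon + 1 + length  # index of the trailing comma
            if len(self._buffer) < end + 1:
                return  # wait for the rest of the payload and the comma
            if self._buffer[end:end + 1] != b",":
                raise ValueError("missing trailing comma")
            payload = self._buffer[colon + 1:end]
            self._buffer = self._buffer[end + 1:]
            yield payload


decoder = SketchDecoder()
# The first chunk ends in the middle of a netstring, the second completes it
# and carries a whole second one, the third is an empty string.
chunks = [b"4:Wi", b"ll,7:McGugan,", b"0:,"]
results = [list(decoder.feed(chunk)) for chunk in chunks]
```

Here the first call to feed yields nothing, the second yields two strings and the third yields one empty string, which is the "zero or more" behaviour described above.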
The following pseudo-code shows how you might use netstrings as the basis of a network stream (which is the purpose they were intended for).
import netstring

decoder = netstring.Decoder()

# Assumes that 'sock' is a previously opened socket
while True:
    data = sock.recv(1024)
    if not data:
        break
    for packet in decoder.feed(data):
        handle_packet(packet)
The license is Public Domain, even though Google Code claims otherwise (there was no setting for PD).