Rotonym, what the heck is a rotonym?

January 22nd, 2008

Rotonyms are words that ROT-13 into other words. I discovered the term from this site, which I found during my morning Reddit session. I think the author invented the term, because the only reference I found to 'rotonym' came from his site. Long story short, I knocked out a Python script to find all the rotonyms, given a list of words (just for the heck of it). I like problems like this. It's like the kind of work you had to do in high school and uni, before you enter the real world of programming and discover that most problems have blurry definitions and it's hard to tell if your solution actually works in all cases. Here's the code; let me know if you have a better solution.

# Load the word list, one word per line
words = [line.rstrip() for line in open("WORD.LST")]
words_set = set(words)

# Map each lowercase letter to the letter 13 places along
letters = "abcdefghijklmnopqrstuvwxyz"
repl = dict(zip(letters, letters[13:]+letters[:13]))

def rot13(word):
    return "".join(repl.get(c, c) for c in word)

found = []

for word in words:
    rotword = rot13(word)
    if rotword in words_set:
        found.append( (word, rotword) )
        # Drop the word so the pair isn't found again in reverse
        words_set.discard(word)

# Sort by length, then alphabetically
found.sort( key=lambda pair:(len(pair[0]), pair[0]) )
for word, rotword in found:
    print "%s\t%s"%(word, rotword)

If you want to run it, you will need this word list. It's not the most interesting code you will see, but I was impressed at how much functionality Python squeezes into each line, without making it too obfuscated. It would be interesting to see this solved in other languages for comparison.
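One property worth spelling out: ROT-13 is its own inverse, so every rotonym pair would turn up twice (once from each side) if the first word of the pair were not removed from the set. Here is a small sketch of that property, written for Python 3 rather than the Python 2 used in the script, and using a translation table in place of the dict lookup:

```python
import string

# Translation table mapping each lowercase letter to the one 13 places along
lower = string.ascii_lowercase
ROT13 = str.maketrans(lower, lower[13:] + lower[:13])

def rot13(word):
    # Characters outside a-z are left unchanged
    return word.translate(ROT13)

print(rot13("green"))         # terra
print(rot13(rot13("green")))  # green
```

Applying rot13 twice shifts each letter by 26 places, which gets you back to where you started.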

Update: fs111 reminded me that there was a rot13 string encoding in Python. So the code becomes:

words = [line.rstrip() for line in open("WORD.LST")]
words_set = set(words)

found = []

for word in words:
    rotword = word.encode("rot13")
    if rotword in words_set:
        found.append( (word, rotword) )
        words_set.discard(word)

found.sort( key=lambda words:(len(words[0]), words[0]) )
for word, rotword in found:
    print "%s\t%s"%(word, rotword)
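The str.encode("rot13") call only works in Python 2. In Python 3 the same codec is reached through codecs.encode, so a rough equivalent might look like the sketch below. The find_rotonyms function name and the tiny sample list are made up for illustration; the real script reads WORD.LST.

```python
import codecs

def find_rotonyms(words):
    """Return (word, rotword) pairs, sorted by length then alphabetically."""
    words_set = set(words)
    found = []
    for word in words:
        # "rot_13" is a text-to-text codec, so it goes through codecs.encode
        rotword = codecs.encode(word, "rot_13")
        if rotword in words_set:
            found.append((word, rotword))
            # Drop the word so the pair isn't reported again in reverse
            words_set.discard(word)
    found.sort(key=lambda pair: (len(pair[0]), pair[0]))
    return found

# Stand-in for the word list file
sample = ["green", "terra", "python", "nop", "abc"]
for word, rotword in find_rotonyms(sample):
    print("%s\t%s" % (word, rotword))
```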
 

Netstring theory

January 19th, 2008

I have created a Google Code project for my Python netstring module.

So what is a netstring? A netstring is a way of encoding strings of data in a file or network stream. The classic way of doing this is to terminate the string with a special character, such as carriage return, line-feed or a null byte. But this means that when reading the encoded data you have to check every character in the stream to see if it is the terminator character -- which can be inefficient. It also makes it impossible to encode a string that contains the terminator character, because it will be incorrectly interpreted as the end of the string. Netstrings solve both these problems by encoding the size of the string up-front.

A string encoded as a netstring consists of the length of the string in ASCII, followed by a colon, the string itself and a comma character. For example, here is my name encoded as two netstrings:

4:Will,7:McGugan,
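To make the format concrete, here is a minimal sketch of the encoding step in Python 3. It is not the netstring module's own code; encode_netstring is a hypothetical helper written only to show the layout.

```python
def encode_netstring(data: bytes) -> bytes:
    # Length in ASCII decimal, a colon, the payload, then a comma
    return str(len(data)).encode("ascii") + b":" + data + b","

print(encode_netstring(b"Will") + encode_netstring(b"McGugan"))
# b'4:Will,7:McGugan,'

# The payload may itself contain commas or newlines, since the
# length prefix says exactly how many bytes to read.
print(encode_netstring(b"a,b"))
# b'3:a,b,'
```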

It's a very simple protocol, but it can simplify writing file formats (Not everything need be XML!) and encoding network streams. For the official documentation on netstrings see http://cr.yp.to/proto/netstrings.txt.

You can install the netstring module with the following command:

easy_install netstring

You can encode individual netstrings with the netstring.encode function, but it will probably be simpler to use the netstring.FileEncoder class, which takes a writable file-like object as its only parameter; you then call the write method to encode and write netstrings to the file. For network streams I would suggest writing a small proxy class that implements a write method that calls the socket's send method.
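The proxy class for sockets could be as small as the sketch below (Python 3). SocketWriter is a hypothetical name, and the sketch assumes FileEncoder only ever calls write on the object it is given; if it needs other file methods, the wrapper would have to grow them. It uses sendall rather than send, since send is allowed to transmit only part of the data.

```python
class SocketWriter:
    """Hypothetical file-like wrapper that exposes write() over a socket."""

    def __init__(self, sock):
        self._sock = sock

    def write(self, data):
        # sendall keeps sending until every byte has gone out,
        # whereas send may stop after a partial write
        self._sock.sendall(data)

# Possible usage, assuming 'sock' is a previously opened socket:
#   encoder = netstring.FileEncoder(SocketWriter(sock))
#   encoder.write("hello")
```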

Decoding the netstring file / stream is done with the netstring.Decoder class. Simply create a Decoder object then call its feed method with data from the stream. The feed method is a generator that yields strings as they are decoded. Note that the data you feed to the decoder need not be whole netstrings; it could be a portion of a netstring or a group of netstrings -- the decoder will buffer the data until it has decoded a full string, or strings. This means that feed may yield zero or more strings, depending on the data you give it.
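The buffering behaviour is easier to see in code. The following is an independent sketch of an incremental decoder in Python 3, not the implementation in the netstring module; SketchDecoder is a made-up name, and error handling is kept to the bare minimum.

```python
class SketchDecoder:
    """Illustrative incremental netstring decoder."""

    def __init__(self):
        self._buffer = b""

    def feed(self, data):
        # Generator: the data is buffered once the caller starts iterating
        self._buffer += data
        while True:
            colon = self._buffer.find(b":")
            if colon == -1:
                return  # length prefix not complete yet
            length = int(self._buffer[:colon])
            end = colon + 1 + length  # index of the trailing comma
            if len(self._buffer) <= end:
                return  # payload or comma not received yet
            if self._buffer[end:end + 1] != b",":
                raise ValueError("netstring is missing its trailing comma")
            payload = self._buffer[colon + 1:end]
            self._buffer = self._buffer[end + 1:]
            yield payload

decoder = SketchDecoder()
print(list(decoder.feed(b"4:Wi")))           # []
print(list(decoder.feed(b"ll,7:McGugan,")))  # [b'Will', b'McGugan']
```

The first call yields nothing because only part of a netstring has arrived; the second call completes it and yields both strings.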

The following pseudo-code shows how you might use netstrings as the basis of a network stream (which is the purpose they were intended for).

import netstring

decoder = netstring.Decoder()

# Assumes that 'sock' is a previously opened socket
while True:
    data = sock.recv(1024)
    if not data:
        break
    for packet in decoder.feed(data):
        handle_packet(packet)

The license is Public Domain, even though Google Code claims otherwise (there was no setting for PD).

 
 
© 2008 Will McGugan.

A technoblog blog, design by Will McGugan