New version of Postmarkup

November 1st, 2008

I've released a new version of Postmarkup, my bbcode rendering engine for Python. If you are not familiar with bbcode, it is a simple markup used by many message boards. For example, [b]Hello, World![/b] would render Hello, World! in bold.

There are a number of bug fixes in the 1.1.4 release, mostly to close off the possibility of HTML injection through manipulation of tags and attributes. The link tag was particularly problematic in this respect, so it has been rewritten. I've also made a number of optimizations so that it renders HTML faster. It wasn't exactly slow, but I have noticed that most people use Postmarkup as a filter in web frameworks (rather than storing the pre-rendered HTML in a database), so the speed boost may be appreciated.
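To give a flavour of the kind of hole a link tag can open, here is a minimal sketch of one defensive approach, written for Python 3. It is not the code that ships in Postmarkup: the `render_link` function and the `SAFE_SCHEMES` whitelist are names made up for this illustration. The idea is to accept only known URL schemes and to escape quotes so the URL cannot break out of the href attribute.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical whitelist for this sketch; not taken from Postmarkup itself.
SAFE_SCHEMES = {"http", "https", "ftp", "mailto"}


def render_link(url, text):
    """Render a bbcode link as an anchor, refusing unsafe schemes."""
    url = url.strip()
    scheme = urlparse(url).scheme.lower()
    if scheme not in SAFE_SCHEMES:
        # Unknown or missing scheme (e.g. javascript:): keep only the text.
        return escape(text)
    # quote=True escapes double quotes, so the URL stays inside href="...".
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(text))


print(render_link("http://example.com/", "a site"))
print(render_link("javascript:alert(1)", "click me"))
```

Note that this sketch also drops relative URLs, since they have no scheme. Whether that is acceptable depends on the site.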

I've also added the option to turn newlines into paragraphs rather than inserting break tags. Break tags are a little more literal, in that the bbcode author will get what they expect in the output when they hit the return key, but paragraph tags make for more elegant markup that can be styled more easily. Another difference with paragraph tags is that multiple consecutive newlines will result in only a single paragraph break.
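The difference between the two modes is easiest to see side by side. The following is a rough sketch in Python 3 under my own assumptions, not Postmarkup's implementation; the function names `newlines_to_breaks` and `newlines_to_paragraphs` are invented for the example.

```python
import re
from html import escape


def newlines_to_breaks(text):
    """Every newline becomes a break tag, so blank lines survive literally."""
    return escape(text).replace("\n", "<br/>\n")


def newlines_to_paragraphs(text):
    """A run of one or more newlines separates paragraphs; empty chunks are dropped."""
    chunks = [chunk.strip() for chunk in re.split(r"\n+", text)]
    return "\n".join("<p>%s</p>" % escape(chunk) for chunk in chunks if chunk)


post = "First line\n\n\nSecond line"
print(newlines_to_breaks(post))      # three break tags between the lines
print(newlines_to_paragraphs(post))  # exactly two paragraphs
```

With break tags the three newlines produce three breaks, whereas the paragraph version collapses them into a single boundary between two paragraphs.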

Another new feature is the ability to run the resulting HTML through a cleanup filter that removes redundant markup, which can be produced if the bbcode author doesn't explicitly close tags. The markup will still be valid, but it may contain something like <b> </b>, which doesn't do anything useful. Incidentally, I was rather pleased with the method that does this, because it seemed almost too simple. Here's the code; let me know if you come up with a better way!

    # Excerpt from the renderer class body; needs "import re" at module level.

    # Matches simple blank tags containing only whitespace
    _re_blank_tags = re.compile(r"\<(\w+?)\>\s*\</\1\>")

    @classmethod
    def cleanup_html(cls, html):
        """Cleans up html. Currently only removes blank tags, i.e. tags containing only
        whitespace. Only applies to tags without attributes. Tag removal is done
        recursively until there are no more blank tags. So <strong><em></em></strong>
        would be completely removed.

        html -- A string containing (X)HTML

        """

        # Repeat the substitution until a pass leaves the string unchanged
        original_html = ''
        while original_html != html:
            original_html = html
            html = cls._re_blank_tags.sub(u"", html)
        return html
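If you want to experiment with the idea outside the class, a standalone version might look like the sketch below (Python 3). The function name `remove_blank_tags` is made up for this example, and the pattern is my reading of the one above: an opening tag without attributes, optional whitespace, then the matching closing tag.

```python
import re

# Opening tag without attributes, optional whitespace, matching closing tag.
_re_blank_tags = re.compile(r"<(\w+?)>\s*</\1>")


def remove_blank_tags(html):
    """Strip blank tags repeatedly until a pass changes nothing."""
    previous = None
    while previous != html:
        previous = html
        html = _re_blank_tags.sub("", html)
    return html


# The inner <em> goes on the first pass, the now-empty <strong> on the second.
print(remove_blank_tags("<strong><em> </em></strong>Hello <b>World</b>"))
# Tags with attributes are left alone.
print(remove_blank_tags('<a href="x"></a>'))
```

The loop matters because removing an inner blank tag can leave its parent blank, which a single substitution pass would miss.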

Yet another new feature is the ability to retrieve additional information generated when the bbcode is rendered. When the render_to_html method is called, it creates a dictionary which the tag classes can use to store any additional data needed when rendering. Previously this dictionary was discarded after rendering, but now the interface allows an alternative dictionary to be supplied so that it can be accessed afterwards. This could be used to create tags that supply meta information and don't contribute to the resulting HTML. For instance, if this blog post were using Postmarkup, it might be nice to do something like [tags] python, postmarkup, code, tech [/tags] or [template] halloween.html [/template].
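The pattern itself is simple enough to show with a toy renderer. This sketch (Python 3) does not use Postmarkup's actual interface: the `render` function and its `tag_data` parameter are assumptions made for illustration, and it only understands [b] and a metadata-only [tags] block.

```python
import re
from html import escape

_re_tags_block = re.compile(r"\[tags\](.*?)\[/tags\]", re.S)


def render(bbcode, tag_data=None):
    """Toy renderer: handles only [b] and a metadata-only [tags] block."""
    if tag_data is None:
        tag_data = {}  # private dictionary, discarded when we return

    def collect(match):
        names = [name.strip() for name in match.group(1).split(",")]
        tag_data.setdefault("tags", []).extend(n for n in names if n)
        return ""  # the block contributes nothing to the HTML

    text = _re_tags_block.sub(collect, bbcode)
    html = escape(text, quote=False)
    html = html.replace("[b]", "<b>").replace("[/b]", "</b>")
    return html.strip()


data = {}
html = render("[b]Hello[/b] [tags] python, postmarkup [/tags]", tag_data=data)
print(html)
print(data)
```

Because the caller owns the dictionary, the collected tags are still available after rendering, while a call without `tag_data` behaves exactly as before.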

Postmarkup is licensed under my politeware license, which allows you to do anything at all you want with it, as long as you say thanks.

© 2008 Will McGugan.
