November 1, 2008 will

New version of Postmarkup

I've released a new version of Postmarkup, my bbcode rendering engine for Python. If you are not familiar with bbcode, it is a simple markup used by many message boards. For example, [b]Hello, World![/b] would render as Hello, World! in bold.

There are a number of bugfixes in the 1.1.4 release, mostly to fix the possibility of HTML injection by manipulation of the tags and attributes. The link tag was particularly problematic for this, so it has been re-written. I've also made a number of optimizations so that it will render HTML faster. It wasn't exactly slow, but I have noticed that most people use Postmarkup as a filter in web frameworks (rather than storing the pre-rendered HTML in a database), so the speed boost may be appreciated.
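To give a feel for the kind of injection a link tag invites, here is a minimal sketch of defensive link rendering. It is not Postmarkup's actual code: `render_link` and `ALLOWED_SCHEMES` are hypothetical names, and the approach (allow only a few URL schemes, escape everything that lands in an attribute) is just one common way to do it.

```python
from html import escape
from urllib.parse import urlparse

# Hypothetical allow-list; anything else (javascript:, data:, ...) is refused.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}

def render_link(url, label):
    """Render a link, or just the escaped label if the URL looks unsafe."""
    url = url.strip()
    scheme = urlparse(url).scheme.lower()
    if scheme not in ALLOWED_SCHEMES:
        # Relative URLs have an empty scheme and are refused too in this sketch.
        return escape(label)
    # quote=True also escapes double quotes, so the URL cannot break out
    # of the href attribute and add attributes of its own.
    return '<a href="%s">%s</a>' % (escape(url, quote=True), escape(label))

print(render_link('http://example.com/" onmouseover="alert(1)', "click me"))
print(render_link("javascript:alert(1)", "click me"))
```

The first call keeps the link but turns the stray quotes into `&quot;`, and the second falls back to plain text.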

I've also added the option to turn new lines into paragraphs rather than inserting break tags. Break tags are a little more literal, in that the bbcode author will get what they expect in the output when they hit the return key, but paragraph tags make for more elegant markup that can be styled a little more easily. Another difference with paragraph tags is that multiple newlines will result in only a single paragraph.
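The difference between the two modes can be sketched in a few lines. This is an illustration only, not how Postmarkup does it internally; `newlines_to_breaks` and `newlines_to_paragraphs` are hypothetical helpers, and I'm assuming a run of consecutive newlines counts as one paragraph boundary.

```python
import re
from html import escape

def newlines_to_breaks(text):
    """Literal mode: every newline becomes a break tag."""
    return escape(text).replace("\n", "<br/>")

def newlines_to_paragraphs(text):
    """Paragraph mode: each chunk of text gets its own <p> element."""
    # Splitting on one or more newlines collapses repeated blank lines.
    chunks = [chunk.strip() for chunk in re.split(r"\n+", text)]
    return "".join("<p>%s</p>" % escape(chunk) for chunk in chunks if chunk)

sample = "First line\n\n\nSecond line"
print(newlines_to_breaks(sample))      # First line<br/><br/><br/>Second line
print(newlines_to_paragraphs(sample))  # <p>First line</p><p>Second line</p>
```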

Another new feature is the ability to run the resulting HTML through a cleanup filter that removes redundant markup, which can be produced if the bbcode author doesn't explicitly close tags. The markup will still be valid, but it may contain something like <b> </b>, which doesn't do anything useful. Incidentally, I was kind of pleased with the method that does this -- it seemed almost too simple. Here's the code; let me know if you come up with a better way!

    # Matches simple blank tags containing only whitespace. The closing tag
    # is matched with a backreference (\1) to the opening tag's name.
    _re_blank_tags = re.compile(r"\<(\w+?)\>\s*\</\1\>")

    @classmethod
    def cleanup_html(cls, html):
        """Cleans up html. Currently only removes blank tags, i.e. tags containing only
        whitespace. Only applies to tags without attributes. Tag removal is done
        recursively until there are no more blank tags. So <strong><em></em></strong>
        would be completely removed.

        html -- A string containing (X)HTML

        """

        original_html = ''
        while original_html != html:
            original_html = html
            html = cls._re_blank_tags.sub(u"", html)
        return html
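For anyone who wants to try the idea outside the class, here is a standalone sketch. It assumes the pattern pairs an attribute-free opening tag with the closing tag of the same name via a backreference; `BLANK_TAG` and the module-level `cleanup_html` are names used only for this example.

```python
import re

# Assumed pattern: <name>, optional whitespace, then </name> with the same
# name (the \1 backreference). Tags with attributes are left alone.
BLANK_TAG = re.compile(r"<(\w+)>\s*</\1>")

def cleanup_html(html):
    """Strip blank tags repeatedly until the markup stops changing."""
    previous = None
    while previous != html:
        previous = html
        html = BLANK_TAG.sub("", html)
    return html

# The inner <em> goes on the first pass, the now-empty <strong> on the second.
print(cleanup_html("<strong><em> </em></strong>Hi <b>there</b>"))  # Hi <b>there</b>
```

The loop is what makes nested blank tags disappear: a single `sub` call only removes the innermost pair.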

Yet another new feature is the ability to retrieve additional information generated when the bbcode is rendered. When the render_to_html method is called, it creates a dictionary which the tag classes can use to store any additional data needed when rendering. Previously this dictionary was discarded after rendering, but now the interface allows an alternative dictionary to be supplied so that it can be accessed after rendering. This could be used to create tags that supply meta information and don't contribute to the resulting HTML. For instance, if this blog post were using Postmarkup, it might be nice to do something like [tags] python, postmarkup, code, tech [/tags] or [template] halloween.html [/template].
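The pattern itself is easy to show with a toy renderer. To be clear, this is not Postmarkup's interface: `render`, its `tag_data` parameter, and the regex-based handling are all assumptions made for the sake of a self-contained example of "caller supplies the dictionary, tags fill it in".

```python
import re
from html import escape

# Hypothetical meta tag: [tags] a, b [/tags]
_TAGS_BLOCK = re.compile(r"\[tags\](.*?)\[/tags\]", re.DOTALL)

def render(bbcode, tag_data=None):
    """Toy renderer: handles [b] and a [tags] block that produces no HTML.

    tag_data -- optional dict supplied by the caller to collect extra data
    """
    if tag_data is None:
        tag_data = {}  # internal dictionary, thrown away after rendering

    def collect(match):
        names = [name.strip() for name in match.group(1).split(",")]
        tag_data.setdefault("tags", []).extend(n for n in names if n)
        return ""  # the meta tag contributes nothing to the output

    text = escape(_TAGS_BLOCK.sub(collect, bbcode))
    text = text.replace("[b]", "<strong>").replace("[/b]", "</strong>")
    return text.strip()

data = {}
html = render("[b]Hello[/b] [tags] python, postmarkup [/tags]", tag_data=data)
print(html)  # <strong>Hello</strong>
print(data)  # {'tags': ['python', 'postmarkup']}
```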

Postmarkup is licensed under my politeware license, which allows you to do anything at all you want with it, as long as you say thanks.

Jason Peddle

This is perfection, and seemingly braindead to extend. Thank you.

Anatoliy

I use this library within my websites and I have one feature request:
Make a "br" tag from a single newline and a paragraph from two (or more) newlines.

What do you think of it? I'd even like to join the project to work on a few features :)

Automatthias

Are you planning to add support for smilies that are present in phpbb databases?

An example:

<!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D -->

Ralph Corderoy

The code is getting mangled. The definition of _re_blank_tags looks wrong; There's a backslash escaping the closing double quote. And how can it be a blank tag if it has \w+ in it?

Brandon Thomson

Thanks, I am adapting this library to use it on a new website I'm working on at www.cafesurvey.com. I use TinyMCE so people who don't know HTML can edit fields but this will be useful for fields which are too small for the full-up TinyMCE editor.

Ahlywog
@ Anatoliy

But this is, hopefully, what you're looking for. His library is extensive enough that it provides the features necessary to do most anything.

class LineBreakTag(TagBase): # Ahlywog Contribution
    """A tag used to include line breaks in BB code. """

    def __init__(self, name):
        TagBase.__init__(self, name, inline=True)

    def render_open(self, parser, node_index):
        return u"<br />"

add_tag(LineBreakTag, 'br')

Here is another I added for my own use. It lets you name a function in the open tag and pass it whatever is in the contents between the tags.

So info => MyFunction(info)

import sys

class FunctionReturnTag(TagBase): # Ahlywog Contribution
    """This tag allows you to specify a function in the params, then the info
    to be passed to that function between the tags.

    The contents are split on commas and passed to the function as one list.
    """

    def __init__(self, name):
        TagBase.__init__(self, name, inline=True)

    def render_open(self, parser, node_index):
        output = u""
        if self.params:
            contents = self.get_contents(parser)
            if contents:
                self.skip_contents(parser)
                # Only functions defined in the __main__ module can be called
                main = sys.modules['__main__']
                func_name = self.params.strip()
                if func_name in dir(main):
                    function = getattr(main, func_name)
                    args = contents.strip().split(',')
                    output = function(args)
        return output

add_tag(FunctionReturnTag, 'fr')

It's a little dirty but it works for now.

kentona
I am trying to implement table, tr and td tags using this parser, but I am running into issues of multiple <br/> breaks being inserted for newlines.

My attempts to make use of strip_first_newline and the begin_no_breaks and end_no_breaks failed.

How can I prevent the rendering of breaks inside the <table></table> tags?

Here is what I have going now:

class TableTag(TagBase):

    def __init__(self, name, **kwargs):
        TagBase.__init__(self, name, strip_first_newline=True)

    def open(self, parser, params, open_pos, node_index):
        TagBase.open(self, parser, params, open_pos, node_index)

    def close(self, parser, close_pos, node_index):
        TagBase.close(self, parser, close_pos, node_index)

    def render_open(self, parser, node_index, **kwargs):
        return u'<table>'

    def render_close(self, parser, node_index):
        return u'</table>'


class TRTag(TagBase):

    def __init__(self, name, **kwargs):
        TagBase.__init__(self, name, strip_first_newline=True)

    def render_open(self, parser, node_index, **kwargs):
        return u'<tr>'

    def render_close(self, parser, node_index):
        return u'</tr>'


class TDTag(TagBase):

    def __init__(self, name, **kwargs):
        TagBase.__init__(self, name, strip_first_newline=True)

    def render_open(self, parser, node_index, **kwargs):
        return u'<td>'

    def render_close(self, parser, node_index):
        return u'</td>'

A sample bbcode table will render as:
<br/><table><tr><td>Buckler</td><td>6</td><td></td><td>So, Fr, Th, Mg, Dr, Pa</td></tr>
<br/><tr><td>Kite Shield</td><td>19</td><td></td><td>So, Fr, Dr, Pa</td></tr>
<br/><tr><td>Tower Shield</td><td>30</td><td></td><td>So</td></tr>
<br/><tr><td>Spiked Shield</td><td>15</td><td>+15 STR</td><td>So, Fr, Pa</td></tr>
<br/></table>

Will McGugan
Kentona,

I think you can call parser.begin_no_breaks() from the ‘open’ method and parser.end_no_breaks() from the ‘close’ method.
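
Roughly like this. It's only a sketch: `StubParser` and the `TagBase` below are stand-ins so the snippet runs without Postmarkup installed, and the counter is an assumption about how nesting might be tracked, not the real parser's internals. Only the two calls inside `TableTag` are the point.

```python
class StubParser(object):
    """Stand-in parser that just counts how deep we are in no-break mode."""
    def __init__(self):
        self.no_breaks_depth = 0

    def begin_no_breaks(self):
        self.no_breaks_depth += 1

    def end_no_breaks(self):
        self.no_breaks_depth -= 1


class TagBase(object):
    """Stand-in base class exposing only the hooks used in this thread."""
    def __init__(self, name, **kwargs):
        self.name = name

    def open(self, parser, params, open_pos, node_index):
        pass

    def close(self, parser, close_pos, node_index):
        pass


class TableTag(TagBase):

    def open(self, parser, params, open_pos, node_index):
        TagBase.open(self, parser, params, open_pos, node_index)
        parser.begin_no_breaks()  # no <br/> for newlines from here on

    def close(self, parser, close_pos, node_index):
        parser.end_no_breaks()    # back to normal newline handling
        TagBase.close(self, parser, close_pos, node_index)

    def render_open(self, parser, node_index, **kwargs):
        return u'<table>'

    def render_close(self, parser, node_index):
        return u'</table>'


parser = StubParser()
table = TableTag('table')
table.open(parser, None, 0, 0)
depth_inside = parser.no_breaks_depth   # 1 while the table is open
table.close(parser, 0, 0)
depth_after = parser.no_breaks_depth    # 0 again once it is closed
```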

Will

kentona
Thanks Will! That worked!

Dan Watson
Just wanted to say thanks for sharing your module. I got several good years out of it. I've since written my own parser, if you're interested in having a look:

https://bitbucket.org/dcwatson/bbcode