Saving processes and threads in a WSGI server with Moya

March 14th, 2015

I have a webserver with 3 WSGI applications running on different domains, all deployed with a combination of Gunicorn and NGINX. The combination works really well, but there are two annoyances that are only going to get worse the more sites I deploy:

A) The configuration for each server resides in a different location on the filesystem, so I have to recall & type a long path to edit settings.

B) More significantly, each server adds extra resource requirements. I follow the advice of running each WSGI application with (2 * number_of_cores + 1) processes, each with 8 threads. The threads may be overkill, but that ensures that the server can use all available capacity to handle dynamic requests. On my 4 core server, that's 9 processes, 72 threads per site. Or 27 processes, and 216 threads for the 3 sites. Clearly that's not scalable if I want to host more web applications on one server.
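To make the arithmetic above concrete, here is a small sketch that reproduces those numbers. The function name and its parameters are mine, purely for illustration; only the (2 * number_of_cores + 1) rule and the 8 threads per process come from the text.

```python
def gunicorn_footprint(cores, sites, threads_per_process=8):
    """Return (processes, threads) when each site gets its own WSGI deployment."""
    # The rule of thumb quoted above: 2 * cores + 1 processes per application
    processes_per_site = 2 * cores + 1
    processes = processes_per_site * sites
    return processes, processes * threads_per_process


print(gunicorn_footprint(cores=4, sites=1))
print(gunicorn_footprint(cores=4, sites=3))
```

The totals grow linearly with the number of sites, which is the scaling problem described above.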

A new feature recently added to Moya fixes both those problems. Rather than deploy a WSGI application for each site, Moya can now optionally create a single WSGI application that serves many sites. With this new system, configuration is read from /etc/moya/, which contains a directory structure like this:

|-- logging.ini
|-- moya.conf
|-- sites-available
|   |-- moyapi.ini
|   |-- moyaproject.ini
|   `-- notes.ini
`-- sites-enabled
    |-- moyapi.ini
    |-- moyaproject.ini
    `-- notes.ini

At the top level is “moya.conf” which contains a few server-wide settings, and “logging.ini” which contains logging settings. The directories “sites-available” and “sites-enabled” work like Apache and NGINX servers; settings for each site are read from “sites-enabled”, which contains symlinks to files in “sites-available”.

Gunicorn (or any other WSGI server) can run these sites with a single instance by specifying the WSGI module as “moya.service:application”. This application object loads the sites from “sites-enabled” and is responsible for dispatching requests based on domains specified in the INI files.
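To illustrate the general idea of one WSGI application dispatching on domain, here is a minimal sketch. It is not Moya's implementation; make_dispatcher, make_site and the hard-coded mapping are hypothetical stand-ins for whatever Moya builds from the INI files.

```python
def make_dispatcher(sites):
    """Build one WSGI app that routes to per-domain apps by the Host header.

    `sites` maps a domain name to a WSGI application (a hypothetical structure).
    """
    def application(environ, start_response):
        # Drop any port from the Host header and normalise the case
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        app = sites.get(host)
        if app is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown site"]
        return app(environ, start_response)
    return application


def make_site(name):
    """A stand-in WSGI app that just reports which site handled the request."""
    def site(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [name.encode("utf-8")]
    return site


application = make_dispatcher({
    "www.moyaproject.com": make_site("moyaproject"),
    "notes.moyaproject.com": make_site("notes"),
})
```

Because there is a single application object, one pool of server processes can handle requests for every domain in the mapping.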

Because all sites now go through a single Gunicorn instance, requests are shared amongst one optimal pool of processes / threads. This keeps the memory footprint low and negates the need to allocate resources based on traffic.

This new multi-server system is somewhat experimental, and hasn't been documented. But since I believe in eating my own dog-food, it has been live now for a whole hour–with no problems.

 

Logging Incoming Links with Moya

March 7th, 2015

Google Analytics and kin are great for getting stats on your visitors, but often I simply want to know: who is linking to my site? You can deduce this from web server logs, but server logs tend to be too noisy and make it hard to pick out the referer URLs.

Moya doesn't have a stats library yet, but it's not hard to MacGyver up a solution to log incoming links. We need to run some code on every request so that we can detect the referer and write a log message. The simplest way to do that is to create a <url> tag with a wildcard route of “/*”. We can add this <url> to the mountpoint of the site library (the site library is where we customize various aspects of the site). Here's the code:

<url route="/*">
    <log logger="referers">
        incoming link from "${.request.referer}" to "${.request.url}"
    </log>
</url>

Yes, referer is a misspelling, but it has been codified into the HTTP spec!

So now when you visit any URL on the server, it will execute the <log> tag to write the referer (where the link came from) and the current URL. We also need to edit prodlogging.ini (production logging) to configure the new logger. This INI file is similar to the logging configuration format of the Python logging module (used by Moya under the hood). Moya's syntax is slightly less maddening. Here's what we need to add to prodlogging.ini:

[logger:referers]
level=INFO
handlers=referers_log
propagate=no

[handler:referers_log]
formatter=format_referer
class=logging.FileHandler
args=('/var/log/nginx/referers.log',)

[formatter:format_referer]
format=%(asctime)s %(message)s
datefmt=[%d/%b/%Y %H:%M:%S]

I've used the path “/var/log/nginx/referers.log” because that's where the rest of my logs were going. You may want to edit that.
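For comparison, roughly the same logger can be set up with the plain Python logging module. This is a sketch only: it writes to a temporary directory rather than /var/log/nginx so it can be run anywhere, and it says nothing about how Moya reads its own INI file.

```python
import logging
import os
import tempfile

# Assumed path for the demo; the post writes to /var/log/nginx/referers.log
log_path = os.path.join(tempfile.mkdtemp(), "referers.log")

# Equivalent of [handler:referers_log] and [formatter:format_referer]
handler = logging.FileHandler(log_path)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(message)s", datefmt="[%d/%b/%Y %H:%M:%S]")
)

# Equivalent of [logger:referers]
referers = logging.getLogger("referers")
referers.setLevel(logging.INFO)
referers.addHandler(handler)
referers.propagate = False  # same intent as propagate=no

referers.info(
    'incoming link from "%s" to "%s"',
    "http://notes.moyaproject.com/",
    "http://www.moyaproject.com/",
)
```

Turning propagation off keeps the referer lines out of any handlers attached to the root logger.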

With this change, Moya will write a line such as the following to “referers.log”:

[07/Mar/2015 10:33:57] incoming link from "http://notes.moyaproject.com/" to "http://www.moyaproject.com/"

This works well enough, but there is a flaw; if there is no referer (the user didn't arrive via a click) then it will log a referer of None. Since we're not interested in requests with no referer, we can filter them out by adding the following line before the <log>:

<done if="not .request.referer" />

The <done> tag tells Moya to stop processing the URL. The if attribute makes the tag conditional, so it stops processing the <url> if there is no referer (and never reaches the <log> tag).

Another flaw that will soon become obvious is that we will get a line in the referer log for links clicked within our site, and not just incoming links. We can filter those out with the following code:

<done if="domain:.request.url == domain:.request.referer"/>

The “domain:” syntax used in the condition is a modifier which extracts the domain from a URL. If the domain for the referer is the same as the domain of the URL being requested, we can deduce that the visitor clicked a link within our site, and skip the log. So the <url> code now looks something like the following:

<url route="/*">
    <done if="not .request.referer" />
    <done if="domain:.request.url == domain:.request.referer"/>
    <log logger="referers">
        incoming link from "${.request.referer}" to "${.request.url}"
    </log>
</url>
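The two <done> conditions boil down to a small predicate. Here is a Python sketch of that logic; should_log is a hypothetical name, and comparing hostnames with urlsplit is my approximation of what the “domain:” modifier does, not a description of Moya's internals.

```python
from urllib.parse import urlsplit


def should_log(url, referer):
    """Return True only for requests that arrived via a link on another domain."""
    # First <done>: no referer, so nothing to log
    if not referer:
        return False
    # Second <done>: same domain means an internal link, so skip it
    return urlsplit(url).hostname != urlsplit(referer).hostname


print(should_log("http://www.moyaproject.com/", "http://notes.moyaproject.com/"))
print(should_log("http://www.moyaproject.com/docs/", "http://www.moyaproject.com/"))
```

Note that subdomains count as different domains in this sketch, so a link from notes.moyaproject.com to www.moyaproject.com is still treated as incoming.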

It wouldn't be difficult to extend this to write incoming links to the database rather than to a log. Which I may end up implementing; it would be nice to have a simple summary of incoming links somewhere in the admin site. Probably nothing more complex than that–I wouldn't want to try and re-implement Google Analytics!

 

Sublime Text like fuzzy matching in Javascript

March 5th, 2015

I recently implemented Sublime Text-like fuzzy matching for my encrypted notes app. Fuzzy matching is a really nice feature that I haven't seen used outside of code editors.

If you haven't used Sublime Text, the fuzzy matching is used to quickly open files. Rather than navigate directories in the UI – which can be laborious – the open file dialogue uses the characters you type to filter a list of paths. Each character you type must match a character in the file path exactly once, and in the same order as they appear in the path. For instance the search “abgvi” would match “/application/blog/views”, as would “blgview”. The basic idea should work with any text, not just paths.

I fully expect a real Javascript programmer to do this in two lines (I'm a Python guy that has been faking Javascript proficiency for years).

My first thought in implementing this was regular expressions, but as well as matching I also wanted to highlight the matched characters in the text. That proved harder to do with a regular expression. Probably not impossible, but I'll be honest with you; I gave up.

Turns out a non-regex solution is simple enough, and plenty fast. Here it is:

function fuzzy_match(text, search)
{
    /*
    Parameter text is a title, search is the user's search
    */
    // remove spaces, lower case the search so the search
    // is case insensitive
    var search = search.replace(/\ /g, '').toLowerCase();
    var tokens = [];
    var search_position = 0;

    // Go through each character in the text
    for (var n=0; n<text.length; n++)
    {
        var text_char = text[n];
        // if we match a character in the search, highlight it
        if(search_position < search.length &&
          text_char.toLowerCase() == search[search_position])
        {
            text_char = '<b>' + text_char + '</b>';
            search_position += 1;
        }
        tokens.push(text_char);
    }
    // If there are characters remaining in the search text,
    // return an empty string to indicate no match
    if (search_position != search.length)
    {
        return '';
    }
    return tokens.join('');
}

This function compares a string with the fuzzy search query. If it matches, it will return the text with the matched characters wrapped in <b> tags, otherwise it returns an empty string.
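Since I'm a Python guy, here is the same algorithm as a Python sketch. It is a straight port of the function above (same name, same return convention), not a separate implementation used by the notes app.

```python
def fuzzy_match(text, search):
    """Return text with matched characters wrapped in <b>, or '' if no match."""
    # Remove spaces and lower case the search so it is case insensitive
    search = search.replace(" ", "").lower()
    tokens = []
    position = 0
    for char in text:
        # Highlight the character if it is the next one the search expects
        if position < len(search) and char.lower() == search[position]:
            tokens.append("<b>" + char + "</b>")
            position += 1
        else:
            tokens.append(char)
    # Any search characters left over means the text did not match
    return "".join(tokens) if position == len(search) else ""


print(fuzzy_match("Social Security Number", "SSN"))
print(fuzzy_match("/application/blog/views", "abgvi"))
```

Like the Javascript version, this does not escape HTML in the text, so that would need handling before inserting the result into a page.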

I put together a demo that gets a list of links from Reddit and gives you a text box to do the fuzzy matching:

http://www.willmcgugan.com/files/fuzzymatch.html

View the source if you want to know more, there are some helpful comments.

 

New Encrypted Notes Web Application

March 4th, 2015

The last two weekends I scratched a web development itch. I've been using KeepNote to store notes for years now. It's a nice simple desktop app, which I use to store details such as past addresses, account numbers, phone numbers etc. And more sensitive information like PIN numbers and passwords.

I configured KeepNote to store notes in Dropbox so that I don't risk losing anything. This has worked quite well, but I've always been frustrated that (a) I can't access my notes on my mobile devices, and (b) I'm relying on a third party to keep my secrets.

My answer to this is a web application that stores notes on a server, but does the encryption in the browser (i.e. with Javascript). That way, there is no need to trust the provider. I'm not the first person to think of this (try having an original idea these days); there are some pretty good implementations of this idea. But I wanted something that is self-hosting, i.e. I could install on my own server, and I had some ideas about how the user interface should work.

In particular, I wanted to implement Sublime Text's fuzzy search. Essentially this allows you to filter the list of notes with a few key presses. For instance, if I have a note entitled ‘Social Security Number’, I can find it by typing ‘SSN’ or ‘SocSecNum’.

The site is currently live. Feel free to create a new encrypted notebook, but be aware that it is just for testing. Please don't use this to store the PIN number for your safety deposit box, or missile launch codes just yet. I will likely wipe the DB at some point.

I've created a test notebook which you can play with here:

http://notes.moyaproject.com/~will

Passphrase is: “where there is a will”

To be honest, I'm not sure what would happen if more than one person is editing a notebook at a time – but feel free to try it out.

The code is available on GitHub. I'm no cryptography expert, so I would appreciate a review of the code from someone who is…

 

A simple method for rendering templates with Python

February 15th, 2015

I never intended to write a template system for Moya. Originally, I was going to offer a plugin system to use any template format you wish, with Jinja as the default. Jinja was certainly up to the task; it is blindingly fast, with a comfortable Django-like syntax. But it was never going to work exactly how I wanted it to, and since I don't have to be pragmatic on my hobby projects, I decided to re-invent the wheel. Because otherwise, how do we get better wheels?

The challenge of writing a template language, I discovered, was keeping the code manageable. If you want to make it both flexible and fast, it can quickly descend into a mass of special cases and compromises. After a few aborted attempts, I worked out a system that was both flexible and reasonably fast. Not as fast as template systems that compile directly into Python, but not half bad. Moya's template system is about 10-25% faster than Django templates with a similar feature set.

There are two main steps in rendering a template. First the template needs to be tokenized, i.e. split up into a data structure of text / tags. This part is less interesting I think, because it can be done in advance and cached. The interesting part is the following step that turns that data structure into HTML output.

This post will explain how Moya renders templates, by implementing a new template system that works the same way.

Let's render the following template:

<h1>Hobbit Index</h1>
<ul>
    {% for hobbit in hobbits %}
    <li{% if hobbit==active %} class="active"{% endif %}>
        {hobbit}
    </li>
    {% endfor %}
</ul>

This is somewhat similar to a Django or Moya template. It generates HTML with an unordered list of hobbits, one of which has the attribute class="active" on the <li>. You can see there is a loop and a conditional in there.

The tokenizer scans the template and generates a hierarchical data structure of text and tag tokens (markup between {% and %}). Tag tokens consist of parameters extracted from the tag and child nodes (e.g. the tokens between the {% for %} and {% endfor %}).

I'm going to omit the tokenize functionality as an exercise for the reader (sorry, I hate that too). We'll assume that we have implemented the tokenizer, and the end result is a data structure that looks like this:

[
    "<h1>Hobbit Index</h1>",
    "<ul>",
    ForNode(
        {"src": "hobbits", "dst": "hobbit"},
        [
            "<li",
            IfNode(
                {"test": "hobbit==active"},
                [
                    ' class="active"'
                ]
            ),
            ">",
            "{hobbit}",
            "</li>",
        ]
    ),
    "</ul>"
]

Essentially this is a list of strings or nodes, where a node can contain further nested strings and other nodes. A node is defined as a class instance that handles the functionality of a given tag, i.e. IfNode for the {% if %} tag and ForNode for the {% for %} tag.
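For anyone who wants a starting point for the omitted tokenizer, here is one possible sketch. It is not Moya's tokenizer: tokenize and TAG_RE are hypothetical names, it only knows the {% for %} and {% if %} tags, it does no checking for unbalanced tags, and it emits plain (name, params, children) tuples where the structure above uses ForNode and IfNode instances. Text chunks also keep their whitespace rather than being split up line by line.

```python
import re

# Matches {% ... %} and captures the text inside the tag
TAG_RE = re.compile(r"\{%\s*(.*?)\s*%\}")


def tokenize(source):
    """Split template source into nested lists of text and tag tuples."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # With one capture group, re.split alternates text, tag, text, tag, ...
    for index, part in enumerate(TAG_RE.split(source)):
        if index % 2 == 0:
            if part:
                stack[-1].append(part)
            continue
        words = part.split(None, 1)
        name = words[0] if words else ""
        if name in ("endfor", "endif"):
            # Closing tag: go back to filling the parent list
            stack.pop()
        elif name == "for":
            dst, src = words[1].split(" in ", 1)
            children = []
            params = {"src": src.strip(), "dst": dst.strip()}
            stack[-1].append(("for", params, children))
            stack.append(children)
        elif name == "if":
            children = []
            stack[-1].append(("if", {"test": words[1]}, children))
            stack.append(children)
        else:
            raise ValueError("unknown tag: " + name)
    return root


tokens = tokenize(
    '<ul>{% for hobbit in hobbits %}<li{% if hobbit==active %}'
    ' class="active"{% endif %}>{hobbit}</li>{% endfor %}</ul>'
)
```

Swapping the tuples for node instances would be a matter of looking up a class by tag name and calling it with the params and children.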

Nodes have the following trivial base class, which stores the parameters and the list of children:

class Node(object):
    def __init__(self, params, children):
        self.params = params
        self.children = children

Nodes also have an additional method, render, which takes a mapping of the data we want to render (the context). This method should be a generator, which may yield one of two things: either strings containing output text or an iterator that yields further nodes. Let's look at the IfNode first:

class IfNode(Node):
    def render(self, context):
        test = eval(self.params['test'], globals(), context)
        if test:
            yield iter(self.children)

The first thing the render method does is to get the test parameter and evaluate it with the data in the context. If the result of that test is truthy, then the render method yields an iterator of its children. Essentially all this node object does is render its children (i.e. the template code between {% if %} and {% endif %}) if the test passes.

The ForNode is similar, here's the implementation:

class ForNode(Node):

    def render(self, context):
        src = eval(self.params['src'], globals(), context)
        dst = self.params['dst']
        for obj in src:
            context[dst] = obj
            yield iter(self.children)

The ForNode render method iterates over each item in a sequence, and assigns the value to an intermediate variable. It also yields to its children each pass through the loop. So the code inside the {% for %} tag is rendered once per item in the sequence.

Because we are using generators to handle the state for control structures, we can keep the main render loop free from such logic. This makes the code that renders the template trivially easy to follow:

def render(template, **context):
    output = []
    stack = [iter(template)]

    while stack:
        node = stack.pop()
        if isinstance(node, basestring):
            output.append(node.format(**context))
        elif isinstance(node, Node):
            stack.append(node.render(context))
        else:
            new_node = next(node, None)
            if new_node is not None:
                stack.append(node)
                stack.append(new_node)
    return "".join(output)

The render loop manages a stack of iterators, initialized to the template data structure. Each pass through the loop it pops an item off the stack. If that item is a string, it performs a string format operation with the context data. If the item is a Node, it calls the render method and pushes the generator on to the stack. When the stack item is an iterator (such as a generator created by Node.render) it gets one value from the iterator, then pushes the iterator back on to the stack followed by that value, or discards the iterator if it is exhausted.
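Putting the pieces together, here is a self-contained sketch you can run. It follows the code above, with two assumptions of mine: it targets Python 3 (so str replaces basestring), and the hobbit names and active value are made-up sample data.

```python
class Node:
    def __init__(self, params, children):
        self.params = params
        self.children = children


class IfNode(Node):
    def render(self, context):
        # Yield the children only when the test expression is truthy
        if eval(self.params["test"], globals(), context):
            yield iter(self.children)


class ForNode(Node):
    def render(self, context):
        # Yield the children once per item, rebinding the loop variable
        for obj in eval(self.params["src"], globals(), context):
            context[self.params["dst"]] = obj
            yield iter(self.children)


def render(template, **context):
    output = []
    stack = [iter(template)]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            output.append(node.format(**context))
        elif isinstance(node, Node):
            stack.append(node.render(context))
        else:
            # An iterator: take one item and keep the iterator underneath it
            new_node = next(node, None)
            if new_node is not None:
                stack.append(node)
                stack.append(new_node)
    return "".join(output)


template = [
    "<ul>",
    ForNode(
        {"src": "hobbits", "dst": "hobbit"},
        [
            "<li",
            IfNode({"test": "hobbit==active"}, [' class="active"']),
            ">",
            "{hobbit}",
            "</li>",
        ],
    ),
    "</ul>",
]

html = render(template, hobbits=["Bilbo", "Frodo", "Sam"], active="Frodo")
print(html)
```

Only the <li> for the active hobbit picks up the class attribute, because IfNode.render yields nothing when its test fails.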

In essence, the inner loop is running the generators and collecting the output. A more naive approach might have the render methods also rendering their children and returning the result as a string. Using generators frees the nodes from having to build strings. Generators also make error reporting much easier, because exceptions won't be obscured by deeply nested render methods. Consider a node throwing an exception inside a for loop; if ForNode.render was responsible for rendering its children, it would also have to trap and report such errors. The generator system makes error reporting simpler, and confines it to one place.

There is a very similar loop at the heart of Moya's template system. I suspect the main reason that Moya templates are moderately faster than Django's is due to this lean inner loop. See this GitHub gist for the code from this post. You may also find Moya's template implementation interesting.

 

Long Time No See

February 10th, 2015

The feed for this blog has been broken for a while. I knew it was down; somebody was kind enough to alert me on Twitter. I've been procrastinating fixing it because a) the Django code I wrote to power this blog is old and crusty, and b) I've been busy with another project. I only got around to fixing it now, because I have something to announce. Stay tuned.

Can't believe the last time I posted a blog was 2013. Where does the time go?

 

Hiring a Python web application developer

November 26th, 2013

My client is looking to hire a new Python developer, initially for an 8 month contract. It's a home working position, we communicate mostly via Skype / email / gtalk etc. Although we do meet up in meatspace from time to time, so ideally a candidate would be in the London / Oxford area.

You will be working with yours truly. The projects I've been working on are on the server side of web-enabled devices. The web interface is written in Django, so you'll need the usual battery of front-end technologies: HTML, CSS, Javascript etc. We have a Twisted server which communicates with the devices in the field that my client produces. In the middle we have dynamic user interface generation from XML.

So there is some genuinely interesting technology there, and more such projects planned. We need someone who is a good problem solver with a general interest in web technologies. There's also the occasional need to work with data at the bits and bytes level, so a working knowledge of C would be a plus.

See the Careers page on wildfoundry.com for the full details.

 

Finding the first bit set with Python

November 25th, 2013

Here's a Python gotcha that I spent some time tracking down. I'm writing it up in the spirit of saving developers a debugging headache in the future.

I had an integer with a single bit set, and I wanted to find the index of that bit. For example, 4 in binary is 00000100. The 1 is at the third position from the right, which should give an index of 2 – since the first position is 0.

You can do this in two ways; either check each bit in turn until you find a 1, or you can use math as a shortcut. I chose the math solution:

>>> import math
>>> myint = 4
>>> int(math.log(myint, 2))
2

Simple right? Finally staying awake in high school maths paid off. So simple that it was the last bit of code I suspected to be broken (spoiler: it was).

This is Python 2.7, which still has two types of integer: type int and the arbitrary-precision integer type long. I was testing with ints because that's what you get when you type 4. However the numbers I was getting out of the Django db were longs. Look what happens with the above code when you use 4L rather than 4:

>>> myint = 4L
>>> int(math.log(myint, 2))
1

And that was the cause of my headache. Longs and ints are generally interchangeable. But not in this case. math.log gives a different result with longs and ints. Which we can see here.

>>> math.log(4, 2)
2.0
>>> math.log(4L, 2)
1.9999999999999998

That tiniest of rounding errors for the long version would be insignificant for most applications, but not if you are converting to an integer and discarding the fractional part. The fix is simple. Round the return value of math.log to the nearest whole.

>>> int(round(math.log(4L, 2)))
2

If you are working with Python 3, this problem goes away. Another reason to migrate if you have the option!
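As an alternative that avoids floating point altogether, integers have a bit_length method (available from Python 2.7, for both int and long). This is a sketch of that approach rather than the fix I used at the time; first_bit_index is a hypothetical helper name.

```python
def first_bit_index(value):
    """Return the index of the single set bit in value (0 is the rightmost)."""
    # value & (value - 1) is non-zero when more than one bit is set
    if value <= 0 or value & (value - 1):
        raise ValueError("expected a positive integer with exactly one bit set")
    return value.bit_length() - 1


print(first_bit_index(4))
print(first_bit_index(1 << 100))
```

Because it never leaves integer arithmetic, there is no rounding to worry about, however large the number gets.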

 

Resurrecting the blink tag

August 7th, 2013

After today's news that the <blink> tag will be deprecated in Firefox, I decided to re-implement it in HTML5 / CSS3 (no Javascript required). Now it's all modern again, you are free to use <blink> liberally in your web application.

<!DOCTYPE HTML>
<html>
<head>
<style type="text/css">
blink
{
animation:blink 1s;
animation-iteration-count: infinite;
-webkit-animation:blink 1s;
-webkit-animation-iteration-count: infinite;
}
@keyframes blink
{
0%{opacity:0.0;}
50%{opacity:0.0;}
50.01%{opacity:1.0;}
100%{opacity:1.0;}
}
@-webkit-keyframes blink
{
0%{opacity:0.0;}
50%{opacity:0.0;}
50.01%{opacity:1.0;}
100%{opacity:1.0;}
}
</style>
</head>
<body>

The <blink>Blink Tag</blink>, re-implemented!

</body>
</html>

Be sure to give attribution for the above code, so your users will know where to go to thank me…

Edit: Just to demonstrate that it works!

Inspect the code if you don't believe me...

 

Instant Pygame for Python Game Development How-to

June 22nd, 2013

Packt Publishing have released Instant Pygame for Python Game Development How-to, a guide to getting started with PyGame, written by Ivan Idris. This title will help you get over the initial hurdles in setting up a PyGame environment and developing your own games.

I was the technical reviewer for this book.

 

Will McGugan

My name is Will McGugan. I am an unabashed geek, an author, a hacker and a Python expert – amongst other things!

© 2008 Will McGugan.

A technoblog blog, design by Will McGugan