March 7, 2015 will

Logging Incoming Links with Moya

Google Analytics and kin are great for getting stats on your visitors, but often I simply want to know: who is linking to my site? You can deduce this from web server logs, but server logs tend to be too noisy and make it hard to pick out the referer URLs.

Moya doesn't have a stats library yet, but it's not hard to MacGyver up a solution to log incoming links. We need to run some code on every request so that we can detect the referer and write a log message. The simplest way to do that is to create a <url> tag with a wildcard route of “/*”. We can add this <url> to the mountpoint of the site library (the site library is where we customize various aspects of the site). Here's the code:

<url route="/*">
    <log logger="referers">
        incoming link from "${.request.referer}" to "${.request.url}"
    </log>
</url>

Yes, referer is a misspelling, but it has been codified into the HTTP spec!

So now when you visit any URL on the server, it will execute the <log> tag to write the referer (where the link came from) and the current URL. We also need to edit prodlogging.ini (production logging) to configure the new logger. This ini file is similar to the logging configuration format for the Python logging module (used by Moya under the hood), although Moya's syntax is slightly less maddening. Here's what we need to add to prodlogging.ini:

[logger:referers]
level=INFO
handlers=referers_log
propagate=no

[handler:referers_log]
formatter=format_referer
class=logging.FileHandler
args=('/var/log/nginx/referers.log',)

[formatter:format_referer]
format=%(asctime)s %(message)s
datefmt=[%d/%b/%Y %H:%M:%S]

I've used the path “/var/log/nginx/referers.log” because that's where the rest of my logs were going. You may want to edit that.

With this change, Moya will write a line such as the following to “referers.log”:

[07/Mar/2015 10:33:57] incoming link from "http://notes.moyaproject.com/" to "http://www.moyaproject.com/"
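
Since the ini file maps onto the Python logging module, the same setup can be sketched in plain Python to check what the format strings produce. This is only an illustration, not how Moya loads prodlogging.ini: the function name make_referer_logger is made up, and the handler writes to an in-memory stream rather than to “/var/log/nginx/referers.log” so that it runs anywhere.

```python
import io
import logging

# Same format strings as the [formatter:format_referer] section.
formatter = logging.Formatter(
    fmt="%(asctime)s %(message)s",
    datefmt="[%d/%b/%Y %H:%M:%S]",
)


def make_referer_logger(stream):
    """Build a 'referers' logger that writes formatted lines to stream.

    Hypothetical helper for illustration; Moya builds its loggers
    from prodlogging.ini instead.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)
    logger = logging.getLogger("referers")
    logger.setLevel(logging.INFO)   # level=INFO
    logger.propagate = False        # propagate=no
    logger.handlers.clear()         # avoid duplicate handlers on repeat calls
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = make_referer_logger(buffer)
log.info(
    'incoming link from "%s" to "%s"',
    "http://notes.moyaproject.com/",
    "http://www.moyaproject.com/",
)
line = buffer.getvalue()
print(line, end="")
```

Swapping the StreamHandler for logging.FileHandler with the path from the ini file would give the file-based behaviour configured above.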

This works well enough, but there is a flaw: if there is no referer (the user didn't arrive via a click) then it will log a referer of None. Since we're not interested in requests with no referer, we can filter them out by adding the following line before the <log>:

<done if="not .request.referer" />

The <done> tag tells Moya to stop processing the URL. The if attribute makes the tag conditional, so it stops processing the <url> if there is no referer (and never reaches the <log> tag).

Another flaw that will soon become obvious is that we will get a line in the referer log for links clicked within our site, and not just incoming links. We can filter those out with the following code:

<done if="domain:.request.url == domain:.request.referer"/>

The “domain:” syntax used in the condition is a modifier which extracts the domain from a URL. If the domain for the referer is the same as the domain of the URL being requested, we can deduce that the visitor clicked a link within our site, and skip the log. So the <url> code now looks something like the following:

<url route="/*">
    <done if="not .request.referer" />
    <done if="domain:.request.url == domain:.request.referer"/>
    <log logger="referers">
        incoming link from "${.request.referer}" to "${.request.url}"
    </log>
</url>
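
For anyone who wants to reason about the two <done> conditions outside of Moya, here is a rough Python equivalent of the filtering logic. It is a sketch under assumptions: is_incoming_link is a made-up name, and it compares hostnames with urllib.parse, which may not match the “domain:” modifier in every detail (ports, for example).

```python
from urllib.parse import urlsplit


def is_incoming_link(referer, url):
    """Return True if a request should be logged as an incoming link.

    Hypothetical helper mirroring the two <done> checks:
    1. skip requests with no referer;
    2. skip requests where referer and URL share a host.
    """
    if not referer:
        return False  # same effect as: <done if="not .request.referer" />
    # .hostname lower-cases the host and drops any port; this is an
    # approximation of what Moya's domain: modifier extracts.
    return urlsplit(referer).hostname != urlsplit(url).hostname


print(is_incoming_link("http://notes.moyaproject.com/", "http://www.moyaproject.com/"))
print(is_incoming_link("http://www.moyaproject.com/about/", "http://www.moyaproject.com/"))
print(is_incoming_link(None, "http://www.moyaproject.com/"))
```

Note that a subdomain such as notes.moyaproject.com counts as a different host here, which agrees with the sample log line earlier in the post.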

It wouldn't be difficult to extend this to write incoming links to the database rather than to a log, which I may end up implementing; it would be nice to have a simple summary of incoming links somewhere in the admin site. Probably nothing more complex than that, though. I wouldn't want to try to re-implement Google Analytics!
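
In the meantime, a quick summary can be pulled straight from the log file. The following is a minimal sketch, not the database-backed approach mentioned above: count_referers and LINE_RE are made-up names, the regular expression assumes the exact message format used in the <log> tag, and the second sample line uses a placeholder domain.

```python
import re
from collections import Counter

# Matches the message written by the <log> tag in this post.
LINE_RE = re.compile(r'incoming link from "(?P<referer>[^"]*)" to "(?P<url>[^"]*)"')


def count_referers(lines):
    """Count how many times each referer appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # ignore lines that don't follow the expected format
            counts[match.group("referer")] += 1
    return counts


# Sample lines: the first is from this post (repeated), the last is a placeholder.
sample = [
    '[07/Mar/2015 10:33:57] incoming link from "http://notes.moyaproject.com/" to "http://www.moyaproject.com/"',
    '[07/Mar/2015 10:33:57] incoming link from "http://notes.moyaproject.com/" to "http://www.moyaproject.com/"',
    '[07/Mar/2015 10:33:57] incoming link from "http://example.com/" to "http://www.moyaproject.com/"',
]
counts = count_referers(sample)
for referer, hits in counts.most_common():
    print(hits, referer)
```

To run it against the real log, pass an open file object for referers.log instead of the sample list.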