Contrary to every intuition that I now hold, the back-end of this site is one of the most hideous, monolithic examples of code ever written. I daren't dredge it for particular examples, but I'd guess there are plenty that would befit The Daily WTF. Forget a component-based templating system: almost everything is contained within a single
Therefore, modifying the site to add new capabilities is decidedly non-trivial. In the face of a bandwidth-exhausting onslaught from comment-spammers, I'd love to be able to plug in a Bayesian spam detector, blacklisting functionality, or some such. Truth is, adding anything like these would require a Herculean effort that I don't have time to put in. Nevertheless, something had to be done.
The approach I've taken is as follows:
- Add a series of mod_rewrite rules to block the most frequent spam referrers. For some reason, comment spammers like to send what appear to be spurious referrers when leaving their comments. The URLs often don't lead to any real site, so I'm not sure what they gain from this. Still, since adding these rules, my daily bandwidth use has dropped by more than half.
- Switch on moderation after a delay. Global moderation was my first stopgap, because the comments were being swamped with adverts. I could then peruse my RSS feed of the comments and mod up the genuine ones (of which there were still relatively few). However, this took the immediacy out of commenting, and I still sometimes missed the signal amongst all the noise. It was then that I realised the probability of a comment being spam increases with the age of the post it is left on. Thus I've picked an arbitrary cut-off (currently seven days after a post is published), before which a comment is automatically approved, and after which it enters the moderation queue. The option to close commenting after the cut-off would be lower maintenance, but then we'd lose such witty badinage as evidenced by my post on big, strong boys.
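
For the curious, the two rules above boil down to something like the following. This is only a minimal sketch in Python, not the site's actual code: the real referrer blocking is done by mod_rewrite before the request ever reaches the back-end, and the function names, the return values and the blacklisted hosts here are all made up for illustration. Only the seven-day cut-off comes from the description above.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

# Hypothetical blacklist; the real one lives in mod_rewrite rules.
BLOCKED_REFERRER_HOSTS = {"spam.example", "pills.example"}

# The arbitrary cut-off described above (currently seven days).
MODERATION_CUTOFF = timedelta(days=7)


def is_blocked_referrer(referrer):
    """Return True if the referrer's host is on the blacklist."""
    if not referrer:
        return False
    # hostname is None for strings that don't parse as a URL with a host.
    host = urlparse(referrer).hostname or ""
    return host in BLOCKED_REFERRER_HOSTS


def triage_comment(post_published, referrer, now):
    """Decide what happens to a new comment (hypothetical helper).

    Returns "reject" for a blacklisted referrer, "approve" if the post is
    younger than the cut-off, and "moderate" otherwise.
    """
    if is_blocked_referrer(referrer):
        return "reject"
    if now - post_published < MODERATION_CUTOFF:
        return "approve"
    return "moderate"
```

A comment on a two-day-old post with no referrer would be approved straight away, whereas the same comment on a month-old post would land in the moderation queue.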
Thus far, there have been no false negatives, and few false positives, which are to be expected. If you have any more suggestions for improving this (either in terms of accuracy or maintenance level), do leave a comment.