
Inverse Captcha Anti-Comment-Spam Technique: Now A Regular Mephisto Plugin

The initial experiment

When I first came across Damien's description of the "Negative Captcha" technique I wanted to give it a test-drive. I decided to do the "simplest thing that could possibly work" and happened to implement a fully-functional, highly efficient anti-spam outer floodgate mechanism in two super-simple steps:

  1. I hid the email form input element from real users' eyes through CSS. (Additionally, I added a warning message, shown in case a user has disabled CSS, that instructs him not to enter an email address.)
  2. In MephistoController#dispatch_comments I checked whether a stupid bot had filled in an email address and if so, kicked him.
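A minimal sketch of the check from step 2, written in plain Ruby rather than as the actual patch to MephistoController#dispatch_comments. The helper name `spam_bot?` and the shape of the params hash are assumptions made for illustration only:

```ruby
# Hypothetical helper, not part of Mephisto: returns true when the
# CSS-hidden email field was filled in, which a real user never does.
def spam_bot?(params)
  comment = params[:comment] || {}
  !comment[:author_email].to_s.strip.empty?
end

# Example submissions (made-up data).
human = { comment: { author: "Alice", body: "Nice post!", author_email: "" } }
bot   = { comment: { author: "Bot", body: "Buy stuff", author_email: "bot@example.com" } }

puts spam_bot?(human)
puts spam_bot?(bot)
```

In a controller, a true result would simply mean rejecting the comment before it is saved.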

That worked surprisingly well.

I had expected that there would be at least some Watir- or whatever-kind-of-cool-engine driven bots that would correctly interpret the CSS directive and thus not post back any email addresses. (I'm pretty confident that these bots would be picked up by Mephisto's built-in Akismet support, so that's what I mean by the "outer" and "inner floodgate" metaphor.) Not at all. Nothing. Nada. This super-simple technique actually blocked all of the comment-spam from my blog!

Obviously a major drawback of my implementation was that I discarded the opportunity to enter an email address. But knowing your user's email address sometimes turns out to be pretty useful when you want to get in contact with a user directly. I clearly wanted to re-allow users to leave me their email address.

Also, after I had tested the technique for a couple of months I decided to revamp this stuff as a plugin so that I wouldn't need to patch Mephisto's codebase in order to get this in.

Make it a regular Mephisto plugin

When you think about extracting things into a plugin you think about how things can be abstracted away from the special case at hand to fit a more general purpose. Also, you naturally try to find some more descriptive or declarative way to illustrate things.

So I sort of invented the concept of an undercover-agent-like "sneaky" HTTP parameter that behaves as follows:

  1. It hides its real purpose from bots by obfuscating its original name (of course real users will know its purpose because they don't look at the input fields' names but at the HTML labels or descriptions)
  2. It allows for a strawman stand-in fake parameter which bears the parameter's original name, so that bots are lured into filling out this fake parameter (which real users won't see at all because we're hiding it through CSS, just like in the original, simple approach)
  3. It un-hides itself when the HTML form is posted back to the application. It does this in the :before_filter stage, so that the Controller does not need to know anything about what's going on here at all.
  4. It notices when the strawman parameter has been filled in, so that it can safely be assumed that a super-stupid-bot™ is trying to drop some garbage - so we're able to kick him.
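The four behaviours above can be sketched in plain Ruby. This is not the plugin's real code: the method names `sneaky_name` and `unhide_sneaky_param`, the secret, and the way the obfuscated name is derived are all assumptions chosen to keep the example self-contained:

```ruby
require "digest"

# Hypothetical: derive an obfuscated field name from the real name and a
# secret. Hex digits are mapped to the letters a-p so the result looks
# like a random lowercase string.
def sneaky_name(real_name, secret)
  digest = Digest::SHA1.hexdigest("#{secret}:#{real_name}")
  digest[0, 12].tr("0-9a-f", "a-p")
end

# Hypothetical stand-in for the before_filter step. Returns the cleaned
# comment params, or nil when the strawman field was filled in.
def unhide_sneaky_param(comment, real_name, secret)
  # The strawman carries the real name; only bots fill it in.
  return nil unless comment[real_name].to_s.strip.empty?

  cleaned = comment.dup
  # Move the value from the obfuscated key back to the real key, so the
  # controller never sees the obfuscated name.
  cleaned[real_name] = cleaned.delete(sneaky_name(real_name, secret)).to_s
  cleaned
end

secret = "change-me" # assumption: some per-site secret
field  = sneaky_name("author_email", secret)

human = { "author" => "Alice", field => "alice@example.com", "author_email" => "" }
bot   = { "author" => "Bot", field => "", "author_email" => "bot@example.com" }

p unhide_sneaky_param(human, "author_email", secret)
p unhide_sneaky_param(bot, "author_email", secret) # => nil
```

Because the parameter is restored before the controller runs, the controller can keep reading the email under its original name.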

Well ... I don't know about you, but I am probably lacking the creativity to think of a different use case where this concept of "sneaky parameters" could be applied. Can you? Tell me :-)

How to use this?

Basically, to use this, you'll have to:

Check out the plugin:

script/plugin install http://svn.artweb-design.de/stuff/mephisto/mephisto_inverse_captcha/

... will work fine.

Tweak your comment form template:

We don't have access to your templates from a Mephisto plugin. So you have to do this yourself.

Somewhere in themes/[site]/[theme]/templates/_comment.liquid make sure that you have a field like this:


<label for="">E-Mail</label>

[...]
<p id="comment-email">
	If you can read this, you don't use a typical webbrowser that plays nice with CSS. <br />
	<strong>Please do not fill in anything here!</strong><br />
	
</p>

And somewhere in your CSS files add a rule to hide the #comment-email part from the user:


#comment-email {
	display: none;
}

This will output HTML form elements like these:


<label for="comment_soyjjncmhaju">E-Mail</label>
<input type="text" id="comment_soyjjncmhaju" name="comment[soyjjncmhaju]" value="" />
[...]
<p id="comment-email">
	If you can read this, you don't use a typical webbrowser that plays nice with CSS. <br />
	<strong>Please do not fill in anything here!</strong><br />
	<input type="text" id="comment_author_email" name="comment[author_email]" value="" />
</p>

Obviously, to real users the form will look just like any other blog comment form (without any annoying Captcha or logic-puzzle stuff on it though!). They can drop us their email addresses and everything's fine.

A bot on the other hand will see a form field that it does not understand and a regular email field. It thus will fill in the email field and immediately get caught in our before_filter.

Job done :-)

Further suggestions?