Why captchas test humans, not machines

There was an article on Slashdot recently discussing Captchas (for those who don’t know what these are, pop over to the Wikipedia article on Captchas and have a quick read). The most common comment on the article seemed to be along the lines of “stop making people prove they’re human, and instead make the bots prove they’re bots”.

Various solutions were put forward, from the extremely ignorant to the mostly ignorant. I feel compelled to explain a little of why these solutions won’t work. To do that, we first need to discuss how spam bots work.

Spam bots put ads and spam links in blog comments. They do this with pre-written code that targets specific software, most commonly WordPress blogs, PHP forums and the like. When someone writes their own custom blog, it hardly matters what spam filtering or anti-bot code they put in; it’ll work 99% of the time, because no bot has yet been written to deal with their software.

There are some bots out there that have some smarts that allow them to work out basic things, like if they come across a page which has a “Name”, “Email” and “Comment” field, try throwing obvious stuff in there and submitting it. So if you are going to write your own custom software, use odd field names at least.
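
As a rough illustration of how little smarts that takes, here is a minimal Python sketch of a generic bot guessing at fields by name. The `GUESSES` table, the `FieldFinder` class and `guess_form_values` are all made up for this example; they are not taken from any real bot.

```python
from html.parser import HTMLParser

# Hypothetical values a generic bot might throw at obviously named fields.
GUESSES = {
    "name": "Bob",
    "email": "bob@example.com",
    "comment": "Buy cheap stuff",
}


class FieldFinder(HTMLParser):
    """Collects the name attribute of every <input> and <textarea>."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)


def guess_form_values(html):
    """Return a value for each field whose name the bot recognises."""
    finder = FieldFinder()
    finder.feed(html)
    return {
        field: GUESSES[field.lower()]
        for field in finder.fields
        if field.lower() in GUESSES
    }
```

Feed it a form whose fields are called “Name”, “Email” and “Comment” and it has an answer for every one. Rename them to something odd and it comes back empty-handed.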

Ok, so why can’t I write a custom anti-spam solution for, say, WordPress, publish it on the net and have it work? Well, to begin with it will. Once many people start using it, though, the spam bot writers will add code to work around my solution.

This means that any solution will only work until the spam-bot writers work out a way around it.

Can we write a solution that can’t be worked around, without using Captchas (or any other human-test)? Not really.

Let’s take one solution that almost has merit (it’s one of the smarter ones):

Give the form 10 username and 10 email fields, and have JavaScript code show only one of each at random. When a bot comes along, it’ll either fill in one field or all of them, and it’ll be wrong. The server then rejects any submission that fills in a field that wasn’t visible. To get through, the bots would have to interpret the whole page and execute the JavaScript code to find out which field was visible.

This is a nice idea, but there is one big problem: How does the server know what field was visible?

  • When the server sends the page out, it could randomly select the field number to be visible and write that into the JavaScript code it sends. Nope, because then the bot code would just do a text scan for “showField=N”, or whatever JavaScript variable holds the selection.
  • The field is selected client-side by the JavaScript, which then tells the server which field it selected? Nope, because the POST that is sent back would look like this:

    “comment=blahblah&username1=&username2=&username3=Bob&username4=&selected=3”.

    This of course would be easy to make the bot send.
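
To show how little work both dodges take, here is a minimal Python sketch of the bot’s side. It assumes the page contains the “showField=N” text and the usernameN / selected parameter names from the examples above; the function names are invented for illustration.

```python
import re
from urllib.parse import urlencode


def find_selected_field(page_source):
    """Text-scan the page for the server-chosen index; no JavaScript is run."""
    match = re.search(r"showField\s*=\s*(\d+)", page_source)
    return int(match.group(1)) if match else None


def forge_post_body(selected, total_fields=4, username="Bob", comment="blahblah"):
    """Build the body a real browser would send for the chosen field."""
    params = [("comment", comment)]
    for i in range(1, total_fields + 1):
        # Only the "visible" field gets a value; the rest stay empty.
        params.append((f"username{i}", username if i == selected else ""))
    params.append(("selected", str(selected)))
    return urlencode(params)
```

In the first case the bot reads the answer straight out of the page source. In the second it simply picks a number itself and reports that one as selected. Either way, the server sees exactly what a well-behaved browser would have sent.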

Ok, but what if we do…

No, you are missing the point. You want the user to be able to send data (a comment) to your server without doing something that only a human would do. And even most things that only a human would do can be emulated by a computer. We need to pick a task that computers are not good at. It turns out that visual recognition and audio recognition are tough tasks for a computer at the moment. Hence the captcha. Sometime in the future machines will be able to read the letters in a captcha as well as or better than humans, and then that avenue is gone too.

Now, having explained why Captchas are so good, in my next post I’ll explain why I don’t use Captchas and why you should consider going elsewhere when a company requires you to. Confused? Yup.
