Kitten Captcha Needs to be Mashed With Flickr

April 7, 2006

I hate comment spam. I hate captchas only slightly less. Since optical character recognition is continually improving, captchas become ever more inscrutable to the average person as time progresses. Now, perhaps we might finally break out of captcha hell with Kitten Auth (via Digg).

The idea is simple. Instead of struggling to read twisted non-words, the user must recognize and select several pictures of kittens that have been mixed in with other cute animals. The idea is clever because it takes advantage of the huge gap in image recognition between humans and computers, making this menial task a rather effective Turing test.

Both the author and many commenters point out the need for a large corpus of kitten and puppy pictures. That problem could easily be solved by integrating with Flickr. For instance, for the 3x3 grid in the example, you could pick three pictures with the kitten tag and six photos with the puppy tag. With 49,931 and 68,243 photos respectively, you have a pretty good corpus.
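To make the selection step concrete, here is a minimal Python sketch of assembling one challenge from three kittens and six puppies. It assumes the two photo lists have already been fetched somehow (for example, from a tag search); the function name `build_grid` and its parameters are hypothetical, and no real Flickr calls are made.

```python
import random

def build_grid(kitten_photos, puppy_photos, n_targets=3, n_decoys=6, rng=None):
    """Return a shuffled list of (photo_id, is_target) pairs for one challenge.

    kitten_photos / puppy_photos are assumed to be lists of photo identifiers
    already retrieved for each tag (hypothetical inputs, not a Flickr API call).
    """
    rng = rng or random.Random()
    targets = rng.sample(kitten_photos, n_targets)   # photos the user must pick
    decoys = rng.sample(puppy_photos, n_decoys)      # photos the user must avoid
    cells = [(p, True) for p in targets] + [(p, False) for p in decoys]
    rng.shuffle(cells)                               # hide which cells are targets
    return cells
```

The server would keep the `is_target` flags to itself and send only the images to the browser.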

Of course, since you don’t control the corpus, there is always the danger of a photo with no easily visible kitten. This can be alleviated by displaying four “kitten” photos but requiring correct identification of only three. The “fudge factor” could be increased to any level deemed appropriate to the corpus, although doing so increases the chances a spambot could get through. Grid size is part of the same trade-off: a larger grid makes blind guessing harder, but asks more of the user.
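The cost of the fudge factor can be put in numbers. The sketch below is only an illustration under stated assumptions: a bot picks exactly three of nine cells uniformly at random, and a submission passes only if it contains at least the required number of kittens and no other animal. The names `passes` and `blind_guess_odds` are made up for this example.

```python
from fractions import Fraction
from itertools import combinations

def passes(selected, kitten_cells, required):
    """Accept if the picks include at least `required` kittens and nothing else."""
    selected = set(selected)
    kitten_cells = set(kitten_cells)
    if not selected <= kitten_cells:      # any non-kitten pick fails (assumed rule)
        return False
    return len(selected) >= required

def blind_guess_odds(grid_size, n_kittens, required, n_picks):
    """Exact chance that n_picks cells chosen uniformly at random pass the check."""
    # Which cells hold the kittens does not change the odds, so use the first ones.
    kittens = set(range(n_kittens))
    guesses = list(combinations(range(grid_size), n_picks))
    wins = sum(passes(g, kittens, required) for g in guesses)
    return Fraction(wins, len(guesses))

strict = blind_guess_odds(9, 3, 3, 3)   # three kittens shown, all three required
fudged = blind_guess_odds(9, 4, 3, 3)   # four kittens shown, any three accepted
print(strict, fudged)                   # 1/84 1/21
```

Under these assumptions, showing a fourth kitten makes a blind guess four times as likely to succeed, which is the price paid for tolerating one bad photo.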

The beauty of integrating with Flickr tags is that you basically get unlimited permutations on Kitten Captcha for free. Some commenters mentioned a Fruit Captcha - it’s easily done with tag combinations such as apple,fruit and orange,fruit. Or imagine a Planes, Trains, and Automobiles Captcha. The possibilities are endless!

It seems likely that eventually the images would have to be proxied rather than served directly from Flickr. Otherwise, spammers could simply use the Flickr API to download the tags for the displayed photos and determine which ones are correct. That could be accomplished either by parsing the page for the name of the goal picture and comparing it to the tags, or by analyzing the distribution of tags. (As an example of the latter case, imagine the goal of “kitten.” That means the “kitten” tag will be on four images, while “puppy” will be on five. Thus, “puppy” is the distraction, and “kitten” is the right answer.) Proxying the images removes the only information tying each picture back to Flickr, so the spammer lacks the tag information needed to defeat the system by anything other than brute force.
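The tag-distribution weakness is easy to demonstrate, which is why proxying matters. The following sketch assumes the attacker has already recovered the tags for each displayed image and knows the two candidate tags; the function name `guess_targets` and the sample data are invented for illustration.

```python
from collections import Counter

def guess_targets(tags_by_image, candidate_tags):
    """Guess the goal tag as the less common candidate and return its images.

    tags_by_image maps an image identifier to the set of tags recovered for it
    (assumed to be available only when images are served straight from Flickr).
    """
    counts = Counter()
    for tags in tags_by_image.values():
        for tag in candidate_tags:
            if tag in tags:
                counts[tag] += 1
    # The goal tag sits on the minority of images; the distraction on the rest.
    goal = min(candidate_tags, key=lambda t: counts[t])
    picks = sorted(img for img, tags in tags_by_image.items() if goal in tags)
    return goal, picks

# Four images tagged "kitten", five tagged "puppy", as in the example above.
page = {f"img{i}": {"kitten", "cute"} for i in range(4)}
page.update({f"img{i}": {"puppy", "cute"} for i in range(4, 9)})
print(guess_targets(page, ["kitten", "puppy"]))
```

If the page exposes only opaque proxy URLs, the `tags_by_image` mapping cannot be built in the first place, and this shortcut disappears.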

So - anybody want to implement it for Drupal?

Brian Vargas