Even before news that the startup Vicarious has found a way to crack CAPTCHA at least 90% of the time, CAPTCHA was broken. Not only that, it is a tedious and increasingly frustrating task. A UCSD study calculates that the average user spends 14 seconds on a CAPTCHA with an error rate of 10%. We were overdue for a new, better way of combating spam long before Vicarious revealed its hack.
CAPTCHA, which stands for Completely Automated Public Turing test to tell Computers and Humans Apart, is actually a reverse Turing test whose aim is to differentiate machines from humans by finding those ways in which a machine is incapable of thinking like a human. A Turing test, named after famed codebreaker, mathematician, and early computer scientist Alan Turing, originally described a test of machine intelligence. For Turing, a machine passed the test and could be considered a thinking entity when it could convince a human that it was human. The original idea behind CAPTCHA was that computers could read text, but not if the text was in an image.
However, as technology improved and advancements like optical character recognition (OCR) allowed computers to read text from images, CAPTCHAs had to become more and more complicated to foil machines. From wavy lines and letters to crosshatched or noise-filled images, CAPTCHA designers had to come up with more ways to be illegible to machines while still remaining accessible to humans.
Ironically, one of the most popular CAPTCHA services, the Google-owned reCAPTCHA, actually uses user-submitted responses to help read some of the many books that Google has digitized. In other words, this CAPTCHA service may be helping text-in-image recognition even as it relies on the difficulty of text-in-image recognition to confound machines.
The other problem, of course, is that as computers get better at reading text in images, CAPTCHA images become increasingly inscrutable and unintelligible. It has come to the point where making text illegible to machines also makes it pretty illegible to humans. Or illegible enough that humans complain.
Finally, since the goal of CAPTCHA is to differentiate human from machine, CAPTCHA never managed to completely thwart human spammers. At best, these measures only slowed them down.
Images are a bust, and CAPTCHA never really addressed human spammers. We must use the many other ways humans differ from machines to keep the spammers at bay.
Form-Fillers of the World, Divide
While Turing tests will never weed out all spammers, especially human spammers, autofilters are a valuable first defense against all that computer-generated spam. One way humans and spambots definitely differ is in how they fill out forms. Even humans who routinely use the auto-fill functions in their browsers must manually fill in at least one portion of a form. Systems have already been developed that can analyze how a user is filling out a form (how quickly, in what order, etc.). The reverse Turing test could then use behavior itself as a metric to discern machine from human.
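As a rough illustration of using behavior as a metric, here is a minimal Python sketch that scores a form submission by how quickly it was filled out. It is not any particular product's method: the FieldEvent record, the bot_score function, and both thresholds are hypothetical, and a real system would tune them against observed traffic and likely look at field order as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldEvent:
    field: str      # hypothetical form field name
    seconds: float  # time since the form was rendered when the field was completed


def bot_score(events: List[FieldEvent], min_total: float = 3.0, min_gap: float = 0.2) -> int:
    """Count how many simple timing heuristics flag a submission (0 = none, 2 = both)."""
    if not events:
        # Assumption: a submission with no recorded field interaction is treated as automated.
        return 2
    score = 0
    times = sorted(e.seconds for e in events)
    # Heuristic 1: the whole form was completed faster than a person plausibly types.
    if times[-1] < min_total:
        score += 1
    # Heuristic 2: every field was completed almost immediately after the previous one.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if gaps and all(gap < min_gap for gap in gaps):
        score += 1
    return score


human = [FieldEvent("name", 2.5), FieldEvent("email", 7.1), FieldEvent("comment", 24.0)]
bot = [FieldEvent("name", 0.01), FieldEvent("email", 0.02), FieldEvent("comment", 0.03)]
print(bot_score(human), bot_score(bot))
```

A score like this would probably work best as one signal among several, since a fast typist using auto-fill can look a lot like a script.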
Talk Spam to Me, Baby
The language of spam has its own particular grammar. Instead of using CAPTCHA (whose primary purpose is to prevent machine-generated spam), the content of messages could be analyzed to identify and block spam. Like email spam filters, these filters could flag or block messages with particular keywords, suspicious email addresses, or suspicious names.
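A content filter of this kind can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the keyword list, the blocked domain, and the spam_reasons function are hypothetical, and production filters generally rely on statistical models rather than fixed lists.

```python
# Hypothetical lists; a real filter would load and update these from data.
SPAM_KEYWORDS = {"viagra", "casino", "free money"}
SUSPICIOUS_DOMAINS = {"example-spam.test"}


def spam_reasons(message: str, email: str) -> list:
    """Return the reasons a submission looks like spam; an empty list means no flags."""
    reasons = []
    text = message.lower()
    # Flag any known spam keyword that appears in the message body.
    for keyword in sorted(SPAM_KEYWORDS):
        if keyword in text:
            reasons.append(f"keyword:{keyword}")
    # Flag senders whose email domain is on the suspicious list.
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in SUSPICIOUS_DOMAINS:
        reasons.append(f"domain:{domain}")
    return reasons


print(spam_reasons("Win FREE MONEY at our casino", "bob@example-spam.test"))
print(spam_reasons("Great post, thanks!", "alice@example.org"))
```

Returning reasons rather than a yes/no answer lets a site decide whether to block a message outright or just hold it for review.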
The advantage of these methods is that, unlike CAPTCHA, they only require the user not to be posting spam. The ideal spam filter would not add another step to forms that are sometimes already frustrating and tedious. Losing a user because the spammers got too smart for CAPTCHA is bad for you and your users.