12.10.2002 - Breaking Gimpy: Researchers crack security system designed to block Internet robots

Breaking Gimpy: Researchers crack security system designed to block Internet robots
10 December 2002

By Sarah Yang, Media Relations

Berkeley - For every warm-blooded human who has ever taken an online poll or signed up for free web-based email, there are legions of computer-automated Internet robots, or "bots," trying to do the same thing.

A clever security system designed to stop these bot programs - which contribute to the Internet equivalent of computer-generated telemarketing calls - has now been cracked by a pair of computer scientists from the University of California, Berkeley.

Researchers at Carnegie Mellon University in Pittsburgh created the security system, known as Gimpy, to thwart the bot programs that relentlessly scour cyberspace for opportunities to register new email addresses, stuff ballots for online polls and direct unwitting participants in Internet chat rooms to advertisements. Bot-produced email accounts are hard to block or trace, making them ideal vehicles for sending spam to legitimate email users.

The UC Berkeley effort was a response to an open challenge by the research team at Carnegie Mellon to the computer science community to write a program capable of reading the Gimpy-distorted text.

[Image: An example of an EZ-Gimpy image, designed to be easy for humans to read but difficult for bot programs to decipher. Photo courtesy of the Captcha Project at Carnegie Mellon University.]

Gimpy takes advantage of the fact that most people can easily recognize words with letters that are squiggly, fuzzy or otherwise distorted. In contrast, computer programs, such as those based upon optical character recognition (OCR) technology, are easily flustered if the text is not clear and free of background clutter.

Last year, Yahoo, one of the largest providers of free web-based email, implemented the Gimpy check as part of the new account registration process. People who can pass the test by typing in the correct word shown on the screen can go on to get an account. Bots, presumably, are stopped cold.

"We were able to crack Gimpy because of our previous research on a technique called 'shape contexts' for object recognition," said Jitendra Malik, professor and chair of the Division of Computer Science at UC Berkeley's College of Engineering. "The basic idea is to match shapes based upon the relative configuration of contours in a way that can tolerate small distortions. We had applied the technique before to handwritten digits and human figures, as well as to three-dimensional objects, so it seemed plausible to try it here."
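To make the idea of matching "the relative configuration of contours" more concrete, here is a minimal Python sketch of one common way a shape-context descriptor can be computed. It is an illustration only, not the researchers' code: the function names, the bin counts (5 radial, 12 angular), the radial range and the chi-squared comparison are all assumptions chosen for the example. For one point sampled from a contour, it counts where the other points fall in a log-polar grid centered on that point; coarse bins are what allow small distortions to be tolerated.

```python
import math

def shape_context(points, index, n_radial=5, n_angular=12):
    """Log-polar histogram of where the other points lie relative to points[index].

    Bin counts and the radial range are illustrative assumptions.
    """
    px, py = points[index]
    others = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in others]
    mean_dist = sum(dists) / len(dists)  # dividing by this makes the descriptor scale-invariant

    hist = [0] * (n_radial * n_angular)
    for (dx, dy), d in zip(others, dists):
        # Radial bin: log2 of the normalized distance, clamped to the range [-3, 1].
        log_r = math.log2(max(d / mean_dist, 1e-9))
        r_bin = int((log_r + 3) / 4 * n_radial)
        r_bin = min(max(r_bin, 0), n_radial - 1)
        # Angular bin: direction to the other point, in equal slices of the circle.
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin * n_angular + a_bin] += 1
    return hist

def chi2_cost(h1, h2):
    """Chi-squared distance between two histograms (0 means identical)."""
    s1, s2 = sum(h1), sum(h2)
    cost = 0.0
    for a, b in zip(h1, h2):
        a, b = a / s1, b / s2
        if a + b > 0:
            cost += (a - b) ** 2 / (a + b)
    return 0.5 * cost
```

Two points whose histograms have a low `chi2_cost` see a similar arrangement of contour around them, so a distorted letter can still be matched to a clean template as long as most of its sampled points find low-cost partners.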

It took five days for Malik and Greg Mori, a computer science doctoral student at UC Berkeley, to create the program, which works by comparing the distorted letters in the given field to the 26 letters of the alphabet. Algorithms then come up with three to five likely candidate letters and group them in pairs that are analyzed to see whether they can be joined to form complete words. The resulting words are then scored based upon how closely the letters matched the image in the field. The word with the best score is then chosen.
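The candidate-letter, pairing and scoring steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual program: it assumes the letter positions are already known, that each position comes with a small table of candidate letters and match costs (lower is better), and that a word list is available. The name `best_word`, the sample costs and the sample dictionary are all hypothetical.

```python
def best_word(candidates, dictionary):
    """Pick the dictionary word that best fits per-position letter candidates.

    candidates: list with one dict per letter position, mapping a candidate
                letter to its match cost (lower means a closer match).
    dictionary: iterable of allowed words.
    Returns the lowest-cost word, or None if no word fits.
    """
    n = len(candidates)
    words = [w for w in dictionary if len(w) == n]

    # Step 1: group adjacent candidate letters into pairs and keep only the
    # pairs that occur at that position in at least one dictionary word.
    pairs = []
    for i in range(n - 1):
        pairs.append({
            (a, b)
            for a in candidates[i]
            for b in candidates[i + 1]
            if any(w[i] == a and w[i + 1] == b for w in words)
        })

    # Step 2: join pairs into complete words, then score each surviving word
    # by summing the match costs of its letters.
    scored = []
    for w in words:
        letters_ok = all(w[i] in candidates[i] for i in range(n))
        pairs_ok = all((w[i], w[i + 1]) in pairs[i] for i in range(n - 1))
        if letters_ok and pairs_ok:
            scored.append((sum(candidates[i][w[i]] for i in range(n)), w))

    # Step 3: the word with the best (lowest) total cost wins.
    return min(scored)[1] if scored else None


# Hypothetical example: three candidate letters per position.
example_candidates = [
    {"w": 0.2, "m": 0.3, "v": 0.5},
    {"i": 0.1, "l": 0.2, "j": 0.6},
    {"n": 0.2, "m": 0.4, "h": 0.5},
    {"d": 0.1, "o": 0.3, "b": 0.4},
]
example_dictionary = {"wind", "mind", "wild", "mild", "word"}
print(best_word(example_candidates, example_dictionary))  # prints "wind"
```

Restricting each position to a handful of candidates keeps the search small: only "wind" and "mind" survive the pairing step here, and "wind" wins on total cost.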

Mori compared the process to detecting arms, legs and a head in an image to come up with the conclusion that a human is depicted.

In a trial using 191 images, the process worked 83 percent of the time for the simplest version of Gimpy, known as EZ-Gimpy, which "hides" a single word amid a cluttered background. This is the version used by Yahoo in the email registration process.

In a more difficult version of Gimpy, as many as five pairs of distorted words are presented with the word pairs superimposed upon each other. The user must then ferret out three correct words to pass the test.

"Breaking this harder version of Gimpy is still a work in progress because recognizing letters that are pasted on top of each other is more difficult," said Mori. "At this point, our success rate for the more challenging Gimpy is 30 percent."

Gimpy is one of several different programs in a project called CAPTCHA, which stands for "Completely Automated Public Turing test to Tell Computers and Humans Apart," headed by Manuel Blum, professor of computer science at Carnegie Mellon University, with his graduate student, Luis von Ahn. Before joining Carnegie Mellon, Blum taught computer science at UC Berkeley for 30 years.

Malik said the Gimpy challenge is a great test for the ongoing research in computer object recognition he and others at UC Berkeley are conducting. "We're looking at the bigger picture, so to speak," he said. "The goal of the computer vision research we are doing is to develop programs that can recognize people, animals and other objects in a picture. It's a shift from programs that can simply read text to those that can actually see pictures, which is a major step forward in the field of artificial intelligence."

Once Malik and Mori successfully cracked the EZ-Gimpy system, they notified Blum at Carnegie Mellon.

"I was delighted when I heard from them," said Blum. "They were the first ones to successfully take up the challenge."

Blum said that he hopes this research will eventually help bring online the wealth of materials in the Library of Congress, a task that has been daunting because of the difficulty current scanning software has in "reading" handwritten or manually typed text.

Blum noted that Carnegie Mellon's Gimpy would be much more difficult to crack than EZ-Gimpy.

"They'll keep making it harder, and we'll keep working to break it," said Malik. "It's great fun."

###