Breaking Gimpy: Researchers crack security system designed to block Internet robots
10 December 2002
By Sarah Yang, Media Relations
Berkeley - For every warm-blooded human who has ever taken
an online poll or signed up for free web-based email, there
are legions of computer-automated Internet robots, or "bots,"
trying to do the same thing.
A clever security system designed to stop these bot programs
- which contribute to the Internet equivalent of computer-generated
telemarketing calls - has now been cracked by a pair of computer
scientists from the University of California, Berkeley.
Researchers
at Carnegie Mellon University in Pittsburgh created the security
system, known as Gimpy, to thwart the bot programs that relentlessly
scour cyberspace for opportunities to register new email addresses,
stuff ballots for online polls and direct unwitting participants
in Internet chat rooms to advertisements. Bot-produced email
accounts are hard to block or trace, making them ideal vehicles
for sending spam to legitimate email users.
The
UC Berkeley effort was a response to an open challenge by
the research team at Carnegie Mellon to the computer science
community to write a program capable of reading the Gimpy-distorted
text.
[Image: an example of an EZ-Gimpy image, designed to be easy
for humans to read but difficult for bot programs to
decipher. Photo courtesy of the Captcha Project at
Carnegie Mellon University.]
Gimpy
takes advantage of the fact that most people can easily recognize
words with letters that are squiggly, fuzzy or otherwise distorted.
In contrast, computer programs, such as those based upon optical
character recognition (OCR) technology, are easily flustered
if the text is not clear and free of background clutter.
Last year,
Yahoo, one of the largest providers of free web-based email,
implemented the Gimpy check as part of the new account registration
process. People who can pass the test by typing in the correct
word shown on the screen can go on to get an account. Bots,
presumably, are stopped cold.
"We were
able to crack Gimpy because of our previous research on a
technique called 'shape contexts' for object recognition,"
said Jitendra Malik, professor and chair of the Division of
Computer Science at UC Berkeley's College of Engineering.
"The basic idea is to match shapes based upon the relative
configuration of contours in a way that can tolerate small
distortions. We had applied the technique before to handwritten
digits and human figures, as well as to three-dimensional
objects, so it seemed plausible to try it here."
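For readers curious what matching shapes by "the relative configuration of contours" can look like in practice, here is a minimal sketch of a shape-context-style descriptor. It is an illustration only, not the researchers' code. The function names, the bin counts and the per-point distance normalization are all assumptions made for this example. Each descriptor is a histogram of where the other contour points fall, in log-polar bins, relative to one reference point. Two descriptors are compared with a chi-squared cost, so small distortions move only a few points between neighboring bins.

```python
import math

def shape_context(points, index, n_radial=5, n_angular=12):
    """Log-polar histogram of where the other contour points lie
    relative to points[index]. Bin counts are illustrative choices.
    Assumes at least two distinct points."""
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    dists = [math.hypot(dx, dy) for dx, dy in offsets]
    # Dividing by the mean distance makes the descriptor scale-invariant.
    mean_dist = sum(dists) / len(dists)
    hist = [0.0] * (n_radial * n_angular)
    for (dx, dy), d in zip(offsets, dists):
        # Log-spaced radial bins covering 1/8 to 2 times the mean distance.
        r = max(d / mean_dist, 0.125)
        r_bin = min(int((math.log2(r) + 3) / 4 * n_radial), n_radial - 1)
        theta = math.atan2(dy, dx) % (2 * math.pi)
        a_bin = min(int(theta / (2 * math.pi) * n_angular), n_angular - 1)
        hist[r_bin * n_angular + a_bin] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

def chi2_cost(h1, h2):
    """Chi-squared distance between two normalized histograms (0 = identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Hypothetical toy contours: a square, the same square doubled in size, a line.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
big_square = [(2 * x, 2 * y) for x, y in square]
line = [(0, 0), (1, 0), (2, 0), (3, 0)]

same = chi2_cost(shape_context(square, 0), shape_context(big_square, 0))
different = chi2_cost(shape_context(square, 0), shape_context(line, 0))
print(same, different)
```

In this toy setup the square and its enlarged copy produce matching descriptors, while the line does not. A full matcher would also have to decide which points on one shape correspond to which points on the other, a step this sketch leaves out.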
It took
five days for Malik and Greg Mori, computer science doctoral
student at UC Berkeley, to create the program, which works
by comparing the distorted letters in the given field to the
26 letters of the alphabet. Algorithms then come up with three
to five likely candidate letters and group them in pairs that
are analyzed to see whether they can be joined to form complete
words. The resulting words are then scored based upon how
closely the letters matched the image in the field. The word
with the best score is then chosen.
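The steps described above can be sketched roughly as follows. This is a simplified illustration, not the Berkeley program. It assumes the image has already been divided into letter positions and that each position comes with a few candidate letters and a match cost (lower means a closer match). The function name, the toy dictionary and the costs are all hypothetical.

```python
def best_word(candidates, dictionary):
    """Pick the dictionary word that best fits per-position letter candidates.

    candidates: list with one dict per letter position, mapping a candidate
                letter to its match cost (lower = closer match to the image).
    dictionary: iterable of allowed words.
    """
    if not candidates:
        return None
    n = len(candidates)
    words = {w for w in dictionary if len(w) == n}
    # Letter pairs that occur side by side in at least one dictionary word.
    pairs = {w[i:i + 2] for w in words for i in range(n - 1)}

    # Grow partial strings one position at a time, keeping only those whose
    # newest pair of adjacent letters could belong to a real word.
    partials = list(candidates[0].items())
    for position in candidates[1:]:
        partials = [
            (prefix + letter, cost + letter_cost)
            for prefix, cost in partials
            for letter, letter_cost in position.items()
            if prefix[-1] + letter in pairs
        ]

    # Score the strings that are complete words and return the best one.
    scored = [(cost, word) for word, cost in partials if word in words]
    return min(scored)[1] if scored else None

# Hypothetical match costs for a four-letter image.
candidates = [
    {"c": 0.10, "e": 0.30, "o": 0.40},
    {"a": 0.20, "o": 0.25},
    {"r": 0.10, "n": 0.30},
    {"d": 0.20, "e": 0.30, "l": 0.40},
]
dictionary = ["card", "core", "cane", "oral", "earl", "word"]
print(best_word(candidates, dictionary))
```

Filtering by letter pairs early keeps the number of partial strings small, which matters when each position has several plausible letters. The real program also had to find the letters amid background clutter, which this sketch does not attempt.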
Mori compared
the process to detecting arms, legs and a head in an image
and concluding from those parts that a human is depicted.
In a trial
using 191 images, the process worked 83 percent of the time
for the simplest version of Gimpy, known as EZ-Gimpy, which
"hides" a single word amid a cluttered background. This is
the version used by Yahoo in the email registration process.
In a more
difficult version of Gimpy, as many as five pairs of distorted
words are presented with the word pairs superimposed upon
each other. The user must then ferret out three correct words
to pass the test.
"Breaking
this harder version of Gimpy is still a work in progress because
recognizing letters that are pasted on top of each other is
more difficult," said Mori. "At this point, our success rate
for the more challenging Gimpy is 30 percent."
Gimpy
is one of several different programs in a project called CAPTCHA,
which stands for "Completely Automated Public Turing test
to Tell Computers and Humans Apart," headed by Manuel Blum,
professor of computer science at Carnegie Mellon University,
with his graduate student, Luis von Ahn. Before joining Carnegie
Mellon, Blum taught computer science at UC Berkeley for 30
years.
Malik
said the Gimpy challenge is a great test for the ongoing research
in computer object recognition he and others at UC Berkeley
are conducting. "We're looking at the bigger picture, so to
speak," he said. "The goal of the computer vision research
we are doing is to develop programs that can recognize people,
animals and other objects in a picture. It's a shift from
programs that can simply read text to those that can actually
see pictures, which is a major step forward in the field of
artificial intelligence."
Once Malik
and Mori successfully cracked the EZ-Gimpy system, they notified
Blum at Carnegie Mellon.
"I was
delighted when I heard from them," said Blum. "They were the
first ones to successfully take up the challenge."
Blum said
that he hopes this research will eventually help bring online the
wealth of materials in the Library of Congress, a task that has
been daunting because of the difficulty current scanning
software has in "reading" handwritten or manually typed text.
Blum noted
that Carnegie Mellon's Gimpy would be much more difficult
to crack than EZ-Gimpy.
"They'll
keep making it harder, and we'll keep working to break it,"
said Malik. "It's great fun."
###