UC Berkeley press release

NEWS RELEASE, 11/19/97

On-line plagiarism detector helps computer science professors bust cheating programmers

by Robert Sanders

Alex Aiken has a nearly foolproof way to tell if his computer programming students are cheating, and now, to the dismay of students everywhere, he is making it available to professors around the world.

Aiken, an associate professor of computer science at UC Berkeley, has developed a reliable and easy-to-use piece of software that lets anyone check within minutes whether a student in the class has plagiarized a programming assignment.

The programs these students write are simplified versions of the complex software packages sold for business or home use, from word processors to spreadsheets. They are written in languages such as C and Java, which give the computer step-by-step instructions.

Because these programs involve hundreds or even thousands of lines of complex code, plagiarism is hard to catch, and some students appear to be taking advantage of the situation, Aiken says.

"One professor at a major East Coast university tried out the software and immediately discovered nine cases of plagiarism in just one class," Aiken says. "He then did a check on classes taught by other faculty and found that cheating had gotten totally out of hand."

The software, dubbed MOSS for Measure Of Software Similarity, looks for similar or identical lines of code sprinkled throughout a program, then creates a web page where the instructor can see the top 40 matches.
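The release does not describe how MOSS computes its rankings. As a rough illustration only, the hypothetical sketch below compares every pair of submissions line by line after collapsing whitespace, scores each pair by the share of lines they have in common, and keeps the 40 highest-scoring pairs, mirroring the "top 40 matches" page. The function names (`normalized_lines`, `similarity`, `top_matches`) and the scoring formula are assumptions, not MOSS's actual method.

```python
from itertools import combinations


def normalized_lines(source: str) -> set[str]:
    """Collapse runs of whitespace on each line and drop blank lines."""
    lines = set()
    for line in source.splitlines():
        collapsed = " ".join(line.split())
        if collapsed:
            lines.add(collapsed)
    return lines


def similarity(a: str, b: str) -> float:
    """Percentage of distinct lines shared, relative to the smaller program (assumed metric)."""
    la, lb = normalized_lines(a), normalized_lines(b)
    if not la or not lb:
        return 0.0
    return 100.0 * len(la & lb) / min(len(la), len(lb))


def top_matches(submissions: dict[str, str], limit: int = 40) -> list[tuple[str, str, float]]:
    """Score every pair of submissions and return the `limit` most similar pairs."""
    scored = [
        (x, y, similarity(submissions[x], submissions[y]))
        for x, y in combinations(sorted(submissions), 2)
    ]
    scored.sort(key=lambda pair: pair[2], reverse=True)
    return scored[:limit]
```

Because whitespace is collapsed before comparison, reindenting or respacing a copied program does not hide it from this kind of check; renaming variables, however, would, which is one reason structure-aware comparison matters.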

After three years of testing, Aiken became convinced that the software was simple enough to be used broadly, and so two months ago he posted it on the web for the benefit of his colleagues (http://www.cs.berkeley.edu/~aiken/moss.html). The response has been amazing: on average, computer science professors around the world check seven assignments a day.

"Plagiarism has been a problem at every computer science department I've been in since I was an undergraduate," he says. "The only way to learn programming is to write lots of programs, but it's easy for people to take each other's code and submit it as their own.

"A lot of systems have been built over the years to check for plagiarism in programming classes, but most are hard to use and so people don't use them. This software is easy to use, plus it's more accurate and much faster than previous systems."

Katherine Yelick, associate professor of computer science at UC Berkeley, has used other anti-plagiarism software over the past seven years, and finds Aiken's program the best.

"The web interface is wonderful -- it makes a large difference," she says, referring to the ease of comparing plagiarized programs using the web page MOSS creates. "You can easily see the similarities between programs, which before was very time consuming to check."

MOSS automatically determines the similarity of programs written in any of several computer languages, most commonly C, C++ and Java, but also Pascal, Ada, ML, Lisp and Scheme.

After all the assignments are submitted, the MOSS server produces web pages listing pairs of programs with similar code. MOSS also highlights individual passages that appear the same, making it easy to compare the files. The software automatically ignores matches to code that is expected to be shared, such as libraries or instructor-supplied code, thereby avoiding false positives that arise from legitimate sharing of code.
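The release does not say how MOSS filters out shared code. One common approach, shown here as a hypothetical sketch rather than MOSS's own code, is to split each program into overlapping runs of k tokens (k-grams), hash each run, and discard any hash that also appears in the instructor-supplied base code. The tokenizer regex, the use of SHA-1, the value k = 5, and the function names are all assumptions.

```python
import hashlib
import re

# Assumed tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def fingerprints(source: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive tokens (a k-gram) in the program."""
    tokens = TOKEN.findall(source)
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def shared_excluding_base(a: str, b: str, base: str, k: int = 5) -> set[str]:
    """Fingerprints common to a and b that do not also occur in the base code."""
    return (fingerprints(a, k) & fingerprints(b, k)) - fingerprints(base, k)
```

Two students who both copy the starter code verbatim share nothing under this measure unless they also share code beyond it, which is the false-positive behavior the release describes.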

Aiken, who specializes in programming languages and analysis of software, says MOSS is more sophisticated than systems based on counting the frequency of certain words in the program text -- a widely used way to detect cheating that can work with English essays, for example. Instead, MOSS actually examines program structure.
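To illustrate why counting words is easy to fool, here is a hypothetical sketch: renaming variables changes a program's word counts, but if every non-keyword identifier is replaced by a placeholder, the remaining token sequence, a crude stand-in for program structure, stays identical. The keyword list, the placeholder `ID`, and the function names are assumptions; the release does not say how MOSS represents structure.

```python
import re
from collections import Counter

# Assumed tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")
# Assumed (incomplete) set of C keywords that are kept as-is.
KEYWORDS = {"int", "char", "void", "for", "while", "if", "else", "return"}


def word_frequencies(source: str) -> Counter:
    """Bag-of-words view: how often each identifier-like word appears."""
    return Counter(IDENT.findall(source))


def structure(source: str) -> list[str]:
    """Token sequence with every non-keyword identifier replaced by 'ID'."""
    out = []
    for tok in TOKEN.findall(source):
        if IDENT.fullmatch(tok) and tok not in KEYWORDS:
            out.append("ID")
        else:
            out.append(tok)
    return out
```

A student who copies a function and renames its variables changes the word counts but leaves `structure()` unchanged, so a structure-based comparison still flags the pair.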

While an instructor might typically expect to find 5-10 percent similarity between any two programs, a similarity greater than 20-30 percent is suspect, and anything over 70 percent is almost certainly plagiarism, he says.

"The software program doesn't say, this is definitely a copy. It just indicates which assignments look most like copies," Aiken says.

Though plagiarism has been a problem in college for ages, programming classes are especially prone because they are difficult, require lots of time and are crucial to a student's career, Aiken says.

For example, one of the classes software company recruiters consider critical is the course in which students write the code for a compiler -- a translator between a human-oriented programming language and the language a computer understands.

"One of the first things a recruiter wants to find out is, how did you do in the compiler course," says Bill Griswold, assistant professor of computer science at UC San Diego and a convert to MOSS. "It's a critical skill for software engineers, like designing a suspension bridge is for civil engineers."

Griswold first used MOSS this past summer and discovered that nearly ten percent of students in one class had cheated. All failed the course and were suspended for a quarter.

"It was a very tragic situation, where some students got in a bind and took the easy way out," Griswold said. "But they knew it was wrong."

Aiken too has been surprised by the number of students who cheat, but his goal is not to flunk them but to teach a lesson.

"The short-term goal is to find the people who are cheating, but my long-term goal is to change the culture which encourages it," Aiken said. "If they are convinced they are likely to be caught, they won't do it in the first place."

He hopes that someday merely mentioning the possibility of getting caught will deter students from trying to cheat. In his classes the penalty has usually been a zero on the assignment -- which typically makes up ten percent of the entire grade -- plus a reduction of one letter grade for the course. This makes it hard for anyone to earn more than a C- in the course, he says. He has never had to invoke harsher disciplinary action.

To date, however, the warning hasn't deterred everyone.

"People may not believe me because the consequences aren't visible," he says. "People who get caught don't tell their friends about it."

This server has been established by the University of California at Berkeley Public Information Office. Copyright for all items on this server held by The Regents of the University of California. Thanks for your interest in UC Berkeley.