UC Berkeley News


Surfing is safer - and smarter - with flotation devices
In Wikipedia's wake, two SIMS researchers assess the quality of online information, and find it strained

18 January 2006

"Wikipedia gets a lot of its facts wrong," cautions SIMS professor Paul Duguid, who learned first-hand the frustrations of engaging in the online encyclopedia's collaborative editorial process.

It's a truism that the Internet puts the world at its users' fingertips. But it's fast becoming clear that while some parts of the World Wide Web rest on solid ground, much of the information to be found there is about as substantial as fairy dust.

Last month, the online encyclopedia Wikipedia (en.wikipedia.org) made headlines when a defamatory and deliberately false article posted on that site - about John Seigenthaler Sr., the former publisher of The Tennessean newspaper and founding editorial director of USA Today - came to light. The entry linked Seigenthaler, a former aide to Robert F. Kennedy, to the assassinations of both the former attorney general and his brother, President John F. Kennedy. Seigenthaler responded with a Nov. 29 op-ed in USA Today charging that Wikipedia, whose content is created by a community of anonymous contributors, "is a flawed and irresponsible research tool." Though the inaccuracies about Seigenthaler had been introduced as a joke, Wikipedia founder Jimmy Wales took the matter seriously: He changed the encyclopedia's policy so that English-language contributors who post new articles must first register on the site. Nonetheless, the episode sparked questions about Wikipedia's reliability as well as that of other information found on the web.

To explore the question of online-information quality and provide context for the debate, the Berkeleyan turned to two new faculty members at the School of Information Management and Systems (SIMS) with expertise in this area. Geoffrey Nunberg, a leading linguistics and information researcher who's also a print and broadcast commentator on language, and Paul Duguid, a cutting-edge researcher in organizational knowledge and co-author (with John Seely Brown) of The Social Life of Information (Harvard Business School Press, 2000), both recently joined SIMS as adjunct professors. The pair will team up to teach undergraduate and graduate classes on "The History of Information" this fall.

Geoffrey Nunberg

Nunberg, who delivers commentaries on language for National Public Radio's "Fresh Air" program, evinced no surprise at the errors on Wikipedia. "You throw it open so that anyone can contribute, and people are shocked it's a flawed research tool?" he asked rhetorically. While admitting that Wikipedia is "surprisingly good" on some topics - in particular when dealing with concepts familiar to many people, such as "the undead and zombies" or the chi square - he says it falls short in treating "broader cultural topics" such as "Hitler, World War II, or the rise of the novel."

In the wake of the Seigenthaler brouhaha, the journal Nature conducted a study comparing the accuracy of Wikipedia and Encyclopedia Britannica, through a review of articles on 42 scientific topics available in both sources. Though the study determined that the online encyclopedia's articles contain 30 percent more errors than do their Britannica counterparts, Nunberg thinks that to focus simply on inaccuracies obscures a larger point: "When a topic like the medieval papacy or populism calls for scholarly breadth and critical synthesis, a collective site like Wikipedia just can't organize it, give it thematic structure, or do justice to it."

Context is king

Since the introduction of the telegraph in the 19th century ushered in the Information Age, information has been perceived "as though it's the final basic substance in the world that exists independent of people and context," says Duguid. He takes issue with that view, arguing, "Information is something humans create, and it is therefore dependent on humans for context and verification." Taking information at face value, without paying attention to context, leaves people open to "misunderstanding, misinterpreting, and relying on a lot of rotten, foolish, wrong, mistaken ideas," Duguid charges.

Wikipedia's collaborative process treats information as though it is "modular and granular," says Duguid. The problem is that "once you say that information is the basic building block, the assumption is that a lot of people can contribute these blocks and what we'll end up with is the Taj Mahal." The site's methodology is more likely to result in a patchwork quilt, he says, one that is "simply an amalgam of facts." Such an approach, he adds, isn't how good encyclopedia articles get written.

Paul Duguid

Though it might be tempting to dismiss Duguid as conservative and resistant to change - charges he says he's heard any number of times - the historian and social theorist considers himself "a great champion of the digital world." In a class on the quality of information he and Nunberg taught at SIMS as visiting professors in 2004 and 2005, Duguid asked his students to contribute to Wikipedia, then decided to perform the exercise himself.

Duguid looked up the 18th-century English writer Daniel Defoe, finding, he says, "eight substantial errors" in the first paragraph alone, including Defoe's date of birth, date of death, the town where he was born, his father's occupation, the reason he changed his name, and the explanation for his rise to notoriety. "The minute I got to [the text stating] which book made him famous - and while I'm English, I'm no expert in Defoe - I knew it was wrong."

The process of rectifying those mistakes was more disturbing to Duguid than the original errors he had discovered: "My corrections were undone by people who clearly had little idea what they were talking about almost as quickly as they were made by me (who knew a little of what he was talking about)." Well-intentioned but "ill-informed editors" added their corrections to the article without offering meaningful sources for verification or taking part in the conversation on the article's discussion page. "People point to the instantaneous revision process as an indication of Wikipedia's quality-assurance mechanism," says Duguid. "These problems - of earnest but inept changes - are to me much more significant than simply finding errors."

Caveat emptor

Traditional media impose a set of practices and institutions that enable consumers to evaluate the trustworthiness of information, says Nunberg. "When I walk into a library, I know everything was screened several times: by editors, publishers, librarians. I assume the writer was someone good enough to have been given a book contract." The web eliminates those mechanisms, he says, and so "puts more of a burden on the user than the world of print." Calling that problem a technological one overlooks its complexity, says Nunberg: "You have to have a sense of what's out there on the web, who put it up, and why they put it up."

While new-technology evangelists predicted the web would signal an end to traditional media, Duguid thinks those futurists trapped themselves by assuming that the old guard would become outmoded and disappear. For instance, by discounting the importance of newspapers and the publishing industry, the digerati handed those institutions "quite a lot of power," never imagining that Time Warner or The New York Times would be among the most widely visited websites today. New media relies on mainstream media, says Duguid: "Even the blogs spend a lot of time talking about what appears in the newspapers." He adds: "I think you have to consider what role those existing institutions played in the past and ask who's going to play that role in the future?"

Duguid and Nunberg agree that the key to using any source of online information is to know its strengths and limitations. "We don't think Encyclopedia Britannica would have a definitive article on Madonna," says Duguid. "Instinctively we just know that. We need to develop those same instincts around tools like Wikipedia."
