
University of California

News - Media Relations










UC Berkeley professors measure exploding world production of new information
18 Oct 2000

By Kathleen Maclay, Media Relations

Berkeley - Two University of California, Berkeley, professors have just finished analyzing all new data produced worldwide last year - on the Internet, in scholarly journals, even in junk mail - and report not just staggering totals, but a "revolution" in information production and accessibility.

In their report, "How Much Information?" professors Hal Varian and Peter Lyman of the UC Berkeley School of Information Management & Systems (SIMS) report new information production in terms of paper, film, optical and magnetic data. They analyzed industry and government reports for production of information that also includes e-mail, digital production, videos, DVDs, CDs, broadcast outlets, photographs, books and newspapers.

The study is the first to use the terabyte as a common unit of measurement for comparing the size of information in all media, linking and interpreting research reports from industry and academia. One terabyte equals a million megabytes, or roughly the text content of a million books. Expressing every medium in the same unit makes it possible to compare growth trends across different media directly.
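As a rough illustration of that unit, the conversion can be sketched in a few lines of Python. This is not taken from the report: the function names are hypothetical, decimal (powers of ten) units are assumed, and the figure of about one megabyte of text per book is only what the equivalence above implies.

```python
# Decimal units, matching "one terabyte equals a million megabytes".
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_TERABYTE = 10**12

# Assumption implied by the release: one book holds roughly 1 MB of plain text.
BYTES_PER_BOOK = 1 * BYTES_PER_MEGABYTE


def to_terabytes(num_bytes):
    """Convert a byte count to terabytes."""
    return num_bytes / BYTES_PER_TERABYTE


def books_to_terabytes(num_books):
    """Express the text of num_books books in terabytes (hypothetical helper)."""
    return to_terabytes(num_books * BYTES_PER_BOOK)


if __name__ == "__main__":
    # The text of a million books, expressed in terabytes.
    print(books_to_terabytes(1_000_000))
```

Under these assumptions, any medium whose size can be stated in bytes can be placed on the same scale with `to_terabytes`.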

The numbers in the UC Berkeley report are mindboggling:

* The directly accessible "surface" Web consists of about 2.5 billion documents and is growing at a rate of 7.3 million pages per day.

* Counting the "surface" Web with the "deep" Web of connected databases, intranet sites and dynamic pages, there are about 550 billion documents, and 95 percent is publicly accessible.

* Fifty percent of all Internet users are native English speakers, while English-language Web sites account for about 78 percent of all Web sites and 96 percent of e-commerce Web sites.

* A white-collar worker receives about 40 e-mail messages daily at the office.

* Ninety percent of the world's e-mailboxes were found in the United States in 1984, but that dropped to 59 percent by the end of 1999. E-mail production accounts for about 500 times as much information as Web page production each year.

* Worldwide production of books increased by 2 percent in the last year.

* Production of newspapers in the last year decreased by 2 percent.

* The United States produces 35 percent of all print material, 40 percent of the images and more than half of the digitally stored material.

SIMS professors Lyman and Varian and their research assistants James Dunn, Aleksey Strygin and Kirsten Swearingen translated original content volume into bytes, using the terabyte as the project's smallest practical measure. Then they calculated how much storage each type of media takes when subjected to different compression techniques, and factored in anticipated duplication.
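The three steps described above (convert to bytes, apply compression, discount duplicates) might be combined along the following lines. This is only a sketch of the general idea, not the researchers' actual method: the function, its parameters, and the sample inputs are all hypothetical placeholders rather than figures from the report.

```python
BYTES_PER_TERABYTE = 10**12


def estimate_unique_terabytes(item_count, bytes_per_item,
                              compression_ratio, duplicate_fraction):
    """Estimate unique stored information in terabytes.

    compression_ratio: compressed size divided by original size (1.0 = none).
    duplicate_fraction: share of the content that is copies (0.0 = all original).
    Both parameters are illustrative assumptions, not values from the report.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    if not 0 <= duplicate_fraction < 1:
        raise ValueError("duplicate_fraction must be in [0, 1)")

    raw_bytes = item_count * bytes_per_item            # step 1: volume in bytes
    compressed_bytes = raw_bytes * compression_ratio   # step 2: compression
    unique_bytes = compressed_bytes * (1 - duplicate_fraction)  # step 3: duplicates
    return unique_bytes / BYTES_PER_TERABYTE


if __name__ == "__main__":
    # Placeholder inputs only: a million 1 MB items, halved by compression,
    # with one fifth assumed to be duplicates.
    print(estimate_unique_terabytes(1_000_000, 1_000_000, 0.5, 0.2))
```

Treating compression and duplication as simple multipliers keeps the estimate easy to recompute when better figures for either become available.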

The professors said they were struck by three emerging trends.

One is the "democratization of data," the vast amount of unique information that individuals both create and store. Original documents created by office workers represent nearly 90 percent of all original paper documents, while 56 percent of magnetic storage is in single-user desktop computers.

"A century ago, the average person could only create and access a small amount of information," wrote Varian and Lyman in their report. "Now, ordinary people not only have access to huge amounts of data, but are also able to create gigabytes of data themselves and, potentially, publish it to the world via the Internet."

The second surprise for the professors was the finding that print accounts for such a minuscule share of total information storage. But they said that does not mean print is dead. Rather, it is a very efficient and concentrated form for communicating information.

The third striking finding for them was the dominance of digital information and its phenomenal growth. This further feeds the democratization of data, they said, because digital information is potentially accessible anywhere on the Internet and is a "universal" medium because it can copy from any other format.

Although storing vast amounts of information no longer requires an investment in real estate, the researchers said the ease of producing and accessing information may lead people to turn over personal data management to specialized businesses with giant data storage systems.

"After all," they wrote, "would you rather keep all your family photos on your PC hard drive, and risk losing everything if it crashes, or on a secure site managed by Kodak? On the other hand, individuals may prefer to keep information about themselves in smaller systems that only they control."

The researchers also forecast that businesses will be tremendously affected by this increase in individuals' instant access to real-time company data, something that a few years ago was restricted to upper management.

"The difficulty will be in managing this information effectively: making sure that your suppliers, your employees, and your customers not only have access to the data they need to make informed decisions, but also can locate, manipulate and understand it," the report said.

Lyman and Varian caution that our ability to store and communicate information has far outpaced the ability to search, retrieve and present it. That's one reason for a place like SIMS, where people can learn the techniques and technologies for sorting the valuable information from the superfluous, they said.

"Information management - at the individual, organizational, and even societal level - may turn out to be one of the key challenges we face," the report said.

"It's the next stage of literacy," Lyman said.

The latest report is not in printed form, because its authors see it as a "living" document. It is available online and will be updated periodically in response to comments from readers.

"It's a good way to kick off a discussion of what information is. We don't have a very good way of talking about information because it's changing so fast," said Varian, also co-author of "Information Rules: A Strategic Guide to the Network Economy" (Harvard Business School Press, 1998).

"In the past, we've talked about information in terms of the size of a physical inventory, such as counting books or films," Lyman said. "But in the future, the size and format of information will be dynamically reshaped to the needs of the reader."

EMC Corp., the world's largest data storage systems company, financed the research.