Professors measure exploding world production of new information
By Kathleen Maclay, Public Affairs

25 October 00 | Two Berkeley professors have just finished analyzing all new data produced worldwide last year - on the Internet, in scholarly journals, even in junk mail - and report not just staggering totals, but a "revolution" in information production and accessibility.

In their report, "How Much Information?", professors Hal Varian and Peter Lyman of the School of Information Management & Systems measure new information production across paper, film, optical and magnetic media. They drew on industry and government reports, and their tally also covers e-mail, digital production, videos, DVDs, CDs, broadcast outlets, photographs, books and newspapers.

The study is the first to use the terabyte as a common unit of measurement for comparing the size of information in all media, linking and interpreting research reports from industry and academia. One terabyte equals a million megabytes, or roughly the text content of a million books. A single unit makes it possible to compare growth trends across different media.
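The unit arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration, not code from the report: it assumes decimal (SI) units and takes the article's own equivalence at face value, so one book is treated as about one megabyte of text. The function names are hypothetical.

```python
# Decimal (SI) storage units, matching the article's "a million megabytes".
MEGABYTE = 10**6   # bytes
TERABYTE = 10**12  # bytes

# Assumption implied by the article's equivalence: one book is about 1 MB of text.
BYTES_PER_BOOK = MEGABYTE


def to_terabytes(num_bytes: float) -> float:
    """Express a byte count in terabytes."""
    return num_bytes / TERABYTE


def books_equivalent(terabytes: float) -> float:
    """Rough number of books' worth of text held in the given terabytes."""
    return terabytes * TERABYTE / BYTES_PER_BOOK


megabytes_per_terabyte = TERABYTE // MEGABYTE
```

Under these assumptions, `megabytes_per_terabyte` is one million and `books_equivalent(1)` returns one million books, which matches the figures quoted above.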

The numbers are mind-boggling:

• The directly accessible "surface" Web consists of about 2.5 billion documents and is growing at a rate of 7.3 million pages per day.

• Counting both "surface" Web and the "deep" Web of connected databases, intranet sites and dynamic pages, there are about 550 billion documents, 95 percent of which are publicly accessible.

• Fifty percent of all Internet users are native English speakers, while English-language Web sites account for about 78 percent of all Web sites and 96 percent of e-commerce Web sites.

• A white-collar worker receives about 40 e-mail messages daily at the office.

• In 1984, 90 percent of the world's e-mailboxes were found in the United States, but that dropped to 59 percent by the end of 1999. E-mail production accounts for about 500 times as much information as Web-page production each year.

• Worldwide production of books increased by 2 percent in the last year.

• Production of newspapers in the last year decreased by 2 percent.

• The United States produces 35 percent of all print material, 40 percent of the images and more than half of digitally stored material.
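A little arithmetic puts the Web figures in the list above in proportion. The Python sketch below uses only the numbers quoted there; it assumes that "pages" and "documents" count the same thing and that growth stays linear at the quoted daily rate, neither of which the article states. The variable names are illustrative.

```python
# Figures quoted in the list above.
SURFACE_WEB_DOCS = 2.5e9         # documents on the "surface" Web
NEW_PAGES_PER_DAY = 7.3e6        # pages added to the surface Web each day
SURFACE_PLUS_DEEP_DOCS = 550e9   # surface and "deep" Web combined

# Share of the current surface Web added each day, as a percentage.
daily_growth_percent = NEW_PAGES_PER_DAY / SURFACE_WEB_DOCS * 100

# Days needed to add another 2.5 billion pages at a constant daily rate
# (a simplifying assumption; real growth need not be linear).
days_to_double_linear = SURFACE_WEB_DOCS / NEW_PAGES_PER_DAY

# How many times larger the combined Web is than the surface Web alone.
total_to_surface_ratio = SURFACE_PLUS_DEEP_DOCS / SURFACE_WEB_DOCS
```

On these assumptions the surface Web grows by about 0.29 percent a day and would double in roughly 342 days, while the combined surface and deep Web is about 220 times the size of the surface Web.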

Lyman and Varian and their research assistants James Dunn, Aleksey Strygin and Kirsten Swearingen translated original content volume into bytes, using the terabyte as the project's smallest practical measure. Then they calculated how much storage each type of media takes when subjected to different compression techniques, and factored in anticipated duplication.
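The three steps described here (convert to bytes, apply compression, allow for duplication) might be sketched as follows. This is a minimal illustration and not the researchers' actual method: the class, its fields, and the way the two adjustments are combined are assumptions, and the example figures are placeholders rather than values from the report.

```python
from dataclasses import dataclass

TERABYTE = 10**12  # bytes, decimal definition


@dataclass
class MediumEstimate:
    """Hypothetical record for one medium (for example, paper or film)."""
    name: str
    raw_bytes: float           # original content volume, in bytes
    compression_ratio: float   # raw size divided by compressed size (1.0 = none)
    duplicate_fraction: float  # share of content assumed to be copies, 0 to 1

    def unique_compressed_terabytes(self) -> float:
        """Storage needed after compression, with assumed duplicates removed."""
        compressed = self.raw_bytes / self.compression_ratio
        unique = compressed * (1.0 - self.duplicate_fraction)
        return unique / TERABYTE


# Placeholder numbers chosen for easy arithmetic, not taken from the study.
example = MediumEstimate(
    name="example medium",
    raw_bytes=8 * TERABYTE,
    compression_ratio=4.0,
    duplicate_fraction=0.5,
)
```

With these placeholder inputs, 8 terabytes of raw content compresses to 2 terabytes, and removing the assumed 50 percent duplication leaves 1 terabyte.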

The professors said they were struck by three emerging trends.

One is "the democratization of data," the vast amount of unique information created and stored by individuals. Original documents created by office workers represent nearly 90 percent of all original paper documents, while 56 percent of magnetic storage is in single-user desktop computers.

"A century ago, the average person could only create and access a small amount of information," wrote Varian and Lyman in their report. "Now, ordinary people not only have access to huge amounts of data, but are also able to create gigabytes of data themselves and, potentially, publish it to the world via the Internet."

The second surprise for the professors was the finding that print accounts for such a minuscule share of total information storage. That doesn't mean print is dead, they said. Rather, it is a very efficient and concentrated form for communicating information.

The third striking finding for them was the dominance of digital information and its phenomenal growth. This further feeds the democratization of data, they said, because digital information is potentially accessible anywhere on the Internet and is a "universal" medium: content in any other format can be copied into it.

Storing vast amounts of information no longer requires an investment in real estate. Even so, the researchers said, the ease of producing and accessing information may lead people to turn over personal data management to specialized businesses with giant data storage systems.

"After all," they wrote, "would you rather keep all your family photos on your PC hard drive, and risk losing everything if it crashes, or on a secure site managed by Kodak? On the other hand, individuals may prefer to keep information about themselves in smaller systems that only they control."

The researchers also forecast that businesses will be tremendously affected by individuals' growing instant access to real-time company data, something that a few years ago was restricted to upper management.

"The difficulty will be in managing this information effectively: making sure that your suppliers, your employees, and your customers not only have access to the data they need to make informed decisions, but also can locate, manipulate and understand it," the report said.

Lyman and Varian caution that our ability to store and communicate information has far outpaced our ability to search, retrieve and present it. That's one reason for a place like the School of Information Management & Systems (SIMS), where students learn techniques and technologies for sorting the valuable information from the superfluous, they said.

"Information management - at the individual, organizational, and even societal level - may turn out to be one of the key challenges we face," the report said.

"It's the next stage of literacy," Lyman said.

The latest report is not in printed form, because its authors see it as a "living" document. It is available online and will be updated periodically in response to comments from readers.




Copyright 2000, The Regents of the University of California.
Produced and maintained by the Office of Public Affairs at UC Berkeley.
