Berkeley - Two University of California, Berkeley, professors have
just finished analyzing all new data produced worldwide
last year - on the Internet, in scholarly journals, even
in junk mail - and report not just staggering totals, but
a "revolution" in information production and accessibility.
In their
report, "How Much Information?" professors Hal Varian and
Peter Lyman of the UC Berkeley School of Information Management
& Systems (SIMS) report new information production in terms
of paper, film, optical and magnetic data. They analyzed
industry and government reports for production of information
that also includes e-mail, digital production, videos, DVDs,
CDs, broadcast outlets, photographs, books and newspapers.
The study
has, for the first time, used "terabytes" as a common standard
of measurement to compare the size of information in all
media, linking and interpreting research reports from industry
and academia. One terabyte equals a million megabytes, or
roughly the text content of a million books. Using a single
unit makes it possible to compare growth trends across
different media.
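The conversion described above can be sketched in a few lines of Python. This is only an illustration, not code from the report: it assumes decimal (SI) units, which is what "a million megabytes" implies, and the function names are hypothetical. The per-book figure is simply what the article's "million books" comparison works out to.

```python
# Decimal (SI) storage units, matching the article's
# "one terabyte equals a million megabytes" (an assumption;
# binary units would give slightly different numbers).
BYTES_PER_MEGABYTE = 10**6
BYTES_PER_TERABYTE = 10**12


def to_terabytes(num_bytes: float) -> float:
    """Express a byte count in terabytes."""
    return num_bytes / BYTES_PER_TERABYTE


def megabytes_per_terabyte() -> int:
    """How many megabytes fit in one terabyte."""
    return BYTES_PER_TERABYTE // BYTES_PER_MEGABYTE


def implied_bytes_per_book(books_per_terabyte: int = 1_000_000) -> float:
    """Text size per book implied by 'a million books per terabyte'."""
    return BYTES_PER_TERABYTE / books_per_terabyte
```

Under these assumptions, a book's text comes to about one megabyte, which is why a million books fill one terabyte.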
The numbers
in the UC Berkeley report are mindboggling:
* The
directly accessible "surface" Web consists of about 2.5
billion documents and is growing at a rate of 7.3 million
pages per day.
* Counting
the "surface" Web with the "deep" Web of connected databases,
intranet sites and dynamic pages, there are about 550 billion
documents, and 95 percent is publicly accessible.
* Fifty
percent of all Internet users are native English speakers,
while English language Web sites account for about 78 percent
of all Web sites and 96 percent of e-commerce Web sites.
* A white-collar
worker receives about 40 e-mail messages daily at the office.
* Ninety
percent of the world's e-mailboxes were found in the United
States in 1984, but that dropped to 59 percent by the end
of 1999. E-mail production accounts for about 500 times
as much information as Web page production each year.
* Worldwide
production of books increased by 2 percent in the last year.
* Production
of newspapers in the last year decreased by 2 percent.
* The
United States produces 35 percent of all print material,
40 percent of the images and more than half of the digitally
stored material.
SIMS
professors Lyman and Varian and their research assistants
James Dunn, Aleksey Strygin and Kirsten Swearingen translated
original content volume into bytes, using the terabyte as
the project's smallest practical measure. Then they calculated
how much storage each type of media takes when subjected
to different compression techniques, and factored in anticipated
duplication.
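A minimal sketch of that kind of calculation follows. It is not the researchers' actual method or data: the class, its field names and the sample inputs are hypothetical placeholders. It only shows how raw volume, a compression ratio and a duplication allowance could combine into one terabyte figure.

```python
from dataclasses import dataclass

BYTES_PER_TERABYTE = 10**12  # decimal terabyte, as in the article


@dataclass
class MediumEstimate:
    """Hypothetical record for one storage medium."""
    name: str
    raw_bytes: float          # uncompressed size of everything produced
    compression_ratio: float  # raw size / compressed size (1.0 = none)
    unique_fraction: float    # share that is original, not a duplicate

    def stored_terabytes(self) -> float:
        """Compressed, de-duplicated size in terabytes."""
        compressed = self.raw_bytes / self.compression_ratio
        return compressed * self.unique_fraction / BYTES_PER_TERABYTE


def total_terabytes(estimates) -> float:
    """Sum the adjusted sizes across all media."""
    return sum(e.stored_terabytes() for e in estimates)


# Placeholder inputs for illustration only; NOT figures from the report.
examples = [
    MediumEstimate("medium A", raw_bytes=8e12,
                   compression_ratio=4.0, unique_fraction=0.5),
    MediumEstimate("medium B", raw_bytes=3e12,
                   compression_ratio=1.0, unique_fraction=1.0),
]
```

Calling total_terabytes(examples) adds up the adjusted sizes. Changing a compression ratio or duplication share shows how strongly the final total depends on those assumptions.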
The professors
said they were struck by three emerging trends.
One is
"the democratization of data," the vast amount of unique
information stored and also created by individuals. Original
documents created by office workers represent nearly 90
percent of all original paper documents, while 56 percent
of magnetic storage is in single-user desktop computers.
"A century
ago, the average person could only create and access a small
amount of information," wrote Varian and Lyman in their
report. "Now, ordinary people not only have access to huge
amounts of data, but are also able to create gigabytes of
data themselves and, potentially, publish it to the world
via the Internet."
The second
surprise for the professors was the finding that print accounts
for such a minuscule share of total information storage.
But they said that doesn't mean print is dead; rather, it is
a very efficient and concentrated form for the communication
of information.
The third
striking finding for them was the dominance of digital information
and its phenomenal growth. This further feeds the democratization
of data, they said, because digital information is potentially
accessible anywhere on the Internet and is a "universal"
medium because content in any other format can be copied into it.
But even
though storing vast amounts of information no longer requires
an investment in real estate, the researchers said the ease
of production and access to information may lead people
to turn over personal data management to specialized businesses
with giant data storage systems.
"After
all," they wrote, "would you rather keep all your family
photos on your PC hard drive, and risk losing everything
if it crashes, or on a secure site managed by Kodak? On
the other hand, individuals may prefer to keep information
about themselves in smaller systems that only they control."
The researchers
also forecast that businesses will be tremendously affected
by this increase in individuals' instant access to real-time
company data, something that a few years ago was restricted
to upper management.
"The
difficulty will be in managing this information effectively:
making sure that your suppliers, your employees, and your
customers not only have access to the data they need to
make informed decisions, but also can locate, manipulate
and understand it," the report said.
Lyman
and Varian caution that our ability to store and communicate
information has far outpaced the ability to search, retrieve
and present it. That's one reason for a place like SIMS,
where people can learn the techniques and technologies for
sorting the valuable information from the superfluous, they
said.
"Information
management - at the individual, organizational, and even
societal level - may turn out to be one of the key challenges
we face," the report said.
"It's
the next stage of literacy," Lyman said.
The latest
report is not in printed form, because its authors see it
as a "living" document. It can be found at www.sims.berkeley.edu/how-much-info/index.html
and will be updated periodically in response to comments
from readers.
"It's
a good way to kick off a discussion of what information
is. We don't have a very good way of talking about information
because it's changing so fast," said Varian, also co-author
of "Information Rules: A Strategic Guide to the Network
Economy" (Harvard Business School Press, 1998).
"In the
past, we've talked about information in terms of the size
of a physical inventory, such as counting books or films,"
Lyman said. "But in the future, the size and format of information
will be dynamically reshaped to the needs of the reader."
EMC Corp.,
the world's largest data storage systems company, financed
the research.