Google: Organizing .0034% of the world’s information.

January 21st, 2007

I have been doing some research on information: specifically, how much information there is in general, and how much of it is searchable and indexable online. This is not an easy number to come up with, and it depends heavily on how you choose to define “information” in the first place. Phone calls, IMs, and emails are produced in huge volume, but only certain portions of that data are actually interesting (e.g., Presidential phone calls v. my personal email). It seems that questions about the size of the web and the information universe were asked fairly often 4, 5, or 6 years ago, when people were still getting comfortable with the net. They would ask AOL “how big is the internet?” and some folks at the time tried to figure it out.

The favored unit of measure for massive amounts of data is the terabyte (this is actually a pretty funny Wikipedia entry). A terabyte is the equivalent of 1,000 gigabytes, or 1 trillion bytes (10^12).

I spent a fair chunk of yesterday afternoon trying to find a reliable source that had researched this in the last few years. I used an interesting service, www.chacha.com, in my search. They have been getting a good amount of press recently, so I thought I would give it a try. You go to the site and are linked via IM with a “search consultant” who helps run your search for you. It seems like they screen Google results and give you what they think is best. They make $0.83 a search but only get paid for the first ten minutes (unclear how that meshes, but whatever). Chacha turned up nothing I hadn’t seen already, but it was cool to try it out. This is what I found:

The most interesting thing I found was a transcript of a speech given by Google CEO Eric Schmidt to the Association of National Advertisers on October 8, 2005. Mr. Schmidt said:

…how much information is there in the world? A study that was done last year indicated roughly five million terabytes. How much is indexable, searchable today? Current estimate: about 170 terabytes. So again we’re back in that two or three percent of the indexed and searchable world.

Takeaways:

1. In 2005, Google had access to 170 terabytes of data and surmised that there were 5 million terabytes available. That is not a whole lot of coverage.

2. Eric Schmidt and his speechwriters need to check their math. 170 of 5 million is .0034%, not 2 or 3%.
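For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The only inputs are the two figures from the speech (170 terabytes indexed, 5 million terabytes total); the function names are my own and purely illustrative.

```python
# Figures quoted in the speech, both in terabytes.
TOTAL_TB = 5_000_000
INDEXED_TB = 170


def coverage_percent(indexed_tb, total_tb):
    """Share of the total that is indexed, expressed as a percentage."""
    return indexed_tb / total_tb * 100


def implied_indexed_tb(percent, total_tb):
    """Terabytes that would need to be indexed to reach a given percentage."""
    return total_tb * percent / 100


pct = coverage_percent(INDEXED_TB, TOTAL_TB)
print(f"Indexed share: {pct:.4f}%")

# What "two or three percent" of 5 million terabytes would actually require.
for claimed in (2, 3):
    needed = implied_indexed_tb(claimed, TOTAL_TB)
    print(f"{claimed}% of the total would be {needed:,.0f} TB")
```

Run against those two figures, the indexed share comes out to 0.0034%, and hitting 2 or 3% would take 100,000 to 150,000 terabytes, which is several hundred times the 170 quoted.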

EDIT: To be clear, I love Google and used it for all the research in this post. However, I think the gap between the volume of actual information out there and the volume accessible via the web is difficult to comprehend, and if Google’s numbers are right, they are very surprising.

Also, one could argue that Google might have organized the .0034% of the world’s information that is interesting and applicable…

You are currently reading Google: Organizing .0034% of the world’s information. at robwebb2k.
