Google up to .01% of the world’s information
February 7th, 2007
Marissa Mayer did a phenomenal job of fielding questions after her very informative talk today at the Kellogg Technology Conference. I was able to sneak in a question during the session and asked her if she could ballpark the number of terabytes that Google had indexed to date; her guess was 500. If she is right, then they are up 330 terabytes, or 194%, from the 170 where Schmidt thought they were 16 months ago. That’s pretty good! However, if you take all of the numbers from Schmidt’s speech as valid, specifically that 5 million terabytes of data exist, then they have now organized .01% of the world’s information, up from .0034%. Interestingly, Marissa spent a not insignificant portion of her talk on the acquisition of offline data via projects like Google Books, Picasa, etc.
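The percentages above are simple ratios, and a few lines of Python reproduce them. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the inputs are the rough figures quoted above (Mayer's 500 TB guess, Schmidt's 170 TB and 5 million TB), not measured values.

```python
# Back-of-the-envelope check of the figures quoted above.
# All inputs are rough estimates from the two talks, not measurements.
INDEXED_NOW_TB = 500          # Mayer's ballpark
INDEXED_BEFORE_TB = 170       # Schmidt's figure, 16 months earlier
WORLD_TOTAL_TB = 5_000_000    # Schmidt's estimate of all data in existence

growth_tb = INDEXED_NOW_TB - INDEXED_BEFORE_TB
growth_pct = 100 * growth_tb / INDEXED_BEFORE_TB
share_now_pct = 100 * INDEXED_NOW_TB / WORLD_TOTAL_TB
share_before_pct = 100 * INDEXED_BEFORE_TB / WORLD_TOTAL_TB

print(f"Growth: {growth_tb} TB ({growth_pct:.0f}%)")
print(f"Share indexed: {share_now_pct:.4f}% (was {share_before_pct:.4f}%)")
```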
As you can see, I have convinced myself that 500 terabytes, in the scheme of things, isn’t that much. This may be crazy, especially considering I don’t really trust my conceptualization of how much data is in a gigabyte, let alone a terabyte. However, the fact that 1 TB servers are commonplace and that Apple announced last month that it was shipping 10.5 terabyte servers (for a mere $12,399) makes me think I’m not too far out in left field on this. Pictured to the left is the 200 terabyte GLOW system that the University of Wisconsin physics department has put together. Yes, the information Google has organized is likely a larger percentage of the “useful” information out there. Yes, how you define “information” has a huge impact on this analysis. Yes, the information Google has indexed may be on the “light” side of the spectrum. And yes, these servers I’m talking about are HUGE. But the fact is, all of the information that Google has indexed could be put onto roughly 48 Apple Xserve RAID servers, or 2.5 of the GLOW behemoths.
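For anyone who wants to check the server count, here is a rough sketch of the division. It assumes the quoted capacities are fully usable (no redundancy or formatting overhead, which is an assumption of this sketch), and the variable names are illustrative.

```python
import math

# Capacities and price as quoted above; usable capacity is assumed
# equal to raw capacity, which real deployments would not achieve.
INDEXED_TB = 500            # Mayer's ballpark for Google's index
XSERVE_RAID_TB = 10.5       # Apple's server
XSERVE_RAID_PRICE = 12_399  # US dollars per server
GLOW_TB = 200               # University of Wisconsin GLOW system

xserve_units = math.ceil(INDEXED_TB / XSERVE_RAID_TB)  # whole machines
xserve_cost = xserve_units * XSERVE_RAID_PRICE
glow_units = INDEXED_TB / GLOW_TB

print(f"Xserve RAID servers needed: {xserve_units} (about ${xserve_cost:,})")
print(f"GLOW systems needed: {glow_units}")
```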
The question that follows for me: why is Google building all these massive computing centers if all of their information can be stored in an area the size of a large walk-in closet? The answer to this was also covered by Marissa in her presentation: she mapped out what happens when you run a search on Google and showed how Google searches hundreds of millions of sites in less than a second. So these data centers are needed to get nearly instant access to an amount of data that could be stored on 48 commercially available computers.
Edit: If Google continues at the 194% growth rate, they will hit 5 million terabytes in roughly 28 years… that’s a lot better than 300 years…
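How long the climb takes depends heavily on how that growth is compounded. The sketch below uses a hypothetical helper, `years_to_reach`, and two illustrative readings of the growth rate; neither is confirmed as the method behind the figure above. Multiplying the index by 2.94 every 16 months gets there in a little over 11 years, while a slower reading such as 1.94x every 24 months lands near 28.

```python
import math

def years_to_reach(current_tb, target_tb, growth_factor, period_months):
    """Years until current_tb reaches target_tb if it is multiplied by
    growth_factor once every period_months (compound growth)."""
    periods = math.log(target_tb / current_tb) / math.log(growth_factor)
    return periods * period_months / 12

# Reading 1 (assumption): +194% means x2.94 every 16 months.
fast = years_to_reach(500, 5_000_000, 2.94, 16)
# Reading 2 (assumption): x1.94 every 24 months.
slow = years_to_reach(500, 5_000_000, 1.94, 24)

print(f"x2.94 per 16 months: {fast:.1f} years")
print(f"x1.94 per 24 months: {slow:.1f} years")
```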