People often ask me what it is I do for Google. Hmm, that's secret. Google has a lot of secrecy going on, for good reasons most of the time. One of them is that we don't want to talk about things before we launch them. That keeps rumors down and keeps us out of the FUD (fear, uncertainty, and doubt) game, where big companies pre-announce products that never actually show up but still keep customers from buying from competitors. There are, however, some things out there that give great insight into how Google works and that seem like they should have been secret.
One of them is the Google File System. The paper describes how we store data at Google and goes into quite some detail. Basically, we run one distributed filesystem over more than a thousand machines, using thousands of disks, in order to manipulate hundreds of terabytes at a time. Data is safely replicated and can be checkpointed. It is really quite astonishing to work on something like that from the inside, and you'd think that if you were going to keep anything secret, this would be it. But it isn't; it's a public paper.
If you think the Google File System is pretty cool, keep checking back. There's another paper out there about some of the stuff we're doing here that will really blow your mind.