Douwe Osinga's Blog: Using Google as Common Sense engine

Wednesday, October 15, 2003

Using Google as Common Sense engine

Hjalmar Gislason has an interesting piece about Google Miner. Basically, Google Miner uses Google to extract common sense from the web, relying on Google's ranking to surface the most relevant information. Is it doable?

When I started with Google History, my aim was actually something broader: a Google application that could answer any type of question. The first results were so bad that I decided to limit myself to questions that have a year in the range 1800-2050 as an answer. That did work. Better.

Is it possible to mine the Web for common sense? Another failed Google project of mine tried to work out the best time to visit a location, interesting information for World66, the travel site I help build. I fed Google the name of the location and the phrase "best time to visit" or a synonym. I would then search the snippets Google returned for the names of months and choose the most frequently named one. It worked pretty well, returning June a lot of the time for European locations. But it also returned June for Australia and February for Denmark.
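For the curious, the month-counting step could look roughly like the sketch below. This is not the original program: the function name `most_named_month` is made up for illustration, and the sketch assumes the search snippets have already been fetched and are handed in as plain strings.

```python
import re
from collections import Counter

MONTHS = [
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
]

# Match whole words only, in any capitalisation.
# Caveat: this also counts "may" and "march" when they are used as
# ordinary words, which is one way such a counter goes wrong.
MONTH_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\b", re.IGNORECASE)


def most_named_month(snippets):
    """Return the month named most often in the snippets, or None."""
    counts = Counter()
    for text in snippets:
        for name in MONTH_RE.findall(text):
            counts[name.lower()] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical snippets, standing in for real search results.
    example = [
        "The best time to visit Rome is June or September.",
        "Best time to visit: June, before the crowds arrive.",
    ]
    print(most_named_month(example))
```

A plain frequency count like this has no idea which hemisphere a place is in, or whether a month is being recommended or warned against, which fits the odd answers for Australia and Denmark.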

Now, I only scanned Google's top 10, so the results might improve if the program looked at the top 100. I'll look into that later. But the point is that if a rather trivial piece of information like this is hard to extract from Google results, then the more arcane things will be very hard indeed.

Things might look different for Google itself, though. They have 3 billion documents indexed in all kinds of ways. Running some clever algorithm against 3 billion documents instead of 10 might improve results drastically. Google became self-aware at 2:14 a.m. Eastern time, August 29.

One other thing. A couple of weeks ago, I wrote a piece titled 'how much is a billion', complaining about the number-unawareness that is riding high. Currently the entry is in position two on Google for the term "how much is a billion" and quite a few people arrive on my site with that very question. Ironic, and insulting for the visitors. One guy wrote to me to complain that I didn't actually explain, in an easy-to-understand way, how much a billion is. Couldn't I post something about that on Google?

What about: if a thousand people each lose a thousand dollars, every day, for three years, they have lost a bit over a billion dollars. Still no way to understand the size of the American deficit. Anybody with a better explanation?
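To check the arithmetic, here is a tiny sketch. It assumes three years of 365 days each, ignoring leap days, and the variable names are just for illustration.

```python
people = 1_000
dollars_per_person_per_day = 1_000
days = 3 * 365  # three years, leap days ignored

total = people * dollars_per_person_per_day * days
print(f"{total:,}")  # 1,095,000,000
```

So the thousand people are actually about 95 million dollars over the billion, which rather proves the point about how hard these numbers are to feel.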