Douwe Osinga's Blog: Google as research substitute

Thursday, February 19, 2004

Google as research substitute

Mediabistro has an interesting article about Lies, Damned Lies and Google: how journalists more and more use the number of hits on a Google search to prove something. Not so strange, of course. A Google search is quick and usually yields the high number required to impress: Abba is still popular, twenty years after they broke up, scoring 1.4 million hits.

The number of Google hits, like a witty saying, proves nothing, of course. Quite a few of my Google hacks use the number of hits returned by Google as a measure of something or other, but I also found out that the number jumps up and down quite a bit and is not very accurate, especially for combined searches. Search for:
USA
USA -Cheese
USA +Cheese
You'd expect the hits for the last two searches to add up to the hits for the first (the set of pages containing USA can be divided into two parts: the pages that also contain Cheese and the pages that do not contain Cheese). But they don't.
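To make that expectation concrete, here is a minimal sketch in Python over a tiny made-up corpus. The pages and the count_hits helper are invented for illustration and have nothing to do with how Google counts; the point is only that an exact counter always satisfies the identity.

```python
# Toy corpus: each "page" is just a set of lowercase words (made-up data).
pages = [
    {"usa", "cheese", "wisconsin"},
    {"usa", "election"},
    {"cheese", "france"},
    {"usa", "cheese"},
    {"abba", "sweden"},
]


def count_hits(pages, include=(), exclude=()):
    """Count pages that contain every word in include and none in exclude."""
    return sum(
        1
        for page in pages
        if all(word in page for word in include)
        and not any(word in page for word in exclude)
    )


usa = count_hits(pages, include=["usa"])
usa_without_cheese = count_hits(pages, include=["usa"], exclude=["cheese"])
usa_with_cheese = count_hits(pages, include=["usa", "cheese"])

# With exact counting, the two partial counts always add up to the total.
print(usa, usa_without_cheese, usa_with_cheese)  # 3 1 2
```

Since an exact count cannot break this identity, hit counts that fail to add up are presumably estimates rather than exact totals.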

In order to prove my point, I wrote a little project to predict the US presidential election by checking the number of hits that statename + partyname returns according to Google. For one thing, searching using the Google API returns completely different results from the ones you get through the Google web interface, and Altavista does something completely different again.
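The general idea of such a hit-count prediction could be sketched roughly like this. This is not the project's actual code: predict_states, fake_hit_count, the placeholder state and party names, and all the numbers are hypothetical, and the query format (state name and party name separated by a space) is an assumption. The search backend is passed in as a function, so the same logic could be run against different engines to compare their answers.

```python
from typing import Callable, Dict, Iterable


def predict_states(
    states: Iterable[str],
    parties: Iterable[str],
    hit_count: Callable[[str], int],
) -> Dict[str, str]:
    """For each state, pick the party whose combined query gets the most hits."""
    parties = list(parties)
    prediction = {}
    for state in states:
        # Assumed query format: "<state> <party>".
        counts = {party: hit_count(f"{state} {party}") for party in parties}
        prediction[state] = max(counts, key=counts.get)
    return prediction


# Stand-in for a real search backend. All numbers are invented.
FAKE_COUNTS = {
    "State A Party X": 120,
    "State A Party Y": 100,
    "State B Party X": 40,
    "State B Party Y": 90,
}


def fake_hit_count(query: str) -> int:
    """Return the invented hit count for a query, or 0 if unknown."""
    return FAKE_COUNTS.get(query, 0)


result = predict_states(["State A", "State B"], ["Party X", "Party Y"], fake_hit_count)
print(result)  # {'State A': 'Party X', 'State B': 'Party Y'}
```

Swapping fake_hit_count for a function that queries a different engine would show how much the prediction depends on where the counts come from.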

The project is still interesting (Bush wins, but it will be a close race), and it would also be interesting to see whether the results get better the closer we get to the election and the more the Internet focuses on it. Check it out at Google Elections.