Tuesday, August 19, 2003

The unreliability of the Internet

The Internet is a great medium for knowledge. If you want to know something, Google is a click away, and if you're lucky the answer is just one more click away. But if you want to know numbers, the Internet is unreliable.

The problem is not so much that you can't find the numbers. It's not hard to find a figure for the average cost of electricity generated by nuclear reactors. The problem is that you find a lot of different figures, and there is no easy way to tell which one is reliable. This is clearly something the Google PageRank algorithm fails at: pages that are linked to a lot don't necessarily contain more reliable numbers.

If a separate search engine could be built to search just for numerical facts, reliability could be one of its features. Numbers that are quoted more often are more reliable. Sources that quote the more often quoted numbers are more reliable, and so on. Such a search engine could focus on <table> tags and try to work out the meaning of each cell value by scanning the horizontal and vertical headers. I would like that.
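The reinforcement described above (often-quoted numbers are trustworthy, and the sources that quote them become trustworthy in turn) resembles the hubs-and-authorities idea behind the HITS algorithm. Below is a minimal Python sketch under that assumption. The function name `score_claims`, the (source, quantity, value) triple format, the per-quantity normalisation and the fixed iteration count are my own illustrative choices, not a description of any existing search engine.

```python
from collections import defaultdict

def score_claims(claims, iterations=20):
    """Mutually score numbers and the sources that quote them (HITS-style sketch).

    claims: list of (source, quantity, value) triples, e.g. one per number
    found on a page. Returns (value_score, source_score) dictionaries.
    """
    source_score = {s: 1.0 for s, _, _ in claims}  # start with equal trust
    value_score = {}
    for _ in range(iterations):
        # A value is credible in proportion to the trust in the sources quoting it.
        raw = defaultdict(float)
        for s, q, v in claims:
            raw[(q, v)] += source_score[s]
        # Normalise per quantity, so competing values for one quantity sum to 1.
        totals = defaultdict(float)
        for (q, _), x in raw.items():
            totals[q] += x
        value_score = {(q, v): x / totals[q] for (q, v), x in raw.items()}
        # A source is as reliable as the average credibility of what it quotes.
        quoted = defaultdict(list)
        for s, q, v in claims:
            quoted[s].append(value_score[(q, v)])
        source_score = {s: sum(xs) / len(xs) for s, xs in quoted.items()}
    return value_score, source_score

# Toy data (made up): three sites agree on q1, one disagrees; q2 is a 1-vs-1 tie.
claims = [
    ("site_a", "q1", 30.0), ("site_b", "q1", 30.0), ("site_c", "q1", 30.0),
    ("site_d", "q1", 90.0),
    ("site_a", "q2", 2.0), ("site_d", "q2", 5.0),
]
values, sources = score_claims(claims)
```

On this toy input the tie on q2 is broken in favour of site_a's value, because site_a earned trust by agreeing with the majority on q1. A real system would also have to decide when two numbers count as the same (units, rounding); this sketch sidesteps that by matching values exactly.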
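The table idea can be sketched with Python's standard `html.parser` module, assuming the simplest possible layout: the first row holds the horizontal (column) headers and the first cell of each later row holds the vertical (row) header. `TableParser` and `numeric_facts` are hypothetical names, and nested tables, `colspan`/`rowspan` and unclosed `<td>` tags are deliberately not handled.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collects the rows of every <table> as lists of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of strings
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def numeric_facts(html):
    """Return (row_header, column_header, value) for every numeric cell.

    Assumes row 0 holds column headers and column 0 holds row headers.
    """
    parser = TableParser()
    parser.feed(html)
    facts = []
    for table in parser.tables:
        if len(table) < 2:
            continue  # no data rows under the header row
        col_headers = table[0]
        for row in table[1:]:
            row_header = row[0] if row else ""
            for j, cell in enumerate(row[1:], start=1):
                try:
                    value = float(cell.replace(",", ""))  # "2,500" -> 2500.0
                except ValueError:
                    continue  # not a number, e.g. "n/a"
                col_header = col_headers[j] if j < len(col_headers) else ""
                facts.append((row_header, col_header, value))
    return facts

# Toy markup with made-up values, only to exercise the parser.
html = """
<table>
  <tr><th>Plant</th><th>Cost</th></tr>
  <tr><td>Nuclear</td><td>1.5</td></tr>
  <tr><td>Coal</td><td>2,500</td></tr>
  <tr><td>Wind</td><td>n/a</td></tr>
</table>
"""
facts = numeric_facts(html)
```

The resulting triples are exactly the kind of (source, quantity, value) input a reliability scorer would need once the page URL is attached as the source.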