Douwe Osinga's Blog: More on ungoogle numbers

Saturday, December 6, 2003

Two days ago I wrote about an idea by Hjalmar Gislason: the smallest number that cannot be found on Google, and how hard it is to find. I've thought about it some more and have come up with an algorithm that finds it about 10 times faster than just doing repeated searches.


At face value it seems there is no way around it: you're just going to have to query Google for every number from one upward until it comes back with no search results. You can try tricks with AND and OR and what have you, but that doesn't help.


But if you start searching for big numbers, you'll notice something. A lot of them are serial numbers of some kind or telephone numbers, and they don't come alone. One guy will put up a page with serial numbers of a software product, another a page with phone numbers in his area.


So I wrote a little script that does a Google search, retrieves the first hundred hits, including the little snippet of text Google puts around each one, and finds all the numbers in that text. My script starts by searching for 1 and finds about 25 numbers or so. It puts them in a list. Then it searches for the next number not yet in the list and adds all the new numbers found in that search. After a while it becomes more and more frequent for the program to skip large series of numbers, because they have already appeared in earlier results. Scanning the numbers between 1 000 000 and 1 005 000 took about 500 searches like this and resulted in quite a list of numbers we also know Google has.
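
For the curious, here is a minimal Python sketch of that loop. It is not the original script, just an illustration under a few assumptions: `search` is a hypothetical function you supply yourself, which takes a query string and returns the combined text of the result snippets, or `None` when there are no hits. The names `extract_numbers` and `scan` are made up for this example, and the sketch only picks up numbers written as plain runs of digits.

```python
import re
from typing import Callable, Optional, Set, Tuple

# Plain runs of digits only; numbers written with spaces or commas are missed.
NUMBER_RE = re.compile(r"\d+")


def extract_numbers(text: str) -> Set[int]:
    """Return every run of digits in the text as an integer."""
    return {int(match) for match in NUMBER_RE.findall(text)}


def scan(
    search: Callable[[str], Optional[str]],  # hypothetical search function
    start: int,
    stop: int,
) -> Tuple[Optional[int], int]:
    """Look for the first number in [start, stop] that search() cannot find.

    Returns (missing_number_or_None, number_of_searches_performed).
    """
    seen: Set[int] = set()
    searches = 0
    for n in range(start, stop + 1):
        if n in seen:
            # Already turned up in an earlier result, so no search is needed.
            continue
        text = search(str(n))
        searches += 1
        if text is None:
            return n, searches
        seen.add(n)
        # Keep only numbers we still have to visit, to bound memory use.
        seen.update(k for k in extract_numbers(text) if n < k <= stop)
    return None, searches
```

Like the script described above, this assumes that a number showing up in a result snippet is also a number Google would find if you searched for it directly. The saving comes entirely from the `if n in seen` check: every number harvested from an earlier page is one search you never have to make.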


Of course this is not going to cut it if there are indeed a million numbers to scan: 100 000 Google searches still take a long time. Maybe we can come up with something even smarter than this.
