Douwe Osinga's Blog: November 2016

Tuesday, November 22, 2016

Calculating the set of universal numbers

Frederick II allegedly tried to find out what the universal language of humanity was by depriving a group of young children of any language input. He expected them to start speaking Latin, Greek or maybe whatever Adam and Eve spoke in paradise. Instead they went insane.

Esperanto and a set of even less successful competitors tried to construct a universal language (the rumor that Klingon has more speakers than Esperanto is not true, but the fact that people believe it tells you something). Linguists have gone the other way. Languages evolve and seem to have common ancestors. By studying these evolutions, you can come up with theories as to what that common ancestor might have been. If you are very daring you can take this all the way to a supposed Proto-Human Language.

Trying to follow language evolution this far back is tricky and the approach has been widely criticized. We can only observe language evolution for the last 2000 years or so - applying rules learned from that period to the roughly 198 thousand years before it is extrapolating by a large margin. For example, most languages have become simpler in the last 2000 years, but how did they become complex in the first place?

Algorithmically, there is a much simpler way to determine the most common language. Given a reasonable edit distance between two words, find for each word the median translation over all languages. The median translation is the translation with the lowest sum of squared edit distances to all the others.
I've done just that for the numbers one to ten using the phrasebooks from Wikivoyage:
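To make the median idea concrete, here is a minimal sketch in Python. It uses a plain Levenshtein distance and ordinary spellings of "three" in a handful of languages rather than the Wikivoyage phonetic spellings, and the function names are my own for illustration; the code in the repository may well look different.

```python
def levenshtein(a, b):
    """Plain Levenshtein edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def median_translation(words, distance=levenshtein):
    """Return the word with the lowest sum of squared distances to all words."""
    def total(candidate):
        return sum(distance(candidate, other) ** 2 for other in words)
    # On a tie, min() keeps the candidate that appears first in the list.
    return min(words, key=total)


# Illustrative input only: ordinary spellings, not phonetic transcriptions.
three = ["three", "drei", "drie", "trois", "tres", "tre"]
print(median_translation(three))
```

Squaring the distances punishes a candidate that is far away from even a few languages, so the winner tends to be a word that is reasonably close to everything.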



(Expanding this approach beyond the numbers one to ten might be doable, but is harder - words don't just change pronunciation, but also meaning. The Dutch word "tuin", the English "town" and the German "Zaun" all have the same root, but mean respectively garden, town and fence)

It is somewhat remarkable that this approach works. Wikivoyage uses a rather unscientific phonetic spelling based on how English speakers would pronounce a word. The edit distance I'm using is Levenshtein with some Soundex thrown in - both approaches pre-date microprocessors. The languages Wikivoyage covers are whatever its volunteers found interesting enough to add, and of those I can only use the ones that happen to parse. But it does look reasonable to me.
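One way to read "Levenshtein with some Soundex thrown in" is a weighted edit distance where swapping two letters from the same Soundex class is cheaper than swapping unrelated ones. The sketch below does that. The letter classes are the standard Soundex ones, but the 0.5 discount and the function names are assumptions made for illustration, not necessarily what the actual code does.

```python
# Standard Soundex consonant classes; vowels and h, w, y carry no code.
SOUNDEX_GROUPS = ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
SOUNDEX_CODE = {ch: str(i + 1)
                for i, group in enumerate(SOUNDEX_GROUPS)
                for ch in group}


def substitution_cost(a, b):
    """Cost of replacing letter a by b (the 0.5 discount is an assumed value)."""
    if a == b:
        return 0.0
    code_a, code_b = SOUNDEX_CODE.get(a), SOUNDEX_CODE.get(b)
    if code_a == code_b:
        # Same consonant class, or both uncoded (vowels, h, w, y).
        return 0.5
    return 1.0


def phonetic_distance(a, b):
    """Levenshtein distance with Soundex-aware substitution costs."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1.0,       # delete ca
                            curr[j - 1] + 1.0,   # insert cb
                            prev[j - 1] + substitution_cost(ca, cb)))
        prev = curr
    return prev[-1]


print(phonetic_distance("tre", "dre"))    # t and d share a Soundex class
print(phonetic_distance("drie", "drei"))  # only vowels are swapped
```

With a measure like this, "tre" and "dre" end up closer together than "tre" and "kre", which matches the intuition that t and d sound alike.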

Is there a way to support this intuition? Why, yes there is! By aggregating the distances between the numbers on the language level, a language distance matrix can be calculated. This in turn we can use to calculate a language tree. Here it is:


There are some weird bits in the tree, but by and large you see the major language groupings appear as we know them from linguistics. The Slavic and Germanic groups look quite convincing as does the Latin group although the insertion there of Welsh and Irish seems debatable. Malagasy and Hawaiian get their own minigroup, which is quite interesting.
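The step from word distances to a tree can be sketched as follows. I'm assuming here that the language distance is the mean edit distance over the numbers, and that the tree comes from simple average-linkage clustering; both are illustrative choices, and the names are made up for this example. The sample data uses ordinary spellings of one to three.

```python
from itertools import combinations


def levenshtein(a, b):
    """Plain Levenshtein edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def language_distance(words_a, words_b):
    """Mean edit distance over two equally ordered lists of number words."""
    pairs = list(zip(words_a, words_b))
    return sum(levenshtein(x, y) for x, y in pairs) / len(pairs)


def build_tree(numbers):
    """Average-linkage clustering; returns the tree as nested tuples."""
    # The language distance matrix, stored as a dict keyed by language pair.
    matrix = {frozenset((a, b)): language_distance(numbers[a], numbers[b])
              for a, b in combinations(numbers, 2)}

    def linkage(c1, c2):
        dists = [matrix[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(dists) / len(dists)

    # Each cluster is (subtree, list of member languages).
    clusters = [(lang, [lang]) for lang in numbers]
    while len(clusters) > 1:
        # Merge the two clusters that are closest on average.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


numbers = {
    "English": ["one", "two", "three"],
    "German": ["eins", "zwei", "drei"],
    "Dutch": ["een", "twee", "drie"],
    "Spanish": ["uno", "dos", "tres"],
    "Italian": ["uno", "due", "tre"],
}
print(build_tree(numbers))
```

Languages that end up in the same innermost tuple are the ones whose numbers are most alike, which is roughly how the groupings in the tree above come about.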

I think this is a promising approach. Using IPA for pronunciation, getting a more representative set of languages in (and maybe weighting them by number of speakers) and using a distance measure based on linguistic theories could all improve performance quite dramatically. If you want to play with the code so far, have a look at https://github.com/DOsinga/universal_numbers



Sunday, November 20, 2016

The Jordan-Egypt Ferry

If you search the web for information about the ferry between Jordan and Egypt, you will probably end up in a state of slight confusion. That is because, as things stand right now, it is confusing. This post describes our experience, which might prove a useful extra datapoint for anybody wanting to go that way.

Let's start with the basics. As of October 2016 there doesn't seem to be a fast ferry between Aqaba and Nuweiba, only a regular one. Our ticket said it would leave at 11PM, though at the ferry terminal they said it would be midnight. It actually left around 1AM. The fare we were quoted was US$ 75, which isn't cheap - it might be US$ 70 if you buy directly from the ferry terminal. The crossing took about 3 hours.

I think you need a good reason to take the ferry. The alternative of going through Eilat in Israel and then continuing to Taba is probably cheaper, faster and more comfortable. If you do and you want to travel outside of the Sinai, you should pick up a visa for Egypt in Aqaba, as they only issue Sinai ones in Taba and it seems to be a pain to convert these into full visas.

We wanted to continue to Sudan after Egypt. Sudan might refuse you entry if you've been to Israel and even though Israel kindly doesn't stamp your passport, the exit stamp from Jordan allegedly has caused trouble for others, so we took the ferry.

If you arrive early in Aqaba you could spend the day at one of the resorts in south beach, though the one we checked closed at 6PM which is still some time from the midnight departure. Alternatively there's a public beach not too far from the ferry terminal.

We duly arrived at 8:30PM. You can easily come by an hour later, though maybe not much more, as they did seem to shut down stuff way before the ferry left. Before you can get your exit stamp, you need to pay the JOD 10 exit tax at a counter around the corner. There's a small restaurant and a duty free shop that sells half-liter Heinekens for US$ 3. The currency exchange rates offered are rather terrible, so better give that a miss.

At boarding time the officer in charge called out the foreigners to let them embark first. This might seem unfair, but then again, we also pay a lot more than the local price. You can drop your heavier luggage in a container downstairs and head upstairs. The seats are quite comfy and there's a little (non-alcoholic) bar serving snacks and soda.

There's also a passport processing facility. If you don't have an Egyptian visa yet, hand over your passport here before they let all the non-foreigners aboard. Again, make sure you mention you need a full visa and not a free, Sinai-only visa - unless you fly out from, say, Sharm El Sheikh of course.

The lounge in the back seemed quieter and partly more comfortable, as some arm rests can be pulled up to create more space for sleeping. It is also considerably more air-conditioned, so bring a sweater.

After a few hours of fitful sleep, an officer in white woke me asking where my passport was - he was holding it in his hands, so I pointed that out. He collected all other foreigners (5) and navigated us off the boat, by way of the dropped-off luggage, through a series of checkpoints into a waiting area opposite a small office where the Egyptian visas are processed. The charge was US$ 25, payable in cash only. There didn't seem to be an ATM at the place.

Once outside of the ferry terminal you can either wait for the buses to start running from around 6AM or haggle with the assembled taxi drivers. We paid 400 EGP (US$ 44) for four people to Sharm and made it there just before sunrise.