Douwe Osinga's Blog: 2012

Tuesday, October 9, 2012

Triposo Hackathon and the evolution of languages

This week we got the Triposo team together in Sitges, a nice beach town near Barcelona, for some thinking, discussions and general strategizing but also a hackathon in our finest of traditions.

I decided to look at language similarities. I have always been interested in the evolution of languages. It seems though that discussions about the similarity of languages are always a bit arbitrary. You need to compare lists of words, but how do you pick them? If you take the word 'town' in English you'd translate that into German to 'Stadt' and in Dutch it would be 'plaats'. Those words aren't very similar at all. However German also has the word 'Zaun' which sounds very similar to 'town' and means 'fence'. In Dutch there's a word 'tuin' which means 'garden'.

I wanted to take out this arbitrary part and do the comparisons fully automatically, and it occurred to me that if you only take the cardinal numbers (one to nine, for example) into account, you'd take this arbitrariness away. I wouldn't expect those words to change their meaning easily.

We've had phrasebooks in the Triposo apps for a long while now, based on the content from Wikitravel. So I went ahead and wrote a script to extract from those Wikitravel pages for each language the phonetic version of the words one to nine. I then calculated the similarity between all language pairs by calculating some sort of edit distance between the corresponding pairs of words.

Traditionally the edit distance between two words is the minimum number of edits (i.e. deletions and insertions) needed to change one word into the other. So 'town' and 'Zaun' have an edit distance of 6 (delete the 't', 'o' and 'w', then insert a 'z', 'a' and 'u'), and from that perspective they're not very similar. You can do better by assigning a likelihood to specific transitions. The 't' and the German 'z' are a bit similar, vowel changes are also quite likely to happen, and so on.
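To make the two variants concrete, here is a minimal Python sketch. The first function counts plain insertions and deletions. The second adds a substitution step that is cheaper for "likely" transitions. The cost values and the list of similar consonants are assumptions made up for illustration, not the weights used at the hackathon.

```python
VOWELS = set("aeiou")
# Hypothetical list of consonant pairs treated as "a bit similar".
SIMILAR_CONSONANTS = {frozenset("tz"), frozenset("td"), frozenset("pb"), frozenset("kg")}


def indel_distance(a, b):
    """Minimum number of single-character insertions and deletions turning a into b."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])  # matching characters cost nothing
            else:
                curr.append(1 + min(prev[j], curr[j - 1]))  # delete or insert
        prev = curr
    return prev[-1]


def substitution_cost(x, y):
    """Assumed costs: likely sound changes are cheap, others equal delete plus insert."""
    if x == y:
        return 0.0
    if x in VOWELS and y in VOWELS:
        return 0.5
    if frozenset((x, y)) in SIMILAR_CONSONANTS:
        return 0.5
    return 2.0


def weighted_distance(a, b):
    """Edit distance where a substitution is priced by substitution_cost."""
    a, b = a.lower(), b.lower()
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                            # delete ca
                curr[j - 1] + 1.0,                        # insert cb
                prev[j - 1] + substitution_cost(ca, cb),  # substitute
            ))
        prev = curr
    return prev[-1]


print(indel_distance("town", "Zaun"))     # plain insert/delete count
print(weighted_distance("town", "Zaun"))  # lower once t/z and vowel shifts are cheap
```

With these made-up weights, 'town' and 'Zaun' move much closer together than the plain count of 6 suggests, because the t/z and o/a changes are each charged at half an edit.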

Based on these pairwise distances I then calculated a tree of languages. We start by creating for each language a language group consisting of only that language. We then merge language groups that have small distances to each other. Subgroups that match well together and slightly less well with the other languages remain subgroups, and so a tree is built.
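The merging procedure described above is a form of agglomerative clustering. A rough sketch in Python follows. The post does not say how the distance between two groups is measured, so averaging over all member pairs is an assumption here, and the function name and the toy distances in the example are invented purely for illustration.

```python
from itertools import combinations


def build_language_tree(languages, distances):
    """Repeatedly merge the two closest groups and return the tree as nested tuples.

    distances maps frozenset({lang_a, lang_b}) to a number for every pair.
    """
    # Each cluster is (subtree, list of member languages).
    clusters = [(lang, [lang]) for lang in languages]

    def group_distance(members_a, members_b):
        # Assumed linkage: average distance over all cross-group pairs.
        total = sum(distances[frozenset((x, y))] for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Find the pair of groups with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: group_distance(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        (tree_a, members_a), (tree_b, members_b) = clusters[i], clusters[j]
        merged = ((tree_a, tree_b), members_a + members_b)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

    return clusters[0][0]


# Toy distances, made up for this example only.
langs = ["dutch", "german", "spanish", "italian"]
toy = {frozenset(pair): 5.0 for pair in combinations(langs, 2)}
toy[frozenset(("dutch", "german"))] = 1.0
toy[frozenset(("spanish", "italian"))] = 1.5

print(build_language_tree(langs, toy))
```

In the toy example the two close pairs are merged first and only then joined into a single root, which is the same mechanism that lets subgroups such as Germanic or Slavic survive inside a larger family.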

The result is below. As you can see, there are clear groups of the Germanic, Slavic and Romance languages. They fold together with some other languages into an Indo-European group. There are some other smaller groups that jump out (Turkic, Philippine, Arabic) but most really are islands. Finnish and Estonian match up quite nicely, too. I left out some of the languages that the model turns into singletons.

It works surprisingly well given that the data is rather noisy and that it is based on phonetic spelling rendered in English, which just isn't great.

Thursday, September 27, 2012

Silicon Valley Startups vs German Startups

A lot of the entrepreneurs I talk to who are starting out in Europe seem to have trouble with ambition. Whereas your 22-year-old Silicon Valley founder will happily declare that their iPhone app will soon completely change the financial industry and change banking as we know it, your average Berlin starter-upper seems to be satisfied to build a company that will let them and nine friends do something they like while making enough money to live a modest lifestyle.

It's easy to jump to the conclusion that the key to success is thinking big. But there is such a thing as thinking too big. At Google, even before he became CEO, Larry had the reputation that no matter what was pitched as a new project, his answer was always: no, no, you're solving the wrong problem. You need to solve the underlying, bigger issue. Always solve the biggest problem available.

I was once pitched this idea of a structured data wiki at Google. There are lots of people who for whatever reason are very much interested in certain subjects, and giving them the tools to organise themselves in "data communities" seemed to me like a good way to get some hard data into what is now the knowledge graph of Google. Larry didn't agree. Why build a tool for communities? Why only do structured data? Why do wikis? We should make it so that any part of the World Wide Web is editable by anybody, with the ability to annotate anything with any semantic value.

Obviously that went nowhere. The best ideas that turn out big don't start that way; they often start with a small wonderment about something and grow from there. They say go big or go home, but go big prematurely and you go home too.

It is not only about the right amount of bigness to think about; it occurs to me there's also a choice to be made between depth and breadth. The Silicon Valley and German models make an interesting if idealised comparison. They start the same: somebody comes up with an interesting idea, builds a prototype, gets some more people involved and they decide to go for it and start a new company. A quick look at the wider market tells them of course that there are a lot of players already.

The Silicon Valley startup will look at this market and then try to raise enough money to throw enough resources at it to match the opportunity: to disrupt the market by changing the rules for everybody. This could fail at various levels of course; they might not raise the money, there are probably other startups with similar ideas and intentions, and the market might simply not be disruptable. But if it works the payoff is obviously very large. Worst case you end up with a zombie company, best case with something like AirBnb.

The German startup meanwhile looks at the market, decides to narrow down their focus to the biggest niche that is defendable and then focusses on becoming the very best in precisely this niche. It seems at first that this is the formula for a lifestyle company, but by relentlessly focussing on their small and defendable niche and choosing product quality over short-term profitability, some of these companies become the best in the world in their respective area and join the ranks of the much admired Mittelstand. Worst case you end up with a struggling lifestyle company, best case with an unknown world-beater like Putzmeister, which makes concrete pumping machines and exports them all over the world.

The difference has partly to do with culture but also a lot with money. In Silicon Valley you can raise millions with a new company but only in return for the promise to build a 100 million dollar company. In return for 3 million dollars your VC will want 30% of the company and a path to making their money back 10 times - they'll value your company at 10 million now if you can show how you could sell it for 100 million later. This not only gives the startup the resources to go for a disruption strategy, it also requires one.
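The arithmetic behind those numbers can be spelled out in a few lines of Python. This is a simplified sketch: it assumes a single round with no later dilution, and the function name is made up for the example.

```python
def implied_valuations(investment, stake_percent, target_multiple):
    """Return (valuation today, exit value needed) for a simple single-round deal.

    Assumes the investor's stake is never diluted by later rounds.
    """
    # If `investment` buys `stake_percent` of the company, the whole company
    # is implicitly valued at investment / stake.
    valuation_now = investment * 100 // stake_percent
    # For the stake to be worth `target_multiple` times the investment,
    # the whole company has to be worth that many times today's valuation.
    required_exit = valuation_now * target_multiple
    return valuation_now, required_exit


now, exit_value = implied_valuations(3_000_000, 30, 10)
print(now, exit_value)  # 10000000 100000000
```

Three million for 30% puts the company at 10 million today, and a tenfold return on that stake needs a sale at 100 million.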

For the German company it is usually much harder to raise significant amounts of money. A seed round against a very modest valuation is usually all that's to be had. Without the resources for immediate expansion, they focus on finding the right niche and building the right product for that niche. They might be successful but will typically remain small or make it to what they call middle-sized.

If you want you can see in this a reflection of the German approach to making things work the best they can and the American approach of going for bulk as fast as possible.

Monday, September 17, 2012

Ideas and the curse of powers of 10

Most people have lots of ideas. Some people even have ideas that are interesting and quite a few of those people start doing something with those ideas. Almost nobody finishes. Why is that? Simple. Each next phase in the execution takes ten times as long as the last one. When Edison said that genius was 1% inspiration and 99% perspiration he was off by two orders of magnitude. Here's what you can accomplish with various amounts of (man) hours:

1 Hour
You go to a pub with a friend you haven't talked to for a while and after a bit of random chatting about the economy, the tech industry and possibly changed personal circumstances, and most certainly some drinks, you hit upon this brilliant, world-changing idea. It takes you only an hour.

10 Hours
If the idea still looks OK in the morning and there's an app or website to be built, you can build quite a nice demo in the next ten hours. It still has a lot of rough edges of course, but if things are on track, it shows how the idea could actually work in practice.

100 Hours
Vision made clear to the onlookers, it is time to build a prototype. Leave out the edge cases, combine some great open source building blocks with creative commons licensed art-work and before you know it you have something. You show it to your friends and they say, awesome! ship it!

1000 Hours
But it isn't quite shippable. So maybe you get a friend enthusiastic and start working weekends and evenings to get it there. Or maybe you are bolder, quit your job, maybe find a backer and double down to get to the first version of what is now becoming a product.

10 000 Hours
Once you have a first version out there and you're starting to have an impact it is time to start building an organisation around the idea and the product. You need to branch out, maybe raise money, get more people involved and face the bureaucracy.

Why this isn't bad

Most ideas die in their march to become real somewhere past the 100 hours when things start to become rather serious. Once you hit the 1000 hours it really seems you're 75% done but in reality you're only really just at 10%. It might seem frustrating but really it is not. It means you can try your brilliant idea and find out after only 1% of the effort needed to make it into something really serious whether it stands a chance at all. Only by giving up easily on ideas will you have the time to cover enough attempts. And only by having enough attempts will you come across the thing that makes it big.

Of course 10 000 hours is only the beginning. It'll take about a million hours to go public and 100 million to build the next Google.

It works in reverse too. It's hard not to wonder about companies like Dropbox, EventBrite or Evernote, who built a great product but seemingly did so in that first 1000 hours and don't seem to have that much more to show for it now that they're closing in on a million hours. This in turn leads people who are unhappy with products like Twitter to say, oh, I can build a brand new Twitter in a thousand hours and all will be well.

Saturday, September 8, 2012

On Bad Software and Bad Coffee

Two things have been puzzling me lately. Why do cafes make bad coffee and why is there so much bad software out there? In structure I think the two problems are related.

Let's start with the coffee. Why do so many cafes make bad coffee? I'm not talking about the ones that really don't care, that'll serve you Nescafé in lukewarm water, but about the places that look like they are about coffee. Why buy a shiny, multi-thousand-dollar espresso machine, invest in a top-of-the-range grinder and set up an arrangement to get a steady supply of fresh beans of a known brand, and then spoil the whole arrangement by having a "barista" who has no idea what they are doing? I understand that becoming really good takes years of experience. But learning how to make a semi-decent cup only takes a few hours of training. If you get all the equipment, why leave out the most important bit?

For software the argument is similar but not quite the same. We're hiring at Triposo ([email protected] if you feel like it could be for you) and luckily we get quite a few resumes and generally they look quite promising. We ask rather technical questions including some actual coding. Here's the thing. A good percentage of these candidates have trouble with what I would call basic coding, but at the same time they have been working in a role developing software in respectable companies.

So an obvious reason why there is so much bad software around is that it is hard enough to write decent code with great coders and that most companies don't care or don't have the ability to distinguish between people that can and cannot code. So they end up with a team of people who have XSLT on their resume and talk about frameworks, but don't actually know how to build something. Bad products are sure to follow.

I realize that training a barista is different from training a software engineer and that for most companies putting in the 10 000 hours to make somebody a good coder just isn't feasible, but it's harder to see why they wouldn't improve the hiring process to weed out the bad coders. Yes, it means it becomes even harder to hire new people, but with the best software engineers being up to 28 times more productive than the worst, it is really worth the effort.