Tuesday, September 20, 2016

Project: Offline Movie Reviews

I used my first week of freedom to write a little toy-app: Offline Movie Reviews.

Airplanes don't fly faster than they did 40 years ago, nor do they provide us with more legroom. But we did make a lot of progress when it comes to personal entertainment on board. Most airlines these days will provide you with your own screen and a selection of movies to while the time away. They'll also usually insist that all their movies are just great. And while most will improve with each consumed Gin & Tonic, it still helps to pick one with a good base score. This is where the offline movie reviews app comes in.

It ships with the reviews of 15 000 of the most popular movies. Each usually has a thumbnail version of the movie poster and always the section from the Wikipedia article describing the reception. This will typically contain the scores on Rotten Tomatoes, Metacritic and/or IMDb and often comes with a quote or two from a movie critic. Enough to make a somewhat informed decision on how to spend the next two hours.

If you're not interested in how it technically works, you should download the app now, keep it on your phone for your next flight and stop reading.

Apart from the usefulness of the app, I wanted to accomplish two things: learn Swift and share with the world how we process and massage data at Triposo. When Swift came out, I liked some of the things, but was disappointed by how errors were handled and the lack of real garbage collection. Meanwhile, error handling has improved and overall I must say it is a very pleasant language to develop in (even more so if you compare it directly to Objective-C and all its awkwardness). The app is not very complicated - master/detail with a tiny bit of care to make sure it executes searches over the movies in a smooth fashion.

The data processing builds on my wiki_import project. Wiki_import imports dumps of Wikipedia, Wikidata and Wikistats into a Postgres database, after which we can query things conveniently and fast. In this case we want to get our hands on all the movies from Wikipedia sorted by popularity. Wikipedia contains roughly a hundred thousand movies - including all of them would create a db of 700MB or so. We're shooting for roughly 100MB or 15 000 movies. The query to get these movies is then quite straightforward:

SELECT wikipedia.*, wikistats.viewcount
FROM wikipedia
JOIN wikistats ON wikipedia.title = wikistats.title
WHERE wikipedia.infobox = 'film'
ORDER BY wikistats.viewcount DESC
LIMIT 15000

For each movie, we collect a bunch of properties from the infobox using the mwparserfromhell package, an image and the critical reception of the movie. The properties have standard names, but their values can be formatted in a variety of ways, which requires some tedious stripping and normalizing - as always with Wikipedia parsing. The image processing is quite straightforward. I crop and compress the image up to the pain limit to keep the size down. I switched to using Google's WebP, which makes images look a lot better at these high compression levels.

As you'd expect from user generated content, the critical reception section on Wikipedia can hide under a number of headings. I might not have gotten all of them, but the great majority for sure. So we find the first of those headings and collect all the wikitext until we encounter a heading of the same or a higher level. Feed that into the mwparserfromhell wiki stripper and voilà: a text with the reception and only a minimum of wiki artifacts (some image attributes go awry, it seems).
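The section hunt described above can be sketched in a few lines of Python. This is a minimal sketch and not the code from the repository: the list of headings is made up for illustration, the function name extract_reception is hypothetical, and it only uses the standard library, so the final stripping step with mwparserfromhell is left out.

```python
import re

# Hypothetical list of headings the reception text may hide under;
# the real list is probably longer.
RECEPTION_HEADINGS = {"reception", "critical reception", "critical response", "reviews"}

# Matches wikitext headings such as "== Reception ==" or "=== Box office ===".
# The backreference makes sure both sides use the same number of "=".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def extract_reception(wikitext):
    """Return the raw wikitext of the first reception-like section, or None."""
    collected = None  # lines of the section once we are inside it
    level = 0         # number of "=" of the heading that opened the section
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            heading_level = len(match.group(1))
            if collected is not None:
                if heading_level <= level:
                    break  # a heading of the same or a higher level ends the section
            elif match.group(2).lower() in RECEPTION_HEADINGS:
                collected = []
                level = heading_level
                continue  # do not include the opening heading itself
        if collected is not None:
            collected.append(line)
    if collected is None:
        return None
    return "\n".join(collected).strip()


sample = (
    "Intro\n"
    "== Plot ==\nStuff happens.\n"
    "== Reception ==\nCritics liked it.\n"
    "=== Box office ===\nIt made money.\n"
    "== Cast ==\nPeople.\n"
)
reception = extract_reception(sample)
```

Subsections such as "Box office" stay part of the result because their heading is deeper than the one that opened the section; the next heading at the same level ("Cast" in the sample) stops the collection.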

We then stick everything into a sqlite database with a full text search index on the title of the movie and the starring field, so we can search for both the name of a movie and who appears in it. That last bit isn't needed when you decide which movie to watch, but I often find myself wondering, where did I see this actress before? Full text search on iOS works fast and well these days and even gives you prefix search for free.
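To give an idea of what such an index looks like, here is a small sketch using Python's built-in sqlite3 module. It assumes the FTS5 extension is compiled into sqlite (the app itself may well use FTS3 or FTS4), and the table layout, column names, sample rows and the search helper are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: title and starring are searchable, the reception text is
# stored in the same table but kept out of the full text index.
conn.execute(
    "CREATE VIRTUAL TABLE movies USING fts5(title, starring, reception UNINDEXED)"
)
conn.executemany(
    "INSERT INTO movies (title, starring, reception) VALUES (?, ?, ?)",
    [
        ("Forrest Gump", "Tom Hanks, Robin Wright", "reception text"),
        ("Cast Away", "Tom Hanks, Helen Hunt", "reception text"),
        ("The Matrix", "Keanu Reeves, Laurence Fishburne", "reception text"),
    ],
)


def search(conn, typed):
    """Return the titles of movies where a title or actor word starts with `typed`."""
    # Quote the user input and append * to turn it into an FTS5 prefix query.
    query = '"' + typed.replace('"', '""') + '"*'
    rows = conn.execute(
        "SELECT title FROM movies WHERE movies MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


by_actor = search(conn, "han")    # matches "Hanks" in the starring column
by_title = search(conn, "matr")   # matches "Matrix" in the title column
```

Because the query runs against the whole table, one search box covers both the title and the starring field, and the trailing * is what makes results show up while the user is still typing.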

You can find all the code on GitHub.

Monday, September 12, 2016

Leaving Triposo

Wednesday, August 31, 2016 was my last day as a full-time employee at Triposo, the travel guide company I started 5 years ago with my brothers and Jon Tirsen.

Triposo will continue to exist and will focus on delivering content and technology solutions for other companies. While I think that this is the best strategy for the company, it just isn't me. So Nishank Gopal, who has a lot more experience executing this sort of B2B strategy, will take over as CEO. I'll remain on the board and be involved as an adviser.

I'm taking some time off to think, write, code, learn and travel. With the company continuing, this isn't quite one of those Startup Post Mortems. I did want to share some thoughts on running travel companies though:

What worked and what didn't?

Triposo started out with a three-pronged plan:
  • Build travel guides from targeted web crawls
  • Make the travel guides sticky by adding a travel log
  • Make money by selling tours and travel services on the go
The first prong worked rather well. We went from a few city guides that were basically mashups of Wikipedia and Wikitravel to a travel guide that covered the world within the first year and kept improving the data quality from there on. I was especially proud when we launched the system that matched web pages automatically to our POI database and then ran opinion mining and fact extraction over those pages.

With this we could rank POIs not just on one score, but on a variety of aspects - coffee, drinks, location - which in turn we could use for recommendations and personalization. On top of that we developed a nifty similarity measure for POIs, powering our "people that like this place, also like."

The second prong, adding a travel log, started out promising. Being able to add photos and notes to entries in a travel guide and building a story that way was fun. For us. Our users didn't use the feature very much though. They used Facebook for sharing their travel experiences. And so we were confronted with a choice: do we keep betting on two things, or do we focus on the thing that really works well, our core travel guide? We went with the latter and killed the travel log.

Sometimes I think we shouldn't have. 5 years ago, Facebook was the place to share this sort of thing, but I wonder if nowadays there would be room for a sharing platform specifically for travel. Breadtrip seems to do well in this space. But you know what they say, being too early is just as bad as being too late.

We didn't pay a lot of attention to our third prong in the first years. People spend a lot of money on travel and half of that is spent during the trip. We figured that once we had a large enough user base, they could start spending that through us. The conversion rates we got linking to web pages from our app were quite low and it seemed to us that just natifying those flows should do the trick.

It didn't. Or not enough. In our presentations we always talked about the shift from desktop to mobile and from booking before a trip to during a trip. This trend is real, but we still have a long way to go. People are happy to research a hotel on their phone, but when it is time to make a booking and enter those credit card details, they'll often quickly switch to the desktop browser, leaving your poor travel guide without its margin.

The other issue was that for tours and activities we had almost no options with same-day availability. When your model is based on telling people at the breakfast table what they should be doing that day in the city where they are, this is a problem. Again, I'm sure this will get better in the next few years, but it didn't in time for us.

What do you do when things don't work?

This is a question people in the start-up world don't talk about much. The general opinion is that when you have a start-up, you focus on that one thing that you do best. That's how you become successful, that's how Google and Facebook did it. Only when you are huge do you diversify.

That's all very well, but what if the one thing you are good at isn't enough? Initially we were doing great, our user base was growing exponentially. But that growth wasn't really viral, it was just Apple and Google sending us downloads. With the travel log shut down, we were seeing bad retention numbers. With our bookings on the go not really taking off, we didn't have a real ecommerce play either.

So what do you do? "Pivot" is a popular answer. But for every success story about pivots there are ten failures and to me it always seemed like spending the money of your investors on an idea they didn't invest in. So you start thinking about things you could add that would fix retention or fix conversion. 

City walks, mini guides, a chat room for triposo users in the location, printable posters, sponsored free wifi, audio guides, a chat bot that advises users about hotels and attractions, partly powered by a human - we built all these things and launched them. And then when the feature doesn't quite take off, you are faced with the choice of removing it and disappointing the users that enjoyed it, or have it clutter up an already complex app.

Maybe this is the right strategy. You try stuff until you hit it out of the park or run out of money. But often I think we should just have focused on building the best travel guide possible. Improve the data quality, the data coverage and the smartness. And if that's not enough, well, then there just wasn't enough of a market for the original plan.

Can a travel planning app be a success?

A few months ago there was a popular blog post titled "Why you should never consider a travel planning startup." I was asked a few times about my opinion. Triposo was of course never a travel "planning" startup - we always focused on being helpful when you are on the road. But the arguments against it are very similar.

In short the article says: Getting lots of users for travel is hard, because people do it only once or twice a year. Getting people comfortable with something as complicated as a travel planning app is hard. Getting people to trust you enough to book through you rather than through an OTA they know is hard. Outbidding the site that pays you a commission for a hotel sale is hard.

This is all true and we've seen all of these things first hand at Triposo. But even though I'm writing a post about why Triposo as a consumer product hasn't taken off, I would still answer the question of whether a travel planning app can be a success with a yes.

First of all, these arguments are about all travel startups, not just the ones that do planning or help you while on the road. And yet using Kayak has become a habit. We actually succeeded in attracting a fair number of users organically. And while we had trouble getting people to book through our app, Tripadvisor figured this out, even though I could just read the reviews there and then go to Expedia to make my booking. And outbidding the guy who pays you a commission is the hallmark of the entire travel industry. How can Booking.com outbid the hotels themselves on Google?

We focused on being a travel guide that is helpful when you are at the destination, because people don't like to plan. It seems inevitable that there will be an app that will let you have a perfect experience on your trip without you doing more planning than necessary. An app that has all the travel information in the world and knows who you are, where you are and your mood. Unfortunately it looks like it won't be Triposo.

So what's next?

I'm taking some time off to learn, write, code, read and travel. I think that when it comes to technology things have never been as interesting as they are now, so taking a bit of time to figure out what's next seems like the best approach. I'll be doing some smaller projects around stuff I want to try out. A first small one you can find here: https://github.com/DOsinga/wiki_import - some scripts to import the wikipedia, wikidata and wikistats into postgres and make them searchable.

Triposo as a consumer product will continue and will remain "probably the best travel guide" in the app store. The engineering team will focus on data quality, coverage and smartness - in a way executing on the "focus on the one thing you're good at" strategy.  If you are interested in using the Triposo data and smartness for your own business, get in touch. There's some wonderful stuff there.

Sunday, June 5, 2016

Predictions for Euro 2016

Two years ago I coded up a small Python model to simulate the World Cup. The results back then were more or less in line with what the general predictions were: Brazil to win.

I updated the model for the Euro 2016 tournament. My data source for matches had gone, so I had to adjust that and I also introduced weights for previous games. Friendlies and games from longer ago weigh less. The oldest matches I am taking into account are from just after the World Cup.

The results differ more from the pundits' predictions than last time around. France is the favorite (25%), but that is because of the home advantage, which I set at 0.25 - historically the model has it between 0.2 and 0.3. Poland is the surprising number two with 21%. They did a decent job qualifying and had some good friendlies, so I find it hard to argue with.

Spain and England are basically tied at 11%. Of course England's performance could very well decide whether Brexit happens or not, so this is important.

The model does not like Germany's chances much at 8%. The results from two years ago are now weighted at only 30% because of the time gone by.
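For the curious, this is roughly how such match weights could look in Python. It is only a sketch and not the actual model: I am assuming an exponential decay that hits the 30% mentioned above at two years, the discount for friendlies is a made-up number, and the function names are hypothetical.

```python
TWO_YEAR_WEIGHT = 0.3   # from the post: results from two years ago count for 30%
FRIENDLY_FACTOR = 0.5   # hypothetical discount for friendlies, not from the post


def match_weight(age_years, friendly=False):
    """Weight of a past match, assuming exponential decay with age."""
    weight = TWO_YEAR_WEIGHT ** (age_years / 2.0)
    if friendly:
        weight *= FRIENDLY_FACTOR
    return weight


def weighted_goal_average(matches):
    """Weighted average of goals scored over (goals, age_years, friendly) tuples."""
    total = sum(goals * match_weight(age, friendly) for goals, age, friendly in matches)
    norm = sum(match_weight(age, friendly) for _, age, friendly in matches)
    return total / norm


# A recent competitive 3-0 counts for far more than an old friendly 0-0.
recent_form = weighted_goal_average([(3, 0.1, False), (0, 2.0, True)])
```

A weighted average like this could then feed whatever strength estimate the simulation uses; how the real model combines the weights with the home advantage is not something this sketch tries to reproduce.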

Just to put my money where my model is, I made an actual bet for Poland to win.

Saturday, January 2, 2016

Where the streets have no name

Growing up in the Netherlands I never considered that our system for street addresses wasn't obvious and therefore universal. Street and house number, Postal code, City, Country. How else would you do it? It turns out there are many ways.

Putting the house number before the street actually is more consistent. Not using house numbers, but the number of meters from a crossing gives one a better idea of where the house actually will be. Some places issue house numbers in chronological order rather than in a geographical fashion. Some don't use house numbers at all, but give buildings names. In Japan streets usually don't have names, but the blocks (banchi) do. In India (at least in Hyderabad) there are street names and numbers, but if you want to go somewhere you need to specify the closest landmark - a temple, a shopping mall or maybe an office building.

Bangkok is no exception to these exceptions. Landmarks are also popular, but more to give a general idea where things are. Streets in Bangkok follow more the pattern of rivers than the grid pattern of North American cities with the smallest streets meandering until they flow into a bigger street which in turn meanders until it merges into an even bigger street.

Addresses start with the biggest street which has a name and then count down the number of side streets with odd and even ones on opposite sides of the streets. If the side street has its own side streets, this process is repeated.

It has its own logic to it, but it is confusing to newcomers. You ask your hotel what the address is and they say something like "Soi 3." If you then walk around town for a full day and tell your taxi driver "take me home to Soi 3", they'll look at you confused. The third side street of what?

Thursday, December 3, 2015

Moving to Thailand!

Let's start with the news. Tonja and I are moving to Thailand. We don't know yet for how long or exactly where we'll live, but if everything works out with the visa, we should be living in Bangkok from January 2016 on.

Those who know us a little might suspect this is because of the weather or the food or that we're just ready for something else after 4 good years in Berlin. Those things all count on some level, but the real and immediate reason is Triposo, the startup I've been working on since leaving Google.

When we started with Triposo, we wanted to build the best travel guide for mobile. I think we mostly succeeded and we'll continue improving over the next years, but we also need to prove that we can make real money or, as it is popular to say now, prove that the unit economics work.

And while it makes sense to cover the entire world at the same time when you build an algorithmic travel guide (you make something work somewhere and it works everywhere), it is less clear that this is true when it comes to selling services to travellers in the app. We think we need to work closely with local providers in new ways - watch this space for further developments. And that's why I am here.

My last burst of blogging was when we were living in India, so I wanted to pick this up again now that we're back in the tropics.

Wednesday, February 20, 2013

What do you do after a genocide?

Arriving in Kigali, the capital of Rwanda, from most other African countries must be a bit of a reverse culture shock; the city is clean and pretty, the traffic not too busy and well behaved. The shops are well laid out and give a sense of prosperity and the people seem healthy and relaxed. The government, though recently giving in to a certain degree of authoritarianism, is still efficient with streaks of the visionary mixed in; they banned plastic bags and decided to change the national language from French to English for economic reasons (though certain disagreements with the government in Paris might have pushed them over the edge). All in all it feels more like a nation taking its cue from Singapore than South Africa.

I imagine it is much like Germany must have been in the sixties. It's been about 20 years since Hutu death squads went on a killing spree, killing around a million Tutsis and moderate Hutus in one of the worst genocides of the second half of the twentieth century. Led by Paul Kagame, the current president, the RPF, a Tutsi dominated rebel movement, succeeded in pushing out the génocidaires before the United Nations got their act together.

What puzzles me is how they got back to a state of normalcy. The Rwandese genocide didn't happen in relatively remote concentration camps. It wasn't executed by a small group of well armed extremists. It happened everywhere at the same time, with neighbours killing neighbours, sometimes family members killing each other. People trying to find refuge in churches were sometimes turned over to their killers by nuns and priests; sometimes the Interahamwe would just blow up the church.

After World War II people in the Netherlands would whisper that somebody had been "wrong in the war" when they suspected collaborators or wonder if a visiting German tourist might have been "a good German". Over time that went away, but it took a good while. More than 40 years after the end of the war, football supporters were still celebrating the rare win over the German team by declaring they got their grandfather's bicycle back.

In Rwanda they seem to have just decided to do away with the whole thing. Now there are no more Hutus or Tutsis, just Rwandese. The events in 1994 were a grim reminder that 80% of humans will turn into mass murderers given the right circumstances. Now Rwanda is showing the world that you can come back from even the worst tragedy imaginable.

Tuesday, February 12, 2013

The Paleo diet is wrong about grains

The Paleo diet insists we should only eat things our forefathers ate back in the stone age; our systems just aren't developed to process modern foods. It's an interesting idea that intuitively makes sense, although the objection that it's crazy to get health advice from a group of people that had a life expectancy of 32 is hard to overlook.

So you're mostly left with a diet of some vegetables and lots of animal protein from meat, fish and eggs. Especially grains are a big no-no. To the untrained eye it appears as yet another low-carb diet with a better back story. I think though that they are wrong about the grains.

I'm writing this while on a trip to East Africa, the cradle of humanity. And even though you don't see many primitive hominids on the plains of the Serengeti, you do see baboons. Baboons aren't great apes so not very closely related to humans, but they do seem to fill a similar niche as early humans did; they're ape-like creatures living in social groups on the savannahs getting by on whatever they find.

This time of year the Serengeti looks like a field of grain. The rains make the grasses grow tall and all those grasses are laden with seeds. Those seeds are of course nowhere near as big as modern grains but it is still free calories to the baboons. And so a common sight is to see a group of baboons "harvesting" "grains". It just seems very unlikely to me that our ancestors would just let that opportunity go.
(c) Douwe Osinga 2001-2005, douwe.webfeedback@gmail.com