Douwe Osinga's Blog

Tuesday, January 17, 2017

Learning to Draw: Generating Icons and Hieroglyphs

In this blog post we'll explore techniques for machine drawn icons. We'll start with a brute force approach, before moving on to machine learning, where we'll teach a recurrent neural network to plot icons. Finally we'll use the same code to generate pseudo hieroglyphs by training the network on a set of hieroglyphs. With the addition of a little composition and a little coloring, we'll end up with this:

Last week's post "Deep Ink" explored how we can simulate computers playing with blobs of ink. But even if humans see things in these weird drawings, Neural Networks don't. If you take the output of Deep Ink and feed it back into something like Google's Inception, it offers no opinions.

The simplest thing I could come up with to generate icons was brute force. If you take a grid of 4x4 pixels, there are 2^16 possible black and white images. Feed all of them into an image classifier and see if anything gets labeled. It does. But 16 pixels don't offer a lot of space for expression, so the results are somewhat abstract if you want to be positive, or show weaknesses in the image classifier if you are negative. Here are some examples:

We can do a little better by going to 5x5. To give the model a little more to play with, we can add a permanent border. This will increase the size to 7x7, but we'll only flip the middle pixels. Unfortunately the amount of work we need to do going from 4x4 to 5x5 increases by a factor 512. Trying all 4x4 icons takes about an hour on my laptop. Exploring the 5x5 space takes weeks:

These are better and easier to understand. It some cases they even somewhat explain what the network was trying to see in the 4x4 images.

They say that if brute force doesn't work, you're just not using enough. In this case though, there might not be enough around. 8x8 for icons is tiny, but it would take my laptop something like 3 times the age of the universe to try all possibilities. Machine Learning to the rescue. Recurrent Networks are a popular choice to generate sequences, for example fake Shakespeare, recipes and Irish folk music, so why not icons?

I found a set of "free" icons at After deduping it'll give us about 4500 icons. Downsample them to 8x8 and we can easily encode them as sequences of pixels to be turned on. An RNN can learn to draw these quite quickly. Adding a little coloring for variety and on a 15x10 grid you'll get this sort of output:

These are pretty nice. They look like monsters from an 8 bit video game. The network learns a sense of blobbiness that matches the input icons. There's also a sense of symmetry and some learned dithering when just black and white doesn't cut it. In short it, learns the shapes you'll get when you downsample icons to 8x8.

As cute as these throw backs to the 80's are, 64 pixels still isn't a lot for an icon. Especially since the input isn't a stream of optimized 8x8 icons, but rather downsampled 32x32 icons (the lowest resolution that the icons8 pack comes in).

We can't use the same encoding for 32x32 icons though. With a training set of 4500, having a vocabulary of 64 for the 8x8 icons is OK. Each pixel will occur on average 70 times, so the network has a chance to learn how they relate to each other. On a 32x32 grid, we'd have a vocabulary of 1024 and so the average pixel would only be seen 4 times, which just isn't enough to learn from.

We could run length encode; rather than store the absolute position of the next black pixel, store its relationship with the previous one. This works, but it makes it hard for the network to keep track where it is in the icon. An encoding that is easier to learn specifies for each scanline which pixels are turned on, followed by a new line. This works better:

The network does seem to learn the basic shapes it sees and we recognize some common patterns like the document and the circle. I showed these to somebody and their first reaction was "are these hieroglyphs?" They do look like hieroglyphs a bit of course and it begs the question, what happens if we train on actual hieroglyphs?

As often with these sort of experiments, the hard thing is getting the data. Images of hieroglyphs on the Internet are found easily; getting them in nice 32x32 pixel bitmaps is a different story though. I ended up reverse engineering a seemingly abandoned icon rendering app for the Mac that I found on Google Code (itself abandoned by Google). This gave me a training set of 2500 hieroglyphs.

The renderer responsible for the image at the beginning of this post has some specific modifications to make it more hieroglyphy: Icons appear underneath each other, unless two subsequent icons fit next to each other. Also, if the middle pixel of an icon is not set and the area it belongs to doesn't connect to any of the sides, it gets filled with yellow - the Old Egyptians seem to have done this.

Alternatively we can run the image classifier over the produced hieroglyphs:

You can see it as mediocre labeling by a neural network that was trained for something else. Or as hieroglyphs from a alternate history where the Ancient Egyptians developed modern technology and needed hieroglyphs for "submarine", "digital clock" and "traffic light"

As always you can find the code on github.

Thursday, January 12, 2017

Deep Ink: Machine Learning fantasies in black and white

One of the New New Things in Machine Learning is the concept of adversary networks. Just like in coevolution, where both the leopard and the antelope become faster in competition with each other, the idea here is to have one network learn something and to have the other network to learn judge the results of the first. The process does a remarkable job generating photorealistic images. Deep dreaming has been generating crazy and psychedelic images for a while now. RNNs have been imitating writers for a few years with remarkable results.
Deep Dreaming, by jessica mullen from austin, txt

The state of the art Neural Networks used to classify images contain many connected layers. You feed in the image on one side and a classification rolls out on the other side. One way to think about this process is to compare it with how the visual cortex of animals is organized. The lowest level recognizes only pixels, the layer just above it knows about edges and corners. As we go up through the layers, the level of abstraction increases too. At the very top the network will say "I saw a cat." One or two levels below that though, it might have neurons seeing an eye, paw or the skin pattern.

Deep Dreaming reverses this process. Rather than asking the network, do you see an eye somewhere in this picture, it asks the network, how would you modify this picture to make you see more eyes in it? There's a little more to this, but this is the basic principle.

So Software can imitate Art, just as Life does. It tends to be fairly literal though. The Deep Dreaming images, fascinating as they are, reflect patterns seen elsewhere in clouds or on top of random noise. So that got me thinking, what happens if we force some stark restrictions on what the network can do?

Deep Ink works similarly. But instead of starting with an image of a cloud, we start with white picture that has a little blob of black pixels in the middle, a little bubble of ink if you will. We then run the network over this image, but rather than allowing to adjust the pixels a tiny bit in a certain direction, the only thing it can do, is flip pixels, either from black to white or the other way around.

The network can't do much with areas that are pure black or pure white, so in effect it will only flip pixels at the border of the ink bubble in the middle. It's like it takes a pen and draws from the center in random directions to the sides, making patterns in the ink. Making that into an animated gif shows off the process nicely.
You can find the code as always on Github. You can experiment with which layer to activate and which channel in that layer. Activating a channel in the top layer doesn't seem to draw something represented that channel though. The other thing to play with is, are the values representing black and white in the network. I keep them very close together - the further apart they are, the more high frequencies sneak in.

Monday, January 9, 2017

Building Spotify's Song Radio in 100 lines of Python

Somebody said that when it comes to deep learning, it is still very early days, Like 1995 (I was going to lookup the quote, but I can't find - it was probably Churchill or Mark Twain). I disagree. The early days are gone, it is more like 1958. Fortran has just been invented. The early days of having to implement the mechanics of neural networks is akin to writing machine code in the Fifties. Platforms like Tensorflow, Theano and Keras let us focus on what we can do, rather than how.

Marconi the inventor (from Wikipedia)

Marconi is a demonstration of this. It shows how to build a clone of Spotify's Song Radio in a less than a hundred of Python using open source libraries and data easily scraped from the Internet. If you include the code that scrapes and preprocesses the data, it is almost 400 lines, but still.

Pandora was the first company to successfully launch a product that produced playlists based on a song. It did this by employing humans who would listen to songs and characterize each on 450 musical dimensions. It worked well. Once you have all songs mapped into this multi-dimensional space, finding similar songs is a mere matter of finding songs that occupy a similar position in this space.

Employing humans is expensive though (even underpaid musicians). So how do we automatically map songs into a multi-dimensional space? These days the answer has to be Machine Learning. Now you could build some sort of model that really understands music. Probably. It sounds really hard though. Most likely you'd need more than a hundred if not more than a thousand lines of code.

Marconi doesn't know anything about the actual music. It is trained on playlists. The underlying notion here is that similar songs will occur near each other in playlists. When you are building a playlist, you do this based on the mood you are in, or maybe the mood you want to create.

A good analogy here is Word2Vec. Word2Vec learns word similarities by feeding it sentences. After a lot of sentences, it knows that "coffee" and "espresso" are similar, because it notices that in a sentence where you use the one, you might as well expect the other. It even learns deeper relationships between words, for example that the words "king" and "man" have the same relationship as the words "queen" and "woman".

Fascinating stuff. Usefully, the Python library GenSim contains a great implementation of Word2Vec. So if we feed this playlists containing song ids, rather than sentences containing words, it will after a while learn relationships between songs. Suggesting a playlist based on a song becomes than again a straightforward nearest neighbor search.

The trick is collecting the data. How do we get our hands on a large set of playlists? To their great credit, Spotify has a wonderful API that lets you get info on songs, artists and playlists. It does not, however, grant access to a dump of (public) playlists, which would be the ideal input for this project.

The workaround I use, is to search for words in the title of playlists. We start with the word 'a'. This will return a thousand playlists containing the word 'a'. We store those. Then we count for all playlists scraped so far, how often we see any of the words in the titles of those playlists. We pick the word that appears most often that we haven't searched for. Rinse and repeat. So after 'a', you'll see 'the', 'of' etc. After a while 'greatest' and 'hits' appear.

It works and quickly returns a largish set of public playlists. It's not ideal in that it is hardly a natural way to sample playlists. For example, by searching for the most popular words, chances are you'll get the most popular playlists. The playlists returned also seemed very long (hundreds of songs), but maybe that's normal.

Next up: get the tracks that are actually in those playlists. Thanks to Spotify's API, that's quite simple. Just keep calling:

res = session.user_playlist_tracks(owner_id,  playlist_id,
    fields='items(track(id, name, artists(name, id), duration_ms)),next')
There's a bunch of parsing, boiler plate and handling timeouts etc to make it work in practice, but it's all fairly straight-forward.

Once we have the training data, building the model is also quite easy. I throw out playlists that are too long or that have only songs from one artist and the model is trained with a lowish number of dimensions. To make this accessible online, I import the song vectors into postgres. Recommending music then becomes as simple as this sql statement:

SELECT song_id, song_name, artist_name, 
       cube_distance(cube(vec), cube(%(mid_vec)s)) as distance
FROM song2vec ORDER BY distance
Where mid_vec is the vector representing the song that was used as an input, or the middle of a set of vectors if multiple songs were provided. 

How does it perform? Well, you can try for yourself, but I think it works pretty well!

There's a lot of room for more experiments here, I think. Building an artists only model would be a simple extension. Looking at the meta information of the songs and using it to build classifiers might also be interesting. Even more interesting would be to look at the actual music and see if there are features in the wave patterns that we can map onto the song vectors.

Tuesday, November 22, 2016

Calculating the set of universal numbers

Frederick II allegedly tried to find out what the universal language of humanity was, by depriving a bunch of young children from any language input. He expected them to start speaking Latin, Greek or maybe whatever Adam and Eve spoke in paradise. Instead they went insane.

Esperanto and a set of even less successful competitors tried to construct a universal language (the rumor that Klingon has more speakers than Esperanto is not true, but the fact that people believe it tells you something). Linguists have gone the other way. Languages evolve and seem to have common ancestors. By studying these evolutions, you can come up with theories as to what that common ancestor might have been. If you are very daring you can take this all the way to a supposed Proto-Human Language.

Trying to follow language evolution this far back is tricky and the approach has been widely criticized. We can only observe language evolution for the last 2000 years or so - applying rules learned from that on the 198 thousand years before is extrapolating by a large margin. For example, most languages have become simpler in the last 2000 years, but how did they become complex in the first place?

Algorithmically, there is a much simpler way to determine the most common language. Given a reasonable edit distance between two words, find for each word the median translation over all languages. The median translation is the translation that has the lowest squared edit distance to all the others.
I've done just that for the number one to ten using the phrasebooks from wikivoyage:

(Expanding this approach beyond the numbers one to ten might be doable, but is harder - words don't just change pronunciation, but also meaning. The Dutch word "tuin", the English "town" and the German "Zaun" all have the same root, but mean respectively garden, town and fence)

It is somewhat remarkable that this approach works. Wikivoyage uses a rather unscientific phonetic spelling based on how English speakers would pronounce a word. The edit distance I'm using is Levenshtein with some SoundEx thrown in - both approaches pre-date microprocessors. The languages Wikivoyage cover are whatever their volunteers found interesting enough to add and of those I can only use the ones that happen to parse. But it does look reasonable to me.

Is there a way to support this intuition? Why, yes there is! By aggregating the distances between the numbers on the language level, a language distance matrix can be calculated. This in turn we can use to calculate a language tree. Here it is:

There are some weird bits in the tree, but by and large you see the major language groupings appear as we know them from linguistics. The Slavic and Germanic groups look quite convincing as does the Latin group although the insertion there of Welsh and Irish seems debatable. Malagasy and Hawaiian get their own minigroup, which is quite interesting.

I think this is a promising approach. Using IPA for pronunciation, getting a more representative set of languages in (and maybe weigh them by number of speakers) and using a distance measure based on linguistic theories could all improve performance quite dramatically. If you want to play with the code so far, have a look at

Sunday, November 20, 2016

The Jordan-Egypt Ferry

If you search the web for information about the ferry between Jordan and Egypt, you probably end up in a state of slight confusion. That is because as things stand right now, it is confusing. This post describes our experience which might proof a useful extra datapoint for anybody wanting to go that way.

Let's start with the basics. As of October 2016 there doesn't seem to be a fast ferry between Aqaba and Nueweba, only a regular one. Our ticket said it would leave at 11PM though at the ferry terminal they said it would be midnight. It actually left around 1AM. The fare we were quoted was US$ 75, which isn't cheap - it might be US$ 70 if you buy directly from the ferry terminal. The crossing took about 3 hours.

I think you need a good reason to take the ferry. The alternative of going through Elat in Israel and then continue to Taba is probably cheaper, faster and more comfortable. If you do and you want to travel outside of the Sinai, you should pick up a visa for Egypt in Aqaba as they only issue Sinai ones in Taba and it seems to be a pain to convert these into full visas.

We wanted to continue to Sudan after Egypt. Sudan might refuse you entry if you've been to Israel and even though Israel kindly doesn't stamp your passport, the exit stamp from Jordan allegedly has caused trouble for others, so we took the ferry.

If you arrive early in Aqaba you could spend the day at one of the resorts in south beach, though the one we checked closed at 6PM which is still some time from the midnight departure. Alternatively there's a public beach not too far from the ferry terminal.

We duly arrive at 8:30PM. You can easily come by an hour later, though maybe not much more as they did seem to shut down stuff way before the ferry left. Before you can get your exit stamp, you need to pay the JOD 10 exit tax at a counter around the corner. There's a small restaurant and a duty free shop that sells half liter Heinekens for US$ 3. The currency exchange rates offered are rather terrible, so better give that a miss.

At boarding time the officer in charge called out the foreigners to let them embark first. This might seem unfair, but then again, we also pay a lot more than the local price. You can drop your heavier luggage in a container downstairs and head upstairs. The seats are quite comfy and there's a little (non alcoholic) bar serving snacks and soda.

There's also a passport processing facility. If you don't have an Egyptian visa yet, hand over your passport here before they let all the non-foreigners aboard. Again make sure you mention you need a full visa and not a free, Sinai only visa - unless you fly out from say Sharm El Sheik of course.

The lounge in the back seemed quieter and partly more comfortable as some arm rests can be pulled up to create more space for sleeping. It is also considerably more airconditioned so bring a sweater.

After a few hours of fitful sleep, an officer in white woke me asking where my passport was - he was holding it in his hands, so I pointed that out. He collected all other foreigners (5) and navigated us off the boat, by way of the dropped off luggage through a series of check points into a waiting area opposite a small office where the Egyptian visas are processed. The charge was US$ 25, payable in cash only. There didn't seem to be an ATM at the place.

Once outside of the ferry terminal you can either wait for the busses to start running from around 6AM or haggle with the collected taxi drivers. We paid 400 EGP (US$ 44) for four people to Sharm and made it there just before sunrise.

Friday, October 21, 2016


When I was 11 or so, I saw my first Times World Atlas. I was blown away. It was so much better in every way compared to the school atlas I was used to. The maps were huge, detailed and beautiful. The thematic maps and diagrams visualized everything from land use to economy to desertification.
Countries resized to reflect their populations

These days, with Google Maps and data visualizations of every type, you don't hear so much about atlases other than that people cut up old ones to sell the individual maps as wall decorations. There was one type of visualisation in the Times World Atlas that impressed me very much that I haven't seen online though: resized countries.

The idea is that you change the size of a country according to some statistic while trying to minimize the overall distortion of the map. As a kid, I wondered how you would calculate this - now I am pretty sure they just had an artist do their best. It is therefore with some pride that I am publishing an algorithm here to do something similar online: WorldSizer.

It's a two step process. The first, offline step takes the country data from the CIA Factbook and the shape information from Natural Earth. The CIA data is nice, because it is more evenly edited than, say, Wikipedia, but does need some massaging. Especially the country codes used by the CIA are seen nowhere else. One wonders if this has ever led to the wrong government being thrown over in a small Latin American country.

The shape files from Natural Earth have one or more shapes per country. The first step is to apply an area preserving projection to the shapes. Resizing a mercator map according to population would only cause confusion since it would still show India too small and Canada too big. The next step replace the points of the shapes with list of indexes into a global list of points. Here we also try to make sure that points on shared borders are only stored once. Since now points on borders are shared, we modify the shape of one country, it will also modify the shape of the other.

The online step takes the output of this and allows the user to pick a measure that the world will be "resized" to. For each country, we calculate the deflation or inflation factor needed to reflect the chosen measure. Then in an iterative process, we calculate for each shape how far the current area is off from the target area. If the shape is too small, all points of the shape are pushed away from the center, if the shape is too big, they are pushed towards the center.

For islands, this is enough. For countries that share borders, a tug of war process plays out. Especially in a region where all shapes need to grow or shrink, it takes a while for things to stabilize. Each shape also tries to maintain its original shape - for example, if you look at the map for population, you see India grow and sort of fall out of Asia as a blob before regaining its normal shape (only much bigger).

The shared borders and the country shape maintenance keep continents mostly in shape, too. But without extra care, islands will just drift into the mainland or each other. We stop this from happening by calculating "bridges". For each island, we look for a larger land mass nearby and anchor it to it. This keeps Ireland next to Great Britain, Great Britain next to France and Sri Lanka close to India.

Finally, we create some bridges by hand. Neither Spain nor Morocco are islands, but we'd still don't want them to crash into each other. Similarly, we attach Yemen to Somalia, Australia to New Zealand and Sweden to Denmark. In some situations this still leads to overlap. If you use proved oil reserves as a measure, the Middle East of course increases in size by a lot. But it has nowhere to go, so it pushes into the Mediterranean and squashes Syria into Greece.
More sad: scaled to number of people living with AIDS
One could add more bridges to force more map preservation, but this does come at a cost. The more forced the map, the less freedom the model has to preserve the shapes and get to the target sizes and it starts to behave more and more like a water balloon where you squeeze on one side and it just bulges out on the other.

As usual the source code is on GitHub. It should be fairly straight forward to use the worldsizer.js on another website with different data.

Wednesday, October 12, 2016

Styled Museums

The Prisma app became an overnight hit after it launched because of its great photo filters. Rather than just applying some face feature transformation or adjusting the colors, it rerenders a picture in the style of a famous artwork. The results are remarkable and quite recognizable. And it is no secret how this is done. The basic algorithm is described in a paper published a year ago called "A Neural Algorithm of Artistic Style"

Besides the scientific paper and the startup executing on it, there's also an open source implementation of the algorithm. I played around with it a bit and it also works well, although it seems roughly 100x slower than Prisma. It got me thinking, what happens if you use this to re-render pictures of museums in the style of their most famous work? That way you see what the building is like and at the same time what to expect when you go in.

Styled Museums does exactly that. It has the top 100 museums (by their wikipedia page view count) and their most popular works (same measure) and shows them on a world map. It uses the wiki_import frame work to get the data. You can click around and find your favorite museum and see what happens to it.

I think the fact that artistic style transfer is available as a scientific paper, a startup and an open source implementation is indicative of a wider trend. We now live in a world where we have three forms of innovation. The traditional scientific method where publicly financed institutions produce papers describing new ideas; the startup world that funnels large amounts of private money into ideas to see if they come to commercial fruition; and finally the open source world where individuals build something and share it with the world to build up their public profile.

These three engines of innovation aren't silos. Google started out as a scientific experiment, became a startup and a commercial success and now publishes scientific papers and open sources part of their technology. Github is a startup that is not only based on an open source project, but also hosts other open source projects. Twitter open sourced their data processing engine, which now helps academics keep up with what their peers in Silicon Valley are up to.

It doesn't always seem fair. The founders of Google became billionaires with technology developed while being employed by Stanford University, while the inventor of the world wide web works for a non profit. For years Werner Koch maintained the GnuPG email encryption package on the salary of a postman, while the founder of Hotmail is worth more than a 100 million dollars.

The 1980 Bayh-Dole Act in the US explains some of the difference between there and Europe. It allows universities and companies to claim patent rights on research undertaken with federal funding. On one level this doesn't seem right - if the government paid for research, shouldn't the patents end up with the government too? Then again, it turns out that the government isn't particularly good at doing interesting stuff with those patents - startups do much better.

And so we end up with Styled Museums. Inspired by Prisma, a VC funded company, I found the original paper which is based on research paid for mostly by the University of Tübingen, which in turn led me to an Open Source implementation. You can find the code used to get to Styled Museums (of interest is mostly the matching of museums and paintings) on Github, of course.