Learning to Draw: Generating Icons and Hieroglyphs

Tuesday, January 17, 2017

In this blog post we'll explore techniques for machine-drawn icons. We'll start with a brute-force approach before moving on to machine learning, where we'll teach a recurrent neural network to plot icons. Finally, we'll use the same code to generate pseudo-hieroglyphs by training the network on a set of actual hieroglyphs. With the addition of a little composition and a little coloring, we'll end up with this:


Last week's post, "Deep Ink", explored how we can simulate computers playing with blobs of ink. But even if humans see things in these weird drawings, neural networks don't: if you take the output of Deep Ink and feed it back into something like Google's Inception, it offers no opinions.

The simplest thing I could come up with to generate icons was brute force. If you take a grid of 4x4 pixels, there are 2^16 = 65536 possible black-and-white images. Feed all of them into an image classifier and see if anything gets labeled. It does. But 16 pixels don't offer a lot of space for expression, so the results are somewhat abstract if you want to be positive, or show weaknesses in the image classifier if you want to be negative. Here are some examples:


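The search loop itself is tiny. Here's a minimal sketch, where classify() is a hypothetical stand-in for a pretrained classifier such as Inception, assumed to return a (label, score) pair:

```python
import numpy as np
from PIL import Image

def bitmap(n, size=4):
    """Unpack the bits of n into a size x size black-and-white image array."""
    bits = [(n >> i) & 1 for i in range(size * size)]
    return np.array(bits, dtype=np.uint8).reshape(size, size) * 255

# classify(image) -> (label, score) is assumed to wrap a pretrained model.
for n in range(2 ** 16):
    image = Image.fromarray(bitmap(n)).convert('RGB')
    image = image.resize((299, 299), Image.NEAREST)  # Inception v3 input size
    label, score = classify(image)
    if score > 0.5:  # arbitrary confidence cut-off
        print(n, label)
```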
We can do a little better by going to 5x5. To give the model a little more to play with, we can add a permanent border. This increases the size to 7x7, but we'll only flip the inner 25 pixels. Unfortunately, the amount of work going from 4x4 to 5x5 increases by a factor of 2^9 = 512. Trying all 4x4 icons takes about an hour on my laptop; exploring the 5x5 space takes weeks:


These are better and easier to understand. In some cases they even somewhat explain what the network was trying to see in the 4x4 images.
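For reference, the border trick is just padding. A sketch, assuming the border pixels are always on:

```python
import numpy as np

def bordered_bitmap(n, size=5):
    """The inner size x size pixels vary; pad to 7x7 with a solid border."""
    inner = np.array([(n >> i) & 1 for i in range(size * size)],
                     dtype=np.uint8).reshape(size, size)
    return np.pad(inner, 1, constant_values=1) * 255

# 2 ** 25 candidates instead of 2 ** 16: 512 times as much work.
```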

They say that if brute force doesn't work, you're just not using enough. In this case, though, there might not be enough around. 8x8 is tiny for an icon, but it would still take my laptop something like 3 times the age of the universe to try all 2^64 possibilities. Machine learning to the rescue. Recurrent networks are a popular choice for generating sequences, for example fake Shakespeare, recipes and Irish folk music, so why not icons?
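As a sanity check on that age-of-the-universe claim, assuming the same throughput as the 4x4 run (2^16 images in about an hour):

```python
rate = 2 ** 16 / 3600                  # images per second on my laptop
years = 2 ** 64 / rate / (3600 * 24 * 365)
print(f'{years:.1e}')                  # ~3.2e10 years; the universe is ~1.4e10 years old
```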

I found a set of "free" icons at https://icons8.com/. After deduping, that leaves about 4500 icons. Downsample them to 8x8 and we can easily encode each icon as the sequence of pixels to be turned on. An RNN can learn to draw these quite quickly. Add a little coloring for variety, lay the results out on a 15x10 grid, and you get this sort of output:


These are pretty nice. They look like monsters from an 8-bit video game. The network learns a sense of blobbiness that matches the input icons. There's also a sense of symmetry, and some learned dithering for when just black and white doesn't cut it. In short, it learns the shapes you get when you downsample icons to 8x8.
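The encoding can be as simple as listing the indices of the black pixels in scan order, closed off with a stop token. A sketch under that assumption (the token values are mine, not the original code):

```python
import numpy as np

END = 64  # vocabulary: 64 pixel positions plus one end-of-icon token

def encode(icon):
    """icon: 8x8 array of 0/1 -> token sequence in scan order."""
    ys, xs = np.nonzero(icon)
    return [int(y) * 8 + int(x) for y, x in zip(ys, xs)] + [END]

def decode(tokens):
    icon = np.zeros((8, 8), dtype=np.uint8)
    for t in tokens:
        if t == END:
            break
        icon[t // 8, t % 8] = 1
    return icon
```

Sequences like these go into a character-level RNN unchanged; the vocabulary is just 65 tokens instead of letters.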

As cute as these throwbacks to the '80s are, 64 pixels still isn't a lot for an icon, especially since the input isn't a stream of optimized 8x8 icons, but rather downsampled 32x32 icons (the lowest resolution the icons8 pack comes in).

We can't use the same encoding for 32x32 icons, though. With a training set of 4500 icons, a vocabulary of 64 tokens for the 8x8 icons is OK: each pixel position occurs on average 70 times, so the network has a chance to learn how they relate to each other. On a 32x32 grid we'd have a vocabulary of 1024, and the average pixel would only be seen 4 times, which just isn't enough to learn from.

We could run-length encode: rather than storing the absolute position of the next black pixel, store its offset from the previous one. This works, but it makes it hard for the network to keep track of where it is in the icon. An encoding that is easier to learn specifies, for each scanline, which pixels are turned on, followed by a newline token. This works better:
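Concretely, the scanline encoding might look something like this sketch (again, the token values are my assumptions):

```python
import numpy as np

NEWLINE = 32  # vocabulary: 32 column positions plus one newline token

def encode_scanlines(icon):
    """icon: 32x32 array of 0/1 -> x-positions of black pixels, row by row."""
    tokens = []
    for row in icon:
        tokens.extend(int(x) for x in np.nonzero(row)[0])
        tokens.append(NEWLINE)  # the newline tells the network which row it is on
    return tokens

def decode_scanlines(tokens):
    icon = np.zeros((32, 32), dtype=np.uint8)
    y = 0
    for t in tokens:
        if t == NEWLINE:
            y += 1
            if y == 32:
                break
        else:
            icon[y, t] = 1
    return icon
```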

The network does seem to learn the basic shapes it sees, and we recognize some common patterns, like the document and the circle. I showed these to somebody and their first reaction was "are these hieroglyphs?" They do look a bit like hieroglyphs, of course, which raises the question: what happens if we train on actual hieroglyphs?

As so often with these sorts of experiments, the hard part is getting the data. Images of hieroglyphs are easily found on the Internet; getting them as clean 32x32 pixel bitmaps is a different story, though. I ended up reverse engineering a seemingly abandoned icon-rendering app for the Mac that I found on Google Code (itself abandoned by Google). This gave me a training set of 2500 hieroglyphs.

The renderer responsible for the image at the beginning of this post has some specific modifications to make its output more hieroglyphy: icons appear underneath each other, unless two subsequent icons fit next to each other. Also, if the middle pixel of an icon is not set and the empty area it belongs to doesn't connect to any of the sides, that area gets filled with yellow; the ancient Egyptians seem to have done this.
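The enclosed-area test can be implemented with a flood fill from the middle pixel. Here's a sketch, assuming 32x32 binary icons:

```python
from collections import deque
import numpy as np

def yellow_pixels(icon):
    """Return the empty region containing the middle pixel if it is fully
    enclosed, or None if the middle pixel is set or the region touches a side."""
    h, w = icon.shape
    start = (h // 2, w // 2)
    if icon[start]:
        return None
    seen, queue = {start}, deque([start])
    while queue:
        y, x = queue.popleft()
        if y in (0, h - 1) or x in (0, w - 1):
            return None  # the region leaks out to a side: not enclosed
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not icon[ny, nx] and (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen  # these pixels get painted yellow
```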

Alternatively, we can run the image classifier over the generated hieroglyphs:


You can see these as mediocre labelings by a neural network that was trained for something else. Or as hieroglyphs from an alternate history where the ancient Egyptians developed modern technology and needed hieroglyphs for "submarine", "digital clock" and "traffic light".

As always, you can find the code on GitHub.
