Today's project is a small Python notebook to recommend movies. I know, I know, there are a million of those out there, but this one is special: it is trained not on user ratings, but on the outgoing links of the movies' Wikipedia articles.
Why is that good? Two reasons. The first is that it uses more diverse data. When you build a recommender system on user ratings alone, you get an Amazon-like system: people who liked this movie also liked that movie. But if you're not using information like the year of the movie, the genre or the director, you are throwing away a lot of relevant features that are easy to get.
The second reason is that when you start a new project, you probably don't have enough user ratings to recommend anything from the get-go. On the other hand, for many knowledge areas it is easy to extract the relevant Wikipedia pages.
The outgoing links of a Wikipedia page make for a good signature, since similar pages will often link to the same pages. Estimating the similarity between two pages by calculating the Jaccard distance between their sets of outgoing links would probably already work quite well. I went a little further and trained an embedding layer over the outgoing links.
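To make the Jaccard idea concrete, here is a minimal sketch of that baseline, not the embedding model the notebook actually trains. The movie titles, link names and the helper functions `jaccard_similarity` and `most_similar` are all made up for illustration; real link sets would come from parsing the Wikipedia articles. Jaccard similarity is the size of the intersection of two sets divided by the size of their union, and the Jaccard distance is one minus that.

```python
def jaccard_similarity(links_a, links_b):
    """Share of outgoing links two pages have in common (0.0 to 1.0)."""
    a, b = set(links_a), set(links_b)
    union = a | b
    if not union:
        # Two pages without any links: treat them as not similar.
        return 0.0
    return len(a & b) / len(union)


def most_similar(title, outgoing_links, top_n=3):
    """Rank all other pages by Jaccard similarity to the given page."""
    target = outgoing_links[title]
    scores = [
        (other, jaccard_similarity(target, links))
        for other, links in outgoing_links.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy data: page title -> set of outgoing link targets.
outgoing_links = {
    "Movie A": {"Director X", "Science fiction film", "1999 in film", "Actor P"},
    "Movie B": {"Director X", "Science fiction film", "2003 in film", "Actor P"},
    "Movie C": {"Director Y", "Romantic comedy", "1999 in film"},
}

for other, score in most_similar("Movie A", outgoing_links):
    print(f"{other}: {score:.2f}")
```

In this toy data, Movie A and Movie B share three of five distinct links, so B ranks well above Movie C, which only shares the year. A learned embedding can go beyond this by picking up on links that tend to occur together, rather than only counting exact overlaps.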
The result is not Netflix quality, but it works reasonably well. As an extra bonus, I projected the resulting movies onto a two-dimensional plane, rendering their movie posters as markers. It's fun to explore movies that way. Go play with it.