Douwe Osinga's Blog
Lies, statistics and the ultra rich (Douwe Osinga; published 2024-01-20, updated 2024-01-23)
<p>Every year during the gathering of world leaders in Davos, Oxfam comes out with <a href="https://www.oxfam.org/en/press-releases/wealth-five-richest-men-doubles-2020-five-billion-people-made-poorer-decade-division">a report</a> about how the rich are getting richer and the poor are getting poorer. It’s good timing, and the press repeats the conclusion in capitals. This year the claim was: <a href="https://www.theguardian.com/inequality/2024/jan/15/worlds-five-richest-men-double-their-money-as-poorest-get-poorer">World’s five richest men double their money as poorest get poorer</a>.<br /><br /></p><p>I decided to read it. The sources it is based on, <a href="https://www.forbes.com/billionaires/">the Forbes Billionaires list</a> and <a href="https://www.credit-suisse.com/about-us/en/reports-research/global-wealth-report.html">Credit Suisse’s Global Wealth Report</a>, are impeccable enough. Oxfam’s conclusions less so.</p><p>Start with “World’s five richest men double their money”. Taken literally, that is true, but if you want to look at inequality, you should compare today’s richest men with the richest men of 2020. Of course today’s richest men did well; that’s how they ended up in the top 5.</p><p>Secondly, the Global Wealth Report for 2023 isn’t out yet (it is expected in August), so the data on the world’s poorest is put together in a rather convoluted way. Finally, using 2020 as a baseline seems arbitrary and smells like cherry-picking.<br /><br />What’s the founder of <a href="https://www.neptyne.com">a spreadsheet company</a> to do? Indeed: write some code to get the data into a spreadsheet and have a look. 
If we want to see whether the richest 5 are increasing their share, we should probably look at just that: the fraction of all wealth in the world owned by the richest 5 people. From 2000 (the earliest year I could get data for) it looks like this:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEhhjFtFZ-p4QnpLFEgwnOphvSInl2g-1ACQBjKpU7FoDbFg9x9QWGsTVwPYuPVyFE-JLvs_uOYibuy5bYX6MwNhNaByzTnesSpfaHYebTiIqx9oUOk81GRYj7D33DWcbinRAUo6xjxUGB8xAkJqalXdk6t8VZVIo6I3RgwDNy_W7TKCeAhLVVxZGFJ4d201" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="780" data-original-width="1284" height="388" src="https://blogger.googleusercontent.com/img/a/AVvXsEhhjFtFZ-p4QnpLFEgwnOphvSInl2g-1ACQBjKpU7FoDbFg9x9QWGsTVwPYuPVyFE-JLvs_uOYibuy5bYX6MwNhNaByzTnesSpfaHYebTiIqx9oUOk81GRYj7D33DWcbinRAUo6xjxUGB8xAkJqalXdk6t8VZVIo6I3RgwDNy_W7TKCeAhLVVxZGFJ4d201=w640-h388" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><span face="sohne, 'Helvetica Neue', Helvetica, Arial, sans-serif" style="background-color: white; color: #6b6b6b; font-size: 14px; white-space-collapse: break-spaces;">Percentage of wealth held by the richest 10 or 5 people</span></div><br />2021 and 2022 do seem historically high, but the lines are mostly flat-ish. And this doesn’t include 2023, a year in which the richest 5 people lost 10% of their wealth. Where’s the headline “Record number of billionaires no longer able to afford private planes”?<p></p><p>In conclusion, there doesn’t seem to be a lot of support for the suggestion that the super-rich command an ever-increasing fraction of total wealth. You can still think they are too rich, but it is not something new.<br /><br />Here’s the spreadsheet with <a href="https://docs.google.com/spreadsheets/d/1qOgW-YDAdcD1F5ly-PILFWjTr6fd_5rqu6beP1zas-o/edit#gid=0">the underlying data</a>. 
If you have the <a href="https://workspace.google.com/marketplace/app/neptyne_python_for_sheets/891309878867">Neptyne Python Add-on</a> installed, you can open the code editor and see how it imports data from <a href="https://workspace.google.com/marketplace/app/neptyne_python_for_sheets/891309878867">Wikipedia’s The World’s Billionaires</a>. There’s another function that interpolates the wealth data, but apart from that it is all straight-up spreadsheet.</p><p><br /></p>
Automate Research with a Neptyne Spreadsheet and OpenAI (Douwe Osinga; published 2023-07-19, updated 2023-12-13)
<p> <span face="Arial, sans-serif" style="font-size: 11pt; white-space-collapse: preserve;">While ChatGPT has definitely grabbed the headlines when it comes to the AI revolution, using LLMs to automate all kinds of tasks has yielded some interesting results too. A number of frameworks have sprung up, like LangChain, AutoGPT and BabyAGI. They allow users to go beyond the simple chat interface and connect all kinds of components to GPT-3.5/4, like web search, memory stores or even code writing. 
Very powerful tools, but not the easiest things to get started with.</span></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRffC6HRjLYeJJi3OuDq_MdwIsuqkUVdIq2FBcsKIEtR5Daa-HN1H4hwgNYYwK_55aDAIMRJNslTs98_DEU-NB48CZN-pzDUBNrmYP9DIZ8AxN3SZDS5e9py304noOZnWI3_eQ70SraZMT1S07NWXgMqLTgtV0ptQLRS7GvAuzgkdzk0B4n-6v4PXrv3OZ/s1472/AutomaticResearch.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1024" data-original-width="1472" height="446" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRffC6HRjLYeJJi3OuDq_MdwIsuqkUVdIq2FBcsKIEtR5Daa-HN1H4hwgNYYwK_55aDAIMRJNslTs98_DEU-NB48CZN-pzDUBNrmYP9DIZ8AxN3SZDS5e9py304noOZnWI3_eQ70SraZMT1S07NWXgMqLTgtV0ptQLRS7GvAuzgkdzk0B4n-6v4PXrv3OZ/w640-h446/AutomaticResearch.png" width="640" /></a></div><br /><span id="docs-internal-guid-7ea48571-7fff-83ca-6c00-7a43297a822f"><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">In this post we’ll show how you can achieve similar results in a Neptyne spreadsheet without having to worry about deployment. We’re going to create a spreadsheet that autonomously does research. It’s fully customizable and should work for any type of task, but in our current example we’ll focus on researching AI startup funding news. 
It looks like this:</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"><span style="border: none; display: inline-block; height: 207px; overflow: hidden; width: 624px;"><img height="207" src="https://lh3.googleusercontent.com/3r3blbJ0IRBR9mEph2K3r__SlULhR9xWcZ4fIzWcNutka_TttnsNDrasJdBaudFaPGoZezED84U__pY7r1oO1dPXk2dmmg1-FY_OZUH41oHaVJ7N334WqhCByIOTol8OE-mPvsGWQATFWKGQWRvMWjA" style="margin-left: 0px; margin-top: 0px;" width="624" /></span></span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">You specify a (news) search query in C2 and the freshness in G2. The column headers E4 to H4 are also configurable and determine the information we want to extract from the articles we find.</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">After you hit the “Go” button, the automatic research starts by sending the search query to Bing’s news search. For each result it calls out to an external service to render the page, running JavaScript and everything. It then extracts the interesting content from that news article and feeds it into ChatGPT, asking that service to find information for each of the specified columns (in this case Company, Amount, CEO and Investors). 
It will then add a row to the spreadsheet with the information it found.</span></p><br /><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 6pt; margin-top: 18pt;"><span face="Arial, sans-serif" style="font-size: 16pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; font-weight: 400; vertical-align: baseline; white-space-collapse: preserve;">Get your own research bot</span></h2><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">Ready to create your own research AI bot? The bot combines three different services, so you will need to sign up for those and, of course, for a Neptyne account.</span></p><br /><ul style="margin-bottom: 0px; margin-top: 0px; padding-inline-start: 48px;"><li aria-level="1" dir="ltr" style="font-family: Arial, sans-serif; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://www.microsoft.com/en-us/bing/apis/bing-news-search-api" style="text-decoration-line: none;"><span style="color: #1155cc; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-decoration-line: underline; text-decoration-skip-ink: none; text-wrap: wrap; vertical-align: baseline;">Bing News Search</span></a><span style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-wrap: wrap; vertical-align: baseline;">. We use this to get a list of news articles based on the query you entered into C2. 
If you have a Microsoft Azure account, setting this up is fairly straightforward. It comes with a free tier of 1000 searches per month. </span></p></li><li aria-level="1" dir="ltr" style="font-family: Arial, sans-serif; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://phantomjscloud.com/" style="text-decoration-line: none;"><span style="color: #1155cc; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-decoration-line: underline; text-decoration-skip-ink: none; text-wrap: wrap; vertical-align: baseline;">PhantomJsCloud</span></a><span style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-wrap: wrap; vertical-align: baseline;">. This service takes a URL and renders the page as HTML in the cloud. Just fetching the raw HTML of a document is no longer enough these days, since many pages only fill in their content once their JavaScript has run. This step is actually the slowest in our pipeline: rendering a modern web page can take time. 
Signing up is free and the free tier gets you 500 page loads per day.</span></p></li><li aria-level="1" dir="ltr" style="font-family: Arial, sans-serif; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; list-style-type: disc; vertical-align: baseline; white-space: pre;"><p dir="ltr" role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://openai.com/blog/openai-api" style="text-decoration-line: none;"><span style="color: #1155cc; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-decoration-line: underline; text-decoration-skip-ink: none; text-wrap: wrap; vertical-align: baseline;">OpenAI</span></a><span style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-wrap: wrap; vertical-align: baseline;">. The current code uses GPT-3.5. You can switch to GPT-4 if you feel it is missing things, but it will be slower and more expensive. </span></p></li></ul><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">Sign up for all three services and note the API key for each. 
Now navigate to: </span><a href="https://app.neptyne.com/-/tgjqzmjbfi" style="text-decoration-line: none;"><span face="Arial, sans-serif" style="color: #1155cc; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-decoration-line: underline; text-decoration-skip-ink: none; vertical-align: baseline; white-space-collapse: preserve;">https://app.neptyne.com/-/tgjqzmjbfi</span></a><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> and make a copy. When you hit the Go button, the system will ask you for the keys, and once you’ve entered those, it will start running. You can interrupt the current run by hitting the button again, but stopping takes a little while because the current task is finished first.</span></p><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 6pt; margin-top: 18pt;"><span face="Arial, sans-serif" style="font-size: 16pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; font-weight: 400; vertical-align: baseline; white-space-collapse: preserve;">How does this work?</span></h2><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">The main code is called from the Go button and lives in the run() method. 
Here’s the slightly simplified code:</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="color: #1967d2; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">for</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">item</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #1967d2; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">in</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">news_search(C2,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; 
vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">freshness=G2.value)[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'value'</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">]:</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">article</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; 
font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">fetch_article(item[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'url'</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">])</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">title,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">summary</span><span style="font-family: "Roboto Mono", monospace; 
font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">extract_content(article)</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">keywords</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">=</span><span style="font-family: 
"Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">get_keywords(summary,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">E4:H4)</span></p><br /><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Courier New", monospace; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">news_search </span><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">returns a list of news articles. 
</span><span style="font-family: 'Courier New', monospace; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">fetch_article</span><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> calls PhantomJsCloud to get a rendered version of the article. </span><span style="font-family: 'Courier New', monospace; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">extract_content</span><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> uses the </span><a href="https://github.com/buriy/python-readability" style="text-decoration-line: none;"><span face="Arial, sans-serif" style="color: #1155cc; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; text-decoration-line: underline; text-decoration-skip-ink: none; vertical-align: baseline; white-space-collapse: preserve;">python-readability library</span></a><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> to strip everything that is not content from the HTML. 
Finally </span><span style="font-family: "Courier New", monospace; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">get_keywords</span><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> uses OpenAI to extract the keywords from the article. Most of these are pretty straightforward but let’s have a closer look at </span><span style="font-family: "Courier New", monospace; font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">get_keywords</span><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">:</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="color: #1967d2; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">def</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">get_keywords(body,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; 
font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">keywords):</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"Given this article:\n\n"</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; 
margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">body</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+</span><span style="font-family: "Roboto Mono", monospace; 
font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"\n\n"</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"Generate a dictionary of key/value pairs in json with 
keys:"</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">keywords</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'"'</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: 
preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">kw</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'"'</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #1967d2; font-family: "Roboto Mono", 
monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">for</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">kw</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #1967d2; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">in</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">keywords]</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto 
Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">", "</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">.join(keywords)</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt</span><span 
style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">+=</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"\nLeave out what cannot be found"</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #1967d2; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">return</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; 
white-space-collapse: preserve;">call_open_ai([{</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"role"</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">:</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"user"</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">"content"</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: 
preserve;">:</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">prompt}])</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">All it does is build a prompt asking for a JSON document with the key/value pairs for the columns we specify in the spreadsheet. The AI magic does the rest.</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">There’s a bit of data massaging going on of course, but once we have the data from the article and the keywords, we just insert a new row into the spreadsheet with:</span></p><br /><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">B5:H.insert_row(</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; 
font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #c5221f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">0</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">,</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">[item[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'datePublished'</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">],</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; 
vertical-align: baseline; white-space-collapse: preserve;">item[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'description'</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">],</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">item[</span><span style="color: #188038; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">'url'</span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">],</span><span style="font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"> </span><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; 
white-space-collapse: preserve;">*keywords]</span></p><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="color: #37474f; font-family: "Roboto Mono", monospace; font-size: 9pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">)</span></p><br /><h2 dir="ltr" style="line-height: 1.38; margin-bottom: 6pt; margin-top: 18pt;"><span face="Arial, sans-serif" style="font-size: 16pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; font-weight: 400; vertical-align: baseline; white-space-collapse: preserve;">Conclusion</span></h2><p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;">The integration of Neptyne spreadsheets with OpenAI opens a vast world of possibilities, from autonomously conducting research to data extraction and analysis. This article provided a detailed walkthrough of how to use the Neptyne spreadsheet in conjunction with various services like Bing News Search, PhantomJsCloud, and OpenAI to create an AI research bot. The bot seamlessly retrieves and parses news articles based on user-specific queries, applying AI to identify key data points from each source. 
By combining these different services and leveraging their unique capabilities, users can achieve a more automated, efficient, and effective research process.</span></p><div><span face="Arial, sans-serif" style="font-size: 11pt; font-variant-alternates: normal; font-variant-east-asian: normal; font-variant-numeric: normal; vertical-align: baseline; white-space-collapse: preserve;"><br /></span></div></span>Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-33455939947348718152022-11-10T09:26:00.000-08:002022-11-10T09:26:46.065-08:00Neptyne: Making Spreadsheets Programmable<p>So yeah, it's been kinda quiet around here - famous last words on any blog, but I thought I should at least post something about what I've been up to. Programmable Spreadsheets. <a href="https://neptyne.com">Neptyne</a>!</p><p>We tried this whole stealth mode thing. I'm not sure, maybe just developing out in the open is better. But here we are, we're ready to talk about what we are up to. </p><p>A<span style="font-family: Arial; font-size: 11pt; white-space: pre-wrap;">fter a bit over 1,000 pull requests, we’re ready to start showing a few things. 
So here it is, Neptyne: the programmable spreadsheet.</span></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVBK-FCS8sZkbK-gERnGvSmlgJr_4vbmkfNEwWGyFGVFzIYJrFfgK9kyiapuhKl_uobQDRVBt-kXHye36ldeNgolD-L9JEqhtfarVRTWdIqXMhHv0BdZZbrNI8v-twEhimSIgnBdJTjLOnpEtp3lDYx96BRNM2GZzpcDF_wRsrH4tus4kEY6NHq9fWGg/s2902/Neptyne%20Main.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1748" data-original-width="2902" height="386" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVBK-FCS8sZkbK-gERnGvSmlgJr_4vbmkfNEwWGyFGVFzIYJrFfgK9kyiapuhKl_uobQDRVBt-kXHye36ldeNgolD-L9JEqhtfarVRTWdIqXMhHv0BdZZbrNI8v-twEhimSIgnBdJTjLOnpEtp3lDYx96BRNM2GZzpcDF_wRsrH4tus4kEY6NHq9fWGg/w640-h386/Neptyne%20Main.jpg" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;">Neptyne Screenshot</div><p>So this is what it looks like. You have a spreadsheet on the left, a code pane on the upper right and down right a REPL. Write code in the code pane, use it directly from the spreadsheet and type commands into the REPL to try stuff. The code is all Python, but we've extended Python to be compatible with Excel.</p><p>So for example, you can just loop through a cell range with:</p><p><span style="font-family: courier;">for row in A1:C10:<br /> for cell in row:<br /> if cell > 5:<br /> cell.set_background_color(255, 10, 10)</span></p><span style="font-family: Arial; font-size: 11pt; white-space: pre-wrap;">To make all cells with a value greater than 5 show up red.
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwfFrzL3693PxrDcpNAoQS4GZjlZA0C2QFSIcTfMvcql8yjy-Qm609Vr3-gI3mw_1mc2WpyJPWCISpdlk4I5uatBRHsIh4RreMGHoAsBgBjp_Rcqs72SsmhNTdIIddnjy9CmqEE4Zw5lL_wRNKL3iQFwYptn9BViUiNWol0zRm-HtxpqzfnwiLW-0P5A/s2891/Python%20Screenshot.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1797" data-original-width="2891" height="398" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwfFrzL3693PxrDcpNAoQS4GZjlZA0C2QFSIcTfMvcql8yjy-Qm609Vr3-gI3mw_1mc2WpyJPWCISpdlk4I5uatBRHsIh4RreMGHoAsBgBjp_Rcqs72SsmhNTdIIddnjy9CmqEE4Zw5lL_wRNKL3iQFwYptn9BViUiNWol0zRm-HtxpqzfnwiLW-0P5A/w640-h398/Python%20Screenshot.jpg" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;">Python screenshot</div><br />Here's another example. We have a simple import function that fetches all the countries starting with S from Wikipedia. We dump that into the spreadsheet by just assigning the results to A1 (it will spill over to create a table), and then a neat little function annotates each row with the emoji flag. Finally, a little widget shows the countries on a map and shows the flags on mouseover. And all in fewer than 20 lines of code.</span><p></p><p>There's a lot more neat stuff going on. 
Join the waitlist: <a href="https://neptyne.com/waitlist-add">https://neptyne.com/waitlist-add</a></p>Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-44493610435502744132022-04-01T07:21:00.004-07:002024-01-23T03:31:25.977-08:00The Clever Bit and the Gold Rush<div style="background-color: white; box-sizing: inherit; color: #292929; letter-spacing: -0.003em; line-height: 32px; margin: 2em 0px -0.46em; text-align: left; word-break: break-word;"><span style="font-family: inherit;">In my <a class="au kc" href="https://dosinga.medium.com/whats-the-clever-bit-feaad0537e44" rel="noopener" style="-webkit-tap-highlight-color: transparent; box-sizing: inherit;">last post</a> I lamented that fewer tech companies seem to have a clever bit, and that more and more it is just about applying some known tricks to existing industries while driving software’s mission <a class="au kc" href="https://mattermark.com/software-startups-eating-the-world-then-and-now/" rel="noopener ugc nofollow" style="-webkit-tap-highlight-color: transparent; box-sizing: inherit;" target="_blank">to eat the world</a> forward. And that maybe this is just the way things go: just as you once had to be able to build a steam engine to build a factory but later could simply buy one, you no longer need to be really good at tech to build a tech company. 
When all companies are tech, none of them are.</span></div><figure class="ke kf kg kh ga ki fo fp paragraph-image" style="background-color: white; box-sizing: inherit; clear: both; color: rgba(0, 0, 0, 0.8); margin: 56px auto 0px;"><div class="kj kk ct kl ea km" role="button" style="box-sizing: inherit; cursor: zoom-in; position: relative; transition: transform 300ms cubic-bezier(0.2, 0, 0.2, 1) 0s; width: 692px; z-index: auto;" tabindex="0"><div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/3/34/400-oz-Gold-Bars-AB-01.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="533" data-original-width="800" height="426" src="https://upload.wikimedia.org/wikipedia/commons/3/34/400-oz-Gold-Bars-AB-01.jpg" width="640" /></a></div><div class="fo fp kd" style="box-sizing: inherit; margin-left: auto; margin-right: auto; max-width: 1500px;"><br /></div></div><figcaption class="kp ec fq fo fp kq kr bo b bp bq br" data-selectable-paragraph="" style="box-sizing: inherit; color: #757575; font-size: 14px; line-height: 20px; margin-left: auto; margin-right: auto; margin-top: 10px; max-width: 728px; text-align: center;"><span style="font-family: inherit;">Gold bars (source: <a class="au kc" href="https://upload.wikimedia.org/wikipedia/commons/3/34/400-oz-Gold-Bars-AB-01.jpg" rel="noopener ugc nofollow" style="-webkit-tap-highlight-color: transparent; box-sizing: inherit;" target="_blank">Wikimedia</a>)</span></figcaption></figure><p class="pw-post-body-paragraph je jf ih jg b jh ji jj jk jl jm jn jo jp jq jr js jt ju jv jw jx jy jz ka kb ia gi" data-selectable-paragraph="" id="04d6" style="background-color: white; box-sizing: inherit; color: #292929; letter-spacing: -0.003em; line-height: 32px; margin: 2em 0px -0.46em; word-break: break-word;"><span style="font-family: inherit;">But I think there’s more to where we are in our economic development. 
It occurred to me that historically economic progress has been equal parts clever bits and gold rushes. The archetypal gold rush is probably the <a class="au kc" href="https://en.wikipedia.org/wiki/California_Gold_Rush" rel="noopener ugc nofollow" style="-webkit-tap-highlight-color: transparent; box-sizing: inherit;" target="_blank">California, well, eh, gold rush</a>. Gold is discovered and it doesn’t matter who is more clever or better at gold panning; what matters is who gets there first and who gets to stake a claim first.</span></p><p class="pw-post-body-paragraph je jf ih jg b jh ji jj jk jl jm jn jo jp jq jr js jt ju jv jw jx jy jz ka kb ia gi" data-selectable-paragraph="" id="04d6" style="background-color: white; box-sizing: inherit; color: #292929; letter-spacing: -0.003em; line-height: 32px; margin: 2em 0px -0.46em; word-break: break-word;"><a href="https://dosinga.medium.com/the-clever-bit-and-the-gold-rush-220f483c34d1">Continue reading The Clever Bit and the Gold Rush</a></p><p><span style="font-family: inherit;"> </span></p>Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-71233821652291951812022-03-19T12:49:00.005-07:002022-03-19T12:49:48.271-07:00What’s the clever bit?<p>I realized that the blog linked from my website still points to this old Blogger thing, while anything I write these days goes on Medium. So I'm going to do some cross-posting here to keep links from my website to Medium until I come up with something better.</p><p>My last day at Sidewalk Labs, the company where I worked on making cities better for the last four years, was December 31st 2021. I’m going to do something new. 
It’s cool and exciting and I’ll write about it some other time.</p><p><br class="Apple-interchange-newline" /><img alt="" class="cf ki kj" height="525" role="presentation" src="https://miro.medium.com/max/1400/0*eJc1oBG-lKuFIqwX.JPG" style="background-color: white; box-sizing: inherit; color: rgba(0, 0, 0, 0.8); font-family: medium-content-sans-serif-font, -apple-system, "system-ui", "Segoe UI", Roboto, Oxygen, Ubuntu, Cantarell, "Open Sans", "Helvetica Neue", sans-serif; height: auto; max-width: 100%; vertical-align: middle; width: 692px;" width="700" /></p><p>Today I want to talk about the clever bit. If you ask me for advice about your startup, I’m going to ask you about the clever bit. What is your unique insight that nobody else thought of so far? It’s not enough to identify a problem to solve, you want to be able to solve it better than other people. Just having a clever bit is not enough of course — you do need to solve a problem for actual people, otherwise you just have a solution looking for a problem.</p><p><a href="https://dosinga.medium.com/whats-the-clever-bit-feaad0537e44">continue reading</a></p>Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-45369527139798186742017-05-10T05:51:00.000-07:002017-05-10T05:51:01.560-07:00Movie RecommendationsToday's project is a small <a href="https://github.com/DOsinga/notebooks/blob/master/Movie_Recommendations.ipynb">Python notebook</a> to recommend movies. I know, I know, there's a million of those out there, but this one is special, since it is not trained on user ratings, but on the outgoing links of the Wikipedia articles of the movies.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgf-4087FBh6wINUW2QWbMy4QY5h2WIqgXvmrB61LxWGU1EP_9BkCyaKR5kVnz3YRv7TYFOnm7Iz_KZT7U18L4osI4jJrkSxKnIWJ3J48lR1oI3IlbCEw7V_fadxX0PVSCwVl-1sLnIWve8/s1600/movie_recommend.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgf-4087FBh6wINUW2QWbMy4QY5h2WIqgXvmrB61LxWGU1EP_9BkCyaKR5kVnz3YRv7TYFOnm7Iz_KZT7U18L4osI4jJrkSxKnIWJ3J48lR1oI3IlbCEw7V_fadxX0PVSCwVl-1sLnIWve8/s400/movie_recommend.gif" width="400" /></a></div>
<br />
<br />
Why is that good? Two reasons. One is using diverse data. When you build a recommender system just on user ratings, you do get an Amazon-like system of people that liked this movie, also liked that movie. But if you're not using information like the year of the movie, the genre or the director, you are throwing away a lot of relevant features that are easy to get.<br />
<br />
The second reason is that when you start a new project, you probably don't have enough user ratings to be able to recommend stuff from the get-go. On the other hand, for many knowledge areas it is easy to extract the relevant Wikipedia pages.<br />
<br />
The outgoing links of a Wikipedia page make for a good signature. Similar pages will often link to the same pages. Estimating the similarity between two pages by calculating the <a href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard distance</a> between their sets of outgoing links would probably already work quite well. I went a little further and <a href="https://github.com/DOsinga/notebooks/blob/master/Movie_Recommendations.ipynb">trained</a> an embedding layer over the outgoing links.<br />
<br />
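As an aside, the Jaccard baseline mentioned above fits in a few lines. This is only a sketch and not the code from the notebook: the link sets are made up for illustration, and the function names are my own.<br />

```python
def jaccard_similarity(links_a, links_b):
    """Fraction of links shared by two pages (1.0 means identical link sets)."""
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def jaccard_distance(links_a, links_b):
    """Distance is simply one minus the similarity."""
    return 1.0 - jaccard_similarity(links_a, links_b)


def most_similar(title, pages):
    """Rank all other pages by Jaccard distance to `title`, closest first."""
    others = [t for t in pages if t != title]
    return sorted(others, key=lambda t: jaccard_distance(pages[title], pages[t]))


# Hypothetical outgoing links, invented for this example.
outgoing = {
    "Alien": {"Ridley Scott", "Science fiction film", "Sigourney Weaver", "1979 in film"},
    "Blade Runner": {"Ridley Scott", "Science fiction film", "Harrison Ford", "1982 in film"},
    "Notting Hill": {"Romantic comedy", "Hugh Grant", "Julia Roberts", "1999 in film"},
}

ranking = most_similar("Alien", outgoing)
```

With these toy sets, the two Ridley Scott films share two of six distinct links, while the romantic comedy shares none, so it ends up last in the ranking.<br />
<br />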
The result is not Netflix quality, but it works reasonably well. As an extra bonus, I projected the resulting movies onto a two-dimensional plane, rendering their movie posters as placeholders. It's fun to explore movies that way. Go <a href="https://douweosinga.com/projects/movie_recommend">play with it</a>.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-9254879138867421592017-02-08T08:43:00.001-08:002017-02-08T08:59:04.224-08:00Amazon Dash and Philips Hue<style>
.ol { display: list-item; }
.li { display: list-item; }
</style>
When the Amazon Dash came out a few years ago, I thought it was <a href="http://www.usatoday.com/story/tech/2015/03/31/amazon-dash-ordering-button/70747342/">an April Fool's joke</a>. A button that you install somewhere in your house to order one specific product from Amazon? That's crazy!<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVMhdCv7kdaL24Qqe4G9knVyrMfXUXs9mgaDvWba8To11CwBQB1zXm8xY_qexhzb9uG8W2szT3NpwILvzNMb9ejysmk4QH__Z3r7V0U-BbLJCom7gnjq3eL19-nfygpBhKT3q2hO_AEykG/s1600/Amazon_Dash_Button_Tide.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="250" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjVMhdCv7kdaL24Qqe4G9knVyrMfXUXs9mgaDvWba8To11CwBQB1zXm8xY_qexhzb9uG8W2szT3NpwILvzNMb9ejysmk4QH__Z3r7V0U-BbLJCom7gnjq3eL19-nfygpBhKT3q2hO_AEykG/s400/Amazon_Dash_Button_Tide.jpg" width="400" /></a></div>
<br />
<div>
<br /></div>
<div>
It didn't take long for people to figure out how <a href="http://www.makeuseof.com/tag/hacks-amazon-dash-buttons/">to use the button for something other</a> than ordering products from Amazon. No advanced hacking is required, and at US$4.99 a pop the buttons are quite <a href="https://www.amazon.com/b/?node=10667898011&sort=date-desc-rank&lo=digital-text">affordable</a>. When not in use, the button isn't connected to the wifi. So when you click it, the first thing it does is set up that connection. A script monitoring the local network can easily detect this event and then do something arbitrary - like <a href="https://medium.com/@brody_berson/drizly-dash-beer-me-with-the-press-of-a-button-a8c1185d316f#.1s7vlgyjg">order beer</a>.</div>
<div>
<br /></div>
<div>
There are many hacks around, but none of them do exactly what I want: <br />
<ul>
<li style="display: list-item;">When the last person leaves the house, all lights should switch off</li>
<li style="display: list-item;">When the first person comes back, all lights should switch back on</li>
</ul>
<div>
So I wrote a script that doesn't just monitor the Dash button, but also the presence of my phone and my wife's phone on the local network. The basic rules are:</div>
<div style="list-style: inherit !important;">
<ul>
<li style="display: list-item;">If any lights are on, switch them off when:</li>
<ul>
<li style="display: list-item;">- the button was pressed</li>
<li style="display: list-item;">- no phones were seen on the network for 20 minutes</li>
</ul>
<li style="display: list-item;">If any lights have been previously switched off, switch them on when:</li>
<ul>
<li style="display: list-item;">- the button was pressed</li>
<li style="display: list-item;">- a phone is seen after 20 minutes of no phones</li>
</ul>
</ul>
</div>
This way, the button can always be used to switch the lights on or off, but if you don't switch off the lights when leaving home, they will go off automatically. Unlike with a motion-controlled setup, this won't happen if you are home but not moving (though it will happen if your phone runs out of battery). When you come home and you had previously switched off the lights using the button, they will come on automatically.<br />
<br />
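The rules above boil down to a small state machine. Here is a minimal sketch of it, assuming nothing about the real script: the class and method names are hypothetical, time is passed in as plain seconds, and there are no network or Hue calls, so only the decision logic is shown.<br />

```python
class AutoLights:
    """Toy model of the rules; no sniffing and no Hue bridge involved."""

    def __init__(self, away_after=20 * 60):
        self.away_after = away_after      # seconds without phones before we count as away
        self.lights_on = True
        self.switched_off_by_us = False   # only lights we switched off get switched back on
        self.last_phone_seen = 0.0
        self.away = False

    def _switch_off(self):
        if self.lights_on:
            self.lights_on = False
            self.switched_off_by_us = True

    def _switch_on(self):
        if self.switched_off_by_us:
            self.lights_on = True
            self.switched_off_by_us = False

    def button_pressed(self):
        """The Dash button toggles: off if anything is on, otherwise back on."""
        if self.lights_on:
            self._switch_off()
        else:
            self._switch_on()

    def phone_seen(self, now):
        """A phone showed up on the network at time `now`."""
        if self.away:
            self.away = False
            self._switch_on()
        self.last_phone_seen = now

    def tick(self, now):
        """Call periodically; switches off once no phone was seen for `away_after`."""
        if not self.away and now - self.last_phone_seen >= self.away_after:
            self.away = True
            self._switch_off()


lights = AutoLights()
lights.phone_seen(now=0)
lights.tick(now=10 * 60)
after_10_min = lights.lights_on    # phones were seen recently enough
lights.tick(now=20 * 60)
after_20_min = lights.lights_on    # nobody home for 20 minutes
lights.phone_seen(now=25 * 60)
after_return = lights.lights_on    # first phone is back
```

Keeping the timing outside the class makes the rules easy to check without waiting 20 minutes; the real program would feed it events from the network monitor.<br />
<br />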
To get this working, check out <a href="https://github.com/DOsinga/auto_lights">the code</a>, install the <a href="https://github.com/DOsinga/auto_lights/blob/master/requirements.txt">requirements</a> and run: </div>
<div>
<br />
<span style="font-family: "courier new" , "courier" , monospace;">
python auto_lights --hue_bridge=&lt;bridge-ip&gt; --phone_mac=&lt;phone-macs&gt; --dash_mac=&lt;dash-mac&gt;
</span>
<br />
<span style="font-family: "courier new" , "courier" , monospace;"><br /></span>
While running, the program will also print any new MAC addresses it detects and, for extra convenience, the manufacturer as well. You can use this to find out the MAC address of your phone and of the Dash button - switch your phone to airplane mode, wait for things to quiet down and when you switch airplane mode off, you should see your phone's MAC address. <br />
<br />
It works reasonably well. The longest I've seen my phone not contacting the wifi was 13 minutes, so 20 minutes seems safe. Coming home, it takes a little longer than ideal for the phone to reconnect to the wifi, but you can use the Dash button if you are in a hurry.<br />
<br />
As always the code is on <a href="https://github.com/DOsinga/auto_lights">Github</a>. </div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-58493155740282013502017-01-26T06:41:00.000-08:002017-01-26T06:41:23.680-08:00Building a Reverse Image Search Engine using a Pretrained Image ClassifierIn the Olden Days, say more than 10 years ago, building Image Search was really quite hard (<a href="https://xkcd.com/1425/">see xkcd</a>). In fact, when AltaVista first came out with <a href="http://www.thehistoryofseo.com/The-Industry/Short_History_of_Early_Search_Engines.aspx">a search engine for images</a>, they couldn't really do it. What they did was return the image whose surrounding text best matched your query. It's a wonder that that worked at all, but it had to do for years and years.<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYmajuCfggDeFMVyv3U8qm8Tc6vNC3qVQYKJdzTJ4qMzReOwcfJ_pvtzozipNEMUINVEi9lkPzS5l7JmQ9HE_x4RJw-UmGqCDmz4TH9ciWy94Y0L-YShPRhHCg5QLRl1Z3PzphH7l_Lqbf/s1600/cats.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="262" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjYmajuCfggDeFMVyv3U8qm8Tc6vNC3qVQYKJdzTJ4qMzReOwcfJ_pvtzozipNEMUINVEi9lkPzS5l7JmQ9HE_x4RJw-UmGqCDmz4TH9ciWy94Y0L-YShPRhHCg5QLRl1Z3PzphH7l_Lqbf/s400/cats.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Why we need Reverse Image Search: find more cat pictures (from: <a href="https://en.wikipedia.org/wiki/Cat#/media/File:Cat_poster_1.jpg">wikipedia</a>)</td></tr>
</tbody></table>
<div>
How things have changed. These days Neural Networks have no problem detecting the actual content of pictures, in some categories outperforming <a href="http://www.eetimes.com/document.asp?doc_id=1325712">their human masters</a>. An interesting development here is <a href="https://www.tineye.com/">reverse image search</a> - supply a search engine with an image and it will tell you where else this or similar images occur on the web. Most articles on the web describing approaches on how to do this focus on things like <a href="https://en.wikipedia.org/wiki/Perceptual_hashing">Perceptual Hashing</a>. While I am sure that is a good way, it struck me that there is a much simpler way. <br /><br /><a href="https://keras.io/layers/embeddings/">Embeddings</a>! Algorithms like Word2Vec train a neural network for a classification task, but they don't use the learned classification directly. Instead they use the layer just before the classification as a representation of the word. Similarly, we can use a pre-trained image classifier and run it on a collection of images, but rather than using the final layer to label the result, we get the layer before that and use that as a vector representation of the image. Similar images will have a similar vector representation. So finding similar images becomes just a nearest neighbor search. <br /><br />As with a lot of things like this, getting the data to run the algorithm on is more work than getting the algorithm to run. Where do we get a set of representative images from? The images from the Wikipedia are a good start, but we might not want all of them. Most articles are about specific instances of things - for a reverse image search demo, classes of things are more interesting. We're interested in cats, not specific cats.<br /><br />Luckily, Wikidata annotates its records with a <a href="https://www.wikidata.org/wiki/Property:P31">'is instance of' property</a>. 
If you have imported a Wikidata snapshot into Postgres, then getting the wikipedia_ids for all values for the instance-of property is a simple SQL statement:<br /><span style="font-family: Courier New, Courier, monospace;">select properties->>'instance of' as thing, count(*) as c </span><div>
<span style="font-family: Courier New, Courier, monospace;">from wikidata group by thing</span> <br /><br /><div>
For some of these, Wikidata also provides us with a canonical image. For others we have to fetch the Wikipedia page and parse the wikicode. We're just going to get the first image that appears on the page, nothing fancy. After <a href="https://github.com/DOsinga/reverse_image_search/blob/master/fetch_images.py">an hour of crawling</a>, we end up with a set of roughly 7 thousand images. <br /><br /><a href="http://scikit-learn.org/">scikit-learn</a> provides us with a <a href="http://scikit-learn.org/stable/modules/neighbors.html">k-nearest-neighbor algorithm</a> implementation and we're off to the races. We can spin up a Flask-based server that accepts an image as a POST request and feeds that image into our pre-trained classifier. From that we'll get the vector representing that image. We then feed that vector into the nearest neighbor model and out fall the most similar images. You can see a <a href="https://douweosinga.com/projects/reverse_image_search">working demo here</a>. </div>
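To make the lookup step concrete, here is a dependency-free stand-in for the nearest neighbor search. It is a sketch only: the four-dimensional vectors and file names are invented, the real vectors come from the layer before the classifier's output and are far longer, and the demo uses a library implementation rather than this brute-force loop.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_images(query_vector, index, k=3):
    """Return the names of the k indexed images most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical image embeddings, made up for this example.
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0, 0.2],
    "cat_2.jpg": [0.8, 0.2, 0.1, 0.1],
    "chair.jpg": [0.0, 0.9, 0.3, 0.0],
    "bridge.jpg": [0.1, 0.0, 0.9, 0.4],
}

query = [0.85, 0.15, 0.05, 0.15]          # pretend this came from a new cat photo
matches = nearest_images(query, index, k=2)
```

Note that the search always returns k results, however far away they are. That is the same weakness the demo shows when nothing suitable is in the set; a similarity cutoff would be one way to address it.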
<div>
<br /><div>
It mostly works well. If you feed it a cat, it will <a href="https://douweosinga.com/projects/reverse_image_search?image_url=https://cdn.pixabay.com/photo/2015/06/03/13/13/cats-796437_1280.jpg">return pictures of cats</a>, the definition of success on the <a href="http://www.rathergood.com/cats/">Internet</a>. On mobile, you can directly upload a picture from your phone's camera and that seems to go ok, too. The biggest limitation I've come across so far is that the algorithm is bad at estimating how good its guesses are. So if there aren't any suitable pictures in the training set, it will return the one that it thinks is the closest match, but <a href="https://douweosinga.com/projects/reverse_image_search?image_url=https://upload.wikimedia.org/wikipedia/commons/1/1e/Cadeira_Palmetal_-_Modelo_I_-_Preta_-_Simples.jpg">to the human eye it seems fairly unrelated</a>. <br /><br />As always, you can find the code <a href="https://github.com/DOsinga/reverse_image_search">on Github</a>.</div>
</div>
</div>
</div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-34776516830289751512017-01-17T07:49:00.002-08:002017-01-17T07:49:47.228-08:00Learning to Draw: Generating Icons and HieroglyphsIn this blog post we'll explore techniques for machine-drawn icons. We'll start with a brute-force approach, before moving on to machine learning, where we'll teach a recurrent neural network to plot icons. Finally we'll use the same code to generate pseudo-hieroglyphs by training the network on a set of hieroglyphs. With the addition of a little composition and a little coloring, we'll end up with this:<br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtLKdI_kQqDPQvlnHgLdbfAAtqs5t6MQUDCqeFUgyKw9gTqRNWLvIQVXlNtE98cnhRRj8VGtVM87YEyBZn8H7saxkmxQGgc7o8FBtJ7RzJ0BWCNpiPTwiGaEuaoXUPYpR10EEq7GXB4pen/s1600/stela2.png"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtLKdI_kQqDPQvlnHgLdbfAAtqs5t6MQUDCqeFUgyKw9gTqRNWLvIQVXlNtE98cnhRRj8VGtVM87YEyBZn8H7saxkmxQGgc7o8FBtJ7RzJ0BWCNpiPTwiGaEuaoXUPYpR10EEq7GXB4pen/s400/stela2.png" /></a></div>
<div>
<br /></div>
Last week's post "<a href="http://blog.douweosinga.com/2017/01/deep-ink-machine-learning-fantasies-in.html">Deep Ink</a>" explored how we can simulate computers playing with blobs of ink. But even if humans see things in these weird drawings, Neural Networks don't. If you take the output of Deep Ink and feed it back into something like <a href="https://github.com/tensorflow/models/tree/master/inception">Google's Inception</a>, it offers no opinions. <br />
<br />
The simplest thing I could come up with to generate icons was brute force. If you take a grid of 4x4 pixels, there are 2^16 possible black and white images. Feed all of them into an image classifier and see if anything gets labeled. It does. But 16 pixels don't offer a lot of space for expression, so the results are somewhat abstract if you want to be positive, or show weaknesses in the image classifier if you are negative. Here are some examples: <br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk4XPRY8l8NJII-HffQWipBmjIu3uMxQA7DWhNcMWlPFqSr1-YBi7BASuuOBBmVqhDiCqNARl4bYgq_iURd4HaNmfiYUkoXYE1WDc2eZKPfhO9emJxLAgL00v9D1ES3EW42wXWK9WUCgj-/s1600/4x4composed.png" imageanchor="1"><img border="0" height="271" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjk4XPRY8l8NJII-HffQWipBmjIu3uMxQA7DWhNcMWlPFqSr1-YBi7BASuuOBBmVqhDiCqNARl4bYgq_iURd4HaNmfiYUkoXYE1WDc2eZKPfhO9emJxLAgL00v9D1ES3EW42wXWK9WUCgj-/s400/4x4composed.png" width="400" /></a></div>
<br />
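The brute-force loop itself is short. Below is a sketch under one big assumption: the image classifier is replaced by a toy function that only recognizes a single pattern, so the example runs without a neural network. The names are mine, not those of the actual code.

```python
from itertools import product

SIZE = 4


def all_bitmaps(size=SIZE):
    """Yield every black-and-white size x size image as a tuple of rows of 0/1."""
    for bits in product((0, 1), repeat=size * size):
        yield tuple(bits[r * size:(r + 1) * size] for r in range(size))


def toy_classifier(bitmap):
    """Hypothetical stand-in for the real classifier: labels exactly one image."""
    n = len(bitmap)
    for r, row in enumerate(bitmap):
        for c, pixel in enumerate(row):
            on_edge = r in (0, n - 1) or c in (0, n - 1)
            if pixel != (1 if on_edge else 0):
                return None
    return "picture frame"


total = 0
hits = []
for bitmap in all_bitmaps():
    total += 1
    label = toy_classifier(bitmap)
    if label is not None:
        hits.append((bitmap, label))
```

The loop visits all 2^16 = 65,536 images; swapping the toy function for a real classifier call is what turns seconds into an hour.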
<div>
We can do a little better by going to 5x5. To give the model a little more to play with, we can add a permanent border. This will increase the size to 7x7, but we'll only flip the middle pixels. Unfortunately the amount of work we need to do going from 4x4 to 5x5 increases by a factor of 512 (2^25 images instead of 2^16). Trying all 4x4 icons takes about an hour on my laptop. Exploring the 5x5 space takes weeks: <br />
<br />
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNx_xT20aRFA1puD3D1l5dzmqxstDZPyi_5NM8_Gx1GV8zB5ufWLb0iRzV0nnmD3vEt-KPKo7jYcsLwPwsnkDS1NU6kfrX7CVRPBn4LBV_t1xOMv9npevSQ-Uob01j0Ar9N2FOl2-7bL6z/s1600/composed.png" imageanchor="1"><img border="0" height="271" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNx_xT20aRFA1puD3D1l5dzmqxstDZPyi_5NM8_Gx1GV8zB5ufWLb0iRzV0nnmD3vEt-KPKo7jYcsLwPwsnkDS1NU6kfrX7CVRPBn4LBV_t1xOMv9npevSQ-Uob01j0Ar9N2FOl2-7bL6z/s400/composed.png" width="400" /></a></div>
<br />
These are better and easier to understand. In some cases they even somewhat explain what the network was trying to see in the 4x4 images.</div>
<div>
<br /></div>
<div>
They say that if brute force doesn't work, you're just <a href="http://www.goodreads.com/quotes/775889-if-brute-force-doesn-t-work-you-aren-t-using-enough">not using enough</a>. In this case though, there might not be enough around. 8x8 for icons is tiny, but it would take my laptop something like 3 times the <a href="https://en.wikipedia.org/wiki/Age_of_the_universe">age of the universe</a> to try all possibilities. Machine Learning to the rescue. Recurrent Networks are a popular choice to generate sequences, for example <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">fake Shakespeare</a>, <a href="https://gist.github.com/nylki/1efbaa36635956d35bcc">recipes</a> and <a href="https://highnoongmt.wordpress.com/2015/05/22/lisls-stis-recurrent-neural-networks-for-folk-music-generation/">Irish folk music</a>, so why not icons?<br />
<div>
<div class="separator" style="clear: both; text-align: center;">
</div>
</div>
</div>
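The back-of-the-envelope numbers behind these brute-force estimates are easy to check. The sketch below assumes, as stated above, that all 4x4 icons take about an hour, and uses 13.8 billion years for the age of the universe; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25
AGE_OF_UNIVERSE_YEARS = 13.8e9


def hours_to_try_all(side, hours_for_4x4=1.0):
    """Scale the one-hour 4x4 run to a side x side grid of black/white pixels."""
    return hours_for_4x4 * 2 ** (side * side) / 2 ** 16


hours_5x5 = hours_to_try_all(5)                    # 512 times the 4x4 run
weeks_5x5 = hours_5x5 / (24 * 7)                   # roughly three weeks
years_8x8 = hours_to_try_all(8) / HOURS_PER_YEAR   # tens of billions of years
universes_8x8 = years_8x8 / AGE_OF_UNIVERSE_YEARS
```

At that rate 8x8 comes out at a bit over two ages of the universe, the same ballpark as the estimate above, and every extra pixel doubles it.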
<div>
<br /></div>
<div>
I found a set of "free" icons at <a href="https://icons8.com/">https://icons8.com/</a>. After deduping, that gives us about 4500 icons. Downsample them to 8x8 and we can easily encode them as sequences of pixels to be turned on. An RNN can learn to draw these quite quickly. Adding a little coloring for variety and on a 15x10 grid you'll get this sort of output:</div>
<div>
<br /></div>
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhouGBgG-W_alGeIY9TDteOIQylfoI8KT_gcJUpgneTbSHkUY2FaSzvm4EJ_tU1unhsH7H_LEgR1XXQsg7mP-Kb2M0IWps7MQevtP1vXP1nXp8aDGp7NMqPA31iBBqsr9LDPCzbb1ZvXACU/s1600/poster8.png" imageanchor="1"><img border="0" height="266" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhouGBgG-W_alGeIY9TDteOIQylfoI8KT_gcJUpgneTbSHkUY2FaSzvm4EJ_tU1unhsH7H_LEgR1XXQsg7mP-Kb2M0IWps7MQevtP1vXP1nXp8aDGp7NMqPA31iBBqsr9LDPCzbb1ZvXACU/s400/poster8.png" width="400" /></a></div>
<div>
<br /></div>
<div>
These are pretty nice. They look like monsters from <a href="http://www.freepik.com/free-vector/space-invaders-game_761153.htm">an 8-bit video game</a>. The network learns a sense of blobbiness that matches the input icons. There's also a sense of symmetry and some learned dithering when just black and white doesn't cut it. In short, it learns the shapes you'll get when you downsample icons to 8x8.</div>
<div>
<br /></div>
<div>
As cute as these throwbacks to the '80s are, 64 pixels still isn't a lot for an icon. Especially since the input isn't a stream of optimized 8x8 icons, but rather downsampled 32x32 icons (the lowest resolution that the icons8 pack comes in).</div>
<div>
<br /></div>
<div>
We can't use the same encoding for 32x32 icons though. With a training set of 4500, having a vocabulary of 64 for the 8x8 icons is OK. Each pixel will occur on average 70 times, so the network has a chance to learn how they relate to each other. On a 32x32 grid, we'd have a vocabulary of 1024 and so the average pixel would only be seen 4 times, which just isn't enough to learn from.</div>
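To make the 8x8 encoding concrete: each icon becomes the sequence of positions of its "on" pixels, so the vocabulary is one token per pixel position. This is a sketch with hypothetical function names and a made-up icon, not the code from the repository.

```python
def parse(rows):
    """Turn rows of '#' and '.' into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def encode_absolute(icon):
    """List the index (row * size + column) of every pixel that is on."""
    size = len(icon)
    return [r * size + c for r, row in enumerate(icon) for c, px in enumerate(row) if px]


def decode_absolute(tokens, size):
    """Rebuild the bitmap from a sequence of pixel indices."""
    icon = [[0] * size for _ in range(size)]
    for t in tokens:
        icon[t // size][t % size] = 1
    return icon


icon = parse([
    "..####..",
    ".#....#.",
    "#.#..#.#",
    "#......#",
    "#.#..#.#",
    "#..##..#",
    ".#....#.",
    "..####..",
])
tokens = encode_absolute(icon)

# The rough ratio used above: training icons per vocabulary entry.
icons = 4500
per_token_8x8 = icons // (8 * 8)      # 70
per_token_32x32 = icons // (32 * 32)  # 4
```

The vocabulary grows with the square of the icon size while the training set stays the same, which is why this encoding stops working at 32x32.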
<div>
<br /></div>
<div style="text-align: left;">
We could <a href="http://www.fileformat.info/mirror/egff/ch09_03.htm">run-length encode</a>: rather than storing the absolute position of the next black pixel, store its offset from the previous one. This works, but it makes it hard for the network to keep track of where it is in the icon. An encoding that is easier to learn specifies for each scanline which pixels are turned on, followed by a new line. This works better:</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihrYzIrZyTMANHHyMv0jU_L9B68GncVtg7BVdYend5v1p0YtvXuJjwgNBl1NejlGj4xtd-MABDM7Zx3X3GEwfMR_QJhVYoIGKqwUUMNEK_rcZxJYt1o3LmhtItvhYqoeQaiUrvhp7xrsGD/s1600/poster2.png" imageanchor="1"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEihrYzIrZyTMANHHyMv0jU_L9B68GncVtg7BVdYend5v1p0YtvXuJjwgNBl1NejlGj4xtd-MABDM7Zx3X3GEwfMR_QJhVYoIGKqwUUMNEK_rcZxJYt1o3LmhtItvhYqoeQaiUrvhp7xrsGD/s400/poster2.png" width="400" /></a></div>
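A sketch of the scanline encoding, as I would reconstruct it from the description: one token per column plus an end-of-line token, so a 32x32 icon needs a vocabulary of 33 instead of 1024. The token name and helpers are hypothetical, and the example uses a tiny 4x4 icon to keep it readable.

```python
NEWLINE = "NL"  # hypothetical end-of-scanline token


def encode_scanlines(icon):
    """For each row, emit the columns that are on, then a newline token."""
    tokens = []
    for row in icon:
        tokens.extend(c for c, px in enumerate(row) if px)
        tokens.append(NEWLINE)
    return tokens


def decode_scanlines(tokens, size):
    """Rebuild the bitmap; tokens that fall outside the grid are ignored."""
    icon = [[0] * size for _ in range(size)]
    row = 0
    for t in tokens:
        if t == NEWLINE:
            row += 1
        elif row < size and 0 <= t < size:
            icon[row][t] = 1
    return icon


icon = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
tokens = encode_scanlines(icon)
vocabulary_32x32 = 32 + 1  # 32 columns plus the newline token
```

Ignoring out-of-range tokens in the decoder matters in practice, since a generated sequence is not guaranteed to contain exactly the right number of newlines.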
<div>
The network does seem to learn the basic shapes it sees and we recognize some common patterns like the document and the circle. I showed these to somebody and their first reaction was "are these hieroglyphs?" They do look a bit like hieroglyphs of course and it raises the question, what happens if we train on actual hieroglyphs?</div>
<div>
<br /></div>
<div>
As is often the case with this sort of experiment, the hard thing is getting the data. Images of hieroglyphs are easy to find on the Internet; getting them in nice 32x32 pixel bitmaps is a different story though. I ended up reverse engineering a seemingly abandoned <a href="https://code.google.com/archive/p/iglyph/">icon rendering app for the Mac</a> that I found on Google Code (itself abandoned <a href="https://opensource.googleblog.com/2015/03/farewell-to-google-code.html">by Google</a>). This gave me a training set of 2500 hieroglyphs.</div>
<div>
<br /></div>
<div>
The renderer responsible for the image at the beginning of this post has some specific modifications to make it more hieroglyphy: Icons appear underneath each other, unless two subsequent icons fit next to each other. Also, if the middle pixel of an icon is not set and the area it belongs to doesn't connect to any of the sides, it gets filled with yellow - the Old Egyptians seem to have done this.</div>
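The yellow fill rule is a flood fill from the middle pixel. Here is a minimal sketch, assuming 4-connectivity (up, down, left, right) and a hypothetical function name; the actual renderer may differ in details.

```python
from collections import deque


def enclosed_center_region(icon):
    """Return the unset pixels to fill, or an empty set if no fill applies.

    Starts at the middle pixel and spreads over unset pixels. If the region
    reaches any side of the icon, nothing gets filled.
    """
    size = len(icon)
    start = (size // 2, size // 2)
    if icon[start[0]][start[1]]:
        return set()                      # middle pixel is set: nothing to fill
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if r in (0, size - 1) or c in (0, size - 1):
            return set()                  # region connects to a side
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (nr, nc) not in seen and not icon[nr][nc]:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


closed_ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
open_ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
fill_closed = enclosed_center_region(closed_ring)
fill_open = enclosed_center_region(open_ring)
```

The closed ring gets its single middle pixel filled; the open one leaks to the right edge and stays empty.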
<div>
<br /></div>
<div style="text-align: left;">
Alternatively we can run the image classifier over the produced hieroglyphs:</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSpS77SG4qOE3LiddDF0UZp4sM6WjZO6DyHxU38Ggv0_yRaSz8gie1NHvALNXArVZUSdosinplXYt8NHaO-h_c-g6gMe6ceRxKZ3mnDQk_0eRTwg2uTaMu4_4IQQbNE0fkBEkIpvevOPWB/s1600/composed.png" imageanchor="1"><img border="0" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSpS77SG4qOE3LiddDF0UZp4sM6WjZO6DyHxU38Ggv0_yRaSz8gie1NHvALNXArVZUSdosinplXYt8NHaO-h_c-g6gMe6ceRxKZ3mnDQk_0eRTwg2uTaMu4_4IQQbNE0fkBEkIpvevOPWB/s400/composed.png" width="400" /></a></div>
<div style="text-align: left;">
<br /></div>
<div>
You can see it as mediocre labeling by a neural network that was trained for something else. Or as hieroglyphs from an alternate history where the Ancient Egyptians developed modern technology and needed hieroglyphs for "submarine", "digital clock" and "traffic light".</div>
<div>
<br /></div>
<div>
As always you can find the code on <a href="https://github.com/DOsinga/mandylion">github</a>.</div>
<div>
<br /></div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-36982277782901345272017-01-12T07:20:00.004-08:002017-01-12T07:27:39.544-08:00Deep Ink: Machine Learning fantasies in black and whiteOne of the New New Things in Machine Learning is the concept of <a href="https://openai.com/blog/generative-models/">adversarial networks</a>. Just like in <a href="https://en.wikipedia.org/wiki/Coevolution">coevolution</a>, where both the leopard and the antelope become faster in competition with each other, the idea here is to have one network learn something and to have the other network learn to judge the results of the first. The process does a remarkable job <a href="https://arxiv.org/pdf/1612.03242v1.pdf">generating photorealistic images</a>. Deep dreaming has been generating crazy and psychedelic images for a while now. RNNs have been <a href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">imitating writers</a> for a few years with remarkable results.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="https://commons.wikimedia.org/w/index.php?curid=42684231"><img border="0" height="300" src="https://upload.wikimedia.org/wikipedia/commons/7/71/Deep_Dreamscope_(19822170718).jpg" width="400" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="https://commons.wikimedia.org/w/index.php?curid=42684231">Deep Dreaming, by jessica mullen from austin, tx</a></td></tr>
</tbody></table>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
The state of the art <a href="https://en.wikipedia.org/wiki/Convolutional_neural_network">Neural Networks used to classify images</a> contain many connected layers. You feed in the image on one side and a classification rolls out on the other side. One way to think about this process is to compare it with how the <a href="https://en.wikipedia.org/wiki/Visual_cortex">visual cortex of animals</a> is organized. The lowest level recognizes only pixels, the layer just above it knows about edges and corners. As we go up through the layers, the level of abstraction increases too. At the very top the network will say "I saw a cat." One or two levels below that though, it might have neurons seeing an eye, paw or the skin pattern.<br />
<br />
Deep Dreaming reverses this process. Rather than asking the network, do you see an eye somewhere in this picture, it asks the network, how would you modify this picture to make you see more eyes in it? There's a little more to this, but this is the basic principle. <br />
<br />
So Software can imitate Art, just <a href="http://virgil.org/dswo/courses/novel/wilde-lying.pdf">as Life does</a>. It tends to be fairly literal though. The Deep Dreaming images, fascinating as they are, reflect patterns seen elsewhere in clouds or on top of random noise. So that got me thinking, what happens if we force some stark restrictions on what the network can do?<br />
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnrgc48WTJB_Lqzps7nFsu91hi5MCMTlMVwiGQc-3XoPA-hRhjB2Jnpr6urru9TTZBd0VkiB9rSKoxe3d7QtAMBUpAM358_fYwBKDplDVB_PG5j8yAd2quwTQHrM7i2GuVmgo1UVxNPtel/s1600/tiled.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnrgc48WTJB_Lqzps7nFsu91hi5MCMTlMVwiGQc-3XoPA-hRhjB2Jnpr6urru9TTZBd0VkiB9rSKoxe3d7QtAMBUpAM358_fYwBKDplDVB_PG5j8yAd2quwTQHrM7i2GuVmgo1UVxNPtel/s400/tiled.png" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
</div>
Deep Ink works similarly. But instead of starting with an image of a cloud, we start with a white picture that has a little blob of black pixels in the middle, a little bubble of ink if you will. We then run the network over this image, but rather than allowing it to adjust the pixels a tiny bit in a certain direction, the only thing it can do is flip pixels, either from black to white or the other way around. <br />
<br />
The network can't do much with areas that are pure black or pure white, so in effect it will only flip pixels at the border of the ink bubble in the middle. It's like it takes a pen and draws from the center in random directions to the sides, making patterns in the ink. Making that into an animated gif shows off the process nicely.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-fJsbDHInwTgSvW9PD5pEl0xLHqgAq78GoTS3RR2GPTcxILhINdSIzrPsIqKW9jwFBphK4CTzqKXheoYEvgC8lIS1YZLe3HQApMphbDlMcHSw-O159bqNy7jIg0Wmv4z9nblwvVx14QYq/s1600/505.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-fJsbDHInwTgSvW9PD5pEl0xLHqgAq78GoTS3RR2GPTcxILhINdSIzrPsIqKW9jwFBphK4CTzqKXheoYEvgC8lIS1YZLe3HQApMphbDlMcHSw-O159bqNy7jIg0Wmv4z9nblwvVx14QYq/s1600/505.gif" /></a></div>
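The flip-only restriction can be illustrated without a neural network. The sketch below is not the Deep Ink code: the network is replaced by a hypothetical scoring function that simply likes ink in the middle column, and the "only flip at the border of the ink" behavior is imposed explicitly here, whereas in Deep Ink it emerges from the network.

```python
def border_pixels(image):
    """Pixels that have at least one 4-neighbour of the opposite colour."""
    h, w = len(image), len(image[0])
    border = []
    for r in range(h):
        for c in range(w):
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and image[nr][nc] != image[r][c]:
                    border.append((r, c))
                    break
    return border


def flip_step(image, score):
    """Flip the single border pixel that raises the score most, if any does."""
    base = score(image)
    best_gain, best_pixel = 0.0, None
    for r, c in border_pixels(image):
        image[r][c] ^= 1                  # try the flip
        gain = score(image) - base
        image[r][c] ^= 1                  # undo it
        if gain > best_gain:
            best_gain, best_pixel = gain, (r, c)
    if best_pixel is not None:
        image[best_pixel[0]][best_pixel[1]] ^= 1
    return best_pixel


def toy_score(image):
    """Stand-in for a network activation: rewards ink in the middle column only."""
    mid = len(image[0]) // 2
    return sum(px if c == mid else -px for row in image for c, px in enumerate(row))


image = [[0] * 5 for _ in range(5)]
image[2][2] = 1                           # the initial blob of ink
steps = 0
while flip_step(image, toy_score) is not None:
    steps += 1
```

Starting from one black pixel, the toy score draws a vertical line through the middle and then stops, because no further flip improves it. Saving the image after every step is what the animation above does with the real network.<br />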
You can find the code as always on <a href="https://github.com/DOsinga/deep_ink">Github</a>. You can experiment with which layer to activate and which channel in that layer. Activating a channel in the top layer doesn't seem to draw the thing that channel represents, though. The other thing to play with is the pair of values representing black and white in the network. I keep them very close together - the further apart they are, the more high frequencies sneak in.<br />
<div>
<span style="font-family: "helvetica neue";"><span style="font-size: 14px;"><br /></span></span></div>
<br />
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
</div>
<br />
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-22378634846361537292017-01-09T07:56:00.003-08:002017-01-11T01:29:48.590-08:00Building Spotify's Song Radio in 100 lines of Python<!--?xml version="1.0" encoding="UTF-8"?-->
<span style="font-family: "helvetica neue"; font-size: 14px;">Somebody said that when it comes to deep learning, it is still very early days, like <a href="https://www.w3.org/community/webed/wiki/A_Short_History_of_JavaScript">1995</a> (I was going to look up the quote, but I can't find it; it was probably <a href="http://www.winstonchurchill.org/resources/quotations/135-quotes-falsely-attributed">Churchill</a> or <a href="http://mentalfloss.com/article/29372/10-things-mark-twain-didnt-really-say">Mark Twain</a>). I disagree. The early days are gone, it is more like 1958. Fortran has just <a href="https://en.wikipedia.org/wiki/John_Backus">been invented</a>. The early days of having to implement the mechanics of neural networks are akin to writing machine code in the Fifties. Platforms like Tensorflow, Theano and Keras let us focus on what we can do, rather than how.</span><br />
<span style="font-family: "helvetica neue"; font-size: 14px;"><br /></span>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgzyPoDLYao2VPLaOEQbvDLnCKj_ppfnd59N2qMvlSSj4wqdKcarQ80NC0Is3USvks_0Xj9wAdNkRJw86noBffoIAkEix15j-dvaw6m4MfnDbufmy84COq40iTRRVZMqqrmG7MWHmAfXOk/s1600/Marconi_at_desk.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="216" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgzyPoDLYao2VPLaOEQbvDLnCKj_ppfnd59N2qMvlSSj4wqdKcarQ80NC0Is3USvks_0Xj9wAdNkRJw86noBffoIAkEix15j-dvaw6m4MfnDbufmy84COq40iTRRVZMqqrmG7MWHmAfXOk/s400/Marconi_at_desk.jpg" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Marconi the inventor (<a href="https://en.wikipedia.org/wiki/File:Marconi_at_desk.jpg">from Wikipedia</a>)</td></tr>
</tbody></table>
<span style="font-family: "helvetica neue"; font-size: 14px;"><br /></span>
<span style="font-family: "helvetica neue"; font-size: 14px;"><a href="https://douweosinga.com/projects/marconi">Marconi</a> is a demonstration of this. It shows how to build a clone of <a href="https://support.spotify.com/us/using_spotify/search_play/spotify-radio/">Spotify's Song Radio</a> <a href="https://github.com/DOsinga/marconi/tree/master">in less than a hundred lines of Python</a> using open source libraries and data easily scraped from the Internet. If you include the code that scrapes and preprocesses the data, it is almost 400 lines, but still.</span><br />
<br />
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<a href="http://www.pandora.com/">Pandora</a> was the first company to successfully launch a product that produced playlists based on a song. It did this by employing humans who would listen to songs and characterize each on <a href="https://www.pandora.com/about/mgp">450 musical dimensions</a>. It worked well. Once you have all songs mapped into this multi-dimensional space, finding similar songs is a mere matter of finding songs that occupy a similar position in this space.</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
Employing humans is <a href="http://fivethirtyeight.com/features/the-economic-recovery-is-finally-bringing-pay-raises/">expensive though</a> (even underpaid musicians). So how do we automatically map songs into a multi-dimensional space? These days the answer has to be Machine Learning. Now you could build some sort of model that really understands music. Probably. It sounds really hard though. Most likely you'd need more than a hundred if not more than a thousand lines of code.</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
Marconi doesn't know anything about the actual music. It is trained on playlists. The underlying notion here is that similar songs will occur near each other in playlists. When you are building a playlist, you do this based on the mood you are in, or maybe the mood you want to create.</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
A good analogy here is <a href="https://en.wikipedia.org/wiki/Word2vec">Word2Vec</a>. Word2Vec learns word similarities from the sentences you feed it. After a lot of sentences, it knows that "coffee" and "espresso" are similar, because it notices that in a sentence where you use the one, you might as well expect the other. It even learns deeper relationships between words, for example that the words "king" and "man" have the same relationship <a href="https://www.quora.com/What-are-some-interesting-Word2Vec-results">as the words "queen" and "woman"</a>.</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
Fascinating stuff. Usefully, the Python library <a href="https://radimrehurek.com/gensim/">GenSim</a> contains a great implementation of <a href="https://radimrehurek.com/gensim/models/word2vec.html">Word2Vec</a>. So if we feed it playlists containing song ids, rather than sentences containing words, it will after a while learn relationships between songs. Suggesting a playlist based on a song then again becomes a straightforward nearest neighbor search.</div>
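<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
To make "songs that occur near each other" concrete, here is a minimal sketch of the kind of (song, neighbour) pairs a skip-gram style model learns from. GenSim does this internally; the function name, the window size and the toy playlists below are made up for illustration and are not taken from Marconi.</div>

```python
from collections import Counter


def context_pairs(playlists, window=2):
    """Yield (song, neighbour) pairs for songs at most `window` positions apart."""
    for playlist in playlists:
        for i, song in enumerate(playlist):
            lo = max(0, i - window)
            hi = min(len(playlist), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield song, playlist[j]


# Hypothetical toy playlists of song ids.
playlists = [
    ["s1", "s2", "s3"],
    ["s2", "s3", "s4"],
]

# Songs that often share a window end up with high pair counts,
# which is the signal the embedding is trained on.
pair_counts = Counter(context_pairs(playlists, window=1))
```

<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
In this toy data "s2" and "s3" are neighbours in both playlists, so that pair is seen more often than any other.</div>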
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<div style="font-family: "helvetica neue"; font-size: 14px;">
</div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
The trick is collecting the data. How do we get our hands on a large set of playlists? To their great credit, Spotify has a <a href="https://developer.spotify.com/web-api/">wonderful API</a> that lets you get info on songs, artists and playlists. It does not, however, grant access to a dump of (public) playlists, which would be the ideal input for this project.
</div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
The workaround I use is to <a href="https://github.com/DOsinga/marconi/blob/master/find_playlists.py">search for words in the title of playlists</a>. We start with the word 'a'. This will return a thousand playlists containing the word 'a'. We store those. Then we count, over all playlists scraped so far, how often each word appears in their titles. We pick the most frequent word that we haven't searched for yet. Rinse and repeat. So after 'a', you'll see 'the', 'of' etc. After a while 'greatest' and 'hits' appear.</div>
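<div style="font-family: "helvetica neue"; font-size: 14px;">
The loop described above can be sketched roughly as follows. This is not the code from find_playlists.py: the <code>search</code> callable, the in-memory index and the query limit are hypothetical stand-ins so the example runs without the Spotify API.</div>

```python
from collections import Counter


def crawl_playlists(search, seed="a", max_queries=3):
    """Repeatedly search for the most frequent title word not yet queried.

    `search(word)` is a hypothetical callable returning (playlist_id, title) pairs.
    """
    playlists = {}      # playlist_id -> title
    searched = set()    # words we already used as a query
    word = seed
    while word is not None and len(searched) < max_queries:
        searched.add(word)
        for playlist_id, title in search(word):
            playlists[playlist_id] = title
        # Count title words over everything scraped so far.
        counts = Counter(
            w for title in playlists.values() for w in title.lower().split()
        )
        # Next query: the most frequent word we have not searched for yet.
        word = next((w for w, _ in counts.most_common() if w not in searched), None)
    return playlists, searched


# Hypothetical in-memory stand-in for the playlist search endpoint.
FAKE_INDEX = {
    "p1": "a night of jazz",
    "p2": "the best of jazz",
    "p3": "a taste of the blues",
    "p4": "greatest hits of the 80s",
}


def fake_search(word):
    return [(pid, title) for pid, title in FAKE_INDEX.items() if word in title.split()]


found, queried = crawl_playlists(fake_search)
```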
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
It works and quickly returns a largish set of public playlists. It's not ideal in that it is hardly a natural way to sample playlists. For example, by searching for the most popular words, chances are you'll get the most popular playlists. The playlists returned also seemed very long (hundreds of songs), but maybe that's normal.</div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
Next up: get the tracks that are actually in those playlists. Thanks to Spotify's API, that's quite simple. Just keep calling:
</div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">res = session.user_playlist_tracks(owner_id, playlist_id, <br /> fields='items(track(id, name, artists(name, id), duration_ms)),next')</span></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
There's a bunch of parsing, boilerplate, timeout handling etc. to make it work in practice, but <a href="https://github.com/DOsinga/marconi/blob/master/get_tracks.py">it's all fairly straightforward.</a></div>
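<div style="font-family: "helvetica neue"; font-size: 14px;">
"Keep calling" means following the <code>next</code> field until it runs out. A minimal sketch, assuming <code>session</code> is a spotipy-style client whose <code>next(result)</code> method fetches the following page (that is my assumption, the call above does not show it). The <code>FakeSession</code> class is a hypothetical stand-in so the sketch runs without credentials.</div>

```python
def fetch_all_tracks(session, owner_id, playlist_id):
    """Collect every track of a playlist by following the `next` links."""
    fields = 'items(track(id, name, artists(name, id), duration_ms)),next'
    res = session.user_playlist_tracks(owner_id, playlist_id, fields=fields)
    tracks = []
    while res:
        # Skip entries that come back without a track object.
        tracks.extend(item['track'] for item in res['items'] if item.get('track'))
        # Assumed spotipy-style paging helper; stop when there is no next page.
        res = session.next(res) if res.get('next') else None
    return tracks


class FakeSession:
    """Hypothetical stand-in that serves two pages, for illustration only."""

    def __init__(self):
        self.pages = [
            {'items': [{'track': {'id': 't1'}}, {'track': None}], 'next': 'page2'},
            {'items': [{'track': {'id': 't2'}}], 'next': None},
        ]

    def user_playlist_tracks(self, owner_id, playlist_id, fields=None):
        return self.pages[0]

    def next(self, result):
        return self.pages[1]


track_ids = [t['id'] for t in fetch_all_tracks(FakeSession(), 'owner', 'playlist')]
```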
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
Once we have the training data, building the model is also <a href="https://github.com/DOsinga/marconi/blob/master/train_model.py">quite easy</a>. I throw out playlists that are too long or that only have songs from one artist, and the model is trained with a lowish number of dimensions. To make this accessible online, I import the song vectors into Postgres. Recommending music then becomes as simple as this SQL statement:</div>
<div style="font-family: "helvetica neue"; font-size: 14px;">
<br /></div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">SELECT song_id, song_name, artist_name, </span></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"> cube_distance(cube(vec), cube(%(mid_vec)s)) as distance</span></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">FROM song2vec ORDER BY distance</span></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">LIMIT 10;</span><br />
<div style="font-family: "helvetica neue"; font-size: 14px;">
Where mid_vec is the vector representing the song that was used as an input, or the middle of a set of vectors if multiple songs were provided. </div>
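<div style="font-family: "helvetica neue"; font-size: 14px;">
For readers without a Postgres instance at hand, the same idea in plain Python: average the input vectors into <code>mid_vec</code> and sort all songs by Euclidean distance to it, which is the distance <code>cube_distance</code> computes for points. The song ids and two-dimensional vectors are invented for the example.</div>

```python
import math


def mid_vector(vectors):
    """Component-wise mean of one or more equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def nearest_songs(song_vecs, mid_vec, limit=10):
    """Return (song_id, distance) pairs ordered by Euclidean distance."""
    scored = [(song_id, math.dist(vec, mid_vec)) for song_id, vec in song_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


# Hypothetical two-dimensional vectors; a trained model has more dimensions.
song_vecs = {
    "song_a": [0.0, 0.0],
    "song_b": [2.0, 0.0],
    "song_c": [1.0, 0.1],
    "song_d": [5.0, 5.0],
}

# Two input songs: the query point is the middle of their vectors.
mid_vec = mid_vector([song_vecs["song_a"], song_vecs["song_b"]])
ranking = nearest_songs(song_vecs, mid_vec, limit=3)
```

<div style="font-family: "helvetica neue"; font-size: 14px;">
Note that the input songs themselves show up in the ranking, so a real recommender would probably filter them out.</div>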
<div style="font-family: "helvetica neue"; font-size: 14px;">
</div>
</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br />
<div>
How does it perform? Well, you can try <a href="https://douweosinga.com/projects/marconi">for yourself</a>, but I think it works pretty <a href="http://www.apa.org/monitor/feb03/overestimate.aspx">well</a>!
</div>
<div>
<br /></div>
<div>
There's a lot of room for more experiments here, I think. Building an artist-only model would be a simple extension. Looking at the meta information of the songs and using it to build classifiers might also be interesting. Even more interesting would be to look at the actual music and see if there are features in the wave patterns that we can map onto the song vectors.
</div>
<div>
</div>
</div>
<div style="font-family: 'Helvetica Neue'; font-size: 14px;">
<br /></div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-86730121456827943372016-11-22T10:54:00.001-08:002017-01-08T04:33:52.080-08:00Calculating the set of universal numbers<a href="https://en.wikipedia.org/wiki/Frederick_II,_Holy_Roman_Emperor">Frederick II</a> allegedly tried to find out what the universal language of humanity was, by depriving a bunch of young children of any language input. He expected them to start speaking Latin, Greek or maybe whatever Adam and Eve spoke in paradise. Instead they went insane.<br /><br /><div>
Esperanto and a set of even less successful competitors tried to construct a universal language (the rumor that Klingon has more speakers than Esperanto is not true, but the fact that people believe it tells you something). Linguists have gone the other way. Languages evolve and seem to have common ancestors. By studying these evolutions, you can come up with theories as to what that common ancestor might have been. If you are very daring you can take this all the way to a supposed <a href="https://en.wikipedia.org/wiki/Proto-Human_language">Proto-Human Language</a>.<br /><br />Trying to follow language evolution this far back is tricky and the approach has been widely criticized. We can only observe language evolution for the last 2000 years or so - applying rules learned from that to the 198 thousand years before is extrapolating by a large margin. For example, most languages have become simpler in the last 2000 years, but how did they become complex in the first place?<br /><br /><div>
Algorithmically, there is a much simpler way to determine the most common language. Given a reasonable edit distance between two words, find for each word the median translation over all languages. The median translation is the translation that has the lowest sum of squared edit distances to all the others.<br /></div>
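A minimal sketch of that definition, using plain Levenshtein as the edit distance (the project itself mixes in some SoundEx, which is left out here). The four spellings of "three" are only an illustration, not the Wikivoyage data.<br />

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def median_translation(words):
    """Pick the word with the lowest summed squared distance to all the others."""
    return min(words, key=lambda w: sum(levenshtein(w, other) ** 2 for other in words))


# Illustrative spellings of "three" in a handful of languages.
threes = ["tri", "tre", "tres", "dry"]
universal_three = median_translation(threes)
```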
<div>
I've done just that for the numbers one to ten using the phrasebooks from Wikivoyage:<br /><br /><br /><div class="separator" style="clear: both;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqXyVZks7glKeezCPWQvinM8Zr1d-1EPKuha8tGE68wrihHjVZy0wwuDNhjGvi7APQJR0l4qx_L_IyuF2CmIFZiAvppnK6xpg5rBJWmUcRQyH0enIszt_gwYwZP2ogRSVcWwWP-7yO4SUS/s1600/numbers_1_10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="235" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjqXyVZks7glKeezCPWQvinM8Zr1d-1EPKuha8tGE68wrihHjVZy0wwuDNhjGvi7APQJR0l4qx_L_IyuF2CmIFZiAvppnK6xpg5rBJWmUcRQyH0enIszt_gwYwZP2ogRSVcWwWP-7yO4SUS/s400/numbers_1_10.png" width="400" /></a></div>
<br />(Expanding this approach beyond the numbers one to ten might be doable, but is harder - words don't just change pronunciation, but also meaning. The Dutch word "tuin", the English "town" and the German "Zaun" all have the same root, but mean respectively garden, town and fence.)<br /><br />It is somewhat remarkable that this approach works. Wikivoyage uses a rather unscientific phonetic spelling based on how English speakers would pronounce a word. The edit distance I'm using is Levenshtein with some SoundEx thrown in - both approaches pre-date microprocessors. The languages Wikivoyage covers are whatever its volunteers found interesting enough to add and of those I can only use the ones that happen to parse. But it does look reasonable to me.<br /><br />Is there a way to support this intuition? Why, yes there is! By aggregating the distances between the numbers on the language level, a language distance matrix can be calculated. This in turn we can use to calculate a language tree. Here it is:<br /><br /><div class="separator" style="clear: both;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzcdPl7Pzg47q6DDBqTzq2yX9S8HLDMmKHsEDoCxl2aJprwhPwpm_WbtRnI9rJNwHcY4YVW80BL-6NX7N-4zcTSlpzVacwq3OJrRvV2rsy9e5kUUiHSIgo2er26uRrGaLp9MchRRsVVHDz/s1600/language_tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzcdPl7Pzg47q6DDBqTzq2yX9S8HLDMmKHsEDoCxl2aJprwhPwpm_WbtRnI9rJNwHcY4YVW80BL-6NX7N-4zcTSlpzVacwq3OJrRvV2rsy9e5kUUiHSIgo2er26uRrGaLp9MchRRsVVHDz/s640/language_tree.png" width="276" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
There are some weird bits in the tree, but by and large you see the major language groupings appear as we know them from linguistics. The Slavic and Germanic groups look quite convincing as does the Latin group although the insertion there of Welsh and Irish seems debatable. Malagasy and Hawaiian get their own minigroup, which is quite interesting.</div>
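<br />For the curious, a rough sketch of how such a tree can be derived. The post does not say which clustering method was used, so the greedy average-linkage merging below is my assumption, as are the function names. The word lists are ordinary spellings of one to three rather than the phonetic Wikivoyage data.<br />

```python
from itertools import combinations


def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def language_distance(words_a, words_b):
    """Average edit distance over the numbers, compared position by position."""
    return sum(levenshtein(a, b) for a, b in zip(words_a, words_b)) / len(words_a)


def build_tree(numbers):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    clusters = {name: [name] for name in numbers}  # subtree -> member languages

    def linkage(pair):
        members_a, members_b = clusters[pair[0]], clusters[pair[1]]
        total = sum(language_distance(numbers[x], numbers[y])
                    for x in members_a for y in members_b)
        return total / (len(members_a) * len(members_b))

    while len(clusters) > 1:
        # Merge the two closest clusters into one subtree.
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters[(left, right)] = clusters.pop(left) + clusters.pop(right)
    return next(iter(clusters))


numbers = {
    "dutch": ["een", "twee", "drie"],
    "german": ["eins", "zwei", "drei"],
    "spanish": ["uno", "dos", "tres"],
    "italian": ["uno", "due", "tre"],
}
tree = build_tree(numbers)
```

Even on three numbers and four languages, the Germanic and Romance pairs find each other before the two groups are joined.<br />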
<br />I think this is a promising approach. Using <a href="https://en.wikipedia.org/wiki/International_Phonetic_Alphabet">IPA</a> for pronunciation, getting a more representative set of languages in (and maybe weighting them by number of speakers) and using a distance measure based on linguistic theories could all improve performance quite dramatically. If you want to play with the code so far, have a look at <a href="https://github.com/DOsinga/universal_numbers">https://github.com/DOsinga/universal_numbers</a><br /><br /><br /><br /></div>
</div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-17315738534418729582016-11-20T08:44:00.002-08:002016-11-20T08:44:58.878-08:00The Jordan-Egypt FerryIf you search the web for information about the ferry between Jordan and Egypt, you probably end up in a state of slight confusion. That is because as things stand right now, it is confusing. This post describes our experience, which might prove a useful extra datapoint for anybody wanting to go that way.<br />
<br />
Let's start with the basics. As of October 2016 there doesn't seem to be a fast ferry between Aqaba and Nuweiba, only a regular one. Our ticket said it would leave at 11PM, though at the ferry terminal they said it would be midnight. It actually left around 1AM. The fare we were quoted was US$ 75, which isn't cheap - it might be US$ 70 if you buy directly from the ferry terminal. The crossing took about 3 hours.<br />
<br />
I think you need a good reason to take the ferry. The alternative of going through Eilat in Israel and then continuing to Taba is probably cheaper, faster and more comfortable. If you do and you want to travel outside of the Sinai, you should pick up a visa for Egypt in Aqaba, as they only issue Sinai ones in Taba and it seems to be a pain to convert these into full visas.<br />
<br />
We wanted to continue to Sudan after Egypt. Sudan might refuse you entry if you've been to Israel and even though Israel kindly doesn't stamp your passport, the exit stamp from Jordan allegedly has caused trouble for others, so we took the ferry.<br />
<br />
If you arrive early in Aqaba you could spend the day at one of the resorts in South Beach, though the one we checked closed at 6PM, which is still some time from the midnight departure. Alternatively there's a public beach not too far from the ferry terminal.<br />
<br />
We duly arrived at 8:30PM. You can easily come by an hour later, though maybe not much more as they did seem to shut down stuff way before the ferry left. Before you can get your exit stamp, you need to pay the JOD 10 exit tax at a counter around the corner. There's a small restaurant and a duty free shop that sells half liter Heinekens for US$ 3. The currency exchange rates offered are rather terrible, so better give that a miss.<br />
<br />
At boarding time the officer in charge called out the foreigners to let them embark first. This might seem unfair, but then again, we also pay a lot more than the local price. You can drop your heavier luggage in a container downstairs and head upstairs. The seats are quite comfy and there's a little (non alcoholic) bar serving snacks and soda.<br />
<br />
There's also a passport processing facility. If you don't have an Egyptian visa yet, hand over your passport here before they let all the non-foreigners aboard. Again, make sure you mention you need a full visa and not a free, Sinai-only visa - unless you fly out from, say, Sharm El Sheikh of course.<br />
<br />
The lounge in the back seemed quieter and partly more comfortable as some arm rests can be pulled up to create more space for sleeping. It is also considerably more air-conditioned, so bring a sweater.<br />
<br />
After a few hours of fitful sleep, an officer in white woke me asking where my passport was - he was holding it in his hands, so I pointed that out. He collected all other foreigners (5) and navigated us off the boat, by way of the dropped off luggage through a series of check points into a waiting area opposite a small office where the Egyptian visas are processed. The charge was US$ 25, payable in cash only. There didn't seem to be an ATM at the place.<br />
<br />
Once outside of the ferry terminal you can either wait for the busses to start running from around 6AM or haggle with the collected taxi drivers. We paid 400 EGP (US$ 44) for four people to Sharm and made it there just before sunrise.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-41755742356507012992016-10-21T05:28:00.000-07:002016-10-21T05:28:20.027-07:00WorldSizerWhen I was 11 or so, I saw my first <a href="http://www.timesatlas.com/product/9780007551408/The+Times+Comprehensive+Atlas+of+the+World">Times World Atlas</a>. I was blown away. It was so much better in every way compared to the <a href="http://www.bosatlas.nl/wps/portal/nubosatlas/bosatlas/basisonderwijs/dejuniorbosatlas">school atlas</a> I was used to. The maps were huge, detailed and beautiful. The thematic maps and diagrams visualized everything from land use to economy to desertification.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="http://douweosinga.com/projects/worldsizer?state=Population"><img border="0" height="247" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTLXmoyHt-Fw5h2DTXrUkcfyg4A-Kgi9L0LcMqIJi98bhhaVHbSnLKIaTlijLoRHHnJD2y6-PhqXELyOzaWKoTlOODsn7ILpGhL7BCfwV6CJ6Sj39i_Rx-tZ2tz2UYfj9QSaihduwcg8Fa/s400/Screen+Shot+2016-10-21+at+11.48.39+AM.png" width="400" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://douweosinga.com/projects/worldsizer?state=Population">Countries resized to reflect their populations</a></td></tr>
</tbody></table>
<br />
These days, with Google Maps and data visualizations of every type, you don't hear so much about atlases other than that people cut up old ones to sell the individual maps <a href="http://www.ebay.com/sch/Antique-Maps-Atlases-and-Globes/37958/bn_1856736/i.html">as wall decorations</a>. There was one type of visualisation in the Times World Atlas that impressed me very much that I haven't seen online though: resized countries.<br />
<br />
The idea is that you change the size of a country according to some statistic while trying to minimize the overall distortion of the map. As a kid, I wondered how you would calculate this - now I am pretty sure they just had an artist do their best. It is therefore with some pride that I am publishing an algorithm here to do something similar online: <a href="http://douweosinga.com/projects/worldsizer">WorldSizer</a>.<br />
<br />
It's a two step process. The first, offline step takes the country data from the <a href="https://www.cia.gov/library/publications/the-world-factbook/">CIA Factbook</a> and the shape information from <a href="http://www.naturalearthdata.com/downloads/">Natural Earth</a>. The CIA data is nice, because it is more evenly edited than, say, Wikipedia, but does need some massaging. In particular, the country codes used by the CIA are seen nowhere else. One wonders if this has ever led to the wrong government being overthrown in a small <a href="https://en.wikipedia.org/wiki/1954_Guatemalan_coup_d%27%C3%A9tat">Latin American country</a>.<br />
<br />
The shape files from Natural Earth have one or more shapes per country. The first step is to apply an area preserving projection to the shapes. Resizing a Mercator map according to population would only cause confusion since it would still show India too small and Canada too big. The next step replaces the points of the shapes with lists of indexes into a global list of points. Here we also try to make sure that points on shared borders are only stored once. Since points on borders are now shared, modifying the shape of one country also modifies the shape of its neighbor.<br />
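<br />
A minimal sketch of that indexing step, written in Python for brevity. Matching border points by rounding their coordinates is my assumption about how the deduplication could work, and the two square "countries" are invented for the example.<br />

```python
def index_shapes(shapes, precision=6):
    """Replace each shape's coordinates by indexes into one shared point list."""
    points = []    # global list of unique points
    lookup = {}    # rounded coordinate -> index into `points`
    indexed = {}   # shape name -> list of indexes
    for name, ring in shapes.items():
        indexes = []
        for x, y in ring:
            key = (round(x, precision), round(y, precision))
            if key not in lookup:
                lookup[key] = len(points)
                points.append([x, y])
            indexes.append(lookup[key])
        indexed[name] = indexes
    return points, indexed


# Two hypothetical square countries sharing the border x = 1.
shapes = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
points, indexed = index_shapes(shapes)

# Moving a shared point through one country moves the other's border too.
points[indexed["west"][1]][0] = 1.5
```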
<br />
The online step takes the output of this and allows the user to pick a measure that the world will be "resized" to. For each country, we calculate the deflation or inflation factor needed to reflect the chosen measure. Then in an iterative process, we calculate for each shape how far the current area is off from the target area. If the shape is too small, all points of the shape are pushed away from the center, if the shape is too big, they are pushed towards the center.<br />
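<br />
For a single island, the push and pull can be sketched like this. It is a simplified Python illustration, not the code from worldsizer.js: the center is taken as the plain average of the points, and the <code>rate</code> parameter that damps each step is a made-up name.<br />

```python
def area(ring):
    """Shoelace formula for the area of a closed polygon."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def resize_step(ring, target_area, rate=0.5):
    """Move every point away from or towards the center, depending on the area error."""
    cx = sum(x for x, _ in ring) / len(ring)
    cy = sum(y for _, y in ring) / len(ring)
    # Areas scale with the square of lengths, hence the square root.
    wanted_scale = (target_area / area(ring)) ** 0.5
    # Only go part of the way each iteration so neighbours could push back.
    scale = 1 + rate * (wanted_scale - 1)
    return [(cx + (x - cx) * scale, cy + (y - cy) * scale) for x, y in ring]


# A hypothetical unit-square island that should end up four times as large.
ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for _ in range(20):
    ring = resize_step(ring, target_area=4.0)
```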
<br />
For islands, this is enough. For countries that share borders, a tug of war process plays out. Especially in a region where all shapes need to grow or shrink, it takes a while for things to stabilize. Each shape also tries to maintain its original shape - for example, if you look at the map for population, you see India grow and sort of fall out of Asia as a blob before regaining its normal shape (only much bigger).<br />
<br />
The shared borders and the country shape maintenance keep continents mostly in shape, too. But without extra care, islands will just drift into the mainland or each other. We stop this from happening by calculating "bridges". For each island, we look for a larger land mass nearby and anchor it to it. This keeps Ireland next to Great Britain, Great Britain next to France and Sri Lanka close to India.<br />
<br />
Finally, we create some bridges by hand. Neither Spain nor Morocco is an island, but we still don't want them to crash into each other. Similarly, we attach Yemen to Somalia, Australia to New Zealand and Sweden to Denmark. In some situations this still leads to overlap. If you use proved oil reserves as a measure, the Middle East of course increases in size by a lot. But it has nowhere to go, so it pushes into the Mediterranean and squashes Syria into Greece.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><span style="margin-left: auto; margin-right: auto;"><a href="http://douweosinga.com/projects/worldsizer?state=HIV%2FAIDS%20-%20people%20living%20with%20HIV%2FAIDS"><img border="0" height="242" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmDe702UYK8BGrcVs-m4BZkYL1FNeXjDdYL5t88Y-zQOmrhyphenhyphen4HhHv0vcJYSNhx1HA5Imh83GQxWhppsTg1l_M-tMAum2vRcjjZSHea45bh-yHDR3hasAmxohO6tINbkj4VVYeMmQvDS2V2/s400/Screen+Shot+2016-10-21+at+12.15.22+PM.png" width="400" /></a></span></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="http://douweosinga.com/projects/worldsizer?state=HIV%2FAIDS%20-%20people%20living%20with%20HIV%2FAIDS">More sad: scaled to number of people living with AIDS</a></td></tr>
</tbody></table>
One could add more bridges to force more map preservation, but this does come at a cost. The more forced the map, the less freedom the model has to preserve the shapes and get to the target sizes and it starts to behave more and more like a water balloon where you squeeze on one side and it just bulges out on the other.<br />
<br />
As usual the source code is <a href="https://github.com/DOsinga/worldsizer">on GitHub</a>. It should be fairly straightforward to use the <a href="https://github.com/DOsinga/worldsizer/blob/master/worldsizer.js">worldsizer.js</a> on another website with different data.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-75582889399684057002016-10-12T07:25:00.000-07:002016-10-12T07:25:32.175-07:00Styled MuseumsThe <a href="http://prisma-ai.com/">Prisma app</a> became an overnight hit after it launched because of its great photo filters. Rather than just applying some face feature transformation or adjusting the colors, it rerenders a picture in the style of a famous artwork. The results are remarkable and quite recognizable. And it is no secret how this is done. The basic algorithm is described in a paper published a year ago called <a href="https://arxiv.org/pdf/1508.06576v2.pdf">"A Neural Algorithm of Artistic Style"</a>.<br />
<br />
Besides the scientific paper and the startup executing on it, there's also an <a href="https://github.com/anishathalye/neural-style">open source implementation</a> of the algorithm. I played around with it a bit and it also works well, although it seems roughly 100x slower than Prisma. It got me thinking, what happens if you use this to re-render pictures of museums in the style of their most famous work? That way you see what the building is like and at the same time what to expect when you go in.<br />
<br />
<a href="http://douweosinga.com/projects/styledmuseums">Styled Museums</a> does exactly that. It has the top 100 museums (by their Wikipedia page view count) and their most popular works (same measure) and shows them on a world map. It uses the <a href="https://github.com/DOsinga/wiki_import">wiki_import</a> framework to get the data. You can click around and find <a href="http://douweosinga.com/projects/styledmuseums#Alte Nationalgalerie">your favorite museum</a> and see what happens to it.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://douweosinga.com/projects/styledmuseums#Museum Boijmans Van Beuningen"><img border="0" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_q8Q5l-SXTbVx2AxkI51gybj-JIl9l47xG2pktuuEauRy4BHqzwvNAYesW9m6g_Zpu9UVaGsho6Sk_x4vyRg2lbcvO3PYNiki-PKJ3g9Huyqb0564W2-GDLonMV1Q-lYmI89NYQDBC_6y/s400/Screen+Shot+2016-10-12+at+3.18.38+PM.png" width="400" /></a></div>
<br />
<br />
I think the fact that artistic style transfer is available as a scientific paper, a startup and an open source implementation is indicative of a wider trend. We now live in a world where we have three forms of innovation. The traditional scientific method where publicly financed institutions produce papers describing new ideas; the startup world that funnels large amounts of private money into ideas to see if they come to commercial fruition; and finally the open source world where individuals build something and share it with the world to build up their public profile.<br />
<br />
These three engines of innovation aren't silos. Google started out as <a href="http://infolab.stanford.edu/~backrub/google.html">a scientific experiment</a>, became a startup and a commercial success and now publishes scientific papers and open sources part of <a href="https://github.com/tensorflow/">their technology</a>. GitHub is a startup that is not only based on an open source project, but also hosts other open source projects. Twitter open sourced <a href="https://www.infoq.com/news/2014/01/twitter-summingbird">their data processing engine</a>, which now helps academics keep up with what their peers in Silicon Valley are up to.<br />
<br />
It doesn't always seem fair. The founders of Google became billionaires with technology developed while being employed by Stanford University, while the inventor of the world wide web works for a <a href="https://www.w3.org/People/Berners-Lee/">non-profit</a>. For years Werner Koch maintained the GnuPG email encryption package <a href="http://thehackernews.com/2015/02/gnupg-email-encryption-project-relies.html">on the salary of a postman</a>, while the founder of Hotmail is worth more than <a href="http://www.celebritynetworth.com/richest-businessmen/sabeer-bhatia-net-worth/">100 million dollars</a>.<br />
<br />
The 1980 <a href="https://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act">Bayh-Dole Act</a> in the US explains some of the difference between there and <a href="http://www.ipeg.com/university-inventions-europe-needs-a-bayh-dole-act/">Europe</a>. It allows universities and companies to claim patent rights on research undertaken with federal funding. On one level this doesn't seem right - if the government paid for research, shouldn't the patents end up with the government too? Then again, it turns out that the government isn't particularly good at doing interesting stuff with those patents - startups do much better.<br />
<br />
And so we end up with <a href="http://douweosinga.com/projects/styledmuseums">Styled Museums</a>. Inspired by Prisma, a VC funded company, I found the original paper which is based on research paid for mostly by <a href="https://www.uni-tuebingen.de/en/university.html">the University of Tübingen</a>, which in turn led me to an Open Source implementation. You can find the code used to get to Styled Museums (of interest is mostly the matching of museums and paintings) on <a href="https://github.com/DOsinga/styled_museums">GitHub</a>, of course.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHR8vhqdq21sIvaI8fJ3BCy9BuWhJowpVWDvivpvNFuyUClrKOHHiLKr1PTmcGFEMQFgCWRczavGj0y-Onmh_1c5vnWOKIRH0778SWkp1ypKTMKDN8xsXgPxeaaxyQdpfczXIZSrqrwCCH/s1600/Screen+Shot+2016-10-12+at+3.20.10+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="136" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHR8vhqdq21sIvaI8fJ3BCy9BuWhJowpVWDvivpvNFuyUClrKOHHiLKr1PTmcGFEMQFgCWRczavGj0y-Onmh_1c5vnWOKIRH0778SWkp1ypKTMKDN8xsXgPxeaaxyQdpfczXIZSrqrwCCH/s400/Screen+Shot+2016-10-12+at+3.20.10+PM.png" width="400" /></a></div>
<br />Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-13512896685998472922016-10-05T10:09:00.000-07:002016-10-05T10:09:15.735-07:00Introducing KarakameThis week's project is a camera project: <a href="https://itunes.apple.com/en/app/karakame/id1159921168?mt=8">Karakame</a>. Sorry, Android guys, iOS only. The app takes 5 pictures with 3 seconds in between. After adjusting for small movements of the camera, it then picks, for each pixel position, the median value across the five images. The effect is that when pointed at a scene where people walk in and out, those people are removed from the aggregate picture.<br />
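<br />
The median trick is easy to show on a toy example. The app itself is built on OpenCV on iOS; the snippet below is only a Python illustration on tiny grayscale "frames" with made-up values, and it assumes the frames are already aligned.<br />

```python
from statistics import median


def median_image(frames):
    """Combine equally sized grayscale frames by taking the per-pixel median."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Five hypothetical 1x3 frames; a "person" (value 0) walks across a bright scene.
frames = [
    [[0, 200, 200]],
    [[200, 0, 200]],
    [[200, 200, 0]],
    [[200, 200, 200]],
    [[200, 200, 200]],
]
clean = median_image(frames)
```

Each pixel is covered by the passer-by in only one of the five frames, so the median always lands on the background value.<br />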
<br />
It works reasonably well. The app is by no means a replacement for the main camera app; it is more a proof of concept. It seems like the sort of thing mainstream camera apps should add - if you have an app like that, you can get the source for this at <a href="https://github.com/DOsinga/Karakame">https://github.com/DOsinga/Karakame</a>. We were in Leipzig this weekend and I tried it out on a statue of Bach:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfbfaazmHtzLoQh99s0K9IX98Uuavd1d40VTwgDVlEz3lRtWtlN1ip63jGb7rrKHRvSYhyphenhyphen_ES7juszbvuK8BQ-zoY0qewBIQF0TvlcKr9_p5rgTV1rSG2hzVYMBft9NAPQ-l_u7fOvUgNL/s1600/IMG_5271.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjfbfaazmHtzLoQh99s0K9IX98Uuavd1d40VTwgDVlEz3lRtWtlN1ip63jGb7rrKHRvSYhyphenhyphen_ES7juszbvuK8BQ-zoY0qewBIQF0TvlcKr9_p5rgTV1rSG2hzVYMBft9NAPQ-l_u7fOvUgNL/s320/IMG_5271.JPG" width="180" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Bach in Leipzig</div>
<br />
See? No people.<br />
<br />
Karaoke famously means "Empty Orchestra" in Japanese - "hauntingly beautiful". Except that it doesn't quite. Kara means empty (see also Karate - empty hand), but the "oke" bit is just clipped from the English word orchestra. So I called the app Karakame, from the almost Japanese for "empty camera".<br />
<br />
Some notes on the implementation. The app uses OpenCV, which you can quite easily integrate into iOS these days. I extracted the interoperability code into an <a href="https://github.com/DOsinga/karakame/blob/master/Karakame/OpenCVBitmap.mm">OpenCVBitmap</a> class, so have a look if you're interested in that sort of thing. The image stabilization works really well. I normalize to the middle bitmap (i.e. the third one if you take five pictures). Stabilization means that some of the border pixels will be missing from some of the pictures, but because we pick the median pixel value, most of the time the other bitmaps supply those values.<br />
<br />
I also experimented with object detection. OpenCV comes with a set of detectors called <a href="https://en.wikipedia.org/wiki/Haar-like_features">Haar cascades</a> that can detect faces, cars and people - no deep learning needed. It works well for face detection, but for cars and people I didn't get a lot of good results. The idea was to leave pixels inside rectangles that were classified as cars or people out of the median voting, but I took that out again.<br />
<br />
Finally the median pixel implementation. Calculating medians in higher dimensions is <a href="https://en.wikipedia.org/wiki/Geometric_median">expensive</a> so I decided to just calculate the medians for the red, green and blue channels. This could lead to weird results, but in my testing it seemed ok. I suppose I could do a little better by calculating the median for the three colors and in the case where there is a disagreement, pick whatever pixel has the smallest distance to the other candidates.<br />
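<br />
As a rough illustration of the per-channel median described above, here is a minimal sketch in plain Python. It is not the app's code (that is the OpenCV-based project linked from this post); the function name <code>median_image</code> and the nested-list image layout are assumptions made for the example.<br />

```python
from statistics import median

def median_image(images):
    """Combine equally sized RGB images by taking the per-channel median.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    height, width = len(images[0]), len(images[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            pixels = [img[y][x] for img in images]
            # Take the median of each colour channel independently.
            row.append(tuple(int(median(p[c] for p in pixels)) for c in range(3)))
        result.append(row)
    return result

background = (10, 200, 30)
passerby = (250, 0, 0)
# Five 1x2 "photos"; a passer-by covers the left pixel in two of them.
shots = [[[background, background]] for _ in range(5)]
shots[1][0][0] = passerby
shots[3][0][0] = passerby
combined = median_image(shots)
```

Because the passer-by shows up in only two of the five shots, the median of every channel falls back to the background value.<br />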
<br />
If you have read this far, you're probably ready to get the project from github: <a href="https://github.com/DOsinga/Karakame">https://github.com/DOsinga/Karakame</a><br />
<br />
<br />Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-16370797013293912802016-09-29T07:25:00.002-07:002016-09-29T07:25:31.717-07:00Semantics, Maps and Word2VecThis week I published a new project: <a href="http://douweosinga.com/projects/worldmapof">worldmapof</a>. It uses <a href="https://en.wikipedia.org/wiki/Word2vec">Word2Vec</a> to calculate the distance between a given word and the name of a country and then colors each of the countries according to that distance. The results are often what you expect with some interesting surprises thrown in. You want to see Colombia and Ethiopia light up for coffee, but wonder why Greenland also features prominently only to learn one Google search later that Greenlandic coffee is a thing.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpz1ajHwGb-mDqKFIcgMri7EzLEGiltY-gy2KBfzh0dk0ymNI9Z0xdjxX8Ma5o36Eup3D3IdgCpxJldTp6G_VHkHuLqT2mdIx__33qDb1IWi50e3xGfBg5UZuOZShtzSCdWhGeFZk3dKQa/s1600/Screen+Shot+2016-09-29+at+2.57.52+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="226" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpz1ajHwGb-mDqKFIcgMri7EzLEGiltY-gy2KBfzh0dk0ymNI9Z0xdjxX8Ma5o36Eup3D3IdgCpxJldTp6G_VHkHuLqT2mdIx__33qDb1IWi50e3xGfBg5UZuOZShtzSCdWhGeFZk3dKQa/s400/Screen+Shot+2016-09-29+at+2.57.52+PM.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Coffee projected on a map</td></tr>
</tbody></table>
Word2Vec analyses large amounts of text - in this case from Google News. By building a model to predict a word given a context, it associates a 300-dimensional vector with that word. The interesting thing about this vector is that it has some semantic meaning. If the distance between two vectors is small, the two words are related. So if the distance between the word Colombia and the word coffee is small, that means that Colombia and coffee are related and we can paint the country in a brighter shade of green.<br />
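<br />
To make the "small distance means brighter green" idea concrete, here is a minimal sketch. It assumes cosine similarity as the measure (the post only says "distance"), and the three-dimensional vectors and the <code>green_for</code> helper are made up for illustration; real Word2Vec vectors have 300 dimensions.<br />

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def green_for(similarity):
    """Map a similarity to a hex colour; higher similarity gives brighter green."""
    level = round(255 * max(0.0, min(1.0, similarity)))
    return f"#00{level:02x}00"

# Toy vectors, invented for this example.
vectors = {
    "coffee": [0.9, 0.1, 0.2],
    "Colombia": [0.8, 0.2, 0.3],
    "Norway": [0.1, 0.9, 0.1],
}
colors = {
    country: green_for(cosine_similarity(vectors["coffee"], vectors[country]))
    for country in ("Colombia", "Norway")
}
```

With these toy numbers Colombia ends up a much brighter green than Norway for the word coffee.<br />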
<br />
To make this work in an online situation, I imported the word2vec data into postgres. If you want to play with that yourself, you can find the code on <a href="https://github.com/DOsinga/country2vec">github</a>.<br />
<br />
Since the underlying model is trained on a Google News archive, some biases shine through. There are some countries that don't appear often in the news - Chad, the Central African Republic and the Republic of Congo (not to be confused with the Democratic Republic of Congo) spring to mind. This makes the vectors of those countries unstable. It takes just one article about a guy who went walking in Chad for Chad to light up for the word walk, even though it isn't particularly related.<br />
<br />
The US has the opposite problem. American news talks about <a href="https://www.google.com/?q=%22the+average+american%22&tbm=nws">"the average American"</a> or "in the US" when the subjects discussed aren't particularly American at all. So the US tends to do well for day-to-day terms and maybe scores a bit low for international queries. I created a small spin-off thing, <a href="http://douweosinga.com/projects/usmapof">usmapof</a>, that uses the names of the US states instead. Comparing the maps for "Germany", "Sweden" and "Norway" gives you an idea where migrants from those countries ended up. Or if you want to know where hockey is popular:<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJbdj7bMzcoMs2F7tXvuekZLJz34SR3WYyRB8IEVenhbtkO3nIElu3-8LXn10Y4BmXgT93q_p3RQqPf-y6cGgOH2VW4DO_3-DXWd4xdOAllFVDe2efqdiuQ8BKUS2qJG3gTpcFXJ2xy45k/s1600/Screen+Shot+2016-09-29+at+3.18.10+PM.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="223" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJbdj7bMzcoMs2F7tXvuekZLJz34SR3WYyRB8IEVenhbtkO3nIElu3-8LXn10Y4BmXgT93q_p3RQqPf-y6cGgOH2VW4DO_3-DXWd4xdOAllFVDe2efqdiuQ8BKUS2qJG3gTpcFXJ2xy45k/s400/Screen+Shot+2016-09-29+at+3.18.10+PM.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Hockey lights up the north</td></tr>
</tbody></table>
<br />
It's fun to play with, but sometimes you see the limits of the model shine through. The data is somewhat old, so you can't use it well to illustrate current political events. Moreover, names of states are somewhat poor representations of the underlying entities. Washington usually does not mean the state. <a href="http://douweosinga.com/projects/usmapof?word=England">England</a> makes New England light up for the US, but probably not because so many English settlers went there.<br />
<br />
So I wonder if we can do better. What if instead of running a <a href="http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/">skip-gram algorithm</a> over windows of words, we preprocessed the text into entities first? Then quite possibly the model would learn which entities have similar roles, rather than which words have similar roles. We might even want to incorporate the roles of entities in sentences somehow, which might allow the model to learn from a fragment like "Oil was found in Oklahoma" that oil is something that can be found and that Oklahoma is a place.<br />
<br />
Maybe I should try <a href="https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html">SyntaxNet</a> out for this and see what happens.<br />
<br />
<br />
<br />
<br />
<br />Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-16870521836450263222016-09-20T08:06:00.000-07:002016-09-20T08:13:10.950-07:00Project: Offline Movie ReviewsI used my first week of <a href="http://blog.douweosinga.com/2016/09/leaving-triposo.html">freedom</a> to write a little toy-app: <a href="https://itunes.apple.com/us/app/offline-movies/id1154180474?ls=1&mt=8">Offline Movie Reviews</a>.<br />
<br />
Airplanes don't fly faster than they did <a href="https://www.quora.com/Why-has-the-speed-of-airliners-remained-more-or-less-a-constant-over-the-years">40 years ago</a>, nor do they provide us with more legroom. But we did make a lot of progress when it comes to personal entertainment on board. Most airlines these days will provide you with your own screen and a selection of movies to while the time away. They'll also usually insist that all their movies are just great. And while most will improve with each consumed Gin & Tonic, it still helps to pick one with a good base score. This is where the offline movie reviews app comes in.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiqJ5S4-0DVVheFbOQTAHVVOa5S0y9uNgRZrrL5oQqFGXOHuh3B9A-9W4NR2rMgypDDQf7BI3hNc1ouf2ObP0x5VF4echke7Tf4lc_oatxmc7uKAoZYZRujjaCYSpHPui5YJp585Lcyw5k/s1600/Simulator+Screen+Shot+Sep+18%252C+2016%252C+8.42.48+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiqJ5S4-0DVVheFbOQTAHVVOa5S0y9uNgRZrrL5oQqFGXOHuh3B9A-9W4NR2rMgypDDQf7BI3hNc1ouf2ObP0x5VF4echke7Tf4lc_oatxmc7uKAoZYZRujjaCYSpHPui5YJp585Lcyw5k/s400/Simulator+Screen+Shot+Sep+18%252C+2016%252C+8.42.48+PM.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
<br />
It ships with the reviews of 15 000 of the most popular movies. Each usually has a thumbnail version of the movie poster and always the section from the Wikipedia article describing the reception. This will typically contain the scores on Rotten Tomatoes, Metacritic and/or IMDb and often comes with a quote or two from a movie critic. Enough to make a somewhat informed decision on how to spend the next two hours.<br />
<br />
If you're not interested in how it technically works, you should download the app now, keep it on your phone for your next flight and stop reading.<br />
<br />
<br />
Apart from the usefulness of the app, I wanted to accomplish two things: learn Swift and share with the world how at <a href="http://www.triposo.com/">Triposo</a> we process and massage data. When Swift came out, I liked some of the things, but was disappointed by how <a href="https://swiftopinions.wordpress.com/tag/error-handling/">errors were handled</a> and the lack of real garbage collection. Meanwhile, error handling has improved and overall I must say it is a very pleasant language to develop in (even more so if you compare it directly to Objective-C and all [its awkwardness]). The app is not very complicated - master/detail with a tiny bit of care to make sure it executes searches over the movies in smooth fashion.<br />
<br />
The data processing builds on my <a href="https://github.com/DOsinga/wiki_import">wiki_import project</a>. Wiki_import imports dumps of the Wikipedia, Wikidata and the Wikistats into a Postgres database, after which we can query things conveniently and fast. In this case we want to get our hands on all the movies from the wikipedia sorted by popularity. The wikipedia contains roughly a hundred thousand movies - including all of them would create a db of 700MB or so. We're shooting for roughly 100MB or 15 000 movies. The query to get these movies is then quite straightforward:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">SELECT wikipedia.*, wikistats.viewcount </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">FROM wikipedia JOIN wikistats ON wikipedia.title = wikistats.title </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">WHERE wikipedia.infobox = 'film' </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">ORDER BY wikistats.viewcount DESC </span><br />
<span style="font-family: "courier new" , "courier" , monospace;">LIMIT 15000</span><br />
<br />
For each movie, we collect a bunch of properties from the infobox using the <a href="http://mwparserfromhell.readthedocs.io/">mwparserfromhell</a> package, an image and the critical reception of the movie. The properties have standard names, but their values can be formatted in a variety of ways, which requires some tedious stripping and normalizing - as always with Wikipedia parsing. The image processing is quite straightforward. I crop the image and compress it as far as is still bearable to keep the size down. I switched to using <a href="https://developers.google.com/speed/webp">Google's WebP</a>, which makes images look a lot better at these high compression levels.<br />
<br />
As you'd expect from user-generated content, the critical reception section on Wikipedia can hide under a number of headings. I might not have gotten all of them, but the great majority for sure. So we find the first of those headings and collect all the wikitext until we encounter a heading of the same or a higher level. Feed that into mwparserfromhell's wikitext stripper and voilà: a text with the reception and only a minimum of wiki artifacts (some image attributes go awry, it seems).<br />
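<br />
The heading scan can be sketched roughly as follows. This is a simplified illustration, not the code from the repository: the regular expression, the function name <code>extract_reception</code> and the (incomplete) set of heading titles are all assumptions.<br />

```python
import re

# Matches wikitext headings such as "==Reception==" or "=== Box office ===".
HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")
# Hypothetical, incomplete list of headings the reception can hide under.
RECEPTION_TITLES = {"reception", "critical reception", "critical response"}

def extract_reception(wikitext):
    """Return the wikitext under the first reception-like heading.

    Collection stops at the next heading of the same or a higher level,
    i.e. one with the same number of '=' signs or fewer.
    """
    collected, level = [], None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if level is None:
            # Still looking for the start of the reception section.
            if match and match.group(2).lower() in RECEPTION_TITLES:
                level = len(match.group(1))
            continue
        if match and len(match.group(1)) <= level:
            break
        collected.append(line)
    return "\n".join(collected).strip()

sample = """==Plot==
A thing happens.
==Reception==
Critics liked it.
===Box office===
It made money.
==References==
Not wanted."""
reception = extract_reception(sample)
```

Note that the nested "Box office" subsection is kept, since it sits at a lower level than the heading that opened the section; stripping the remaining markup is a separate step.<br />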
<br />
We then stick everything into a sqlite database with a full text search index on the title of the movie and the starring field, so we can search for both the name of a movie and who appears in it. That last bit isn't needed for when you decide which movie to watch, but I find myself often wondering, where did I see this actress before? Full text search on iOS works fast and well these days and even gives you prefix search for free.<br />
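<br />
For a feel of what such an index looks like, here is a small sketch using Python's built-in sqlite3 module. It assumes the SQLite build includes the FTS5 extension (the post does not say which FTS version the app uses), and the table layout and the <code>search</code> helper are illustrative only.<br />

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table indexing the two searchable fields.
conn.execute("CREATE VIRTUAL TABLE movies USING fts5(title, starring)")
conn.executemany(
    "INSERT INTO movies (title, starring) VALUES (?, ?)",
    [
        ("Casablanca", "Humphrey Bogart, Ingrid Bergman"),
        ("Notorious", "Cary Grant, Ingrid Bergman"),
        ("Charade", "Cary Grant, Audrey Hepburn"),
    ],
)

def search(prefix):
    """Return titles where the title or cast has a word starting with prefix."""
    rows = conn.execute(
        "SELECT title FROM movies WHERE movies MATCH ? ORDER BY title",
        (prefix + "*",),  # a trailing * turns the term into a prefix query
    )
    return [title for (title,) in rows]

by_actress = search("berg")
by_title = search("casa")
```

The same query serves both use cases: typing part of a title, or part of an actor's name to find out where you saw them before.<br />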
<br />
You can find all the code on <a href="https://github.com/DOsinga/offline_movies">github</a>.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-28921894194924940022016-09-12T06:47:00.003-07:002016-09-12T06:47:50.809-07:00Leaving TriposoWednesday, August 31, 2016 was my last day as a full-time employee at <a href="http://www.triposo.com/">Triposo</a>, the travel guide company I started 5 years ago <a href="https://techcrunch.com/2011/09/08/ex-googlers-launch-mobile-travel-guide-to-kill-lonely-planet-raise-seed-round-from-chris-sacca-more/">with my brothers and Jon Tirsen</a>.<br />
<br />
Triposo will continue to exist and will focus on delivering content and technology solutions for other companies. While I think that this is the best strategy for the company, it just isn't me. So <a href="https://www.linkedin.com/in/nishankgopalkrishnan">Nishank Gopal</a>, who has a lot more experience executing this sort of B2B strategy, will take over as CEO. I'll remain on the board and be involved as an adviser.<br />
<br />
I'm taking some time off to think, write, code, learn and <a href="http://wikitravel.org/en/Alexandria_to_Cape_Town_by_train_and_bus">travel</a>. With the company continuing, this isn't quite one of <a href="https://www.cbinsights.com/blog/startup-failure-post-mortem/">those Startup Post Mortems</a>. I did want to share some thoughts on running travel companies though:<br />
<h3>
What worked and what didn't?</h3>
<div>
Triposo started out with a three-pronged plan:</div>
<div>
<ul>
<li>Build travel guides from targeted web crawls</li>
<li>Make the travel guides sticky by adding a travel log</li>
<li>Make money by selling tours and travel services on the go</li>
</ul>
<div>
The first prong worked rather well. We went from a few city guides that were basically mashups of Wikipedia and Wikitravel to a travel guide that covered the world within the first year and kept improving the data quality from there on. I was especially proud when we launched the system that matched web pages automatically to our POI database and then ran opinion mining and fact extraction over those pages.</div>
<div>
<br /></div>
<div>
With this we could rank POIs not just on one score, but on a variety of aspects - coffee, drinks, location - which in turn we could use for recommendations and personalization. On top of that we developed a nifty similarity measure for POIs, powering our "people that like this place, also like."</div>
<div>
<br /></div>
<div>
The second prong, adding a travel log, started out promising. Being able to add photos and notes to entries in a travel guide and building a story that way was fun. For us. Our users didn't use the feature very much though. They used Facebook for sharing their travel experiences. And so we were confronted with a choice: do we keep betting on two things, or do we focus on the thing that really works well, our core travel guide? We went with the latter and killed the travel log.</div>
</div>
<div>
<br /></div>
<div>
Sometimes I think we shouldn't have. 5 years ago, Facebook was the place to share this sort of thing, but I wonder if nowadays there would be room for a sharing platform specifically for travel. Breadtrip seems to do well in this space. But you know what they say, being too early is just as bad as being too late.</div>
<div>
<br /></div>
<div>
We didn't pay a lot of attention to our third prong in the first years. People spend a lot of money on travel and half of that is spent during the trip. We figured that once we had a large enough user base, they could start spending that through us. The conversion rates we got linking to web pages from our app were quite low and it seemed to us that just making those flows native in the app should do the trick.</div>
<div>
<br /></div>
<div>
It didn't. Or not enough. In our presentations we always talked about the shift from desktop to mobile and from booking before a trip to during a trip. This trend is real, but we still have a long way to go. People are happy to research a hotel on their phone, but when it is time to make a booking and enter those credit card details, they'll often quickly switch to the desktop browser, leaving your poor travel guide without its margin.</div>
<div>
<br /></div>
<div>
The other issue was that for tours and activities we had almost no options that had same day availability. When your model is based on telling people at the breakfast table what they should be doing that day in the city where they are, this is a problem. Again, I'm sure this will get better in the next few years, but it didn't in time for us.</div>
<h3>
What do you do when things don't work?</h3>
<div>
This is a question people in the start-up world don't talk about much. The general opinion is that when you have a start-up, you focus on that one thing that you do best. That's how you become successful, that's how Google and Facebook did it. Only when you are huge do you diversify.</div>
<div>
<br /></div>
<div>
That's all very well, but what if the one thing you are good at isn't enough? Initially we were doing great, our user base was growing exponentially. But that growth wasn't really viral, it was just Apple and Google sending us downloads. With the travel log shut down, we were seeing bad retention numbers. With our bookings on the go not really taking off, we didn't have a real ecommerce play either.</div>
<div>
<br /></div>
<div>
So what do you do? "Pivot" is a popular answer. But for every success story about pivots there are ten failures and to me it always seemed like spending the money of your investors on an idea they didn't invest in. So you start thinking about things you could add that would fix retention or fix conversion. </div>
<div>
<br /></div>
<div>
City walks, mini guides, a chat room for triposo users in the location, printable posters, sponsored free wifi, audio guides, a chat bot that advises users about hotels and attractions, partly powered by a human - we built all these things and launched them. And then when the feature doesn't quite take off, you are faced with the choice of removing it and disappointing the users that enjoyed it, or have it clutter up an already complex app.</div>
<div>
<br /></div>
<div>
Maybe this is the right strategy. You try stuff until you hit it out of the park or run out of money. But often I think we should just have focused on building the best travel guide possible. Improve the data quality, the data coverage and the smartness. And if that's not enough, well, then there just wasn't enough of a market for the original plan.</div>
<h3>
Can a travel planning app be a success?</h3>
<div>
A few months ago there was a popular blog post titled "<a href="https://www.tnooz.com/article/why-you-should-never-consider-a-travel-planning-startup/">Why you should never consider a travel planning startup</a>." I was asked a few times about my opinion. Triposo was of course never a travel "planning" startup - we always focused on being helpful when you are on the road. But the arguments against it are very similar.</div>
<div>
<br /></div>
<div>
In short the article says: Getting lots of users for travel is hard, because people do it only once or twice a year. Getting people comfortable with something as complicated as a travel planning app is hard. Getting people to trust you enough to book through you rather than through an OTA they know is hard. Outbidding the site that pays you a commission for a hotel sale is hard.</div>
<div>
<br /></div>
<div>
This is all true and we've seen all of these things first hand at Triposo. But even though I'm writing a post about why Triposo as a consumer product hasn't taken off, I would still answer the question of whether a travel planning app can be a success with a yes.</div>
<div>
<br /></div>
<div>
First of all, these arguments are about all travel startups, not just the ones that do planning or help you while on the road. And yet using Kayak has become a habit. We actually succeeded in attracting a fair number of users organically. And while we had trouble getting people to book through our app, Tripadvisor figured this out - I could read the reviews there and then go to Expedia to make my booking. And outbidding the guy who pays you a commission is the hallmark of the entire travel industry. How can Booking.com outbid the hotels themselves on Google?</div>
<div>
<br /></div>
<div>
We focused on being a travel guide that is helpful when you are at the destination, because people don't like to plan. It seems inevitable that there will be an app that will let you have a perfect experience on your trip without you doing more planning than necessary. An app that has all the travel information in the world and knows who you are, where you are and your mood. Unfortunately it looks like it won't be Triposo.</div>
<h3>
So what's next?</h3>
<div>
I'm taking some time off to learn, write, code, read and travel. I think that when it comes to technology things have never been as interesting as they are now, so taking a bit of time to figure out what's next seems like the best approach. I'll be doing some smaller projects around stuff I want to try out. A first small one you can find here: <a href="https://github.com/DOsinga/wiki_import">https://github.com/DOsinga/wiki_import</a> - some scripts to import the wikipedia, wikidata and wikistats into postgres and make them searchable.</div>
<div>
<br /></div>
<div>
Triposo as a consumer product will continue and will remain "<a href="http://www.triposo.com/">probably the best travel guide</a>" in the app store. The engineering team will focus on data quality, coverage and smartness - in a way executing on the "focus on the one thing you're good at" strategy. If you are interested in using the Triposo data and smartness for your own business, get in touch. There's some wonderful stuff there.</div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-16098420536256275132016-06-05T05:27:00.000-07:002016-06-05T05:27:54.099-07:00Predictions for Euro 2016Two years ago I coded up a small Python model to <a href="https://github.com/DOsinga/football_predictions">simulate the World Cup</a>. The results back then were more or less in line with the general predictions: Brazil to win.<br />
<br />
I updated the model for the Euro 2016 tournament. My data source for matches had gone, so I had to adjust that and I also introduced weights for previous games. Friendlies and games played longer ago weigh less. The oldest matches I am taking into account are from just after the World Cup.<br />
<br />
The results differ more from the pundits' predictions than last time around. France is the favorite (25%), but that is because of the home advantage, which I set at 0.25 - historically the model has it between 0.2 and 0.3. Poland is the surprising number two with 21%. They did a decent job qualifying and had some good friendlies, so I find it hard to argue with.<br />
<br />
Spain and England are basically tied at 11%. Of course England's performance could very well decide whether Brexit happens or not, so this is important.<br />
<br />
The model does not like Germany's chances much at 8%. The results from two years ago are now weighted at only 30% because of the time gone by.<br />
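<br />
The post does not spell out the weighting formula, so the following is only a sketch of one way to get such weights. It assumes exponential decay calibrated so that a competitive match from two years ago counts for 30%, plus a made-up factor of 0.5 for friendlies; the actual model in the repository may work differently.<br />

```python
def match_weight(age_years, friendly=False,
                 weight_at_two_years=0.3, friendly_factor=0.5):
    """Weight of a past match: older games and friendlies count for less.

    Exponential decay is an assumption, calibrated so that a competitive
    match played two years ago gets weight_at_two_years.
    friendly_factor is an invented, illustrative value.
    """
    weight = weight_at_two_years ** (age_years / 2.0)
    if friendly:
        weight *= friendly_factor
    return weight

recent_qualifier = match_weight(0.0)
world_cup_game = match_weight(2.0)
old_friendly = match_weight(2.0, friendly=True)
```

Under these assumptions a match played today gets full weight, a World Cup game from two years back gets 0.3, and a friendly of the same age half of that.<br />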
<br />
Just to put my money where my model is, I made an actual bet for Poland to win.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-13521313714191732302016-01-02T07:20:00.001-08:002016-01-02T07:20:14.002-08:00Where the streets have no nameGrowing up in the Netherlands I never considered that our system for street addresses wasn't obvious and therefore universal. Street &lt;house number&gt;, Postal code, City, Country. How else would you do it? It turns out there are many ways.<br />
<br />
Putting the house number before the street actually is more consistent. Not using house numbers, but the number of meters from a crossing gives one a better idea of where the house actually will be. Some places issue house numbers in chronological order rather than in a geographical fashion. Some don't use house numbers at all, but give buildings names. In Japan streets usually don't have names, but the blocks (banchi) do. In India (at least in Hyderabad) there are street names and numbers, but if you want to go somewhere you need to specify the closest landmark - a temple, a shopping mall or maybe an office building.<br />
<br />
Bangkok is no exception to these exceptions. Landmarks are also popular, but more to give a general idea where things are. Streets in Bangkok follow more the pattern of rivers than the grid pattern of North American cities with the smallest streets meandering until they flow into a bigger street which in turn meanders until it merges into an even bigger street.<br />
<br />
Addresses start with the biggest street, which has a name, and then count the side streets along it, with odd and even numbers on opposite sides of the street. If the side street has its own side streets, this process is repeated.<br />
<br />
It has its own logic to it, but it is confusing to newcomers. You ask your hotel what the address is and they say something like "Soi 3." If you then walk around town for a full day and tell your taxi driver "take me home to Soi 3", they'll look at you confused. The third side street of what?Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-87238143685823153832015-12-03T07:48:00.000-08:002015-12-03T07:48:11.435-08:00Moving to Thailand!Let's start with the news. Tonja and I are moving to Thailand. We don't know yet for how long or exactly where we'll live, but if everything works out with the visa, we should be living in Bangkok from January 2016 on.<br />
<br />
Those who know us a little might suspect this is because of the weather or the food or that we're just ready for something else after 4 good years in Berlin. Those things all count on some level, but the real and immediate reason is <a href="http://www.triposo.com/">Triposo</a>, the startup I've been working on since leaving Google.<br />
<br />
When we started with Triposo, we wanted to build the best travel guide for mobile. I think we mostly succeeded and we'll continue improving over the next years, but we also need to prove that we can make real money or, as it is popular to say now, prove that the unit economics work.<br />
<br />
And while it makes sense to cover the entire world at the same time when you build an algorithmic travel guide (you make something work somewhere and it works everywhere), it is less clear that this is true when it comes to selling services to travellers in the app. We think we need to work closely with local providers in new ways - watch this space for further developments. And that's why I am here.<br />
<br />
My last burst of blogging was when we were living in India, so I wanted to pick this up again now that we're back in the tropics.Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-61368800174486462472013-02-20T03:25:00.002-08:002013-02-20T03:25:44.500-08:00What do you do after a genocide?
<br />
<div class="p1">
Arriving in Kigali, the capital of Rwanda, from most other African countries must be a bit of a reverse culture shock; the city is clean and pretty, the traffic not too busy and well behaved. The shops are well laid out and give a sense of prosperity and the people seem healthy and relaxed. The government, though recently giving in to a certain degree of authoritarianism, is still efficient, with streaks of vision mixed in; they banned plastic bags and decided to change the national language from French to English for economic reasons (though certain disagreements with the government in Paris might have pushed them over the edge). All in all it feels more like a nation taking its cue from Singapore than from South Africa.</div>
<div class="p2">
<br /></div>
<div class="p1">
I imagine it is much like Germany must have been in the sixties. It's been about 20 years since Hutu death squads went on a killing spree, killing around a million Tutsis and moderate Hutus in one of the worst genocides of the second half of the twentieth century. Led by Paul Kagame, the current president, the RPF, a Tutsi-dominated rebel movement, succeeded in pushing out the génocidaires before the United Nations got their act together.</div>
<div class="p2">
<br /></div>
<div class="p1">
What puzzles me is how they got back to a state of normalcy. The Rwandese genocide didn't happen in relatively remote concentration camps. It wasn't executed by a small group of well-armed extremists. It happened everywhere at the same time, with neighbours killing neighbours and sometimes family members killing each other. People trying to find refuge in churches were sometimes turned over to their killers by nuns and priests; at other times the Interahamwe would just blow up the church.</div>
<div class="p2">
<br /></div>
<div class="p1">
After World War II, people in the Netherlands would whisper that somebody had been "wrong in the war" when they suspected collaborators, or wonder if a visiting German tourist might have been "a good German". Over time that went away, but it took a good while. More than 40 years after the end of the war, football supporters were still celebrating the rare win over the German team by declaring they had got their grandfather's bicycle back.</div>
<div class="p2">
<br /></div>
<div class="p1">
In Rwanda they seem to have just decided to do away with the whole thing. Now there are no more Hutus or Tutsis, just Rwandese. The events in 1994 were a grim reminder that 80% of humans will turn into mass murderers given the right circumstances. Now Rwanda is showing the world that you can come back from even the worst tragedy imaginable.</div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-82831452897834940132013-02-12T08:51:00.003-08:002013-02-12T08:51:54.950-08:00The Paleo diet is wrong about grains
<br />
<div class="p1">
The Paleo diet insists we should only eat things our forefathers ate back in the Stone Age; our systems just aren't developed to process modern foods. It's an interesting idea that intuitively makes sense, although the objection that it's crazy to get health advice from a group of people that had a life expectancy of 32 is hard to overlook.</div>
<div class="p2">
<br /></div>
<div class="p1">
So you're mostly left with a diet of some vegetables and lots of animal protein from meat, fish and eggs. Grains especially are a big no-no. To the untrained eye it appears as yet another low-carb diet with a better back story. I think, though, that they are wrong about the grains.</div>
<div class="p2">
<br /></div>
<div class="p1">
I'm writing this while on a trip to East Africa, the cradle of humanity. And even though you don't see many primitive hominids on the plains of the Serengeti, you do see baboons. Baboons aren't great apes, so they are not very closely related to humans, but they do seem to fill a niche similar to the one early humans did; they're ape-like creatures living in social groups on the savannahs, getting by on whatever they find.</div>
<div class="p2">
<br /></div>
<div class="p1">
This time of year the Serengeti looks like a field of grain. The rains make the grasses grow tall and all those grasses are laden with seeds. Those seeds are of course nowhere near as big as modern grains, but they are still free calories to the baboons. And so it is a common sight to see a group of baboons "harvesting" "grains". It just seems very unlikely to me that our ancestors would have let that opportunity go.</div>
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0tag:blogger.com,1999:blog-7905580174687117905.post-65262406226644762642013-01-21T12:28:00.001-08:002013-01-21T12:28:48.626-08:00How Microsoft can win the Mobile WarsForbes calls it <a href="http://www.escapistmagazine.com/news/view/121610-Forbes-Analyst-Calls-Game-Over-For-Microsoft">game over for Microsoft</a>. That seems harsh, but there is no denying that Microsoft hasn't been doing well in the battle for smartphones. Each quarter it is the same story; Windows Phone market share drops a little, iOS picks up a bit and Android surges ahead. Five years ago Steve Ballmer might have believed that <a href="http://www.businessinsider.com/heres-what-steve-ballmer-thought-about-the-iphone-five-years-ago-2012-6">"There’s no chance that the iPhone is going to get any significant market share. No chance."</a> or that <a href="http://www.webpronews.com/ballmer-android-does-not-compute-2008-11">"Google doesn’t exactly bubble to the top of the list of the toughest competitors we’ve got going in mobile."</a> but he has probably changed his mind by now.<br />
<br />
So what's a poor CEO of a waning tech power to do? Hoping that the shareholders will let you be CEO for a while longer is probably the first thing, but staying the course doesn't seem like it would do the trick at this point. Desperate times call for desperate measures. Here is my proposal: switch to Android.<br />
<br />
Or rather, fork Android. Windows 8 might be quite nice, but Android just has too much momentum at this point, and as a developer, supporting another platform for, let's say optimistically, another 5% market share just isn't worth it. But if Microsoft comes out with their own version of Android, all apps developed for Google's Android will just work. Microsoft's Android will of course not come with the standard Google Apps, but the Microsoft Android apps aren't too bad and a <a href="http://www.theverge.com/2012/11/7/3612422/microsoft-office-mobile-ipad-iphone-android-screenshots">port of Office</a> seems to be in the works. As much as we like to talk about the demise of the desktop, Office & Outlook are for most professionals the tools of their trade.<br />
<br />
But the kicker is the patent angle. Microsoft makes 10-15 dollars on each Android phone <a href="http://bgr.com/2012/08/06/microsoft-android-patent-royalties-q2-2012-samsung-htc/">from most manufacturers</a>. They could easily offer Microsoft's Android for free. On a 100-200 dollar phone the patent charge makes the difference between profit and loss, so this should really move the needle.<br />
<br />
The advantage of this strategy is that Microsoft can take over an existing ecosystem while actively taking away from Google. Both the networks and the handset manufacturers are by now nervous about Google's influence, so no doubt they'd welcome the competition, especially if it means changing almost nothing; it's still Android.<br />
<br />
And you'd have to like the irony of Microsoft getting back into the game by way of Open Source.<br />
Douwe Osingahttp://www.blogger.com/profile/11629018494276839359noreply@blogger.com0