Marvel, Stanford & CMU NLP Playlists, Voynich, Bitter Lesson Vol. 2, ICLR 2019, Dialogue Demos
There should be something for everyone in this newsletter. NLP and superheroes team up. We have some superb playlists of the Stanford and CMU NLP courses. We discuss the claimed solution of the Voynich manuscript and look at a response to last month's Bitter Lesson. Finally, there are some summaries (and posters!) from ICLR 2019, some cool dialogue demos, and a selection of high-quality blog posts.
This year, we're organizing the first European NLP Summit (EurNLP). Submit an abstract of your ongoing or published work and join us on October 11 in London!
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
Deep Generative Models for Graphs: Methods & Applications 🔁 Stanford's Jure Leskovec discusses state-of-the-art graph-based neural networks.
Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling 🌁 An ICLR 2019 talk by Jacob Menick and Nal Kalchbrenner on techniques for generating high-quality images with pixel networks.
Stanford NLP Course Playlist 🏛 This YouTube playlist, which contains all recordings from the current iteration of Stanford's CS224N NLP course by Chris Manning and Abigail See, is a must-watch for everyone getting into (or catching up with) NLP.
CMU Neural Nets for NLP Playlist 🤖 CMU's Graham Neubig covers a potentially even wider range of topics in his NLP course, including dialog models, knowledge graphs, machine reading, search algorithms, and much more.
Full Stack Deep Learning videos 🥞 Full Stack Deep Learning aims to teach the full stack of getting to production-ready Deep Learning models. Lectures cover troubleshooting, testing and deployment, and research directions, with guest lectures by Jeremy Howard and Richard Socher.
In the spirit of Marvel's Avengers: Endgame, let's geek out and get heroic with NLP!
Reliving Avengers: Infinity War with spaCy and Natural Language Processing 🕴 This blog post is a nice demonstration of how to analyze a movie script with NLP to reveal, for instance, what each character holds dear.
Linguistics students create language for ‘Captain Marvel’ ✨ Knowledge of linguistics not only helps you to better understand languages, but can also enable you to dream up artificial languages. This article describes how two linguistics students created the language used in Captain Marvel.
Voynich solved? 📖
On the topic of artificial languages, the Voynich manuscript is an early 15th-century manuscript written in an unknown writing system. It stumped even Alan Turing and is one of the most famous unsolved cases in the history of cryptography. This month, Dr Gerard Cheshire from the University of Bristol claimed to have solved it. He claims that the manuscript is written in "proto-Romance", a language that predates today's Romance languages. Over the last 100 years, numerous theories have claimed to translate the manuscript, and none has been independently verified. This latest one, unfortunately, seems to be another failed attempt in this line.
What is interesting about the supposed discovery, though, is that it does not rely on modern cryptographic tools that employ statistics or ML, but only on classical linguistics and domain knowledge. If you're interested in cracking codes with NLP, check out some of Kevin Knight's work, in particular his ACL 2013 tutorial on decipherment.
The Bitter Lesson Vol. 2 ⏎
Remember Rich Sutton's Bitter Lesson that we discussed in the last edition of this newsletter? UvA's Max Welling provides a rebuttal.
In Do we still need models or just more data and compute?, Max Welling emphasizes the role of data and the bias-variance trade-off in ML: when you have enough data, you do not need much inductive bias; however, with sparse data, such bias will be useful. In particular, he stresses that generative models are better able to generalize to new domains. In many cases, a middle ground between using a generative model and a discriminative model is necessary.
Wasserstein GAN 🤺 This Depth First Learning curriculum by James Allingham will become your go-to resource for learning and understanding WGAN-GP and related GAN methods. With lots of reading and self-study material, this tutorial teaches you everything you need to know about one of the state-of-the-art GAN models.
Transfer NLP 🔧 A modular PyTorch library for transfer learning in NLP by the Feedly team. Have a look at this Colab notebook for examples of how to predict the category of a news article, among other things.
ICLR 2019 things 🏨
I didn't make it to ICLR 2019, but here are some great write-ups from people who did:
Top 8 trends from ICLR 2019 📈 Chip Huyen discusses inclusivity, unsupervised and transfer learning, retro ML, flagging RNNs, GANs, the lack of biologically inspired DL, the popularity of RL, and the forgettability of papers.
ICLR 2019 Notes 🗒 Another edition of David Abel's extensive (56 pages with references!) conference highlights, which focuses on keynotes, the Workshop on Structure & Priors in RL, and a selection of contributed talks.
Lastly, here are all the posters from ICLR 2019.
Dialogue demos 🔉
There've been some cool dialogue demos this month!
Talk to Transformer 🗣 This demo by Adam King lets you talk to the newest GPT-2 (345M parameters) model and is based on HuggingFace's PyTorch implementation of GPT-2.
ConvAI Demo 🤖 This demo by HuggingFace lets you train a bot similar to one that won the ConvAI competition at NeurIPS 2018. See below for the blog post on how to train one yourself.
Articles and blog posts 📰
Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision 🤑 A nice step-by-step guide on how to build a classifier that is able to identify anti-semitic tweets and leverages state-of-the-art techniques such as ULMFiT and Snorkel.
How to Automate Tasks on GitHub With Machine Learning for Fun and Profit 👩‍💻 A tutorial on how to build a GitHub app that predicts and applies issue labels using TensorFlow and public datasets. In particular, it leverages the little-known GH Archive, which records the public GitHub timeline and makes it easily accessible for further analysis.
An implementation guide to Word2Vec using NumPy and Google Sheets ⚙️ Pretrained language models are all the rage these days, but it is still educational to go back and implement earlier algorithms such as word2vec. This is a nice guide that helps you implement word2vec line by line in NumPy.
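To give a flavour of what such a from-scratch implementation involves, here is a minimal skip-gram sketch with a full softmax. It is not the code from the guide: it uses plain Python lists instead of NumPy to stay dependency-free, and the function name, hyperparameters, and toy corpus are my own assumptions for illustration.

```python
import math
import random


def train_skipgram(corpus, dim=8, window=1, lr=0.1, epochs=100, seed=0):
    """Train a toy skip-gram model with a full softmax (illustrative only)."""
    rng = random.Random(seed)
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # Input (center word) and output (context word) embeddings, each V x dim.
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    W_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
    # All (center, context) index pairs within the window.
    pairs = [
        (idx[corpus[i]], idx[corpus[j]])
        for i in range(len(corpus))
        for j in range(max(0, i - window), min(len(corpus), i + window + 1))
        if i != j
    ]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = W_in[center]  # hidden layer is just the center word's embedding
            scores = [sum(a * b for a, b in zip(h, W_out[o])) for o in range(V)]
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            probs = [e / z for e in exps]
            total -= math.log(probs[context])
            # Gradient of the loss w.r.t. the scores is (probs - one_hot).
            grad_h = [0.0] * dim
            for o in range(V):
                err = probs[o] - (1.0 if o == context else 0.0)
                for k in range(dim):
                    grad_h[k] += err * W_out[o][k]  # uses the pre-update weight
                    W_out[o][k] -= lr * err * h[k]
            for k in range(dim):
                h[k] -= lr * grad_h[k]  # updates W_in[center] in place
        losses.append(total / len(pairs))  # mean loss over this epoch
    return vocab, W_in, losses


toy_corpus = "the quick brown fox jumps over the lazy dog".split()
vocab, embeddings, losses = train_skipgram(toy_corpus)
print(f"mean loss: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

Each inner loop corresponds to a matrix-vector product in a NumPy version; real implementations also replace the full softmax with negative sampling or a hierarchical softmax to scale to large vocabularies.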
Now is the Time for Reinforcement Learning on Real Robots 🤖 Alex Kendall argues—contrary to posts such as Deep Reinforcement Learning Doesn't Work Yet—that robotics will soon have its ImageNet moment, i.e. that end-to-end machine learning will enable many real-world applications.
A Recipe for Training Neural Networks 👩‍🍳 A new blog post by Andrej Karpathy that gives a plethora of tips for successfully training neural networks. Even if you're training neural networks professionally, you're bound to discover something new in this wisdom-filled selection of useful guidelines.
Can you compare perplexity across different segmentations? 👯‍♂️ Sebastian Mielke shows in this blog post how to compare perplexity values across models that use different tokenizations, such as subwords and words.
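As I understand the core idea, per-token perplexity depends on how many tokens a segmentation produces, whereas the total log-likelihood of the text does not, so you can compare models by normalizing that total over a shared unit such as characters. Here is a small sketch under that assumption; the function name and the token probabilities are made up for illustration.

```python
import math


def per_unit_perplexity(token_log_probs, num_units):
    """Perplexity of a text, normalized by a chosen number of units.

    token_log_probs: natural-log probabilities the model assigned to its own tokens.
    num_units: count of the normalization unit (the model's tokens, words, characters, ...).
    """
    total_nll = -sum(token_log_probs)  # negative log-likelihood of the whole text
    return math.exp(total_nll / num_units)


text = "the cat sat"
# Hypothetical models that assign the same total probability (0.001) to the text,
# but split it into a different number of tokens.
word_model = [math.log(p) for p in (0.1, 0.05, 0.2)]          # 3 word tokens
subword_model = [math.log(p) for p in (0.1, 0.5, 0.1, 0.2)]   # 4 subword tokens

# Per-token perplexity differs although both models explain the text equally well.
ppl_word_tokens = per_unit_perplexity(word_model, len(word_model))           # about 10
ppl_subword_tokens = per_unit_perplexity(subword_model, len(subword_model))  # about 5.62

# Normalizing both by the same unit (here: characters) makes them comparable.
ppl_word_chars = per_unit_perplexity(word_model, len(text))
ppl_subword_chars = per_unit_perplexity(subword_model, len(text))
print(ppl_word_tokens, ppl_subword_tokens, ppl_word_chars, ppl_subword_chars)
```

The subword model looks better per token only because it spreads the same probability mass over more tokens; once both are normalized per character, the two numbers coincide.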
Advice for Better Blog Posts ✍️ Writing blog posts is a great investment, but getting started can be challenging. Rachel Thomas gives some great pieces of advice on how to write better blog posts. In particular, she calls for more people to write blog posts that accompany their papers. I wholeheartedly agree. One aspect to emphasize is that a blog post that accompanies a paper should not just be a summary. It should provide additional context, intuition, examples, and ideally a discussion of applications and future directions, all things that do not usually fit in the paper. One recent blog post that does this nicely is the following:
Universal Transformers 🌍 Mostafa Dehghani provides additional context regarding the shortcomings of Transformers and relates their Universal Transformer to other models.
Deep learning generalizes because the parameter-function map is biased towards simple functions 📉 Good research should be interactive and make ideas come alive. The presentation of this paper comes close: By framing it as a dialogue between the reader and the author, the potentially dry material on the optimization and generalization ability of neural networks is made much more engaging.
How to build a State-of-the-Art Conversational AI with Transfer Learning 🦄 In this fantastic write-up, HuggingFace's Thomas Wolf shows how to reproduce their winner of the NeurIPS 2018 dialogue competition in less than 250 lines of code and train a model on a cloud instance for less than $20. You can interact with the trained model in the demo above.
Domain Randomization for Sim2Real Transfer 🎲 I am excited every time Lilian Weng publishes a new blog post. This one, which discusses a technique to close the simulation-to-real world gap in robotics, does not disappoint.
ConceptNet 5.7 released 📚 ConceptNet is a resource for integrating common sense information into models. Recently, it has been used by the 1st and 4th best systems at the story understanding task at SemEval 2018 and by the 2nd best system at the SemEval 2018 task on recognizing differences in attributes, and it helped achieve a new state of the art in the Story Cloze Test in November 2018. The new version covers new Japanese knowledge, word senses, and a new database.
Paper picks 📄
Unsupervised Data Augmentation (arXiv) This paper proposes a semi-supervised data augmentation method that enforces consistency between the model's prediction on an unlabeled example and its prediction on an augmented version of that example. This is similar to earlier work discussed in this blog post. The main difference is that instead of using noise to augment the example, it uses other data augmentation techniques, such as backtranslation or AutoAugment for images. This works surprisingly well on both natural language and vision tasks.
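The consistency idea can be sketched in a few lines. This is my own simplified illustration rather than the paper's implementation: I assume a KL divergence between the two predicted distributions as the consistency term, and the function names and the weight lam are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_orig, logits_aug):
    """Penalize disagreement between predictions on an unlabeled example
    and on its augmented version (e.g. a backtranslated sentence).
    In a real training loop, the prediction on the original example would
    typically be treated as a fixed target (no gradient flows through it)."""
    return kl_divergence(softmax(logits_orig), softmax(logits_aug))


def total_loss(supervised_loss, logits_orig, logits_aug, lam=1.0):
    """Supervised loss on labeled data plus a weighted consistency term.
    lam is a hypothetical weighting hyperparameter."""
    return supervised_loss + lam * consistency_loss(logits_orig, logits_aug)


# Hypothetical logits for one unlabeled example and two augmented versions.
agree = consistency_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
disagree = consistency_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
print(agree, disagree)
```

No labels are needed for the consistency term, which is what lets the method exploit large amounts of unlabeled data.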
Inoculation by Fine-Tuning: A Method for Analyzing Challenge Datasets (NAACL) This paper proposes a more nuanced take on the recently popular practice of evaluating on challenge datasets. Instead of using the dataset just for evaluation, the model is fine-tuned on a few challenge examples. The authors show that after slight exposure, some of the datasets are no longer challenging, while others remain difficult. The datasets that remain difficult highlight a weakness on the model side and should thus be the focus when analyzing models' capabilities.
On Difficulties of Cross-Lingual Transfer with Order Differences: A Case Study on Dependency Parsing (NAACL) This paper studies an issue that is often neglected in cross-lingual learning: the role of word order in the inductive bias of the model. When transferring from English to similar languages, as is often the case, this is not an issue as word order is very similar. It might be a challenge, however, if we transfer to more distant languages. The authors study transferring dependency parsing models from English to 30 other languages. They find that the order-sensitive RNNs transfer well to languages that are similar to English, while order-agnostic self-attention-based methods work better overall and transfer better to distant languages. This is another lesson that, on the whole, it is beneficial to consider a diverse set of languages when evaluating your models.
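To make the difference in inductive bias concrete, here is a toy sketch. These are not the paper's models: I assume a bare dot-product self-attention layer with identity projections and no positional information, and a simple linear recurrence standing in for an RNN. Reordering the input merely reorders the attention outputs, while the recurrent state changes.

```python
import math


def self_attention(xs):
    """Single-head dot-product self-attention with identity projections
    and no positional encoding (toy illustration)."""
    d = len(xs[0])
    outputs = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        outputs.append(
            [sum(w / z * v[j] for w, v in zip(weights, xs)) for j in range(d)]
        )
    return outputs


def rnn_final_state(xs, decay=0.5):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t; returns the last state."""
    h = [0.0] * len(xs[0])
    for x in xs:
        h = [decay * hj + xj for hj, xj in zip(h, x)]
    return h


# Hypothetical word vectors for a three-word sentence and its reversal.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
reordered = sentence[::-1]

attn_a = self_attention(sentence)
attn_b = self_attention(reordered)
# Each word gets the same representation regardless of where it appears.
same_per_word = all(
    abs(p - q) < 1e-9
    for row_a, row_b in zip(attn_a[::-1], attn_b)
    for p, q in zip(row_a, row_b)
)
print(same_per_word)                      # attention: order-agnostic
print(rnn_final_state(sentence))          # [1.25, 1.5]
print(rnn_final_state(reordered))         # [1.25, 0.75]
```

A model that bakes in the source language's word order, like the recurrence here, has more to unlearn when the target language orders its words differently.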