Learning Meaning in NLP, Recurrence in NNs, Q&A with Yoshua Bengio, ML == pseudo science?, ImageNet < 18mins, Where's Waldo?, Frontiers of NLP
Hi all,
This is edition #️⃣3️⃣0️⃣ of this newsletter! This one discusses some super interesting topics:
How can we learn meaning in NLP?
Do we need recurrence in neural networks?
How to build a research lab (and do research) according to Yoshua Bengio?
Is ML a pseudo science?
How can we make ML more accessible?
As always, there are also some cool videos, implementations, blog posts, and research papers. Enjoy with the beverage of your choice! ☕️🍶🍸🍹
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Learning Meaning in NLP 🗣🤔
Last week, there was a nice discussion on Twitter about learning meaning in NLP. More specifically, the discussion focused on whether it is possible for a model that is only trained on raw text such as a language model to learn the meaning of a sentence. Similar to Matt Gardner, I'd argue that language modeling gives a non-zero signal for learning meaning, but needs to be augmented with more explicit inductive biases to capture a more comprehensive notion of meaning. Thomas Wolf wrote up an excellent overview of the discussion.
If you want an overview of current state-of-the-art language models, check out this AI Journal video about ULMFiT. The folks at feedly have also done a great job of showing how easy it is to apply these methods: in this post, they use ULMFiT with only 1,000 labeled samples to match the performance of fastText trained on 4 million Amazon reviews.
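If you want to try something similar yourself, here's a minimal sketch of the ULMFiT-style workflow with the fastai library. The API names (such as `language_model_learner`) follow fastai v1 and the CSV file names are assumptions on my part; adapt both to your fastai version and data:

```python
# Minimal ULMFiT-style sketch with fastai v1 (API names may differ in your version;
# file names are placeholders).
from fastai.text import (TextLMDataBunch, TextClasDataBunch, AWD_LSTM,
                         language_model_learner, text_classifier_learner)

# 1. Fine-tune a pretrained AWD-LSTM language model on unlabeled reviews.
data_lm = TextLMDataBunch.from_csv('data/', 'reviews_unlabeled.csv')
lm_learner = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
lm_learner.fit_one_cycle(1, 1e-2)
lm_learner.save_encoder('ft_encoder')

# 2. Train a classifier on top of the fine-tuned encoder using only a small
#    labeled sample (e.g. 1,000 reviews).
data_clas = TextClasDataBunch.from_csv('data/', 'reviews_labeled_1k.csv',
                                       vocab=data_lm.vocab)
clf_learner = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
clf_learner.load_encoder('ft_encoder')
clf_learner.fit_one_cycle(4, 1e-2)
```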
While learning the meaning of concepts benefits from grounding, logical words such as "if", "and", or even "no" don't have a referent; there's nothing you can point to in the world that is an "if" or an "and". This blog post gives an overview of the different schools of thought on how children learn such logical words, from "logical nativism" (innate logical concepts that enable the acquisition of such words) to probabilistic induction. It also sketches a new acquisition theory, social bootstrapping, which argues that children map these logical words to speech acts with specifically social functions. A better understanding of the psycholinguistic aspects of how children acquire language may ultimately help us design better computational models.
Do we need recurrence in neural networks? 🔃
There've been some recent discussions, mostly prompted by this paper (and this accompanying blog post), about whether we actually need RNNs or whether we can replace every RNN with a feed-forward neural network. While the observation that RNNs can be replaced with feed-forward NNs in certain settings is nothing new (the Transformer, for instance, is just a feed-forward model), the paper provides some nice theoretical results for "stable" RNNs (roughly, ones without exploding gradients) for which such a replacement is possible. Yoav Goldberg shares his thoughts on the results here.
However, the only reason it is possible to replace RNNs with feed-forward neural networks is that our current RNNs still suck at modeling long-term dependencies. The paper emphasizes this by highlighting RNNs' sensitivity to vanishing gradients; Chris Dyer made the same point in his workshop talk at ACL 2018, arguing that RNNs are biased towards sequential recency; and in this ICLR 2017 paper, the authors show that an LM that only uses the last five words is on par with state-of-the-art models.
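To make that last point concrete, here's a minimal PyTorch sketch (not the paper's actual model) of a feed-forward LM that only ever conditions on the last k words; there is no recurrence at all:

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Feed-forward LM over a fixed window of the last k words (sketch only)."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512, k=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(k * emb_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, context):            # context: (batch, k) word ids
        emb = self.embed(context)           # (batch, k, emb_dim)
        flat = emb.view(emb.size(0), -1)    # concatenate the k embeddings
        return self.mlp(flat)               # logits over the next word

# The prediction for position t depends only on words t-k .. t-1.
model = FixedWindowLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (32, 5)))
```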
A counterpoint comes from OpenAI Five, whose results show that large LSTM models can perform successfully on problems with long time horizons. The main takeaway, perhaps, is that in order to create models that better capture long-term dependencies, we need to evaluate our models on tasks that explicitly measure this capability, rather than on tasks like language modeling or sentiment analysis, which only require it implicitly. A good example of such an explicit task is modeling subject-verb agreement.
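As a rough illustration, a subject-verb agreement probe (in the spirit of Linzen et al., 2016) simply checks whether an LM prefers the correctly inflected verb after a prefix containing an intervening "attractor" noun. In the sketch below, `lm` and `vocab` are hypothetical placeholders for whatever model and vocabulary you use:

```python
import torch

def agreement_accuracy(lm, vocab, examples):
    """Fraction of prefixes for which the LM assigns a higher score to the
    correctly inflected verb. `lm` and `vocab` are placeholders for your own
    language model (returning next-word logits) and vocabulary dict."""
    correct = 0
    for prefix, good_verb, bad_verb in examples:
        ids = torch.tensor([[vocab[w] for w in prefix.split()]])
        logits = lm(ids)[0, -1]                     # logits for the next word
        if logits[vocab[good_verb]] > logits[vocab[bad_verb]]:
            correct += 1
    return correct / len(examples)

# The attractor noun "cabinet" is singular, but the verb must agree with the
# head noun "keys", which occurs earlier in the sentence.
examples = [("the keys to the cabinet", "are", "is")]
```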
How to build a research lab 🏛
Yoshua Bengio is probably known to anyone interested in ML. I found the replies Yoshua gave in this Q&A with CIFAR's Graham Taylor extremely lucid, insightful, and full of self-reflection. I think the answers are useful not only for junior faculty starting to build their own lab, but for anyone starting out in research or considering going into research. Highlights of the conversation for me include:
One thing I would’ve done differently is not disperse myself in different directions, going for the idea of the day and forgetting about longer term challenges.
First, it’s not just doing the research, it’s making it known. Going to workshops and conferences, visiting other labs. You don’t have to wait to be invited.
Listen to your gut. Many people lack the self-confidence necessary for that and they miss opportunities.
Is Machine Learning a pseudo science? 🔮
With ML's focus on empirical results and recent prominent voices likening ML to alchemy, there's an ongoing debate about the right direction for the field. Provocative questions often bring out the best discussions, and so did this question on Quora: a first affirmative response by Sridhar Mahadevan was subsequently strongly rebuked by both Leonid Boytsov and Slater Victoroff. Do read all the responses for different perspectives on the nature of ML as a field.
Making ML more accessible 🌍
I think one of the most important directions we can take as a community is to make ML more accessible and to empower more people to use it. Fast.ai is doing great work in this direction. Recently, a team of fast.ai and DIU researchers managed to train ImageNet models to 93% accuracy in just 18 minutes, using 16 public AWS cloud instances, each with 8 NVIDIA V100 GPUs, for $40 in total (see this blog post and reporting from MIT Tech Review).
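For context (and this is emphatically not the team's actual training script, just a generic sketch), multi-node training of this kind usually boils down to wrapping the model in PyTorch's DistributedDataParallel and launching one process per GPU:

```python
import torch
import torch.distributed as dist
import torchvision
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; rank and world size are set by the launcher
    # (e.g. `python -m torch.distributed.launch`). 16 nodes x 8 V100s = 128 GPUs.
    dist.init_process_group(backend='nccl')
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = torchvision.models.resnet50().cuda()
    model = DDP(model, device_ids=[local_rank])
    # ... build an ImageNet DataLoader with a DistributedSampler and train as
    # usual; gradients are averaged across all processes after each backward pass.

if __name__ == '__main__':
    main()
```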
One of my favourite ML use cases is this Japanese farmer sorting cucumbers with an Arduino, a Raspberry Pi, and TensorFlow. A Raspberry Pi and an ML library are really all you need to create a system that can, for instance, solve "Where's Waldo?" (check out the first video to see a creepy plastic hand pointing at Waldo).
The Deep Learning Indaba (and its satellite IndabaX events) is another great initiative to empower the ML community in Africa. We're organizing an NLP session at the event, which should have some interesting panel discussions (see below). If you have big questions you'd like to see asked, you can reply to this email and we'll do our best to incorporate them. We're also aiming to share slides and transcripts of the discussions after the event (not sure about video yet).
Videos
ICML 2018 videos 📺 In the last newsletter, I said I wasn't able to find the videos of the ICML presentations. Dhananjay Tomar thankfully pointed them out to me. Enjoy!
Robotics conference video tour 🤖 Skynet Today's Andrey Kurenkov recorded a video tour of his visit to the Robotics: Science and Systems (RSS) conference. I would love to see more tours like this of other conferences and events.
Resources and overviews
Model-based ML book 📖 An early access version of John Winn and Christopher Bishop's new ML book. The first chapter "introduces all the essential concepts of model-based machine learning, in the course of solving a murder." What better way to get started with learning ML?
Model scheduling ⏰ Naomi Saphra gives a comprehensive overview of different approaches to dynamically modifying a model's configuration during training.
Implementations
Cross-framework neural POS tagger comparison 🏛 Jonathan Kummerfeld demonstrates how a state-of-the-art POS tagger can be implemented in DyNet, PyTorch, and TensorFlow and highlights the similarities and differences between the frameworks.
Unsupervised MT 🇬🇧➡️🇩🇪 The open-source implementation of Facebook's EMNLP 2018 paper Phrase-Based & Neural Unsupervised Machine Translation.
Cool posts and articles
Everything is Dijkstra 🛣 Eric Jang shows that currency arbitrage (in finance), Q-learning (in RL), and path tracing (in computer graphics) can all be reduced to the classic Dijkstra's shortest path algorithm.
Deep Learning in NLP 🤖 Vered Shwartz discusses what Deep Learning has improved in NLP and which challenges remain.
The man behind Google's AutoML 👨🔬 This article portrays Quoc Le, co-founder of Google Brain and co-creator of breakthroughs such as sequence-to-sequence learning and neural architecture search.
Text-based adventures 🕹 This article discusses Microsoft’s TextWorld, "the OpenAI Gym of Language Learning Agents".
AI and improv 🕴 A New York Times article on how Piotr Mirowski and Kory Mathewson use a chatbot for improvisation.
Paper picks
Morphosyntactic Tagging with a Meta-BiLSTM Model over Context Sensitive Token Encodings (ACL 2018) Bohnet et al. propose a new model for POS tagging and morphological tagging that contextualizes both character and word representations. Their approach trains both a BiLSTM over characters and a BiLSTM over words with separate objective functions to predict the tags of the words. The hidden representations of both models are fed into a meta-BiLSTM, which processes them for prediction. The models are trained jointly using multi-task learning. Interestingly, performance degrades if the meta-BiLSTM is allowed to back-propagate into the word and character-based sub-networks.
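Here's a rough PyTorch sketch of the idea (not the authors' implementation; dimensions and details are illustrative): the word-level states and precomputed character-level states each feed their own tag classifier, and their detached concatenation feeds the meta-BiLSTM.

```python
import torch
import torch.nn as nn

class MetaBiLSTMTagger(nn.Module):
    """Sketch in the spirit of Bohnet et al. (2018); not the paper's exact model."""
    def __init__(self, word_vocab, char_hidden, word_hidden, meta_hidden,
                 n_tags, emb_dim=100):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, emb_dim)
        self.word_lstm = nn.LSTM(emb_dim, word_hidden,
                                 bidirectional=True, batch_first=True)
        # char_encodings below stand in for a char-BiLSTM run per token.
        self.meta_lstm = nn.LSTM(2 * word_hidden + 2 * char_hidden, meta_hidden,
                                 bidirectional=True, batch_first=True)
        self.word_out = nn.Linear(2 * word_hidden, n_tags)   # word-model objective
        self.char_out = nn.Linear(2 * char_hidden, n_tags)   # char-model objective
        self.meta_out = nn.Linear(2 * meta_hidden, n_tags)   # meta-model objective

    def forward(self, word_ids, char_encodings):
        word_states, _ = self.word_lstm(self.word_emb(word_ids))
        # Detach so the meta-BiLSTM does not back-propagate into the sub-networks
        # (the paper finds that allowing this hurts performance).
        meta_in = torch.cat([word_states.detach(), char_encodings.detach()], dim=-1)
        meta_states, _ = self.meta_lstm(meta_in)
        return (self.word_out(word_states),
                self.char_out(char_encodings),
                self.meta_out(meta_states))
```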
Multi-Task Learning for Sequence Tagging: An Empirical Study (COLING 2018) This study surveys multi-task learning for 11 sequence tagging tasks. The authors find that for about half of the tasks, jointly learning all 11 tasks improves upon both single task and pairwise models.
Rapid Adaptation of Neural Machine Translation to New Languages (EMNLP 2018) Neubig and Hu start out by training a massively multilingual model on 58 languages, which is then fine-tuned on low-resource language data. In order to reduce the risk of overfitting during fine-tuning, they propose to additionally train on data from a related language.