TensorFlow 2.0, PyTorch Dev Conference, DecaNLP, BERT, Annotated Encoder-Decoder, ICLR 2019 reading, fast.ai v1, AllenNLP v0.7, 10 writing tips, AutoML & Maths for ML books, TensorFlow NLP best practices
Hey all,
Welcome to this month's newsletter edition, which includes some cool video content about TensorFlow and PyTorch; in-depth content about encoder-decoders; BERT, probably the hottest encoder at the moment 🔥; ICLR 2019 reading suggestions; fast.ai and AllenNLP news; 10 tips to make you a more productive scientific writer; lots of resources including open-access books on AutoML and maths for ML; TensorFlow best practices for NLP; and many tools, articles, and blog posts.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Talks and presentations 🗣
The Natural Language Decathlon: Multitask Learning as Question Answering 🏅 Richard Socher talks about their recently published DecaNLP benchmark and discusses the limits of single-task learning: NLP requires many types of reasoning (logical, linguistic, emotional, visual, etc.), which motivates casting every task as question answering. If you're interested in multi-task learning, this is a talk to watch.
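To make that framing concrete, here is a tiny illustration of what DecaNLP-style (question, context, answer) triples look like; the exact question wordings below are illustrative, not copied from the paper.

```python
# Every task becomes a (question, context, answer) triple; the natural-language
# question tells the model which task to perform. Example phrasings are made up.
examples = [
    ("What is the translation from English to German?",
     "A great talk.", "Ein großartiger Vortrag."),
    ("Is this sentence positive or negative?",
     "The movie was a delight.", "positive"),
    ("What is the summary?",
     "<long article text>", "<short summary>"),
]
```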
TensorFlow 2.0 Changes 🏛 Aurélien Géron draws side-by-side comparisons between the upcoming TensorFlow 2.0 and PyTorch. TensorFlow 2.0 will make Eager mode a lot more prominent and will enable seamless switching between Eager and Graph mode. Sharing weights will get a lot easier (and more Keras-like) and tf.contrib will be cleaned up. All in all, lots to look forward to!
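As a rough sketch of that eager-first workflow (assuming a TF 2.0-style build where eager execution is the default and tf.function traces Python functions into graphs):

```python
import tensorflow as tf  # assumes a TF 2.0-style build with eager execution on by default

x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[0.5], [0.5]])

# Eager mode: ops run immediately and return concrete values.
print(tf.matmul(x, w).numpy())

# Wrapping a Python function in tf.function traces it into a graph for performance.
@tf.function
def forward(inputs):
    return tf.matmul(inputs, w)

print(forward(x).numpy())
```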
PyTorch Dev Conference Part 1 👩💻 The first PyTorch dev conference featured talks from Andrej Karpathy, AI2's Mark Neumann, fast.ai's Rachel Thomas, and many others. Another highlight is the Future of AI Software panel with Soumith Chintala, Jeremy Howard, Noah Goodman, and others.
What's in an encoder-decoder? 🤖
The Annotated Encoder Decoder 📝 The encoder-decoder with attention, which goes back to the seminal sequence-to-sequence learning paper and its subsequent improvement with attention, is a staple of current NLP systems. Joost Bastings provides an annotated walk-through of the encoder-decoder, in the spirit of the excellent Annotated Transformer. On the topic of Transformers, check out BERT in the Paper picks section below: a bidirectional Transformer language model that achieved state-of-the-art results across 11 NLP tasks.
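For readers who want the gist before opening the notebook, here is a minimal, generic sketch of additive (Bahdanau-style) attention in PyTorch; it is not the post's exact code, just the standard recipe it walks through.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score(query, key_j) = v^T tanh(W q + U k_j)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.key_layer = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.query_layer = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.energy_layer = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, query, keys, values, mask=None):
        # query: [batch, hidden]; keys, values: [batch, src_len, hidden]
        scores = self.energy_layer(torch.tanh(
            self.query_layer(query).unsqueeze(1) + self.key_layer(keys)
        )).squeeze(-1)                                   # [batch, src_len]
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        alphas = F.softmax(scores, dim=-1)               # attention weights
        context = torch.bmm(alphas.unsqueeze(1), values).squeeze(1)  # [batch, hidden]
        return context, alphas
```

The decoder computes such a context vector at every step and combines it with its hidden state to predict the next token.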
Towards Natural Language Semantic Code Search 💻 Beyond learning representations from text, we can also use an encoder-decoder to learn representations from code by predicting docstrings. GitHub Engineering describes how such a model can be used for semantic code search.
ICLR 2019 impressions
ICLR 2019 submissions are in. The BigGAN paper has phenomenal image generation results. Below you can find some interesting papers related to transfer learning for NLP.
Multi-task learning, domain adaptation, and semi-supervised learning:
DATNet: Dual Adversarial Transfer for Low-resource Named Entity Recognition
Semi-supervised Learning with Multi-Domain Sentiment Word Embeddings
Cross-lingual learning:
Zero-Resource Multilingual Model Transfer: Learning What to Share
Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification
Unsupervised Hyper-alignment for Multilingual Word Embeddings
Empirical observations on the instability of aligning word vector spaces with GANs
Diagnosing Language Inconsistency in Cross-Lingual Word Embeddings
The Missing Ingredient in Zero-Shot Neural Machine Translation
If you are interested in submissions related to other topics, you can use Stephen Merity's search tool.
New fast.ai and AllenNLP libraries
fast.ai v1 The new version of the fast.ai library provides a single interface to the most commonly used deep learning applications for vision, text, tabular data, time series, and collaborative filtering. In addition, fast.ai announced the launch of a new course.
AllenNLP v0.7 The new version of AllenNLP provides a new framework for training state-machine-based models, several examples of using it for semantic parsing, a model for neural open information extraction, and a graph-based semantic dependency parser. They've also released new tutorials, which are simply beautiful to look at.
Improving your writing productivity
This article describes ten simple rules that will make you a more productive scientific writer:
Define your writing time
Create a working environment that really works
Write first, edit later
Use triggers to develop a productive writing habit
Be accountable
Seek feedback and ask for what you want
Think about what you’re writing outside of your scheduled writing time
Practice, practice, practice
Manage your self-talk about writing
Reevaluate your writing practice often
Tools and implementations ⚒
jiant sentence representation learning toolkit 🔨 This toolkit was created at the 2018 JSALT Workshop by the General-Purpose Sentence Representation Learning team and can be used to run experiments that involve multitask and transfer learning across sentence-level NLP tasks.
What-If Tool 🔦 This tool enables the inspection of an ML model inside Tensorboard. It allows us to visualize results and to explore the effects of single features and counterfactual examples (i.e. the most similar example with a different prediction). It's most suitable for analyzing algorithmic fairness.
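The counterfactual idea itself is easy to hand-roll. Below is a tiny NumPy illustration of "most similar example with a different prediction"; it is not the What-If Tool's API, just the concept.

```python
import numpy as np

def nearest_counterfactual(x, x_pred, examples, preds):
    """Return the example closest to x (L2 distance) whose prediction differs from x's."""
    candidates = np.where(preds != x_pred)[0]
    if candidates.size == 0:
        return None
    distances = np.linalg.norm(examples[candidates] - x, axis=1)
    return examples[candidates[np.argmin(distances)]]
```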
Resources 📚
Good practices in Modern Tensorflow for NLP 🏋️ This notebook contains many best practices for doing NLP with TensorFlow, such as feeding and transforming data with tf.data, preprocessing, and model serving.
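As a taste of the tf.data side, a minimal input pipeline for variable-length token sequences might look like this (the example data and parameters are made up):

```python
import tensorflow as tf

# Pre-tokenized sentences as lists of word ids, plus a label each (toy data).
sentences = [[4, 12, 7], [9, 3], [5, 8, 2, 11]]
labels = [1, 0, 1]

def generator():
    for sentence, label in zip(sentences, labels):
        yield sentence, label

dataset = (tf.data.Dataset.from_generator(
               generator,
               output_types=(tf.int32, tf.int32),
               output_shapes=([None], []))
           .shuffle(buffer_size=1000)                     # shuffle examples
           .padded_batch(2, padded_shapes=([None], []))   # pad to longest sentence in the batch
           .prefetch(1))                                  # overlap input prep with training
```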
AutoML book 🤖 Automated machine learning (AutoML) encompasses much more than just Google's architecture search efforts. The openly available chapters of this book, written by one of the top AutoML groups, give you an overview of automatic hyperparameter optimization, meta-learning, and neural architecture search, as well as individual AutoML systems.
Maths for ML book 📋 This book aims to provide the necessary mathematical skills to read more advanced ML books and does so in a succinct and accessible manner.
How to visualize decision trees 🌳 This article is a master class in how decision trees can be visualized. In addition, it provides insights into the design of a visualization library.
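For a quick baseline before reaching for the article's library, scikit-learn's built-in exporter can already render a fitted tree to Graphviz DOT (a minimal sketch on the Iris data):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)

# export_graphviz returns DOT source that the graphviz tool can render to an image.
dot = export_graphviz(clf, out_file=None,
                      feature_names=iris.feature_names,
                      class_names=iris.target_names,
                      filled=True, rounded=True)
print(dot[:200])
```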
Counterfactual Regret Minimization ♣️ This article gives an in-depth overview of counterfactual regret minimization, which lies at the heart of DeepStack and Libratus, both of which recently defeated professionals in Heads-Up No-Limit Texas Hold'em.
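The core building block, regret matching, fits in a few lines. Here is a minimal sketch for rock-paper-scissors against a fixed, made-up opponent mix, in the spirit of standard CFR tutorials rather than DeepStack or Libratus themselves:

```python
import numpy as np

# Regret matching: play each action in proportion to its accumulated positive regret.
N_ACTIONS = 3  # rock, paper, scissors
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])       # PAYOFF[i, j]: payoff of playing i against j
opponent = np.array([0.4, 0.3, 0.3])    # fixed opponent strategy (assumed for illustration)

regret_sum = np.zeros(N_ACTIONS)
strategy_sum = np.zeros(N_ACTIONS)

for _ in range(10000):
    # Current strategy: normalized positive regrets (uniform if there are none).
    positive = np.maximum(regret_sum, 0)
    strategy = positive / positive.sum() if positive.sum() > 0 else np.ones(N_ACTIONS) / N_ACTIONS
    strategy_sum += strategy

    my_action = np.random.choice(N_ACTIONS, p=strategy)
    opp_action = np.random.choice(N_ACTIONS, p=opponent)

    # Regret of each action = its payoff minus the payoff of the action actually played.
    utilities = PAYOFF[:, opp_action]
    regret_sum += utilities - utilities[my_action]

avg_strategy = strategy_sum / strategy_sum.sum()
print(avg_strategy)  # the average strategy converges towards a best response to the opponent mix
```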
Articles and blog posts 📰
How AI technology can tame the scientific literature 👩🔬 This article gives an overview of the current landscape of tools that allow information extraction from scientific literature and explores how AI can be used to automatically generate and validate hypotheses.
Here's What You Need To Know About 'Artie's Adventure,' The VR/AI Experience Google Just Announced 🐶 This article explores how AI can bring deeper emotional engagement to virtual experiences by powering characters, allowing creators to focus on emotion.
Welcome to Voldemorting, the Ultimate SEO Dis 🕴 This is a beautiful Wired article about the recent practice of voldemorting, i.e. replacing a name with euphemisms or synonyms to deprive its owner of online attention and search traffic. If you like puns and wordplay, this article is for you. Also: who wants to create a voldemorting dataset/generator?
Career advice for recent Computer Science graduates 👩💻 Chip Huyen gives an overview of the pros and cons of doing a PhD, working for a startup, and working for a big company, and describes the factors that influenced her personal choice.
Publishing Negative Results in Machine Learning is like Proving Dragons don’t Exist 🐉 This short article describes why publishing negative results is hard and when they are actually publishable.
Machine Translation. From the cold war to Deep Learning ❄️ This article guides us through the history of machine translation, from its beginnings during the Cold War, through statistical and phrase-based MT, to today's deep learning-based systems.
Paper picks
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding This paper shows that we have still not reached the ceiling with regard to language model pretraining. In particular, a more expressive encoder (a bidirectional rather than a unidirectional Transformer) and a deeper model (24 layers) yield large gains. It is a striking example of what can be achieved with a well-executed pretrained language model. Among other results, the model achieves large improvements on SQuAD and super-human performance on SWAG, a benchmark for commonsense inference that was introduced just a couple of months ago. Have a look here for some comments from the author on Reddit.
Multi-Task Learning as Multi-Objective Optimization (NIPS 2018) This paper casts multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. Existing algorithms from the gradient-based multi-objective optimization literature scale poorly with the dimensionality of the gradients and the number of tasks. Instead, the authors propose an upper bound on the loss that can be optimized efficiently and prove that optimizing it yields a Pareto optimal solution under realistic assumptions. They evaluate on digit classification, scene understanding, and multi-label classification. Overall, this is a nice paper that brings a new, principled perspective to the current multi-task learning landscape.
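For intuition, the min-norm convex combination of two task gradients, the core step in gradient-based multi-objective methods, has a closed form. Here is a hedged NumPy sketch of just that step; the paper's full method additionally uses a Frank-Wolfe solver and the proposed upper bound to scale to many tasks and high-dimensional gradients.

```python
import numpy as np

def two_task_weight(g1, g2):
    """Weight alpha in [0, 1] minimizing ||alpha * g1 + (1 - alpha) * g2||^2."""
    diff = g1 - g2
    denom = np.dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: any convex combination works
    alpha = np.dot(g2 - g1, g2) / denom
    return float(np.clip(alpha, 0.0, 1.0))

# Combine per-task gradients of the shared parameters into a single update direction.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 2.0])
alpha = two_task_weight(g1, g2)
update = alpha * g1 + (1 - alpha) * g2   # [0.8, 0.4] for this toy example
```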
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model This paper proposes a Bayesian-inspired language model. The model first generates an embedding for each lexeme (token) using a language model; it then generates a spelling using a second, character-based language model conditioned on that embedding. This approach is motivated by the duality of patterning, i.e. the idea that the form of a word is separate from its usage. They handle the open vocabulary by treating UNK as just another lexeme and conditioning its spelling on the hidden state of the model.