BERT, Transfer learning for dialogue, Deep Learning SOTA 2019, Gaussian Processes, VI, NLP lesson curricula, fast.ai lessons, AlphaStar, How to manage research teams

Feb 11, 2019

Hi all,

I hope you've had a good start to 2019! This newsletter covers a ton of material: slides from the creator of BERT and on using transfer learning for dialogue; an MIT lecture on the Deep Learning state of the art as of 2019, Gaussian Processes, and VI from authorities in each area; NLP lesson curricula from CMU, Stanford, and Berkeley and new lessons from fast.ai to kick-start your learning in the new year; an overview of some of the cool things people have been doing with BERT; a discussion of DeepMind's AlphaStar; key takeaways on how to manage research teams; new resources containing the state of the art, lecture slides, and a guide on how to debug your neural network; a long list of tools and exciting new ML packages; loads of articles and blog posts; and exciting new research papers.

A new section 🤓 This newsletter contains a new section, Meaningful Measures, inspired by FiveThirtyEight's Significant Digits. In this section, I'll list numbers related to ML and NLP that stuck out to me in the last month.

Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely.

I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.

If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.

Meaningful Measures 🔢

$1,000 Estimated compensation per review per paper if peer review was compensated. From Avoiding a Tragedy of the Commons in the Peer Review Process

89.25 kbits Online codelength of a state-of-the-art NLP model (BERT) on the MNLI dataset, which measures how well the model is able to compress the data (lower is better). For reference, a uniform encoding model is about 7x less efficient:

(# of examples) x log_2 (# of labels), which is 400,000 x log_2(3) ≈ 634 kbits

From Learning and Evaluating General Linguistic Intelligence. For more information, check out the Paper picks section below.
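For anyone who wants to check the arithmetic behind the uniform baseline, here is a minimal sketch. It only uses the figures quoted above (400,000 examples, 3 labels, 89.25 kbits for BERT); treating 1 kbit as 1,000 bits is my assumption.

```python
import math

# Figures quoted in the text above.
num_examples = 400_000          # rounded number of MNLI examples used in the text
num_labels = 3                  # entailment, neutral, contradiction
bert_codelength_kbits = 89.25   # online codelength reported for BERT

# A uniform code spends log2(#labels) bits on every example, regardless of the input.
uniform_bits = num_examples * math.log2(num_labels)
uniform_kbits = uniform_bits / 1000  # assumption: 1 kbit = 1,000 bits

ratio = uniform_kbits / bert_codelength_kbits

print(f"uniform codelength: {uniform_kbits:.0f} kbits")
print(f"uniform / BERT: {ratio:.1f}x")
```

The ratio is where the "about 7x" comes from.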

Slides 🖼

If Not Notebooks, Then What? 📓 AI2's Joel Grus discusses whether notebooks are good for reproducibility in this AAAI 2019 workshop talk. TL;DR: They are not.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 🐵 Jacob Devlin talks about BERT at the Stanford NLP seminar. Includes new results such as the effect of the masking strategy, using synthetic training data, plus some additional insights.

Transfer Learning for Natural Language Generation 🗣 Thomas Wolf compares and contrasts two winning approaches of the Conversational Intelligence Challenge 2 (ConvAI2) at NeurIPS 2018. Both employ transfer learning with the pretrained OpenAI Transformer but differ in their architectural modifications and objectives for adaptation.

What is Linguistics? 👨‍🏫 Nathan Schneider discusses important concepts in linguistics such as productivity (with the 'wug test') and areas of study in this introductory lecture.

Talks 🗣

Deep Learning: State of the Art (2019) 🏆 Lex Fridman gives a nice overview of exciting recent advances in this MIT lecture. Some highlights (clicking on them will take you to the corresponding position in the video):

"2018, in terms of Deep Learning, is the year of Natural Language Processing."

Segmentation annotation with Polygon-RNN++

"The future depends on some graduate student who is deeply suspicious of everything I have said." – Geoff Hinton

Gaussian Processes 👨‍🏫 Neil Lawrence, one of the authorities on Gaussian Processes, gives an introduction to them at the Machine Learning Summer School 2019 in Stellenbosch, South Africa. Additional notes for the talk and slides can be found here.

Variational Inference: Foundations and Innovations 👨‍💻 At the same summer school, David Blei discusses the ins and outs of variational inference.

Lessons to kick-start the new year 👩‍🎓

CMU: Neural Networks for NLP 👩‍🔬 This updated lecture by Graham Neubig covers much of what you need to get up to speed (and fill any knowledge gaps) on using neural networks in NLP. From structured prediction to RL, latent random variable models, dialogue, and machine reading, it is hard to find any missing content.

Stanford: Natural Language Processing with Deep Learning 🏛 Arguably the gold standard for cutting-edge NLP courses, the updated Stanford course now also covers more recent topics such as contextual representations, Transformers, and multi-task learning.

Berkeley: Applied Natural Language Processing 👩‍🔧 This course focuses on using existing NLP methods and libraries in Python (including scikit-learn, keras, gensim, and spacy) and applying and extending them to textual problems. Notebooks with exercises such as creating a tokenizer and comparing corpora can be found here.

fast.ai: Practical Deep Learning for Coders 👩‍💻 The 2019 edition of the fast.ai course includes 100% new material including applications that have never been covered by an introductory deep learning course before.

BERT things 🐵

If you're not familiar with BERT, check out the second slide deck above. Despite having been released only (already?) 4 months ago, BERT has taken the NLP world by storm. Here is a non-exhaustive list of cool things people have done recently with BERT:

AlphaStar 🌟

DeepMind's AI AlphaStar became the first AI to beat professional players in StarCraft II. In their blog post, DeepMind claims that "AlphaStar’s success [...] was in fact due to superior macro and micro-strategic decision-making, rather than superior click-rate, faster reaction times, or the raw interface."

DeepMind restricted the number of Actions Per Minute (APM) the agent could make over 5-second, 15-second, and 30-second intervals. As redditors during the AMA pointed out, this can still be exploited by the agent to result in superhuman bursts of APM of 1,500 for certain time periods (the fastest human player reaches 500 APM). Experienced players analyzing the games have similarly concluded that AlphaStar's superhuman micro-management of units mostly led to the victories. Check out this excellent post for a deeper analysis and possible reasons. Ars Technica also wrote a nuanced post about the matches.
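To see why a cap on the average rate over a window still leaves room for bursts, here is a toy sketch. The cap (50 actions per 5-second window, i.e. 600 APM on average) and the action trace are hypothetical numbers I picked for illustration, not DeepMind's actual limits or AlphaStar's actual behaviour.

```python
from bisect import bisect_left

def max_actions_in_window(timestamps_ms, window_ms):
    """Largest number of actions inside any window [t, t + window_ms)."""
    ts = sorted(timestamps_ms)
    best = 0
    # It is enough to try windows that start exactly at an action.
    for i, start in enumerate(ts):
        end = bisect_left(ts, start + window_ms)  # first action at or after the window's end
        best = max(best, end - i)
    return best

def to_apm(num_actions, window_ms):
    """Convert a count of actions in a window into actions per minute."""
    return num_actions * 60_000 / window_ms

# Hypothetical cap: at most 50 actions in any 5-second window.
CAP_WINDOW_MS = 5_000
CAP_ACTIONS = 50

# Hypothetical agent: spends the whole budget in the first 2 seconds, then idles.
actions_ms = [i * 40 for i in range(50)]  # 0 ms, 40 ms, ..., 1960 ms

within_cap = max_actions_in_window(actions_ms, CAP_WINDOW_MS) <= CAP_ACTIONS
average_apm = to_apm(max_actions_in_window(actions_ms, CAP_WINDOW_MS), CAP_WINDOW_MS)
burst_apm = to_apm(max_actions_in_window(actions_ms, 2_000), 2_000)

print(within_cap, average_apm, burst_apm)
```

The trace respects the 5-second cap, yet measured over the 2 seconds in which the actions actually happen, the rate is far above the average the cap suggests.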

AlphaStar mostly builds on existing techniques, likely with domain-specific modifications. From the post: "The neural network architecture applies a transformer torso to the units (similar to relational deep reinforcement learning), combined with a deep LSTM core, an auto-regressive policy head with a pointer network, and a centralised value baseline." The agent is initially trained with supervised learning from human games and then, similarly to AlphaGo, with self-play, this time in a league of diverse agents using population-based training.

Most of these techniques are already often used in NLP, such as the Transformer, LSTM, and pointer network, while others rely on reinforcement learning, which has seen mostly limited success in NLP. It is thus not immediately clear if these advances that rely on self-play and population-based training will bring improvements in a regime where sampling new training data is much more expensive.

Managing Research Teams 👩‍💻👨‍🔬👩‍🔧

Vincent Vanhoucke, principal scientist at Google, discusses challenges in managing research teams in a post split into two parts, Part I and Part II. He stresses a few (succinctly worded) aspects of leadership (my own annotations are in brackets):

Be an adamantium-plated shit umbrella (that deflects burdens from your team).
Shoot down shiny objects (that are distractions) on sight.
Kill (bad ideas, bad collaborations) what doesn’t work.
Become a patient, caring counselor.
Build the narrative (of your team's mission).
Elicit milestones.
De-risk all the things.
Provide cover for new ideas.
Be ready to turn on a dime (when a breakthrough idea comes around).
Lay bare all biases (related to titles, seniority, etc.).
Fight the flag planters (who lay claim to an idea without executing on it).
Identify and cultivate the green thumbs (great experimentalists who can get things to work).
Celebrate in proportion to impact.
Make unknowns and failures part of normalcy.

Resources 📚

Papers with Code with SOTA 📖 950+ ML tasks, 500+ evaluation tables (including SOTA results) and 8500+ papers with code. Probably the largest collection of NLP tasks I've seen, including 140+ tasks and 100 datasets.

LectureBank 🤓 A manually-collected dataset of lecture slides from 60 courses covering 5 different domains, including Natural Language Processing, Machine Learning, Artificial Intelligence, Deep Learning and Information Retrieval. Each slide file is annotated with topics and prerequisites. The dataset is further described in the AAAI 2019 paper.

Troubleshooting Deep Neural Networks: A Field Guide to Fixing Your Model 👩‍🔧 An extensive guide on how to debug your neural network by Josh Tobin.

Tools ⚒

Magnitude ↗ A Python library for using vector embeddings in machine learning models, particularly in NLP. It is primarily intended as a simpler and faster alternative to Gensim. It offers features like out-of-vocabulary lookups and streaming of large models over HTTP. For more information, see the EMNLP 2018 paper.

StanfordNLP 🔧 A Python NLP library by Stanford. It allows calling the CoreNLP Java package and inherits additional functionality such as constituency parsing, coreference resolution, and linguistic pattern matching. Models are built on top of PyTorch. It includes pretrained models in 53 languages from 73 treebanks.

Xfer ⛏ A library that enables transfer learning for deep neural networks implemented in MXNet. More information can be found in this blog post.

XLM ⚙️ The original PyTorch implementation of Cross-lingual Language Model Pretraining. It provides a cross-lingual implementation of BERT, with state-of-the-art results on XNLI and unsupervised MT.

Hanabi learning environment 🃏 A new research platform for the game of Hanabi as a new frontier for AI research. It provides an RL environment using an API similar to OpenAI Gym.

Articles and blog posts 📰

Industry

Facebook and the Technical University of Munich Announce New Independent TUM Institute for Ethics in Artificial Intelligence 🏫 The institute will be supported by an initial funding grant of $7.5 million over five years.

Live Transcribe ✍️ Google releases Live Transcribe, a free Android service that automatically transcribes conversations in real-time. It supports over 70 languages, which together cover more than 80% of the world's population.

Amazon Is Pushing Facial Technology That a Study Says Could Be Biased 👩 New studies show that Amazon's systems are more biased against female and darker-skinned faces compared to similar services from IBM and Microsoft.

Articles

We analyzed 16,625 papers to figure out where AI is headed next 📉 An MIT Tech Review analysis suggests that the era of deep learning may come to an end.

How to make algorithms fair when you don't know what they're doing 👾 A Wired article on Sandra Wachter's work on using counterfactual explanations to reveal how algorithms come to their decisions.

Your Next Operating System Will Look Like You, Make You Laugh and Remember That You Hate Cilantro 🤖 An Entrepreneur article that looks at the future of operating systems. The article argues that as NLP improves, it will become commoditized and that large companies "will compete on personality, on character, on the "soul" of their (OS) character".

Job loss due to AI — How bad is it going to be? 👩‍💼 An in-depth article by SkyNet Today that argues that the impact of AI on jobs in the near future will not be significantly more disruptive than the impact of automation in the past.

This year’s Super Bowl commercials showed us that tech isn’t that great 🏈 A collection of AI-related Super Bowl ads that range from cringe-worthy to unsettling.

Blog posts

You don't know JAX 👩‍🎓 A brief tutorial covering the basics of JAX, the Python library augmenting numpy with things like autograd, by Colin Raffel.

Evaluating Text Output in NLP: BLEU at your own risk 👩‍💻 Rachael Tatman discusses in-depth the problems with BLEU, the most common automatic metric used for evaluating machine translation systems. She also points to peer reviewed papers and other resources and highlights possible alternatives.

How to teach Git 👩‍🏫 Rachel M. Carmena explains the basics of git with some intuitive drawings that illustrate what goes on in git under the hood. These illustrations should help facilitate teaching git to people working with version control for the first time.

Think your Data Different 📊 A brief overview of when graph embeddings such as node2vec are more helpful than word embedding methods such as word2vec by Zohar Komarovsky and Yoel Zeldes.

The best of GAN papers in the year 2018 part 2 🎨 The second post by Damian Bogunowicz highlighting three more GAN models of 2018: BigGAN, relativistic GAN, and ESRGAN.

Deep Multi-Task Learning – 3 Lessons Learned ⚒ Zohar Komarovsky shares three lessons learned when using multi-task learning with deep neural networks related to: 1) combining losses; 2) tuning learning rates; and 3) using estimates as features.

Generalized Language Models 🤖 Lilian Weng writes about the recent generation of language model-based methods in NLP.

Five Things That Scare Me About AI 😨 Rachel Thomas writes about 5 things that scare her about AI:

Paper picks

Learning and Evaluating General Linguistic Intelligence (arXiv 2019) This paper evaluates current methods on their general linguistic intelligence, which the authors define as the ability to reuse previously acquired knowledge about language to adapt to new tasks quickly. They propose a new evaluation metric based on online encoding of data that quantifies how quickly an existing agent learns a new task and show that even state-of-the-art models still require a lot of in-domain training examples, are prone to catastrophic forgetting, and overfit to biases of datasets. Overall, this paper is a valuable reminder that even though there has been a lot of progress in the field, we are still some distance away from general linguistically intelligent models.
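To make the idea of an online codelength a bit more concrete, here is a toy sketch of online (prequential) coding. It is not the paper's procedure: the "model" is just a Laplace-smoothed label counter that ignores the inputs, and the label stream is made up. The point is only that each example is paid for with the model as it was before seeing that example, so a model that learns quickly ends up with a shorter total code.

```python
import math
from collections import Counter

def online_codelength_bits(labels, num_classes):
    """Sum of -log2 p(label) where p comes from the labels seen *before* each example."""
    counts = Counter()
    total_bits = 0.0
    for seen, y in enumerate(labels):
        # Laplace-smoothed predictive probability from the examples seen so far.
        p = (counts[y] + 1) / (seen + num_classes)
        total_bits += -math.log2(p)
        counts[y] += 1  # "train" on the example only after paying for it
    return total_bits

# Hypothetical, heavily skewed label stream (toy data, not MNLI).
labels = ["entailment"] * 90 + ["neutral"] * 5 + ["contradiction"] * 5

learned_bits = online_codelength_bits(labels, num_classes=3)
uniform_bits = len(labels) * math.log2(3)

print(f"online: {learned_bits:.1f} bits, uniform: {uniform_bits:.1f} bits")
```

On this skewed stream the counter quickly picks up the label bias and beats the uniform code; the very first example always costs log_2(3) bits because nothing has been learned yet.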

Natural Questions: a Benchmark for Question Answering Research (TACL 2019) This paper presents a new QA dataset of 300,000+ naturally occurring questions posed to Google search. The annotators were presented with a Wikipedia page from the top 5 search results; each annotator provided a long answer (a paragraph) and a short answer (one or more entities) or null if no long/short answer is available. ~300,000 examples have single annotations; ~8k examples with 5-way annotations are used as dev data; ~8k examples with 5-way annotations are used as test data. The authors analyze 25-way annotations on 302 examples to estimate the variability of the data. This looks to be a very challenging QA dataset, which, in contrast to many widely used datasets, consists of naturally occurring questions that were asked without any specific paragraph or document in mind. It is thus less likely to contain biases that can be exploited by current models.

code2seq: Generating Sequences from Structured Representations of Code (ICLR 2019) This ML-on-code paper proposes code2seq, a method that represents a code snippet as the set of compositional paths in its abstract syntax tree; at decoding time, it uses attention to select the relevant paths. A website is available that demonstrates the capabilities of the model. This is a cool example of what can be achieved with a more domain-specific representation.
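If "paths in the abstract syntax tree" sounds abstract, here is a toy sketch of what such a representation can look like. It uses Python's built-in ast module on a Python snippet and simply enumerates every terminal-to-terminal path; the choice of terminals (names, constants, arguments) and the helper names are my own assumptions, and this is not the paper's extraction pipeline or model.

```python
import ast
from itertools import combinations

# Assumption for this sketch: these node types count as terminals (tokens).
TERMINALS = (ast.Name, ast.Constant, ast.arg)

def terminal_value(node):
    """Readable token for a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)

def root_to_terminal(node, prefix=()):
    """Yield the chain of AST nodes from the root down to each terminal."""
    chain = prefix + (node,)
    if isinstance(node, TERMINALS):
        yield chain
        return
    for child in ast.iter_child_nodes(node):
        yield from root_to_terminal(child, chain)

def ast_paths(source):
    """Return (start token, node-type path, end token) for every pair of terminals."""
    chains = list(root_to_terminal(ast.parse(source)))
    result = []
    for a, b in combinations(chains, 2):
        # Count the ancestors shared by both terminals; the last one is their
        # lowest common ancestor.
        shared = 0
        while shared < min(len(a), len(b)) and a[shared] is b[shared]:
            shared += 1
        up = [type(n).__name__ for n in reversed(a[shared:])]
        top = type(a[shared - 1]).__name__
        down = [type(n).__name__ for n in b[shared:]]
        result.append((terminal_value(a[-1]), up + [top] + down, terminal_value(b[-1])))
    return result

paths = ast_paths("x = y + 1")
for start, path, end in paths:
    print(start, " -> ".join(path), end)
```

Each triple describes how two tokens relate structurally, for example that y and 1 are joined by a binary operation, which is information a flat token sequence does not make explicit.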

NLP News