Challenges in Few-shot learning; 2019 predictions; JAX; Explainable models; MT reading list; Foundations of ML; AI Index 2018; Karen Sparck Jones; Analysis methods survey; ICLR 2019 rejects
Happy 2019 everyone 🎉! May your losses decrease consistently, your models generalize to out-of-domain data, and your annotators have high agreement!
This newsletter covers slides on challenges in few-shot learning; predictions for 2019; JAX: autograd + XLA for numpy; lots of resources, such as a guide to making models explainable, an MT reading list, a curriculum on the foundations of ML, and key takeaways from the AI Index 2018; interviews and articles about pioneers in ML and NLP, such as Karen Sparck Jones and Demis Hassabis; lots of interesting articles; papers on analysis methods and satire; and exciting papers rejected from ICLR 2019.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Slides 🖼
Thoughts on progress made and challenges ahead in few-shot learning 🤖 Hugo Larochelle shares his thoughts in this slide deck. The main takeaways:
If you don’t evaluate on never-seen problems/datasets, it’s not meta-learning!
The meta-training distribution of episodes can make a big difference (at least for current methods).
Using "regular training" as initialization makes a big difference.
MAML (a popular meta-learning method) needs to be adjusted to be more robust; a minimal sketch of MAML's inner/outer loop follows below.
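To make the last point concrete, here is a minimal sketch of MAML's two nested loops: adapt a copy of the parameters on each task, then update the shared initialization using the post-adaptation gradients. The quadratic toy loss and the first-order approximation (as in first-order MAML) are my simplifications, not from Larochelle's slides:

```python
import numpy as np

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.01):
    """One meta-update of MAML on a toy quadratic loss ||theta - target||^2.

    Uses the first-order approximation (first-order MAML): the meta-gradient
    is each task's loss gradient evaluated at the adapted parameters.
    """
    meta_grad = np.zeros_like(theta)
    for target in tasks:  # each "task" asks theta to match a different target
        grad_inner = 2 * (theta - target)              # inner-loop gradient
        theta_adapted = theta - inner_lr * grad_inner  # task-specific adaptation
        meta_grad += 2 * (theta_adapted - target)      # post-adaptation gradient
    return theta - outer_lr * meta_grad / len(tasks)   # outer-loop update

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(100):
    theta = maml_step(theta, tasks)
print(theta)  # settles between the task optima: a good initialization for both
```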
2019 🎆
The state of the art in NLP is currently dominated by models employing ELMo, ULMFiT, BERT, or similar representations. We should see additional boosts in 2019 as we a) figure out how to use these representations more effectively and b) develop representations that capture complementary information. Some first glimpses of this (though no details are available yet):
A model by Tencent AI Lab employing BERT and other methods tops AI2's challenging ARC dataset leaderboard.
The inevitable Bigbird further improves the state of the art on GLUE.
However, I sincerely hope that 2019 will not only be about slightly improving numbers on a leaderboard. I'm looking forward to more advances in understanding models, algorithmic innovations, and theoretical insights. If you're interested in what else I'm excited about, you can read here about my 10 favourite ideas of 2018 (which we'll likely see more of in 2019) and listen to my highlights in this TWiML&AI podcast.
Tools and libraries 🛠
JAX: Autograd and XLA ⚙️ Numpy is nice for prototyping operations but for developing full-fledged models, you still need to use a mature DL library. Right? JAX is poised to change this: It enables you to use automatic differentiation with numpy and allows you to run your numpy programs on GPU. If you're most comfortable with numpy, this is worth checking out.
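To see why this is appealing, here is a minimal sketch of the two features JAX advertises, automatic differentiation and XLA compilation, applied to ordinary numpy-style code (the function and data are mine, just for illustration):

```python
import jax.numpy as jnp
from jax import grad, jit

def loss(w, x, y):
    # Squared error of a linear model, written exactly like numpy code.
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)

# grad returns a function computing d(loss)/dw; jit compiles it with XLA,
# so the same code runs on CPU or GPU.
loss_grad = jit(grad(loss))

w = jnp.array([1.0, -1.0])
x = jnp.array([[2.0, 0.5], [1.0, 3.0]])
y = jnp.array([1.0, 0.0])
print(loss_grad(w, x, y))  # gradient w.r.t. w, no manual derivation needed
```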
Resources 📚
A Guide for Making Black Box Models Explainable 👩🏫 This book by Christoph Molnar provides ML practitioners with the tools to make their models interpretable. It is available completely in the browser, which makes reading a breeze. It focuses less on NLP and computer vision and more on ML models for tabular data. If you are interested in better understanding NLP models, then check out the survey paper in the Paper Picks section👇.
Machine Translation Reading List 🇬🇧➡️🇩🇪 This reading list by the Tsinghua NLP Group provides a Who's Who of relevant papers in MT. It is regularly updated, so be sure to bookmark it if you're interested in MT.
Foundations of Machine Learning 🏛 This ML curriculum by Delta Analytics teaches you the fundamentals of ML, from basic building blocks all the way to NLP. A great starting point for teaching.
AI Index 2018 🤖 This year's report updates last year's metrics and provides global context around the AI conversation. Key takeaways:
AI != CS. The growth of the number of published AI papers outstrips the growth of CS papers (p. 9).
Europe is the largest publisher of AI papers (p. 10).
Government-affiliated AI papers in China increased 400% (p. 14).
US AI authors are cited 83% more than the global average (p. 17).
Lack of gender diversity: 80% of AI professors are male (p. 25) and men make up 71% of applicants for AI jobs in the US (p. 34).
The English-to-German BLEU score is 3.5x higher today than in 2008 (p. 51).
People in ML and NLP 👩👨
Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines 👩🔬 This New York Times obituary for NLP pioneer Karen Sparck Jones gives an overview of her work. Karen laid the foundations for much of NLP and IR, including search engines. She introduced the concept of inverse document frequency, which is widely used in IR and as a strong featurization baseline; a quick sketch follows below.
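For reference, here is what her inverse document frequency looks like in a few lines of Python, using the classic formulation (real IR systems typically add smoothing; the toy documents are mine):

```python
import math

def idf(term, documents):
    # Sparck Jones' inverse document frequency: idf(t) = log(N / df(t)).
    # Terms that occur in few documents get higher weight.
    n = len(documents)
    df = sum(1 for doc in documents if term in doc)
    return math.log(n / df) if df else 0.0

docs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "slept"]]
print(idf("the", docs))  # common term -> low weight: log(3/2)
print(idf("dog", docs))  # rare term -> high weight: log(3/1)
```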
Probably the Smartest Brain in Britain 👨💻 This article by The Times is a rare interview with DeepMind's co-founder Demis Hassabis that sheds light on his background, motivation, and aspirations.
The 'Godfather of Deep Learning' on Why We Need to Ensure AI Doesn't Just Benefit the Rich 👨🏫 Geoffrey Hinton discusses potential risks of AI, such as job loss and weaponization, and shares advice for AI research:
If you have intuitions that what people are doing is wrong and that there could be something better, you should follow your intuitions.
One year of deep learning 👨🎓 Sylvain Gugger recounts his journey into deep learning, from textbook writer to fast.ai's first research scientist. He also shares a lot of useful advice. My highlight:
If you’re new to the field and struggling with a part (or all) of it, remember no one had it easy. There’s always something you won’t know and that will be a challenge, but you will overcome it if you persevere.
Interview with Richard Socher 👨🔬 Sanyam Bhutani interviews Richard Socher. The interview touches on Richard's background, working in and getting started with NLP, ethics in AI, and lots of useful advice, such as:
Language understanding is far more nuanced (than computer vision) and there isn’t a single NLP task or data set that will solve for the whole complexity of language understanding.
Posts
On intelligence: its creation and understanding 💭 Surya Ganguli aptly summarizes the past progress, collaboration, and future directions of AI, neuroscience, and psychology. Challenges that AI + neuroscience may tackle hand-in-hand in the future include:
Biologically plausible credit assignment;
Incorporating synaptic complexity (moving beyond modelling each synapse as a single scalar weight);
Taking cues from systems-level modular brain architecture (continuously evolving but always staying adaptive; many different modules);
Unsupervised learning, transfer learning and curriculum design;
Building world models for understanding, planning, and active causal learning.
How to Present a Scientific Poster at a Mega-Conference 🖼 It can be overwhelming to present a poster to a large crowd. In addition, as an attendee, you want to be engaged by the presenter. Charles Sutton gives three great practical tips on how to present a poster to a large audience:
Be aware of what’s going on around you.
Use your body language.
Speak up.
Open-Ended Learning with POET 👾 This blog post accompanies the POET research paper by Uber AI. The proposed framework generates new and more challenging environments together with agents that are trained to solve them, leading to agents with increasingly complex and novel capabilities. I hope we see more accompanying blog posts full of examples and beautifully written paragraphs, such as the following:
Open-endedness [...] at its best [...] continue(s) to generate new tasks in a radiating tree of challenges indefinitely.
Evolution is in effect an open-ended process that in a single run created all forms of life on Earth.
Neural Ordinary Differential Equations 🏛 This blog post by Adrian Colyer explains Neural Ordinary Differential Equations, one of the best paper award winners at NeurIPS 2018. Also read the high-level summary by MIT Tech Review for some more context. The proposed model has a lot of potential for time series analysis, where it outperforms an RNN at predicting irregularly-sampled data. Similarly, it would be interesting to apply this model to NLP problems that require reasoning at different time scales.
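The core idea is compact enough to sketch: instead of stacking discrete layers, parameterize the derivative of the hidden state with a network and let an ODE solver produce the output. The paper uses adaptive solvers and the adjoint method for memory-efficient gradients; the fixed-step Euler integrator and tiny "network" below are my simplifications:

```python
import numpy as np

def dynamics(h, t, params):
    # A tiny stand-in network f(h, t; params) defining dh/dt.
    W, b = params
    return np.tanh(W @ h + b)

def odeint_euler(f, h0, t0, t1, params, steps=100):
    # Fixed-step Euler integration; the paper uses adaptive solvers instead.
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t, params)
        t = t + dt
    return h  # plays the role of the final layer's activations

params = (0.1 * np.eye(4), np.zeros(4))
h1 = odeint_euler(dynamics, np.ones(4), 0.0, 1.0, params)
print(h1)
```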
Tech Giants, Gorging on AI Professors Is Bad for You 👩🏫 This Bloomberg Opinion piece makes the point that tech companies should refrain from hiring too many AI professors as otherwise no one will be there to teach the next generation of innovators in AI.
Tensor Considered Harmful ⚠️ Harvard's Alexander Rush argues in this piece (with lots of examples!) that the Tensor class used in many DL libraries is broken, as it leads to bad habits such as exposing private dimension ordering and keeping type information in documentation. He proposes a proof-of-concept alternative, named tensors, in which every dimension carries a name.
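To give a flavour of the idea, here is a toy wrapper with named dimensions; this is my own illustration, not the API of Rush's namedtensor prototype:

```python
import numpy as np

class NamedTensor:
    """Toy named-dimension wrapper (illustration only, not namedtensor's API)."""
    def __init__(self, values, names):
        assert values.ndim == len(names)
        self.values, self.names = values, tuple(names)

    def sum(self, name):
        # Reduce over a dimension by name instead of a fragile integer axis.
        axis = self.names.index(name)
        rest = self.names[:axis] + self.names[axis + 1:]
        return NamedTensor(self.values.sum(axis=axis), rest)

# The reader of this code never has to remember which axis is which.
images = NamedTensor(np.zeros((32, 64, 64, 3)),
                     ("batch", "height", "width", "channel"))
pooled = images.sum("height").sum("width")
print(pooled.names)  # ('batch', 'channel')
```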
Children Are Using Emoji for Digital-Age Language Learning 💜💙💛 This Wired article looks at how children use emojis to learn language. Come for the emojis, stay for the insights on language learning.
Paper picks 📄
Analysis Methods in Neural Language Processing: A Survey (TACL 2019) This is a clear and well-written survey on methods to analyze, interpret, and better understand neural networks in NLP. It gives a concise overview of the main research directions:
work that targets particular linguistic phenomena (mainly by creating datasets that test for a particular characteristic and by training 'diagnostic classifiers' on the model representations; see the probing sketch after this list);
visualization methods;
challenge sets;
adversarial examples;
and explaining model predictions.
Also well worth checking out are the supplementary tables, which provide comprehensive categorizations of existing methods.
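As a concrete example of the first direction, a diagnostic classifier (or "probe") is typically just a simple classifier trained on frozen model representations to predict a linguistic property; the random placeholder data below stands in for real encoder states and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholders: in practice these are frozen encoder states and property
# labels (e.g. part-of-speech tags) for the same tokens or sentences.
reps = np.random.randn(1000, 768)
labels = np.random.randint(0, 5, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(reps, labels, test_size=0.2)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# High held-out accuracy suggests (with the caveats the survey discusses)
# that the representations encode the property.
print("probe accuracy:", probe.score(X_te, y_te))
```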
Reverse-Engineering Satire, or "Paper on Computational Humor Accepted Despite Making Serious Advance" (AAAI 2019) Yep, that's the actual paper title. The paper builds a parallel corpus of satirical and serious headlines in a clever way: based on the insight that it is easier to remove the humour from a headline than to create a satirical headline from scratch, the authors create a game (which can be played here) that asks users to replace words in order to remove the humour from The Onion headlines. They analyze the resulting corpus and find that most satirical headlines rest on a false analogy: the entities in the serious and the parallel satirical headline share a common property; for the serious headline, a statement about this property is entailed, while for the satirical headline it is not.
ICLR 2019 rejects worth reading 📝
"Everybody is interested in pigeons." - A publisher's feedback to Charles Darwin on the manuscript of On the Origin of Species, suggesting he should write a book about pigeons instead.
Peer review is an imprecise process and gems may sometimes fall through the cracks. Here are three papers that I found interesting, but which were rejected from ICLR 2019.
Transformer-XL: Language Modeling with Longer-Term Dependency This paper proposes a language model based on the Transformer that is a lot faster than a vanilla Transformer. It also proposes a new way to scale the Transformer beyond contexts of fixed length: segment-level recurrence, where the model attends over cached hidden states from the previous segment (sketched below). The model achieves state of the art on WikiText-103, the 1B Word Benchmark, PTB, and enwik8. The authors also propose a new metric to analyze the length of the dependencies a model can capture. With language models being so useful, new advances in language modelling and stronger language models will play an important role in the future.
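Here is a concept-level sketch of that recurrence: queries come only from the current segment, while keys and values also cover the cached previous segment, which is treated as a constant (no gradients flow into it). This is my own single-head simplification; the paper additionally uses relative positional encodings, omitted here:

```python
import numpy as np

def attend_with_memory(h_current, memory):
    # Keys/values span [cached previous segment; current segment];
    # queries come from the current segment only.
    context = np.concatenate([memory, h_current], axis=0)
    scores = h_current @ context.T / np.sqrt(h_current.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ context

seg1 = np.random.randn(8, 16)  # hidden states of the previous segment
seg2 = np.random.randn(8, 16)  # hidden states of the current segment
out = attend_with_memory(seg2, memory=seg1)  # seg2 can look back into seg1
print(out.shape)  # (8, 16)
```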
Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling This paper performs a super comprehensive, large-scale study of different pretraining tasks combined with language model representations across the NLP tasks in GLUE, and evaluates many architectural and training choices, such as multi-task learning. Empirical studies like this are important for understanding what works and what doesn't. They play a key role in providing a common foundation to help advance the field.
Multi-task Learning with Gradient Communication This paper proposes an interesting new multi-task learning model that helps disentangle the representations of different tasks. It allows one task's model to use the gradients of the other tasks as features, so that the models can coordinate more effectively when learning representations (a loose sketch below). It's nice to see new ways of doing multi-task learning that go beyond just sharing parameters.
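As a very loose sketch of the general idea (my reading, not the paper's actual architecture): compute one task's gradient on the shared parameters and expose it to another task's model as extra input features, so the second task can condition on how the first is pulling the shared representation:

```python
import numpy as np

def task_grad(w_shared, x, y):
    # Gradient of a squared-error linear model w.r.t. the shared weights.
    return 2 * (x @ w_shared - y) * x

w_shared = np.zeros(4)
x_a, y_a = np.random.randn(4), 1.0   # a sample from task A
x_b, y_b = np.random.randn(4), -1.0  # a sample from task B

g_a = task_grad(w_shared, x_a, y_a)
# Task B's input is augmented with task A's gradient ("communication"):
x_b_augmented = np.concatenate([x_b, g_a])
print(x_b_augmented.shape)  # (8,): original features plus gradient features
```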