ML on code, Understanding RNNs, Deep Latent Variable Models, Writing Code for NLP Research, Quo vadis, NLP?, Democratizing AI, ML Cheatsheets, Spinning Up in Deep RL, Papers with Code, Unsupervised MT, Multilingual BERT, Graph Networks, AutoML
Hey all,
Welcome to this month’s newsletter edition! There's a lot of cool stuff this time, so better take a break and enjoy this edition with your beverage of choice ☕️🍵🍺🍹. The content includes talks by Stephen Wolfram and Greg Brockman; slides about ML on code, understanding RNNs, deep latent variable models, writing code for NLP research, transfer learning, and how to write good reviews; my take on what's next for NLP; content on democratizing AI; ML cheatsheets, Deep RL resources, and Papers with Code; implementations of unsupervised MT, multilingual BERT, RL libraries by Facebook and Google, Graph networks, and AutoML; and as always lots of cool articles, news, and papers.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Talks
Computational Universe 🛰 Stephen Wolfram's 2h lecture for MIT's AGI course. He talks about what he learned from building Wolfram Alpha, in particular about natural language understanding.
For natural language understanding, the most important thing is knowing a lot of stuff about the world.
Can we rule out near-term AGI? 🤖 OpenAI's Greg Brockman reviews Deep Learning successes and the increasing amount of compute used for training models, and muses on whether these trends will continue. His conclusion:
While highly uncertain, near-term AGI should be taken as a serious possibility.
Luis von Ahn interview 🐦 Luis von Ahn, founder of Duolingo and reCAPTCHA, talks about the beginnings of both companies. At the end of the interview, he also discusses areas that likely won't be taken over by AI in the near future: artistic endeavours and activities that require interpersonal communication.
Slides
Towards Open-domain Generation of Programs from Natural Language 👩💻 Graham Neubig gives an overview of his work on generating source code from natural language.
Trying to Understand Recurrent Neural Networks for Language Processing 🔦 Yoav Goldberg discusses his work on understanding RNNs at the Blackbox NLP workshop at EMNLP 2018.
Deep Latent-Variable Models for Natural Language 🏛 Slides of the tutorial on deep latent variable models at EMNLP 2018.
Writing code for NLP research ⚒ Slides of the tutorial on writing code for NLP research by the AllenAI team at EMNLP 2018.
Transfer learning 🗣 In case you're interested in slides about transfer learning, I've put the slides of all my talks on one page. For the last talk, I've summarized our current understanding of transfer learning with language models.
How to Write Good Reviews for CVPR 📝 An overview that covers why being a reviewer is important, how the CVPR paper decision process works, how to structure a review (with good and bad examples), and tips for reviewing. Even though it is designed for CVPR, the reviewing tips and examples are just as useful for NLP conferences.
Quo vadis, NLP?
EMNLP 🏢 Reflecting back on EMNLP, the last large NLP conference of the year (with 2,500+ attendees), it is hard not to be awestruck by all of the amazing work that is being done in the field. With 549 accepted papers, even keeping track of all relevant papers felt like a challenge. I wrote up a summary of highlights, which still only covers a small part. If you attended EMNLP, consider writing a summary for the rest of us about what interested and excited you.
Datasets 💻 Many new datasets were also released. I mentioned 15 in the post, but keep in mind that more than 50 papers were eligible for the best resource paper award. I really hope we'll make the automatic discovery and tracking of such datasets easier. Work by AI2 and Google Dataset Search are good steps in this direction. In the meantime, consider adding your dataset to NLP-progress.
Multilinguality 🇬🇧🇩🇪 The other two exciting recent developments for me are a) the availability of large language models trained on many languages (e.g. BERT) and b) the rise of performant unsupervised MT, exemplified by one of the EMNLP best papers and Artetxe et al. (EMNLP 2018). In a similar vein, many people in the fast.ai community have applied ULMFiT to new languages. Together, I'm confident that these advances will help unlock NLP for many more languages, particularly low-resource ones. Depending on what works better for particular languages and the requirements of the application, in the near term we'll either
use (unsupervised) MT to translate target language data into English and then apply our English models as-is;
or fine-tune a target language-specific language model (a rough sketch of this option follows below).
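To make the second option concrete, here is a rough PyTorch sketch of fine-tuning a pretrained target-language LM for a downstream task, in the spirit of ULMFiT. The helpers load_pretrained_lm and load_task_data are hypothetical placeholders (not part of any library mentioned above), so treat this as an illustration rather than a recipe.

```python
import torch
import torch.nn as nn

# Hypothetical helpers (not from any library mentioned above):
#   load_pretrained_lm(lang) -> an nn.Module mapping token ids to hidden states
#   load_task_data(lang)     -> an iterable of (token_ids, labels) batches
lm = load_pretrained_lm("de")
classifier = nn.Linear(lm.hidden_size, 2)       # assumes the LM exposes `hidden_size`

# Stage 1 (omitted): fine-tune the LM itself on unlabelled target-language text.
# Stage 2: train the task classifier on top of the frozen encoder first.
for p in lm.parameters():
    p.requires_grad = False                     # freeze the encoder initially

optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for token_ids, labels in load_task_data("de"):
    hidden = lm(token_ids)                      # (batch, seq_len, hidden_size)
    logits = classifier(hidden.mean(dim=1))     # mean-pool over time steps
    loss = loss_fn(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Later: unfreeze `lm`, add its parameters to the optimizer with a smaller
# learning rate, and keep training (gradual unfreezing).
```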
Democratizing AI
Education 👩🏫 AI will permeate every aspect of our lives. Making AI education more accessible can help ensure that AI becomes not only safer, by not being monopolized by a small number of people, but also fairer. This Economist article highlights the efforts of Jeremy Howard and Rachel Thomas' fast.ai and other platforms for online education.
“No. Greek. Letters,” Mr Howard intones, thumping the table for punctuation.
The article also mentions Google AI researcher Sara Hooker as an example of someone who benefitted from such resources. As always, one article never tells the full story: taking an online course is not enough to launch you into a successful career in AI. The end result is often the culmination of many factors, as Sara herself explains in her account of her incredible journey from Mozambique to founding the Google Brain lab in Accra, Ghana.
The hard truth is that effort alone rarely fully explains someone’s achievements. Many people believed in me along the way, pushed me out of my comfort zone, and gave me the opportunity to work on high-impact, non-trivial problems that showcased my ability.
Compute 💻 Besides education, another factor that is often cited as a barrier to entering the field of AI is access to compute. Stephen Merity dispels this as a myth in his article The compute and data moats are dead.
It is true that if you want to be at the very cutting edge of AI, say, training models to play Starcraft or Dota 2, you need access to a large number of GPUs. In most other areas, compute requirements are falling rapidly as algorithmic innovations shrink formerly gargantuan models to much more manageable sizes. Neural Architecture Search required 32,400-43,200 GPU hours in its original formulation. In its latest iteration, it needs less than 16 hours on a single Nvidia GTX 1080Ti GPU. That's more than 1,000x less!
Similarly, you can now train a model on ImageNet in 3 hours for $25. While GPUs have enabled the use of Deep Learning, even a single GPU can still cost a lot of money. Making models run faster on CPUs is thus an important direction, spearheaded by, among others, the developers of DyNet and spaCy. However, if you need to train with very large batches or train very large models, you sometimes cannot avoid using multiple GPUs. In that case, Thomas Wolf has practical tips that will help you train models on multiple GPUs or in distributed setups.
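As a taste of what such tips look like, here is a minimal sketch of two common tricks: gradient accumulation to simulate large batches, and torch.nn.DataParallel for multi-GPU training. It assumes `model`, `dataloader`, and `loss_fn` are defined elsewhere and is only an illustration, not a substitute for Thomas' post.

```python
import torch
import torch.nn as nn

# Assumes `model` (an nn.Module), `dataloader`, and `loss_fn` are defined elsewhere.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)          # replicate the model across available GPUs
model = model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
accumulation_steps = 8                      # effective batch size = 8 * loader batch size

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    inputs, targets = inputs.cuda(), targets.cuda()
    loss = loss_fn(model(inputs), targets)
    (loss / accumulation_steps).backward()  # scale so accumulated gradients average out
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                    # update only every `accumulation_steps` batches
        optimizer.zero_grad()
```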
Resources
Machine Learning cheatsheets 📚 ML cheatsheets from Stanford's CS 229 on Supervised, Unsupervised Learning, Deep Learning, Probability and Statistics, and Algebra and Calculus. A super comprehensive resource that has everything in one place and is available in English, Spanish, Farsi, French, Portuguese, and Chinese.
Spinning Up in Deep RL 🎮 A comprehensive educational resource designed by OpenAI with the goal of enabling anyone to become a skilled practitioner in deep RL. It consists of crystal-clear examples of RL code, educational exercises, documentation, and tutorials.
Papers with Code 📄 A new website that matches code libraries to thousands of research papers via Semantic Scholar and has the potential to greatly facilitate reproducibility.
Encoder-decoder neural networks 📖 Nal Kalchbrenner's thesis, which develops the now widely adopted concept of encoder-decoder neural networks for core tasks in natural language processing and natural image and video modelling. Worth reading if you're interested in encoder-decoders and want a comprehensive view from first principles, including recent developments such as dilated and masked convolutions.
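If "masked convolutions" sounds abstract, here is a bare-bones PyTorch example of a causal (masked), dilated 1D convolution of the kind used in such sequence models. This is my own illustrative sketch, not code from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D convolution that only looks at the past (masked/causal), with dilation."""
    def __init__(self, channels, kernel_size=2, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation           # left-pad so position t sees only <= t
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                 # x: (batch, channels, time)
        x = F.pad(x, (self.pad, 0))                       # pad on the left only
        return self.conv(x)

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially.
layer = CausalConv1d(channels=16, kernel_size=2, dilation=4)
out = layer(torch.randn(1, 16, 100))                      # output keeps the temporal length: (1, 16, 100)
```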
Code
Monoses 🇬🇧➡️🇩🇪 An implementation of unsupervised statistical machine translation from Artetxe et al. (EMNLP 2018).
BERT 🐵 TensorFlow code and pre-trained models for BERT. A PyTorch implementation is available here.
TRFL 🍄 A TensorFlow library by DeepMind that exposes several useful building blocks for implementing Reinforcement Learning agents.
Horizon 🌅 A PyTorch-based end-to-end platform for applied reinforcement learning by Facebook.
Graph nets 📈 DeepMind's library for building graph networks in TensorFlow.
UncoVec ⚒ An implementation of the word embedding post-processing and evaluation framework described in Artetxe et al. (CoNLL 2018).
AdaNet ⚖️ A lightweight and scalable TensorFlow AutoML framework for training and deploying adaptive neural networks using the AdaNet algorithm.
News
Is artificial intelligence set to become art’s next medium? 🖼 Christie's sells a controversial GAN-generated artwork for $432,500, nearly 45x its high estimate.
NIPS name change 📛 After a survey, the Neural Information Processing Systems conference makes the controversial decision not to change its name.
NIPS inclusion survey 📋 Results of the NIPS inclusion survey are out and raise concerns around representation, respect and awareness of others, community openness, and other aspects.
Blog posts and articles
AI in 2018: A Year in Review -- Ethics, Organizing, and Accountability 🤖 An annual review of the year by the AI Now Institute, which is dedicated to understanding the implications of AI.
Because AI isn’t just tech. AI is power, and politics, and culture.
Why You Should Care About Byte-Level Sequence-to-Sequence Models in NLP 𝟷𝟶𝟷𝟶 Tom Kenter motivates why byte-level seq2seq models are useful.
Aspects of Paraphrasing for Adversarial training and Regularization in Question Answering ✍️ Patrick Lewis gives an overview of how paraphrasing has been used to improve performance in question answering systems and related natural language tasks.
I heart hangry bagel droids (or: How new words form) ❤️ An overview of the main mechanisms of word formation such as derivation, back-formation, etc. Particularly useful for people without a linguistic background who are wondering if there are any patterns to how words are coined.
Attention? Attention! ⁉️ In this comprehensive review, Lilian Weng looks into how attention was invented, its different variants, and how it is used in models such as the Transformer and SNAIL.
A Conversation With Quoc Le 👨🔬 An extensive interview with Quoc Le about Google AutoML, its pros and cons, challenges, and potential.
20 things I wish I’d known when I started my PhD 🐼 A list of advice from PhD students and postdocs at Oxford's Department of Zoology.
18. The nature of research means that things will not always go according to plan. This does not mean you are a bad student. Keep calm, take a break and then carry on. Experiments that fail can still be written up as part of a successful PhD.
Paper picks
Have a look at my review of EMNLP 2018 for some ideas for interesting papers to read. For more IR-related papers, check out Claudia Hauff's list. For a comprehensive overview of the different sessions, have a look at Patrick Lewis' post.