The Bitter Lesson, How to Write X, ML Events in 2019
This newsletter’s spotlight topics are The Bitter Lesson, How to Write X guides, and ML events in 2019. Besides these, there are again lots of slides, resources, tools, articles, blog posts, and papers to explore.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
Meaningful Measures 🔢
1.5 MB An estimate of the amount of information humans extract during language acquisition (mostly related to lexical semantics). This is a Fermi estimate based on viewing words as Gaussian distributions in an n-dimensional space. In contrast, the estimate for syntax is much, much smaller—only around 700 bits.
How Could Machines Learn Like Animals & Humans? 🦓🦒 Slides from Yann LeCun's talk at the Harvard MBB Distinguished Lecture. Among other things, he discusses convnets as models of the visual system and unsupervised feature learning (via energy models).
Foundations: How to design experiments in NLU 👩🔬 A great presentation by Sam Bowman that discusses the other half (the non-engineering related part) of publishing an influential NLP paper: how to find and understand related work, design effective experiments, and analyze their results.
A Beginner’s Guide - Python, NLP, and Twitter API 🎓 Three mini-lectures by Wei Xu to teach students with some or no programming experience (with sample code in Google Colaboratory). If you're interested in teaching high school students about NLP, this is a good place to start.
The Bitter Lesson 🤖
Rich Sutton argues in this short essay that the only thing that matters in the long run in AI research is how well methods leverage the ever-increasing amount of available computation. Building knowledge into our models, he contends, does not pay off in the long run; what matters is how well methods scale. Two paradigms that seem to scale arbitrarily well are search and learning.
While I agree to some extent—after all, the success of Deep Learning is very much a success of scale—incorporating knowledge or inductive biases is not without its own success stories. Perhaps the most notable examples are convolutional neural networks (CNNs) and, more recently, group equivariant and steerable CNNs. Such inductive biases are vastly beneficial in situations where we can't rely on arbitrarily scaling computation, such as settings with limited amounts of data, which characterize most real-world applications.
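To make the translation-equivariance bias of CNNs concrete, here is a minimal numpy sketch (my own illustration, not from the essay): a circular 1-D convolution commutes with shifting the input, so a conv layer "knows" about translations for free.

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1-D cross-correlation of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i + j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([0., 1., 2., 3., 4., 5.])
k = np.array([1., -1., 0.5])

# Translation equivariance: convolving a shifted input gives
# the shifted output of the original convolution.
lhs = circ_conv(np.roll(x, 1), k)
rhs = np.roll(circ_conv(x, k), 1)
assert np.allclose(lhs, rhs)
```

A fully connected layer offers no such guarantee; the weight sharing of the convolution is exactly the built-in knowledge.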
How to Write... ✍️
...a great ML tutorial 👩🏫 Don't show "a 5-liner to train MNIST/ImageNet examples". Show what it would be like to actually use the framework to solve a new problem or how you might go about fixing a subtle error.
...an impactful PhD thesis 👩🎓 When choosing a thesis topic, imagine what the impact will be. Restate significant results three times (each time being more precise). People read your paper to write theirs. Your ideas are more likely to spread if you help out. Plan the thesis like a collapsible telescope: the first section should be something low-risk that can be done easily, while the second part should be something high-risk with potentially large impact.
...a great research paper 📃 This presentation by Nando de Freitas, Ulrich Paquet, Stephan Gouws, Martin Arjovsky, and Kyunghyun Cho covers many great ideas. One (very small) excerpt: Simon Peyton Jones's seven simple suggestions:
Don’t wait: write.
Identify your key idea.
Tell one story.
Nail your contributions to the mast.
Related work: later.
Put your readers first.
Listen to your readers.
How I'm able to take notes in mathematics lectures using LaTeX and Vim 📝 This blog post by Gilles Castel is a masterful guide that promises to supercharge your note-taking ability in LaTeX. It covers everything you need to know, including a plethora of useful auto-expanding snippets and advanced tips such as restricting their expansion to specific environments or contexts.
The Autodiff Cookbook 🥘 JAX (which we covered in past newsletters) enables automatic differentiation for numpy and also lets you run your numpy programs on GPUs. This Colab notebook covers many different recipes and ideas that you can leverage in your own work.
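As a tiny taste of the kind of recipe the notebook covers (a sketch of my own, not excerpted from the cookbook), `jax.grad` turns a numpy-style function directly into its derivative:

```python
import math
import jax.numpy as jnp
from jax import grad

def tanh(x):
    return jnp.tanh(x)

# grad returns a new function that computes d tanh / dx
dtanh = grad(tanh)
val = float(dtanh(1.0))

# analytically, tanh'(x) = 1 - tanh(x)^2
assert abs(val - (1 - math.tanh(1.0) ** 2)) < 1e-5
```

The same `grad` transform composes with JAX's `jit` and `vmap`, which is where the cookbook's more advanced recipes come in.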
Genomic ULMFiT 🔬 An implementation of ULMFiT for genomics classification using PyTorch and fastai. This is a cool example that shows how universal the notion of pretraining a language model is. The model is pretrained on a large unlabelled genomics corpus and can be used as a feature extractor for processing genomic data.
ML events in 2019 🏛
Here are the machine learning-focused events that I'm particularly excited about this year.
Deep Learning Indaba, 25-31 August, Nairobi, Kenya 🇰🇪 The Deep Learning Indaba is the main event of an initiative to strengthen ML in Africa. This year, there will be even more content for both beginners and more advanced participants, as well as more opportunities to get feedback on your own work and exchange ideas. The application deadline has been extended, so sign up if you still can. The Indaba is a great opportunity to connect with peers and experts and help shape the future of ML in Africa. Read this take by Mary Carman and Benjamin Rosman for a summary of the current state of play of AI in Africa.
khipu.ai, 11-15 November, Montevideo, Uruguay 🇺🇾 khipu.ai is to South America what the Deep Learning Indaba is to Africa. The event aims to foster collaborations among the research community in South America and features an impressive set of speakers, including Yoshua Bengio, Joelle Pineau, and Jeff Dean. If you hail from or are based in South America, definitely apply! Applications close on June 28.
SE Asia ML School, 8-12 July, Jakarta, Indonesia If you're instead based in South East Asia, then this event is for you. The summer school aims to kickstart an effort to inspire, encourage, and educate people in ML in the South East Asian region. Applications are still open until April 20!
EurNLP, October 11, London, UK 🇬🇧 Smaller in scope than both DL Indaba and khipu.ai, this event aims to connect and foster collaboration among people working on NLP in and around Europe. We're aiming to have a diverse set of high-quality talks and a poster session. Consider submitting your new, previously or concurrently published work and—more importantly—save the date!
Articles and blog posts 📰
Deep learning: From natural to medical images 👩⚕️ This blog post by Thijs Kooi gives an overview of the differences between deep learning on natural and medical images. In contrast to natural images, intensity, location, and scale all play an important role in medical image processing.
Text classification using TensorFlow.js: An example of detecting offensive language in browser ⁉️ This is a nice example of running an ML classifier in the browser. The model is built on the Universal Sentence Encoder and trained on a dataset of civil comments (training code is available here).
Massive Multi-Task Learning with Snorkel MeTaL: Bringing More Supervision to Bear 🏊♀️ This post gives a nice overview of some of the different ingredients that make up a state-of-the-art approach on a common NLP benchmark these days. One aspect I particularly liked is that error analysis was used to identify examples on which the model performs poorly; separate models were then trained on those examples, which increased the overall accuracy by 0.7.
Remote Servers 💻 Yanai Elazar shares some tips on how to easily and conveniently connect to remote servers for machine learning work.
State-of-the-art Multilingual Lemmatization 🇬🇧🇩🇪 An overview of state-of-the-art sequence-to-sequence models for lemmatization, how well they do (high 90s on most languages of CoNLL 2018), and future challenges.
Are Deep Neural Networks Dramatically Overfitted? 🤖 Lilian Weng reviews recent work on the generalizability of neural networks. The post covers classic theorems on generalization, the expressivity of neural networks, and over-parameterization and includes implementations.
A Visual Exploration of Gaussian Processes 📈 This post is a great introduction to Gaussian processes, providing intuition for how they work via interactive figures and hands-on examples.
From Attention in Transformers to Dynamic Routing in Capsule Nets 💊 This post by Samira Abnar draws connections between the main building blocks of transformers and capsule networks: dynamic routing and attention, capsule types and attention heads, and positional embedding and coordinate addition.
The Illustrated Word2vec 🖼 After illustrating the transformer and ELMo, ULMFiT and BERT, Jay Alammar turns to one of the most influential methods in recent NLP, word2vec, with a visual guide of skip-gram with negative sampling.
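The objective Jay illustrates fits in a few lines; the following is an illustrative sketch (vector names and values are my own) of the skip-gram negative-sampling loss for one (word, context) pair:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgns_loss(v_w, v_c, neg_contexts):
    """Score the true context pair high and k sampled negatives low."""
    pos = np.log(sigmoid(v_w @ v_c))                    # true (word, context) pair
    neg = np.log(sigmoid(-(neg_contexts @ v_w))).sum()  # sampled negative contexts
    return -(pos + neg)

v_w = np.array([1.0, 0.0])
negs = np.zeros((5, 2))  # 5 negative samples

# A context vector aligned with the word yields a lower loss than an opposed one.
assert sgns_loss(v_w, v_w, negs) < sgns_loss(v_w, -v_w, negs)
```

Training simply minimizes this loss over many (word, context) pairs, which is what pulls related words together in the embedding space.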
Neural Networks in 100 lines of pure Python 💻 This is a nice tutorial by Julian Eisenschlos that develops basic neural network building blocks from first principles in numpy and nicely showcases the forward and backward pass for each layer.
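In the same spirit as the tutorial (though this snippet is mine, not excerpted from it), a linear layer with a forward and backward pass takes only a handful of numpy lines:

```python
import numpy as np

class Linear:
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 0.1, (n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                      # cache input for the backward pass
        return x @ self.w + self.b

    def backward(self, grad_out):
        # gradients w.r.t. parameters, then pass the gradient upstream
        self.grad_w = self.x.T @ grad_out
        self.grad_b = grad_out.sum(axis=0)
        return grad_out @ self.w.T

layer = Linear(3, 4)
y = layer.forward(np.ones((2, 3)))      # batch of 2 inputs
grad_in = layer.backward(np.ones((2, 4)))
```

Chaining `forward` calls and then `backward` calls in reverse order is already the essence of backpropagation.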
Paper picks 📄
Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages (NAACL 2019) It's often difficult to compare models in an ablation study as languages differ across many typological dimensions and training corpora are often different across languages. This paper takes an interesting, synthetic approach to overcoming these differences in order to study an RNN's syntactic performance: The authors create synthetic versions of English that differ just in a single typological parameter (such as verb order) and compare how the RNN performs on this synthetic language compared to English. As expected, morphological case marking makes agreement prediction easier; performance is also higher in subject-verb-object order languages (such as in English) than for subject-object-verb order (as in Japanese).
Linguistic Knowledge and Transferability of Contextual Representations (NAACL 2019) This is a nice paper that sheds more light on our understanding of pretrained language models by means of an extensive analysis of different probing tasks and a comparison of different pretraining objectives. The authors find that feature-based transfer is competitive with task-specific models but does worse on tasks requiring fine-grained linguistic knowledge. They also highlight an interesting difference between the layer-wise behaviour of LSTMs and Transformers: moving up the LSTM layers yields more task-specific representations, but the same does not hold for Transformers.
Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension (NAACL 2019) The paper proposes a new multi-task learning model for reading comprehension that is jointly trained on multiple reading comprehension datasets. Whereas the standard approach to multi-task learning is to sample from all tasks uniformly at random, their model learns sample weights for the auxiliary tasks using language models. In particular, they incorporate ideas from data selection to select auxiliary examples that are relevant to the main task.
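The contrast with uniform sampling can be sketched in a few lines (task names and weights below are hypothetical, not from the paper, which learns them rather than fixing them by hand):

```python
import random

tasks = ["main", "aux_1", "aux_2"]   # hypothetical task names
weights = [0.5, 0.3, 0.2]            # illustrative, hand-picked sample weights

def sample_task(rng):
    # Non-uniform multi-task sampling: higher-weight tasks are drawn more often,
    # in contrast to picking each task uniformly at random.
    return rng.choices(tasks, weights=weights, k=1)[0]

rng = random.Random(0)
draws = [sample_task(rng) for _ in range(1000)]
```

In the paper the weights are derived per example from language models, so relevant auxiliary data is favoured rather than whole tasks.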