Transfer learning, Chris Olah, Software 2.0, NMT with attention notebook, gradient boosting in-depth, Defense Against the Dark Arts, interpretability and bias, RL, scene understanding
It feels like quite a lot has been going on in the last two weeks. Consequently, this newsletter is also more packed than usual. So lean back with your beverage of choice ☕️🍵🍺 and let me take you through some of what happened in the world of ML and NLP.
Highlights
There's been so much cool stuff, it's hard to pick favourites. For slides and talks, my highlights are the chat with Christopher Olah about interpreting neural networks and Andrej Karpathy's talk about Software 2.0; the NMT with attention Colaboratory notebook is pretty cool; there's also an awesome in-depth resource about gradient boosting; two overviews of Defense Against the Dark Arts 🔮; some cool articles on interpretability and bias; articles about RL and scene understanding; and lots more articles and papers!
What's hot 🔥
Transfer learning is getting hotter 🔥! Papers on transfer learning have received best paper awards at NAACL 2018 (ELMo) and CVPR 2018 (Taskonomy). If you haven't read them yet, definitely check them out! OpenAI and FAIR are also getting in on the game with a Transformer and unsupervised graphs respectively! There's also been something new from Salesforce (the Natural Language Decathlon 👀). Media is also covering individual methods now---is this too much hype?
Another thing people seem to be hyped up about is watching other people kick a ball ⚽️ (aka the FIFA World Cup). ML can surely do that, too 👀 and naturally also knows who's going to win (don't tell it about the group stage games) 🇩🇪.
Don't believe me? Why don't we have a debate? IBM is training a system to do just that. Hopefully we'll get more details soon.
In case you haven't noticed, you can now train very deep models simply using a fancy initialization scheme. How deep? 10,000 layers!
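The paper's recipe is a carefully constructed (delta-)orthogonal initialization for CNNs. As a rough flavour of why orthogonality helps, here's a toy PyTorch sketch (a deep MLP, not the paper's setup): orthogonal weight matrices keep the forward signal from exploding or vanishing with depth.

```python
import torch
import torch.nn as nn

# Toy "very deep" MLP. The paper's actual recipe is a delta-orthogonal
# initialization for CNNs; this only illustrates the dense-layer analogue:
# orthogonal weight matrices preserve norms, so the signal neither
# explodes nor vanishes as depth grows.
depth, width = 100, 256
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width)
    nn.init.orthogonal_(linear.weight, gain=nn.init.calculate_gain('tanh'))
    nn.init.zeros_(linear.bias)
    layers += [linear, nn.Tanh()]
model = nn.Sequential(*layers)

# Input and output norms stay on the same order of magnitude.
x = torch.randn(8, width)
print(x.norm(dim=1).mean(), model(x).norm(dim=1).mean())
```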
Slides and talks
Watch this Coffee with a Googler episode with Christopher Olah to get a deep dive into Distill, Lucid, and Deep Dream. If you're not excited about visualizing features and neural networks dreaming, you will be after watching this episode.
Watch Christopher Manning talk about Memory-Attention-Composition Networks at ICLR 2018.
Kyunghyun Cho gave an 8-hour course on Deep Learning for NLP. The lectures were in Korean, but the slides are in English and available here.
If you're into ML, you've likely read Andrej Karpathy's article on Software 2.0 (if not, go read it now). In this talk, Karpathy discusses Software 2.0 in more depth and shares his experience building the Software 2.0 stack at Tesla.
If you're interested in ML models of programs, check out the slides of this tutorial by Graham Neubig and Miltos Allamanis. After going through the tutorial, if you want to get your hands dirty, have a look at CoNaLa, the Code/Natural Language Challenge out of Graham's lab.
Zachary Lipton discusses techniques to apply NLP models to problems without large labeled datasets by relying on humans in the loop.
Tools and Implementations
This Colaboratory notebook trains a sequence to sequence (seq2seq) model for Spanish to English translation using tf.keras and eager execution.
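If you want the gist of the attention mechanism the notebook implements, here's a minimal numpy sketch of Bahdanau-style additive attention (shapes and variable names are mine, not the notebook's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(enc_states, dec_state, W1, W2, v):
    """Bahdanau-style attention: score each encoder state against the
    current decoder state, then build a weighted context vector.

    enc_states: (T, d_enc)  encoder hidden states
    dec_state:  (d_dec,)    current decoder hidden state
    W1: (d_att, d_enc), W2: (d_att, d_dec), v: (d_att,)
    """
    # score_t = v . tanh(W1 h_t + W2 s)
    scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v  # (T,)
    weights = softmax(scores)                                   # (T,)
    context = weights @ enc_states                              # (d_enc,)
    return context, weights

# Toy example: 5 source positions, 16-dim states, 8-dim attention space.
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 16, 16, 8
context, weights = additive_attention(
    rng.normal(size=(T, d_enc)), rng.normal(size=d_dec),
    rng.normal(size=(d_att, d_enc)), rng.normal(size=(d_att, d_dec)),
    rng.normal(size=d_att))
print(weights.round(3), weights.sum())  # attention weights sum to 1
```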
Interested in agents talking to each other? This repository contains the PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Communication (ICLR 2018).
NCRF++ (see the ACL demo paper) is a PyTorch-based framework with flexible choices of input features and output structures for NLP sequence labeling tasks. The model design is fully configurable through a configuration file, so no code changes are required.
An in-depth explanation of gradient boosting machines by Terence Parr and Jeremy Howard with lots of examples 👏🏻. If you ever wanted to really understand gradient boosting, this is the resource to read.
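If you'd like to see the core loop in code first, here's a bare-bones sketch of gradient boosting for squared loss, where each tree simply fits the residuals of the ensemble so far (a toy implementation on top of sklearn, not the article's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, lr=0.1, max_depth=2):
    """Gradient boosting for squared loss: the negative gradient is just
    the residual y - F(x), so each new tree is fit to the residuals."""
    f0 = y.mean()                 # initial model: a constant
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred      # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)

# Toy regression problem: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
f0, trees = gradient_boost(X, y)
print(np.mean((predict(f0, trees, X) - y) ** 2))  # training MSE
```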
A repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks (disclaimer: created by me).
One of the hardest things to teach about doing research is how to come up with new hypotheses, then validate and iterate on them. Guillaume Leclerc provides a great example of this, laying out the different steps of his Master's thesis. If you want to skip ahead, you can find the final paper here.
Defense Against the Dark Arts
Slides (with notes!) from the top auror himself (Ian Goodfellow) on how to defend against adversarial examples.
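For context, the canonical attack these defenses have to cope with is the fast gradient sign method (FGSM). A minimal PyTorch sketch, assuming an image classifier `model` with pixel inputs in [0, 1] and integer labels `y`:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast gradient sign method: perturb the input in the direction that
    maximally increases the loss, within an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # one signed-gradient step
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range

# Usage (with a trained classifier `model` and a batch `x`, `y`):
#   x_adv = fgsm(model, x, y)
#   print((model(x).argmax(1) == y).float().mean(),      # clean accuracy
#         (model(x_adv).argmax(1) == y).float().mean())  # adversarial accuracy
```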
This blog post surveys the attack techniques that target AI systems and how to protect against them.
Interpretability and bias
Hal Daumé III shows us how "discrimination" might come into a system by walking through the different stages of an arXiv paper recommendation system.
A list of resources facilitating model interpretability (introspection, simplification, visualization, explanation).
As machine learning infiltrates society, scientists are trying to help ward off injustice. This Nature feature gives an overview of the researchers working on fairness in ML.
Reinforcement Learning
Felix Yu walks us through his 5th place solution to the OpenAI Retro Contest, training an RL agent to play custom levels of Sonic the Hedgehog with transfer learning. It's a great post that gives an honest impression of what did and didn't work.
Metacar is a reinforcement learning environment for self-driving cars in the browser. The project contains examples of algorithms created with metacar.
Scene understanding
Facebook AI Research open-sources DensePose, a real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body.
DeepMind introduces the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes.
More blog posts and articles
A blog post about using neural network similarity as measured by canonical correlation analysis (CCA) to better understand the generalization behaviour of models.
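As a rough illustration of the core tool (the paper's actual SVCCA pipeline adds an SVD preprocessing step), here's how one might compare two activation matrices with sklearn's CCA:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(acts1, acts2, n_components=10):
    """Mean canonical correlation between two activation matrices,
    each of shape (n_examples, n_neurons)."""
    cca = CCA(n_components=n_components, max_iter=1000)
    u, v = cca.fit_transform(acts1, acts2)
    corrs = [np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))

# Toy check: a layer compared with a rotated copy of itself is
# (near-)perfectly correlated; an unrelated random layer is not.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))
rotation = np.linalg.qr(rng.normal(size=(64, 64)))[0]
print(cca_similarity(acts, acts @ rotation))             # ~1.0
print(cca_similarity(acts, rng.normal(size=(500, 64))))  # much lower
```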
A blog post on the process and rationale behind Twitter Cortex migrating its deep learning framework from Lua Torch to TensorFlow.
Sanjeev Arora discusses the method proposed in his ICLR 2017 paper "A Simple but Tough-to-beat Baseline for Sentence Embeddings".
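The method itself fits in a few lines. A sketch following the paper's recipe (the input data structures are assumed for illustration): weight each word vector by a/(a + p(w)), average, then remove the first principal component.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_probs, a=1e-3):
    """Smooth inverse frequency (SIF) sentence embeddings:
    1) weighted average of word vectors with weights a / (a + p(w)),
    2) remove the projection onto the first principal component.
    sentences: list of token lists; word_vecs: dict token -> vector;
    word_probs: dict token -> unigram probability."""
    emb = np.stack([
        np.mean([a / (a + word_probs[w]) * word_vecs[w] for w in sent], axis=0)
        for sent in sentences])
    # First singular vector of the embedding matrix (the common component).
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)
```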
Yoel Zeldes summarizes the talks given in the Deep Learning: Theory & Practice conference held in Israel (with guest speakers from abroad). He describes some of the key points that were particularly interesting.
The Crisis Text Line uses machine learning to figure out who’s at risk and when to intervene. If you're interested in mental health, some of the data is available here. Did you know that Wednesday is the most anxiety-provoking day of the week?
Thomas Wolf shows how to take advantage of spaCy and a bit of Cython for blazing-fast NLP.
Paper picks
This paper by OpenAI is in the same vein as recent approaches such as ELMo and ULMFiT. Compared to those, the proposed approach uses a more expressive encoder (a Transformer) and simple task-specific transformations (e.g. concatenating premise and hypothesis for entailment). It achieves state-of-the-art results across a diverse range of tasks. A cool aspect is that the model can even perform a form of zero-shot learning using heuristics.
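To make those task-specific transformations concrete, here's a schematic sketch (the token names are mine; the paper uses learned start, delimiter, and extract embeddings):

```python
# Schematic sketch of task-specific input transformations in the spirit
# of the paper: each task is serialized into a single token sequence for
# the pre-trained Transformer, and a linear head reads the final state.
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"

def classification_input(text):
    return [START, *text, EXTRACT]

def entailment_input(premise, hypothesis):
    # Premise and hypothesis are simply concatenated with a delimiter.
    return [START, *premise, DELIM, *hypothesis, EXTRACT]

print(entailment_input("a man is sleeping".split(),
                       "someone is awake".split()))
```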
Researchers from Salesforce introduce the Natural Language Decathlon, a challenge that spans ten diverse NLP tasks, from QA and summarization to relation extraction and commonsense pronoun resolution. They frame all tasks as QA and propose a new question answering network that jointly learns all of them.
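To make the QA framing concrete, here are a few illustrative (question, context) pairs in the spirit of decaNLP (the exact question phrasings in the paper may differ):

```python
# Every task becomes a (question, context) pair whose answer is a string;
# a single model can then be trained jointly on all of them.
examples = [
    {"task": "summarization",
     "question": "What is the summary?",
     "context": "<full document>"},
    {"task": "sentiment",
     "question": "Is this review negative or positive?",
     "context": "<review text>"},
    {"task": "translation (En->De)",
     "question": "What is the translation from English to German?",
     "context": "<English sentence>"},
]
for ex in examples:
    print(f'{ex["task"]:>22}: {ex["question"]}')
```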
A collaboration between researchers from CMU, NYU, and FAIR. Instead of using features for transfer learning, the authors seek to learn transferable graphs. The graphs look similar to attention matrices and are multiplied with task-specific features during fine-tuning. They show improvements across some tasks, but the baselines are somewhat weak.