Transfer learning, Chris Olah, Software 2.0, NMT with attention notebook, gradient boosting in-depth, Defense Against the Dark Arts, interpretability and bias, RL, scene understanding
Hi all,
It feels like quite a lot has been going on in the last two weeks. Consequently, this newsletter is also more packed than usual. So lean back with your beverage of choice ☕️🍵🍺 and let me take you through some of what happened in the world of ML and NLP.
Highlights
There's been so much cool stuff, it's hard to pick favourites. For slides and talks, my highlights are the chat with Christopher Olah about interpreting neural networks and Andrej Karpathy's talk about Software 2.0; the NMT with attention Colaboratory notebook is pretty cool; there's also an awesome in-depth resource about gradient boosting; two overviews of Defense Against the Dark Arts 🔮; some cool articles on interpretability and bias; articles about RL and scene understanding; and lots more articles and papers!
What's hot 🔥
Transfer learning is getting hotter 🔥! Papers on transfer learning have received best paper awards at NAACL 2018 (ELMo) and CVPR 2018 (Taskonomy). If you haven't read them yet, definitely check them out! OpenAI and FAIR are also getting in on the game with a Transformer and unsupervised graphs respectively! There's also been something new from Salesforce (the Natural Language Decathlon 👀). The media are now covering individual methods, too. Is this too much hype?
Another thing people seem to be hyped up about is watching other people kick a ball ⚽️(aka the FIFA World Cup). ML can surely do that, too 👀 and naturally also knows who's going to win (don't tell it about the group stage games) 🇩🇪.
Speaking of looking into the future, ML is going to get tiny!
Don't believe me? Why don't we have a debate? IBM is training a system to do just that. Hopefully we'll get more details soon.
In case you haven't noticed, you can now train very deep models simply by using a fancy initialization scheme. How deep? 10,000 layers!
NAACL 2018 took place last week. You can check out Alex Wang's highlights and my highlights.
Slides and talks
Machine Learning Research & Interpreting Neural Networks
Watch this Coffee with a Googler episode with Christopher Olah to get a deep dive into Distill, Lucid, and Deep Dream. If you're not excited about visualizing features and neural networks dreaming, you will be after watching this episode.
Christopher D. Manning: A Neural Network Model That Can Reason (ICLR 2018 invited talk)
Watch Christopher Manning talk about Memory, Attention, and Composition (MAC) networks at ICLR 2018.
Deep Learning for NLP slides - Kyunghyun Cho
Kyunghyun Cho gave an 8-hour course on Deep Learning for NLP. The lectures were in Korean, but the slides are in English and available here.
Building the Software 2.0 Stack by Andrej Karpathy
If you're into ML, you've likely read Andrej Karpathy's article on Software 2.0 (if not, go read it now). In this talk, Karpathy discusses Software 2.0 in more depth and shares his experience building the Software 2.0 stack at Tesla.
Modelling Natural Language, Programs, and their Intersection (NAACL 2018 Tutorial)
If you're interested in ML models of programs, check out the slides of this tutorial by Graham Neubig and Miltos Allamanis. After going through the tutorial, if you want to get your hands dirty, have a look at CoNaLa, the Code/Natural Language Challenge out of Graham's lab.
Efficient Deep Learning with Humans in the Loop
Zachary Lipton discusses techniques to apply NLP models to problems without large labeled datasets by relying on humans in the loop.
Tools and Implementations
Code and model for the Fine-tuned Transformer by OpenAI
OpenAI has open-sourced the code for their custom fine-tuned Transformer model in TensorFlow. HuggingFace have ported the code to PyTorch.
Neural Machine Translation with Attention
This Colaboratory notebook trains a sequence to sequence (seq2seq) model for Spanish to English translation using tf.keras and eager execution.
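To give a flavour of what the notebook builds, here is a minimal sketch of an additive (Bahdanau-style) attention layer in tf.keras; it's an illustrative approximation, not the notebook's exact code.

```python
# A minimal sketch of additive (Bahdanau-style) attention in tf.keras.
# Illustrative only; the notebook's actual code differs in the details.
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: decoder hidden state, shape (batch, hidden)
        # values: encoder outputs, shape (batch, src_len, hidden)
        query = tf.expand_dims(query, 1)                              # (batch, 1, hidden)
        score = self.V(tf.nn.tanh(self.W1(values) + self.W2(query)))  # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)                        # attention over source tokens
        context = tf.reduce_sum(weights * values, axis=1)             # (batch, hidden)
        return context, weights
```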
Code for Emergent Translation in Multi-Agent Communication
Interested in agents talking to each other? This repository contains the PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Communication (ICLR 2018).
NCRF++: An Open-source Neural Sequence Labeling Toolkit
NCRF++ (see the ACL demo paper) is a PyTorch-based framework with flexible choices of input features and output structures for NLP sequence labeling tasks. The model design is fully configurable through a configuration file; no code changes are required.
Resources
How to explain gradient boosting
An in-depth explanation of gradient boosting machines by Terence Parr and Jeremy Howard with lots of examples 👏🏻. If you ever wanted to really understand gradient boosting, this is the resource to read.
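If you want a feel for the mechanics before diving in, here is a rough sketch of the core boosting loop for squared error, where each new tree is fit to the residuals of the current ensemble; it glosses over everything the article explains properly.

```python
# A rough sketch of gradient boosting for squared error:
# each new tree is fit to the residuals (the negative gradient) of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, lr=0.1, max_depth=2):
    f0 = float(np.mean(y))               # initial model: just the mean
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred              # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred = pred + lr * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict_gbm(f0, trees, X, lr=0.1):
    return f0 + lr * sum(tree.predict(X) for tree in trees)
```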
Tracking the Progress in Natural Language Processing
A repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks (disclaimer: created by me).
One of the hardest things to teach about doing research is how to come up with new hypotheses, validate them, and iterate. Guillaume Leclerc provides a great example of this, laying out the different steps of his Master's thesis. If you want to skip ahead, you can find the final paper here.
Defense Against the Dark Arts
Slides (with notes!) from the top auror himself (Ian Goodfellow) on how to defend against adversarial examples.
Attacks against machine learning — an overview
This blog post surveys attack techniques that target AI systems and how to protect against them.
Interpretability and bias
Many opportunities for discrimination in deploying machine learning systems
Hal Daumé III shows us how "discrimination" might come into a system by walking through the different stages of an arXiv paper recommendation system.
Awesome interpretable machine learning
A list of resources facilitating model interpretability (introspection, simplification, visualization, explanation).
Bias detectives: the researchers striving to make algorithms fair
As machine learning infiltrates society, scientists are trying to help ward off injustice. This Nature feature gives an overview of the researchers working on fairness in ML.
Reinforcement learning
Felix Yu walks us through his 5th-place solution to the OpenAI Retro Contest, training an RL agent to play custom levels of Sonic the Hedgehog with transfer learning. It's a great post that gives an honest impression of what did and didn't work.
Metacar is a reinforcement learning environment for self-driving cars in the browser. The project contains examples of algorithms built with Metacar.
Understanding scenes
Facebook open sources DensePose
Facebook AI Research open sources DensePose, a real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body.
Neural scene representation and rendering
DeepMind introduces the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes.
More blog posts and articles
How Can Neural Network Similarity Help Us Understand Training and Generalization?
A blog post about using neural network similarity as measured by canonical correlation analysis (CCA) to better understand the generalization behaviour of models.
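As a rough illustration of the underlying idea, the sketch below compares two layers' activations on the same inputs with plain CCA; the post's actual method (SVCCA and its relatives) adds an SVD preprocessing step and other refinements.

```python
# A simplified sketch: compare two layers' representations with plain CCA.
# The post's method (SVCCA and relatives) adds SVD preprocessing and other refinements.
import numpy as np
from sklearn.cross_decomposition import CCA

# Activations on the same inputs; here random stand-ins with different neuron counts.
acts_a = np.random.randn(1000, 64)
acts_b = np.random.randn(1000, 128)

n_components = 20
a_c, b_c = CCA(n_components=n_components, max_iter=2000).fit_transform(acts_a, acts_b)

# Mean canonical correlation as a single similarity score between the two layers.
corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(n_components)]
print("mean CCA similarity:", float(np.mean(corrs)))
```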
A blog post on the process and rationale behind Twitter Cortex migrating its deep learning framework from Lua Torch to TensorFlow.
Deep-learning-free Text and Sentence Embedding, Part 1
Sanjeev Arora discusses the method proposed in his ICLR 2017 paper "A Simple but Tough-to-beat Baseline for Sentence Embeddings".
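The baseline itself is simple enough to sketch in a few lines: take a frequency-weighted average of word vectors, then remove the first principal component. The sketch below assumes precomputed word vectors and unigram probabilities.

```python
# A rough sketch of the SIF ("smooth inverse frequency") sentence embedding baseline:
# weighted average of word vectors, then removal of the first principal component.
# Assumes word_vecs (token -> vector) and word_prob (token -> unigram probability) are given.
import numpy as np

def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        tokens = [t for t in tokens if t in word_vecs]
        if not tokens:
            continue
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in tokens])
        vectors = np.array([word_vecs[t] for t in tokens])
        emb[i] = (weights[:, None] * vectors).mean(axis=0)
    # Remove the projection onto the first right singular vector (the common component).
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)
```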
Deep Learning: Theory & Practice
Yoel Zeldes summarizes the talks given at the Deep Learning: Theory & Practice conference held in Israel (with guest speakers from abroad), describing some of the key points he found particularly interesting.
Suicide prevention: how scientists are using artificial intelligence to help people at risk
The Crisis Text Line uses machine learning to figure out who’s at risk and when to intervene. If you're interested in mental health, some of the data is available here. Did you know that Wednesday is the most anxiety-provoking day of the week?
🚀 100 Times Faster Natural Language Processing in Python
Thomas Wolf shows how to take advantage of spaCy and a bit of Cython for blazing-fast NLP.
Paper picks
Improving Language Understanding by Generative Pre-Training
This paper by OpenAI is along the same lines as recent approaches such as ELMo and ULMFiT. Compared to those, the proposed approach uses a more expressive encoder (a Transformer) and simple task-specific transformations (e.g. concatenating premise and hypothesis for entailment). It achieves state-of-the-art results across a diverse range of tasks. A cool aspect is that the model can even perform a form of zero-shot learning using heuristics.
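To make the "simple task-specific transformations" concrete, here is a toy sketch of the entailment case; the delimiter token names are illustrative, not the paper's exact vocabulary.

```python
# Toy sketch of a task-specific input transformation for entailment:
# premise and hypothesis are concatenated with delimiter tokens and fed to the
# same pre-trained Transformer. Token names are illustrative, not the paper's exact ones.
def entailment_input(premise_tokens, hypothesis_tokens,
                     start="<start>", delim="<delim>", extract="<extract>"):
    return [start] + premise_tokens + [delim] + hypothesis_tokens + [extract]

print(entailment_input(["a", "man", "is", "sleeping"], ["a", "person", "rests"]))
```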
The Natural Language Decathlon: Multitask Learning as Question Answering
Researchers from Salesforce introduce the Natural Language Decathlon, a challenge that spans ten diverse NLP tasks, from QA and summarization to relation extraction and commonsense pronoun resolution. They frame all tasks as QA and propose a new question answering network that jointly learns all tasks.
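The framing is easy to picture: every task is reduced to a (question, context, answer) triple. A hand-written illustration (field names and wording are mine, not the benchmark's exact format):

```python
# Illustrative examples of framing different tasks as question answering.
# Field names and wording are mine, not the benchmark's exact format.
decathlon_style_examples = [
    {"question": "What is the summary?",
     "context": "<a news article>",
     "answer": "<its summary>"},
    {"question": "Is this review positive or negative?",
     "context": "The movie was a delight from start to finish.",
     "answer": "positive"},
    {"question": "What is the translation from English to German?",
     "context": "Hello, world!",
     "answer": "Hallo, Welt!"},
]
```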
GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations
A collaboration between researchers from CMU, NYU, and FAIR. Instead of transferring features, the authors seek to learn transferable graphs. The graphs look similar to attention matrices and are multiplied with task-specific features during fine-tuning. They show improvements across some tasks, but the baselines are somewhat weak.