Transfer learning, Chris Olah, Software 2.0, NMT with attention notebook, gradient boosting in-depth, Defense Against the Dark Arts, interpretability and bias, RL, scene understanding
It feels like quite a lot has been going on in the last two weeks. Consequently, this newsletter is also more packed than usual. So lean back with your beverage of choice ☕️🍵🍺 and let me take you through some of what happened in the world of ML and NLP.
Highlights
There's been so much cool stuff, it's hard to pick favourites. For slides and talks, my highlights are the chat with Christopher Olah about interpreting neural networks and Andrej Karpathy's talk about Software 2.0; the NMT with attention Colaboratory notebook is pretty cool; there's also an awesome in-depth resource about gradient boosting; two overviews of Defense Against the Dark Arts 🔮; some cool articles on interpretability and bias; articles about RL and scene understanding; and lots more articles and papers!
What's hot 🔥
Transfer learning is getting hotter 🔥! Papers on transfer learning have received best paper awards at NAACL 2018 (ELMo) and CVPR 2018 (Taskonomy). If you haven't read them yet, definitely check them out! OpenAI and FAIR are also getting in on the game with a Transformer and unsupervised graphs respectively! There's also been something new from Salesforce (the Natural Language Decathlon 👀). Media is also covering individual methods now---is this too much hype?
Another thing people seem to be hyped up about is watching other people kick a ball ⚽️ (aka the FIFA World Cup). ML can surely do that, too 👀 and naturally also knows who's going to win (don't tell it about the group stage games) 🇩🇪.
Don't believe me? Why don't we have a debate? IBM is training a system to do just that. Hopefully we'll get more details soon.
In case you haven't noticed, you can now train very deep models simply using a fancy initialization scheme. How deep? 10,000 layers!
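The paper's recipe is a carefully constructed (delta-)orthogonal initialization for CNNs. As a rough flavour of why orthogonality helps, here's a toy PyTorch sketch (a deep MLP, not the paper's setup): orthogonal weight matrices keep the forward signal from exploding or vanishing with depth.

```python
import torch
import torch.nn as nn

# Toy "very deep" MLP. The paper's actual recipe is a delta-orthogonal
# initialization for CNNs; this only illustrates the dense-layer analogue:
# orthogonal weight matrices preserve norms, so the signal neither
# explodes nor vanishes as depth grows.
depth, width = 100, 256
layers = []
for _ in range(depth):
    linear = nn.Linear(width, width)
    nn.init.orthogonal_(linear.weight, gain=nn.init.calculate_gain('tanh'))
    nn.init.zeros_(linear.bias)
    layers += [linear, nn.Tanh()]
model = nn.Sequential(*layers)

# Input and output norms stay on the same order of magnitude.
x = torch.randn(8, width)
print(x.norm(dim=1).mean(), model(x).norm(dim=1).mean())
```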
Slides and talks
Watch this Coffee with a Googler episode with Christopher Olah to get a deep dive into Distill, Lucid, and Deep Dream. If you're not excited about visualizing features and neural networks dreaming, you will be after watching this episode.
Watch Christopher Manning talk about Memory-Attention-Composition Networks at ICLR 2018.
Kyunghyun Cho gave an 8-hour course on Deep Learning for NLP. The lectures were in Korean, but the slides are in English and available here.
If you're into ML, you've likely read Andrej Karpathy's article on Software 2.0 (if not, go read it now). In this talk, Karpathy discusses Software 2.0 in more depth and shares his experience building the Software 2.0 stack at Tesla.
If you're interested in ML models of programs, check out the slides of this tutorial by Graham Neubig and Miltos Allamanis. After going through the tutorial, if you want to get your hands dirty, have a look at CoNaLa, the Code/Natural Language Challenge out of Graham's lab.
Zachary Lipton discusses techniques to apply NLP models to problems without large labeled datasets by relying on humans in the loop.
Tools and Implementations
This Colaboratory notebook trains a sequence to sequence (seq2seq) model for Spanish to English translation using tf.keras and eager execution.
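If you want the gist of the attention mechanism the notebook implements, here's a minimal numpy sketch of Bahdanau-style additive attention (shapes and variable names are mine, not the notebook's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(enc_states, dec_state, W1, W2, v):
    """Bahdanau-style attention: score each encoder state against the
    current decoder state, then build a weighted context vector.

    enc_states: (T, d_enc)  encoder hidden states
    dec_state:  (d_dec,)    current decoder hidden state
    W1: (d_att, d_enc), W2: (d_att, d_dec), v: (d_att,)
    """
    # score_t = v . tanh(W1 h_t + W2 s)
    scores = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v  # (T,)
    weights = softmax(scores)                                   # (T,)
    context = weights @ enc_states                              # (d_enc,)
    return context, weights

# Toy example: 5 source positions, 16-dim states, 8-dim attention space.
rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 5, 16, 16, 8
context, weights = additive_attention(
    rng.normal(size=(T, d_enc)), rng.normal(size=d_dec),
    rng.normal(size=(d_att, d_enc)), rng.normal(size=(d_att, d_dec)),
    rng.normal(size=d_att))
print(weights.round(3), weights.sum())  # attention weights sum to 1
```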
Interested in agents talking to each other? This repository contains the PyTorch implementation of the models described in the paper Emergent Translation in Multi-Agent Communication (ICLR 2018).
NCRF++ (see the ACL demo paper) is a PyTorch-based framework with flexible choices of input features and output structures for NLP sequence labeling tasks. The model design is fully configurable through a configuration file, so no code changes are required.
An in-depth explanation of gradient boosting machines by Terence Parr and Jeremy Howard with lots of examples 👏🏻. If you ever wanted to really understand gradient boosting, this is the resource to read.
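If you'd like to see the core loop in code first, here's a bare-bones sketch of gradient boosting for squared loss, where each tree simply fits the residuals of the ensemble so far (a toy implementation on top of sklearn, not the article's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, lr=0.1, max_depth=2):
    """Gradient boosting for squared loss: the negative gradient is just
    the residual y - F(x), so each new tree is fit to the residuals."""
    f0 = y.mean()                 # initial model: a constant
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred      # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)
        trees.append(tree)
    return f0, trees

def predict(f0, trees, X, lr=0.1):
    return f0 + lr * sum(t.predict(X) for t in trees)

# Toy regression problem: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)
f0, trees = gradient_boost(X, y)
print(np.mean((predict(f0, trees, X) - y) ** 2))  # training MSE
```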
A repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks (disclaimer: created by me).
One of the hardest things to teach about doing research is how to come up with new hypotheses, then validate and iterate on them. Guillaume Leclerc provides a great example of this, laying out the different steps of his Master's thesis. If you want to skip ahead, you can find the final paper here.
Defense Against the Dark Arts
Slides (with notes!) from the top auror himself (Ian Goodfellow) on how to defend against adversarial examples.
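For context, the canonical attack these defenses have to cope with is the fast gradient sign method (FGSM). A minimal PyTorch sketch, assuming an image classifier `model` with pixel inputs in [0, 1] and integer labels `y`:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    """Fast gradient sign method: perturb the input in the direction that
    maximally increases the loss, within an L-infinity ball of radius eps."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # one signed-gradient step
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range

# Usage (with a trained classifier `model` and a batch `x`, `y`):
#   x_adv = fgsm(model, x, y)
#   print((model(x).argmax(1) == y).float().mean(),      # clean accuracy
#         (model(x_adv).argmax(1) == y).float().mean())  # adversarial accuracy
```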
This blog post surveys the attack techniques that target AI systems and how to protect against them.
Interpretability and bias
Hal Daumé III shows us how "discrimination" might come into a system by walking through the different stages of an arXiv paper recommendation system.
A list of resources facilitating model interpretability (introspection, simplification, visualization, explanation).
As machine learning infiltrates society, scientists are trying to help ward off injustice. This Nature feature gives an overview of the researchers working on fairness in ML.
Reinforcement Learning
Felix Yu walks us through his 5th place solution to the OpenAI Retro Contest, training an RL agent to play custom levels of Sonic the Hedgehog with transfer learning. It's a great post that gives an honest impression of what did and didn't work.
Metacar is a reinforcement learning environment for self-driving cars in the browser. The project contains examples of algorithms created with metacar.
Scene understanding
Facebook AI Research open-sources DensePose, a real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body.
DeepMind introduces the Generative Query Network (GQN), a framework within which machines learn to perceive their surroundings by training only on data obtained by themselves as they move around scenes.
More blog posts and articles
A blog post about using neural network similarity as measured by canonical correlation analysis (CCA) to better understand the generalization behaviour of models.
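As a rough illustration of the core tool (the paper's actual SVCCA pipeline adds an SVD preprocessing step), here's how one might compare two activation matrices with sklearn's CCA:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(acts1, acts2, n_components=10):
    """Mean canonical correlation between two activation matrices,
    each of shape (n_examples, n_neurons)."""
    cca = CCA(n_components=n_components, max_iter=1000)
    u, v = cca.fit_transform(acts1, acts2)
    corrs = [np.corrcoef(u[:, i], v[:, i])[0, 1] for i in range(n_components)]
    return float(np.mean(corrs))

# Toy check: a layer compared with a rotated copy of itself is
# (near-)perfectly correlated; an unrelated random layer is not.
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))
rotation = np.linalg.qr(rng.normal(size=(64, 64)))[0]
print(cca_similarity(acts, acts @ rotation))             # ~1.0
print(cca_similarity(acts, rng.normal(size=(500, 64))))  # much lower
```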
A blog post on the process and rationale behind Twitter Cortex migrating its deep learning framework from Lua Torch to TensorFlow.
Sanjeev Arora discusses the method proposed in his ICLR 2017 paper "A Simple but Tough-to-beat Baseline for Sentence Embeddings".
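The method itself fits in a few lines. A sketch following the paper's recipe (the input data structures are assumed for illustration): weight each word vector by a/(a + p(w)), average, then remove the first principal component.

```python
import numpy as np

def sif_embeddings(sentences, word_vecs, word_probs, a=1e-3):
    """Smooth inverse frequency (SIF) sentence embeddings:
    1) weighted average of word vectors with weights a / (a + p(w)),
    2) remove the projection onto the first principal component.
    sentences: list of token lists; word_vecs: dict token -> vector;
    word_probs: dict token -> unigram probability."""
    emb = np.stack([
        np.mean([a / (a + word_probs[w]) * word_vecs[w] for w in sent], axis=0)
        for sent in sentences])
    # First singular vector of the embedding matrix (the common component).
    u = np.linalg.svd(emb, full_matrices=False)[2][0]
    return emb - np.outer(emb @ u, u)
```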
Yoel Zeldes summarizes the talks given in the Deep Learning: Theory & Practice conference held in Israel (with guest speakers from abroad). He describes some of the key points that were particularly interesting.
The Crisis Text Line uses machine learning to figure out who’s at risk and when to intervene. If you're interested in mental health, some of the data is available here. Did you know that Wednesday is the most anxiety-provoking day of the week?
Thomas Wolf shows how to take advantage of spaCy and a bit of Cython for blazing-fast NLP.
Paper picks
This paper by OpenAI is in the same vein as recent approaches such as ELMo and ULMFiT. Compared to those, the proposed approach uses a more expressive encoder (a Transformer) and simple task-specific transformations (e.g. concatenating premise and hypothesis for entailment). It achieves state-of-the-art results across a diverse range of tasks. A cool aspect is that the model can even perform a form of zero-shot learning using heuristics.
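To make those task-specific transformations concrete, here's a schematic sketch (the token names are mine; the paper uses learned start, delimiter, and extract embeddings):

```python
# Schematic sketch of task-specific input transformations in the spirit
# of the paper: each task is serialized into a single token sequence for
# the pre-trained Transformer, and a linear head reads the final state.
START, DELIM, EXTRACT = "<start>", "<delim>", "<extract>"

def classification_input(text):
    return [START, *text, EXTRACT]

def entailment_input(premise, hypothesis):
    # Premise and hypothesis are simply concatenated with a delimiter.
    return [START, *premise, DELIM, *hypothesis, EXTRACT]

print(entailment_input("a man is sleeping".split(),
                       "someone is awake".split()))
```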
Researchers from Salesforce introduce the Natural Language Decathlon, a challenge that spans ten diverse NLP tasks, from QA and summarization to relation extraction and commonsense pronoun resolution. They frame all tasks as QA and propose a new question answering network that jointly learns all of them.
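To make the QA framing concrete, here are a few illustrative (question, context) pairs in the spirit of decaNLP (the exact question phrasings in the paper may differ):

```python
# Every task becomes a (question, context) pair whose answer is a string;
# a single model can then be trained jointly on all of them.
examples = [
    {"task": "summarization",
     "question": "What is the summary?",
     "context": "<full document>"},
    {"task": "sentiment",
     "question": "Is this review negative or positive?",
     "context": "<review text>"},
    {"task": "translation (En->De)",
     "question": "What is the translation from English to German?",
     "context": "<English sentence>"},
]
for ex in examples:
    print(f'{ex["task"]:>22}: {ex["question"]}')
```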
A collaboration between researchers from CMU, NYU, and FAIR. Instead of using features for transfer learning, the authors seek to learn transferable graphs. The graphs look similar to attention matrices and are multiplied with task-specific features during fine-tuning. They show improvements across some tasks, but the baselines are somewhat weak.