NLP News - Cat ML Papers, Multi-agent RL tool, TFGAN, MUSE, Intro to GPs, Word Mover's Distance tutorial, Gradient Boosting from scratch, Neuroevolution, More from NIPS '17
Highlights of this newsletter: A collection of ML papers on cats; a tool for multi-agent reinforcement learning; a lightweight library for training GANs; a tool for creating unsupervised multilingual embeddings; an introduction to Gaussian Processes; a tutorial on using the Word Mover's Distance; an introduction to Gradient Boosting; everything you need to know about Neuroevolution; many more highlights, slides, and presentations from NIPS 2017.
"[T]he algorithm is the thing we had a relationship with since the beginning. [...] We learned to fuel it and do whatever it took to please the algorithm."
- a dystopian quote by a YouTube creator on his relationship with YouTube (source: BuzzFeed)
Fun and games
The First Conference on Pokémonastics will be held at Keio University in May 2018. If you want to geek out on the linguistics of Pokémon names, then this conference is for you.
Arguably, 90% of the Internet is cats. Understanding, modeling and synthesizing our feline friends is thus an important research problem. This website highlights all recent ML papers that employ cats in different ways.
Tools and implementations
MAgent is a research platform for many-agent reinforcement learning that scales from hundreds to millions of agents and supports cool battle scenarios, as shown in this demo.
TFGAN is a lightweight library designed to make it easy to train and evaluate GANs. It provides the infrastructure for training GANs along with well-tested loss functions and evaluation metrics.
The Allen Institute for AI (AI2) launches AI2-THOR, an open-source set of 3D photo-realistic scenes hosted within the Unity3D game engine, which can be used as interactive environments for AI agents.
Recent papers by Facebook on unsupervised word-level translation and unsupervised Neural Machine Translation have received good reviews at ICLR 2018. This repository contains the implementation of their first paper, which can be used to learn state-of-the-art multilingual embeddings in an (un)supervised way.
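For the supervised case, the MUSE repository aligns two embedding spaces with an orthogonal mapping (Procrustes) learned from a seed dictionary. The sketch below is not MUSE code: it is a minimal illustration restricted to rotations in 2-D, where the optimal angle has a closed form. The toy "embeddings" and the function names fit_rotation and rotate are hypothetical.

```python
import math


def fit_rotation(src, tgt):
    """Angle theta of the 2-D rotation R minimising sum_i ||R x_i - y_i||^2."""
    # For a rotation, minimising the squared error is equivalent to maximising
    # sum_i y_i . (R x_i) = A cos(theta) + B sin(theta), which peaks at atan2(B, A).
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, tgt))
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, tgt))
    return math.atan2(b, a)


def rotate(vec, theta):
    x1, x2 = vec
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2)


# Hypothetical seed dictionary: the "target language" vectors are the
# "source language" vectors rotated by 30 degrees.
source = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -1.0)]
target = [rotate(v, math.radians(30)) for v in source]

theta = fit_rotation(source, target)
print(round(math.degrees(theta), 6))  # 30.0
```

In higher dimensions the same orthogonal Procrustes problem is solved with an SVD; the 2-D case simply makes the closed form easy to read.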
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is next to impossible, as in many cases, you already need to know the solution to your problem in order to be able to google for it. This website contains the most common problems in git and their solutions in plain English.
Gaussian Processes may not be as hyped as Deep Neural Networks, but they are useful in many ways, e.g. for optimizing your hyperparameters (Bayesian Optimization) or for obtaining confidence estimates. Alex Bridgland provides an excellent intro to GPs in his blog post Introduction to Gaussian Processes - Part I. You can find the notebook here.
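To make the posterior computation concrete, here is a minimal sketch of GP regression in plain Python. It assumes a squared-exponential (RBF) kernel with unit length scale and variance and a small fixed noise term; these choices and all function names are ours, not taken from the blog post or its notebook. The posterior mean at a test point x* is k*^T (K + s^2 I)^{-1} y and its variance is k(x*, x*) - k*^T (K + s^2 I)^{-1} k*, both computed here via a Cholesky factorization.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel between two scalar inputs.
    return variance * math.exp(-0.5 * (a - b) ** 2 / length_scale ** 2)


def cholesky(m):
    # Lower-triangular L with L L^T = m (m must be symmetric positive definite).
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low, b):
    # Forward substitution: solve L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low, b):
    # Back substitution: solve L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(x_train, y_train, x_test, noise=1e-4):
    n = len(x_train)
    k = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, y_train))  # (K + noise*I)^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, xi) for xi in x_train]
        means.append(sum(ks * a for ks, a in zip(k_star, alpha)))
        v = solve_lower(low, k_star)  # k*^T (K + noise*I)^{-1} k* = v^T v
        variances.append(rbf(xs, xs) - sum(vi * vi for vi in v))
    return means, variances


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
queries = [1.0, 0.5, 10.0]
means, variances = gp_predict(x_train, y_train, queries)
for x, mu, var in zip(queries, means, variances):
    print(f"x={x:5.1f}  mean={mu:+.3f}  std={math.sqrt(max(var, 0.0)):.3f}")
```

Far from the training inputs (x = 10) the predicted variance falls back to the prior variance of 1, which is the kind of confidence estimate that makes GPs attractive for Bayesian Optimization.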
We are very good at measuring the similarity between words (using e.g. word2vec), but what about documents? Word Mover's Distance (adapted from Earth Mover's Distance) provides an intuitive way to do this, and gensim clearly shows how to use the technique in this excellent tutorial.
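In WMD, each document is a normalized bag of words, and the distance is the minimum total cost of "moving" the words of one document onto the words of the other, where the cost of a move is the Euclidean distance between the two word vectors. In general this is a transportation (linear programming) problem, which gensim solves for you via its `wmdistance` method. The sketch below assumes a special case: both documents contain the same number n of distinct words, each weighted 1/n, so some optimal flow is a one-to-one matching and tiny documents can be solved by brute force. The 2-D word vectors are hypothetical stand-ins for word2vec embeddings.

```python
import itertools
import math

# Hypothetical 2-D "word vectors"; a real setup would use pretrained word2vec embeddings.
EMB = {
    "obama": (1.0, 1.0), "president": (1.1, 0.9),
    "speaks": (-1.0, 0.5), "greets": (-0.9, 0.6),
    "media": (0.5, -1.0), "press": (0.6, -1.1),
    "illinois": (2.0, 2.0), "chicago": (2.1, 1.9),
}


def wmd_uniform(doc_a, doc_b, emb=EMB):
    """Word Mover's Distance for two documents of n distinct words, each weighted 1/n."""
    if len(doc_a) != len(doc_b):
        raise ValueError("this special case needs documents with the same number of words")
    n = len(doc_a)
    # With equal uniform weights an optimal flow is a permutation (Birkhoff-von Neumann),
    # so we try every one-to-one matching of words and keep the cheapest.
    best = min(
        sum(math.dist(emb[a], emb[b]) for a, b in zip(doc_a, perm))
        for perm in itertools.permutations(doc_b)
    )
    return best / n


doc1 = ["obama", "speaks", "media", "illinois"]
doc2 = ["president", "greets", "press", "chicago"]
print(round(wmd_uniform(doc1, doc2), 4))          # 0.1414
print(wmd_uniform(doc1, list(reversed(doc1))))    # 0.0
```

The two documents share no words, yet their distance is small because every word has a close neighbour in the other document; brute force is only viable for a handful of words, which is why real implementations use a proper optimal-transport solver.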
Gradient Boosting (most often seen in its 'extreme' variant, XGBoost) is arguably the most successful algorithm in Kaggle competitions. This blog post provides a nice introduction to the algorithm.
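The core loop is easy to write down for squared loss: start from a constant prediction, then repeatedly fit a weak learner to the negative gradient of the loss (for squared loss, simply the residuals) and add a shrunken copy of it to the model. Below is a from-scratch sketch with 1-D inputs and regression stumps as weak learners; the learning rate, number of rounds, and toy data are illustrative assumptions, and XGBoost adds regularization and second-order information on top of this basic scheme.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # the constant that minimises squared loss
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict


xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
boosted_mse = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"constant model MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

The small learning rate (shrinkage) means each stump only corrects part of the remaining error, which usually generalizes better than taking full steps.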
OpenAI had earlier shown that Evolution Strategies (ES) can be used to train deep neural networks. This blog post highlights a set of compelling findings from five recent Uber Engineering papers, which suggest that genetic algorithms may be a competitive alternative to SGD for training deep neural networks for reinforcement learning.
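The ES variant popularized by OpenAI estimates a gradient without backpropagation: perturb the parameters with Gaussian noise, evaluate each perturbed copy, and move the parameters toward the noise directions that scored well. Below is a minimal sketch on a hypothetical 2-D objective; subtracting the mean reward is a common variance-reduction choice we added, and all hyperparameters are illustrative rather than taken from the papers.

```python
import random


def fitness(theta):
    # Hypothetical toy objective (higher is better), maximised at (3, -2).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2)


def evolution_strategies(theta, n_iters=300, pop_size=50, sigma=0.1, lr=0.02, seed=0):
    rng = random.Random(seed)
    for _ in range(n_iters):
        # Sample one Gaussian perturbation per population member.
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop_size)]
        rewards = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        mean_r = sum(rewards) / pop_size
        # Gradient estimate: 1 / (n * sigma) * sum_i (F_i - mean_F) * eps_i
        grad = [
            sum((r - mean_r) * eps[j] for r, eps in zip(rewards, noises)) / (pop_size * sigma)
            for j in range(len(theta))
        ]
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient *ascent*
    return theta


theta = evolution_strategies([0.0, 0.0])
print([round(t, 2) for t in theta])
```

Only fitness evaluations are needed, which is why ES (and the GAs studied by Uber) parallelize so easily: each worker can evaluate its own perturbations independently.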
The main neuroevolution paper from Uber. A simple genetic algorithm (GA) outperforms Q-learning (DQN) and policy gradients (A3C) on hard deep RL problems. The GA parallelizes better than (and is thus faster than) ES, A3C, and DQN. Surprisingly, on some games even random search substantially outperforms DQN, A3C, and ES (but not the GA).
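The simple GA in the Uber paper uses no crossover: it keeps the top T individuals (truncation selection), creates each child by adding Gaussian noise to a randomly chosen parent, and carries the best individual over unchanged (elitism). Below is a sketch of that loop on a hypothetical 2-D toy objective; the population size, T, and the mutation scale sigma are illustrative assumptions, and the paper evolves millions of network weights rather than two numbers.

```python
import random


def fitness(theta):
    # Hypothetical toy objective (higher is better), maximised at (3, -2).
    return -((theta[0] - 3.0) ** 2 + (theta[1] + 2.0) ** 2)


def simple_ga(dim=2, pop_size=50, truncation=10, sigma=0.1, generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:truncation]  # truncation selection: top T become parents
        elite = ranked[0]              # elitism: the best individual survives unchanged
        children = []
        for _ in range(pop_size - 1):
            parent = rng.choice(parents)
            # Mutation only (no crossover): add Gaussian noise to every parameter.
            children.append([p + sigma * rng.gauss(0.0, 1.0) for p in parent])
        population = [elite] + children
    return max(population, key=fitness)


best = simple_ga()
print([round(b, 2) for b in best], round(fitness(best), 4))
```

Unlike ES, no gradient is estimated at all: selection alone pushes the population toward better regions, and elitism guarantees the best fitness never gets worse.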
A nice explainer blog post by Lars Hulstaert that gives more intuition on the differences between optimizing with gradient descent and with neuroevolution.
More blog posts and articles
Emily Bender highlights four ways in which we can make our research papers more linguistically informed.
With recent advances in speech synthesis, audio samples are now more human-like than ever. This website contains audio samples from the current state-of-the-art model Tacotron 2 as well as a Turing test. Can you differentiate the Tacotron 2 output from speech produced by a human?
The Toxic Comment Classification Challenge on Kaggle challenges you to build a model that is able to detect different types of toxicity like threats, obscenity, insults, and identity-based hate in online comments. N-gram based features are useful in this competition, and NB-SVM is a strong baseline.
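NB-SVM (Wang and Manning, 2012) first computes, for every n-gram, a Naive Bayes log-count ratio r = log((p / |p|_1) / (q / |q|_1)), where p and q are smoothed feature counts over the positive (here: toxic) and negative comments; an SVM is then trained on the features scaled by r. Below is a minimal sketch of just the Naive Bayes part, using binarized unigram and bigram features and the resulting linear classifier (weights r, bias log(N+ / N-)). The toy comments are hypothetical, and a real baseline would add the SVM (or logistic regression) step on top.

```python
import math


def ngrams(text, n_max=2):
    """Binarised unigram and bigram features (presence only) of a lower-cased text."""
    tokens = text.lower().split()
    feats = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.add(" ".join(tokens[i:i + n]))
    return feats


def log_count_ratio(docs, labels, alpha=1.0):
    """NB log-count ratio r per n-gram and bias b = log(N+ / N-)."""
    vocab = sorted(set().union(*(ngrams(d) for d in docs)))
    p = {f: alpha for f in vocab}  # smoothed counts in positive documents
    q = {f: alpha for f in vocab}  # smoothed counts in negative documents
    for doc, y in zip(docs, labels):
        counts = p if y == 1 else q
        for f in ngrams(doc):
            counts[f] += 1
    p_norm, q_norm = sum(p.values()), sum(q.values())
    r = {f: math.log((p[f] / p_norm) / (q[f] / q_norm)) for f in vocab}
    n_pos = sum(labels)
    b = math.log(n_pos / (len(labels) - n_pos))
    return r, b


def nb_score(text, r, b):
    # Linear decision value r . x + b; n-grams unseen in training contribute 0.
    return sum(r.get(f, 0.0) for f in ngrams(text)) + b


comments = ["you are an idiot", "shut up you idiot",
            "thanks for the helpful edit", "great point thanks"]
labels = [1, 1, 0, 0]  # 1 = toxic (hypothetical toy labels)
r, b = log_count_ratio(comments, labels)
for text in ["what an idiot", "thanks for the edit"]:
    score = nb_score(text, r, b)
    print(text, "->", "toxic" if score > 0 else "not toxic", round(score, 3))
```

The log-count ratios act as per-feature importance weights; scaling the binary n-gram features by r before fitting a linear SVM is what turns this into NB-SVM.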
This article gives a nice summary of some of the highlights and most significant developments of Deep Learning applications in text, speech, and vision in 2017.
Awni Hannun gives some practical tips for training sequence-to-sequence models with attention. If you have experience training other types of deep neural networks, pretty much all of it applies here; the article focuses on a few tips you might not know about, even with experience training other models.
The polite tone of personal assistants such as Siri and Alexa can get boring at times. Personal assistants often try to add jokes in order to come across as more interesting; a chatbot in Russia takes this one step further and employs a lot of sass and dark humor, much to the enjoyment of its 1.5M daily active users.
Discrimination and harassment are big issues in the tech sector and have recently also become a problem at ML conferences. This blog post by Stephen Merity discusses steps that we can take to make our community and conferences more inclusive.
Jose Camacho Collados offers a sober, sceptical take on AlphaZero's much-praised recent victories against the open-source chess engine Stockfish.
Textio analyzes what the language used in 25,000 recent job descriptions tells us about the corporate cultural norms of leading tech companies. For example, Amazon emphasizes a "fast-paced environment", while FB highlights "our family".
More from NIPS 2017
The slides of the Deep Learning: Practice and Trends tutorial by Scott Reed, Nando de Freitas, and Oriol Vinyals.
In "An Addendum to Alchemy", Ali Rahimi and Ben Recht expand on a few points raised in response to their somewhat controversial "test of time" talk, which highlighted the growing gap between our field’s understanding of its techniques and its practical successes.
All notebooks from the Learn How to Code a Paper with State-of-the-Art Frameworks Workshop can be found in this repository.
The presentations of all speakers at the Deep Learning At Supercomputer Scale Workshop, which aims to reduce the training time of neural network models and increase the productivity of machine learning researchers.
Did you miss NIPS or were you only able to attend a few sessions? Here you can find the most comprehensive summary of NIPS 2017: 38 pages of notes by David Abel.
Simon Osindero shares a transcript of his speech at the first Black in AI workshop dinner at NIPS 2017.
Jennifer Wortman Vaughan shares a transcript of her keynote talk at WiML 2017 in which she discusses 9 things she wishes she had known at her first NIPS in 2005.
A new entrant in the increasingly crowded space of Neural Machine Translation frameworks. Sockeye is a production-ready framework and research platform, written in Python and built on MXNet, that provides scalable training and inference for the most prominent encoder-decoder architectures.
A new reading comprehension and question answering dataset by researchers from DeepMind that makes the task of reading comprehension more complex by requiring the model to read entire books or movie scripts and answer questions about them.
On many hard tasks such as object recognition on CIFAR-100 or ImageNet, machine translation, or language modeling, SGD generalizes better than Adam. In a recent blog post, I outlined several recent approaches that try to mitigate this. Researchers from Salesforce Research propose a simple method that switches from Adam to SGD once a triggering condition is satisfied and thus helps with generalization.
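The method (SWATS, Keskar and Socher, 2017) runs Adam and, at each step, estimates the SGD learning rate gamma whose step -gamma * g would have the same projection onto the Adam step p as p itself, i.e. gamma = p.p / (-p.g). It switches to SGD once a bias-corrected exponential average of gamma stops changing. Below is a sketch of just that trigger, following our reading of the paper; the class name, the tolerance, and the hand-made Adam steps in the usage example are assumptions for illustration, not the authors' code.

```python
class SwitchTrigger:
    """Tracks the Adam-to-SGD switching condition (a sketch of our reading of SWATS)."""

    def __init__(self, beta2=0.999, tol=1e-9):
        self.beta2 = beta2
        self.tol = tol
        self.lam = 0.0          # exponential average of the projected learning rate
        self.k = 0              # number of Adam steps observed
        self.sgd_lr = None      # learning rate to use after switching
        self.switch_step = None

    def update(self, p, g):
        """Feed one Adam step p and the gradient g it was computed from.

        Returns True once the switch to SGD has been triggered.
        """
        if self.sgd_lr is not None:
            return True
        self.k += 1
        p_dot_g = sum(pi * gi for pi, gi in zip(p, g))
        if p_dot_g == 0:
            return False
        # gamma: SGD learning rate whose step -gamma * g projects onto p exactly as p.
        gamma = sum(pi * pi for pi in p) / -p_dot_g
        self.lam = self.beta2 * self.lam + (1 - self.beta2) * gamma
        corrected = self.lam / (1 - self.beta2 ** self.k)  # bias-corrected average
        if self.k > 1 and abs(corrected - gamma) < self.tol:
            self.sgd_lr = corrected
            self.switch_step = self.k
            return True
        return False


g = [1.0, -2.0, 0.5]
trigger = SwitchTrigger()
for _ in range(5):
    # Hypothetical Adam steps that are always exactly -0.5 * g, so gamma stays constant.
    p = [-0.5 * gi for gi in g]
    if trigger.update(p, g):
        break
print(trigger.switch_step, trigger.sgd_lr)
```

Because gamma is constant in this contrived example, the averaged estimate matches it immediately and the trigger fires at the second step with a learning rate of about 0.5; in real training, gamma fluctuates and the switch happens only once its running average has settled.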