NLP News - Cat ML Papers, Multi-agent RL tool, TFGAN, MUSE, Intro to GPs, Word Mover's Distance tutorial, Gradient Boosting from scratch, Neuroevolution, More from NIPS '17
Highlights of this newsletter: A collection of ML papers on cats; a tool for multi-agent reinforcement learning; a lightweight library for training GANs; a tool for creating unsupervised multilingual embeddings; an introduction to Gaussian Processes; a tutorial on using the Word Mover's Distance; an introduction to Gradient Boosting; everything you need to know about Neuroevolution; many more highlights, slides, and presentations from NIPS 2017.
"[T]he algorithm is the thing we had a relationship with since the beginning. [...] We learned to fuel it and do whatever it took to please the algorithm.”
- a dystopian quote by a Youtube creator on his relationship with Youtube (source: Buzzfeed)
Fun and games
The 1st Conference on Pokémonastics — 1stpokemonastics.wordpress.com
The First Conference on Pokémonastics will be held at Keio University in May 2018. If you want to geek out on the linguistics of Pokémon names, then this conference is for you.
Cat Paper Collection — people.eecs.berkeley.edu
Arguably, 90% of the Internet is cats. Understanding, modeling and synthesizing our feline friends is thus an important research problem. This website highlights all recent ML papers that employ cats in different ways.
Tools and implementations
MAgent (AAAI 2018) — github.com
MAgent is a research platform for many-agent reinforcement learning that scales from hundreds to millions of agents and allows for cool battle scenarios, as shown in this demo.
TFGAN, a lightweight GAN library — research.googleblog.com
TFGAN is a lightweight library designed to make it easy to train and evaluate GANs. It provides the infrastructure to easily train a GAN and well-tested loss and evaluation metrics.
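For a feel of the API, here is a minimal sketch following the training pattern from the announcement post; the tiny MLP generator and discriminator and the random stand-in data are toy assumptions (TFGAN shipped under tf.contrib.gan):

```python
import tensorflow as tf
tfgan = tf.contrib.gan

def generator_fn(noise):
    # Toy MLP generator: noise -> flat 28x28 "image".
    net = tf.layers.dense(noise, 128, activation=tf.nn.relu)
    return tf.layers.dense(net, 784, activation=tf.tanh)

def discriminator_fn(data, unused_conditioning):
    # Toy MLP discriminator: data -> real/fake logit.
    net = tf.layers.dense(data, 128, activation=tf.nn.relu)
    return tf.layers.dense(net, 1)

noise = tf.random_normal([64, 64])
real_images = tf.random_uniform([64, 784], -1., 1.)  # stand-in for a real input pipeline

# TFGAN wires generator, discriminator, and data into a GANModel tuple.
gan_model = tfgan.gan_model(generator_fn, discriminator_fn,
                            real_data=real_images, generator_inputs=noise)

# Pick from the library's well-tested losses, e.g. the Wasserstein loss.
gan_loss = tfgan.gan_loss(
    gan_model,
    generator_loss_fn=tfgan.losses.wasserstein_generator_loss,
    discriminator_loss_fn=tfgan.losses.wasserstein_discriminator_loss)

# Build alternating train ops and run the built-in training loop.
train_ops = tfgan.gan_train_ops(
    gan_model, gan_loss,
    generator_optimizer=tf.train.AdamOptimizer(1e-4, beta1=0.5),
    discriminator_optimizer=tf.train.AdamOptimizer(1e-4, beta1=0.5))
tfgan.gan_train(train_ops, logdir='/tmp/tfgan',
                hooks=[tf.train.StopAtStepHook(num_steps=1000)])
```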
Photorealistic Interactive Environments for AI Agents
The Allen Institute for AI (AI2) launches AI2-THOR, an open-source set of 3D photo-realistic scenes hosted within the Unity3D game engine, which can be used as interactive environments for AI agents.
Multilingual Unsupervised or Supervised Word Embeddings (MUSE) — github.com
Recent papers by Facebook on unsupervised word-level translation and unsupervised Neural Machine Translation have received good reviews at ICLR 2018. This repository contains the implementation of their first paper, which can be used to learn state-of-the-art multilingual embeddings in an (un)supervised way.
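The supervised variant of the alignment step reduces to an orthogonal Procrustes problem with a closed-form SVD solution; here is a minimal numpy sketch of just that step, with random vectors standing in for real dictionary-pair embeddings:

```python
import numpy as np

def procrustes_align(X, Y):
    """Learn an orthogonal map W so that W @ X[i] is close to Y[i].

    X: (n, d) source-language embeddings of n dictionary pairs
    Y: (n, d) target-language embeddings of the same pairs
    Closed form: W = U @ Vt, where U, S, Vt = SVD(Y^T X).
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy check: recover a known orthogonal map from random "embeddings".
rng = np.random.RandomState(0)
X = rng.randn(1000, 300)
true_W, _ = np.linalg.qr(rng.randn(300, 300))  # ground-truth rotation
Y = X @ true_W.T
print(np.allclose(procrustes_align(X, Y), true_W))  # True
```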
Oh shit, git! — ohshitgit.com
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is next to impossible, as in many cases you already need to know the solution to your problem in order to be able to google for it. This website collects the most common git problems and their solutions in plain English.
Tutorials
Introduction to Gaussian Processes - Part I — bridg.land
Gaussian Processes may not be as hyped as Deep Neural Networks, but they are useful in many ways, e.g. for optimizing your hyperparameters (Bayesian Optimization) or obtaining confidence estimates. Alex Bridgland provides an excellent intro to GPs in this blog post. You can find the notebook here.
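To make this concrete, here is a minimal numpy sketch of GP regression with an RBF kernel, computing the standard posterior mean and pointwise uncertainty; the data and hyperparameters are toy choices:

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: k(x, y) = s^2 exp(-(x - y)^2 / (2 l^2)).
    sqdist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

# Noisy observations of a hidden function.
rng = np.random.RandomState(1)
X_train = rng.uniform(-4, 4, size=20)
y_train = np.sin(X_train) + 0.1 * rng.randn(20)
X_test = np.linspace(-5, 5, 100)

noise = 0.1
K = rbf_kernel(X_train, X_train) + noise ** 2 * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

# Standard GP posterior: mean = K_s^T K^-1 y, cov = K_ss - K_s^T K^-1 K_s.
mean = K_s.T @ np.linalg.solve(K, y_train)
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.diag(cov))  # pointwise confidence estimates
```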
Finding similar documents with Word2Vec and Word Mover's Distance — github.com
We are very good at measuring the similarity between words (using e.g. word2vec), but what about documents? Word Mover's Distance (adapted from Earth Mover's Distance) provides an intuitive way to do this, and this excellent gensim tutorial clearly shows how to use the technique.
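With gensim, the distance is a one-liner once the vectors are loaded; a minimal sketch along the lines of the tutorial (the vector file path is an assumption: any word2vec-format file works, and the pyemd package is required):

```python
from gensim.models import KeyedVectors

# Load pretrained vectors (path is an assumption).
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin.gz', binary=True)

# Lowercased, stopword-free token lists, as in the tutorial.
sentence_a = ['obama', 'speaks', 'media', 'illinois']
sentence_b = ['president', 'greets', 'press', 'chicago']

# Smaller distance = more similar documents (requires pyemd).
print(model.wmdistance(sentence_a, sentence_b))
```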
Gradient Boosting from scratch — medium.com
Gradient Boosting (most often seen in its 'extreme' variant, XGBoost) is arguably the most successful algorithm in kaggle competitions. This blog post provides a nice introduction to the algorithm.
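The core loop is short enough to write down: for squared loss, each new tree simply fits the residuals (the negative gradient) of the current ensemble. A minimal sketch with sklearn trees as weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, lr=0.1, max_depth=2):
    base = y.mean()                   # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred          # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += lr * tree.predict(X)  # small step along the correction
        trees.append(tree)
    return base, trees

def boosted_predict(base, trees, X, lr=0.1):
    return base + lr * sum(tree.predict(X) for tree in trees)

# Toy usage: fit a noisy sine wave.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(200)
base, trees = gradient_boost(X, y)
print(np.mean((boosted_predict(base, trees, X) - y) ** 2))  # small train MSE
```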
Math as code: a cheat sheet for mathematical notation in code form — github.com
Reading academic papers is hard, in particular deciphering the often obscure mathematical notation. This cheat sheet aims to ease developers into mathematical notation by pairing each symbol with equivalent JavaScript code.
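The cheat sheet itself uses JavaScript; to stay consistent with the other snippets in this issue, here is the same flavor of symbol-to-code mapping in Python (the symbols shown are illustrative picks):

```python
import math

n, v = 5, [1.0, 2.0, 2.0]

# Sigma (summation): the sum over i = 1..n of 2i.
total = sum(2 * i for i in range(1, n + 1))

# Capital pi (product): the product over i = 1..n of i, i.e. n!.
product = math.factorial(n)

# ||v|| (Euclidean norm): square root of the sum of squares.
norm = math.sqrt(sum(x * x for x in v))

# f : R -> R, f(x) = x^2: just a function definition.
f = lambda x: x ** 2
```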
Deep Neuroevolution
Welcoming the Era of Deep Neuroevolution — eng.uber.com
OpenAI first showed that Evolution Strategies (ES) can be used to train deep neural networks. This blog post highlights a set of compelling findings from five recent Uber Engineering papers which suggest that genetic algorithms may be a competitive alternative to SGD for training deep neural networks for reinforcement learning.
Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning (arXiv)
The main neuroevolution paper from Uber. A simple genetic algorithm (GA) outperforms Q-learning (DQN) and policy gradients (A3C) on hard deep RL problems. The GA parallelizes better than (and is thus faster than) ES, A3C, and DQN. Surprisingly, on some games even random search substantially outperforms DQN, A3C, and ES (but not the GA).
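For intuition, here is a minimal sketch of a mutation-only GA with truncation selection and elitism in the spirit of the paper, run on a toy fitness function; in the deep RL setting, the vector would be the network weights and the fitness the episode return:

```python
import numpy as np

def simple_ga(fitness, dim, pop_size=50, elite=10, sigma=0.1, generations=100):
    """Mutation-only GA: no crossover; offspring are Gaussian-perturbed
    copies of the top-ranked parents, and the best individual survives."""
    rng = np.random.RandomState(0)
    pop = [rng.randn(dim) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[:elite]                      # truncation selection
        pop = [parents[0]] + [                        # elitism
            parents[rng.randint(elite)] + sigma * rng.randn(dim)
            for _ in range(pop_size - 1)
        ]
    return max(pop, key=fitness)

# Toy fitness: maximize -||theta||^2, so the optimum is the zero vector.
best = simple_ga(lambda theta: -np.sum(theta ** 2), dim=10)
print(np.round(best, 2))
```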
Gradient descent vs. neuroevolution — towardsdatascience.com
A nice explainer blog post by Lars Hulstaert where he gives some more intuitions on the differences between optimizing with gradient descent and neuroevolution.
More blog posts and articles
Putting the Linguistics in Computational Linguistics — naacl2018.wordpress.com
Emily Bender highlights 4 ways in which we can make our research papers more linguistically informed.
Audio samples from Tacotron 2
With recent advances in speech synthesis, audio samples are now more human-like than ever. This website contains audio samples from the current state-of-the-art model Tacotron 2 as well as a Turing test. Can you differentiate the Tacotron 2 output from speech produced by a human?
Toxic Comment Classification Challenge
The Toxic Comment Classification Challenge on kaggle challenges you to build a model that is able to detect different types of toxicity like threats, obscenity, insults, and identity-based hate in online comments. N-gram based features are useful in this competition and NB-SVM is a strong baseline.
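NB-SVM scales each feature by its naive Bayes log-count ratio and fits a linear classifier on top. A minimal binary-classification sketch follows; kaggle variants typically swap the SVM for logistic regression, as done here, and the toy corpus is made up:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def nbsvm_fit(texts, y, alpha=1.0):
    vec = TfidfVectorizer(ngram_range=(1, 2))
    X = vec.fit_transform(texts)
    p = alpha + X[y == 1].sum(0)   # feature mass in the toxic class
    q = alpha + X[y == 0].sum(0)   # feature mass in the clean class
    r = np.log((p / p.sum()) / (q / q.sum()))  # naive Bayes log-count ratio
    r = np.asarray(r).ravel()
    clf = LogisticRegression(C=4.0).fit(X.multiply(r), y)
    return vec, r, clf

def nbsvm_predict(vec, r, clf, texts):
    return clf.predict(vec.transform(texts).multiply(r))

# Toy usage with a made-up labeled corpus (1 = toxic).
texts = ['you are awful', 'great point, thanks',
         'awful hateful comment', 'thanks a lot']
y = np.array([1, 0, 1, 0])
vec, r, clf = nbsvm_fit(texts, y)
print(nbsvm_predict(vec, r, clf, ['what an awful thing to say']))
```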
Deep Learning Achievements Over the Past Year — blog.statsbot.co
This article gives a nice summary of some of the highlights and most significant developments of Deep Learning applications in text, speech, and vision in 2017.
Training Sequence Models with Attention
Awni Hannun gives some practical tips for training sequence-to-sequence models with attention. If you have experience training other types of deep neural networks, pretty much all of it applies here; this article focuses on a few tips you might not know about, even with that experience.
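As a quick refresher on the object these tips revolve around, here is a minimal numpy sketch of dot-product attention; inspecting the resulting weights is a common sanity check when such models misbehave:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state, softmax the
    scores, and return the weighted sum (context) plus the weights."""
    scores = encoder_states @ decoder_state   # (T,) alignment scores
    weights = softmax(scores)                 # attention distribution
    context = weights @ encoder_states        # (d,) context vector
    return context, weights

# Toy usage: 6 encoder timesteps, hidden size 4.
rng = np.random.RandomState(0)
context, weights = attend(rng.randn(4), rng.randn(6, 4))
print(weights.round(2))
```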
In Russia, There’s an AI Helper That Makes Fun of You—and It’s Wildly Popular — www.technologyreview.com
The polite tone of personal assistants such as Siri and Alexa can get boring at times. Personal assistants often try to add jokes in order to come across as more interesting; a chatbot in Russia takes this one step further and employs a lot of sass and dark humor, much to the enjoyment of its 1.5M daily active users.
Bias is not just in our datasets, it's in our conferences and community — smerity.com
Discrimination and harassment are big issues in the tech sector and have recently also become a problem at ML conferences. This blog post by Stephen Merity discusses steps that we can take to make our community and conferences more inclusive.
Is AlphaZero really a scientific breakthrough in AI? — medium.com
A sober, sceptical take by Jose Camacho Collados on AlphaZero's much-praised recent victories against the open-source chess engine Stockfish.
1000 different people, the same words — textio.ai
Textio analyzes what the language used in 25,000 recent job descriptions tells us about the corporate cultural norms of leading tech companies. For example, Amazon emphasizes a "fast-paced environment", while FB highlights "our family".
Everything is a Model — deliprao.com
Delip Rao reviews the recent paper on Learned Indexes by Google and provides 9 takeaways on how Machine Learning will impact systems design. The original paper is also well worth a read!
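The core idea fits in a few lines: a model predicts a key's position in sorted data, and a search bounded by the model's worst-case error corrects it. A toy sketch with a linear fit standing in for the paper's staged neural models:

```python
import numpy as np

rng = np.random.RandomState(0)
keys = np.sort(rng.lognormal(size=100000))
positions = np.arange(len(keys))

# "Train" the index: a straight line stands in for the learned model.
slope, intercept = np.polyfit(keys, positions, 1)
guesses = (slope * keys + intercept).astype(int)
max_err = int(np.abs(guesses - positions).max())  # worst-case model miss

def lookup(key):
    guess = int(slope * key + intercept)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    # A learned index only needs to search within the model's error bound.
    return lo + np.searchsorted(keys[lo:hi], key)

print(lookup(keys[12345]) == 12345)  # True
```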
More from NIPS 2017
Deep Learning: Practice and Trends slides
The slides of the Deep Learning: Practice and Trends tutorial at NIPS 2017 by Scott Reed, Nando de Freitas, and Oriol Vinyals.
An Addendum to Alchemy — www.argmin.net
Ali Rahimi and Ben Recht expand on a few points raised in response to their somewhat controversial "test of time" talk, which highlighted the growing gap between our field's understanding of its techniques and its practical successes.
MLTrain Workshop materials — github.com
All notebooks from the Learn How to Code a Paper with State-of-the-Art Frameworks Workshop can be found in this repository.
Deep Learning At Supercomputer Scale Workshop
The presentations of all speakers at the Deep Learning At Supercomputer Scale Workshop, which aims to reduce the training time of neural network models and increase the productivity of machine learning researchers.
NIPS 2017 notes by David Abel
Did you miss NIPS or were you only able to attend a few sessions? Here you can find the most comprehensive summary of NIPS 2017: 38 pages of notes by David Abel.
My talk at the inaugural Black in AI workshop dinner — medium.com
Simon Osindero shares a transcript of his speech at the first Black in AI workshop dinner at NIPS 2017.
Nine things I wish I had known the first time I came to NIPS — medium.com
Jennifer Wortman Vaughan shares a transcript of her keynote talk at WiML 2017 in which she discusses 9 things she wishes she had known at her first NIPS in 2005.
Paper picks
Sockeye: A Toolkit for Neural Machine Translation (arXiv)
A new entrant in the increasingly crowded space of Neural Machine Translation frameworks. Sockeye is both a production-ready framework and a research platform, written in Python and built on MXNet, that provides scalable training and inference for the most prominent encoder-decoder architectures.
The NarrativeQA Reading Comprehension Challenge (arXiv)
A new reading comprehension and question answering dataset by researchers from DeepMind that makes the task of reading comprehension more complex by requiring the model to read entire books or movie scripts and answer questions about them.
Improving Generalization Performance by Switching from Adam to SGD (arXiv)
On many hard tasks such as object recognition on CIFAR-100 or ImageNet, machine translation, or language modeling, SGD generalizes better than Adam. I have outlined different recent approaches that try to mitigate this in a recent blog post. Researchers from Salesforce Research propose a simple method that switches from Adam to SGD once a triggering condition is met, which helps with generalization.
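Mechanically, the switch itself is trivial; here is a PyTorch-flavored sketch in which the trigger epoch and SGD learning rate are placeholder assumptions (the paper derives both from the Adam updates themselves):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
switched = False

for epoch in range(20):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # stand-in batch
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    # Placeholder trigger: the paper instead monitors the Adam steps to
    # decide when to switch and which SGD learning rate to use.
    if not switched and epoch == 10:
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
        switched = True
```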