Highlights of this newsletter: A collection of ML papers on cats; a tool for multi-agent reinforcement learning; a lightweight library for training GANs; a tool for creating unsupervised multilingual embeddings; an introduction to Gaussian Processes; a tutorial on using the Word Mover’s Distance; an introduction to Gradient Boosting; everything you need to know about Neuroevolution; many more highlights, slides, and presentations from NIPS 2017.
“[T]he algorithm is the thing we had a relationship with since the beginning. […] We learned to fuel it and do whatever it took to please the algorithm.”
- a dystopian quote by a YouTube creator on his relationship with YouTube (source: BuzzFeed)
Arguably, 90% of the Internet is cats. Understanding, modeling and synthesizing our feline friends is thus an important research problem. This website highlights all recent ML papers that employ cats in different ways.
Git is hard: screwing up is easy, and figuring out how to fix your mistakes is next to impossible, since in many cases you already need to know the solution to your problem just to be able to google for it. This website collects the most common Git problems and their solutions in plain English.
Gaussian Processes may not be as hyped as deep neural networks, but they are useful in many ways, e.g. for optimizing your hyperparameters (Bayesian optimization) and obtaining confidence estimates. Alex Bridgland provides an excellent intro to GPs in this blog post. You can find the notebook here.
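If you want to play with GPs beyond the notebook, scikit-learn ships a ready-made regressor. A minimal sketch (the toy data and kernel choice are mine, not from the post):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D regression problem: noisy samples of a sine wave.
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, size=(20, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(20)

# RBF kernel; alpha adds observation noise to the kernel diagonal.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.01)
gp.fit(X, y)

# Unlike most deep nets, the GP returns a predictive mean *and* an
# uncertainty estimate, which is what Bayesian optimization exploits.
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
```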
We are very good at measuring the similarity between words (using e.g. word2vec), but what about documents? Word Mover’s Distance (adapted from Earth Mover’s Distance) provides an intuitive way to do this, and gensim clearly shows how to use the technique in this excellent tutorial.
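In gensim, WMD is a one-liner on a set of word vectors. A minimal sketch, assuming you have the pre-trained GoogleNews word2vec vectors on disk and pyemd installed (both assumptions mine; see the tutorial for the full walkthrough):

```python
from gensim.models import KeyedVectors

# Load pre-trained embeddings (the path is a placeholder).
model = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# WMD operates on lists of tokens; lowercasing and stopword removal help.
doc1 = ['obama', 'speaks', 'media', 'illinois']
doc2 = ['president', 'greets', 'press', 'chicago']

# Lower distance = more similar documents.
distance = model.wmdistance(doc1, doc2)
```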
OpenAI first showed that Evolution Strategies (ES) can be used to train deep neural networks. This blog post highlights a set of compelling findings from five recent Uber Engineering papers, which suggest that genetic algorithms may be a competitive alternative to SGD for training deep neural networks for reinforcement learning.
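For intuition, the core ES update fits in a few lines of numpy: perturb the parameters, evaluate rewards, and move in the reward-weighted direction of the noise. A toy sketch (the objective and hyperparameters are mine; real deep RL runs parallelize the evaluations across many workers):

```python
import numpy as np

def es_step(f, theta, npop=50, sigma=0.1, alpha=0.01):
    """One Evolution Strategies update, maximizing a black-box reward f."""
    noise = np.random.randn(npop, theta.size)        # population of perturbations
    rewards = np.array([f(theta + sigma * n) for n in noise])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # standardize
    return theta + alpha / (npop * sigma) * noise.T @ adv

# Toy objective: recover a fixed target vector, no gradients needed.
target = np.array([0.5, -0.1, 0.3])
f = lambda w: -np.sum((w - target) ** 2)
theta = np.zeros(3)
for _ in range(300):
    theta = es_step(f, theta)
```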
The main neuroevolution paper from Uber. A simple genetic algorithm (GA) outperforms Q-learning (DQN) and policy gradients (A3C) on hard deep RL problems. The GA parallelizes better than (and is thus faster than) ES, A3C, and DQN. Surprisingly, on some games even random search substantially outperforms DQN, A3C, and ES (but not the GA).
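The GA itself is deliberately simple: truncation selection plus Gaussian mutation, with no crossover. A toy numpy sketch of that scheme (the scale and hyperparameters are mine; the paper applies this to millions of network weights):

```python
import numpy as np

def simple_ga(f, dim, pop_size=50, top_k=10, sigma=0.1, generations=100):
    """Truncation-selection GA with Gaussian mutation, no crossover."""
    pop = sigma * np.random.randn(pop_size, dim)
    for _ in range(generations):
        fitness = np.array([f(ind) for ind in pop])
        elite = pop[np.argsort(fitness)[-top_k:]]      # keep the fittest
        parents = elite[np.random.randint(top_k, size=pop_size - 1)]
        children = parents + sigma * np.random.randn(pop_size - 1, dim)
        pop = np.vstack([elite[-1:], children])        # elitism: best survives unmutated
    return pop[np.argmax([f(ind) for ind in pop])]

# Same toy objective as the ES sketch above: recover a target vector.
target = np.array([0.5, -0.1, 0.3])
best = simple_ga(lambda w: -np.sum((w - target) ** 2), dim=3)
```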
With recent advances in speech synthesis, audio samples are now more human-like than ever. This website contains audio samples from the current state-of-the-art model Tacotron 2 as well as a Turing test. Can you differentiate the Tacotron 2 output from speech produced by a human?
The Toxic Comment Classification Challenge on Kaggle asks you to build a model that detects different types of toxicity, such as threats, obscenity, insults, and identity-based hate, in online comments. N-gram features are useful in this competition, and NB-SVM is a strong baseline.
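NB-SVM simply reweights bag-of-words features by their naive-Bayes log-count ratios before fitting a linear classifier (Wang & Manning, 2012). A toy sketch of the idea (the data is mine; many Kaggle kernels substitute logistic regression for the SVM, as done here):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in corpus with one binary label (e.g. toxic vs. not).
texts = ["you are an idiot", "great point, thanks",
         "I hate you so much", "nice work, well done"]
y = np.array([1, 0, 1, 0])

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(texts)

# Naive-Bayes log-count ratio per feature, smoothed with +1.
p = X[y == 1].sum(axis=0) + 1
q = X[y == 0].sum(axis=0) + 1
r = np.log((p / p.sum()) / (q / q.sum()))

# Scale the features by r, then fit a linear classifier.
clf = LogisticRegression().fit(X.multiply(r), y)
```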
Awni Hannun gives some practical tips for training sequence-to-sequence models with attention. If you have experience training other types of deep neural networks, pretty much all of it applies here; the article focuses on a few tips you might not know about, even with that experience.
The polite tone of personal assistants such as Siri and Alexa can get boring at times. Personal assistants often try to add jokes in order to come across as more interesting; a chatbot in Russia takes this one step further and employs a lot of sass and dark humor, much to the enjoyment of its 1.5M daily active users.
Discrimination and harassment are big issues in the tech sector and have also recently become a problem at ML conferences. This blog post by Stephen Merity discusses steps that we can take to make our community and conferences more inclusive.
Textio analyzes what the language used in 25,000 recent job descriptions tells us about the corporate cultural norms of leading tech companies. For example, Amazon emphasizes a “fast-paced environment”, while Facebook highlights “our family”.
Ali Rahimi and Ben Recht expand on a few points raised in response to their somewhat controversial “test of time” talk, which highlighted the growing gap between our field’s understanding of its techniques and its practical successes.
Presentations from all speakers at the Deep Learning At Supercomputer Scale workshop, which aims to reduce the training time of neural network models and increase the productivity of machine learning researchers.
A new entrant in the increasingly crowded space of neural machine translation frameworks. Sockeye is both a production-ready framework and a research platform, written in Python and built on MXNet, that provides scalable training and inference for the most prominent encoder-decoder architectures.
A new reading comprehension and question answering dataset from DeepMind researchers that makes the task more challenging by requiring the model to read entire books or movie scripts and answer questions about them.
On many hard tasks such as object recognition on CIFAR-100 or ImageNet, machine translation, or language modeling, SGD generalizes better than Adam. In a recent blog post, I outlined several approaches that try to mitigate this. Researchers from Salesforce Research propose a simple method that switches from Adam to SGD whenever a triggering condition is met, which helps with generalization.
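The switch itself is mechanically trivial; the paper's contribution is the trigger and the SGD learning rate it derives from Adam's statistics. A PyTorch sketch with a stand-in trigger (the fixed step count below is purely illustrative, not the paper's criterion):

```python
import torch

# Toy regression setup; the point is only the optimizer hand-off.
model = torch.nn.Linear(10, 1)
X, y = torch.randn(256, 10), torch.randn(256, 1)
loss_fn = torch.nn.MSELoss()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    loss = loss_fn(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Stand-in trigger: switch to SGD after a fixed number of steps.
    # The paper instead monitors Adam's updates to decide when to switch
    # and what SGD learning rate to use.
    if step == 1000:
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
```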