🎅 NeurIPS 2018; The nature of research; Advances in image generation, protein folding, and RL
Happy holiday season wherever you are based! 🎅⛄️🏖😎
It's only been a month since the last newsletter, but so much has been going on in the world of ML and NLP. NIPS changed its name (now: NeurIPS) and has already taken place (see below for some highlights). There have been major advances in three ML areas: image generation, protein folding, and reinforcement learning. There have also been some insightful articles about what it means to do research. Finally, we saw some high-profile articles about NLP, tons of papers, and lots of other resources.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
NeurIPS 2018 🤖
Session 1: Room 220 E 🗣 If you're interested in NLP, then this session is for you. It covers the following papers:
On the Dimensionality of Word Embedding. A theoretically motivated method to choose the optimal dimensionality of word embeddings.
Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces. Aligning speech and text from non-parallel corpora using techniques from cross-lingual word embedding models, e.g. MUSE.
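As a rough sketch of the core alignment step used in MUSE-style methods (the orthogonal Procrustes solution; the function and variable names below are my own, not from the paper):

```python
import numpy as np

def procrustes_align(X, Y):
    """Find the orthogonal map W minimizing ||X @ W.T - Y||_F.

    X, Y: (n, d) arrays of paired source/target embeddings.
    Closed-form solution: W = U @ Vt, where U, S, Vt = svd(Y.T @ X).
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

# Toy check: recover a known rotation of a random embedding matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = X @ R.T
W = procrustes_align(X, Y)
assert np.allclose(W, R)
```

In the fully unsupervised setting, MUSE first obtains a rough seed mapping adversarially and then refines it with this Procrustes step over high-confidence word pairs.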
Diffusion Maps for Textual Network Embedding. Measuring connectivity between texts in a graph.
A Retrieve-and-Edit Framework for Predicting Structured Outputs. Using a retrieval model and an edit model for structured outputs.
Visualization for Machine Learning Tutorial 🎨 A gorgeous tutorial on best practices, theoretical and practical insights for visualization. At 1:40:29, the presenters discuss a case study on visualizing the embeddings of Google's multilingual NMT model.
Unsupervised Learning Tutorial 👩🎓 A two-part tutorial on unsupervised learning. In the first part (slides), Alex Graves discusses the pros and cons of auto-regressive models (models that condition on previous timesteps such as an RNN) for unsupervised learning. This is particularly interesting in light of recent advances in image generation (see below). In the second part (slides), Marc'Aurelio Ranzato discusses the differences between unsupervised learning in computer vision and NLP. The NLP part starts at 1:16:00.
Test of Time Award 🏛 Olivier Bousquet accepts the test of time award (at 55:45) for the paper The Tradeoffs of Large Scale Learning, which was the first to show theoretically that stochastic gradient descent is a good choice in the large-data regime. Three main takeaways from his talk:
Scientific truth does not follow fashion. Do not hesitate to be a contrarian if you have good reasons.
Experiments are crucial. Do not aim at beating the state-of-the-art. Aim at understanding phenomena.
On the proper use of mathematics: a theorem is not like a subroutine that can be applied blindly. Theorems should not limit creativity.
What Bodies Think About 👀 Michael Levin talks about bioelectric computation outside the nervous system. Some highlights from his talk:
Despite their brains being liquefied during metamorphosis, caterpillars retain their memories as butterflies. 😮
Planarians (flat worms that can regenerate parts of their body) also regenerate their memories when their brains are removed.
If a planarian is cut into pieces, each piece regrows the rest of the body. 😵
Regeneration is a computational problem. There are huge opportunities for ML applied to regenerative medicine.
Somatic tissues (cells in the body) form bioelectric networks like the brain and make decisions about anatomy.
The nature of research: Asking the right questions
There have been some great articles this month about what it means to do research. In particular, this article by Daniel Lemire touches on a very important point:
Yet here is how many researchers work. They survey the best papers from the last major conference or journal issue in their field. Importantly, they make sure to read what everyone is reading and to make sure to make theirs the frame of minds of the best people. They make sure that they can repeat the most popular questions and answers. They look at the papers, look for holes or possibilities for improvement and work from there. What this ensures is that there are a few leaders (people writing about genuinely novel ideas) followed by a long and nearly endless stream of “me too” papers that offer minor and inconsequential variations.
This struck a chord with me because it feels so familiar. Reading papers for inspiration is great. At the same time, reading too many bears the risk of groupthink. Even Jeff Dean recommends reading many papers only superficially. Research requires digging deeper, not following in existing footsteps just because they are convenient. Remember the underlying problem, rather than the method used to address it. Read papers on a problem you care about and seek to understand what the model gets wrong. Asking the right questions, ones that have not been asked before, is important.
Vincent Vanhoucke shares his thoughts on what it means to be a research scientist:
Measuring progress in units of learning, as opposed to units of solving, is one of the key paradigm shifts one has to undergo to be effective in a research setting.
On a similar note, Herman Kamper, Stephan Gouws, and I collected advice from 20+ NLP experts on 4 big questions regarding NLP and research, such as this advice from Yoshua Bengio:
Be ambitious. Do not limit yourself to reading NLP papers. Read a lot of machine learning, deep learning, reinforcement learning papers. A PhD is a great time in one's life to go for a big goal, and even small steps towards that will be valued.
Research has changed significantly over the last 100 years. The time and money invested in research are increasing, but research output is staying mostly constant. This article explores possible causes, such as decreasing productivity of scientists, potentially due to information overload. There are some tools, such as AI2's Semantic Scholar, that seek to help with this (see here for a brief analysis). Another possible reason might be that "constant external stimulation inhibits deep thinking", as argued in this article. Finally, the research environment has become more hectic, with higher expectations and misaligned incentives, leaving less time for critical thinking. In 10 years, we won't remember the incremental improvements, but only the big ideas. It's up to us to leave a mark.
Advance #1: Image generation 🌈
Just in the last few months, there have been some amazing advances in image generation.
One impressive recent result was achieved by subscale pixel networks with multidimensional upscaling. The key achievement here is showing that auto-regressive networks can generate high-fidelity images (see the ICLR reviews for more context).
Finally, there have been two advances from NVIDIA:
An interactive virtual world rendered entirely using deep learning. This hints that we might soon see neural networks generate not only the textures but the entire graphics of 3D video games.
A style-based generator for GANs that combines style transfer techniques with image generation. The model can separate high-level attributes from low-level ones and can generate life-like images along different dimensions. See here for a video of the results.
Advance #2: Protein folding 👩🔬
In a biennial competition on protein folding, DeepMind's model significantly outperformed the competition. The competition (CASP13) focuses on modelling novel protein folds. DeepMind's improvement is about twice what would have been expected given the recent historical trend. This post provides more context on the significance of the results and is well worth reading. In particular, it explores in depth how academia and industry (in particular pharma) have neglected this problem. It also features this beautiful quote on the nature of research:
I believe that science, at its most creative, is more akin to a hunter-gatherer society than it is to a highly regimented industrial activity, more like a play group than a corporation. – Marc Kirschner
Advance #3: Reinforcement learning 👾
Uber announced that they solved Montezuma's Revenge, one of the hardest Atari games due to its very sparse rewards. Their approach, Go-Explore, introduces some clever tricks, such as remembering good states for exploration and first returning to promising states before exploring further. Here are the slides of the presentation at the Deep RL Workshop at NeurIPS 2018. Alex Irpan comments on the approach in this post.
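To make the "remember and return" idea concrete, here is a minimal, illustrative sketch of such a loop on a toy chain environment. This is my own simplification, not Uber's code: the real system downsamples frames into cells and returns to states by replaying actions in a deterministic emulator, while here a "cell" is simply the state itself.

```python
import random

def chain_step(pos, action):
    """Toy sparse-reward environment: reward only at position 20."""
    new_pos = max(0, min(20, pos + action))
    return new_pos, (1 if new_pos == 20 else 0)

def go_explore(env_step, start, n_iters=500, explore_len=10, seed=0):
    """Archive maps cell -> (state, score). Each iteration: pick an
    archived cell, return to it, explore randomly from there, and
    archive any new or better cells that are reached."""
    rng = random.Random(seed)
    archive = {start: (start, 0)}
    best = 0
    for _ in range(n_iters):
        state, score = archive[rng.choice(list(archive))]
        for _ in range(explore_len):
            state, reward = env_step(state, rng.choice([-1, 1]))
            score += reward
            if state not in archive or score > archive[state][1]:
                archive[state] = (state, score)
            best = max(best, score)
    return best, archive

best, archive = go_explore(chain_step, 0)
```

Because every newly reached cell is remembered and later returned to, the frontier ratchets forward even though the exploration itself is purely random; a plain random walk from the start would rarely ever see the reward.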
Presentations and slides
Creativity and AI 👾 Demis Hassabis talks at the Royal Academy of Arts about the current state of machine learning and reinforcement learning, covering Atari, AlphaGo and the famous move 37, machine creativity, and imagination (both its neuroscience background and recent AI work).
Open Domain Dialogue Generation and Practice in Microsoft Xiao Bing 📋 Slides (translated by CognitionX) on the techniques used in Microsoft Xiaoice, a chatbot used by hundreds of millions in Asia.
Deep Chit-Chat: Deep Learning for ChatBots 💬 Slides of the chatbot tutorial at EMNLP 2018.
Controlling Text Generation ✍️ Harvard's Alexander Rush discusses how to generate text based on other inputs (structured data, multimodal inputs, etc.). Two key takeaways: we are missing bigger and richer models, and specific choices can be exposed as discrete latent variables.
Tools and implementations ⚒
Visualizing the Loss Landscape of Neural Nets 🎨 Code for the NeurIPS 2018 paper Visualizing the Loss Landscape of Neural Nets that allows you to visualize the loss landscape of the model near the optimal parameters.
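The basic idea behind such plots can be sketched in a few lines: evaluate the loss along a normalized random direction around the trained parameters (a simplification of the paper's filter-wise normalization; the names below are my own):

```python
import numpy as np

def loss_slice(loss_fn, theta, direction, alphas):
    """Evaluate loss_fn at theta + alpha * d for each alpha, where d is
    the unit-normalized direction. This gives a 1-D slice of the loss
    landscape; the paper normalizes the direction filter-by-filter."""
    d = direction / np.linalg.norm(direction)
    return [loss_fn(theta + a * d) for a in alphas]

# Toy example: a quadratic loss with its optimum at theta = 1.
loss_fn = lambda w: float(np.sum((w - 1.0) ** 2))
theta = np.ones(3)
rng = np.random.default_rng(0)
values = loss_slice(loss_fn, theta, rng.normal(size=3), [-1.0, 0.0, 1.0])
```

A 2-D surface plot works the same way with two random directions and a grid of (alpha, beta) coefficients.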
PyText 🔨 Facebook's newly open-sourced PyTorch-based NLP library that enables fast experimentation and includes prebuilt model architectures and utilities.
NIPS --> NeurIPS 📛 The Neural Information Processing Systems conference changed its acronym to NeurIPS. See here for an article about the name change by Daniela Witten, Elana Fertig, Anima Anandkumar, and Jeff Dean.
Next ICLR in Addis Ababa 🇪🇹 The next ICLR will take place in Addis Ababa, Ethiopia.
Germany plans €3 billion in AI investment 💰 Germany announces plans to invest €3 billion in AI research.
ELLIS launch 🚀 In other European AI news, the European Laboratory for Learning and Intelligent Systems (ELLIS) officially launched at NeurIPS 2018.
CMU Wilderness Multilingual Speech Dataset 💬 A speech dataset covering over 700 languages, providing audio, aligned text, and word pronunciations.
Visual Commonsense Reasoning 🖼 A new task and large-scale dataset for cognition-level visual understanding.
AWS ML training 👩🎓 A new AWS ML curriculum consisting of 30+ ML courses and 45+ hours of content.
A Programmer's Introduction to Mathematics 👩💻 This book aims to teach mathematics to programmers. Each chapter is accompanied by real-world examples and implementations.
PhD Applications — Everything You Need to Know 👩🎓 Tim Dettmers gives comprehensive advice for PhD applications in ML and NLP.
Privacy Preserving Deep Learning with PyTorch & PySyft 👤 An in-depth tutorial for learning about privacy preserving machine learning.
Blog posts and articles
what words mean - towards a multidisciplinary approach to bias in nlp 🔬 A neuroscience and linguistics inspired take on recent advances in transfer learning and the increasing problem of bias in NLP.
Inside Facebook's fight to beat Google and dominate in AI ⚔️ This article discusses the origin of Facebook AI Research (FAIR) and the rivalry between FAIR and Google. It explores applications where supervised learning is not enough; even though it mentions low-resource MT, it only discusses reinforcement learning.
Finally, a Machine That Can Finish Your Sentence 🤖 A New York times article that does a good job of presenting the opportunities and challenges associated with recent advances in transfer learning for NLP.
How to Teach Artificial Intelligence Some Common Sense 📋 An in-depth Wired article about ongoing research to incorporate common sense in AI.
Multitask learning: teach your AI more to make it better 📖 A nice article about multi-task learning with four practical examples on how to use it in practice by Alexandr Honchar.
The deepest problem with deep learning 🕳 Gary Marcus argues for a combination of neural networks and symbolic reasoning for natural language understanding.
Meta-Learning: Learning to Learn Fast 👩🏫 An incredibly comprehensive and well-written article about meta-learning by Lilian Weng. Definitely worth a read!
Variational Autoencoders Explained in Detail 🖼 Yoel Zeldes explains how to implement VAEs, including understandable TensorFlow code.
The AI apocalypse is already here 🤖 This article illustrates the problems with current AI with the example of Predictim, a company that claims to analyze the social media accounts of potential babysitters for drug use, explicit content, and bullying or disrespectful behavior and produces a "riskiness" score.
Machine Learning & AI Main Developments in 2018 and Key Trends for 2019 🤖 KDNuggets interviews leading AI researchers including Anima Anandkumar, Pedro Domingos, Nikita Johnson, and Rachel Thomas about their highlights of 2018 and predictions for 2019.
Unsupervised Neural Networks Fight in a Minimax Game 🤺 A post by Jürgen Schmidhuber on the minimax principle employed by unsupervised models.
I don’t think you should tie your dreams to external markers of status, like working for a specific big-name company, or making a specific amount of money, or attaining a specific job title. Figure out what you value in life, and then stay true to your values. You’ll never have to regret a single decision.
Reward learning from human preferences and demonstrations in Atari (NeurIPS 2018) In a rare collaboration between DeepMind and OpenAI, the authors combine learning from expert demonstrations (directly mimicking what the human is doing) with trajectory preferences (a binary signal indicating which of two trajectory segments is better). The agent is first pretrained on expert demonstrations via imitation learning. The authors then learn a reward model that predicts reward from the expert demonstrations and trajectory preferences, and use it to improve the agent.
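The preference part of such a reward model is typically trained with a Bradley-Terry-style cross-entropy loss over summed predicted rewards, as in this line of work. A minimal sketch (function and variable names are my own):

```python
import numpy as np

def preference_loss(r_hat_1, r_hat_2, pref):
    """Cross-entropy loss for one trajectory-preference pair.

    r_hat_1, r_hat_2: predicted per-step rewards for the two segments.
    pref: 1.0 if the human preferred segment 1, 0.0 for segment 2.
    P(segment 1 preferred) = sigmoid(sum(r_hat_1) - sum(r_hat_2)).
    """
    logit = np.sum(r_hat_1) - np.sum(r_hat_2)
    p1 = 1.0 / (1.0 + np.exp(-logit))
    eps = 1e-12  # avoid log(0)
    return -(pref * np.log(p1 + eps) + (1.0 - pref) * np.log(1.0 - p1 + eps))
```

Minimizing this loss over many labeled pairs pushes the reward model to assign higher total reward to the segments humans prefer, which the agent is then trained against.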
Molecular Transformer for Chemical Reaction Prediction and Uncertainty Estimation (ML for Molecules and Materials Workshop, NeurIPS 2018) This paper treats the problem of chemical reaction prediction as a machine translation problem (strings of reactants and reagents need to be "translated" into their products). The authors apply a Transformer to this task and find that the model outperforms all algorithms in the literature (top-1 accuracy >90%) on a standard benchmark dataset. This again shows a) how universally useful the seq2seq framework is and b) that the inductive bias of the Transformer architecture is useful not only for natural language, but for a wide set of modalities.
For some more holiday reading, check out the 41 NLP papers at NeurIPS 2018.