NLP News - Paperclip maximizer, Generative models, Debugging ML, Evolution Strategies, Interpretability, arXiv, ICLR 2018, Multi-hop QA
This edition of NLP News is full of awesome content: Play a benevolent but misguided AI? ✅ Find out how to debug and unit test ML models? ✅ Learn about Evolution Strategies and AlphaGo Zero? ✅ Read case studies on how to apply NLP? ✅ Understand how to make models more interpretable? ✅ Learn more about the role of arXiv for ML and NLP? ✅ Get up to date on ICLR 2018? ✅ Use cool new datasets on video captioning and multi-hop QA? ✅
Fun and games
If you're interested in AI, chances are that you're aware of Nick Bostrom's famous thought experiment of a paperclip maximizer, a benevolent AI whose sole purpose is to create paperclips, which (spoiler alert) ends up destroying humanity. You can now play the paperclip maximizer yourself and experience the slippery slope of valuing one objective above everything else. The game is simple, addictive, and has even gone viral.
Note: If you're playing the game, there is a known bug that you might not be able to release the hypnodrones (don't ask) due to not having enough in-game memory. Simply open your browser's JS console and type: "memory = memory + 10".
Just in time for Halloween, you can now co-author horror stories together with Shelley, a language model trained on the eerie stories on r/nosleep. Just respond to the stories she starts every hour on her Twitter account and create the first AI-Human horror anthology ever.
Presentations and slides
Jeff Dean's lecture in front of Y Combinator's AI group and the subsequent Q&A session touch on many themes that are important to Google and to the ML community more broadly, such as learning to learn and multi-task learning.
If you're interested in generation or just want to catch up to the latest advances in deep generative models, these slides by Shakir Mohamed and Danilo Rezende are a must-read.
The Deep Learning Book is a great resource for everyone wanting to learn more about Deep Learning. Going through the book by yourself can be a bit daunting, however. Follow along with these videos of the Deep Learning Book Club and talks of leading researchers for each chapter to help you better understand the contents.
Debugging Machine Learning (www.slideshare.net). Michał Łopuszyński gives valuable hints for debugging machine learning models in his presentation at PyDataWarsaw 2017: 1) check your code; 2) check your data; 3) examine your features; 4) examine your data points; 5) examine your model; 6) watch out for overfitting; 7) watch out for data leakage; 8) watch out for covariate shift; 9) remember monitoring & maintenance.
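To make the data leakage hint concrete, here is a minimal sketch of one possible check: looking for test examples that also appear in the training set. It is not taken from the presentation; the function name `find_leaked_examples` and the toy sentences are made up for illustration, and real leakage can be far subtler than verbatim duplicates.

```python
def find_leaked_examples(train_texts, test_texts):
    """Return test examples that also occur in the training set after normalization."""
    def normalize(text):
        # Lowercase and collapse whitespace so trivial variants still match.
        return " ".join(text.lower().split())

    train_seen = {normalize(t) for t in train_texts}
    return [t for t in test_texts if normalize(t) in train_seen]


# Hypothetical toy data: the first test sentence duplicates a training sentence.
train = ["The movie was great.", "Terrible plot, bad acting."]
test = ["the movie  was great.", "A solid sequel."]

leaked = find_leaked_examples(train, test)
print(leaked)
```

A non-empty result means the test score is likely optimistic, because the model has already seen those examples during training.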
Every software engineer knows unit tests are important. But there are no clear best practices and no solid tutorial for how to unit test ML. Google Brain intern Chase Roberts describes unit tests for common scenarios in this blog post.
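As a rough illustration of what a unit test for ML code can look like, the sketch below checks two properties of a single training step: every weight actually changes, and the loss goes down. The tiny linear model and the names `train_step` and `loss` are assumptions made for this example, not code from the blog post.

```python
def train_step(weights, features, target, learning_rate=0.1):
    """One SGD step for a linear model with squared-error loss."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - target
    # The gradient of 0.5 * error**2 with respect to each weight is error * x.
    return [w - learning_rate * error * x for w, x in zip(weights, features)]


def loss(weights, features, target):
    """Squared-error loss of the linear model on one example."""
    prediction = sum(w * x for w, x in zip(weights, features))
    return 0.5 * (prediction - target) ** 2


def test_all_weights_change():
    # A weight that never changes often points to a disconnected part of the model.
    before = [0.0, 0.0, 0.0]
    after = train_step(before, features=[1.0, 2.0, 3.0], target=1.0)
    for old, new in zip(before, after):
        assert old != new


def test_loss_decreases():
    # With a small learning rate, one step on one example should reduce its loss.
    weights, features, target = [0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 1.0
    updated = train_step(weights, features, target, learning_rate=0.01)
    assert loss(updated, features, target) < loss(weights, features, target)


# Run the checks directly; a test runner could also discover them by name.
test_all_weights_change()
test_loss_decreases()
```

Tests like these do not prove that a model is correct, but they are cheap and catch common wiring mistakes early.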
TensorBoard is a great way to visualize your parameters, metrics, or any other data-dependent quantities. This tutorial gives a clear overview of TensorBoard's most important functions.
David Ha provides a guide to evolution strategies (ES), a technique that has been shown to be useful for many problems where reinforcement learning (RL) is commonly applied. In contrast to RL, ES uses black-box optimisation and can thus ignore gradient information, which allows it to be evaluated more efficiently.
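To give a feel for the black-box flavour of ES, here is a minimal sketch of a very simple evolution strategy. It is one basic variant written for illustration, not a specific algorithm from the guide: the toy objective, the fixed noise level `sigma`, and the elite-averaging update are all assumptions.

```python
import random


def fitness(params):
    """Toy objective to maximize, with its peak value 0 at (3, -2)."""
    x, y = params
    return -((x - 3.0) ** 2 + (y + 2.0) ** 2)


def evolve(fitness_fn, mean, sigma=0.5, population_size=50, elite_size=10,
           generations=100, seed=0):
    """Simple ES: sample around a mean, keep the best, re-center on their average."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Sample candidates around the current mean with isotropic Gaussian noise.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population_size)]
        # Rank candidates by fitness only; no gradients are computed anywhere.
        population.sort(key=fitness_fn, reverse=True)
        elite = population[:elite_size]
        # Move the mean to the average of the best candidates.
        mean = [sum(c[i] for c in elite) / elite_size for i in range(len(mean))]
    return mean


best = evolve(fitness, mean=[0.0, 0.0])
print(best, fitness(best))
```

The only thing the loop needs from the problem is a fitness score per candidate, which is why the same code works for objectives that are not differentiable.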
Applying NLP to real-world problems poses many challenges, starting with defining the task. This article describes one such case study: the challenges of extracting tasks from emails.
Latent Dirichlet Allocation (LDA) is a useful utensil in the toolbox of every NLP practitioner that allows you to reveal themes in a collection of documents. This blog post describes a deficiency of classic LDA, its inability to specify topics that are known in advance, and how GuidedLDA can be used to address this.
Transparency and bias
Bias and transparency are becoming more important issues to consider in the design of automated systems. In this post, Alison Powell outlines some of the directions we should be taking to address the problems of ethics, algorithms and accountability.
To mitigate the effect of unintended consequences when deploying ML models in real-world applications, designing explainable or interpretable models is one important research direction. Catherine Olsson describes in this post how we can use insights from cognitive science such as using prototypes and criticisms to make our models interpretable.
The benefit of ML and NLP is often argued from a business perspective, citing their ability to improve products and disrupt entire industries. This VentureBeat article takes a more emotional angle: It argues -- somewhat philosophically -- that NLP will not only help us better understand one another, but also give us a better understanding of ourselves.
Using the right language in job postings has its benefits. Textio takes a look at the best and worst benefits and perks to mention if you want your role to be filled faster.
About the arXiv
*ACL conferences (ACL, NAACL, EACL) and TACL update their policy to protect double-blind review without sacrificing the positive effects of preprint publishing. The main change: Preprints may no longer be posted in the period from 1 month before submission until notification.
A blog post about training BrundageBot, a neural network that keeps up with the latest ML papers on arXiv by predicting which arXiv papers Miles Brundage, "the Michael Jordan of tweeting arXiv preprints", tweeted.
On the topic of arXiv, this post and paper by Charles Sutton and Linan Gong provide some interesting statistics on the usage of preprints across computer science. For instance, in 2017, fully 23% of papers had e-prints on arXiv, compared to only 1% ten years ago.
If you have time this week or next weekend and you want to experience the bleeding edge of ML, browse through the 1003 submissions to the International Conference on Learning Representations (ICLR 2018). Credit for the above graph goes to Oriol Vinyals.
Did one of the ICLR 2018 papers' results seem too good to be true or do you want to re-implement the method in your favourite framework? Why not participate in the ICLR 2018 Reproducibility Challenge hosted by Joelle Pineau? Target participants are students taking graduate-level ML courses in Fall 2017.
More articles and blog posts
2D word embedding matrices are commonly used these days, but what about taking things to the third dimension? Stitchfix's Chris Moody shows in this post how we can use 3D word embedding matrices to find clothing items with similar styles.
AI2 Key Scientific Challenges 2017 (allenai.org). The Allen Institute for Artificial Intelligence (AI2) makes available ten $10,000 awards to researchers working on key scientific challenges related to facilitating high-impact research in artificial intelligence. The application deadline is November 10.
AlphaGo Zero, the Go-playing program's fourth (and final) iteration, learns Go without any human supervision, entirely through self-play, and is arguably the strongest Go player in history, defeating the version that defeated Lee Sedol 100-0.
Resources and implementations
An open-source implementation of SLING, a parser for annotating text with frame semantic annotations, trained using TensorFlow and Google's DRAGNN framework. The paper is here.
An NLP library for Apache Spark featuring common NLP utilities such as tokenization and normalization and downstream implementations such as named entity recognition and sentiment analysis.
Andrew Ng joins the board of Woebot, a startup that seeks to build a chatbot that helps people deal with mental health issues.
While machine translation is getting increasingly close to human-level performance, translations (particularly between less common language pairs) should still be taken with a grain of salt. For instance, last week, a Palestinian man was arrested in Jerusalem as his 'Good morning' message posted on Facebook was mistranslated to 'attack them' and 'hurt them'.
Brian Johnson, Head of Knowledge at Pinterest, describes in this post how Pinterest uses NLP to understand how a person’s interests and preferences evolve over time across different categories.
This year's ICLR 2017 best paper, "Understanding deep learning requires rethinking generalization", reinvigorated interest in gaining a better understanding of the generalization behaviour of deep neural networks. In this tradition, this paper with Yoshua Bengio as co-author seeks to provide new theoretical explanations and new direct analyses for generalization in Deep Learning. It also proposes a new family of generalization terms that takes these new insights into account.
Generative Adversarial Networks are all the rage these days. This paper provides a succinct overview of the most popular current GAN architectures and discusses challenges and directions.
Data such as text often has a hierarchical structure. Existing embeddings in Euclidean space cannot easily model this property. Nickel & Kiela thus propose to learn embeddings in a hyperbolic space, in particular an n-dimensional Poincaré ball. The learned embeddings achieve state-of-the-art performance on determining lexical entailment and require far fewer dimensions than traditional embeddings.
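For intuition about why a Poincaré ball suits hierarchies, the sketch below computes the standard Poincaré distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), for a few hand-picked points. It only illustrates the distance function, not the training procedure from the paper; the function name and the example points are chosen for illustration.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Illustrative 2D points: one at the center, two close to the boundary.
origin = [0.0, 0.0]
near_boundary = [0.9, 0.0]
other_boundary = [0.0, 0.9]

print(poincare_distance(origin, near_boundary))
print(poincare_distance(near_boundary, other_boundary))
```

The two boundary points are far apart from each other but each is comparatively close to the center, which mirrors a tree: general concepts can sit near the origin, while specific ones spread out towards the boundary.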
Question answering (QA) has seen many improvements in recent years, particularly fuelled by new datasets such as SQuAD. Existing datasets, however, focus on single-hop reading comprehension, i.e. extracting an answer to a question from a given paragraph. Welbl et al. introduce two new datasets for multi-hop reading comprehension, which is much closer to the real-world task of open-domain question answering. The datasets require models to first identify relevant documents among a number of candidate documents and then determine the correct answer.
The SWC is a corpus of aligned Spoken Wikipedia articles from the English, German, and Dutch Wikipedia. It contains hundreds of hours of aligned audio from a diverse set of readers about a diverse set of topics and is thus a great resource for research in cross-lingual speech recognition.
Are you interested in multimodal applications, but bored of image captioning? Then try video captioning with Atomic Visual Actions (AVA). AVA is a new dataset that provides multiple action labels for each person in extended video sequences. It consists of URLs for publicly available videos from YouTube, annotated with a set of 80 atomic actions (e.g. "walk", "kick (an object)", "shake hands") with a total of 210k action labels.
Sentence classification is useful for extracting information in many domains, but labeled data is often not available. PubMed 200k RCT is a new dataset based on PubMed consisting of 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences. Each sentence of each abstract is labeled with its role in the abstract, e.g. background, objective, method, etc.