NLP News - Review of EMNLP 2017, Analyzing Bias, Google Brain AMA, DRAGNN, and AllenNLP
In this edition of NLP News, I will outline impressions and highlights of the recent EMNLP 2017 and provide links to videos, proceedings, and reviews to catch up on what you missed. Another theme of this newsletter is approaches that analyze different types of bias: from racial bias in police-community interactions and sexism in tennis interviews to gender bias in recruiting and hate speech on Reddit.
Presentations and slides
All talks and EMNLP sessions were live-streamed and can still be watched. Cross-reference with the conference program here to find the talks you've missed. Highlights were the keynotes in the Jutland room every morning and the best paper presentations at the end of the conference.
Dan Jurafsky's keynote had two compelling, diametrically opposite focal points: a) The high-stakes, high-reward interdisciplinary effort of analyzing the racial bias in police-community interactions by processing police body-camera videos; and b) the more frivolous analysis of the language of food by processing restaurant menus on Yelp. Another important takeaway is that cross-disciplinary research, while rewarding, is very hard to publish and can be particularly frustrating for students. The whole keynote is worth watching.
The memory-augmented neural network (MANN) tutorial was one of the most attended tutorials at EMNLP. Have a look at the slides here to make sure that you are up to date with the most recent memory-augmented models.
Cross-lingual word representations enable cross-linguistic transfer and facilitate reasoning over the semantics of multiple languages. The tutorial gives an overview of the wealth of existing literature and tries to unify different approaches. Slides are available here.
In his invited talk at DepLing 2017, Yoav Goldberg discusses whether LSTMs can learn hierarchy, e.g. subject-verb agreement (they can, but only if explicitly supervised), and gives an overview of easy-first parsing with BiLSTM representations. The main takeaway: the best parsers in the world are based on first-order decomposition over a BiLSTM. We still do not understand well why this simple model works so well and what its vectors actually encode.
Blog posts and articles
Review blog posts are IMO always a great resource: they a) allow conference-goers to check whether they missed anything and b) bring non-attendees up to date. Olof Mogren provides some high-level highlights of EMNLP 2017 here, with a focus on subword-level modeling, multilingual NLP, and inspiration from how infants learn. Just as last year, I have written another highlights blog post, which can be found here. I was mostly excited about cool new datasets, clusters, and lots of research in different areas.
Around the same time as EMNLP, SIGDIAL, another major conference, took place. Mihail Eric reviews the event and touches on the following questions: 1) What does optimal dialogue data look like? 2) What tasks are more interesting than slot-filling? 3) What is wrong with our evaluation metrics?
A New York Times article discusses a recent study by a team of Cornell NLP researchers that looks at gender bias in tennis interviews. In particular, they analyze whether questions such as "After practice, can you put tennis a little bit behind you and have dinner, shopping, have a little bit of fun?" are posed only to women.
As this Westworld scene demonstrates, writing believable dialogue is key to bringing an AI to life.
Dialogue for conversational UIs is hard to write. While not directly NLP-related, these creative-writing tips are still valuable if you find yourself needing to create a conversational chatbot.
Fast.ai, one of the best online courses for learning deep learning, has decided to base its future courses on PyTorch rather than Keras or TensorFlow. One of the main motivations for this change is that dynamic methods such as attention and teacher forcing are hard to implement in the latter two but easy in PyTorch, thanks to its dynamic computation graphs. DyNet shares similar benefits. For more information regarding PyTorch, have a look at the resources in this past newsletter.
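Teacher forcing is a good illustration of why dynamic computation graphs help: the decision of what to feed the decoder is made step by step, with ordinary control flow. Here is a minimal sketch in plain Python (no PyTorch required; `decode` and the toy `step` model are hypothetical names for illustration):

```python
import random

def decode(step_fn, target, teacher_forcing_ratio=0.5, seed=0):
    """Greedy decoding loop with teacher forcing.

    At each step we flip a coin: with probability `teacher_forcing_ratio`
    we feed the ground-truth token, otherwise the model's own prediction.
    In a define-by-run framework this per-step, data-dependent branching
    is just Python control flow, which is why it is easy in PyTorch.
    """
    rng = random.Random(seed)
    prev, outputs = "<s>", []
    for gold in target:
        pred = step_fn(prev)  # one decoder step (stand-in for an RNN cell)
        outputs.append(pred)
        # Data-dependent branch: ground truth vs. the model's own output.
        prev = gold if rng.random() < teacher_forcing_ratio else pred
    return outputs

# Toy "model": predicts the next letter of the alphabet.
step = lambda tok: chr(ord(tok) + 1) if tok != "<s>" else "a"
print(decode(step, list("abcd")))  # → ['a', 'b', 'c', 'd']
```

In a static-graph framework, this per-step branch would have to be expressed with special control-flow ops baked into the graph ahead of time.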
Google Brain researchers answered questions in their second reddit AMA. A few highlights:
Most promising research area: learning from weak supervision.
Unsupervised learning: predicting the causal future (the next word or frame) works; autoencoding, not so much.
Related to the above, Vincent Vanhoucke explains that the reason for this is to "better explore the frontier that few others have the resources to explore". This echoes the sentiment sometimes found in academia that we would make the best use of our resources if everyone worked on the problems they are most suited for, rather than reaching for low-hanging fruit.
When learning about new ML or NLP models, implementing them from scratch is often a good way to understand their trade-offs and assumptions. Erik Lindernoren provides a great repository of reference implementations of many of the most common ML methods. The repo even contains more advanced implementations, such as a handwriting generator and an RL model.
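In that spirit, here is what a from-scratch implementation of the simplest possible model might look like: linear regression fit by batch gradient descent. This is a generic sketch (not taken from the repository above); even at this size it exposes the knobs that libraries hide, such as the learning rate and the gradient derivation:

```python
def fit_linear(xs, ys, lr=0.05, epochs=500):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_linear(xs, ys)  # w ≈ 2.0, b ≈ 1.0
```

Swapping in a momentum term or a different loss and watching what breaks is exactly the kind of exercise that makes the library versions less of a black box.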
Remember Parsey McParseface? TensorFlow DRAGNN (Dynamic Recurrent Acyclic Graphical Neural Network) can be seen as Parsey's successor: a framework for defining and training dynamic pipelines in TensorFlow. Yep, now you can actually train your DRAGNN.
In the last newsletter, we discussed Google's new Speech Commands dataset, which contains 65k utterances of 30 short words. AudioSet is Speech Commands' big brother: it consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. It covers a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.
Speaking of speech: Facebook has open-sourced a PyTorch implementation of VoiceLoop, a neural text-to-speech (TTS) system that can transform text into speech in voices sampled in the wild. You can find some demo samples here.
A not-so-brief monograph that provides an introduction to key concepts, algorithms, and theoretical frameworks in machine learning. The document is aimed at electrical engineers, but should be helpful for anyone new to ML or interested in learning more.
AllenNLP is an open-source NLP research library built on PyTorch by the Allen Institute for Artificial Intelligence. See here for more information. It is great to see models and libraries made available to be investigated and probed. More than anything, such a showcase demonstrates that our models are at the mercy of any linguist, that they have a poor grasp of anything compositional, and that we still have a long way to go. The EMNLP 2017 workshop on building linguistically generalizable NLP systems deserves a mention in this context.
We've seen many instances of chatbots that tried to hold a conversation over the last year. After chatbots that turned Nazi, one has to wonder what the next step on the evolutionary ladder of chatbots is. According to Microsoft, it is playing Exploding Kittens. On a brighter note, chatbots are gradually becoming more useful and connecting with more users on a more intimate level.
This Wired article follows three startups that try to reduce the bias inherent in hiring in tech. The idea is that the trained models can ignore a job applicant's gender, age, and ethnicity. But as we know, there is no such thing as bias-free data.
Anti-fake-news startup Factmata raises seed funding from Mark Cuban (www.businessinsider.com). Factmata, a London-based startup that seeks to score news articles for quality and add context, has raised seed funding from high-profile US investors, including billionaire Mark Cuban, Zynga founder Mark Pincus, and Brightmail founder Sunil Paul.
MILABOT was developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition. The chatbot consists of an ensemble of diverse generation and retrieval models, including template-based, bag-of-words, seq2seq, and latent variable models. The system uses reinforcement learning to select an appropriate response from the models in its ensemble.
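The ensemble-plus-selector architecture can be sketched in a few lines. This is a toy illustration, not MILABOT's actual code: the models and the word-overlap scorer below are hypothetical stand-ins for its generation/retrieval models and its learned RL selection policy.

```python
def select_response(context, candidate_models, score_fn):
    """Ensemble response selection, schematically.

    Each model in the ensemble proposes one candidate response; a scoring
    policy (learned with RL in MILABOT, hand-written here) picks the
    highest-scoring candidate. Returns (model_name, response).
    """
    candidates = [(name, model(context)) for name, model in candidate_models.items()]
    return max(candidates, key=lambda nc: score_fn(context, nc[1]))

# Hypothetical toy ensemble: a template model and a retrieval model.
models = {
    "template": lambda ctx: "I don't know.",
    "retrieval": lambda ctx: "EMNLP 2017 took place in Copenhagen.",
}
# Toy stand-in policy: prefer responses that share words with the context.
score = lambda ctx, resp: len(set(ctx.lower().split()) & set(resp.lower().split()))
name, reply = select_response("Where did EMNLP 2017 take place?", models, score)
```

The interesting engineering is, of course, in the scorer: MILABOT trains it with reinforcement learning on user feedback rather than a fixed heuristic.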
Hate speech and racial bias are pressing issues, even more so online, where users can hide behind anonymity. In 2015, Reddit closed several subreddits, in particular r/fatpeoplehate and r/CoonTown, for violating its anti-harassment policy. This article examines the implications of the ban. The main finding: closing hate subreddits shuts down haters. The ban worked for Reddit: more accounts than expected stopped using the site; those that stayed decreased their hate-speech usage drastically, by at least 80%, and the problem was not inherited by other subreddits.
For more reading material, have a look at my EMNLP review blog post, which highlights interesting papers.
Thanks for taking the time to read this newsletter. I really appreciate it.
Let me know if you have some interesting, NLP-related content, such as an implementation of a research paper, a cool repo, or an insightful blog post that you'd like to see featured in a future edition of this newsletter.