NLP News - Data selection, ML & NLP in Esports, VQA, bias & lyric annotations

NLP News
By NLP News • Issue #4
For this issue, we look at several research areas that each optimize a different aspect of the training data. We then talk briefly about how ML & NLP can be used in esports, followed by some Deep Learning-related reports from industry and paper highlights on VQA, NMT, and bias detection. Finally, we talk about a cool new dataset of annotated song lyrics.

NLP in-depth: Data selection
It is common wisdom that the nature of the training data is at least as important as the choice of the model. Different areas deal with particular aspects of the training data. While many of these areas have been heuristics-based, a common thread of recent work is that data selection policies are increasingly learned:
  • Large amounts of unlabelled data are often available, but annotating all examples is prohibitively expensive. Active learning interactively obtains annotations from human experts for the unlabelled examples that are most helpful during training. Recent work reframes active learning as a reinforcement learning (RL) problem (Fang et al., EMNLP 2017).
  • For online algorithms, not only the choice of training data matters but often also the order in which it is presented to the model. Curriculum learning (Bengio et al., ICML 2009) orders the training data to maximize the model’s performance. One application where this makes a difference is learning word embeddings (Tsvetkov et al., ACL 2016).
  • Other work deals with selecting data that is particularly relevant for the model. For NMT, we can select data that is similar to the in-domain data (van der Wees et al., EMNLP 2017). For transfer learning, we can select examples that are helpful for our target domain (Ruder & Plank, EMNLP 2017).
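To make the three ideas above concrete, here is a minimal Python sketch of the classic heuristic versions: uncertainty sampling for active learning, easy-to-hard ordering for curriculum learning, and bag-of-words similarity for domain data selection. These are the heuristic baselines, not the learned policies from the cited papers. All function names, the entropy criterion, the length-based difficulty measure, and the cosine similarity are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(unlabelled, predict_proba, k):
    """Active learning (uncertainty sampling, an assumed heuristic):
    pick the k examples whose predictions have the highest entropy."""
    ranked = sorted(unlabelled, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return ranked[:k]


def curriculum_order(examples, difficulty=len):
    """Curriculum learning: present 'easy' examples first.
    Difficulty defaults to length, which is only a crude proxy."""
    return sorted(examples, key=difficulty)


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(candidates, in_domain, k):
    """Data selection: keep the k candidate sentences whose word counts
    are most similar to the pooled in-domain data."""
    target = Counter(tok for sent in in_domain for tok in sent.split())
    ranked = sorted(candidates, key=lambda s: cosine(Counter(s.split()), target), reverse=True)
    return ranked[:k]
```

In each case the selection rule is a fixed scoring function. The recent work mentioned above replaces such a fixed score with one that is learned, for example by treating the choice of the next example as an action in an RL problem.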
ML & NLP in Esports
A major highlight of the past week has been the defeat of pro gamers in 1v1 matches of the multiplayer online battle arena game Dota 2 by a bot created by OpenAI. Reddit coverage can be found here and here. Denny Britz provides some perspective on the hype here. While the milestone is impressive, many of the same techniques have already been applied to beat pros in other esports, e.g. SSBM (Firoiu et al., 2017). The main takeaway: Self-play, i.e. pitting your model against itself, is a powerful catalyst. It will be interesting to see what other applications we can find for self-play. Related approaches might be dual learning, e.g. for NMT (He et al., NIPS 2016). Other things we can do with esports? How about automatically predicting video highlights using audience chat reactions (Fu et al., EMNLP 2017)?
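As a rough illustration of self-play, the sketch below trains a single value table by letting it play both sides of a tiny take-away game (players alternately remove 1 to 3 stones; whoever takes the last stone wins). The game, the tabular Monte Carlo update, and every name and hyperparameter here are assumptions made for the example. It is not a description of how the OpenAI bot or the SSBM agents were trained.

```python
import random


def train_self_play(pile=10, max_take=3, episodes=5000, epsilon=0.2, lr=0.1, seed=0):
    """Learn move values for a take-away game purely from games against itself."""
    rng = random.Random(seed)
    q = {}  # (stones_left, take) -> estimated value for the player making the move

    def legal(stones):
        return range(1, min(max_take, stones) + 1)

    def best(stones):
        # Greedy move under the current value estimates.
        return max(legal(stones), key=lambda a: q.get((stones, a), 0.0))

    for _ in range(episodes):
        stones, history = pile, []
        # Both "players" share the same table q: this is the self-play part.
        while stones > 0:
            if rng.random() < epsilon:
                move = rng.choice(list(legal(stones)))  # explore
            else:
                move = best(stones)  # exploit
            history.append((stones, move))
            stones -= move
        # The last mover won. Walk backwards, flipping the sign of the
        # reward at every step because the players alternate.
        reward = 1.0
        for state_action in reversed(history):
            old = q.get(state_action, 0.0)
            q[state_action] = old + lr * (reward - old)
            reward = -reward
    return q, best
```

Calling `q, policy = train_self_play()` returns the learned table and a greedy policy, so `policy(10)` gives the move the model prefers from a pile of 10 stones. No external opponent or labelled data is needed: the model generates its own training signal by playing against its current self.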
Conference Countdown
Industry Insights: Deep Learning is 🔥
Plasticity wants to help chatbots seem less robotic
Tetra raises a $1.5M seed round to bring deep learning to voice transcription
Paper picks
Dataset spotlight
NLP News is a biweekly Natural Language Processing & Machine Learning newsletter from academia & industry curated by @seb_ruder.