Welcome to the second edition of NLP News. This time, we will take a more in-depth look at NMT and at one of its key components, beam search. We will also look at food from different angles: from analyzing linguistic markers in menus, to predicting recipes, to learning multimodal representations. Enjoy!
Beam search is an integral component of every state-of-the-art NMT model, but is generally taken for granted. Recent work takes a deeper look at beam search and demonstrates that vanilla beam search still leaves plenty of room for improvement.
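For context, vanilla beam search simply keeps the `beam_size` highest-scoring partial outputs at each step and extends them token by token. A minimal sketch in Python, using a hypothetical `next_log_probs(prefix) -> {token: log_prob}` interface as a stand-in for an NMT decoder (the bigram table below is a toy model, not anything from the papers):

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Toy beam search: next_log_probs(prefix) -> {token: log_prob}."""
    # Each hypothesis is a (log_prob, tokens) pair; start with the empty prefix.
    beam = [(0.0, [])]
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beam:
            for tok, lp in next_log_probs(tuple(tokens)).items():
                candidates.append((score + lp, tokens + [tok]))
        # Keep the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for score, tokens in candidates:
            if tokens[-1] == eos:
                completed.append((score, tokens))
            else:
                beam.append((score, tokens))
            if len(beam) == beam_size:
                break
        if not beam:
            break
    completed.extend(beam)  # fall back to unfinished hypotheses at max_len
    return max(completed, key=lambda c: c[0])

# Toy "model": a fixed bigram distribution over a tiny vocabulary.
TABLE = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"b": math.log(0.5), "</s>": math.log(0.5)},
    ("b",): {"a": math.log(0.9), "</s>": math.log(0.1)},
    ("a", "b"): {"</s>": math.log(1.0)},
    ("b", "a"): {"</s>": math.log(1.0)},
}

def toy_model(prefix):
    return TABLE[prefix]

best_score, best_tokens = beam_search(toy_model, beam_size=2, max_len=3)
print(best_tokens)  # ['b', 'a', '</s>']
```

Note how the beam recovers “b a </s>” (probability 0.4 × 0.9 = 0.36), even though a greedy decoder would commit to the locally better first token “a” and end up with probability 0.3 at best.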
What if you already know something about what the output of your model should look like? Grid beam search (Hokamp & Liu, ACL 2017) allows you to incorporate arbitrary lexical constraints on the output of your model without changing its parameters.
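The core trick is to keep one beam (“bank”) per number of satisfied constraints, so that hypotheses which have spent probability mass on constraint tokens never get crowded out by unconstrained ones. A toy sketch of that grouping, not the paper’s full algorithm; the `next_log_probs` scoring interface, the toy model, and the floor score for unseen tokens are all assumptions for illustration:

```python
import math

def grid_beam_search(next_log_probs, constraints, beam_size, max_len, eos="</s>"):
    """Toy grid beam search: one beam ("bank") per number of constraint
    tokens already emitted; only fully constrained hypotheses may finish."""
    n_c = len(constraints)
    banks = {0: [(0.0, [])]}  # coverage -> list of (score, tokens)
    best = None
    for _ in range(max_len):
        new_banks = {}
        for k, hyps in banks.items():
            for score, tokens in hyps:
                probs = next_log_probs(tuple(tokens))
                # Open expansion: the model proposes tokens; coverage stays k.
                for tok, lp in probs.items():
                    new_banks.setdefault(k, []).append((score + lp, tokens + [tok]))
                # Constrained expansion: emit the next unmet constraint token.
                if k < n_c:
                    tok = constraints[k]
                    lp = probs.get(tok, math.log(1e-6))  # assumed floor for unseen tokens
                    new_banks.setdefault(k + 1, []).append((score + lp, tokens + [tok]))
        banks = {}
        for k, hyps in new_banks.items():
            hyps.sort(key=lambda h: h[0], reverse=True)
            kept = []
            for score, tokens in hyps:
                if tokens[-1] == eos:
                    # A finished hypothesis counts only with full coverage.
                    if k == n_c and (best is None or score > best[0]):
                        best = (score, tokens)
                else:
                    kept.append((score, tokens))
                if len(kept) == beam_size:
                    break
            if kept:
                banks[k] = kept
        if not banks:
            break
    return best  # None if no fully constrained hypothesis finished

# Toy model: prefers "a" early, strongly prefers stopping after two tokens.
def toy_model(prefix):
    if len(prefix) >= 2:
        return {"</s>": math.log(0.8), "a": math.log(0.1), "b": math.log(0.1)}
    return {"a": math.log(0.6), "b": math.log(0.3), "</s>": math.log(0.1)}

best = grid_beam_search(toy_model, constraints=["b"], beam_size=2, max_len=3)
print(best[1])  # ['a', 'b', '</s>']
```

With the constraint `["b"]`, the decoder is forced to return a hypothesis containing “b” even though the toy model prefers “a” at every step.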
In some applications, such as neural constituency parsing, the beam stores not only words but also parsing actions, which can lead to search failures. Word-synchronous beam search (Fried et al., 2017) mitigates this by keeping separate beams based on the number of decoded words.
Joshua Browder’s DoNotPay has beaten 375k parking tickets and saved $9.3 million in fines. This year, it has been helping asylum seekers and can now assist with more than 1,000 legal issues. A great example of making an impact with NLP.
Censorship is commonplace on WeChat and Sina Weibo and goes as far as banning Winnie the Pooh. With more Chinese funding going to AI, we have to make sure that ML & NLP applications are being used ethically.
Google launches an AI initiative that seeks to study how humans interact with AI systems. In other news, we are also running out of cool AI-related acronyms. Remaining: HAIR (for hipsters), LAIR (earmarked for Elon Musk), BAIT, CHAI, …
Compositionality is an inherent part of natural language. Researchers from DeepMind propose a model that can learn abstract hierarchical compositional visual concepts (a mouthful). Using an existing model (the beta-VAE) to learn disentangled representations from visual input, their model learns abstractions from a small number of symbol-image pairs, e.g. images paired with the symbol “apple”. Using logical operators (AND, etc.), the model then learns to compose these concepts. Blog post.
In recent years, more and more complex models have been used to achieve state-of-the-art results in language modelling on the Penn Treebank, the MNIST of NLP. Melis et al. show that a carefully tuned LSTM with current best practices (weight tying, recurrent dropout, down-projection, etc.) outperforms all more complex models. The takeaway: LSTMs are here to stay (if that wasn’t clear so far); tune your hyperparameters if you care about state-of-the-art results.
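Weight tying, one of the best practices mentioned above, shares the input embedding matrix with the output softmax projection. A minimal sketch of the idea with NumPy (toy sizes, untrained random weights, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10_000, 400

# One shared matrix serves as both the input embedding (rows index words)
# and the output projection (logits = hidden @ E.T), halving those parameters.
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

def embed(token_ids):
    return E[token_ids]                   # (batch, embed_dim)

def output_logits(hidden):
    return hidden @ E.T                   # (batch, vocab_size)

hidden = rng.normal(size=(2, embed_dim))  # stand-in for an LSTM's output
logits = output_logits(hidden)
print(logits.shape)                       # (2, 10000)
```

Tying requires the decoder output to live in the embedding space; when the LSTM’s hidden size is larger than the embedding dimension, a learned down-projection maps the hidden state down first, which is the down-projection mentioned above.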
If you’re a foodie and always wanted to know how the language of menus in cheap, middle-priced, and expensive restaurants differs, this is the paper for you. Jurafsky and colleagues examine linguistic markers in menus for evidence of 1) authenticity (expensive food is more likely to be highlighted as authentic); 2) educational capital (more expensive restaurants use “fancier” words); 3) framing of plenty (cheaper restaurants use words implying abundance); and 4) implicit signaling of quality (status anxiety of cheaper establishments).
Staying with the food theme, researchers from MIT create a dataset of over 1M cooking recipes and 800k food images. They propose an image-recipe retrieval task and train a neural network to find a recipe given an image. An online demo is also available. Rather than just using the model to retrieve a recipe, I would have loved to see them actually generate the recipe given an image, for instance using the neural checklist model (Kiddon et al., 2016).
As we saw in some papers in the last issue, learning supervised sentence representations is useful for many tasks. Kiela et al. propose to use a multimodal task, i.e. image captioning, to learn grounded sentence representations. They show that the learned representations are useful for different classification tasks, entailment, and word similarity. The takeaway: Large enough datasets that require some form of NLU are useful for inducing good general-purpose representations. On which tasks will representations learned on the image-recipe data from above perform well?
Task-oriented, domain-specific dialogue datasets are scarce. To alleviate this, researchers from Stanford introduce a new dataset consisting of 3,031 multi-turn dialogues with an in-car assistant. The dataset covers three domains: calendar scheduling, weather information retrieval, and point-of-interest navigation.