This edition of the newsletter contains tons of interesting articles and resources: A comprehensive glossary of ML terms? ✅ What words are “most hip hop”? ✅ All you ever wanted to know about variational inference? ✅ 15% faster LSTMs in keras? ✅ Which lego set has the most surprising colors? ✅ And lots more…
Facebook Messenger now has LaTeX support!
Facebook Messenger finally has LaTeX support! (We’ve all been waiting for this, right?) Simply wrap your LaTeX with $$ on each side.
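For example, sending the message below (the formula itself is just an illustration) gets rendered as a typeset equation:

```latex
$$ e^{i\pi} + 1 = 0 $$
```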
The dataset is a good example of how NLP and computer vision can complement each other: it was collected by applying NLP to radiology reports to mine them for 8 disease categories. It consists of scans of more than 30,000 patients, including many with advanced lung disease. The paper can be found here and the dataset is available here.
To add to the colourful parsing literature, this GitHub project contains code for the Rainbow Parser, a spectral learning-based parser for training and decoding with latent-variable probabilistic context-free grammars (L-PCFGs).
Diederik Kingma’s PhD thesis is now online and serves as a key resource for anyone interested in learning more about variational (Bayesian) inference, generative modeling, and their intersections with Deep Learning. Kingma is the co-creator of Adam and of the variational autoencoder.
The role of arXiv has been hotly debated in recent months. This interesting analysis adds to the picture: it surveys arXiv’s strengths and weaknesses and tries to identify possible improvements based on technologies that were not previously available.
An article about Naftali Tishby’s theory of the information bottleneck as a means for better understanding the generalization behaviour of deep neural networks. Briefly, Tishby argues that NNs learn via compression and that training generally consists of a short “fitting” phase (where the model learns to label the data) and a much longer “compression” phase (where it discards input information that is irrelevant to the task).
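For reference, the information bottleneck casts learning as finding a representation T of the input X that is as compressed as possible while remaining predictive of the label Y. In its usual Lagrangian form (our notation, not the article’s):

```latex
\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y)
```

where I(·;·) denotes mutual information and β trades off compression against prediction.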
We can use topic models to explore the topics of articles, expose hidden semantic structures, reveal common themes, and more. Guess what else they’re useful for? Exploring the color themes of LEGO sets! Wheee!
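As a rough sketch of the idea, assuming a gensim-style workflow (the data below is made up): treat each LEGO set as a “document” whose “words” are the colors of its bricks, then fit an ordinary LDA model so that topics become recurring color themes.

```python
from gensim import corpora
from gensim.models import LdaModel

# Hypothetical data: each LEGO set is a "document", each brick color a "word".
sets_as_color_lists = [
    ["red", "red", "yellow", "blue"],
    ["gray", "gray", "black", "gray"],
    ["green", "brown", "green", "tan"],
]

dictionary = corpora.Dictionary(sets_as_color_lists)
corpus = [dictionary.doc2bow(colors) for colors in sets_as_color_lists]

# Fit a small LDA model: each topic is a distribution over colors.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
for topic_id, topic in lda.print_topics():
    print(topic_id, topic)
```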
Dirk Hovy shares his thoughts on how the recent growth of our field will affect the future of NLP conferences. In particular, he explores its impact on reviewing, organization, and the structure of the conference itself.
Remember DeepL, whose MT system seemed to blow the competition out of the water? Pierre Isabelle describes how they evaluated DeepL using their challenge set of 108 handcrafted short English sentences that target particular weaknesses of MT systems (see paper). DeepL reduces the error rate by 50% compared to Google’s model!
Krause et al. propose dynamic evaluation, which improves language models by adapting them to the recent history at test time. They improve the state of the art on the Penn Treebank, WikiText-2, and Hutter Prize datasets.
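The core idea: while evaluating, keep updating the weights by gradient descent on the text the model has just scored, so it adapts to the local style and vocabulary. A minimal sketch in PyTorch, under an assumed model interface (Krause et al.’s actual update rule additionally decays the weights back toward the global parameters):

```python
import torch

def dynamic_eval(model, segments, lr=1e-4):
    """Evaluate a language model while adapting it to recent history.

    Assumes `model` maps a token segment to a cross-entropy loss and
    `segments` is the test text split into consecutive chunks.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    total_loss = 0.0
    for segment in segments:
        loss = model(segment)      # score the segment first...
        total_loss += loss.item()
        optimizer.zero_grad()
        loss.backward()            # ...then adapt the weights to it,
        optimizer.step()           # so later segments benefit.
    return total_loss / len(segments)
```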
Philipp Koehn, one of the creators of phrase-based machine translation, shares his 117-page draft chapter on neural machine translation. It is a great way for beginners to get started with NMT; for experts, the section on current challenges still provides plenty of food for thought.
An overview of Edina, the University of Edinburgh’s entry in the Amazon Alexa Prize competition. The main novelty lies in the use of self-dialogues: conversations created by a single Amazon Mechanical Turk worker playing both participants in a dialogue. The complete model is a cascade: a rule-based system that backs off to a matching-score model, which in turn backs off to a generative neural network.
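To make that cascade concrete, here is a minimal sketch of the backoff logic (the function names are hypothetical stand-ins, not the actual Edina components):

```python
def respond(utterance, rule_based, matcher, neural_generator, threshold=0.5):
    """Three-stage backoff: rules first, then retrieval, then generation."""
    reply = rule_based(utterance)            # 1. hand-written rules
    if reply is not None:
        return reply
    candidate, score = matcher(utterance)    # 2. retrieve best-matching reply
    if score >= threshold:
        return candidate
    return neural_generator(utterance)       # 3. fall back to a generative model
```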