NLP News - PyTorch DeepMoji, AutoML, GANs, DisSent, and DilatedRNN
This edition of the newsletter touches on many diverse topics, such as implementing an emotion detection model in PyTorch, augmenting neural networks with prior information, sonifying Trump tweets, real-time translation, making WaveNet 1,000x faster, and a new parallelizable type of RNN.
Articles and blog posts
Understanding emotions — from Keras to pyTorch — medium.com
A blog post about HuggingFace's 🤗 PyTorch implementation of the DeepMoji model, covering several interesting implementation details. A good introduction to writing PyTorch code for anyone who wants to try it out.
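To give a flavour of what such a model looks like in PyTorch, here is a minimal, self-contained sketch of an emotion classifier built from an embedding layer and a bidirectional LSTM. The class name, layer sizes, and number of classes are illustrative assumptions, not the actual torchMoji code:

    import torch
    import torch.nn as nn

    class EmotionClassifier(nn.Module):
        """Minimal bi-LSTM emotion classifier (illustrative sketch, not the torchMoji code)."""

        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, num_classes=64):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, token_ids):
            # token_ids: (batch, seq_len) word indices
            outputs, _ = self.lstm(self.embedding(token_ids))
            pooled = outputs.mean(dim=1)          # mean-pool the LSTM states over time
            return self.classifier(pooled)        # logits over emotion/emoji classes

    model = EmotionClassifier(vocab_size=50000)
    logits = model(torch.randint(1, 50000, (2, 30)))  # two dummy sentences of 30 tokens

The real DeepMoji model is richer: it stacks two bidirectional LSTMs, adds an attention layer on top, and is pretrained on over a billion emoji-labelled tweets.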
Attention in Neural Networks and How to Use It — akosiorek.github.io
A blog post on attention in neural networks with two implementations of soft attention. The focus is on visual attention, but variants such as hard, soft, and Gaussian attention, as well as interesting models such as the Spatial Transformer, are still relevant for NLPers.
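As a quick illustration of the soft attention discussed in the post, the following PyTorch sketch (the function name and dimensions are my own assumptions) scores a set of feature vectors against a query, normalizes the scores with a softmax, and returns the weighted average:

    import torch
    import torch.nn.functional as F

    def soft_attention(query, features):
        """query: (batch, dim); features: (batch, num_items, dim)."""
        # Dot-product score between the query and every feature vector
        scores = torch.bmm(features, query.unsqueeze(2)).squeeze(2)      # (batch, num_items)
        weights = F.softmax(scores, dim=1)                               # weights sum to 1
        context = torch.bmm(weights.unsqueeze(1), features).squeeze(1)   # (batch, dim)
        return context, weights

    context, weights = soft_attention(torch.randn(4, 128), torch.randn(4, 10, 128))

Because everything here is differentiable, soft attention can be trained with plain backpropagation, which is what distinguishes it from hard attention.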
NLP superstar Regina Barzilay wins MacArthur "genius grant" — news.mit.edu
Regina Barzilay, a professor at MIT CSAIL who does research in natural language processing and machine learning, is a recipient of a 2017 MacArthur Fellowship. Regina is best known for her work on interpreting and generating language, including deciphering dead languages.
Google Home, Alexa, and Siri Are Forcing Us to Make a Serious Decision — www.wired.com
A Wired article on the dilemma of choosing among the closed systems of Alexa, Google Home, and Siri, which raises interesting questions such as "Do we need an open standard for dialogue agents?" and "What would one look like?".
Google's Learning Software Learns to Write Learning Software — www.wired.com
A Wired article on Google's AutoML meta-learning efforts. Automatically generating and deploying complex AI systems, however, poses the risk of increased bias.
GANs are Broken in More than One Way: The Numerics of GANs — www.inference.vc
GANs for NLP are slowly becoming a hot research topic. Ferenc Huszár argues in this blog post that GANs are broken at both the computational and algorithmic levels and that the optimization of GANs is a generalization (rather than a special case) of gradient descent.
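To see the kind of failure this implies, consider the simplest adversarial game, f(x, y) = x * y, where x tries to minimize f and y tries to maximize it; the unique equilibrium is (0, 0). The toy sketch below (my own illustration, not taken from the post) shows simultaneous gradient updates spiralling away from it:

    # Simultaneous gradient descent/ascent on the bilinear game f(x, y) = x * y
    x, y, lr = 1.0, 1.0, 0.1
    for step in range(100):
        grad_x, grad_y = y, x                      # df/dx and df/dy
        x, y = x - lr * grad_x, y + lr * grad_y    # x minimizes f, y maximizes f
    print(x, y)  # the iterates have drifted far from the equilibrium at (0, 0)

Each update multiplies the distance to the equilibrium by sqrt(1 + lr^2), so no step size fixes it; this is the kind of behaviour that cannot occur with ordinary gradient descent on a single objective.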
TensorFlow Lattice: Flexibility Empowered by Prior Knowledge — research.googleblog.com
Incorporating prior knowledge into our models is an important way to generalize in the face of limited data. This blog post introduces Google's Lattice Networks (NIPS 2017), a layer that makes it possible to incorporate monotonicity constraints into models.
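As a rough illustration of the underlying idea (this is not the TensorFlow Lattice API), a learned 1-D calibration function can be made monotonic by construction by parameterizing it with non-negative increments between keypoints:

    import numpy as np

    def monotonic_calibration(x, keypoints, raw_deltas):
        """Piecewise-linear calibrator that is monotonically increasing by construction."""
        deltas = np.log1p(np.exp(raw_deltas))                 # softplus -> non-negative slopes
        values = np.concatenate([[0.0], np.cumsum(deltas)])   # increasing output keypoints
        return np.interp(x, keypoints, values)

    keypoints = np.linspace(0.0, 1.0, 5)
    raw_deltas = np.random.randn(4)    # these would be learned parameters in practice
    print(monotonic_calibration(np.array([0.1, 0.5, 0.9]), keypoints, raw_deltas))

However the parameters are set, the resulting function cannot violate the monotonicity constraint; the lattice layers described in the post provide this kind of guarantee for multi-dimensional inputs.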
Industry Insights
How AI Could Change Amazon: A Thought Experiment — hbr.org
This article discusses how AI may not only provide incremental benefits to a business, but how sufficient accuracy can cause a fundamental change in the business model, e.g. for Amazon from shopping-then-shipping to shipping-then-shopping.
WaveNet launches in the Google Assistant — deepmind.com
The original WaveNet made waves a year ago, but was still too slow to use in production. Over the last year, Google researchers have made it 1,000x faster while also improving its quality. It's now being used to generate US English and Japanese voices across all platforms.
Real-time language translation with Pixel Buds — techcrunch.com
With Google's new Pixel Buds Bluetooth headphones, we're one step closer to Douglas Adams' Babel fish and Star Trek's Universal Translator. They work with a Pixel 2 and allow close-to-real-time translation.
Paper picks
Word Translation Without Parallel Data (arXiv)
Much work in recent years has concentrated on Neural Machine Translation and learning cross-lingual word representations, but most of these approaches still require large amounts of parallel data. Facebook researchers propose a three-step approach that needs none: an initial alignment of the two embedding spaces is learned with unsupervised adversarial training, refined using frequent words as anchor points, and then used for retrieval with a distance metric that reduces the density around hubs in the vector space.
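For the refinement step, given matrices X and Y whose rows are embeddings of the anchor words in the source and target language, the best orthogonal mapping has a closed-form (Procrustes) solution via an SVD. A minimal NumPy sketch with random stand-in matrices:

    import numpy as np

    X = np.random.randn(1000, 300)   # source-language embeddings of the anchor words
    Y = np.random.randn(1000, 300)   # embeddings of their target-language counterparts

    # Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of Y^T X
    U, _, Vt = np.linalg.svd(Y.T @ X)
    W = U @ Vt
    mapped = X @ W.T                 # source embeddings mapped into the target space

Retrieval then uses the cross-domain similarity local scaling (CSLS) criterion instead of plain nearest neighbours, which is what reduces the density around hubs.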
DisSent: Sentence Representation Learning from Explicit Discourse Relations (arXiv)
In recent years, we've seen different ways of learning sentence embeddings: some are completely unsupervised, others rely on paraphrases or on a supervised task such as entailment. Nie et al. learn sentence embeddings by predicting discourse markers in a corpus. They automatically create a large training set for this task by collecting sentence pairs from the BookCorpus that are connected with frequent discourse markers. The resulting embeddings complement recently proposed embeddings leveraging entailment.
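The data collection step boils down to pattern matching: find sentences joined by an explicit marker, split them, and use the marker as the label. A toy sketch (the marker list and the regular expression are simplifications of what the paper actually uses):

    import re

    # A few of the explicit discourse markers used as labels (simplified list)
    MARKERS = ["because", "but", "although", "so", "when", "if"]
    pattern = re.compile(r"^(.+?),?\s+(" + "|".join(MARKERS) + r")\s+(.+)$", re.IGNORECASE)

    def extract_pairs(sentences):
        """Yield (first clause, second clause, marker) training triples."""
        for sentence in sentences:
            match = pattern.match(sentence.strip())
            if match:
                first, marker, second = match.groups()
                yield first, second, marker.lower()

    print(list(extract_pairs(["She stayed home because it was raining."])))
    # [('She stayed home', 'it was raining.', 'because')]

The extracted marker then serves as the classification target for the sentence-pair encoder.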
Dilated Recurrent Neural Networks (NIPS 2017)
RNNs and LSTMs are common building blocks for Deep Learning-based NLP models, but face three well-known challenges: 1) capturing complex (long-term) dependencies; 2) vanishing/exploding gradients; and 3) efficient parallelization. Chang et al. propose a model that is similar to WaveNet, but for RNNs: they introduce dilated skip-connections (that skip intermediate states) and remove the dependency on the previous time step, which enables the RNN to be parallelized. The resulting DilatedRNN outperforms many more sophisticated architectures across different tasks.
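The core change is easy to state: in a layer with dilation d, the hidden state at time t is computed from the state at time t - d instead of t - 1, i.e. s_t = f(x_t, s_{t-d}), so d consecutive steps no longer depend on each other and can be processed in parallel. A minimal PyTorch sketch of one such layer (my own simplification using a GRU cell; it loops over time for clarity rather than exploiting the parallelism):

    import torch
    import torch.nn as nn

    class DilatedRecurrentLayer(nn.Module):
        """One dilated recurrent layer: the state at time t depends on the state at t - dilation."""

        def __init__(self, input_dim, hidden_dim, dilation):
            super().__init__()
            self.cell = nn.GRUCell(input_dim, hidden_dim)
            self.dilation = dilation
            self.hidden_dim = hidden_dim

        def forward(self, inputs):
            # inputs: (seq_len, batch, input_dim)
            seq_len, batch, _ = inputs.shape
            states = [inputs.new_zeros(batch, self.hidden_dim)] * self.dilation
            outputs = []
            for t in range(seq_len):
                prev = states[t]                      # state from step t - dilation
                state = self.cell(inputs[t], prev)
                states.append(state)
                outputs.append(state)
            return torch.stack(outputs)               # (seq_len, batch, hidden_dim)

    layer = DilatedRecurrentLayer(input_dim=32, hidden_dim=64, dilation=4)
    out = layer(torch.randn(20, 8, 32))

Stacking several such layers with exponentially increasing dilations gives the growing receptive field that makes the architecture resemble WaveNet.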