NLP PyTorch libraries; GAN tutorial; Jupyter tricks; TensorFlow things; Representation Learning; Making NLP more accessible; Michael Jordan essay; Reproducing Deep RL; Rakuten Data Challenge; NAACL Outstanding Papers
This newsletter has a lot of content, so make yourself a cup of coffee ☕️, lean back, and enjoy.
This time, we have two NLP libraries for PyTorch; a GAN tutorial and Jupyter notebook tips and tricks; lots of things around TensorFlow; two articles on representation learning; insights on how to make NLP & ML more accessible; two excellent essays, one by Michael Jordan on challenges and opportunities for AI, the other by Amid Fish on reproducing a deep RL paper; the Rakuten Data Challenge; and lots of reading material including the NAACL Outstanding Papers.
Tools, implementations, and resources
Supporting Rapid Prototyping with a Toolkit — github.com
PyTorch-NLP (torchnlp) is a library designed to make NLP with PyTorch easier and faster. It contains neural network layers, text processing modules, and datasets.
PyTorch NLP library based on fast.ai — github.com
Quick NLP is a deep learning NLP library inspired by fast.ai. It extends the fastai library to allow for quick and easy running of NLP models.
Alphabetical list of free datasets for NLP — github.com
An alphabetical list of free/public domain datasets with text data (mostly unstructured) for use in NLP.
Tutorials
Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch) — medium.com
A concise tutorial that walks through building and training a simple GAN, generator and discriminator included, in roughly 50 lines of PyTorch code.
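For intuition, the two objectives such a tutorial implements can be illustrated with a small numerical sketch: the discriminator maximizes log D(x) + log(1 − D(G(z))), while the generator (in the common non-saturating form) maximizes log D(G(z)). The probabilities below are made up for illustration, not outputs of a trained model.

```python
import math

# Made-up discriminator scores (probabilities that a sample is real).
d_real = 0.9   # score on a real sample
d_fake = 0.2   # score on a generated sample

# Discriminator loss: negative of log D(x) + log(1 - D(G(z))).
d_loss = -(math.log(d_real) + math.log(1 - d_fake))

# Generator loss (non-saturating form): negative of log D(G(z)).
g_loss = -math.log(d_fake)
```

As the generator improves and d_fake rises toward 0.5, g_loss falls while d_loss grows, which is the adversarial push-and-pull the tutorial's training loop realizes.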
Sentiment Classification from Keras to the Browser — medium.com
A concise tutorial on creating a sentiment classifier in Keras and porting it to the browser.
Jupyter Tips, Tricks, Best Practices — github.com
A collection of tips, tricks, and best practices to give you a productivity boost when using Jupyter notebooks.
Open-source entity coreference resolution systems
A list of open-source entity coreference resolution systems.
TensorFlow things
Introducing TensorFlow Probability — medium.com
An introduction to TensorFlow Probability, a probabilistic programming toolbox for ML researchers and practitioners to quickly and reliably build sophisticated generative models or models that leverage uncertainty.
Building an Iris classifier with eager execution — medium.com
A walk-through of how to build an Iris classifier using one of TensorFlow's newest additions, eager execution, which promises to make development a lot simpler and easier to debug.
Classifying text with TensorFlow Estimators
A tutorial on how to use TensorFlow pre-made and custom Estimators to classify text, written by Julian Eisenschlos and me.
Representation learning
Goals and Principles of Representation Learning — www.inference.vc
Ferenc Huszár summarizes the takeaways of the DALI workshop on the Goals and Principles of Representation Learning. They include transfer learning, disentanglement, and self-supervision.
What a Disentangled Net We Weave: Representation Learning in VAEs — towardsdatascience.com
An intuitive tutorial on representation learning with Variational Auto-encoders (VAEs).
Making NLP and ML accessible
Designing (and Learning From) a Teachable Machine — design.google
A blog post on UX insights gained from Google's Teachable Machine on how to make ML models accessible.
Technical Experts Need to Get Better at Telling Stories — hbr.org
A post by Harvard Business Review on the different marketing approaches that complex stories require.
Introducing Semantic Experiences with Talk to Books and Semantris — research.googleblog.com
Two examples of how NLU capabilities can drive applications that weren't possible before: Talk to Books enables exploring books on the sentence level. Semantris is a word association game powered by ML.
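The retrieval idea behind applications like Talk to Books can be sketched in a few lines: embed sentences as vectors and answer a query with the most similar one. Real systems use learned sentence encoders; the vectors and sentences below are made up for illustration.

```python
import math

# Toy "index" of sentences with hand-made embedding vectors.
SENTENCES = {
    "the dog chased the ball": [0.9, 0.1, 0.0],
    "interest rates rose sharply": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer(query_vec):
    """Return the indexed sentence most similar to the query vector."""
    return max(SENTENCES, key=lambda s: cosine(SENTENCES[s], query_vec))

best = answer([1.0, 0.0, 0.0])  # a query vector close to the "dog" sentence
```

At Talk to Books scale, the same nearest-neighbor lookup runs over embeddings of every sentence in a large book corpus.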
More articles and blog posts
Artificial Intelligence — The Revolution Hasn’t Happened Yet — medium.com
An excellent essay by Michael Jordan (Berkeley) about how the current public dialog on AI — which focuses on a narrow subset of industry and a narrow subset of academia — risks blinding us to the challenges and opportunities that are presented by the full scope of AI and related disciplines.
Lessons Learned Reproducing a Deep Reinforcement Learning Paper — amid.fish
An excellent, insightful blog post about the challenges involved in reproducing and doing research in deep reinforcement learning.
Machine Translation Without the Data — buzzrobot.com
An overview of recent work in Unsupervised Neural Machine Translation.
Rakuten Data Challenge
The SIGIR eCom workshop is organizing a Data Challenge as part of the workshop. The challenge focuses on large-scale taxonomy classification: the goal is to predict each product's category given the product's title. The dataset has about 1 million titles, roughly 3,400 labels, and unbalanced class sizes. The registration deadline is May 15.
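A minimal sketch of a bag-of-words baseline for this kind of title-to-category task, assuming made-up toy data (the actual challenge data has ~1M titles and ~3,400 labels):

```python
from collections import Counter, defaultdict

# Toy training data: (product title, category) pairs. Illustrative only.
TRAIN = [
    ("leather wallet brown", "accessories"),
    ("mens leather belt", "accessories"),
    ("usb charging cable", "electronics"),
    ("wireless usb mouse", "electronics"),
]

# Count word -> category co-occurrences over the training titles.
word_counts = defaultdict(Counter)
for title, label in TRAIN:
    for word in title.split():
        word_counts[word][label] += 1

def predict(title):
    """Each known word votes for the categories it co-occurred with."""
    votes = Counter()
    for word in title.split():
        for label, count in word_counts[word].items():
            votes[label] += count
    return votes.most_common(1)[0][0] if votes else None

pred = predict("usb cable black")
```

With unbalanced classes as in the challenge, a serious entry would need frequency normalization and a stronger model, but a voting baseline like this is a useful sanity check.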
A blog post about a new meta-learning approach by OpenAI that evolves the loss function of learning agents, enabling fast training on novel tasks.
Designer Diary: The Search for AlphaMystica — boardgamegeek.com
A post on the challenges of adapting AlphaGo to board games with more complex rules than Go.
Google Developers Blog: Text Embedding Models Contain Bias. Here's Why That Matters.
The post examines several text embedding models with regard to their bias, suggests tools for evaluating certain forms of bias, and discusses how these issues matter when building applications.
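One common way to probe an embedding model for association bias, in the spirit of the WEAT-style tests discussed in this literature, is to compare a target word's mean similarity to two attribute word sets. The vectors below are made up for illustration, not from a real embedding model.

```python
import math

# Hand-made 2-d "embeddings" for illustration only.
VECS = {
    "engineer": [0.8, 0.2],
    "he": [0.9, 0.1], "him": [0.85, 0.15],
    "she": [0.1, 0.9], "her": [0.15, 0.85],
}

def cosine(a, b):
    """Cosine similarity between two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def association(word, set_a, set_b):
    """Positive -> word leans toward set_a; negative -> toward set_b."""
    mean = lambda ws: sum(cosine(VECS[word], VECS[w]) for w in ws) / len(ws)
    return mean(set_a) - mean(set_b)

score = association("engineer", ["he", "him"], ["she", "her"])
```

A nonzero score on many such probes is the kind of signal the post argues application builders should measure before shipping an embedding-backed feature.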
Industry insights
How Google Plans To Use AI To Reinvent The $3 Trillion US Healthcare Industry — www.cbinsights.com
An extensive report that explores Google's many healthcare initiatives and areas of potential future expansion.
Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability
A report by the AINow Institute providing a practical framework to assess automated decision systems and to ensure public accountability for public agencies.
Supervised Word Vectors from Scratch in Rasa NLU — medium.com
A new NLU pipeline by Rasa, which uses very little memory and handles hierarchical intents and messages containing multiple intents.
Paper picks
NAACL-HLT 2018 Outstanding Papers
Are you unsure which papers to read this week? Why not check out these NAACL-HLT 2018 Outstanding Papers already available on arXiv:
Delete, Retrieve, Generate: A Simple Approach to Sentiment and Style Transfer (NAACL-HLT 2018)
The authors propose a new method for text attribute transfer that consists of three steps:
extract content words by deleting phrases associated with the sentence's original attribute value;
retrieve new phrases associated with target attribute;
combine them using a neural model.
The method generates grammatical and appropriate responses in 22% more cases than the state-of-the-art. The paper also has an accompanying worksheet.
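The three steps above can be sketched with a toy pipeline. This is illustrative only: the paper learns attribute phrase lexicons from data and trains a neural model for the final combination step, whereas everything below is hand-written.

```python
# Assumed toy lexicons; the paper induces these from corpus statistics.
NEGATIVE_PHRASES = {"terrible", "bland"}   # source-attribute phrases
POSITIVE_PHRASES = ["delicious", "great"]  # target-attribute phrases

def delete(sentence):
    """Step 1: drop phrases tied to the source attribute, keep content."""
    return [w for w in sentence.split() if w not in NEGATIVE_PHRASES]

def retrieve(candidates):
    """Step 2: pick a target-attribute phrase (here: simply the first)."""
    return candidates[0]

def generate(content, phrase):
    """Step 3: combine content and new phrase (the paper uses a trained
    neural model; we naively append)."""
    return " ".join(content + [phrase])

content = delete("the food was terrible")
out = generate(content, retrieve(POSITIVE_PHRASES))
```

Even this crude version shows why the decomposition helps: content preservation is handled by steps 1-2, so the learned model in step 3 only has to produce fluent output.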
The authors analyze the linguistic characteristics of Alzheimer's disease (AD) patients using DementiaBank by training models to distinguish language samples from AD and control patients. They interpret these characteristics via analyses based on activation clustering and first-derivative saliency techniques.
Multi-Reward Reinforced Summarization with Saliency and Entailment (NAACL-HLT 2018)
The authors propose a new summarization model that uses RL with two new reward functions:
ROUGE-Sal: modifies ROUGE metric by up-weighting salient phrases detected via a keyphrase classifier;
Entail: gives high reward scores to logically-entailed summaries judged via an entailment classifier.
They combine the rewards via a novel multi-reward optimization in which the rewards are updated in alternating mini-batches, achieving state-of-the-art results on the CNN/Daily Mail dataset.
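The alternating-mini-batch scheme can be sketched as follows. The reward functions here are trivial stand-ins, not the paper's ROUGE-Sal or entailment classifiers; the point is only the scheduling: each mini-batch optimizes against a single reward, cycling through the rewards in turn.

```python
import itertools

def rouge_sal(summary):
    """Placeholder reward: number of distinct words (NOT the real ROUGE-Sal)."""
    return len(set(summary.split()))

def entail(summary):
    """Placeholder reward: crude entailment proxy (NOT the real classifier)."""
    return 1.0 if "because" in summary else 0.0

# Cycle through the rewards, one per mini-batch.
rewards = itertools.cycle([("rouge_sal", rouge_sal), ("entail", entail)])

schedule = []
for step in range(4):                 # four toy mini-batches
    name, reward_fn = next(rewards)   # the reward active for this batch
    schedule.append(name)             # in real training: a policy-gradient
                                      # update scored by reward_fn
```

Alternating batches lets each reward shape the policy without the two scores being collapsed into a single weighted sum.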