Intrinsic dimension video; Ilya Sutskever meta-learning; ICLR 2018 presentations; Pointer Sentinel Mixture Models, PyTorch, einsum tutorials; accessible, open-source AI; lessons of 2 years of AI research
Here is your bi-weekly dose of NLP and ML goodness!
This time, we have: a cool explainer video of an ICLR 2018 paper; Ilya Sutskever giving a talk on meta-learning; all the ICLR 2018 presentations; loads of cool tutorials featuring Pointer Sentinel Mixture Models, PyTorch internals, and Einsum; content around accessible and open-source AI; cool NLP applications such as generating Tinder profiles or predicting wine prices; lessons from two years of AI research, a compelling article about bias, and a lot more.
I've also added a new category What's hot 🔥.
Here I'll summarize at a glance any cool results, neat tricks, jaw-dropping demos, or anything else that I've found particularly compelling over the last two weeks. Let me know if you like this format 👍🏻 or prefer not to see this again 👎🏻 at the bottom of the newsletter.
What's hot 🔥
Harder: Realistically evaluating semi-supervised learning (SSL) approaches turns out to be challenging. SSL is also featured in Amazon's letter to shareholders. I've written an ACL 2018 paper and a blog post on a successful class of SSL approaches.
Better: Facebook researchers improve the state of the art on ImageNet by 2% by pre-training on the task of predicting 1.5k hashtags for 1B images.
ICLR 2018 took place last week from April 30 - May 3. You can find links to all invited talks and oral presentations on their Facebook page.
Tim Rocktäschel provides an introduction to Einsum, the Swiss Army Knife of tensor operations. Einsum provides an elegant way to express a wide array of tensor operations such as dot products, outer products, matrix-vector multiplications, as well as more complex ones.
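Einsum's notation is quickest to pick up from a few small examples. Here is a minimal NumPy sketch; the same subscript strings work in PyTorch's and TensorFlow's `einsum` as well:

```python
import numpy as np

a = np.random.rand(3, 4)
b = np.random.rand(4, 5)
v = np.random.rand(4)

# matrix multiplication: sum over the shared index j
mat = np.einsum('ij,jk->ik', a, b)    # same as a @ b
# matrix-vector multiplication
mv = np.einsum('ij,j->i', a, v)       # same as a @ v
# dot product: repeated index with no output index means "sum it out"
dot = np.einsum('i,i->', v, v)        # same as v @ v
# outer product: no repeated index, so all combinations are kept
outer = np.einsum('i,j->ij', v, v)    # same as np.outer(v, v)
```

Once you read `'ij,jk->ik'` as "sum over every index that doesn't appear on the right", more complex contractions (batched matmuls, bilinear attention, etc.) follow the same pattern.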
Christian S. Perone gives a tour around the PyTorch codebase and guides us through PyTorch internals and C/C++ structs. This should be useful for those interested in understanding what happens beyond the user-facing API.
Once in a while it is good to remind ourselves that we don't need to get into the weeds and fiddle with TensorFlow in order to train an NER tagger. This tutorial shows you how to use the tried-and-tested Stanford NER tagger and NLTK to train an NER model for non-English languages.
This repository shows at a glance all the Deep Learning tools Facebook has open-sourced, including FastText, ParlAI (for dialogue), MUSE (for learning cross-lingual embeddings), etc.
Did you ever want to contribute to open-source software, but didn't know how to start? William Horton takes us from discovering a new DL method on Twitter to contributing an implementation to the fast.ai library.
Making AI accessible
Nature Publishing Group has recently announced a new closed-access journal, Nature Machine Intelligence. More than 2000 ML researchers have already signed a statement that they will not submit to, review, or edit for this new journal. If you work in ML or NLP and care about open-access, consider signing the statement.
The Gradient, a digital publication that aims to cut through both the hype and the cynicism of AI news coverage and to offer sober, sophisticated reporting on the latest developments in AI research, has just launched with a note from the editors.
Cool NLP applications
Have you ever wondered what a typical Tinder profile looks like? Leon Fedden scrapes 40k Tinder dating profiles and generates new ones using a char-LSTM for the biography and a GAN for the image.
Can you put a dollar value on “elegant, fine tannins”, “ripe aromas of cassis”, or “dense and toasty”? It turns out a machine learning model can. Sara Robinson explains in this post how to use the Keras functional API and TensorFlow to predict the price of a wine from its description.
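To make the idea concrete, here is a deliberately tiny sketch that maps bag-of-words features of a description to a price with a least-squares fit. The descriptions and prices below are made up for illustration, and the post itself uses a richer Keras wide-and-deep model rather than this linear baseline:

```python
import numpy as np

# Made-up toy data: wine descriptions and their prices.
descriptions = ['elegant fine tannins', 'ripe aromas of cassis',
                'dense and toasty', 'thin and watery']
prices = np.array([45.0, 60.0, 30.0, 10.0])

# Bag-of-words features: count of each vocabulary word per description.
vocab = sorted({w for d in descriptions for w in d.split()})
X = np.array([[d.split().count(w) for w in vocab] for d in descriptions],
             dtype=float)

# Least-squares fit: weights map word counts to a price estimate.
w, *_ = np.linalg.lstsq(X, prices, rcond=None)

def predict(text):
    counts = np.array([text.split().count(v) for v in vocab], dtype=float)
    return float(counts @ w)
```

A real model replaces the count vector with learned text representations, but the overall shape (text features in, scalar price out) is the same.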
Articles and blog posts
Tom Silver shares lessons learned during his first two years of AI research ranging from general life lessons to relatively specific tricks of the AI trade. A good read if you are just starting out or considering research.
Yoel Zeldes describes how to use word embeddings and A* search to transform one word into another via semantically similar words, e.g. tooth --> retina --> x-ray --> light.
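A minimal sketch of the idea: treat words as nodes, connect words whose embeddings are close, and run A* with the embedding distance to the goal as the heuristic. The 2-D vectors below are made-up toy values; a real version would load pre-trained embeddings and search over a full vocabulary:

```python
import heapq
import numpy as np

# Toy 2-D "embeddings" for illustration only.
emb = {
    'tooth':  np.array([0.0, 0.0]),
    'retina': np.array([1.0, 0.5]),
    'x-ray':  np.array([2.0, 0.5]),
    'light':  np.array([3.0, 0.0]),
    'banana': np.array([0.0, 5.0]),
}

def dist(a, b):
    return float(np.linalg.norm(emb[a] - emb[b]))

def morph(start, goal, max_step=1.5):
    """A* search where neighbors are words within max_step in embedding
    space; straight-line distance to the goal is an admissible heuristic."""
    frontier = [(dist(start, goal), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, g, word, path = heapq.heappop(frontier)
        if word == goal:
            return path
        if word in seen:
            continue
        seen.add(word)
        for nxt in emb:
            if nxt not in seen and dist(word, nxt) <= max_step:
                step = dist(word, nxt)
                heapq.heappush(frontier,
                               (g + step + dist(nxt, goal), g + step,
                                nxt, path + [nxt]))
    return None
```

With these toy vectors, `morph('tooth', 'light')` recovers the chain from the example above, since no single hop from tooth to light is short enough.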
RiseML compares Google's TPUv2 against Nvidia's V100 on training a ResNet-50 on ImageNet. Throughput is similar but TPUs come out on top regarding cost to reach a certain accuracy.
Rachel Thomas gives a short overview of using Deep Learning for structured data, particularly creating embeddings for categorical variables.
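The core trick behind categorical embeddings is just a trainable lookup table: each category gets a dense vector that is learned jointly with the rest of the network. A bare-bones sketch (randomly initialized here; in a real model the table's rows would be updated by backprop):

```python
import numpy as np

rng = np.random.default_rng(0)

categories = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
emb_dim = 4  # typically small relative to the number of categories

# One dense vector per category; this table is a trainable parameter.
table = rng.normal(size=(len(categories), emb_dim))
index = {c: i for i, c in enumerate(categories)}

def embed(cat):
    # Embedding lookup: replaces a one-hot vector with a learned dense one.
    return table[index[cat]]
```

Compared with one-hot encoding, nearby days of the week (or stores, or products) can end up with similar vectors, which is exactly the structure downstream layers can exploit.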
According to Engadget, 2017 was the year society started taking algorithmic bias seriously. Eric Wang provides a lucid write-up of two competing theories of fairness and associated challenges.
This blog post compares the most popular ways of computing sentence similarity and investigates how they perform.
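The simplest baseline in such comparisons is averaging word vectors and measuring cosine similarity between the resulting sentence vectors. A toy sketch with made-up 3-D vectors (a real comparison would load pre-trained GloVe or word2vec embeddings):

```python
import numpy as np

# Hypothetical toy word vectors for illustration.
vecs = {
    'cat': np.array([1.0, 0.0, 0.2]),
    'dog': np.array([0.9, 0.1, 0.3]),
    'sat': np.array([0.0, 1.0, 0.0]),
    'ran': np.array([0.1, 0.9, 0.1]),
    'car': np.array([0.0, 0.0, 1.0]),
}

def sentence_vec(sentence):
    # Baseline: mean of the word vectors, ignoring out-of-vocabulary words.
    words = [w for w in sentence.lower().split() if w in vecs]
    return np.mean([vecs[w] for w in words], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(sentence_vec('cat sat'), sentence_vec('dog ran'))
```

Averaging throws away word order, which is precisely why the stronger methods in such comparisons (weighted averages, encoder models) exist.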
This Atlantic piece argues that the scientific paper is obsolete and charts the recent history of different mediums aiming to communicate technical concepts, from Mathematica to Jupyter notebooks.
European scientists argue in an open letter that the proposed Ellis institute is essential to avoid brain drain to big tech firms in the US.
Rasa announced the close of a $1.1 million funding round to grow its bot platform and open-source natural language understanding (NLU) for businesses. The company also announced that its Rasa Stack recently surpassed 100,000 downloads, up from 30,000 in September 2017.
VentureBeat interviews Javier Soltero, Microsoft's new product lead for Cortana.
PeerRead is a dataset of scientific peer reviews available to help researchers study this important artifact. The dataset consists of over 14k paper drafts and the corresponding accept/reject decisions from top-tier venues including ACL, NIPS, and ICLR, as well as over 10k textual peer reviews written by experts for a subset of the papers.
Cornell Newsroom is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
Iyyer et al. propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples. They train SCPNs to produce a paraphrase of a sentence with a specified syntax (constituent tree). They generate training data for SCPNs via back-translated sentences augmented with parses and show that the generated paraphrases better fool existing models and improve robustness.
Perez-Beltrachini et al. bootstrap a text generator from large-scale noisy datasets where the data (e.g. DBpedia facts) and related texts (e.g. Wikipedia abstracts) are loosely aligned. They also introduce a new content selection mechanism and use multi-instance learning to automatically discover correspondences between data and text pairs.
Coates et al. propose to average pre-trained word embeddings from different sources and provide analysis and justification for why averaging (despite the dimensional mismatch) can be useful.
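A minimal sketch of that averaging, under the assumption that the shorter vector is zero-padded to match the longer one:

```python
import numpy as np

def avg_embeddings(e1, e2):
    """Average two embeddings of possibly different dimensionality
    by zero-padding the shorter one to the length of the longer."""
    e1, e2 = np.asarray(e1, dtype=float), np.asarray(e2, dtype=float)
    d = max(len(e1), len(e2))
    pad = lambda v: np.pad(v, (0, d - len(v)))  # append zeros up to d
    return (pad(e1) + pad(e2)) / 2
```

For example, averaging a 2-d and a 4-d vector yields a 4-d meta-embedding whose trailing dimensions come only from the longer source.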