NLP in Industry, Leaderboard madness, fast.ai NLP, Transfer learning tools
Hi all,
This month's newsletter covers some cool examples of how NLP is used in industry, some discussion about recent state-of-the-art models and leaderboards, talks about transfer learning, two new fast.ai courses (one on NLP), lots of tools that enable you to leverage pretrained models in your own applications, and an array of interesting blog posts and papers.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
NLP in Industry 🏭
Investors are excited about natural language processing and expect many more applications, particularly fuelled by transfer learning, as pointed out by this post. The extensive State of AI Report in particular predicts that a wave of new start-ups will apply recent breakthroughs from NLP research; specifically, it expects these start-ups to raise over $100M in the next 12 months. The State of AI Report is a good resource for getting up to speed with the major developments in AI in general, from RL to NLP and computer vision, with information on talent, industry, and politics.
One of the many cool applications that transfer learning enables is smarter code autocompletion, achieved by training a language model on millions of lines of code. An example of this from TabNine, which uses a GPT-2 model, can be seen below (see this post for more details).
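To make the underlying mechanics concrete, here is a minimal sketch using the pytorch-transformers library covered in the Tools section below. The off-the-shelf 'gpt2' checkpoint is trained on web text rather than code, and the greedy decoding loop is purely illustrative, so this is not TabNine's actual system:

```python
import torch
from pytorch_transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load a pretrained GPT-2 language model and its tokenizer.
# TabNine trains on code; this web-text checkpoint only illustrates the mechanics.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
model.eval()

context = "def fibonacci(n):\n    if n < 2:\n        return"
input_ids = torch.tensor([tokenizer.encode(context)])

# Greedily append the most likely next token a few times to "complete" the code.
with torch.no_grad():
    for _ in range(10):
        logits = model(input_ids)[0]             # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax().item()  # most likely continuation
        input_ids = torch.cat([input_ids, torch.tensor([[next_id]])], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```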
Another great example of NLP used in industry is automatic meeting scheduling. In this setting, NLP can be used to understand temporal expressions in emails. This blog post by x.ai does a great job of communicating how to think through and frame a complex problem in a way that makes it feasible with current methods. Their final model is a BERT-based encoder-decoder that predicts patterns of time expressions in emails with around 93% accuracy.
Another industry where NLP can help is fashion. In this post, Stitch Fix describes how NLP (i.e. a BERT-based model) is used in their style selection algorithm that helps stylists pick clothes for customers. In particular, the model is fine-tuned to predict the probability that an item in the inventory will be chosen by a stylist based on its text description. The model's representation is then used as input to the overall styling algorithm.
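As a rough sketch of what such fine-tuning looks like (the item description, label, and single training step below are made up for illustration and say nothing about Stitch Fix's actual data or architecture), one can put a classification head on BERT and train it to predict whether a stylist would select an item:

```python
import torch
from pytorch_transformers import BertForSequenceClassification, BertTokenizer

# Binary classifier on top of BERT: will a stylist pick this item or not?
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# Hypothetical item description and outcome label (1 = chosen by the stylist).
description = "[CLS] Relaxed-fit linen blazer with notched lapels [SEP]"
input_ids = torch.tensor([tokenizer.encode(description)])
label = torch.tensor([1])

# One fine-tuning step; a real setup iterates over many (description, outcome) pairs.
loss, logits = model(input_ids, labels=label)
loss.backward()
optimizer.step()

chosen_probability = torch.softmax(logits, dim=-1)[0, 1].item()
```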
For more examples of how NLP is being applied in industry, check out the playlist of talks from spaCy IRL 2019 below. In particular, Peter Baumgartner's talk is worth watching, as he shares general lessons on managing NLP projects in industry.
Leaderboard madness 🏅
Another month, another state-of-the-art pretrained language model. The newest instantiation, RoBERTa, is now state of the art on the GLUE leaderboard, narrowly beating out XLNet. The ingredients seem to be an improved pretraining of BERT-large with more data and longer training time (further details are not available at this point).
On the topic of leaderboards, Anna Rogers discusses the current focus on ever-larger, state-of-the-art-beating Transformer models and how this shapes NLP research in general. In particular, she argues:
“More data & compute = SOTA” is NOT research news.
On the whole, I don't think the current trend of large pretrained models is only bad news:
1) Pushing in the direction of bigger models is important, as they can teach us how far we can take the current paradigm and where its fundamental limitations lie.
2) While efficient ML has long been an active research area, these gargantuan models make painfully clear that working on more efficient methods is an important direction today.
3) The size of the models and the amount of compute might seem daunting now, but bigger models and bigger compute rarely matter in the long run, as Stephen Merity accurately observes.
4) Relative to the entire field, not much money is spent on training ever-bigger models (James Bradbury notes that the cost of training XLNet is closer to $30,720 than to the oft-cited $250,000).
Talks 🗣
The New Era of NLP ⏳ This SciPy 2019 keynote by fast.ai's Rachel Thomas gives an overview of transfer learning. It also discusses at length the dangers of disinformation and of censorship via information glut.
spaCy IRL 2019 Playlist 📹 A playlist of great talks from spaCy IRL 2019, many focusing on interesting applications of NLP in industry, from entity linking, to dealing with biomedical documents, to financial NLP. Matthew and Ines also give an account of the past, present & future of spaCy that should be of interest to any spaCy user.
Resources 📚
fast.ai NLP course ⏩ fast.ai has launched a new course focusing exclusively on NLP. It covers fundamentals as well as cutting-edge methods such as transfer learning, attention, and the Transformer; applications such as text generation; and important issues such as disinformation, bias, and working with languages other than English.
Deep Learning from the Foundations 🏛 This is another excellent fast.ai course to check out, which shows how to build a state-of-the-art deep learning model from scratch.
Textual Adversarial Attack and Defense ⚔️🛡 A curated list of must-read papers on adversarial attacks and defenses for text.
nlp-library 📖 A collection of essential papers for the NLP practitioner curated by Mihail Eric.
Tools ⚒
The tools this month are all about pretrained language models and transfer learning:
PyTorch-Transformers 👾 Arguably one of the most impactful tools for current NLP, the rebranded pytorch-pretrained-bert library by HuggingFace contains a plethora of pretrained state-of-the-art models implemented in PyTorch. As such, it is a great starting point for doing cutting-edge NLP.
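For a sense of the API, here is a minimal sketch (loosely following the library's README) of extracting BERT features; the other supported architectures follow the same from_pretrained() pattern:

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

# Download a pretrained model and its matching tokenizer by name.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

input_ids = torch.tensor([tokenizer.encode("Transfer learning is changing NLP.")])
with torch.no_grad():
    last_hidden_states = model(input_ids)[0]  # (batch, seq_len, hidden_size)
```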
Grover 🦍 This repo contains Grover, a pretrained language model for neural fake news that can both generate such articles and, serving as a defense, detect them.
Adapter-BERT 🔌 This repo contains an implementation of adapters for BERT: small bottleneck modules inserted into each layer so that only a few parameters need to be trained per task, which makes fine-tuning the same model for many tasks much more parameter-efficient.
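The adapter idea in a nutshell, as a minimal PyTorch sketch (the repo itself is TensorFlow-based, and real adapters are inserted after the attention and feed-forward sub-layers of every Transformer block):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter in the spirit of Houlsby et al. (2019): project the
    hidden state down, apply a non-linearity, project back up, add a residual."""

    def __init__(self, hidden_size=768, bottleneck_size=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states):
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

# Only adapter (and typically layer-norm) parameters are trained per task;
# the pretrained BERT weights stay frozen and are shared across all tasks.
adapter = Adapter()
trainable = sum(p.numel() for p in adapter.parameters())  # ~100K vs. ~110M for BERT-base
```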
German BERT 🇩🇪 Pretrained BERT models are unfortunately only available for a handful of languages. While there is a multilingual version, it performs worse than a model trained specifically for the target language. To remedy this, deepset has open-sourced a German version of BERT.
Multilingual Universal Sentence Encoder for Semantic Retrieval 🌍 The Universal Sentence Encoder is another pretrained model that has been particularly popular due to its ease of use via TensorFlow Hub. Google has now open-sourced three new multilingual universal sentence encoders for retrieving semantically similar text (covering 16 languages): one optimized for retrieval performance, one for speed, and one for QA retrieval.
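Usage roughly follows the standard TF Hub pattern, sketched below; the exact module handle and version here are assumptions, so check TF Hub for the current paths, and note that the multilingual modules additionally require the tensorflow_text package:

```python
import numpy as np
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 -- registers the SentencePiece ops the module needs

# Module handle/version is an assumption; look up the exact path on TF Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")

sentences = ["How old are you?", "¿Cuántos años tienes?", "The weather is nice today."]
embeddings = embed(sentences)  # some module versions return a dict; if so, take ["outputs"]

# Semantically similar sentences get similar vectors, across languages.
similarity = np.inner(embeddings, embeddings)
```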
Kashgari ⚙️ An NLP transfer learning library built on top of tf.keras that allows you to easily build state-of-the-art models for NER, POS tagging, and text classification tasks.
Articles and blog posts 📰
Write With Transformer ✍️ An interactive writing interface by HuggingFace that demonstrates how a language model can be used as a writing assistant. In this post, Lysandre describes how the scalable demo was built.
How to Label Data 📝 An extensive guide by Tal Perry on how to annotate data for an applied NLP project that walks you through seven stages of an annotation life cycle.
Trigger warning ⚠️ The Hidden Story Behind the Suicide of PhD Candidate Huixiang Chen ⚠️ This sad story serves as a reminder of how important mental health is, particularly during a PhD. If you feel mentally unwell, talk to your family and friends or seek professional help.
4 Tips For Communicating Technical Ideas to a Non-tech Audience 🗣 Communicating content to a non-technical audience (whether in a presentation or a blog post) can often be challenging. This post by Jason Webster contains four genuinely useful and practical tips on how to communicate more effectively:
Tailor your talk to your audience.
Grab your audience's attention from the start.
Tell a story.
Make your information relatable.
NLP Pedagogy Interview: Emily M. Bender & Jason Eisner 👩🏫 These two extensive interviews in a series by David Jurgens explore in depth what goes into a computational linguistics and NLP curriculum and discuss many different considerations (how much linguistics to include, error analysis as a final project, ethics, etc.).
Papers + blog posts 📑
Sparse Networks from Scratch (blog post, paper) This blog post by Tim Dettmers describes a method for training neural networks that are kept sparse throughout training yet achieve performance comparable to dense models. The post gives some background and intuition for why sparse learning works and even provides a code snippet implementing the method.
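To give a feel for the mechanics, here is a bare-bones prune-and-regrow step on a single weight matrix. Note that it regrows connections at random positions, whereas the paper's sparse momentum algorithm redistributes and regrows weights where momentum magnitudes are largest, so treat this purely as a simplified illustration:

```python
import torch

def prune_and_regrow(weight, mask, prune_rate=0.2):
    """Simplified sparse-training step: drop the smallest-magnitude active weights,
    then regrow the same number of connections at (here: random) new positions."""
    n_active = int(mask.sum().item())
    n_prune = int(prune_rate * n_active)
    if n_prune == 0:
        return mask

    # Prune: zero out the n_prune smallest-magnitude active weights.
    active_magnitudes = weight[mask > 0].abs()
    threshold = active_magnitudes.kthvalue(n_prune)[0]
    mask = mask * (weight.abs() > threshold).float()

    # Regrow: re-enable the same number of currently inactive connections.
    inactive = (mask == 0).nonzero()
    chosen = inactive[torch.randperm(inactive.size(0))[:n_prune]]
    mask[chosen[:, 0], chosen[:, 1]] = 1.0
    return mask

weight = torch.randn(256, 256)
mask = (torch.rand(256, 256) < 0.1).float()   # start at roughly 90% sparsity
mask = prune_and_regrow(weight, mask)
sparse_weight = weight * mask                  # the mask is re-applied after every update
```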
Uniform convergence may be unable to explain generalization in deep learning (blog post, paper) This post provides evidence that generalization bounds based on uniform convergence (which measure how "representationally rich" the function class is, as in e.g. the VC dimension) may be unable to explain generalization in overparameterized deep neural networks.
Paper picks 📄
Unsupervised word embeddings capture latent knowledge from materials science literature (Nature 2019) We know that unsupervised representations such as word embeddings learned from large corpora can capture a lot of information about relationships in the data. This paper shows that word embeddings trained on the materials science literature capture (perhaps unsurprisingly) information about structure-property relationships in materials. What is remarkable is that such embeddings, when trained only on past data, can also recommend materials for functional applications years before those materials were actually reported for such applications. This shows strikingly that there is a lot of untapped potential in the scientific literature that can be exploited using NLP.
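As a rough illustration of the kind of query this enables, here is a toy gensim sketch; the corpus below is a made-up stand-in (the paper trains skip-gram embeddings on millions of materials-science abstracts), so the neighbours it returns are meaningless:

```python
from gensim.models import Word2Vec

# Toy stand-in corpus; the paper uses millions of tokenised materials-science abstracts.
abstracts = [
    ["LiFePO4", "is", "studied", "as", "a", "cathode", "material"],
    ["Bi2Te3", "shows", "promising", "thermoelectric", "performance"],
] * 200  # repeated so the toy vocabulary clears min_count

# Skip-gram word2vec, as in the paper (gensim 3.x parameter names).
model = Word2Vec(abstracts, size=200, window=8, min_count=5, sg=1, workers=4)

# Query the embedding space: with real data, candidate materials for an application
# such as thermoelectrics surface among the nearest neighbours of the property term.
print(model.wv.most_similar(positive=["thermoelectric"], topn=5))
```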