September 16 · Issue #44
Hi all, This edition is about better ways to report experimental results and model errors, ethics and NLP, and how to make your BERT model more efficient. We also cover the SemEval 2020 tasks and a particularly extensive set of resources, including ones on causality and the academic job search. You’ll find as usual several high-quality blog posts and interesting papers—all topped off with a pinch of magic 🧙♂️.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely. I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue. If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Any sufficiently advanced technology is indistinguishable from magic. —Arthur C. Clarke Magic 🧙♀️ according to Arthur C. Clarke is still far out of reach with current models—although they may on occasion appear more sophisticated than they actually are (the Clever Hans effect, as discussed in the last newsletter). In the meantime, we can at least enjoy generated text about magic, such as the below delightful Harry Potter 🧙♂️–NLP paper 📄 cross-over brought to you by Jonathan Fly and GPT-2 fine-tuned on arXiv papers by HuggingFace:
Harry Potter-NLP cross-over (credit: Jonathan Fly)
Two recent EMNLP 2019 papers observe that the current standard procedures for error analysis and for reporting experimental results are flawed, and propose improvements.
Error analysis
Standard method: Randomly select 50–100 incorrectly answered questions and roughly group them into N error categories.
Problems: small sample size; subjective and imprecise labels; the true cause of an error is never tested.
Proposed solution: Errudite (Wu et al., 2019) uses a domain-specific language to extract precisely defined attributes from all examples in the dataset and supports counterfactual analysis via rewrite rules. The accompanying blog post provides more details (and a UI, see below).
Errudite UI (Wu et al., 2019)
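Errudite’s actual domain-specific language is described in the paper and blog post. Purely as an illustration of the underlying idea (compute attributes over all examples, then test a hypothesis with a counterfactual rewrite), here is a rough Python sketch; the predict function and the example format are hypothetical placeholders, not Errudite’s API.

```python
# Hypothetical illustration of attribute-based error analysis with a
# counterfactual rewrite rule (not Errudite's actual DSL or API).
import re

def question_length(example):
    """Attribute: number of tokens in the question."""
    return len(example["question"].split())

def rewrite_who_to_which_person(example):
    """Rewrite rule: replace 'who' with 'which person' in the question."""
    rewritten = dict(example)
    rewritten["question"] = re.sub(r"\bwho\b", "which person",
                                   example["question"], flags=re.IGNORECASE)
    return rewritten

def analyze(dataset, predict):
    """Group *all* errors by an attribute and probe one group counterfactually."""
    errors = [ex for ex in dataset if predict(ex) != ex["answer"]]
    long_questions = [ex for ex in errors if question_length(ex) > 20]
    # Counterfactual check: does the rewrite actually change the prediction?
    flipped = [ex for ex in long_questions
               if predict(rewrite_who_to_which_person(ex)) != predict(ex)]
    return len(errors), len(long_questions), len(flipped)
```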
Reporting experimental results
Standard method: Train multiple instantiations of each model, choose the best one of each type based on validation performance, and compare their performance on test data.
Problems: the outcome depends on the computational budget; comparing models tuned with different hyperparameter budgets can lead to different conclusions.
Proposed method: Report expected validation accuracy as a function of the hyperparameter tuning budget (Dodge et al., 2019), together with recommendations for improving scientific reporting in the form of a checklist (see below).
Presence of checklist items across 50 randomly sampled EMNLP 2018 papers that involved modeling experiments (Dodge et al., 2019).
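To make the proposed reporting concrete, here is a short sketch in the spirit of Dodge et al. (2019): given validation accuracies from k random hyperparameter assignments, the expected best accuracy for a budget of n trials can be computed from the empirical distribution. The accuracies below are made up.

```python
# Expected best validation accuracy as a function of the tuning budget,
# computed from the empirical distribution of observed runs
# (in the spirit of Dodge et al., 2019).
import numpy as np

def expected_max_accuracy(accuracies, budget):
    """Expected maximum over `budget` draws (with replacement) from the runs."""
    v = np.sort(np.asarray(accuracies, dtype=float))
    k = len(v)
    i = np.arange(1, k + 1)
    # Probability that the i-th smallest value is the maximum of `budget` draws
    weights = (i / k) ** budget - ((i - 1) / k) ** budget
    return float(np.sum(weights * v))

# Made-up accuracies from 10 random hyperparameter assignments
accs = [0.71, 0.74, 0.69, 0.80, 0.77, 0.73, 0.79, 0.68, 0.75, 0.72]
for n in (1, 5, 10, 20):
    print(n, round(expected_max_accuracy(accs, n), 3))
```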
As NLP models become more powerful, we need to be conscious of their societal impact. Ethics and NLP is already starting to play a bigger role in the community, with a dedicated track at ACL 2020. In addition, initiatives such as the Partnership on AI consult on the responsible publication of models such as Salesforce’s recent CTRL language model. As discussed on Twitter, in order to counteract future ethical crises, ethics cannot just be an afterthought but must be taken into consideration at the inception of a project. In addition, ethics needs to be integrated into the curriculum. If you want to learn more about ethics and NLP, have a look at these papers:
Model size of different pretrained language models (in millions of parameters; credit: HuggingFace)
In the last newsletter, we discussed techniques such as pruning and distillation for compressing big models like BERT. Shortly afterwards, multiple approaches came out to make big Transformer models more efficient:
- Distilling BERT Models with spaCy: Yves Peirsman distills multilingual BERT fine-tuned on a sentiment analysis dataset into spaCy’s convolutional neural networks, similar to Tang et al. (2019).
- DistilBERT: Victor Sanh distills BERT-base into a smaller language model that performs similarly on downstream tasks while being faster (see the distillation-loss sketch after this list). The model, however, still requires a lot of compute for pretraining.
- Multilingual MiniBERT: Tsai et al. (EMNLP 2019) similarly propose to train a smaller (3-layer) BERT model by distilling multilingual BERT.
- Adaptive attention span: Facebook researchers propose an adaptive attention span that makes it more efficient to scale Transformers to long sequences.
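The first three entries all build on knowledge distillation. Here is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015), with placeholder tensors standing in for real teacher and student outputs.

```python
# Soft-target knowledge distillation loss (Hinton et al., 2015): the student
# matches the teacher's softened output distribution in addition to the labels.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    ce = F.cross_entropy(student_logits, labels)
    # Scale the KL term by T^2 so its gradients stay comparable across temperatures
    return alpha * (temperature ** 2) * kl + (1.0 - alpha) * ce

# Example with random placeholder tensors (batch of 4, 3 classes)
student = torch.randn(4, 3)
teacher = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student, teacher, labels))
```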
The tasks for the International Workshop on Semantic Evaluation (SemEval) 2020 have been announced. If you are unsure what to work on or want to tackle challenging problems, these are a great starting point, as most of them provide reliable (and often novel) datasets. They range from analyzing memes to selecting what part of a text should be emphasized and cover the following areas:
- Lexical Semantics (Semantic Change Detection, Cross-lingual Lexical Entailment, Word Similarity in Context)
- Common Sense (Common Sense Explanation, Counterfactual Detection, Extracting Textbook Definitions)
- Humour (Humour in News Headlines, Analysis of Memes, Sentiment Analysis on Code-Mixed Data, Emphasis Selection for Written Text in Visual Media)
- Societal Applications (Propaganda Detection, Offensive Language Detection)
How do we get to general purpose NLU? 🤖 Emily Bender argues that models that are trained only on the form of language (e.g. via language modelling) won’t learn meaning. Instead, we need to pay attention to linguistic structure and how language is used.
Naki 🌎 Naki is a list of corpora, resources, and scientific papers for NLP on Native American / indigenous languages, created by Manuel Mager.
Causality chapter 📖 The Causality chapter of the Fairness and ML book is now freely available. If you’re interested in causality, this is arguably one of the most extensive and didactic overviews of the topic.
CausalML 🔧 This Python library by Uber provides a suite of causal inference methods using machine learning algorithms based on recent research. Typical use cases include campaign targeting optimization or personalized engagement.
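CausalML ships its own estimators; purely to illustrate the kind of problem it targets, here is a simple two-model (“T-learner”) sketch of treatment-effect estimation on synthetic data. This uses scikit-learn rather than CausalML’s API, and the data-generating process is made up.

```python
# Two-model ("T-learner") sketch of uplift / treatment-effect estimation for
# campaign targeting, on synthetic data (illustration only, not CausalML's API).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                    # user features
treatment = rng.integers(0, 2, size=n)         # 1 = user received the campaign
true_effect = 0.5 + X[:, 0]                    # heterogeneous treatment effect
y = X @ np.array([1.0, -0.5, 0.2]) + treatment * true_effect + rng.normal(size=n)

# Fit separate outcome models for treated and control users
model_t = GradientBoostingRegressor().fit(X[treatment == 1], y[treatment == 1])
model_c = GradientBoostingRegressor().fit(X[treatment == 0], y[treatment == 0])

# Estimated individual uplift = predicted outcome if treated minus if not treated
uplift = model_t.predict(X) - model_c.predict(X)
print("estimated ATE:", uplift.mean(), "true ATE:", true_effect.mean())
```

Users with the highest estimated uplift are the ones a campaign would target first.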
FARM 👩🌾 FARM by Deepset allows you to easily adapt pretrained language models to downstream tasks. Standardized interfaces allow for flexible extension, while experiment tracking and visualizations support debugging. In addition, FARM enables running inference either via an API or in a nice UI using Docker containers. You can see below how it compares to two other popular transfer learning libraries, PyTorch-Transformers and spaCy-PyTorch-Transformers.
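For context on what adapting a pretrained language model to a downstream task involves, here is a minimal fine-tuning sketch using the pytorch-transformers API mentioned above; this is not FARM’s own interface, and exact method names may differ across library versions.

```python
# Minimal sketch: adapt a pretrained BERT model to a binary classification task
# with pytorch-transformers (not FARM's API; names may vary by version).
import torch
from pytorch_transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                      num_labels=2)

text = "This movie was surprisingly good."
input_ids = torch.tensor([tokenizer.encode(text)])
labels = torch.tensor([1])

# With labels, the forward pass returns (loss, logits)
loss, logits = model(input_ids, labels=labels)
loss.backward()   # in practice: wrap this in a training loop with an optimizer
print(logits)
```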
Planning paper writing ✍️ Devi Parikh, who has produced a host of amazing papers with her team, shares tips on iterating on a paper with your collaborators:
- Iterate on the paper in a hierarchical fashion (coarse to fine).
- Iterate on small chunks at a time.
- Plan for multiple iterations on every section.
- Schedule each iteration.
- Preprocessing on the GPU.
- Applying max-pooling before batch norm and ReLU.
- Label smoothing (see the sketch after this list).
- Using CELU activations.
- ‘Ghost’ batch norm (batch norm applied to a subset of a larger batch).
- Frozen batch norm scales.
- Input patch whitening.
- Exponential moving averaging of parameters.
- Test-time augmentation.
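As an example of one of the tricks above, here is a minimal PyTorch sketch of label smoothing; the logits and targets are placeholder tensors.

```python
# Label smoothing: move a small amount of probability mass from the gold label
# to all classes, which regularizes over-confident predictions.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, smoothing=0.1):
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    # Smoothed targets: eps/K everywhere, (1 - eps) + eps/K on the gold class
    smooth_targets = torch.full_like(log_probs, smoothing / num_classes)
    smooth_targets.scatter_(-1, targets.unsqueeze(-1),
                            1.0 - smoothing + smoothing / num_classes)
    return -(smooth_targets * log_probs).sum(dim=-1).mean()

# Placeholder batch of 4 examples with 10 classes
logits = torch.randn(4, 10)
targets = torch.tensor([3, 1, 0, 7])
print(label_smoothing_loss(logits, targets))
```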
A Rare Universal Pattern in Human Languages 🗣 This article discusses a recent paper which finds that even though some languages are spoken more quickly than others, they all convey information at roughly the same rate (around 39 bits per second): languages that pack less information into each syllable compensate by being spoken faster.
AI Is Coming for Your Favorite Menial Tasks 👩💻 This article focuses on an underappreciated aspect of the discussion around job loss and transformation due to AI: As decision making is cognitively draining, certain menial tasks may provide a sense of accomplishment. If all of these are done by AI, then only tasks that require very taxing novel decision making will remain.
- Automating routines enables us to be more creative.
- ML gives us superpowers in the physical world.
- ML helps us make better decisions.
- Automating dangerous jobs makes us safer.
- ML will help us understand each other better.
Evolution Strategies 🐒🚶♂️ Lilian Weng reviews classic evolution strategies methods—black-box optimization algorithms that are part of the family of evolutionary algorithms—and discusses applications in deep RL.
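To give a flavour of the methods the post reviews, here is a minimal numpy sketch of a basic evolution strategy on a toy objective; the objective and hyperparameters are arbitrary choices for illustration.

```python
# Basic evolution strategy as a black-box optimizer: perturb the parameters
# with Gaussian noise and move toward perturbations that score well.
import numpy as np

def fitness(theta):
    """Toy objective: maximize the negative squared distance to a target of 3."""
    return -np.sum((theta - 3.0) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(5)
sigma, lr, population = 0.1, 0.02, 50

for step in range(200):
    noise = rng.normal(size=(population, theta.size))
    rewards = np.array([fitness(theta + sigma * eps) for eps in noise])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # normalize
    # Estimated ascent direction: reward-weighted average of the noise vectors
    theta += lr / (population * sigma) * (noise.T @ rewards)

print(theta)  # should end up close to the target value of 3 in every dimension
```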
On Creativity in Academia 👩🏫 This post by Tim Dettmers highlights the dilemma of creativity in academia, which lies in finding strange ideas that are still valid. Rather than coming up with valid ideas straight away, one needs to keep hammering on and reassessing ideas until they work.
Dialogue State Tracking 🗣 This post by Wluper gives an overview of dialogue state tracking, in particular how to leverage persona information and dialogue history.
Universal Adversarial Triggers for Attacking and Analyzing NLP (blog post, paper) A new attack that concatenates a short phrase to the front or end of an input. It is universal in that the exact same phrase can be added to any input from a dataset to cause a specific target prediction.
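The paper finds triggers with a gradient-guided search over tokens; the sketch below only illustrates how such a trigger is evaluated once found. The predict function, dataset format, and trigger phrase are hypothetical placeholders.

```python
# Evaluate a universal trigger: prepend the *same* phrase to every input and
# measure how often the model's prediction flips to the attacker's target class.

def attack_success_rate(dataset, predict, trigger, target_label):
    hits = 0
    for example in dataset:
        triggered = trigger + " " + example["text"]   # same phrase for every input
        if predict(triggered) == target_label:
            hits += 1
    return hits / len(dataset)

# Hypothetical usage with a made-up model and trigger phrase:
# rate = attack_success_rate(dev_set, model.predict, "some trigger phrase", target_label=0)
```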