NLP Progress, Retrospectives and look ahead, New NLP courses, Independent research initiatives, Interviews, Lots of resources
Happy 2020 everyone 🎉! May your models be sample-efficient, your generalisation error be low, and your inter-annotator agreement be high!
This newsletter covers a lot of ground: an update on new datasets in NLP progress, retrospectives of the past decade and the past year and prognoses for this year, new NLP courses, information about independent research initiatives, and ML-related interviews.
What I'm particularly excited about is the extensive list of resources, which range from useful references on interpretability and NER, to databases with a huge number of NLP datasets, as well as repositories and tools that enable you to organise and learn about important concepts in NLP. Lastly, as usual, you'll find a selection of exciting tools, articles, and papers.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
NLP progress 📈
Updates during the last month include:
A new state of the art on summarisation datasets
A new state of the art on coreference resolution
A new semantic parsing task, UCCA parsing
A new state of the art on AMR parsing
A new task, spoken language understanding
XSum, a new dataset for summarisation
Retrospectives and look ahead 🔮
The past decade
Bandit learning 🎰 Sebastien Bubeck highlights major advances of the past decade in bandit learning, where progress has been much more theoretical in nature than in other parts of ML. He also mentions optimisation-related improvements, such as demystifying Nesterov accelerated gradient.
New Scientist 👩🔬 New Scientist ranks the top 10 discoveries of the decade. The list features big breakthroughs such as the discovery of the Higgs boson and CRISPR as well as ones that potentially flew under your radar. Check it out to make sure that you didn't miss anything important.
The past year
NLP Year in Review—2019 📅 Elvis provides an extensive list of major highlights of the NLP year, with regard to publications, creativity and society, tools and datasets, articles and blog posts, ethics, and education.
10 ML & NLP Research Highlights of 2019 🎆 My list of 10 research directions that I found exciting and impactful.
My Top 10 Deep RL Papers of 2019 🤖 Robert Lange shares his favourite deep reinforcement learning papers related to large-scale projects, model-based RL, compositionality & priors, multi-agent RL, and learning dynamics.
Analytics Vidhya 📊 Highlights of ML and Deep Learning in 2019, including AI in business, ethics, and an extensive list of NLP advances.
ML and AI 2019 Round-up 📰 Xavier Amatriain's highlights focus on NLP, combining knowledge and structure, and self-supervision, among others.
High Hopes for 2020 🙏 Leading AI researchers express their hopes for AI in 2020. A common theme is achieving generalisation through learning in simulation (Anima Anandkumar), robust off-policy learning (Chelsea Finn), and self-supervised learning (Yann LeCun). For NLP, Richard Socher is convinced that we'll see significant advances in summarisation in 2020.
New NLP courses 👩🎓
With the new semester approaching, new NLP courses are starting and existing ones are getting an update. Here is a selection of courses with publicly available resources:
CS 521: Statistical Natural Language Processing by Natalie Parde at the University of Illinois introduces advanced topics in statistical and neural NLP and provides an overview of active research in those topic areas
CS 11-747: Neural Networks for NLP by Graham Neubig at CMU demonstrates how to apply neural networks to natural language problems
CS 224n: Natural Language Processing with Deep Learning by Chris Manning and Matthew Lamm at Stanford introduces students to cutting-edge research in NLP
Independent research initiatives 👩🔬
Being an independent researcher is not an easy path. Funding oneself and identifying a project worthy of pursuit are only two of the challenges. Read Andreas Madsen's account of what it takes to prevail as an independent researcher. However, being independent does not mean going it completely alone. There are several online research groups that are open for collaboration. The following are only the ones that I'm aware of:
FOR.ai, a multi-disciplinary team of scientists and engineers
Masakhane.io, a research collaboration with a focus on machine translation for African languages
AI-ON, an open research community
Night AI, a distributed research collective
If none of the above fit your needs, why not start your own research collaboration? The most important part is to find like-minded people. ML-oriented communities such as the Kaggle or fast.ai forums are good starting points.
Instead of talks, in this edition I'd like to share four interview series around ML that are well worth watching:
Sayak Paul's interviews with top data scientists and ML researchers and engineers
Chai Time Data Science: Sanyam Bhutani's interviews with Kagglers and ML researchers
Artificial Intelligence: AI Podcast: Lex Fridman's interviews with technology luminaries
This Week in Machine Learning (TWIML): Sam Charrington's interviews with ML researchers and top data scientists
Interpretability references 📒 A list of references on interpretability and analysis methods containing a wide range of methods and landmark papers in convenient BibTeX format (with links).
NER papers 📗 An exhaustive paper list covering papers on named entity recognition (NER) from top conferences of the last eight years.
The Big Bad NLP Database 📓 A huge list of (currently) 213 NLP datasets with extensive metadata information (languages, task description, size) and download links.
LRE Map 🗺 A database containing around 6,000 language resources and tools published at LREC conferences.
NLU Sense 🤯 A mind map of the most important concepts in current NLP and how they relate to each other. A great resource to get a bird's eye view of the field.
Concepts in Neural Networks for NLP 🔖 A repository that gives an overview of the most important concepts necessary to understand cutting-edge research in neural network models for NLP.
PracticalAI (website is unavailable) 🔭 A free tool to discover & organise the top community-created ML content. It allows you to easily create and share decks of resources. For instance, you can follow what I'm reading here.
Tavolo ⚙️ tavolo gathers implementations of useful ideas from the community (by contribution, from Kaggle etc.) and makes them accessible in a single PyPI-hosted package that complements the tf.keras module.
Tokenizers 🔨 This library provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Thanks to the Rust implementation, tokenisation of a GB of text on a CPU takes less than 20 seconds.
Articles and blog posts 📰
Perfectly Privacy-Preserving AI 🔍 This post gives a nice overview of four pillars of privacy-preserving ML as well as recent insights and solutions to address them:
Training data privacy (the training data cannot be reverse-engineered);
Input privacy (a user's input is unobservable by other parties);
Output privacy (a model's output is not visible to other parties);
Model privacy (a model cannot be stolen).
Just 700 Speak This Language (50 in the Same Brooklyn Building) 🇳🇵 A New York Times article about what it means to be a speaker of a truly low-resource language (here: the Nepalese language Seke).
What does BERT dream of? 😴 This post explores Deep Dream-style feature visualisations for BERT to identify which sentences maximally activate BERT's neurons. This is a lot harder for text than for images and the authors explore reasons why the dreaming may fail.
Do We Really Need Model Compression? 🗜 Mitchell A. Gordon explores obstacles to training small models from scratch (as an alternative to model compression).
Methods to grow your own data sets for Conversational AI 🗣 This post gives an overview of methods to augment your training data, including using models like BERT and GPT-2.
Bayesian Neural Networks Need Not Concentrate 👴🏻 A critical post about possible limitations of Bayesian Neural Networks that arose out of a discussion on Twitter. In his response, Andrew Gordon Wilson makes the case for Bayesian Deep Learning.
Is MT really lexically less diverse than human translation? 🌍 Some recent studies have suggested that machine translations are less lexically diverse (have a smaller number of different words) than human-written translations. This post analyses this hypothesis based on WMT19 systems and finds no difference in lexical diversity between MT and human translations. It also finds no correlation between lexical diversity and MT quality.
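To make "a smaller number of different words" concrete, here is a minimal sketch that counts distinct word types and computes the type-token ratio (TTR) of a text. TTR is only one common way to quantify lexical diversity and is an assumption here; the post may rely on other measures. The function name and the two example sentences are made up for illustration.

```python
import re


def lexical_diversity(text):
    """Return (distinct word types, total tokens, type-token ratio)."""
    # Lowercase and split on word characters; a crude tokeniser for illustration.
    tokens = re.findall(r"\w+", text.lower())
    types = set(tokens)
    # Guard against empty input to avoid division by zero.
    ttr = len(types) / len(tokens) if tokens else 0.0
    return len(types), len(tokens), ttr


# Hypothetical example sentences, not taken from WMT19.
human = "The cat sat on the mat while a dog slept nearby"
machine = "The cat sat on the mat and the cat sat on the floor"

for name, text in [("human", human), ("machine", machine)]:
    n_types, n_tokens, ttr = lexical_diversity(text)
    print(f"{name}: {n_types} types / {n_tokens} tokens, TTR = {ttr:.2f}")
```

Note that TTR drops as texts get longer, so a fair comparison should use translations of the same source text or a length-normalised variant.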
Papers + blog posts 📑
The Dark Secrets of BERT (blog post, paper) This post probes fine-tuned BERT models for linguistic knowledge. In particular, the authors analyse how many self-attention patterns with some linguistic interpretation are actually used to solve downstream tasks. TL;DR: They are unable to find evidence that linguistically interpretable self-attention maps are crucial for downstream performance.
Paper picks 📄
Universals of word order reflect optimization of grammars for efficient communication (PNAS 2020) Languages have different word orders but, interestingly, certain word-order related patterns such as ordering the object after the verb and using prepositions (that precede the noun) almost always co-occur. These word-order correlations are known as Greenberg universals. Hahn et al. provide evidence that this is a result of languages optimising for efficient communication and that the most efficient grammars are also more likely to satisfy the Greenberg universals.
Reformer: The Efficient Transformer (ICLR 2020) Two downsides of current Transformer architectures are the time complexity of attention (every token needs to attend to every other token) and the memory it takes to store all its parameters. Kitaev et al. propose to address these issues by replacing the standard attention with one that uses locality-sensitive hashing, which makes it more efficient, and replacing the standard residual layers with reversible ones, so that only a single copy of activations rather than one for every layer needs to be stored.
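The two ideas can be illustrated with a toy sketch in pure Python. This is not the Reformer implementation: it assumes an angular hashing scheme based on random projections for the bucketing step, and it uses scalar stand-in functions f and g where the real model would use attention and feed-forward sublayers. All function names are hypothetical.

```python
import random


def make_projections(dim, n_buckets, seed=0):
    """Draw n_buckets // 2 random Gaussian directions of size dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_buckets // 2)]


def lsh_bucket(vec, projections):
    """Angular LSH: project, append the negated projections, take the argmax."""
    proj = [sum(v * r for v, r in zip(vec, row)) for row in projections]
    scores = proj + [-p for p in proj]
    return max(range(len(scores)), key=scores.__getitem__)


def rev_forward(x1, x2, f, g):
    """Reversible residual block: outputs depend on inputs in an invertible way."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover the block inputs from its outputs, so they need not be stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


# Vectors pointing in similar directions tend to share a bucket, so attention
# can be restricted to tokens within the same bucket.
projections = make_projections(dim=4, n_buckets=8)
query = [0.9, 0.1, -0.3, 0.5]
print("bucket:", lsh_bucket(query, projections))

# Toy stand-ins for the attention (f) and feed-forward (g) sublayers.
f = lambda x: 2 * x + 1
g = lambda x: x * x
outputs = rev_forward(3, 5, f, g)
print("recovered inputs:", rev_inverse(*outputs, f, g))
```

Because the inverse recomputes the inputs from the outputs, activations can be reconstructed layer by layer during the backward pass instead of being kept in memory for every layer.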
Have a look at this post for some of my favourite papers from 2019 and ICLR 2020.