February 25 · Issue #49
Hi all, This newsletter discusses accelerating science, memorizing vs learning to look things up, and a Schmidhuber-centric view of the last decade. It also features slides on transfer learning and Deep Learning essentials, multiple translation corpora (speech-to-text, comprehensive translations for language learning), a Greek BERT, and ARC. Finally, it includes the blog posts and papers that I particularly enjoyed reading over the past months, including the Illustrated Reformer and the Annotated GPT-2, an analysis of NLP and ML papers in 2019, and oLMpics.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely. I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue. If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Machine learning is already used across the sciences, from astrophysics to high-energy-density physics, to train models that mimic the output of slower simulators. Large speedups are common; the challenge is making the models’ predictions accurate enough to be useful in practice. Current ML models typically need large amounts of training data, which is expensive to obtain in this setting, as some simulators may take days to produce a single output. The latest approach in this line employs efficient neural architecture search. What is particularly exciting is that the model needs only a few thousand examples, and in one case (modelling the global aerosol climate) only a few dozen.
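To make the surrogate-modelling idea concrete, here is a minimal sketch (not the paper’s method; expensive_simulator is a toy stand-in): fit a regressor on a limited budget of simulator runs and then use it as a fast emulator.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_simulator(params):
    # Stand-in for a slow physics simulator; a real one could take hours or days per run.
    x, y = params
    return np.sin(3 * x) * np.cos(2 * y) + 0.1 * x * y

# A small budget of simulator runs serves as training data for the surrogate.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(2000, 2))
y_train = np.array([expensive_simulator(p) for p in X_train])

# The surrogate: a small neural network trained to mimic the simulator's output.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# Once trained, predictions are effectively instantaneous compared to the simulator.
X_new = rng.uniform(-1, 1, size=(5, 2))
print(surrogate.predict(X_new))
```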
If we want to recall a fact, we typically have two strategies: we either learn it by heart or remember where to find it (for instance, by querying StackOverflow or our paper management system). The first approach is fast and requires no additional resources, but our memory may occasionally fool us. The second may take longer but provides us with additional evidence. Current deep neural networks make a similar trade-off when answering complex questions: they either try to store all knowledge in a huge number of parameters or learn how to retrieve documents to use as evidence. For the first approach, it was recently shown that T5, a neural network with 11B parameters, can store enough knowledge in its parameters to outperform previous retrieval-based systems on open-domain question answering. Around the same time, REALM, a model of the second kind, achieved another significant improvement by learning the retrieval mechanism as part of pretraining. While T5 could be made even bigger to store more facts in its parameters, increasing the number of parameters quickly becomes prohibitively expensive. Retrieval also has additional benefits: it is much more interpretable than querying a black box and enables seamless updates of the underlying knowledge corpus.
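For intuition, here is a bare-bones sketch of the retrieve-then-read idea, not REALM itself; the encode function is a toy placeholder for a trained dense encoder.

```python
import numpy as np

def encode(text, dim=64):
    # Placeholder encoder: hashes character trigrams into a vector.
    # A real system would use a trained dense encoder (e.g. a BERT-style model).
    vec = np.zeros(dim)
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

corpus = [
    "Canberra is the capital of Australia.",
    "The Reformer uses locality-sensitive hashing for efficient attention.",
    "REALM learns its retriever jointly with the language model.",
]
doc_matrix = np.stack([encode(d) for d in corpus])

def retrieve(question, k=1):
    # Score documents by dot product with the question embedding and return the top k.
    scores = doc_matrix @ encode(question)
    top = np.argsort(-scores)[:k]
    return [corpus[i] for i in top]

# The retrieved evidence would then be passed to a reader model that extracts the answer.
print(retrieve("What is the capital of Australia?"))
```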
In his second blog post and tweet, Jürgen Schmidhuber (pronounced “You_again Shmidhoobuh”) gives a somewhat biased overview of the last decade, focusing mainly on advances enabled by research from his lab. Nevertheless, it is educational to view recent advances through this lens: feed-forward neural networks as limited RNNs and LSTMs; ResNets as a special case of highway networks; and GANs as an application of the curiosity principle. It is also useful to remind ourselves that, despite the increasing popularity of attention and Transformers, the LSTM is “arguably the most commercial AI achievement” and has received more citations per year than any other computer science paper of the 20th century.
Deep Learning Essentials (Part 1, Part 2) 🏛 In this two-part slide deck, Ruslan Salakhutdinov gives an overview of deep learning fundamentals, from supervised learning to deep generative models.
CoVoST 💬 CoVoST is a diverse multilingual speech-to-text translation corpus by Facebook that includes speech in 11 languages (French, German, Dutch, Russian, Spanish, Italian, Turkish, Persian, Swedish, Mongolian and Chinese), their transcripts and English translations. If you are interested in speech-to-text applications or translation of spoken language, then this is a great starting point.
The Missing Semester of Your CS Education 💻 An MIT curriculum that teaches you about the tools you need to do computer science in practice: mastering the command line, using a powerful text editor, version control, and much more.
CS 287 Advanced Robotics – Fundamental Knowledge 🤖 This exam study handout for Pieter Abbeel’s course summarizes the main math behind key RL techniques, such as value iteration, policy iteration, policy gradients, TRPO, and Q-learning, as well as different optimization methods, in around 20 pages.
GreekBERT is the latest BERT model, this time for Greek. It was trained on the Greek Wikipedia, the Greek part of the European Parliament proceedings, and Greek text from Common Crawl.
ARC The Abstraction and Reasoning Challenge, originally proposed by François Chollet and now hosted on Kaggle with a $20,000 prize, tasks models with learning complex, abstract patterns from just a few examples. Some of the rules that need to be learned are quite intricate. Can you spot the pattern in the example below? You can explore the ARC tasks with this interactive website.
The apparent rule of this ARC example: Apply the input pattern to sections in the output that correspond to filled squares in the input example.
An Opinionated Guide to ML Research 🗺 John Schulman shares advice on how to choose problems and organize your time in ML research. The advice touches on developing good taste for which problems to work on, climbing incrementally towards ambitious goals, knowing when to switch problems, and the importance of personal development.
- using a teacher model to guide a student;
- using asymmetric self-play (two agents setting tasks for each other to solve);
- automatically generating goals with a GAN;
- basing the curriculum on latent skills and trajectories in the skill space.
Contrastive Self-Supervised Learning 🤳 Ankesh Anand gives an overview of recent contrastive methods that learn by distinguishing between positive and negative examples (in contrast to generative models), such as Deep InfoMax, Contrastive Predictive Coding, and MoCo, with a focus on models for computer vision.
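As a rough illustration of the objective these methods share, here is a minimal InfoNCE-style contrastive loss in plain numpy (a generic sketch, not the exact formulation of any of the papers above).

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Contrastive (InfoNCE-style) loss for a single anchor.

    The anchor should score high against its positive (e.g. another view
    of the same image) and low against the negatives.
    """
    candidates = np.vstack([positive[None, :], negatives])  # positive is index 0
    # Cosine similarities between the anchor and all candidates.
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor) + 1e-8
    )
    logits = sims / temperature
    # Softmax cross-entropy with the positive as the target class.
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.1 * rng.normal(size=128)   # a slightly perturbed "view" of the anchor
negatives = rng.normal(size=(16, 128))           # unrelated examples
print(info_nce(anchor, positive, negatives))
```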
2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education 🦉 If you are interested in machine translation and language learning, then consider taking part in this shared task. For language learning, it is often useful to have multiple plausible translations, so that learners’ responses can be graded against a large set of human-curated acceptable translations. Compared to other datasets, the shared task provides data in five language pairs with such comprehensive translations. The task is also interesting from a paraphrasing perspective, as high-quality automatic translations of each input sentence are provided, which can be used to generate paraphrases.
Illustrating the Reformer 🦍 An illustrated version of the Reformer by Alireza Dirafzoon, which provides a nice walk-through of this efficient Transformer’s key ingredients, such as locality-sensitive hashing and reversible layers.
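To give a flavour of the locality-sensitive hashing step, here is a generic random-projection sketch (not the Reformer’s exact scheme): similar vectors tend to land in the same bucket, and attention can then be restricted to bucket members.

```python
import numpy as np

def lsh_buckets(vectors, n_planes=4, seed=0):
    """Assign each vector to a bucket via random hyperplane projections.

    Vectors with high cosine similarity tend to fall on the same side of the
    random hyperplanes and therefore share a bucket, so attention can be
    restricted to members of the same bucket instead of all positions.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_planes, vectors.shape[1]))
    bits = (vectors @ planes.T) > 0                          # sign pattern per vector
    return bits.astype(int) @ (1 << np.arange(n_planes))     # pack bits into a bucket id

rng = np.random.default_rng(1)
keys = rng.normal(size=(8, 16))
buckets = lsh_buckets(keys)
for b in np.unique(buckets):
    members = np.where(buckets == b)[0]
    print(f"bucket {b}: positions {members.tolist()}")  # attend only within each bucket
```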
One does not simply implement an approach based on its description in the paper. https://abstrusegoose.com/588
ML and NLP Publications in 2019 📑 Marek Rei (with data from Jonas Pfeiffer and Andrew Caines) shares a deep analysis of publication trends in 2019 across ML and NLP venues. This year, for the first time, the analysis also includes statistics for individual countries. The data is available here.
Yoshua Bengio’s blog – first words ✈️ Yoshua Bengio has started blogging (if this doesn’t convince you to start a blog, then I don’t know what does). His first blog post focuses on the importance of remote presentations to minimize air travel in order to reduce the carbon footprint of the community. You can sign the petition here—I did.
- A more solid theoretical understanding of GNNs;
- Cool new applications of GNNs;
- Knowledge graphs becoming more popular;
- New frameworks for graph embeddings.
BERT, ELMo, & GPT-2: How contextual are contextualized word representations? (Blog post, paper) Kawin Ethayarajh and his co-authors analyse contextual representations in recent models and find the following:
- The representations of all words in all layers are distributed only in a narrow part of the embedding space.
- Upper layers produce more context-specific representations.
- Less than 5% of the variance of contextual embeddings is explained by a static embedding (so embeddings are very contextual).
- If we create new static embeddings by taking the first principal component of a word’s contextualized representations in a lower layer, the resulting embeddings outperform GloVe and fastText on analogies (see the sketch below).
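A rough sketch of that last step, assuming you already have a matrix of contextualized vectors for the occurrences of a single word (extracting those vectors from a model like BERT is omitted here):

```python
import numpy as np

def static_embedding_from_contexts(context_vectors):
    """Distil a single static vector from many contextualized vectors of one word.

    context_vectors: array of shape (n_occurrences, hidden_dim), e.g. the
    lower-layer representations of the word across many sentences.
    Returns the first principal component of those vectors.
    """
    centered = context_vectors - context_vectors.mean(axis=0, keepdims=True)
    # SVD of the centered matrix; the first right singular vector is the
    # first principal component direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

# Toy example with random "contextual" vectors for a single word.
rng = np.random.default_rng(0)
contexts = rng.normal(size=(200, 768))
print(static_embedding_from_contexts(contexts).shape)  # (768,)
```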
Polyglot Word Embeddings Discover Language Clusters (Blog post, paper) Shriphani Palakodety shows how multilingual skip-gram representations can be used for unsupervised language identification via clustering. The approach has been applied to analyse text from a refugee crisis and a crisis between two nuclear adversaries. I particularly appreciate the discussion of how to select the number of clusters, a point that is often left unaddressed.
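A minimal sketch of the general idea, skip-gram embeddings clustered with k-means, using gensim and scikit-learn; this is not the authors’ exact pipeline, and the toy corpus below is far too small to produce meaningful clusters:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# A tiny bilingual toy corpus; a real application would use thousands of documents.
sentences = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "el gato se sento en la alfombra".split(),
    "el perro persiguio al gato".split(),
]

# Skip-gram embeddings trained jointly over both languages (sg=1 selects skip-gram).
model = Word2Vec(sentences=sentences, vector_size=32, sg=1, min_count=1, seed=0)

# Cluster the word vectors; with enough data, clusters tend to align with languages.
words = list(model.wv.index_to_key)
vectors = np.stack([model.wv[w] for w in words])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for cluster in range(2):
    print(cluster, [w for w, l in zip(words, labels) if l == cluster])
```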
oLMpics - On what Language Model Pre-training Captures The authors propose eight reasoning tasks, which require operations such as comparison, conjunction, and composition to evaluate the capabilities of current pretrained language models. As it is often hard to tell what a probe captures in isolation, they employ zero-shot and control baselines to control for the effect of fine-tuning on the task dataset. Very thoughtful! They find that different LMs have qualitatively different reasoning abilities, e.g. RoBERTa succeeds in tasks where BERT fails. They also find that reasoning abilities are context-dependent (e.g. based on expected scale of numbers) and that current models fail on about half of all tasks. Overall, this is an extensive study and well worth reading.