COVID-19 edition 😷
Hi all,
I hope you're all staying safe in these trying times. 🤒 This is a shorter version of the usual newsletter that is all about how NLP and ML can be used to help with the current pandemic. If you are not interested or already saturated with the ongoing COVID-19 coverage, then consider skipping this one.
This newsletter features work on translation, question answering, search, and opinion mining related to COVID-19. It also covers resources and things you can do to help as well as some fun bits.
Contributions 💪 If you have written or have come across something that would be relevant to the community, hit reply on the issue so that it can be shared more widely. In addition, let me know if there's a particular topic that would be worthwhile to discuss in the next edition.
I really appreciate your feedback, so let me know what you love ❤️ and hate 💔 about this edition. Simply hit reply on the issue.
If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.
Self-isolation can be hard on each of us—but have you considered the impact it might have on our models?
For more ML-themed puns, check out Julia Gong's work on Facebook, Instagram, Twitter, and her website.
COVID-19 🔬
Translation 🌍
One of the most impactful ways to help slow the spread of COVID-19 is to ensure everyone has access to the most up-to-date information and guidelines on the virus. That means translating content accurately into most of the world's languages. If you are a speaker of a non-European, low-resource language, then consider filling out this form by CMU researchers to help with their translation efforts.
The importance of up-to-date guidelines and restrictions also means that translating a few key phrases goes a long way. Platforms like Google Translate only support around 100 languages, however. Using cross-lingual word embeddings (specifically MUSE), Daniel Whitenack translated the phrase "wash your hands" into 544 languages. Among them are low-resource languages such as Pijin (spoken in the Solomon Islands), Takar (Cameroon), and Waffa (Papua New Guinea).
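To make the idea of cross-lingual word embeddings concrete, here is a minimal sketch of nearest-neighbour lookup in a shared embedding space. It is not Daniel Whitenack's pipeline and does not call MUSE: the vectors are made up, the target-language words are placeholders, and the helper names `cosine` and `translate` are hypothetical. It only assumes that both vocabularies have already been aligned into one space.

```python
import math

# Made-up 3-dimensional vectors, assumed to be already aligned into one
# shared space. Real aligned embeddings would have hundreds of dimensions.
ENGLISH = {
    "wash": [0.9, 0.1, 0.0],
    "hands": [0.1, 0.9, 0.1],
    "water": [0.0, 0.2, 0.9],
}
# Placeholder target-language vocabulary (not real words in any language).
TARGET = {
    "tgt_wash": [0.8, 0.2, 0.1],
    "tgt_hands": [0.2, 0.8, 0.0],
    "tgt_water": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word):
    """Return the target word whose vector is closest to the English word's."""
    vec = ENGLISH[word]
    return max(TARGET, key=lambda candidate: cosine(vec, TARGET[candidate]))


phrase = [translate(w) for w in ["wash", "hands"]]
print(phrase)
```

Translating word by word like this ignores grammar and word order, which is one reason such an approach suits short key phrases better than full documents.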
Question answering and search 📚
Another way to keep people informed is to enable them to automatically find or ask for information. Many people have already developed interfaces to look for answers to COVID-19 related questions. Most of these are based on semantic search: Given a set of question-answer pairs, we embed the questions using a pre-trained sentence embedding model. For a new question, we embed it in the same embedding space and return the answer corresponding to the most similar question.
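The semantic search recipe above can be sketched in a few lines. This is a toy illustration under stated assumptions: `embed` is a hypothetical bag-of-words stand-in for a pre-trained sentence embedding model, and the FAQ entries and answers are placeholders rather than real guidance. In practice you would swap `embed` for a real encoder and keep the rest of the logic.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Placeholder question-answer pairs; a real system would scrape official FAQs.
FAQ = [
    ("How long should I wash my hands?", "<answer about hand washing>"),
    ("What are the symptoms of COVID-19?", "<answer about symptoms>"),
    ("Should I wear a mask outside?", "<answer about masks>"),
]

# Embed every known question once, ahead of time.
FAQ_VECTORS = [embed(question) for question, _ in FAQ]


def answer(query):
    """Embed the query and return the answer of the most similar question."""
    query_vector = embed(query)
    scores = [cosine(query_vector, vector) for vector in FAQ_VECTORS]
    best = max(range(len(FAQ)), key=scores.__getitem__)
    return FAQ[best][1], scores[best]


print(answer("what symptoms does covid-19 have"))
```

With a real sentence encoder, paraphrases that share no words with the stored questions can still match, which this word-overlap stand-in cannot do.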
For a walk-through on how to build a multilingual assistant that can answer questions about COVID-19—complete with how to build the API and host it on Google Cloud—check out this post by Anna Krogager from ML6. She scrapes question-answer pairs from official FAQs in Belgium and uses the Universal Sentence Encoder to embed questions and queries.
Two search interfaces that can be used to provide answers in English to COVID-19 related questions based on the CORD-19 dataset are covidsearch by researchers from Korea University and covidex by researchers from the University of Waterloo and NYU. Both additionally highlight relevant entities in the article. While code for the former will be available in April according to the GitHub README, the latter is based on a T5 model pre-trained on the medical domain and code is already available.
A BERT-based question answering assistant for COVID-19 related questions comes from deepset.ai. They provide a BERT model that is fine-tuned on COVID-19 related articles and can be used directly in 🤗Transformers. They are also explicitly looking for contributions from the community if you are interested in helping.
For a different flavour of pre-trained models, Gabriele Sarti took models pre-trained on biomedical data and fine-tuned them on NLI data. The resulting models can be used for exploring the CORD-19 dataset.
Finally, if you are more interested in capturing certain relations, have a look at the relation embeddings trained by Luis Espinosa-Anke using the SeVeN pipeline. In addition to the first-order relationships captured by regular word embeddings, relation embeddings enable finding similar relations such as <disease, bacteria> relations in the embedding space.
Opinion mining 📈
Tracking news coverage and online media is another way to monitor the spread of the virus and can be useful to pre-empt mass hysteria. Two examples are an analysis (in Spanish) by Grupo BID that uses Twitter data focused on South America and a post by Aylien about tracking news coverage of the outbreak.
For more on how industry supports reporting on COVID-19, provides machine-supported diagnosis, or helps develop policies, have a look at this recent edition of This Week in NLP.
Resources and how to help 📑
The COVID-19 and AI virtual conference hosted by Stanford's Human-centered AI institute has released the video recordings of their sessions.
For a list of COVID-19 related data for NLP, epidemiological, and biomedical applications, have a look at Stanford's CS472 course on data science and AI for COVID-19.
If you are interested in working with this data or on COVID-19 related research, the Association for Computational Linguistics is hosting an emergency workshop on NLP for COVID-19 virtually with ACL 2020. The workshop invites submissions related to any aspect of NLP applied to combat the COVID-19 pandemic. Submissions will be openly and rapidly reviewed. Submission deadline is June 30.
If the above resources are not helpful to you, then Robert Munro proposes several additional ways to help:
Help the people around you interpret information.
Translate information from experts into more languages.
Prepare data that might be directly related to the response.
Analyze data that is not directly related to the response.
Research using existing disaster response datasets.
Miscellaneous 🖼
If you find yourself stuck in too many video conference calls, consider spicing up your video call background using the latest research on background matting published at CVPR 2020. The new approach lets you capture the photos or videos with a handheld camera rather than needing a professional green screen, and it achieves higher fidelity than previous methods.
If you are struggling with teaching online classes, have a look at Yoav Artzi's online teaching setup for inspiration. He uses Zoom, two monitors, headphones, and a webcam.