One of the most impactful ways to help slow the spread of COVID-19 is to ensure everyone has access to the most up-to-date information and guidelines on the virus. That means translating content accurately into most of the world’s languages. If you speak a non-European, low-resource language, consider filling out this form by CMU researchers to help with their translation efforts.
The importance of up-to-date guidelines and restrictions also means that translating a few key phrases goes a long way. Platforms like Google Translate only support translation in around 100 languages, however. Using cross-lingual word embeddings (specifically MUSE), Daniel Whitenack translated the phrase “wash your hands” into 544 languages. Among them are low-resource languages such as Pijin (spoken on the Solomon Islands), Takar (Cameroon), and Waffa (Papua New Guinea).
Question answering and search 📚
Another way to keep people informed is to enable them to automatically find or ask for information. Many people have already developed interfaces to look for answers to COVID-19 related questions. Most of these are based on semantic search: Given a set of question-answer pairs, we embed the questions using a pre-trained sentence embedding model. For a new question, we embed it in the same embedding space and return the answer corresponding to the most similar question.
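The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind any of the interfaces mentioned here: the `embed` function is a toy bag-of-words stand-in for a pre-trained sentence embedding model, and the FAQ entries, answer placeholders, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical question-answer pairs; the answers are placeholders, not real guidance.
FAQ = [
    ("How should I wash my hands?", "<answer about hand washing>"),
    ("What are the symptoms of COVID-19?", "<answer about symptoms>"),
    ("Can I travel abroad?", "<answer about travel>"),
]


def embed(text):
    """Toy stand-in for a sentence encoder: a sparse bag-of-words count vector.

    A real system would call a pre-trained sentence embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer(query, faq, faq_vectors):
    """Return the answer whose stored question is most similar to the query."""
    query_vector = embed(query)
    scores = [cosine(query_vector, v) for v in faq_vectors]
    best = max(range(len(faq)), key=scores.__getitem__)
    return faq[best][1], scores[best]


# Embed the known questions once, then reuse the vectors for every query.
FAQ_VECTORS = [embed(question) for question, _ in FAQ]

if __name__ == "__main__":
    best_answer, score = answer("What symptoms does COVID-19 have?", FAQ, FAQ_VECTORS)
    print(best_answer, round(score, 3))
```

Swapping `embed` for a real sentence encoder leaves the rest of the sketch unchanged, as long as the same encoder is used for the stored questions and the incoming queries.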
For a walk-through on how to build a multilingual assistant that can answer questions about COVID-19 (including how to build the API and host it on Google Cloud), check out this post by Anna Krogager from ML6. She scrapes question-answer pairs from official FAQs in Belgium and uses the Universal Sentence Encoder to embed questions and queries.
Two search interfaces that provide answers in English to COVID-19 related questions based on the CORD-19 dataset are one by researchers from Korea University and covidex by researchers from the University of Waterloo and NYU. Both additionally highlight relevant entities in the article. While code for the former will be available in April according to the GitHub README, the latter is based on a T5 model pre-trained on the medical domain, and its code is already available.
Finally, if you are more interested in capturing certain relations, have a look at the relation embeddings trained by Luis Espinosa-Anke using the SeVeN pipeline. In addition to the first-order relationships captured by regular word embeddings, relation embeddings enable finding similar relations such as <disease, bacteria> relations in the embedding space.
Opinion mining 📈
Tracking news coverage and online media is another way to monitor the spread of the virus and can be useful to pre-empt mass hysteria. Two examples are an analysis (in Spanish) by Grupo BID that uses Twitter data focused on South America as well as a post about the news coverage tracking the outbreak by Aylien.
For more on how industry supports reporting on COVID-19, provides machine-supported diagnosis, or helps develop policies, have a look at this recent edition of This Week in NLP.
Resources and how to help 📑
For a list of COVID-19 related data for NLP, epidemiological, and biomedical applications, have a look at Stanford’s CS472 course on data science and AI for COVID-19.
If you are interested in working with this data or on COVID-19 related research, the Association for Computational Linguistics is hosting an emergency workshop on NLP for COVID-19 virtually with ACL 2020. The workshop invites submissions related to any aspect of NLP applied to combat the COVID-19 pandemic. Submissions will be openly and rapidly reviewed. The submission deadline is June 30.
- Help the people around you interpret information.
- Translate information from experts into more languages.
- Prepare data that might be directly related to the response.
- Analyze data that is not directly related to the response.
- Research using existing disaster response datasets.