Translation
One of the most impactful ways to help slow the spread of COVID-19 is to ensure everyone has access to the most up-to-date information and guidelines on the virus. That means translating content accurately into most of the world's languages. If you speak a non-European, low-resource language, consider filling out this form by CMU researchers to help with their translation efforts.
The importance of up-to-date guidelines and restrictions also means that translating even a few key phrases goes a long way. Platforms like Google Translate, however, support only around 100 languages. Using cross-lingual word embeddings (specifically MUSE), Daniel Whitenack translated the phrase "wash your hands" into 544 languages. Among them are low-resource languages such as Pijin (spoken on the Solomon Islands), Takar (Cameroon), and Waffa (Papua New Guinea).
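To give a rough idea of how cross-lingual word embeddings can be used for translation, here is a minimal sketch of nearest-neighbour lookup in a shared embedding space. It is not a reconstruction of the approach referenced above: the three-dimensional vectors, the English-French word pairs, and the function names are all made up for illustration. In practice the vectors would come from embeddings that have been aligned into one space, for example with MUSE.

```python
import math

# Hypothetical toy vectors. Real cross-lingual embeddings have hundreds of
# dimensions and are learned and aligned, not written by hand.
EN_VECTORS = {
    "wash": [0.9, 0.1, 0.0],
    "hand": [0.1, 0.9, 0.1],
    "water": [0.0, 0.2, 0.9],
}
FR_VECTORS = {
    "laver": [0.8, 0.2, 0.1],
    "main": [0.2, 0.8, 0.0],
    "eau": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def translate_word(word, source_vectors, target_vectors):
    """Return the target word whose vector is closest to the source word."""
    query = source_vectors[word]
    return max(target_vectors, key=lambda t: cosine(query, target_vectors[t]))


def translate_phrase(phrase, source_vectors, target_vectors):
    """Translate word by word; word order and grammar are ignored."""
    return [
        translate_word(w, source_vectors, target_vectors)
        for w in phrase.split()
    ]


if __name__ == "__main__":
    print(translate_phrase("wash hand", EN_VECTORS, FR_VECTORS))
```

Word-by-word lookup like this ignores grammar and word order, so it is only plausible for short, simple phrases.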
Question answering and search
Another way to keep people informed is to enable them to automatically find or ask for information. Many people have already developed interfaces to look for answers to COVID-19 related questions. Most of these are based on semantic search: Given a set of question-answer pairs, we embed the questions using a pre-trained sentence embedding model. For a new question, we embed it in the same embedding space and return the answer corresponding to the most similar question.
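The semantic search recipe described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the systems mentioned here is implemented: the `embed` function is a toy bag-of-words stand-in for a pre-trained sentence embedding model, and the question-answer pairs use placeholder answers. Swapping `embed` for a real sentence encoder would leave the rest of the logic unchanged.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ with placeholder answers, standing in for scraped
# question-answer pairs.
FAQ = [
    ("What are the symptoms of COVID-19?", "<answer about symptoms>"),
    ("How does COVID-19 spread?", "<answer about transmission>"),
    ("Where can I get tested?", "<answer about testing>"),
]


def embed(text):
    """Toy stand-in for a sentence encoder: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_question(query, faq):
    """Return the answer paired with the stored question most similar to the query."""
    query_vec = embed(query)
    # In a real system the FAQ questions would be embedded once and cached.
    best_question, best_answer = max(
        faq, key=lambda pair: cosine(query_vec, embed(pair[0]))
    )
    return best_answer


if __name__ == "__main__":
    print(answer_question("What symptoms should I look out for?", FAQ))
```

A bag-of-words vector only matches on shared words, which is exactly the limitation that pre-trained sentence embeddings are meant to overcome: they can place paraphrases with no words in common close together.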
For a walk-through on how to build a multilingual assistant that can answer questions about COVID-19 (including how to build the API and host it on Google Cloud), check out this post by Anna Krogager from ML6. She scrapes question-answer pairs from official FAQs in Belgium and uses the Universal Sentence Encoder to embed questions and queries.
Two search interfaces that provide answers in English to COVID-19 related questions based on the CORD-19 dataset are covidsearch by researchers from Korea University and covidex by researchers from the University of Waterloo and NYU. Both additionally highlight relevant entities in the article. According to the GitHub README, code for the former will be available in April; the latter is based on a T5 model pre-trained on the medical domain, and its code is already available.
Finally, if you are more interested in capturing certain relations, have a look at the relation embeddings trained by Luis Espinosa-Anke using the SeVeN pipeline. In addition to the first-order relationships captured by regular word embeddings, relation embeddings enable finding similar relations such as <disease, bacteria> relations in the embedding space.
Opinion mining
Tracking news coverage and online media is another way to monitor the spread of the virus and can be useful to pre-empt mass hysteria. Two examples are an analysis (in Spanish) by Grupo BID that uses Twitter data focused on South America and a post by Aylien about tracking news coverage of the outbreak.
For more on how industry supports reporting on COVID-19, provides machine-supported diagnosis, or helps develop policies, have a look at this recent edition of This Week in NLP.
Resources and how to help
For a list of COVID-19 related data for NLP, epidemiological, and biomedical applications, have a look at Stanford's CS472 course on data science and AI for COVID-19.
If you are interested in working with this data or on COVID-19 related research, the Association for Computational Linguistics is hosting an emergency workshop on NLP for COVID-19, held virtually with ACL 2020. The workshop invites submissions related to any aspect of NLP applied to combat the COVID-19 pandemic. Submissions will be openly and rapidly reviewed. The submission deadline is June 30.
- Help the people around you interpret information.
- Translate information from experts into more languages.
- Prepare data that might be directly related to the response.
- Analyze data that is not directly related to the response.
- Research using existing disaster response datasets.
Miscellaneous