Connor Leahy talked about EleutherAI, a grassroots collective of researchers who, in the past year, not only developed GPT-Neo, an open-source LM in the style of GPT-3, but also worked on BioML research and ML-generated art (read more about the art below). This blog post provides a great overview of their progress so far. To join or contribute, you can head over to their Discord.
Matthias Gallé discussed the BigScience project, also known as The Summer of Language Models 21, a one-year-long research workshop on very large language models. The project aims to create and share a large multilingual dataset and to train a very large language model. A diverse set of working groups is dedicated to different parts of the data and model creation process, from data sourcing and metadata handling to prompt engineering and retrieval. To get up to speed on the progress so far, you can watch updates from the first event, held on July 30, 2021, here. To join the project, fill out the form here.
Salomon Kabongo talked about the work of Masakhane, a grassroots organisation that aims to strengthen NLP research in African languages. So far, they have released models and datasets for diverse tasks such as machine translation and named entity recognition in many African languages. To get involved, join the Google group and Slack channel.
On the whole, my impression is that ML and NLP have become much more accessible, in part thanks to research collaborations such as the above, which are open to anyone who is excited and motivated to contribute. Other collaboration opportunities are the fast.ai and HuggingFace communities. If you are looking to work in ML or NLP and need collaborators and guidance, I encourage you to join one of the above initiatives.
On the topic of academic collaborations, I shared some lessons from my first external collaboration (and first long paper during my PhD) with Barbara Plank (see below).