Current Issues with Transfer Learning in NLP 👩‍🏫 Muhammad Khalifa summarizes some of the main challenges of transfer learning in NLP: computational intensity, reproducibility problems, the focus on task leaderboards, a lack of similarity to human learning, shallow language understanding, and a high carbon footprint.
Slice-based Learning ✂️ The authors of Snorkel, the data programming toolkit that we’ve covered in the past, discuss slice-based learning, an approach for improving model performance on critical subsets of the data, or slices. Slice-based learning has been used to achieve state-of-the-art performance on the SuperGLUE benchmark.
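To make the idea concrete, here is a minimal, framework-free sketch of the core mechanic: slicing functions are just predicates that tag examples, and per-slice metrics reveal where a model underperforms. The function and slice names below are illustrative assumptions, not Snorkel’s actual API.

```python
from typing import Callable, Dict, List

# A slicing function is a predicate that marks whether an example
# belongs to a slice of interest (e.g., short inputs, negation).
def is_short(example: dict) -> bool:
    return len(example["text"].split()) < 5

def mentions_negation(example: dict) -> bool:
    return any(w in example["text"].lower().split() for w in ("not", "never", "no"))

SLICES: Dict[str, Callable[[dict], bool]] = {
    "short": is_short,
    "negation": mentions_negation,
}

def per_slice_accuracy(examples: List[dict], preds: List[int]) -> Dict[str, float]:
    # Accuracy on each slice; low-scoring slices are candidates for
    # extra supervision or slice-specific model capacity.
    scores = {}
    for name, slicing_fn in SLICES.items():
        idx = [i for i, ex in enumerate(examples) if slicing_fn(ex)]
        if idx:
            correct = sum(preds[i] == examples[i]["label"] for i in idx)
            scores[name] = correct / len(idx)
    return scores
```

In the slice-based learning setup itself, slice membership also gates slice-specific components whose outputs are combined with the shared backbone; the sketch above covers only the diagnostic half of the idea.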
PhD 101 👩‍🎓 A collection of advice from Volkan Cirik on doing a PhD:
- Dealing with failure.
- Learn to learn.
- You are not your work or ideas.
- Research is not linear.
- Your relationship with your adviser is important.
- Avoid tunnel vision.
- Research is hard. You need support.
Gaussian Process, not quite for dummies 👩‍🏫 Gaussian Processes are a powerful tool, but can be hard to grasp when tackled head-on. In this lucid blog post, Yuge Shi introduces Gaussian Processes from first principles, starting from non-linear regression and 2D Gaussians.
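For a flavor of where the post ends up, here is a minimal NumPy sketch of the closed-form GP regression posterior with a squared-exponential kernel. The helper names and hyperparameter values are our own illustrative choices, not taken from the post.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0) -> np.ndarray:
    # Squared-exponential kernel: k(x, x') = exp(-(x - x')^2 / (2 * l^2))
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise: float = 1e-2):
    # Zero-mean GP regression posterior:
    #   mean = K_s^T (K + noise * I)^{-1} y
    #   cov  = K_ss - K_s^T (K + noise * I)^{-1} K_s
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

# Condition on a few noisy observations of sin(x) and predict on a grid.
x_train = np.array([-2.0, -1.0, 0.0, 1.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 50)
mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.diag(cov))  # pointwise predictive uncertainty
```

The diagonal of the posterior covariance is what gives GPs their characteristic uncertainty bands: narrow near observed points, wide far from them.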
Do We Encourage Researchers to Use Inappropriate Data Sets? 🤯 Ehud Reiter argues that NLP as a field incentivizes using poor-quality datasets. In particular, it seems to be accepted that if a dataset has been used before, it’s fine to keep using it, regardless of any issues it may have. This is particularly concerning given the growing number of papers that find flaws in our current datasets, from question answering (CNN/DailyMail) and natural language inference (SNLI/MNLI) to bilingual lexicon induction (MUSE).
- Use minimalism to achieve clarity.
- Decide on two or three points you want every reader to remember.
- Limit each paragraph to a single message.
- Keep sentences short, simply constructed and direct.
- Don’t slow the reader down.
- Don’t over-elaborate.