PyTorch is a Deep Learning library designed specifically for implementing dynamic neural networks, which are particularly well suited to NLP tasks with variable-length sequences. Other libraries that natively handle dynamic computation graphs are Chainer and DyNet. PyTorch 0.2.0 is now out with many long-awaited features such as broadcasting and advanced indexing. Soumith Chintala speaks here with O'Reilly about why many ML researchers are beginning to embrace PyTorch.
ACL 2017 is over and ICML 2017 has just gotten started. Both conferences feature an array of awesome papers. ACL 2017 Proceedings can be found here and ICML 2017 Proceedings can be found here. Videos of CVPR 2017 can be found here.
Interpretability is becoming more and more important. The ICML 2017 committee has acknowledged this by awarding the best paper award to Understanding Black-box Predictions via Influence Functions by Koh & Liang. The paper develops tools that allow us to scale up influence functions, a classic technique from statistics, to modern ML settings in order to understand black-box predictions. For anyone who wants to read more, here is a great overview of ideas on interpreting ML by O'Reilly.
Sequence-to-sequence (seq2seq) is one of the most popular frameworks for Deep Learning. It has been used to achieve state-of-the-art performance on machine translation, image captioning, speech generation, and summarization. Oriol Vinyals and Navdeep Jaitly give an overview of seq2seq in this tutorial and outline its future directions.
While open-source frameworks are increasingly lowering the barrier to building ML models, it is still a hassle to get data annotated. The creators of the open-source NLP library spacy.io now present Prodigy, a tool that uses active learning to make data annotation more efficient.
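To make the idea of active learning concrete, here is a minimal sketch of uncertainty sampling, one common strategy: the annotator is shown the examples the current model is least sure about first. This is not Prodigy's API or its actual selection algorithm. The function names, the toy scores, and the `batch_size` parameter are all assumptions made up for illustration.

```python
def uncertainty(prob_positive):
    """Score a binary prediction: 1.0 at p=0.5 (least sure), 0.0 at p=0 or p=1."""
    return 1.0 - abs(prob_positive - 0.5) * 2.0


def select_for_annotation(examples, predict_proba, batch_size=2):
    """Return the `batch_size` unlabeled examples the model is least sure about."""
    scored = [(uncertainty(predict_proba(text)), text) for text in examples]
    # Most uncertain first; Python's sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:batch_size]]


# Hypothetical stand-in for a real classifier's P(positive | text).
TOY_SCORES = {
    "great movie": 0.95,
    "terrible plot": 0.05,
    "it was fine I guess": 0.55,
    "not bad": 0.40,
}


def toy_predict_proba(text):
    return TOY_SCORES[text]


pool = list(TOY_SCORES)
queue = select_for_annotation(pool, toy_predict_proba, batch_size=2)
print(queue)
```

In a real loop, the newly annotated examples would be added to the training set, the model retrained, and the selection repeated, so that each round of annotation is spent where the model needs it most.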
In order to improve upon our models, we have to understand what kind of errors they make and how much they have overfit to the particular biases inherent in the data. QA is one task where models have achieved startlingly close-to-human-level performance, as can be seen on the SQuAD leaderboard here. Jia & Liang craft adversarial examples that probe certain parts of these QA models: performance drops from an average of 75% F1 to 36% across sixteen models! Also have a look here for slides from Yoav Goldberg on the problems with SQuAD and for some brief thoughts from Paul Mineiro here.
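For readers unfamiliar with the metric, the F1 numbers quoted for SQuAD are token-overlap scores between a predicted answer span and a gold answer. The following is a simplified sketch of that computation. It only lowercases and splits on whitespace, whereas the official evaluation script also strips punctuation and articles and takes the maximum over several gold answers. The function name `token_f1` is an assumption for illustration.

```python
from collections import Counter


def token_f1(prediction, gold):
    """Token-level F1 between a predicted answer string and one gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("Denver Broncos", "Denver Broncos")
partial = token_f1("the Denver Broncos", "Denver Broncos")
wrong = token_f1("Carolina Panthers", "Denver Broncos")
print(exact, round(partial, 2), wrong)
```

A partially overlapping answer still earns credit under this metric, which is why a drop from 75% to 36% average F1 reflects many predictions moving to entirely different spans rather than slightly misaligned ones.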
One of the recent game changers in computer vision has been the combination of ImageNet + a large pre-trained CNN. In NLP, we have so far not found a model that is as useful for transfer learning (see also my post here). The closest we have in terms of data size and model capacity is Machine Translation. Researchers from Salesforce now show that we can pre-train not only the word vectors but an entire LSTM encoder using MT, which can then be transferred successfully to a wide range of tasks. The paper and an MIT Tech Review post can be found here and here.
Researchers from Google show that we can achieve results that are competitive with the state-of-the-art across multiple tasks using small shallow feed-forward neural networks. This is a great result that enables training and deploying smaller and more accurate models in resource-constrained environments such as mobile phones.
This ACL 2017 outstanding paper shows that we can use multi-task learning, in particular unsupervised video prediction and language entailment generation as auxiliary tasks, to improve the challenging task of video captioning.
Distant supervision using tweets containing positive and negative emoticons has been a common method to achieve state-of-the-art performance on sentiment analysis. The authors of this EMNLP 2017 paper take this one step further and show that a model pre-trained on 1.2B tweets containing 64 different emojis obtains state-of-the-art performance on 8 sentiment, emotion and sarcasm detection datasets. The paper can be found here.
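The basic emoticon-based form of distant supervision mentioned in the first sentence can be sketched in a few lines: the emoticon serves as a noisy label and is then removed from the text so a model cannot simply memorize it. This is an illustrative sketch and not the EMNLP paper's pipeline, which predicts one of 64 emojis rather than a binary label. The emoticon lists and the function name are assumptions.

```python
# Small, hypothetical emoticon lexicons used as noisy sentiment signals.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")


def distant_label(tweet):
    """Return (text_without_emoticons, label), or None if there is no clear signal."""
    has_pos = any(emoticon in tweet for emoticon in POSITIVE)
    has_neg = any(emoticon in tweet for emoticon in NEGATIVE)
    if has_pos == has_neg:
        # Either no emoticon at all or conflicting ones: skip the tweet.
        return None
    label = "positive" if has_pos else "negative"
    text = tweet
    for emoticon in POSITIVE + NEGATIVE:
        text = text.replace(emoticon, "")
    # Collapse the whitespace left behind by the removed emoticons.
    return " ".join(text.split()), label


tweets = [
    "Loved the keynote :)",
    "Missed my flight :(",
    "ICML starts today",
    "so good :) so bad :(",
]
labeled = [result for result in map(distant_label, tweets) if result is not None]
print(labeled)
```

Because the labels come for free, such a dataset can grow to the scale of the 1.2B tweets mentioned above, at the cost of label noise that the model has to absorb.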
NLP is already used nowadays to automate the writing of particularly formulaic articles such as sports news or financial announcements. Researchers from Harvard present a new dataset for the task of generating text conditioned on a small number of database records. The dataset contains data records paired with descriptive documents.