ML Practica, I/O CodeLabs, All About NLP, Retro games, Dark side of AI, Computer vs. Doctors, Misspellings, Fairness, Causal Inference, X.ai, HuggingFace, Amazon
Skimmed between sessions at NAACL-HLT
Hi all!
Whether you're at a conference or sipping your morning coffee ☕️ at work or at home, I hope you enjoy this newsletter edition.
This time, we have: lots of resources including a new interactive ML course by Google, all CodeLabs from Google I/O, and a collection of 6,500 surveys, tutorials, and libraries about NLP; research using retro games in cool ways; finding misspellings (and pretentious words) in a word embedding space; articles about the dark side of AI; computers vs. doctors, round 2; a nice, interactive article about the delayed impact of fairness; an intro to causal inference for Deep Learning; news about startups (X.ai and HuggingFace) as well as Amazon doing more NLP; and new research papers.
What I'm thinking about 🤔
Transfer learning: Fine-tuning beats using pretrained ImageNet models as fixed feature extractors when transferring to new tasks.
Data augmentation: Super important in CV, but didn't get a lot of attention until recently. Cubuk et al. learn a data augmentation policy (think: architecture search) and achieve SotA on CIFAR-10, CIFAR-100, SVHN, and ImageNet. More recent work in NLP, too (see e.g. this, this, and this paper from NAACL 2018 and this blog post).
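To make the idea concrete, here is a toy token-level augmenter (random word dropout and adjacent swaps) of the kind used as simple NLP baselines. It is a hand-written illustration, not the learned augmentation policy from Cubuk et al.; the probabilities and helper name are my own.

```python
import random

def augment(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Return a noised copy of a token list: randomly drop tokens
    and swap adjacent pairs. A toy illustration of NLP data
    augmentation, not a learned policy."""
    rng = rng or random.Random(0)
    # Randomly drop tokens (but keep at least one).
    kept = [t for t in tokens if rng.random() > p_drop] or tokens[:1]
    # Randomly swap adjacent tokens.
    out = kept[:]
    for i in range(len(out) - 1):
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

sentence = "the quick brown fox jumps over the lazy dog".split()
print(augment(sentence, rng=random.Random(1)))
```

Learned policies essentially search over which of these (and stronger) operations to apply, and with what strength, instead of fixing the probabilities by hand.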
Tools and resources
Introducing Machine Learning Practica — blog.google
An interactive course by Google AI that walks you through the basics of image classification.
Google I/O 2018 — codelabs.developers.google.com
All CodeLabs from Google I/O.
TutorialBank: Learning NLP Made Easier — alex-fabbri.github.io
This blog post introduces a new search engine called All About NLP (AAN) that can be used to search about 6,500 surveys, tutorials, libraries, and codebases related to NLP curated by researchers at Yale.
NLP Architect by Intel AI Lab — github.com
A Python library for exploring state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding.
Retro games things
OpenAI has released the full version of Gym Retro, a platform for RL on games, which now includes 1,000+ games across a variety of backing emulators.
Game on!
Turns out Pokemon names are a good dataset for studying cross-lingual sound symbolism---who knew? You can scrape Pokemon names in German, English, French, Japanese, Korean, and Chinese from this website. Ready to catch 'em all?
Cool NLP stuff
A simple spell checker built from word vectors — medium.com
Even after several years, word embeddings are still a source of delight and awe. It turns out that misspelled words reside in a different part of the vector space than their correctly spelled counterparts. Similarly, words that sound pretentious sit in a different region from their more neutral synonyms.
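The geometric intuition can be sketched in a few lines: if misspellings cluster together, a word whose nearest embedding neighbours are known misspellings is itself suspect. The 2-d vectors below are hypothetical values chosen to illustrate the clustering (real embeddings have hundreds of dimensions, and the post builds its checker differently).

```python
import math

# Toy 2-d "embeddings": correct spellings cluster in one region,
# misspellings in another (hypothetical values for illustration).
vectors = {
    "because": (0.9, 0.1),
    "becuase": (0.1, 0.9),
    "definitely": (0.8, 0.2),
    "definately": (0.2, 0.8),
}
known_misspellings = {"becuase", "definately"}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def looks_misspelled(word, k=1):
    """Flag a word whose k nearest embedding neighbours are all
    known misspellings."""
    neighbours = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
        reverse=True,
    )[:k]
    return all(n in known_misspellings for n in neighbours)

print(looks_misspelled("definately"))  # nearest neighbour is "becuase"
```

With these toy vectors, "definately" is flagged (its nearest neighbour is the misspelling "becuase") while "because" is not.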
AI---good or evil?
Evil Software Du Jour: Google's Cocktail Party Algorithm — www.skynettoday.com
Recent privacy concerns over work from Google showcased how easy it is for tech journalists to immediately focus on unlikely worst-case outcomes. This article looks at the ethical implications of a new method for audio-visual speech separation.
How a Pentagon Contract Became an Identity Crisis for Google — www.nytimes.com
A $9 million deal for the use of AI technology has fractured the internet giant’s work force and risks driving away top engineering talent.
More articles and blog posts
Are computers better than doctors? — hackernoon.com
Notes from a critical expert panel discussion on the CheXNet paper. Main takeaways: 1. There exists a critical gap in the labeling of medical data. 2. Clinical significance of results is important. 3. Peer review in medicine and AI is important.
Notes on ICLR 2018 — noecasas.com
Notes on ICLR 2018 by Noe Casas with a focus on advances in NLP.
Delayed Impact of Fair Machine Learning — bair.berkeley.edu
A great discussion of fairness criteria with interactive visualizations for the example of a bank providing loans to different populations.
Why thousands of AI researchers are boycotting the new Nature journal — www.theguardian.com
Neil Lawrence argues in a Guardian op-ed that taxpayers should not have to pay twice to read the findings of ML researchers.
A nice Science magazine article that sheds some light on whether machine learning should incorporate insights from developmental psychology and try to encode more "basic instincts" into ML models.
How AI learned to be creative — thegradient.pub
With the success of deep learning, algorithms have pushed into another domain that humans thought was safe from automation: the creation of compelling art.
Holy NLP! Understanding Part of Speech Tags, Dependency Parsing, and Named Entity Recognition — pmbaumgartner.github.io
For everyone who needs a refresher, this blog post provides a walkthrough of three common NLP tasks and looks at how they can be used together to analyze text.
A Report on the Review Process of ACL 2018 — acl2018.org
This blog post looks at recent changes to the reviewing process for ACL 2018, made to cope with the high number of submissions while obtaining as many high-quality reviews as possible at low cost.
ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus — www.inference.vc
Following Judea Pearl's recent dismissal of Deep Learning as curve fitting, Ferenc Huszár gives an intro to causal inference and highlights why it is important for Deep Learning applications.
Industry news
AI Chatbots Try to Schedule Meetings—Without Enraging Us — www.wired.com
However trivial it may sound, it's a monstrously difficult challenge. This article discusses some of the challenges faced by X.ai and its employees.
Hugging Face raises $4 million for its artificial BFF — techcrunch.com
Chatbot startup Hugging Face has raised a $4 million seed round led by Ronny Conway from a_capital. Existing investors Betaworks, SV Angel and Kevin Durant are also participating.
Microsoft acquires Semantic Machines, advancing the state of conversational AI — blogs.microsoft.com
Microsoft has acquired Semantic Machines Inc., a Berkeley, Calif.-based company that has developed a new approach to building conversational AI.
Amazon Scientists Use Transfer Learning to Accelerate Development of New Alexa Capabilities — developer.amazon.com
Amazon is becoming more vocal about NLP research: It has released two blog posts discussing NAACL-HLT research papers and is sponsoring the Widening NLP workshop.
Paper picks
Parsing Tweets into Universal Dependencies (NAACL-HLT 2018)
The authors extend the Universal Dependencies guidelines to cover Twitter-specific phenomena and create a new Twitter treebank for English. To deal with ambiguities, they propose to distill an ensemble of 20 transition-based parsers into a single one.
The authors synthesize parallel data for grammatical error correction by noising a clean monolingual corpus. They use a seq2seq model to translate clean examples to noisy ones and propose additional beam search noising procedures to introduce more diversity. Starting with models trained on roughly 1.3M sentences, they nearly match the performance of training with 3M sentences.
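A minimal sketch of the underlying idea of synthesizing noisy/clean pairs, using a crude rule-based character noiser (deletion, swap, duplication) as a stand-in for the learned clean-to-noisy seq2seq model the paper actually trains; the function name and noise probabilities are my own.

```python
import random

def noise(sentence, rng, p=0.1):
    """Rule-based character noiser: with probability p each, delete a
    character, swap it with the next one, or duplicate it. A crude
    stand-in for a learned clean->noisy translation model."""
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p:                               # delete the character
            i += 1
        elif r < 2 * p and i + 1 < len(chars):  # swap with next
            out += [chars[i + 1], chars[i]]
            i += 2
        elif r < 3 * p:                         # duplicate
            out += [chars[i], chars[i]]
            i += 1
        else:                                   # keep as-is
            out.append(chars[i])
            i += 1
    return "".join(out)

clean = "She has gone to the market."
print(noise(clean, random.Random(42)))
```

Pairing each noised output with its clean source yields synthetic (noisy, clean) training examples for an error-correction model.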
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (ICLR 2018)
The authors propose contextual decomposition, an interpretation algorithm for analysing predictions of standard LSTMs by decomposing the output. The method allows identification of words and phrases of contrasting sentiment and extraction of positive and negative negations from a model trained on phrase-level annotations. The key insight is that gating dynamics in LSTMs are a vehicle for modeling interactions between variables.
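At a high level, the decomposition can be written compactly (a sketch of the notation, with the phrase of interest contributing the β terms and everything else the γ terms):

```latex
% Each LSTM cell and hidden state is split into a contribution
% from the phrase of interest (beta) and everything else (gamma):
c_t = \beta_t^{c} + \gamma_t^{c}, \qquad h_t = \beta_t + \gamma_t
% so the final logit decomposes linearly, and W\beta_T serves as
% the phrase's importance score:
W h_T = W \beta_T + W \gamma_T
```

The work goes into propagating this split through the LSTM's gates and nonlinearities so that the β terms track only the phrase's contribution.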
P.S.: Hope you enjoyed this newsletter! I'm at NAACL in New Orleans until Friday. Let me know if you'd like to chat.