ML Practica, I/O CodeLabs, All About NLP, Retro games, Dark side of AI, Computer vs. Doctors, Misspellings, Fairness, Causal Inference, X.ai, HuggingFace, Amazon
Skimmed between sessions at NAACL-HLT
Hi all!
Whether you're at a conference or sipping your morning coffee ☕️ at work or at home, I hope you enjoy this newsletter edition.
This time, we have: lots of resources including a new interactive ML course by Google, all CodeLabs from Google I/O, and a collection of 6,500 surveys, tutorials, and libraries about NLP; research using retro games in cool ways; finding misspellings (and pretentious words) in a word embedding space; articles about the dark side of AI; computers vs. doctors, round 2; a nice, interactive article about the delayed impact of fairness; an intro to causal inference for Deep Learning; news about startups (X.ai and HuggingFace) as well as Amazon doing more NLP; and new research papers.
What I'm thinking about 🤔
Transfer learning: Fine-tuning beats using pretrained ImageNet models as fixed feature extractors when transferring to new tasks.
Data augmentation: Super important in CV, but didn't get a lot of attention until recently. Cubuk et al. learn a data augmentation policy (think: architecture search) and achieve SotA on CIFAR-10, CIFAR-100, SVHN, and ImageNet. More recent work in NLP, too (see e.g. this, this, and this paper from NAACL 2018 and this blog post).
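To make the idea concrete, here is a toy token-level augmenter (random word dropout and adjacent swaps) of the kind used as simple NLP baselines. It is a hand-written illustration, not the learned augmentation policy from Cubuk et al.; the probabilities and helper name are my own.

```python
import random

def augment(tokens, p_drop=0.1, p_swap=0.1, rng=None):
    """Return a noised copy of a token list: randomly drop tokens
    and swap adjacent pairs. A toy illustration of NLP data
    augmentation, not a learned policy."""
    rng = rng or random.Random(0)
    # Randomly drop tokens (but keep at least one).
    kept = [t for t in tokens if rng.random() > p_drop] or tokens[:1]
    # Randomly swap adjacent tokens.
    out = kept[:]
    for i in range(len(out) - 1):
        if rng.random() < p_swap:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

sentence = "the quick brown fox jumps over the lazy dog".split()
print(augment(sentence, rng=random.Random(1)))
```

Learned policies essentially search over which of these (and stronger) operations to apply, and with what strength, instead of fixing the probabilities by hand.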
Tools and resources
Introducing Machine Learning Practica — blog.google
An interactive course by Google AI that walks you through the basics of image classification.
Google I/O 2018 — codelabs.developers.google.com
All CodeLabs from Google I/O.
TutorialBank: Learning NLP Made Easier — alex-fabbri.github.io
This blog post introduces a new search engine called All About NLP (AAN) that can be used to search about 6,500 surveys, tutorials, libraries, and codebases related to NLP curated by researchers at Yale.
NLP Architect by Intel AI Lab — github.com
A Python library for exploring state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding.
Retro games things
OpenAI has released the full version of Gym Retro, a platform for RL on games, which now includes 1,000+ games across a variety of backing emulators.
Game on!
Turns out Pokemon names are a good dataset for studying cross-lingual sound symbolism---who knew? You can scrape Pokemon names in German, English, French, Japanese, Korean, and Chinese from this website. Ready to catch 'em all?
Cool NLP stuff
A simple spell checker built from word vectors — medium.com
Even after several years, word embeddings are still a source of delight and awe. It turns out that misspelled words reside in a different part of the vector space than their correctly spelled counterparts. Similarly, words that sound pretentious sit in a different region from their more neutral synonyms.
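The geometric intuition can be sketched in a few lines: if misspellings cluster together, a word whose nearest embedding neighbours are known misspellings is itself suspect. The 2-d vectors below are hypothetical values chosen to illustrate the clustering (real embeddings have hundreds of dimensions, and the post builds its checker differently).

```python
import math

# Toy 2-d "embeddings": correct spellings cluster in one region,
# misspellings in another (hypothetical values for illustration).
vectors = {
    "because": (0.9, 0.1),
    "becuase": (0.1, 0.9),
    "definitely": (0.8, 0.2),
    "definately": (0.2, 0.8),
}
known_misspellings = {"becuase", "definately"}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def looks_misspelled(word, k=1):
    """Flag a word whose k nearest embedding neighbours are all
    known misspellings."""
    neighbours = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
        reverse=True,
    )[:k]
    return all(n in known_misspellings for n in neighbours)

print(looks_misspelled("definately"))  # nearest neighbour is "becuase"
```

With these toy vectors, "definately" is flagged (its nearest neighbour is the misspelling "becuase") while "because" is not.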
AI---good or evil?
Evil Software Du Jour: Google's Cocktail Party Algorithm — www.skynettoday.com
Recent privacy concerns over work from Google showcased how easy it is for tech journalists to immediately focus on unlikely worst-case outcomes. This article looks at the ethical implications of a new method for audio-visual speech separation.
How a Pentagon Contract Became an Identity Crisis for Google — www.nytimes.com
A $9 million deal for the use of AI technology has fractured the internet giant’s work force and risks driving away top engineering talent.
More articles and blog posts
Are computers better than doctors? — hackernoon.com
Notes from a critical expert panel discussion on the CheXNet paper. Main takeaways: 1. There exists a critical gap in the labeling of medical data. 2. Clinical significance of results is important. 3. Peer review in medicine and AI is important.
Notes on ICLR 2018 — noecasas.com
Notes on ICLR 2018 by Noe Casas with a focus on advances in NLP.
Delayed Impact of Fair Machine Learning — bair.berkeley.edu
A great discussion of fairness criteria with interactive visualizations for the example of a bank providing loans to different populations.
Why thousands of AI researchers are boycotting the new Nature journal — www.theguardian.com
Neil Lawrence argues in a Guardian op-ed that taxpayers should not have to pay twice to read the findings of ML researchers.
A nice Science magazine article that sheds some light on whether machine learning should incorporate insights from developmental psychology and try to encode more "basic instincts" into ML models.
How AI learned to be creative — thegradient.pub
With the success of deep learning, algorithms have pushed into another domain that humans thought was safe from automation: the creation of compelling art.
Holy NLP! Understanding Part of Speech Tags, Dependency Parsing, and Named Entity Recognition — pmbaumgartner.github.io
For everyone who needs a refresher, this blog post provides a walkthrough of three common NLP tasks and looks at how they can be used together to analyze text.
A Report on the Review Process of ACL 2018 — acl2018.org
This blog post looks at recent changes to the reviewing process for ACL 2018, made to cope with the high number of submissions while obtaining as many high-quality reviews as possible at low cost.
ML beyond Curve Fitting: An Intro to Causal Inference and do-Calculus — www.inference.vc
Following Judea Pearl's recent dismissal of Deep Learning as curve fitting, Ferenc Huszár gives an intro to causal inference and highlights why it is important for Deep Learning applications.
Industry news
AI Chatbots Try to Schedule Meetings—Without Enraging Us — www.wired.com
However trivial it may sound, it's a monstrously difficult challenge. This article discusses some of the challenges faced by X.ai and its employees.
Hugging Face raises $4 million for its artificial BFF — techcrunch.com
Chatbot startup Hugging Face has raised a $4 million seed round led by Ronny Conway from a_capital. Existing investors Betaworks, SV Angel and Kevin Durant are also participating.
Microsoft acquires Semantic Machines, advancing the state of conversational AI — blogs.microsoft.com
Microsoft has acquired Semantic Machines Inc., a Berkeley, Calif.-based company that has developed a new approach to building conversational AI.
Amazon Scientists Use Transfer Learning to Accelerate Development of New Alexa Capabilities — developer.amazon.com
Amazon is becoming more vocal about NLP research: It has released two blog posts discussing NAACL-HLT research papers and is sponsoring the Widening NLP workshop.
Paper picks
Parsing Tweets into Universal Dependencies (NAACL-HLT 2018)
The authors extend the Universal Dependencies guidelines to cover Twitter-specific phenomena and create a new Twitter treebank for English. To deal with ambiguities, they propose to distill an ensemble of 20 transition-based parsers into a single one.
The authors synthesize parallel data for grammatical error correction by noising a clean monolingual corpus. They use a seq2seq model to translate clean examples to noisy ones and propose additional beam search noising procedures to introduce more diversity. Starting with models trained on roughly 1.3M sentences, they nearly match the performance of training with 3M sentences.
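A minimal sketch of the underlying idea of synthesizing noisy/clean pairs, using a crude rule-based character noiser (deletion, swap, duplication) as a stand-in for the learned clean-to-noisy seq2seq model the paper actually trains; the function name and noise probabilities are my own.

```python
import random

def noise(sentence, rng, p=0.1):
    """Rule-based character noiser: with probability p each, delete a
    character, swap it with the next one, or duplicate it. A crude
    stand-in for a learned clean->noisy translation model."""
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p:                               # delete the character
            i += 1
        elif r < 2 * p and i + 1 < len(chars):  # swap with next
            out += [chars[i + 1], chars[i]]
            i += 2
        elif r < 3 * p:                         # duplicate
            out += [chars[i], chars[i]]
            i += 1
        else:                                   # keep as-is
            out.append(chars[i])
            i += 1
    return "".join(out)

clean = "She has gone to the market."
print(noise(clean, random.Random(42)))
```

Pairing each noised output with its clean source yields synthetic (noisy, clean) training examples for an error-correction model.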
Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs (ICLR 2018)
The authors propose contextual decomposition, an interpretation algorithm for analysing predictions of standard LSTMs by decomposing the output. The method allows identification of words and phrases of contrasting sentiment and extraction of positive and negative negations from a model trained on phrase-level annotations. The key insight is that gating dynamics in LSTMs are a vehicle for modeling interactions between variables.
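At a high level, the decomposition can be written compactly (a sketch of the notation, with the phrase of interest contributing the β terms and everything else the γ terms):

```latex
% Each LSTM cell and hidden state is split into a contribution
% from the phrase of interest (beta) and everything else (gamma):
c_t = \beta_t^{c} + \gamma_t^{c}, \qquad h_t = \beta_t + \gamma_t
% so the final logit decomposes linearly, and W\beta_T serves as
% the phrase's importance score:
W h_T = W \beta_T + W \gamma_T
```

The work goes into propagating this split through the LSTM's gates and nonlinearities so that the β terms track only the phrase's contribution.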
P.S.: Hope you enjoyed this newsletter! I'm at NAACL in New Orleans until Friday. Let me know if you'd like to chat.