Why did the Bayesian suddenly feel patriotic when he heard fireworks 🎇? He assumed independence.

Happy belated 4th of July to readers in the US 🇺🇸!

Hi all,

I'm experimenting with a new format that is more concise and conversational. Rather than presenting you with a list of links that are loosely clustered by theme, I'll contextualize several themes that were common in the last two weeks and provide pointers to relevant articles and blog posts. That should make the newsletter more succinct overall, which should hopefully make it easier to find what you're interested in (and prevent Gmail from clipping it ✂️). This is a learning experience for me, so please do let me know what you love ❤️ and hate 💔 about the new format. Simply hit reply on the issue.

If you were referred by a friend, click here to subscribe. If you enjoyed this issue, give it a tweet 🐦.

RL and teamwork 🎮

OpenAI has announced that their agents are starting to beat human players in 5v5 Dota 2 matches. Their approach uses a massively scaled-up version of PPO and a large LSTM for each agent, two approaches that were thought to struggle with modelling long-term dependencies. Models are trained via self-play. Remarkably, the models learn to act as a team by putting more weight on their teammates' reward functions as training progresses. Stephen Merity, Catherine Olsson, and Alex Irpan, among others, have commented on why teamwork is important for AIs to learn. Sam Altman, president of YC and OpenAI sponsor, is also bullish about the potential of RL.

DeepMind trained agents for a Capture The Flag game using their population-based training method, which jointly trains a population of agents rather than a single one. Without being explicitly encouraged to do so, teammates also learn to cooperate.

Finally, OpenAI trained an agent to state-of-the-art performance on Montezuma's Revenge, a notoriously difficult game, using a single demonstration.
They use states in the demonstration to construct a curriculum that becomes progressively harder: In the first subtask, the model starts from the state just before the reward. After learning to achieve the reward, the model starts from a state that is slightly earlier, and so on. Unlike in imitation learning, the model does not learn to copy the exact behaviour but can deviate from the demonstration if it is beneficial.

Another way to learn from demonstrations is inverse reinforcement learning, which aims to learn the reward function from observed behaviour. The Gradient has a great overview article about this method. Check out these articles if you want to read about the fundamental flaw of RL and how to fix it.

For humans and AI working together, have a look at this article where Harvard Business Review highlights the results of a survey of 1,500 companies. On the topic of humans and AI collaborating, remember IBM's Project Debater discussed in the last newsletter? IBM presented a demo paper at NAACL 2018 about a component of it, which is further discussed in this blog post. The discussed model is trained to analyze and summarize a discussion, but still seems far away from being an active participant in a debate.

For RL on robots, models similarly perform better if they're trained on more data. Many algorithms, however, require data to be on-policy, i.e. the data has to be generated with the current policy. Google researchers designed a new off-policy algorithm that can benefit from large amounts of data acquired in the past.

Tracking the state of AI 🤖

Nathan Benaich and Ian Hogarth released a super comprehensive state of AI report 👏🏻. They highlight lots of trends in research and industry: transfer learning (only in CV, though), AI hardware, bias, healthcare, etc. If you're interested in the state-of-the-art for a particular NLP task, I've created a resource for that. Feel free to add your own results.
For a look into the future, check out this interview with Oren Etzioni, CEO of AI2, on his perspective on the promise and peril of AI. Countries besides the US and China are also starting to become aware of the promise of AI. This article gives a good overview of the different national strategies adopted thus far. If you're interested in the future of NLP, I wrote a perspective at The Gradient.

Facebook and fake news 📰

Facebook is beefing up its NLP game by acquiring London-based startup Bloomsbury AI, which featured, among others, UCL professor Sebastian Riedel as Head of Research. The team might work on combatting fake news for Facebook. NLP is already being used at Facebook to detect duplicates of fake news stories that fact-checkers have already debunked. If you're interested in helping to tackle misinformation, Facebook also announced WhatsApp Research Awards for Social Science and Misinformation, with up to $50,000 per research proposal. Amazon also announced new research awards (not to combat fake news, though).

Doing research 📃

Nadia Eghbal looks at the concept of an independent researcher in this post. While compute is often an obstacle, the availability of papers and resources online has arguably lowered the threshold for doing research in ML and NLP. The current limiting factors seem to be access to guidance and mentorship and knowledge of best practices, e.g. how to design experiments, validate hypotheses, present results, etc. The Good Citizen of CVPR workshop did a great job of showcasing some of these best practices (videos of talks are available).

Conference things 🗣

Have a look at Eric Jang's ICRA 2018 highlights if you're wondering whether Deep Learning is the way to go in robotics. TL;DR: He came out of the conference rather disappointed in traditional approaches and even more convinced of end-to-end learning, big data, and DL. Oops.
For the 10 coolest papers of CVPR 2018, check out this post.

The new kid on the block, the Society for Computation in Linguistics, invites submissions to its second meeting, SCiL 2019. The deadline is August 1. The conference will take place in both New York and Paris at the same time. Let's see if multi-local conferences catch on, as they make it easier for participants to arrange visas. To this end, ACL this year also allows remote presentations. As the research community grows, travel restrictions are becoming an even bigger problem, which can hit even senior members of the community, as exemplified by this post by Maarten de Rijke, who is unable to attend SIGIR. Initiatives like SCiL that enable more open science are thus more important than ever.

Miscellaneous 🎲

As usual, there are also a few things that are hard to fit into any one category. Jay Alammar gives a comprehensive, visual overview of the ubiquitous Transformer model. On the applied side, Yoel Zeldes walks us through how to automatically tune hyperparameters in practice, and Danial Khosravi shows us how to deploy DL models in the browser (with a nice sentiment analysis demo). Jacob Buckman gives a lucid overview of the computation graph model of TensorFlow, and Sylvain Gugger demonstrates how AdamW and super-convergence can be used to significantly accelerate model training.

What did you think of this format? Hit reply or click on the appropriate button below 👇 to let me know.