RL and teamwork
🎮 OpenAI has announced that they are starting to beat human players in 5v5 Dota 2 matches. Their approach uses a massively scaled-up version of PPO and a large LSTM to represent each model, two approaches that were thought to struggle with modelling long-term dependencies. Models are trained via self-play.
Remarkably, the models learn to act as a team by putting more weight on their teammates’ reward functions as training progresses. Stephen Merity, Catherine Olsson, and Alex Irpan, among others, have commented on why teamwork is important for AIs to learn. Sam Altman, president of YC and an OpenAI sponsor, is also bullish about the potential of RL.
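The teamwork mechanism amounts to annealing a "team spirit" parameter that interpolates between each agent's own reward and the team's average reward. A minimal sketch of that interpolation (function and variable names are my own, not OpenAI's code):

```python
def blended_rewards(own_rewards, team_spirit):
    """Blend each agent's own reward with the team-average reward.

    own_rewards: per-agent rewards at one timestep.
    team_spirit: float in [0, 1], annealed upwards during training so
    agents put progressively more weight on their teammates' rewards.
    """
    team_mean = sum(own_rewards) / len(own_rewards)
    return [(1.0 - team_spirit) * r + team_spirit * team_mean
            for r in own_rewards]

# Early in training, each agent mostly sees its own reward...
early = blended_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.1)
# ...late in training, reward is shared almost equally across the team.
late = blended_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.9)
```

With high team spirit, an agent that scores nothing itself still receives credit when a teammate scores, which is what pushes the policies towards cooperative play.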
DeepMind trained agents for a Capture The Flag game using their population-based training method, which jointly trains a population of agents rather than a single one. Without being explicitly encouraged to do so, the teammates also learn to cooperate.
Finally, OpenAI trained an agent to state-of-the-art performance on Montezuma’s Revenge, a notoriously difficult game, using a single demonstration. They use states in the demonstration to construct a curriculum that becomes progressively harder: for the first subtask, the model starts from the state just before the reward. After learning to achieve the reward, the model starts from a slightly earlier state, and so on. Unlike imitation learning, the model does not learn to copy the exact behaviour but can deviate from the demonstration if that is beneficial.
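The backward curriculum can be sketched in a few lines. The helper names here are illustrative, not OpenAI's actual implementation; `attempt_fn` stands in for running a (training) RL episode from a given start state and reporting success:

```python
def reverse_curriculum(demo_states, attempt_fn,
                       success_threshold=0.8, trials=50):
    """Sketch of a single-demonstration reverse curriculum.

    demo_states: states visited in the demo, ordered from start to reward.
    attempt_fn(state): runs one episode from `state`; True on success.
    Starting from the state just before the reward, the start state is
    moved earlier only once the current subtask is reliably solved.
    """
    mastered = []
    for start in reversed(range(len(demo_states))):
        state = demo_states[start]
        successes = sum(attempt_fn(state) for _ in range(trials))
        if successes / trials >= success_threshold:
            mastered.append(start)
        else:
            break  # keep training from this state before moving earlier
    return mastered
```

For example, with a toy `attempt_fn` that only succeeds from states 2 onwards, the curriculum walks backwards from the final state and stops at the first subtask the agent cannot yet solve reliably.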
For humans and AI working together, have a look at this article in which Harvard Business Review highlights the results of a survey of 1,500 companies. On the topic of humans and AI collaborating, remember IBM’s Project Debater discussed in the last newsletter? IBM presented a demo paper at NAACL 2018 about one of its components, which is further discussed in this blog post. The model is trained to analyze and summarize a discussion, but still seems far from being an active participant in a debate.
For RL on robots, models similarly perform better if they’re trained on more data. Many algorithms, however, require data to be on-policy, i.e. the data has to be generated with the current policy. Google researchers designed a new off-policy algorithm that can benefit from large amounts of data acquired in the past.
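The on-policy/off-policy distinction can be illustrated with tabular Q-learning (a generic off-policy method, not Google's actual robot-learning algorithm): the bootstrap target takes a max over actions, so it does not depend on which policy generated the stored transitions, and data collected long ago stays usable.

```python
import random
from collections import defaultdict, deque

buffer = deque(maxlen=10_000)  # transitions from *any* past policy
Q = defaultdict(float)         # tabular Q-values, keyed by (state, action)
alpha, gamma, actions = 0.5, 0.9, [0, 1]

def store(s, a, r, s_next):
    """Log a transition, regardless of which policy produced it."""
    buffer.append((s, a, r, s_next))

def off_policy_update(batch_size=32):
    """Q-learning update on a batch sampled from the replay buffer."""
    batch = random.sample(buffer, min(batch_size, len(buffer)))
    for s, a, r, s_next in batch:
        # The target uses max over actions: valid for stale data too.
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

# A transition collected by some old (or even scripted) policy still
# improves the current value estimates:
store("s0", 1, 1.0, "s1")
off_policy_update()
```

An on-policy method like vanilla PPO would instead have to discard this buffer and regenerate data with the current policy after every update, which is exactly what makes large-scale robot data collection expensive.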
Tracking the state of AI
🤖 Nathan Benaich and Ian Hogarth released a super comprehensive state of AI report 👏🏻. They highlight lots of trends in research and industry: transfer learning (only in CV, though), AI hardware, bias, healthcare, etc. If you’re interested in the state of the art for a particular NLP task, I’ve created a resource for that. Feel free to add your own results. For a look into the future, check out this interview with Oren Etzioni, CEO of AI2, on the promise and peril of AI. Countries besides the US and China are also starting to become aware of the promise of AI. This article gives a good overview of the different national strategies adopted thus far. If you’re interested in the future of NLP, I wrote a perspective at TheGradient.
📃 Nadia Eghbal looks at the concept of an independent researcher in this post. While compute is often an obstacle, the availability of papers and resources online has arguably lowered the threshold for doing research in ML and NLP. The current limiting factors seem rather to be access to guidance and mentorship, and knowledge of best practices, e.g. how to design experiments, validate hypotheses, and present results. The Good Citizen of CVPR workshop did a great job of showcasing some of these best practices (videos of the talks are available).
🗣 Have a look at Eric Jang’s ICRA 2018 highlights if you’re wondering whether Deep Learning is the way to go in robotics. TL;DR: He came out of the conference rather disappointed in traditional approaches and even more convinced of end-to-end learning, big data, and DL. Oops. For the 10 coolest papers of CVPR 2018, check out this post.
The new kid on the block, the Society for Computation in Linguistics, invites submissions to its second meeting, SCiL 2019. The deadline is August 1. The conference will take place in both New York and Paris at the same time. Let’s see if multi-local conferences catch on, as they make it easier for participants to arrange visas. To this end, ACL this year also allows remote presentations. As the research community grows, travel restrictions are becoming an even bigger problem, one that can hit even senior members of the community, as exemplified by this post by Maarten de Rijke, who is unable to attend SIGIR. Initiatives like SCiL that enable more open science are thus more important than ever.
What did you think of this format? Hit reply or click on the appropriate button below 👇 to let me know.